// template literal (backticks) so the string can span several lines
var str = `<div part="1">
<div>
...
<p class="so">text</p>
...
</div>
</div><span></span>`;
I have a long string stored in var str, and I need to extract the strings inside div part="1". Can you help me please?
You could create a DOM element and set its innerHTML to your string.
Then you can iterate through its child nodes and read the attributes you want ;)
Example:
var str = "<your><html>";
var node = document.createElement("div");
node.innerHTML = str;
// children holds element nodes only; text nodes in childNodes have no getAttribute()
for (var i = 0; i < node.children.length; i++) {
  console.log(node.children[i].getAttribute("part"));
}
If you're using a library like jQuery, this is trivially easy, and you don't have to go through the horrors of parsing HTML with regex.
Simply load the string into a JQuery object; then you'll be able to query it using selectors. It's as simple as this:
var so = $(str).find('.so');
to get the class='so' element.
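If you want to get all the text in part='1', then it would be this:
// filter(), not find(): the part="1" div is a top-level element of the parsed string, and find() only searches descendants
var part1 = $(str).filter('[part="1"]').text();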
Similar results can be achieved with the Prototype library, or others. Without any library, you can still do the same thing using the DOM, but it'll be much harder work.
Just to clarify why it's a bad idea to do this sort of thing in regex:
Yes, it can be done. It is possible to scan a block of HTML code with regex and find things within the string.
However, the issue is that HTML is too variable: because elements can be nested arbitrarily deep, HTML is not a regular language (bear in mind that the 'reg' in 'regex' stands for 'regular').
If you know that your HTML structure is always going to look the same, it's relatively easy. However if it's ever going to be possible that the incoming HTML might contain elements or attributes other than the exact ones you're expecting, suddenly writing the regex becomes extremely difficult, because regex is designed for searching in predictable strings. When you factor in the possibility of being given invalid HTML code to parse, the difficulty factor increases even more.
With a lot of effort and a good understanding of the more esoteric parts of regex, it can be done with a reasonable degree of reliability. But it's never going to be perfect: there's always the possibility of your regex failing when it's fed something it doesn't expect.
By contrast, parsing it with the DOM is much simpler: as demonstrated, with the right library it can be a single line of code (and very easy to read, unlike the horrific regex you'd need to write). It'll also be more efficient to run, and it lets you do other searches on the same chunk of HTML without having to re-parse it all again.
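As a small illustration of why the structure of the HTML matters, here is a sketch of two naive patterns one might try on a string shaped like the one in the question. The sample strings are made up, and neither pattern is a recommended approach; the point is how each one fails.

```javascript
// Two "obvious" regexes for pulling out the contents of <div part="1">...</div>.
const lazy = /<div part="1">([\s\S]*?)<\/div>/;   // stops at the FIRST </div>
const greedy = /<div part="1">([\s\S]*)<\/div>/;  // runs to the LAST </div>

// Return the captured contents, or null if the pattern doesn't match at all.
function extract(re, html) {
  const m = html.match(re);
  return m ? m[1] : null;
}

// Nested div (as in the question): the lazy pattern stops at the inner </div>.
const nested = '<div part="1"><div><p class="so">text</p></div></div><span></span>';
console.log(extract(lazy, nested));     // <div><p class="so">text</p>

// A sibling div after the target: the greedy pattern swallows it.
const siblings = '<div part="1">text</div><div>other</div>';
console.log(extract(greedy, siblings)); // text</div><div>other
```

Each pattern works for one shape of input and breaks on the other, which is the kind of fragility a DOM-based query avoids.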
I found some examples in Visual Basic, but not in JavaScript, and none of them do exactly what I'm trying to do. It's a tad different: I'm trying to figure out how to rearrange the characters in a string. This happens inside a for loop that cuts the string in half, and now I need to rearrange that part.
First I have:
12345678910111213141516
Then, in the for loop:
12345678
I'm trying to fix it so that I get:
72648531
But I have to do it in a way that people can't read the code and tell that there are 8 characters at this point in the string without hard work and trouble. My for loop is also deliberately jumbled so it can't be figured out. I really can't post the actual code, but it's something like this:
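var word = "12345678"; // hypothetical input; the real string isn't shown
var con = "";
for (var i = 0 /* complex math that equals 0 */; i < 8 /* complex math that equals 8 */; i++) {
  var newStr = word[i]; // I need it to come out rearranged somewhere close by
  con = con + newStr;   // reassign; don't redeclare con inside the loop
}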
Two commonly used techniques come to mind:
One common approach is to do some XOR calculations. Look at this unrelated example:
http://www.javascriptsource.com/passwords/xor-encryption4.html
You can use tools like http://www.javascriptobfuscator.com/Default.aspx to make it harder for people to figure out your code.
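Both the reordering from the question and the XOR idea can be sketched in a few lines. This is not the code from the linked page: ORDER, rearrange, xorString and the key "k3y" are all hypothetical names chosen for illustration, and the table is simply the one that turns 12345678 into 72648531. Because XOR with the same key is its own inverse, this is obfuscation, not encryption: anyone who runs the code can recover the data.

```javascript
// Sketch 1: reorder characters with a hypothetical lookup table of source indices.
// Output position k takes the character at index ORDER[k].
const ORDER = [6, 1, 5, 3, 7, 4, 2, 0];

function rearrange(word, order) {
  let out = "";
  for (let k = 0; k < order.length; k++) {
    out += word[order[k]];
  }
  return out;
}

// Sketch 2: XOR each UTF-16 code unit with a repeating key.
// Calling it again with the same key undoes it.
function xorString(text, key) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  }
  return out;
}

const reordered = rearrange("12345678", ORDER);
console.log(reordered);                   // 72648531

const KEY = "k3y";                        // hypothetical key
const hidden = xorString(reordered, KEY); // unreadable, but trivially reversible
console.log(xorString(hidden, KEY));      // 72648531
```

Keeping the table (or key) separate from the loop means the loop bounds no longer reveal the layout directly, though the table itself still does to anyone who reads it.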
Pretty simple question that I couldn't find an answer to, maybe because it's a non-issue, but I'm wondering if there is a difference between creating an HTML element from JavaScript objects and building it from a string. Is it better practice to declare any HTML elements in JS as JS objects, or as strings and let the browser/library/etc. parse them? For example:
jQuery('<div />', {'class': 'example'});
vs
jQuery('<div class="example"></div>');
(Just using jQuery as an example, but same question applies for vanilla JS as well.)
It seems like a non-issue to me but I'm no JS expert, and I want to make sure I'm doing it right. Thanks in advance!
They're both "correct". And both are useful at different times for different purposes.
For instance, in terms of page-speed, these days it's faster to just do something like:
document.body.innerHTML = "<header>....big string o' html text</footer>";
The browser will spit it out in an instant.
As a matter of safety, when dealing with user input, it's safer to build elements, attach them to a DocumentFragment, and then append them to the DOM (or replace a DOM node with your new version, or whatever).
Consider:
var userPost = "My name is Bob.<img src=\"x\" onerror=\"alert('awful things')\">",
    paragraph = "<p>" + userPost + "</p>";
// innerHTML parses the markup: a <script> inserted this way would not run, but this onerror handler does
commentList.innerHTML += paragraph;
Versus:
var userPost = "My name is Bob.<img src=\"x\" onerror=\"alert('awful things')\">",
    paragraph = document.createElement("p");
// createTextNode treats the markup as plain text, so nothing is parsed or executed
paragraph.appendChild( document.createTextNode(userPost) );
commentList.appendChild(paragraph);
One does bad things and one doesn't.
Of course, you don't have to create text nodes yourself; you could use innerText or textContent instead (the browser will create the text node on its own).
But it's always important to consider what you're sharing and how.
If it's coming from anywhere other than a place you trust (which should be approximately nowhere, unless you're serving static pages, in which case, why are you building HTML?), then you should keep injection in mind: only the things you WANT to be injected should be.
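If string-building is unavoidable, a different technique from the one in the answer is to HTML-escape untrusted text before concatenating it. The sketch below is a minimal, assumed helper (escapeHtml is a hypothetical name, not a browser API); it only covers text placed in element content or in a quoted attribute value, and it is not a general-purpose sanitizer.

```javascript
// Minimal escaping for text placed in element content or a quoted attribute value.
// Assumption: not a full sanitizer; it does not make URLs, <script> bodies,
// or unquoted attribute values safe.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so later entities aren't double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const userPost = 'My name is Bob.<script src="//bad-place.com/awful-things.js"></script>';
const paragraph = "<p>" + escapeHtml(userPost) + "</p>";
console.log(paragraph);
// <p>My name is Bob.&lt;script src=&quot;//bad-place.com/awful-things.js&quot;&gt;&lt;/script&gt;</p>
```

The escaped string can then go through innerHTML and shows up as literal text, much like the createTextNode version.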
Either can be preferable depending on your particular scenario. For example, if everything is hard-coded, option 2 is probably better, as @camus said.
One limitation with the first option though, is that this
$("<div data-foo='X' />", { 'class': 'example' });
will not work. That overload expects a naked tag as the first parameter, with no attributes at all; otherwise the properties object is ignored.
This was reported here
Option 1 is better if your attributes depend on variables set before calling the $ function, since you don't have to concatenate strings and variables. Aside from that, since you can do both, and it's just some JS code somebody else wrote, not a C++ DOM API hardcoded in the browser...
I'm trying to replace every image tag in a block of text with a unique string. So far I've tried to get the index of the beginning and end of a tag, create a substring, and then replace the substring. The problem is that I don't know how to repeat this an arbitrary number of times (the text block itself can be long, with any number of image tags).
Here is my code so far:
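var txtBlock = currBlock.getElementsByClassName("txtContent")[0];
// txtBlock is an element, not a string, so search its serialized markup
var html = txtBlock.innerHTML;
var imgStartPoint = html.indexOf("<img ");
// innerHTML serializes <img> without " />", so look for the next ">" after the start
var imgEndPoint = html.indexOf(">", imgStartPoint) + 1;
var imgstring = html.substring(imgStartPoint, imgEndPoint);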
How can I repeat this process n times?
The best way to approach this problem, and most programming problems in general, is to think about what you need to do and write out the steps that you need to perform in order to solve your problem in plain English.
To get you started, you should probably think about the following:
How many times does the code need to execute? How do you determine this?
How does the algorithm know that it is done? Can you think of a couple ways to achieve this?
Once you have a decent logical plan, the code will be much easier to write.
In general, break the problem down to smaller tasks and you should be able to tackle almost any programming problem, regardless of language, etc.
Let me know if you need further help.
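One possible answer to "how does the algorithm know it is done?" is: keep searching from just past the last tag until indexOf returns -1. The sketch below assumes the input is already a string (for example an element's innerHTML), that each tag ends at the next ">" (so attribute values containing ">" would break it), and uses a made-up placeholder format [IMG_n].

```javascript
// Replace every <img ...> tag in an HTML string with a unique placeholder.
// Assumptions: input is a string, tags end at the next ">", placeholder format is hypothetical.
function replaceImgTags(html) {
  let result = "";
  let pos = 0;    // start of the part of the string not yet scanned
  let count = 0;  // makes each placeholder unique
  while (true) {
    const start = html.indexOf("<img", pos);
    if (start === -1) break;        // no more tags: done
    const end = html.indexOf(">", start);
    if (end === -1) break;          // unterminated tag: stop instead of looping forever
    result += html.slice(pos, start) + "[IMG_" + count + "]";
    count++;
    pos = end + 1;                  // resume searching after this tag
  }
  return result + html.slice(pos);  // append whatever follows the last tag
}

console.log(replaceImgTags('a<img src="1.png">b<img src="2.png" />c'));
// a[IMG_0]b[IMG_1]c
```

The loop runs once per tag, so it handles any number of tags without knowing the count in advance.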
It seems that you get your data from the DOM, so you could familiarize yourself with DOM operations and replace all image nodes with text nodes.
Helpful methods:
DOM Document getElementsByTagName Method -
http://w3schools.com/jsref/met_document_getelementsbytagname.asp
DOM Node replaceChild Method -
http://w3schools.com/jsref/met_node_replacechild.asp
DOM Document createTextNode Method -
http://w3schools.com/jsref/met_document_createtextnode.asp
In a tutorial for building a CSS selector engine in JavaScript (visible for Tuts+ members here) the author uses the following code to remove everything in a string before the hash character:
// sel = "div#main li"
if (sel.indexOf("#") > 0) {
sel = sel.split("#");
sel = "#" + sel[sel.length -1];
}
While I'm a JavaScript beginner, I'm not a beginner programmer, and this seems like overkill, like killing an ant with a cannon. I'd use something like:
sel.substr(sel.indexOf("#"));
Maybe even without the enclosing if statement, which already calls indexOf(). Since the author has even written a book on JavaScript, there must be some secret I'm not aware of: are there any advantages to using the former code? Performance, maybe?
There's usually a wide variation of performance between different implementations, so testing would be needed. But if performance is really a consideration, I would bet that .split() is slower.
"Maybe even not enclosed with the if statement..."
But I would say that you shouldn't have it inline as you do. .indexOf() returns -1 if no match is found, which causes .substr() to give you the last character of the string.
var sel = 'tester';
sel.substr(sel.indexOf("#")); // "r"
So keep the if statement...
var sel = 'tester',
    idx = sel.indexOf("#"),
    sub;
if( idx !== -1 ) {
    sub = sel.substr(idx);
}
I'm not sure what the tutorial is trying to do, but sel = "div#main li#first" is a valid CSS selector: their code would return #first, while sel.substr(sel.indexOf("#")) would return #main li#first. I'm guessing, but that could work in a loop where you work backwards through the CSS selectors.
Real-life CSS selector engines rely heavily on regular expressions, and this seems to be the best way. The language gives us a dedicated, powerful tool for string manipulation, so why not use it? In your case:
sub = sel.replace(/^.+?#/, "#")
does the job fast and without extra clutter.
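To see the difference the previous answers describe, here is a sketch that runs the three approaches side by side. The function names are hypothetical; viaIndexOf uses slice(), which behaves like substr() here because the index is never negative.

```javascript
// The tutorial's approach: keeps only the part after the LAST "#".
function viaSplit(sel) {
  if (sel.indexOf("#") > 0) {
    const parts = sel.split("#");
    sel = "#" + parts[parts.length - 1];
  }
  return sel;
}

// Guarded indexOf: keeps everything from the FIRST "#", leaves the string alone if there is none.
function viaIndexOf(sel) {
  const idx = sel.indexOf("#");
  return idx !== -1 ? sel.slice(idx) : sel;
}

// The regex answer: lazily strips everything up to the FIRST "#".
function viaRegex(sel) {
  return sel.replace(/^.+?#/, "#");
}

// Returns for each input (split | indexOf | regex):
// "div#main li"       -> #main li | #main li       | #main li
// "div#main li#first" -> #first   | #main li#first | #main li#first
// "tester"            -> tester   | tester         | tester
for (const sel of ["div#main li", "div#main li#first", "tester"]) {
  console.log(sel, "|", viaSplit(sel), "|", viaIndexOf(sel), "|", viaRegex(sel));
}
```

So the approaches only diverge when the selector contains more than one "#", which is where the split version's "last part" behaviour might be intentional.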
Performance? In JavaScript we usually don't care much, because our applications are not time-critical. Nobody cares whether it takes 0.1 or 0.01 seconds to validate a form or to fade in a div.
Is running something like:
document.body.innerHTML = document.body.innerHTML.replace('old value', 'new value')
dangerous?
I'm worried that some browsers might screw up the whole page. Since this JS code will be placed on sites outside my control, which might be visited by who knows what browsers, I'm a little worried.
My goal is only to look for an occurrence of a string in the whole body and replace it.
Definitely potentially dangerous, particularly if your HTML code is complex, or if it's someone else's HTML code (e.g. it's a CMS or you're creating reusable JavaScript). Also, it re-creates every element on the page from the serialized markup, which destroys any event listeners you have set on those elements.
Find the text-node with XPath, and then do a replace on it directly.
Something like this (not tested at all):
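// Text nodes of elements whose text contains "old value"
var matches = xpath('//*[contains(text(),"old value")]/text()');
// document.evaluate returns a snapshot (snapshotLength/snapshotItem); the fallback is array-like
var isSnapshot = typeof matches.snapshotItem === "function";
var count = isSnapshot ? matches.snapshotLength : matches.length;
for (var i = 0; i < count; ++i) {
  var el = isSnapshot ? matches.snapshotItem(i) : matches[i];
  // Strings are immutable: assign the replaced text back to the node
  el.nodeValue = el.nodeValue.replace('old value', 'new value');
}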
Where xpath() is a custom cross-browser xpath function along the lines of:
function xpath(str){
  if(document.evaluate){
    // 6 === XPathResult.UNORDERED_NODE_SNAPSHOT_TYPE
    return document.evaluate(str,document,null,6,null);
  }else{
    return document.selectNodes(str);
  }
}
I agree with lucideer, you should find the node containing the text you're looking for, and then do a replace. JS frameworks make this very easy. jQuery for example has the powerful :contains('your text') selector
http://api.jquery.com/contains-selector/
If you want a rock-solid solution, you should iterate over the DOM and find the value to replace that way.
However, if 'old value' is a long string that could never be mixed up with a tag, attribute, or attribute value, you are relatively safe just doing the replace.
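Two details of the plain string replace can be seen without a browser. The markup below is made up, standing in for document.body.innerHTML. String.prototype.replace with a string pattern changes only the first occurrence, and that occurrence may sit inside an attribute rather than in visible text.

```javascript
// Hypothetical markup where the target text also appears inside an attribute value.
const html = '<a title="old value" href="/x">old value</a><p>old value</p>';

// A string pattern replaces only the FIRST match, which here is inside title="...".
const once = html.replace("old value", "new value");
console.log(once);
// <a title="new value" href="/x">old value</a><p>old value</p>

// A global regex replaces every occurrence, attributes included.
const all = html.replace(/old value/g, "new value");
console.log(all);
// <a title="new value" href="/x">new value</a><p>new value</p>
```

Neither version can tell text from markup, which is why walking the text nodes is the safer route.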