Modify HTML source on-the-fly with XPath using JavaScript

Let's say I have an HTML source, something like:
Google
Yahoo
MSN
Is there any way to modify this HTML source with XPath using JavaScript: find all anchors, prepend text to them, and show the new HTML source in an alert box?
Visit Google
Visit Yahoo
Visit MSN

If you need the full power of XSLT to make several transformations, you could use something like Sarissa.
I think you might be confusing XPath expressions with CSS selectors; in that case I would recommend the following jQuery code:
// Put a script tag including jquery.js here
<div id="container">
Google
Yahoo
MSN
</div>
<script>
$("a").prepend("Visit ");
alert($("#container").html());
</script>
Regards.

The short answer is "no".
XPath is a method for selecting elements in a DOM. It can also be used to read attributes and calculate values, but it can't be used to modify the DOM. You might be getting confused with XSLT, which uses XPath expressions to select elements and can return a modified document. You could use a generic XML document, then use different XSL style sheets using XSLT to generate different documents in various languages, say HTML, XML, postscript, and so on.
In any case, why would you bother with XPath in this case? There is a document.links collection that requires simple property access, no function calls or evaluating XPath expressions. You can change simple text content by assigning to the W3C textContent or proprietary MS innerText property (again, simple property access rather than function calls):
function modLinks() {
    var links = document.links;
    var i = links.length;
    while (i--) {
        setText(links[i], 'Visit ' + getText(links[i]) );
    }
}
// Simple helper functions, can be made faster and more robust
// but sufficient for an example.
function getText(el) {
    if (typeof el.textContent == 'string') {
        return el.textContent;
    } else if (typeof el.innerText == 'string') {
        return el.innerText;
    }
}
function setText(el, text) {
    if (typeof el.textContent == 'string') {
        el.textContent = text;
    } else if (typeof el.innerText == 'string') {
        el.innerText = text;
    }
}

As mentioned above, XPath doesn't work effectively with unparsed strings.
So one approach would be to set the innerHTML of some element (e.g. an invisible div) to your HTML source string.
This would cause the source to be parsed into a DOM tree. Then you could use myDiv.getElementsByTagName('a') or jQuery $('a', myDiv) to find the links. (You could even use XPath .//a, but why use a more complex tool when a simpler one will do?)
Then once you've modified the strings, e.g. as somebody said using jQuery $('a', myDiv).prepend("Visit "); you could output the modified HTML by retrieving the innerHTML property of the invisible div.
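For anyone who does want the XPath route, the steps above can be sketched roughly as follows. This is only a sketch under assumptions: prependToLinks is a made-up helper name, the sample markup and its href values are invented, and the code expects a browser Document to be passed in as doc.

```javascript
// Hypothetical helper (not from the thread): parses an HTML string into a
// detached div, finds anchors with XPath, and returns the modified markup.
// `doc` is assumed to be a browser Document, passed in so nothing global is needed.
function prependToLinks(html, prefix, doc) {
  const container = doc.createElement('div');
  container.innerHTML = html; // the browser parses the string into a DOM tree

  // 7 is the value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE; a snapshot
  // stays valid while the anchors are being modified.
  const anchors = doc.evaluate('.//a', container, null, 7, null);
  for (let i = 0; i < anchors.snapshotLength; i++) {
    const anchor = anchors.snapshotItem(i);
    anchor.insertBefore(doc.createTextNode(prefix), anchor.firstChild);
  }
  return container.innerHTML; // serialize the tree back to a string
}

// Only runs where a DOM exists (for example, in a browser console).
if (typeof document !== 'undefined') {
  console.log(prependToLinks('<a href="#">Google</a> <a href="#">Yahoo</a>', 'Visit ', document));
}
```

Passing the document in keeps the function free of globals, and the snapshot result type avoids the invalidation errors an iterator result can raise when the tree changes during iteration.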

Related

wrap HTML tags in plain string with another HTML tag

I want to wrap an HTML tag with another HTML tag in a string (so not a DOM element, a plain string). I created this function but I wonder if I could do it in one go without a forEach loop.
This is the working function:
function style(content) {
    var tempStyledContent = content;
    // match() returns null when there are no img tags, so fall back to an empty list
    var imgMatches = tempStyledContent.match(/(<img.*?src=["'](.+?)["'].*?>)/g) || [];
    imgMatches.forEach(function (imgTag) {
        // Accept single- or double-quoted src values
        var imgSrc = imgTag.match(/src\s*=\s*["'](.+?)["']/)[1];
        tempStyledContent = tempStyledContent.replace(imgTag,
            "<a href=\"" + imgSrc + "\" data-fancybox>" + imgTag + "</a>");
    });
    return tempStyledContent;
}
The parameter content is a string with HTML code in it. The function above outputs the same html as the input but with the (fancybox) a tags surrounding all the child img tags.
So an input string like
"<div><img src='example.jpg'/></div>"
will output
"<div><a href='example.jpg' data-fancybox><img src='example.jpg'/></a></div>"
Can anyone improve this? I know too little about regexes to make this better.
Manipulating HTML with regex is notoriously problematic. Changes that would be trivial in a DOM parser can be very difficult to create a robust regex for; and when regex fails, it fails silently, which makes errors easy to miss. When working in regex you also have to be careful to handle all possible variations in markup such as whitespace, attribute order, quoting style, tag closing style, attribute contents that resemble html but which you don't want modified, etc.
As discussed exhaustively in the comment thread below, given enough time and effort it's certainly possible to handle all of these things in regex; but it leads to a complex, difficult to maintain regex -- and most importantly it's difficult to be certain your regex accommodates every possible valid markup variation. DOM parsing handles all of this stuff automatically, and lets you work with the structured data directly instead of having to cope with all the possible variations in its string representation.
Therefore, if you need to make nontrivial changes to an HTML string, it's almost always best to convert your HTML into a true DOM tree, manipulate that using standard DOM methods, then (if necessary) convert it back into a string. Fortunately it doesn't take a lot of code to do so. Here's a simple vanilla JS demo:
var htmlToElement = function(html) {
    var template = document.createElement('template');
    template.innerHTML = html.trim();
    return template.content.firstChild;
};
var elementToHtml = function(el) {
    return el.outerHTML;
};
// Usage demo:
var string = "<div>This <b>is some</b> <i>html</i><img src='http://example.com'></div>";
var foo = htmlToElement(string);
// perform your DOM manipulation as needed on foo here. This would look much simpler if I wasn't so stubborn about avoiding jQuery these days, but here we are anyway:
foo.querySelectorAll('img').forEach(function(img) {
    var link = document.createElement('a');
    link.setAttribute('data-fancybox', true);
    link.setAttribute('href', img.getAttribute('src'));
    img.parentNode.insertBefore(link, img);
    link.appendChild(img);
});
// back to a string:
var bar = elementToHtml(foo);
console.log(bar);
OK, I'm probably going to do DOM manipulation as @DanielBeck suggested. Once Knockout has finished binding I will use $.wrap http://api.jquery.com/wrap/ to do my manipulation. I just hoped there was an easy way without using jQuery, so if there are other suggestions please post them.

Javascript: use xpath in jQuery

I have, for example, the next XPath query:
//div[span="something"]/parent::div/child::div[@class="someClass"]
I want to use this XPath query in JavaScript:
return $("a:contains('Fruits')").mouseover();
I tried this:
return $("div[span=\"something\"]/parent::div/child::div[@class=\"someClass\"]").mouseover();
But it didn't work. Is there another semantic for XPath queries in order to use them in JavaScript?
You could add the results of an existing XPath evaluation to a jQuery selection. I threw together this jQuery extension that seems to do it all for you.
Example usage:
$(document).xpathEvaluate('//body/div').remove()
Here's the add-in.
$.fn.xpathEvaluate = function (xpathExpression) {
    // Only the first matched node is used; as in the usage example, it is
    // expected to be a document, because evaluate() is a document method
    var contextNode = this[0];
    // Evaluate xpath and retrieve matching nodes
    var xpathResult = contextNode.evaluate(xpathExpression, contextNode, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
    var result = [];
    var elem;
    while ((elem = xpathResult.iterateNext())) {
        result.push(elem);
    }
    // Wrap the matched nodes in a new jQuery object
    return jQuery([]).pushStack(result);
};
You can re-write your xpath queries as CSS selectors:
$('div:has(> div > span:contains(something)) > div.someClass');
You can achieve the same effect as parent:: using the :has pseudo-selector to select an element based on its children: div.foo:has(> div.bar) will select all div elements with class foo that have a child div with class bar. This is equivalent to div[@class="bar"]/parent::div[@class="foo"].
See:
jQuery API: Selectors
Sizzle documentation
You could probably approach this in several other ways using various combinations of jQuery's DOM traversal methods. For example, this would be a very direct translation of your XPath query:
$('div:has(> span:contains(something))') // //div[span="something"]
    .parent('div') // /parent::div
    .children('div.someClass'); // /child::div[@class="someClass"]
It's worth noting that div.someClass in CSS isn't the exact equivalent of div[@class="someClass"] in XPath. The CSS will match <div class='foo someClass bar'>, but the XPath won't. See Brian Suda's article on parsing microformats with XSLT for more detail.
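To make the XPath side behave like the CSS class selector, a common XPath 1.0 idiom pads the class attribute with spaces and looks for the padded token. A minimal sketch, assuming a simple class name without quotes; classTokenPredicate is a made-up helper name:

```javascript
// Hypothetical helper: builds an XPath 1.0 predicate that matches a single
// class token the way the CSS selector ".name" does. It assumes className
// contains no quote characters.
function classTokenPredicate(className) {
  return "contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')";
}

const xpath = '//div[' + classTokenPredicate('someClass') + ']';
console.log(xpath);
// prints: //div[contains(concat(' ', normalize-space(@class), ' '), ' someClass ')]
```

The resulting expression can be passed to document.evaluate, and it should also match an element such as <div class='foo someClass bar'>.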
As the co-author of Wicked Good XPath, I certainly recommend it for cross browser XPath support (on HTML documents, you can try using it with XML documents but the support is incomplete).
We welcome any sort of correctness test / performance benchmark on our library. During development, the library has been tested on IE 7 through 10 plus the Android 2.2 browser which doesn't have native XPath support.
If you want to select an element inside an iframe from the parent window, you should change the second parameter of the evaluate() function to the iframe's document, like this:
var iFrameDocument = $('iframe#myPage').get(0).contentWindow.document;
xpathResult = this[0].evaluate(xpathExpression, iFrameDocument, null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null);
There is no cross-browser implementation as far as I know. There is an XPath plugin for jQuery, which says it is still in development.
Other than that there is a Google-authored pure JavaScript implementation of the DOM Level 3 XPath specification called wicked-good-xpath which is good.
I'm not sure about the parent::div clause, but without it it should look like this:
$('div[span="something"] div.someClass');
Read about the evaluate method here:
https://developer.mozilla.org/en-US/docs/Introduction_to_using_XPath_in_JavaScript
var xpathResult = document.evaluate( xpathExpression, contextNode, namespaceResolver, resultType, result );
jQuery only has limited support for XPath. You can see what it does support here:
http://docs.jquery.com/DOM/Traversing/Selectors#XPath_Selectors
As mentioned by @Ameoo you can use the evaluate method, which is available in most modern browsers - except, predictably, IE:
jquery select element by xpath

DOMstring parser

I have a DOMstring object, text of some web page which I get from server using XMLHttpRequest. I need to cut a substring from it, which lies between some specific tags. Is there any easy way to do this? Such methods as substring() or slice() won't work in my case, because content of the web page is dynamic, so I can't specify the beginning and the end of substring (I only know that it's surrounded by <tag> and </tag>).
yourString.substring(yourString.indexOf('<tag>') + 5, yourString.indexOf('</tag>'));
This should work, assuming you know the name of the surrounding tags.
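The same indexOf idea can be wrapped in a small function that copes with a missing tag and does not hard-code the tag length. This is just a sketch; extractBetween is a made-up name and the sample string is invented:

```javascript
// Hypothetical helper: returns the text between the first openTag and the
// next closeTag after it, or null when either marker is missing.
function extractBetween(source, openTag, closeTag) {
  const start = source.indexOf(openTag);
  if (start === -1) return null;
  const contentStart = start + openTag.length;
  // Search for the closing tag only after the opening one
  const end = source.indexOf(closeTag, contentStart);
  if (end === -1) return null;
  return source.substring(contentStart, end);
}

console.log(extractBetween('<p>intro</p><tag>payload</tag>', '<tag>', '</tag>'));
// prints: payload
```

Like the one-liner, this treats the page as plain text, so it only finds the first occurrence and knows nothing about nesting or attributes on the tag.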
A DOMString is just implemented as a string in most (all?) JavaScript browser environments so you can use any parsing technique you like, including regular expressions, DOMParser, and the HTML parser provided by libraries such as jQuery. For example:
function extractText(domString) {
    var m = (''+domString).match(/<tag>(.*?)<\/tag>/i);
    // m[1] is the captured text between the tags (m[0] would include the tags)
    return (m) ? m[1] : null;
}
Of course, this is a terrible idea; you should really use a DOM parser, for example, with jQuery:
$('tag', htmlString).html();
[Edit] To clarify the above jQuery example, it's the equivalent of doing something like below:
function extractText2(tagName, htmlString) {
    var div = document.createElement('div'); // Build a DOM element.
    div.innerHTML = htmlString; // Set its contents to the HTML string.
    var el = div.getElementsByTagName(tagName); // Find the target tag.
    return (el.length > 0) ? el[0].textContent : null; // Return its contents.
}
extractText2('tag', '<tag>Foo</tag>'); // => "Foo"
extractText2('x', '<x><y>Bar</y></x>'); // => "Bar"
extractText2('y', '<x><y>Bar</y></x>'); // => "Bar"
This solution is better than a regex solution since it will handle any HTML syntax nuances on which the regex solution would fail. Of course, it likely needs some cross-browser testing, hence the recommendation to a library like jQuery (or Prototype, ExtJS, etc).
Assuming the surrounding tag is unique in the string...
domString.match(/.*<tag>(.*)<\/tag>.*/)[1]
or
/.*<tag>(.*)<\/tag>.*/.exec(domString)[1]
Seems like it should do the trick
Like @Gus's answer but improved, if you only have text and the tags are repeated:
"<tag>asd</tag>".match(/<tag>[^<]+<\/tag>/);

Transform URL into a link unless there already was a link

I know this has been discussed here, but no solution was offered for this exact problem. Please take a look...
I'm using a function to transform plain-text URLs into clickable links. This is what I have:
<script type='text/javascript' language='javascript'>
window.onload = autolink;
function autolink(text) {
    var exp = /(\b(https?|ftp):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/gim;
    document.body.innerHTML = document.body.innerHTML.replace(exp,"<a href='$1'>$1</a>");
}
</script>
This makes the plain-text URL
https://stackoverflow.com/
look like a clickable link:
https://stackoverflow.com/
It works, but it also replaces existing HTML links with nested links.
So, a valid HTML link like
StackOverflow
Becomes something messy like:
StackOverflow">StackOverflow</a>...
How can I fix the expression to ignore the content of link tags? Thanks!
I'm a newbie... I barely understand the regex code. Please be gentle :) Thanks again.
Using the jQuery JavaScript library, this would look like (demo at http://jsfiddle.net/BRPRH/4):
function autolink() {
    var exp = /(\b(https?|ftp):\/\/[-A-Z0-9+\u0026@#\/%?=~_|!:,.;]*[-A-Z0-9+\u0026@#\/%=~_|])/gi,
        lt = '\u003c',
        gt = '\u003e';
    $('*:not(a, script, style, textarea)').contents().each(function() {
        if (this.nodeType == Node.TEXT_NODE) {
            var textNode = $(this);
            var span = $(lt + 'span/' + gt).text(this.nodeValue);
            span.html(span.html().replace(exp, lt + 'a href=\'$1\'' + gt + '$1' + lt + '/a' + gt));
            textNode.replaceWith(span);
        }
    });
}
$(autolink);
Edit: Excluded textareas, scripts, and embedded CSS. I note that this can also be done using pure DOM's splitText, which has the advantage of not adding extra span elements.
Edit 2: Eliminated all ampersands and double quotes.
Edit 3: Got rid of < and > characters as well.
This problem is beyond the power of regular expressions. You might be able to write a regex that could avoid some links, but you wouldn't be able to avoid every existing link.
The good news is that a different approach will make the job much easier. Right now you are using document.body.innerHTML to manipulate the HTML as plain text. To do it correctly that way, you would basically need to parse the HTML yourself. But you don't have to, because the browser has already parsed it for you!
The web browser allows you to access an HTML document as a tree of objects. It's called the Document Object Model (DOM), and if you do some reading on it, you should be able to learn how to traverse the HTML, skipping over anything inside an A element, and using the regex you have on plain text only.
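A rough sketch of that traversal, without jQuery, might look like the code below. The assumptions are mine: the URL pattern is deliberately simplified and is not the expression from the question, and splitByUrls and autolink are made-up names. Only the first function is pure string handling; the second needs a browser DOM.

```javascript
// Simplified URL pattern for illustration only (an assumption, not the
// expression from the question).
const URL_PATTERN = /\b(?:https?|ftp):\/\/[^\s<>"']+/gi;

// Pure helper: split a text string into plain and URL segments.
function splitByUrls(text) {
  const parts = [];
  let last = 0;
  for (const match of text.matchAll(URL_PATTERN)) {
    if (match.index > last) {
      parts.push({ url: false, value: text.slice(last, match.index) });
    }
    parts.push({ url: true, value: match[0] });
    last = match.index + match[0].length;
  }
  if (last < text.length) parts.push({ url: false, value: text.slice(last) });
  return parts;
}

// Browser-only part: visit the text nodes under root, skipping anything
// inside a, script, style or textarea elements, and replace each URL with
// a real anchor element.
function autolink(root) {
  const doc = root.ownerDocument;
  const walker = doc.createTreeWalker(root, 4); // 4 is NodeFilter.SHOW_TEXT
  const textNodes = [];
  while (walker.nextNode()) {
    const node = walker.currentNode;
    if (!node.parentElement.closest('a, script, style, textarea')) {
      textNodes.push(node);
    }
  }
  // Modify the tree only after the walk has finished
  for (const node of textNodes) {
    const parts = splitByUrls(node.nodeValue);
    if (!parts.some(function (part) { return part.url; })) continue;
    const fragment = doc.createDocumentFragment();
    for (const part of parts) {
      if (part.url) {
        const link = doc.createElement('a');
        link.href = part.value;
        link.textContent = part.value;
        fragment.appendChild(link);
      } else {
        fragment.appendChild(doc.createTextNode(part.value));
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
}
```

In a page this would be called as autolink(document.body) once the document has loaded. Because the links are built with createElement and textContent, no HTML string is ever re-parsed, so existing links are left alone and nothing needs escaping.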

Read out JavaScript onClick function body and use it for another onClick

I am trying to copy an onClick function from an image to a span object.
But I can't get it to work.
I have tried it directly with onClick=img.onClick, onClick=new Function(img.onClick) and more.
if (img.onclick != undefined)
{
var imgClick = img.onclick;
strNewHTML = strNewHTML + " onMouseOver=\"this.style.background"
+ "= '#cecece'\" onMouseOut=\"this.style.background = ''\" onclick=\""+imgClick+"\">";
}
Can anyone help me?
Thanks!
span.onclick= img.onclick;
JavaScript is case-sensitive and DOM event handler properties are all lower-case.
edit:
if (img.onclick != undefined) {
strNewHTML = strNewHTML + " onMouseOver=\"this.style.background"
+ "= '#cecece'\" onMouseOut=\"this.style.background = ''\" onclick=\""+imgClick+"\">";
}
Well that's a completely different thing. You're creating an HTML string. But the onclick DOM property contains a function object. function objects can't be added into strings. (They would get converted to what you get if you call somefunction.toString(), which is not something that will work as an event handler.)
If you wanted to fetch the textual value of the onclick attribute to add into HTML, you'd have to do it with the span.getAttribute('onclick') method. But that won't work in IE due to bugs in its implementation of getAttribute, so you'd have to resort to span.getAttributeNode('onclick').value. And then when you added it into the HTML string, you'd have to HTML-escape it, so that any <, & and " characters in it come out as &lt; etc.; otherwise they'll break the markup.
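For illustration only, the escaping step could look something like this. escapeHtmlAttribute is a made-up helper name and the handler text is an invented sample:

```javascript
// Hypothetical helper: escapes the characters that are significant in HTML
// text and in quoted attribute values. The ampersand must be replaced first
// so the entities produced by the later replacements are not escaped again.
function escapeHtmlAttribute(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Invented sample handler source containing <, & and " characters
const handlerSource = 'if (a < b && name == "x") go();';
const html = '<span onclick="' + escapeHtmlAttribute(handlerSource) + '">';
console.log(html);
// prints: <span onclick="if (a &lt; b &amp;&amp; name == &quot;x&quot;) go();">
```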
However, this is really ugly; don't do it. In reality, HTML string-slinging invariably sucks. Especially when you've got JavaScript code inside HTML inside a JavaScript string. The escaping rules get insane and if you make a mistake escaping content that comes from user input, you've given yourself a cross-site-scripting security hole.
Instead, use DOM methods. This takes all the escaping out of the equation and it's generally more readable than hacked-together HTML markup strings. Then you can freely assign onclick to whatever function you like. eg.:
var span= document.createElement('span');
if (img.onclick)
    span.onclick= img.onclick;
span.onmouseover= function() {
    this.style.background= '#CECECE';
};
span.onmouseout= function() {
    this.style.background= '';
};
someparentelement.appendChild(span);
Also consider replacing the mouseover/mouseout with a simple CSS :hover rule, for maintainability. The only browser that still needs help with :hover is IE6.
