I'm working on an add-on using Mozilla's Add-on SDK, and I've come across the need to HTML-encode some text (swap out ampersands and other special characters for their entity equivalents, such as &amp;). You can do this in JavaScript using the DOM by calling document.createElement() and adding text to it (letting the browser encode the text). The trouble is that in the privileged code (main.js) there is no DOM, so there is no way to access these features, or even to use a library like jQuery. Is there a best practice here? How can I get access to features that would typically require a global document object from main.js?
If I understood correctly, you want to replace HTML entities (&amp; and similar) with the actual characters. And your solution so far was:
var text = "foo&amp;bar";
var element = document.createElement('foo');
element.innerHTML = text;
text = element.textContent;
Instead of using the DOM of your document (and risking running some script unintentionally), you can use DOMParser, which parses text without any side effects. Unfortunately, accessing DOMParser from main.js requires chrome authority, but other than that the code is straightforward:
var text = "foo&amp;bar";
var {Cc, Ci} = require("chrome");
var parser = Cc["@mozilla.org/xmlextras/domparser;1"]
.createInstance(Ci.nsIDOMParser);
text = parser.parseFromString(text, "text/html").documentElement.textContent;
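The question as asked is about the opposite direction: encoding text rather than decoding entities. For encoding, a DOM is not strictly needed. The following is a minimal plain-JavaScript sketch, not an Add-on SDK API; escapeHtml is a hypothetical helper name, and it assumes that escaping the five characters & < > " ' is enough, which covers element text and quoted attribute values but not contexts such as inline scripts or URLs.

```javascript
// Minimal sketch: escape the five characters that are special in HTML/XML
// with plain string replacement, so no DOM (and no chrome authority) is needed.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or the entities added below get double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(escapeHtml('Tom & "Jerry" <b>'));
// Tom &amp; &quot;Jerry&quot; &lt;b&gt;
```

Because it is ordinary string code, it runs the same in main.js as anywhere else.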
I have a little testcase over at:
http://jsfiddle.net/9xwUx/1/
The code boils down to the following (given a node with id "target"):
var string = '<div class="makeitpink">this should be pink, but is not</div>';
var parser = new DOMParser();
var domNode = parser.parseFromString(string,"text/xml");
document.getElementById("target").appendChild(domNode.firstChild);
If you run the testcase, then inspect the target node via Firebug/Chrome Web Inspector, select any node within the body tag of jsfiddle's iframe, do "Edit as HTML", add a random character anywhere as a string [not an attribute of a DOM node, to be clear], and "save", the style is applied. But not before that.
To say that I'm confused is an understatement.
Can anybody please clarify what is going on here?
Thanks.
You can change the mime type to text/html and do the following:
var parser = new DOMParser()
var doc = parser.parseFromString(markup, 'text/html')
return doc.body.firstChild
I didn't test on every browser but it works on Chrome and Firefox. I don't see any reason it wouldn't work elsewhere.
A bit late, but the reason is that you parsed these using the text/xml option, which means that the results are XML nodes, which don't have CSS applied to them. When you right-click and choose "Edit as HTML", the browser reinterprets them as HTML, and the change in the element causes a redraw, reapplying the CSS.
I've been parsing my HTML using the relatively hackish, yet definitely working, method of creating a temporary element and setting its innerHTML property, making the browser do the parsing instead:
var temp = document.createElement("div")
//assuming you have some HTML partial called 'fragment'
temp.innerHTML = fragment
return temp.firstChild
Which you've noted in your jsfiddle. Basically it boils down to the output of the DOMParser being an instance of XMLDocument when you use the text/xml option.
In an app I receive some HTML text: since the app can't display (interpret) HTML, I need to remove any HTML tag and entity from the string I receive from the server.
I tried the following, but it removes HTML tags and not entities (e.g. &nbsp;):
stringFromServer.replace(/(<([^>]+)>)/ig,"");
Any help is appreciated.
Disclaimer: I need a pure JavaScript solution (no jQuery, Underscore, etc.).
[UPDATE] I'm reading all your answers now and I forgot to mention that I'm using JavaScript BUT the environment is not a web page, so I have no DOM.
You can try something like this:
var placeholder = document.createElement('div');
placeholder.innerHTML = stringFromServer;
var theText = placeholder.innerText;
.innerText only grabs text content from the element.
However, since it appears you don't have access to any DOM manipulation at all, you're probably going to have to use some kind of HTML parser, like these:
https://www.npmjs.org/package/htmlparser
http://ejohn.org/blog/pure-javascript-html-parser/
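If adding a parser library is not an option either, simple cases can be handled with plain string processing. Below is a minimal sketch under clear assumptions: the input is simple markup (no comments, no scripts, no ">" inside attribute values), and only numeric entities plus a small hand-picked table of named entities need decoding. htmlToPlainText and NAMED_ENTITIES are hypothetical names, not part of either linked library.

```javascript
// Hypothetical table: extend it with whatever named entities your data uses.
const NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00A0",
};

function htmlToPlainText(html) {
  // 1. Drop anything that looks like a tag (naive: assumes no ">" inside attributes).
  const withoutTags = html.replace(/<[^>]*>/g, "");
  // 2. Decode &#NNN; / &#xHHH; numeric entities and known named entities.
  return withoutTags.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const code = isHex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Leave out-of-range code points untouched instead of throwing.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    // Unknown named entities are kept verbatim rather than guessed.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

console.log(htmlToPlainText("<p>Fish &amp; chips &#x41;&#66; &bogus;</p>"));
// Fish & chips AB &bogus;
```

Unknown named entities are left as-is rather than deleted; extending the table is the main way to cover more of them.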
A solution without using regexes or phantom divs can be found on Mozilla's MDN.
I put the code in a JSfiddle here:
var sMyString = "<a id=\"a\"><b id=\"b\">hey!<\/b><\/a>";
var oParser = new DOMParser();
var oDOM = oParser.parseFromString(sMyString, "text/xml");
// print the name of the root element or error message
alert(oDOM.documentElement.nodeName == "parsererror" ?
"error while parsing" : oDOM.documentElement.textContent);
Alternatively, parse the HTML snippet in a new document and do your dom manipulations from that (if you'd rather keep it separate from the current document):
var tmpDoc=document.implementation.createHTMLDocument("");
tmpDoc.body.innerHTML="<a href='#'>some text</a><p style=''> more text</p>";
tmpDoc.body.textContent;
tmpDoc.body.textContent evaluates to:
some text more text
stringFromServer.replace(/(<([^>]+)>|&[^;]+;)/ig, "")
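For comparison, here is that one-liner wrapped in a function (stripTagsAndEntities is just an illustrative name). Note that it deletes entities instead of decoding them, so "&amp;" vanishes rather than becoming "&", and &[^;]+; can also swallow ordinary text between a stray & and the next semicolon.

```javascript
// The regex from the answer above, unchanged, wrapped for reuse.
// Entities are removed, not decoded.
function stripTagsAndEntities(s) {
  return s.replace(/(<([^>]+)>|&[^;]+;)/ig, "");
}

console.log(stripTagsAndEntities("<b>Tom</b> &amp; Jerry&nbsp;!"));
// "Tom  Jerry!" (the "&" is gone, leaving two spaces)
```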
I need to create an XML document (with JavaScript) containing nodes whose names are in Russian.
I get InvalidCharacterError in IE11 when trying run doc.createElement("Выборка")
doc is created with var doc = document.implementation.createDocument("", "", null)
In other browsers this code is working without any issues.
How can this be solved? What is the root cause of the issue?
jsFiddle example: http://jsfiddle.net/e4tUH/1/
My post on connect.microsoft.com: https://connect.microsoft.com/IE/feedback/details/812130/cant-create-xml-node-with-cyrillic-name-in-ie11
Current workaround: Switch IE11 to IE10 with X-UA-Compatible meta-tag and use window.ActiveXObject(...) to create XML documents.
Maybe IE11 has an issue similar to what Firefox had in the past:
https://bugzilla.mozilla.org/show_bug.cgi?id=431701
That means that although your page is loading the correct encoding, IE11 is creating the new document with a default encoding which is not the expected one. There's no way to check that besides looking into IE11 source code, which we don't have.
Have you tried adding non-ASCII characters in places other than element names, like an attribute value or a text node?
I searched how to change the created document encoding and haven't found any solution for that.
To solve your problem, I would suggest using a DOMParser and generating a document from an XML string, like the following:
var parser=new DOMParser();
var xmlDoc=parser.parseFromString('<?xml version="1.0" encoding="UTF-8"?><Выборка>Выборка текста</Выборка>',"text/xml");
All browsers seem to support it for XML parsing. More about DOMParser at the following links, including how to provide backward compatibility with older IE versions:
http://www.w3schools.com/dom/dom_parser.asp
https://developer.mozilla.org/en-US/docs/Web/API/DOMParser
If you don't want to generate your XML just by concatenating strings, you can use some kind of XML builder, like in this example: http://jsfiddle.net/UGYWx/6/
Then you can create your XML easily and more safely:
var builder = new XMLBuilder("rootElement");
builder.text('Some text');
var element = builder.element("someElement", {'attr':'value'});
element.text("This is a text.");
builder.text('Some more Text');
builder.element("emptyElement");
builder.text('Even some more text');
builder.element("emptyWithAttributes", {'a1': 'val1', 'a2' : 'val2'});
$('div').text(builder.toString());
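The XMLBuilder used above comes from the linked fiddle, and its implementation is not reproduced here. As an illustration of the same idea, here is a hypothetical minimal helper (escapeXml and xmlElement are made-up names) that escapes text and attribute values, so characters such as & or < in the data cannot break the markup. It only builds a single element with text content and does not validate element names, which the caller must supply as valid XML names.

```javascript
// Escape characters that are special in XML text and double-quoted attributes.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one element; the name is NOT escaped or validated (caller's job).
function xmlElement(name, attrs, text) {
  const attrString = Object.keys(attrs || {})
    .map((key) => ` ${key}="${escapeXml(attrs[key])}"`)
    .join("");
  return text === undefined || text === ""
    ? `<${name}${attrString}/>`
    : `<${name}${attrString}>${escapeXml(text)}</${name}>`;
}

console.log(xmlElement("Выборка", { note: 'a "quoted" value' }, "Fish & chips"));
// <Выборка note="a &quot;quoted&quot; value">Fish &amp; chips</Выборка>
console.log(xmlElement("emptyElement"));
// <emptyElement/>
```

The resulting string could then be handed to DOMParser.parseFromString(..., "text/xml") in the same way as the hand-written string in the DOMParser answer.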
I have always been very reluctant to use non-ASCII characters inside source code. Try escaping the string; maybe it helps.
doc.createElement("\u0412\u044B\u0431\u043E\u0440\u043A\u0430")
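To generate such escapes for any name rather than by hand, a small helper is enough. This sketch (toUnicodeEscapes is a hypothetical name) converts each non-ASCII UTF-16 code unit to a \uXXXX escape. Whether this actually avoids the IE11 error is untested; the answer itself only suggests it might help. ASCII characters pass through unchanged, so quotes or backslashes in the input would still need their usual escaping.

```javascript
// Turn a string into an ASCII-only JavaScript string-literal body by
// replacing every non-ASCII UTF-16 code unit with a \uXXXX escape.
function toUnicodeEscapes(str) {
  let out = "";
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    out += code < 0x80
      ? str[i]
      : "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

console.log(toUnicodeEscapes("Выборка"));
// \u0412\u044B\u0431\u043E\u0440\u043A\u0430
console.log("\u0412\u044B\u0431\u043E\u0440\u043A\u0430" === "Выборка"); // true
```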
Say I have this XML with about 1000+ bookinfo nodes.
<results>
  <books>
    <bookinfo>
      <name>1</name>
    </bookinfo>
    <bookinfo>
      <name>2</name>
    </bookinfo>
    <bookinfo>
      <name>3</name>
    </bookinfo>
  </books>
</results>
I'm currently using this to get the name of each book:
var books = this.req.responseXML.getElementsByTagName("books")[0].getElementsByTagName("bookinfo")
Then use a for loop to do something with each book name:
var bookName = books[i].getElementsByTagName("name")[0].firstChild.nodeValue;
I'm finding this really slow when books is really big. Unfortunately, there's no way to limit the result set nor specify a different return type.
Is there a faster way?
You can try fast-xml-parser, which is implemented in JS, to convert XML data to JSON. Here is the benchmark against other parsers.
var parser = require('fast-xml-parser');
var jsonObj = parser.parse(xmlData);
// when a tag has attributes
var options = {
attrPrefix : "#_" };
var jsonObj = parser.parse(xmlData,options);
If you don't want to use an npm library, you can include parser.js in your HTML directly.
Disclaimer: I'm the author of this library.
Presumably you are using XMLHttpRequest, in which case the XML is parsed before you call any methods of responseXML (i.e. the XML has already been parsed and turned into a DOM). If you want a faster parser, you'll probably need a different user agent or a different javascript engine for your current UA.
If you want a faster way to access content in the XML document, consider XPath:
Mozilla documentation
MSDN documentation
I used an XPath expression (like //parentNode/node/text()) on a 134KB local file to extract the text node of 439 elements, put those into an array (because that's what my standard evalXPath() function does), then iterate over that array to put the nodeValue for each text node into another array, doing two replace calls with regular expressions to format the text, then alert() that to the screen with join('\n'). It took 3ms.
A 487KB file with 529 nodes took 4ms (IE 6 reported 15ms but its clock has very poor resolution). Of course my network latency will be nearly zero, but it shows that the XML parser, XPath evaluator and script in general can process that size file quickly.
If you want to parse the information from that XML much faster, try txml. It is very easy to use, and for the type of XML you have shown, you can use its simplify method. It will give you very clean objects to work with.
https://www.npmjs.com/package/txml
Disclaimer: I'm the author of this library.
I am making a Sidebar Gadget (which is AJAX-based), and I am looking for a way to extract a single element from an AJAX request.
The only way I have found so far is something like this:
var temp = document.createElement("div");
temp.innerHTML = HttpRequest.responseText;
document.body.appendChild(temp);
temp.innerHTML = document.getElementById("WantedElement").innerText;
But it is pretty ugly. I would like to extract WantedElement directly from the request without adding it to the actual document...
Thank you!
If you're in control of the data, the way you're doing it is probably the best method. Other answers here have their benefits but also they're all rather flawed. For instance, the querySelector() method is only available to Windows Desktop Gadgets running in IE8 mode on the host machine. Regular expressions are particularly unreliable for parsing HTML and should not be used.
If you're not in control of the data or if the data is not transferred over a secure protocol, you should be more concerned about security than code aesthetics -- you may be introducing potential security risks to the gadget and the host machine by inserting unsanitized HTML into the document. Since gadgets run with user or admin level privileges, the obvious security risk is untrusted source/MITM script injection, leaving a hole for malicious scripts to wreak havoc on the machine it's running on.
One potential solution is to use the htmlfile ActiveXObject:
function getElementFromResponse(html, divId)
{
    var h = new ActiveXObject("htmlfile");
    h.open();
    // disable ActiveX controls
    h.parentWindow.ActiveXObject = function () {};
    // write the HTML to the document
    h.write(html);
    h.close();
    return h.getElementById(divId).innerText;
}
You could also make use of IE8's toStaticHTML() method, but your gadget would need to be running in IE8 mode.
One option would be to use regular expressions:
var str = response.match(/<div id="WantedElement">(.+)<\/div>/);
str[1]; // contents of div (str[0] is the whole match, including the tags)
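A slightly more robust version of that regex, as a sketch: the contents are in capture group 1 (match[0] is the entire match, tags included), and [\s\S]*? lets the match span newlines while stopping at the first </div>. It still assumes the exact attribute spelling <div id="..."> and no nested div inside the wanted element, and the id is inserted into the pattern unescaped, so it must not contain regex metacharacters. extractDivContent is a hypothetical name.

```javascript
function extractDivContent(responseText, id) {
  // Assumes id contains no regex metacharacters (it is not escaped here).
  const pattern = new RegExp('<div id="' + id + '">([\\s\\S]*?)</div>');
  const match = responseText.match(pattern);
  // match[1] is the captured content; match[0] would include the tags.
  return match ? match[1] : null;
}

const sample = '<body><div id="WantedElement">line 1\nline 2</div><div>other</div></body>';
const content = extractDivContent(sample, "WantedElement"); // "line 1\nline 2"
```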
However, if your server response is more complex, I'd suggest using a data format like JSON for the response. Then it would be much cleaner to parse on the client side.
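A minimal sketch of the JSON alternative, assuming a hypothetical response shape in which the server sends only the data the gadget needs (the wantedElement key and getWantedElement name are made up for illustration). Where JSON.parse is available, it is preferable to eval for responses you don't fully trust, since it only accepts data and never runs code.

```javascript
// Returns the hypothetical "wantedElement" object, or null if the response
// is not valid JSON or lacks that key.
function getWantedElement(responseText) {
  try {
    const data = JSON.parse(responseText);
    return data && data.wantedElement ? data.wantedElement : null;
  } catch (e) {
    // Malformed JSON: treat it as "no data" instead of crashing the gadget.
    return null;
  }
}

const wanted = getWantedElement('{"wantedElement": {"title": "Weather", "text": "Sunny"}}');
console.log(wanted.title + ": " + wanted.text); // Weather: Sunny
```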
You could append the response from XMLHttpRequest inside a hidden div, and then call getElementById to get the desired element. Later remove the div when done with it. Or maybe create a function that handles this for you.
function addNinjaNodeToDOM(html) {
    var ninjaDiv = document.createElement("div");
    ninjaDiv.innerHTML = html;
    ninjaDiv.style.display = 'none';
    // getElementById only finds elements attached to the document
    document.body.appendChild(ninjaDiv);
    return ninjaDiv;
}
var wrapper = addNinjaNodeToDOM(HttpRequest.responseText);
var requiredNode = document.getElementById("WantedElement");
// do something with requiredNode
document.body.removeChild(wrapper); // remove when done
The only reason for appending it to the DOM is that getElementById will not work unless the element is part of the DOM tree. See MDC.
However, you can still run selector and XPath queries on detached DOM nodes. That would save you from having to append elements to the DOM.
var superNinjaDiv = document.createElement('div');
superNinjaDiv.innerHTML = html;
var requiredNode = superNinjaDiv.querySelector("[id=someId]");
I think using getElementById to look up the element in this case is not a good approach, because of the extra steps you have to take to use it: you wrap the element in a DIV, inject it into the DOM, look up your element using getElementById, and then remove the injected DIV from the DOM.
DOM manipulation is expensive, and injection might cause an unnecessary reflow as well. The problem is that there is a document.getElementById but no element.getElementById, which would allow you to query without injecting into the document.
To solve this, using querySelector is an obvious and far easier solution. Otherwise, I would suggest using getElementsByClassName if you can and if your element has a class defined.
getElementsByClassName is defined on ELEMENT and hence can be used without injecting the element in DOM.
Hope this helps.
It's somewhat unusual to pass HTML through an AJAX request; normally you pass a JSON string that the client can evaluate directly, and work with that.
That being said, I don't think there's a cross-browser way to parse HTML in JavaScript the way you want, but here's a way to do it in Mozilla derivatives:
var r = document.createRange();
r.selectNode(document.body);
var domNode = r.createContextualFragment(HTTPRequest.responseText);