Fastest Way to Parse this XML in JS

Fastest Way to Parse this XML in JS - javascript

Say I have this XML with about 1000+ bookinfo nodes.
<results>
<books>
<bookinfo>
<name>1</dbname>
</bookinfo>
<bookinfo>
<name>2</dbname>
</bookinfo>
<bookinfo>
<name>3</dbname>
</bookinfo>
</books>
</results>
I'm currently using this to get the name of each book:
var books = this.req.responseXML.getElementsByTagName("books")[0].getElementsByTagName("bookinfo")
Then use a for loop to do something with each book name:
var bookName = books[i].getElementsByTagName("name")[0].firstChild.nodeValue;
I'm finding this really slow when books is really big. Unfortunately, there's no way to limit the result set nor specify a different return type.
Is there a faster way?

You can try fast xml parser to convert XML data to JSON which is implemented in JS. Here is the benchmark against other parser.
var parser = require('fast-xml-parser');
var jsonObj = parser.parse(xmlData);
// when a tag has attributes
var options = {
attrPrefix : "#_" };
var jsonObj = parser.parse(xmlData,options);
If you don't want to use npm library, you can include parser.js in your HTML directly.
Disclaimer: I'm the author of this library.

Presumably you are using XMLHttpRequest, in which case the XML is parsed before you call any methods of responseXML (i.e. the XML has already been parsed and turned into a DOM). If you want a faster parser, you'll probably need a different user agent or a different javascript engine for your current UA.
If you want a faster way to access content in the XML document, consider XPath:
Mozilla documentation
MSDN documentation
I used an XPath expression (like //parentNode/node/text()) on a 134KB local file to extract the text node of 439 elements, put those into an array (because that's what my standard evalXPath() function does), then iterate over that array to put the nodeValue for each text node into another array, doing two replace calls with regular expressions to format the text, then alert() that to the screen with join('\n'). It took 3ms.
A 487KB file with 529 nodes took 4ms (IE 6 reported 15ms but its clock has very poor resolution). Of course my network latency will be nearly zero, but it shows that the XML parser, XPath evaluator and script in general can process that size file quickly.

if you want to parse the information from that xml much faster, try txml. it is very easy to use and for the type of xml you have shown, you can use its simplify method. it will give you very clean objects to work with.
https://www.npmjs.com/package/txml
Disclaimer: I'm the author of this library.

Related

Decompressing bzip2 data in Javascript

I ultimately have to consume some data from a Javascript file that looks as follows:
Note: The base64 is illustrative only.
function GetTripsDataCompressed() { return 'QlpoOTFBWSZTWdXoWuEDCAgfgBAHf/.....=='; }
GetTripsDataCompressed() returns a base64 string that is derived as an array of objects converted to JSON using JSON.NET and the resulting string then compressed to bzip2 using SharpCompress with the resulting memory stream Base64 encoded.
This is what I have and cannot change it.
I am struggling to find a bzip2 JavaScript implementation that will take the result of:
var rawBzip2Data = atob(GetTripsDataCompressed());
and convert rawBzip2Data back into the string that is the JSON array. I cannot use something like compressjs as I need to support IE 10 and as it uses typed arrays that means IE10 support is out.
So it appears that my best option is https://github.com/antimatter15/bzip2.js however because I have not created an archive and only bzip2 a string it raises an error of Uncaught No magic number found after doing:
var c = GetTripsDataCompressed();
c = atob(c);
var arr = new Uint8Array(c);
var bitstream = bzip2.array(arr);
bzip2.simple(bitstream);
So can anyone help me here to decompress a BZip2, Base64 encoded string from JavaScript using script that is IE 10 compliant? Ultimately I don't care whether it uses https://github.com/antimatter15/bzip2.js or some other native JavaScript implementation.

It seems to me the answer is in the readme:
decompress(bitstream, size[, len]) does the main decompression of a single block. It'll return -1 if it detects that it's the final block, otherwise it returns a string with the decompressed data. If you want to cap the output to a certain number of bytes, set the len argument.
Also, keep in mind the repository doesn't have a license attached. You'll need to reach out to the author if you want to use the code. That might be tricky given that the repository is eight years old.
On the other hand, the Bzip2 algorithm itself is open-source (BSD-like license), so you can just reimplement it yourself in Javascript. It's just a few hundred lines of relatively straight-forward code.

Difference between XmlService and importxml

When trying to parse html as xml in google apps script, this code:
var yahoo= 'http://finance.yahoo.com/q?s=aapl'
var xml = UrlFetchApp.fetch(yahoo).getContentText();
var document = XmlService.parse(xml);
will return an error like this:
Error on line 20: The entity name must immediately follow the '&' in the entity reference. (line 13, file "")
Presumably because the html is not xml-compliant in some way in line 20. What surprises me is that when you do the same thing in google sheets and also supply an xpath, the html will be parsed as xml without problems:
=IMPORTXML("http://finance.yahoo.com/q?s=aapl,"//div[#class='title']")
will return "Apple Inc. (AAPL)". I assume that the sheets function has some way of cleaning the html to make it xml compliant.
do you think that could be the case?
if yes, do you have an idea how I could adapt the xml parser in apps script in such a way that I can access html from yahoo finance and treat it as xml?
thanks in advance!

New XmlService could not do lenient parse. So no way right now. But you can still use old Xml service that is support lenient parse (perhaps IMPORTXML use it as well). The code that works:
var yahoo= 'http://finance.yahoo.com/q?s=aapl'
var xml = UrlFetchApp.fetch(yahoo).getContentText();
var document = Xml.parse(xml, true);
And there is the issue report about no ability to lenient parse in the new XmlService: https://code.google.com/p/google-apps-script-issues/issues/detail?id=3727
So I propose you to use old way and keep an eye on this issue.

Can't create XML node with cyrillic name in IE11

I need to create a xml document (with JavaScript) containing nodes, which is named in russian.
I get InvalidCharacterError in IE11 when trying run doc.createElement("Выборка")
doc is created with var doc = document.implementation.createDocument("", "", null)
In other browsers this code is working without any issues.
How can be solved? What is the root of an issue?
jsFiddle example: http://jsfiddle.net/e4tUH/1/
My post on connect.microsoft.com: https://connect.microsoft.com/IE/feedback/details/812130/cant-create-xml-node-with-cyrillic-name-in-ie11
Current workaround: Switch IE11 to IE10 with X-UA-Compatible meta-tag and use window.ActiveXObject(...) to create XML documents.

Maybe IE11 has an issue similar to what Firefox had in the past:
https://bugzilla.mozilla.org/show_bug.cgi?id=431701
That means that although your page is loading the correct encoding, IE11 is creating the new document with a default encoding which is not the expected one. There's no way to check that besides looking into IE11 source code, which we don't have.
Have you trying to add non-ASCII characters in other places besides element names? Like an attribute value or a text node?
I searched how to change the created document encoding and haven't found any solution for that.
To solve your problem I would suggest to use a DOMParser and generate a document from a XML string, like the following:
var parser=new DOMParser();
var xmlDoc=parser.parseFromString('<?xml version="1.0" encoding="UTF-8"?><Выборка>Выборка текста</Выборка>',"text/xml");
All browsers seems to support it for XML parsing. More about DOMParser on the following links, including how to provide backward compatibility with older IE versions:
http://www.w3schools.com/dom/dom_parser.asp
https://developer.mozilla.org/en-US/docs/Web/API/DOMParser
If you don't want to generate your XML just by concatenating strings, you can use some kind of XML builder like in this example: http://jsfiddle.net/UGYWx/6/
Then you can easily create your XML in a more safe manner:
var builder = new XMLBuilder("rootElement");
builder.text('Some text');
var element = builder.element("someElement", {'attr':'value'});
element.text("This is a text.");
builder.text('Some more Text');
builder.element("emptyElement");
builder.text('Even some more text');
builder.element("emptyWithAttributes", {'a1': 'val1', 'a2' : 'val2'});
$('div').text(builder.toString());

I have always been very reluctant to use non-ASCII characters inside source code. Try escaping the string; maybe it helps.
doc.createElement("\u0412\u044B\u0431\u043E\u0440\u043A\u0430")

How to use DOM features and jQuery in main.js?

I'm working on an add-on using Mozilla's Add-on SDK, and I've come across the need to HTML encode some text (swap out ampersands and special characters for their & equivalents). You can do this in JavaScript using the DOM by calling document.createElement() and adding text to it (provoking the browser to encode the text). Trouble is, in the privileged code (main.js) there is no DOM, so no way to access these features, or even use a library like jQuery. Is there a best practice here? How can I get access to features that would typically require a global document object from main.js?

If I understood correctly, you want to replace HTML entities (& and similar) by the actual characters. And your solution so far was:
var text = "foo&bar";
var element = document.createElement('foo');
element.innerHTML = text;
text = element.textContent;
Instead of using the DOM of your document (and risking running some script unintentionally) you can use DOMParser - it will parse text without any side-effects. Unfortunately, accessing DOMParser from main.js requires chrome authority but other than that the code is straightforward:
var text = "foo&bar";
var {Cc, Ci} = require("chrome");
var parser = Cc["#mozilla.org/xmlextras/domparser;1"]
.createInstance(Ci.nsIDOMParser);
text = parser.parseFromString(text, "text/html").documentElement.textContent;

How do I get my widget not to crash if there is no value in a xml node?

I'm getting an xml file and want to get data from it.
The source of the xml doesn't really matter but what I;ve got to get a certain field is:
tracks = xmlDoc.getElementsByTagName("track");
variable = tracks.item(i).childNodes.item(4).childNodes.item(0).nodeValue;
Now this works like a charm, EXCEPT when there is no value in the node. So if the structure is like this:
<xml>
<one>
<two>nodeValue</two>
</one>
<one>
<two></two>
</one>
</xml>
the widget will crash on the second 'one' node, because there is no value in the 'two' node. The console says:
TypeError: tracks.item(i).childNodes.item(4).childNodes.item(0) has no properties
Any ideas on how to get the widget to just see empty as an empty string (null, empty, or ""), instead of crashing? I'm guessing something along the lines of data, getValue(), text, or something else.
using
var track= xmlDoc.getElementsByTagName('track')[0];
var info= track.getElementsByTagName('artist')[0];
var value= info.firstChild? info.firstChild.data : '';
doesn't work and returns "TypeError: track has no properties". That's from the second line where artist is called.

Test that the ‘two’ node has a child node before accessing its data.
childNodes.item(i) (or the JavaScript simple form childNodes[i]) should generally be avoided, it's a bit fragile relying on whitespace text nodes being in the exact expected place.
I'd do something like:
var tracks= xmlDoc.getElementsByTagName('track')[0];
var track= tracks.getElementsByTagName('one')[0];
var info= track.getElementsByTagName('two')[0];
var value= info.firstChild? info.firstChild.data : '';
(If you don't know the tagnames of ‘one’ and ‘two’ in advance, you could always use ‘getElementsByTagName('*')’ to get all elements, as long as you don't need to support IE5, where this doesn't work.)
An alternative to the last line is to use a method to read all the text inside the node, including any of its child nodes. This doesn't matter if the node only ever contains at most one Text node, but can be useful if the tree can get denormalised or contain EntityReferences or nested elements. Historically one had to write a recurse method to get this information, but these days most browsers support the DOM Level 3 textContent property and/or IE's innerText extension:
var value= info.textContent!==undefined? info.textContent : info.innerText;

without a dtd that allows a one element to contain an empty two element, you will have to parse and fiddle the text of your xml to get a document out of it.
Empty elements are like null values in databases- put in something, a "Nothing" or "0" value, a non breaking space, anything at all- or don't include the two element.
Maybe it could be an attribute of one, instead of an element in its own right.
Attributes can have empty strings for values. Better than phantom elements .

Yahoo! Widgets does not implement all basic javascript functions needed to be able to use browser-code in a widget.
instead of using:
tracks = xmlDoc.getElementsByTagName("track");
variable = tracks.item(i).childNodes.item(4).childNodes.item(0).nodeValue;
to get values it's better to use Xpath with a direct conversion to string. When a string is empty in Yahoo! Widgets it doesn't give any faults, but returns the 'empty'. innerText and textContent (the basic javascript way in browsers, used alongside things like getElementsByTagName) are not fully (or not at all) implemented in the Yahoo! Widgets Engine and make it run slower and quite awfully react to xmlNodes and childNodes. an easy way however to traverse an xml Document structure is using 'evaluate' to get everything you need (including lists of nodes) from the xml.
After finding this out, my solution was to make everything a lot easier and less sensitive to faults and errors. Also I chose to put the objects in an array to make working with them easier.
var entries = xmlDoc.evaluate("lfm/recenttracks/track");
var length = entries.length;
for(var i = 0; i < length; i++) {
var entry = entries.item(i);
var obj = {
artist: entry.evaluate("string(artist)"),
name: entry.evaluate("string(name)"),
url: entry.evaluate("string(url)"),
image: entry.evaluate("string(image[#size='medium'])")
};
posts[i] = obj;
}

We Keep Coding

JavaScript is the programming language of the Web.

Fastest Way to Parse this XML in JS - javascript

if you want to parse the information from that xml much faster, try txml. it is very easy to use and for the type of xml you have shown, you can use its simplify method. it will give you very clean objects to work with. https://www.npmjs.com/package/txml Disclaimer: I'm the author of this library.

Related

Decompressing bzip2 data in Javascript

Difference between XmlService and importxml

Can't create XML node with cyrillic name in IE11

How to use DOM features and jQuery in main.js?

How do I get my widget not to crash if there is no value in a xml node?

Categories

Resources