When trying to parse html as xml in google apps script, this code:
var yahoo= 'http://finance.yahoo.com/q?s=aapl'
var xml = UrlFetchApp.fetch(yahoo).getContentText();
var document = XmlService.parse(xml);
will return an error like this:
Error on line 20: The entity name must immediately follow the '&' in the entity reference. (line 13, file "")
Presumably because the html is not xml-compliant in some way in line 20. What surprises me is that when you do the same thing in google sheets and also supply an xpath, the html will be parsed as xml without problems:
=IMPORTXML("http://finance.yahoo.com/q?s=aapl,"//div[#class='title']")
will return "Apple Inc. (AAPL)". I assume that the sheets function has some way of cleaning the html to make it xml compliant.
do you think that could be the case?
if yes, do you have an idea how I could adapt the xml parser in apps script in such a way that I can access html from yahoo finance and treat it as xml?
thanks in advance!
New XmlService could not do lenient parse. So no way right now. But you can still use old Xml service that is support lenient parse (perhaps IMPORTXML use it as well). The code that works:
var yahoo= 'http://finance.yahoo.com/q?s=aapl'
var xml = UrlFetchApp.fetch(yahoo).getContentText();
var document = Xml.parse(xml, true);
And there is the issue report about no ability to lenient parse in the new XmlService: https://code.google.com/p/google-apps-script-issues/issues/detail?id=3727
So I propose you to use old way and keep an eye on this issue.
Related
Can someone offer some troubleshooting tips here? My PostXML() return value is NULL for a very small subset of data results. This works for 99.9% of usage. I think the failed data may have special characters perhaps, but comparing to a similar dataset, its identical and passes OK? The oXMLDoc.xml in File.asp contains a valid XML string while debugging, but its null when it gets back to my JS call.
Is there any known issues with what looks like a valid XML element getting trashed in the Microsoft.XMLHTTP object?
function PostXML(sXML)
{
var oHTTPPost = new ActiveXObject("Microsoft.XMLHTTP");
oHTTPPost.Open("POST","File.asp", false);
oHTTPPost.send(sXML);
// documentElement is null???
return oHTTPPost.responseXML.documentElement;
}
File.asp
<%
' oXMLDoc.xml contains valid XML here, but is NULL in the calling JS
Response.ContentType = "text/xml"
Response.Write oXMLDoc.xml
%>
Check the response headers. The content type needs to be application/xml.
XMLHttpRequest is available in current IEs. I suggest using the ActiveX only as a fallback.
You can override the content mimetype on it:
xhr = new XMLHttpRequest();
xhr.overrideMimeType("application/xml");
...
This might by possible on the ActiveX object, too. But I am not sure.
Another possibility is using the DOMParser to convert a received string into an Document instance.
Found the issue.
Using IE Dev/Debugger, I found xEFxBFxBF in one of the string attributes. This product uses MS SQL Server and the query output did not reflect these characters even if copy/pasted into Notepad++. I'm assuming Ent Manager filters out unsupported characters... /=
Thanks for the help people!
Ouch man.
Is it possible to use jQuery instead?
The other thing I know is that if the return xml is a little off (poorly formatted xml, case sensitive, non-legal characters) the javascript will trash the return value.
With jQuery you have better debugging options to see the error.
I need to create a xml document (with JavaScript) containing nodes, which is named in russian.
I get InvalidCharacterError in IE11 when trying run doc.createElement("Выборка")
doc is created with var doc = document.implementation.createDocument("", "", null)
In other browsers this code is working without any issues.
How can be solved? What is the root of an issue?
jsFiddle example: http://jsfiddle.net/e4tUH/1/
My post on connect.microsoft.com: https://connect.microsoft.com/IE/feedback/details/812130/cant-create-xml-node-with-cyrillic-name-in-ie11
Current workaround: Switch IE11 to IE10 with X-UA-Compatible meta-tag and use window.ActiveXObject(...) to create XML documents.
Maybe IE11 has an issue similar to what Firefox had in the past:
https://bugzilla.mozilla.org/show_bug.cgi?id=431701
That means that although your page is loading the correct encoding, IE11 is creating the new document with a default encoding which is not the expected one. There's no way to check that besides looking into IE11 source code, which we don't have.
Have you trying to add non-ASCII characters in other places besides element names? Like an attribute value or a text node?
I searched how to change the created document encoding and haven't found any solution for that.
To solve your problem I would suggest to use a DOMParser and generate a document from a XML string, like the following:
var parser=new DOMParser();
var xmlDoc=parser.parseFromString('<?xml version="1.0" encoding="UTF-8"?><Выборка>Выборка текста</Выборка>',"text/xml");
All browsers seems to support it for XML parsing. More about DOMParser on the following links, including how to provide backward compatibility with older IE versions:
http://www.w3schools.com/dom/dom_parser.asp
https://developer.mozilla.org/en-US/docs/Web/API/DOMParser
If you don't want to generate your XML just by concatenating strings, you can use some kind of XML builder like in this example: http://jsfiddle.net/UGYWx/6/
Then you can easily create your XML in a more safe manner:
var builder = new XMLBuilder("rootElement");
builder.text('Some text');
var element = builder.element("someElement", {'attr':'value'});
element.text("This is a text.");
builder.text('Some more Text');
builder.element("emptyElement");
builder.text('Even some more text');
builder.element("emptyWithAttributes", {'a1': 'val1', 'a2' : 'val2'});
$('div').text(builder.toString());
I have always been very reluctant to use non-ASCII characters inside source code. Try escaping the string; maybe it helps.
doc.createElement("\u0412\u044B\u0431\u043E\u0440\u043A\u0430")
I'm working on an add-on using Mozilla's Add-on SDK, and I've come across the need to HTML encode some text (swap out ampersands and special characters for their & equivalents). You can do this in JavaScript using the DOM by calling document.createElement() and adding text to it (provoking the browser to encode the text). Trouble is, in the privileged code (main.js) there is no DOM, so no way to access these features, or even use a library like jQuery. Is there a best practice here? How can I get access to features that would typically require a global document object from main.js?
If I understood correctly, you want to replace HTML entities (& and similar) by the actual characters. And your solution so far was:
var text = "foo&bar";
var element = document.createElement('foo');
element.innerHTML = text;
text = element.textContent;
Instead of using the DOM of your document (and risking running some script unintentionally) you can use DOMParser - it will parse text without any side-effects. Unfortunately, accessing DOMParser from main.js requires chrome authority but other than that the code is straightforward:
var text = "foo&bar";
var {Cc, Ci} = require("chrome");
var parser = Cc["#mozilla.org/xmlextras/domparser;1"]
.createInstance(Ci.nsIDOMParser);
text = parser.parseFromString(text, "text/html").documentElement.textContent;
Say I have this XML with about 1000+ bookinfo nodes.
<results>
<books>
<bookinfo>
<name>1</dbname>
</bookinfo>
<bookinfo>
<name>2</dbname>
</bookinfo>
<bookinfo>
<name>3</dbname>
</bookinfo>
</books>
</results>
I'm currently using this to get the name of each book:
var books = this.req.responseXML.getElementsByTagName("books")[0].getElementsByTagName("bookinfo")
Then use a for loop to do something with each book name:
var bookName = books[i].getElementsByTagName("name")[0].firstChild.nodeValue;
I'm finding this really slow when books is really big. Unfortunately, there's no way to limit the result set nor specify a different return type.
Is there a faster way?
You can try fast xml parser to convert XML data to JSON which is implemented in JS. Here is the benchmark against other parser.
var parser = require('fast-xml-parser');
var jsonObj = parser.parse(xmlData);
// when a tag has attributes
var options = {
attrPrefix : "#_" };
var jsonObj = parser.parse(xmlData,options);
If you don't want to use npm library, you can include parser.js in your HTML directly.
Disclaimer: I'm the author of this library.
Presumably you are using XMLHttpRequest, in which case the XML is parsed before you call any methods of responseXML (i.e. the XML has already been parsed and turned into a DOM). If you want a faster parser, you'll probably need a different user agent or a different javascript engine for your current UA.
If you want a faster way to access content in the XML document, consider XPath:
Mozilla documentation
MSDN documentation
I used an XPath expression (like //parentNode/node/text()) on a 134KB local file to extract the text node of 439 elements, put those into an array (because that's what my standard evalXPath() function does), then iterate over that array to put the nodeValue for each text node into another array, doing two replace calls with regular expressions to format the text, then alert() that to the screen with join('\n'). It took 3ms.
A 487KB file with 529 nodes took 4ms (IE 6 reported 15ms but its clock has very poor resolution). Of course my network latency will be nearly zero, but it shows that the XML parser, XPath evaluator and script in general can process that size file quickly.
if you want to parse the information from that xml much faster, try txml. it is very easy to use and for the type of xml you have shown, you can use its simplify method. it will give you very clean objects to work with.
https://www.npmjs.com/package/txml
Disclaimer: I'm the author of this library.
It seems that there is no xml parsing tool in available JSFL (Adobe Flash-extension Javascript script file) : http://osflash.org/pipermail/flashextensibility_osflash.org/2006-July/000014.html
So, is there an easy and cross-platform way to add a javascript xml parser?
I know this is an old question, but I was looking for a solution to this problem as well (using Flash CS3). I needed to parse XML from a data file on disk. Combining the suggestion of George Profenza I was able to get it to work with using eval(). The key is to remove the first line (the xml declaration):
xmlData = eval( FLfile.read( xmlFile ).split( '\n' ).slice( 1 ).join( '\n' ) );
... and you're good to go!
Well, you can use XML and E4X straight from JSFL using Flash CS3 or newer as the Javascript engine got upgraded to 1.6
Here's a quick snippet that loops through elements in the current selection and traces xml:
var doc = fl.getDocumentDOM();//get the current document ref.
var selection = doc.selection;//get the selection
var layout = <layout />;//create the root node for our xml
var elementsNum = selection.length;//store this for counting*
for(var i = 0 ; i < elementsNum ; i++){
layout.appendChild(<element />);//add an element node
layout.element[i].#name = selection[i].name;//setup attributes
layout.element[i].#x = selection[i].x;
layout.element[i].#y = selection[i].y;
}
fl.trace(layout);
var xml = new XML(FLfile.read(file));
Then you can access all your nodes and attributes by writing this:
xml.nodeName
xml.nodeName.#attribute1
xml.nodeName.#attribute2
nodeName is the name of your node.
attribute1 should be the name of your attribute.
I hope this helps.
If JSFL uses ActionScript at some point, you can just do XML(xml_string) or XMLList(multiple_xml_nodes_string) as long as it's ActionScript 3.0 or higher. ActionScript 3.0 supports E4X witch is native XML in ECMAScript.
What works nicely for me is just creating a SwfWindow. Working in JSFL is nice and fast because you can change out the file without having to restart Flash, but often ActionScript gives you more power.
My current project does a couple tricks:
I will create objects in JSFL and then convert them to XML. I have to serialize them from Object format into a string which I pass to the SwfWindow (Panel). From the Panel, I take the String can convert it into XML. Then you can do anything you want in Actionscript 3.0.
If I just have XML manipulation, I will prompt the User for a XML files path in JSFL code, but hand the URL directly to the Panel, and have the Panel just load the XML directly.
Finally. For saving the XML, I will have to convert the XML to string via, 'xml.toXmlString()', but you also need to remove the '\n' so that you can hand the data to JSFL. I will strip out the '\n' for '|' or whatever you like. Then pass the string to JSFL, and you then can deserialize the string and change the '|' back to '\n' and save the file. Either using the older 'Save Output Panel' method, or using the newer File Write method.
Hope that helps.
function parseXML (xml) {
try { // normal browsers
return (new DOMParser()).parseFromString(xml, "application/xml");
}
catch (e) { // IE
var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = "false";
xmlDoc.loadXML(xml);
return xmlDoc;
}
}
You might want to read more on DOMParser and the Microsoft.XMLDOM ActiveX object.