How to create Document objects with JavaScript

Basically, that's the question: how is one supposed to construct a Document object dynamically from a string of HTML in JavaScript?

There are two methods defined in the specifications: createDocument from DOM Level 2 Core and createHTMLDocument from HTML5. The former creates an XML document (including XHTML), the latter creates an HTML document. Both reside, as functions, on the DOMImplementation interface.
var impl = document.implementation,
xmlDoc = impl.createDocument(namespaceURI, qualifiedNameStr, documentType),
htmlDoc = impl.createHTMLDocument(title);
In reality, these methods are rather young and only implemented in recent browser releases. According to http://quirksmode.org and MDN, the following browsers support createHTMLDocument:
Chrome 4
Opera 10
Firefox 4
Internet Explorer 9
Safari 4
Interestingly enough, you can (kind of) create an HTML document in older versions of Internet Explorer, using ActiveXObject:
var htmlDoc = new ActiveXObject("htmlfile");
The resulting object will be a new document, which can be manipulated just like any other document.
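The two approaches can be combined with simple feature detection. This is a minimal sketch, assuming the browser's global object is passed in as `env` (the helper name `createBlankHtmlDocument` is hypothetical): it prefers the standard `createHTMLDocument` and only falls back to the legacy `ActiveXObject("htmlfile")` in old Internet Explorer.

```javascript
// Hypothetical helper: return an empty HTML Document, or null if neither the
// standard API nor the legacy IE ActiveX object is available.
// `env` is assumed to be the global object (window in a browser).
function createBlankHtmlDocument(env, title) {
  var impl = env.document && env.document.implementation;
  // Standard route (Chrome 4, Opera 10, Firefox 4, IE 9, Safari 4 and later)
  if (impl && impl.createHTMLDocument) {
    return impl.createHTMLDocument(title || '');
  }
  // Legacy Internet Explorer route
  if (typeof env.ActiveXObject !== 'undefined') {
    try {
      return new env.ActiveXObject('htmlfile');
    } catch (e) {
      // ActiveX can be disabled by security settings; fall through to null
    }
  }
  return null;
}

// In a browser: var blank = createBlankHtmlDocument(window, 'My title');
```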

Assuming you are trying to create a fully parsed Document object from a string of markup and a content type you also happen to know (maybe because you got the HTML from an XMLHttpRequest, and thus got the content type from its Content-Type HTTP header; usually text/html), it should be this easy:
var doc = (new DOMParser).parseFromString(markup, mime_type);
in an ideal future world where browser DOMParser implementations are as strong and competent as their document rendering. Maybe that's a good pipe-dream requirement for future HTML6 standards efforts. It turns out no current browsers do this, though.
You probably have the easier (but still messy) problem of having a string of HTML that you want to turn into a fully parsed Document object. Here is another take on how to do that, which also ought to work in all browsers. First, create an HTML Document object:
var doc = document.implementation.createHTMLDocument('');
and then populate it with your html fragment:
doc.open();
doc.write(html);
doc.close();
Now you should have a fully parsed DOM in doc, on which you can run alert(doc.title), query with CSS selectors such as doc.querySelectorAll('p'), or query with XPath using doc.evaluate.
This actually works in modern WebKit browsers like Chrome and Safari (I just tested in Chrome 22 and Safari 6, respectively). Here is an example that takes the current page's source code, recreates it in a new document variable src, reads out its title, overwrites it with an HTML-quoted version of the same source code and shows the result in an iframe: http://codepen.io/johan/full/KLIeE
Sadly, I don't think any other contemporary browsers have quite as solid implementations yet.
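The open/write/close steps can be wrapped in one function. This is a sketch, assuming `implementation` is a DOMImplementation such as `document.implementation` (the helper name `htmlToDocByWriting` is hypothetical). Note that `close()` has to actually be called, since it is what tells the parser the input is finished.

```javascript
// Hypothetical helper wrapping the open()/write()/close() technique.
// `implementation` is assumed to be a DOMImplementation, e.g. document.implementation.
function htmlToDocByWriting(implementation, html) {
  var doc = implementation.createHTMLDocument('');
  doc.open();      // start a fresh parse of the empty document
  doc.write(html); // feed the whole markup string to the HTML parser
  doc.close();     // signal end of input so parsing can finish
  return doc;
}

// In a browser:
// var parsed = htmlToDocByWriting(document.implementation, '<title>hi</title><p>x</p>');
// parsed.querySelectorAll('p');
```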

Per the spec, one may use the createHTMLDocument method of DOMImplementation, accessible via document.implementation, as follows:
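var doc = document.implementation.createHTMLDocument('My title');
// createHTMLDocument already creates <html>, <head>, <title> and <body>,
// so add new content to doc.body rather than appending a second <body>
var div = doc.createElementNS('http://www.w3.org/1999/xhtml', 'div');
doc.body.appendChild(div);
// and so on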
jsFiddle: http://jsfiddle.net/9Fh7R/
MDN document for DOMImplementation: https://developer.mozilla.org/en/DOM/document.implementation
MDN document for DOMImplementation.createHTMLDocument: https://developer.mozilla.org/En/DOM/DOMImplementation.createHTMLDocument

The following works in most common browsers, but not all. This is how simple it should be (but isn't):
// Fails if UA doesn't support parseFromString for text/html (e.g. IE)
function htmlToDoc(markup) {
var parser = new DOMParser();
return parser.parseFromString(markup, "text/html");
}
var htmlString = "<title>foo bar</title><div>a div</div>";
alert(htmlToDoc(htmlString).title);
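Detecting support up front avoids the failure noted in the comment above. The check below mirrors the one used by the DOMParser polyfills in this thread: per their comments, Firefox, Opera and IE throw on an unsupported type, while older WebKit returns null. A sketch; passing the constructor in as a parameter is only there to make it testable, and the name `supportsTextHtml` is hypothetical.

```javascript
// Returns true if the given DOMParser constructor can parse "text/html".
// Unsupported types either throw (Firefox/Opera/IE) or return null (old WebKit).
function supportsTextHtml(DOMParserCtor) {
  if (typeof DOMParserCtor !== 'function') return false;
  try {
    return !!new DOMParserCtor().parseFromString('', 'text/html');
  } catch (e) {
    return false;
  }
}

// In a browser: if (!supportsTextHtml(window.DOMParser)) { /* use a fallback */ }
```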
To account for user agent vagaries, the following may be better (please note attribution):
/*
* DOMParser HTML extension
* 2012-02-02
*
* By Eli Grey, http://eligrey.com
* Public domain.
* NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
*
* Modified to work with IE 9 by RobG
* 2012-08-29
*
* Notes:
*
* 1. Supplied markup should be a valid HTML document with or without HTML tags and
* no DOCTYPE (DOCTYPE support can be added, I just didn't do it)
*
* 2. Host method used where host supports text/html
*/
/*! @source https://gist.github.com/1129031 */
/*! #source https://developer.mozilla.org/en-US/docs/DOM/DOMParser */
/*global document, DOMParser*/
(function(DOMParser) {
"use strict";
var DOMParser_proto;
var real_parseFromString;
var textHTML; // Flag for text/html support
var textXML; // Flag for text/xml support
var htmlElInnerHTML; // Flag for support for setting html element's innerHTML
// Stop here if DOMParser not defined
if (!DOMParser) return;
// Firefox, Opera and IE throw errors on unsupported types
try {
// WebKit returns null on unsupported types
textHTML = !!(new DOMParser).parseFromString('', 'text/html');
} catch (er) {
textHTML = false;
}
// If text/html supported, don't need to do anything.
if (textHTML) return;
// Next try setting innerHTML of a created document
// IE 9 and lower will throw an error (can't set innerHTML of its HTML element)
try {
var doc = document.implementation.createHTMLDocument('');
doc.documentElement.innerHTML = '<title></title><div></div>';
htmlElInnerHTML = true;
} catch (er) {
htmlElInnerHTML = false;
}
// If that failed, try text/xml
if (!htmlElInnerHTML) {
try {
textXML = !!(new DOMParser).parseFromString('', 'text/xml');
} catch (er) {
textXML = false;
}
}
// Mess with DOMParser.prototype (less than optimal...) if one of the above worked
// Assume can write to the prototype, if not, make this a stand alone function
if (DOMParser.prototype && (htmlElInnerHTML || textXML)) {
DOMParser_proto = DOMParser.prototype;
real_parseFromString = DOMParser_proto.parseFromString;
DOMParser_proto.parseFromString = function (markup, type) {
// Only do this if type is text/html
if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
var doc, doc_el, first_el;
// Use innerHTML if supported
if (htmlElInnerHTML) {
doc = document.implementation.createHTMLDocument("");
doc_el = doc.documentElement;
doc_el.innerHTML = markup;
first_el = doc_el.firstElementChild;
// Otherwise use XML method
} else if (textXML) {
// Make sure markup is wrapped in HTML tags
// Should probably allow for a DOCTYPE
if (!(/^<html.*html>$/i.test(markup))) {
markup = '<html>' + markup + '<\/html>';
}
doc = (new DOMParser).parseFromString(markup, 'text/xml');
doc_el = doc.documentElement;
first_el = doc_el.firstElementChild;
}
// RG: I don't understand the point of this, I'll leave it here though
// In IE, doc_el is the HTML element and first_el is the HEAD.
//
// Is this an entire document or a fragment?
if (doc_el.childElementCount == 1 && first_el.localName.toLowerCase() == 'html') {
doc.replaceChild(first_el, doc_el);
}
return doc;
// If not text/html, send as-is to host method
} else {
return real_parseFromString.apply(this, arguments);
}
};
}
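// typeof guard: passing an undeclared DOMParser would throw a ReferenceError
}(typeof DOMParser !== 'undefined' ? DOMParser : undefined));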
// Now some test code
var htmlString = '<html><head><title>foo bar</title></head><body><div>a div</div></body></html>';
var dp = new DOMParser();
var doc = dp.parseFromString(htmlString, 'text/html');
// Treat as an XML document and only use DOM Core methods
alert(doc.documentElement.getElementsByTagName('title')[0].childNodes[0].data);
Don't be put off by the amount of code: there are a lot of comments. It can be shortened quite a bit, but it becomes less readable.
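About the "I don't understand the point of this" comment in the XML branch: one plausible reason is that the wrapping test /^<html.*html>$/i uses `.`, which does not match line breaks, so a full multi-line document fails the test and gets wrapped in a second <html>; the replaceChild step then removes that extra level. A sketch of the wrapping step with a newline-tolerant pattern (the helper name `wrapInHtml` is hypothetical):

```javascript
// Hypothetical helper: wrap markup in <html>...</html> unless it already is.
// [\s\S]* matches across newlines, unlike .* in the original pattern.
function wrapInHtml(markup) {
  if (/^\s*<html[\s\S]*<\/html>\s*$/i.test(markup)) {
    return markup;
  }
  return '<html>' + markup + '</html>';
}

var multiLine = '<html>\n<body><div>a div</div></body>\n</html>';
/^<html.*html>$/i.test(multiLine);   // false: the original pattern would wrap it again
wrapInHtml(multiLine) === multiLine; // true: left as-is
```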
Oh, and if the markup is valid XML, life is much simpler:
var stringToXMLDoc = (function(global) {
// W3C DOMParser support
if (global.DOMParser) {
return function (text) {
var parser = new global.DOMParser();
return parser.parseFromString(text,"application/xml");
}
// MS ActiveXObject support
} else {
return function (text) {
var xmlDoc;
// Can't assume support and can't test, so try..catch
try {
xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
xmlDoc.async = false; // load synchronously so loadXML() has finished before returning
xmlDoc.loadXML(text);
} catch (e){}
return xmlDoc;
}
}
}(this));
var doc = stringToXMLDoc('<books><book title="foo"/><book title="bar"/><book title="baz"/></books>');
alert(
doc.getElementsByTagName('book')[2].getAttribute('title')
);

An updated answer for 2014, as DOMParser has evolved. This works in all current browsers I can find, and should also work in earlier versions of IE, using ecManaut's document.implementation.createHTMLDocument('') approach above.
Essentially, IE, Opera and Firefox can all parse "text/html", while Safari parses as "text/xml".
Beware of intolerant XML parsing, though. Safari's parse will break down at non-breaking spaces and other HTML characters (such as French/German accents) written as ampersand entities. Rather than handle each character separately, the code below replaces all ampersands with the meaningless character string "j!J!". This string can later be turned back into an ampersand when displaying the results in a browser (simpler, I have found, than trying to handle ampersands in "false" XML parsing).
function parseHTML(sText) {
try {
console.log("Domparser: " + typeof window.DOMParser);
if (typeof window.DOMParser !== 'undefined') {
// modern IE, Firefox, Opera parse text/html
var parser = new DOMParser();
var doc = parser.parseFromString(sText, "text/html");
if (doc != null) {
console.log("parsed as HTML");
return doc
}
else {
//replace ampersands with harmless character string to avoid XML parsing issues
sText = sText.replace(/&/gi, "j!J!");
//safari parses as text/xml
var doc = parser.parseFromString(sText, "text/xml");
console.log("parsed as XML");
return doc;
}
}
else {
// older IE
doc= document.implementation.createHTMLDocument('');
doc.write(sText);
doc.close();
return doc;
}
} catch (err) {
alert("Error parsing html:\n" + err.message);
}
}
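Two details of parseHTML are easy to trip over. First, typeof always returns a string such as "function" or "undefined", so comparing its result with null is always true; a feature test has to compare against the string "undefined". Second, the ampersand workaround is just a pair of string replacements. A sketch of both, using the answer's "j!J!" placeholder (all helper names are hypothetical; it assumes the placeholder never occurs in the input, otherwise restoring would turn those occurrences into ampersands):

```javascript
// typeof yields a string, so this comparison can never be false.
var alwaysTrue = (typeof undefined != null); // "undefined" != null

// Hypothetical helper: does the given global object (e.g. window) expose DOMParser?
function hasDOMParser(global) {
  return typeof global.DOMParser !== 'undefined';
}

// Placeholder from the answer above; assumed never to appear in the input.
var AMP_PLACEHOLDER = 'j!J!';

// Before strict XML parsing: hide every "&" so entities like &nbsp; can't break the parse.
function hideAmpersands(text) {
  return text.replace(/&/g, AMP_PLACEHOLDER);
}

// When displaying the results: turn the placeholder back into "&".
// split/join replaces every occurrence (a plain string pattern in replace() would only replace the first).
function restoreAmpersands(text) {
  return text.split(AMP_PLACEHOLDER).join('&');
}

var hidden = hideAmpersands('<p>caf&eacute;&nbsp;ok</p>');
// hidden is '<p>cafj!J!eacute;j!J!nbsp;ok</p>'
```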

Related

Need to JSON stringify an object in ExtendScript

I am working on processing the metadata of my InDesign document links, using ExtendScript.
I want to convert the object to a string using JSON.stringify, but when I use it I get an error saying:
can't execute script in target engine.
If I remove linkObjStr = JSON.stringify(linksInfObj); from below code, then everything works fine.
What is the equivalent of JSON.stringify in ExtendScript, or is there any other way to display linksInfObj with its proper contents instead of [object Object]?
for (var i = 0, len = doc.links.length; i < len; i++) {
var linkFilepath = File(doc.links[i].filePath).fsName;
var linkFileName = doc.links[i].name;
var xmpFile = new XMPFile(linkFilepath, XMPConst.FILE_INDESIGN, XMPConst.OPEN_FOR_READ);
var allXMP = xmpFile.getXMP();
// Retrieve values from external links XMP.
var documentID = allXMP.getProperty(XMPConst.NS_XMP_MM, 'DocumentID', XMPConst.STRING);
var instanceID = allXMP.getProperty(XMPConst.NS_XMP_MM, 'InstanceID', XMPConst.STRING);
linksInfObj[linkFileName] = {'docId': documentID, 'insId': instanceID};
linkObjStr = JSON.stringify(linksInfObj);
alert('Object' + linksInfObj, true); // I am getting [Object Object] here
alert('String' + linkObjStr, true);
}
ExtendScript does not include a JSON object with the associated methods for parsing and serializing, namely JSON.parse() and JSON.stringify(). Nor does it provide any other built-in feature for parsing JSON.
Solution:
Consider using a polyfill such as JSON-js, created by Douglas Crockford, to provide JSON functionality.
What you'll need to do:
Download the JavaScript file named json2.js from the GitHub repo and save it in the same location/folder as your .jsx file.
Note: You can also copy and paste the raw version of json2.js from the same GitHub repo to create the json2.js file manually if you prefer.
Then at the top of your current .jsx file you'll need to #include the json2.js file by adding the following line of code:
#include "json2.js";
This is analogous to how you might utilize the import statement to include a module in modern day JavaScript (ES6).
A pathname to the location of json2.js can be provided if you decide to save the file in a different location/folder than your .jsx file.
By including json2.js in your .jsx file you'll now have working JSON methods; JSON.parse() and JSON.stringify().
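The difference between the two alert() calls in the question comes down to string conversion: concatenating an object calls its default toString(), which yields "[object Object]", whereas JSON.stringify() serializes the contents. A plain JavaScript illustration (the object is made-up sample data shaped like the question's linksInfObj; in ExtendScript the same applies once json2.js is included):

```javascript
// Made-up sample data shaped like the question's linksInfObj.
var linksInfObj = { 'one.psd': { docId: 'ABC', insId: 'xmp.iid:123' } };

var viaConcat = 'Object' + linksInfObj; // 'Object[object Object]'
var viaJson = 'String' + JSON.stringify(linksInfObj);
// viaJson is 'String{"one.psd":{"docId":"ABC","insId":"xmp.iid:123"}}'
```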
Example:
The following ExtendScript (.jsx) is a working example that generates JSON to indicate all the links associated with the current InDesign document (.indd).
example.jsx
#include "json2.js";
$.level=0;
var doc = app.activeDocument;
/**
* Loads the AdobeXMPScript library.
* @returns {Boolean} True if the library loaded successfully, otherwise false.
*/
function loadXMPLibrary() {
if (!ExternalObject.AdobeXMPScript) {
try {
ExternalObject.AdobeXMPScript = new ExternalObject('lib:AdobeXMPScript');
} catch (e) {
alert('Failed loading AdobeXMPScript library\n' + e.message, 'Error', true);
return false;
}
}
return true;
}
/**
* Obtains the values of the XMP properties `DocumentID` and `InstanceID` for
* each linked file associated with an InDesign document (.indd) and returns the
* information formatted as JSON.
* @param {Object} doc - A reference to the .indd to check.
* @returns {String} - The information formatted as JSON.
*/
function getLinksInfoAsJson(doc) {
var linksInfObj = {};
linksInfObj['indd-name'] = doc.name;
linksInfObj.location = doc.filePath.fsName;
linksInfObj.links = [];
for (var i = 0, len = doc.links.length; i < len; i++) {
var linkFilepath = File(doc.links[i].filePath).fsName;
var linkFileName = doc.links[i].name;
var xmpFile = new XMPFile(linkFilepath, XMPConst.FILE_INDESIGN, XMPConst.OPEN_FOR_READ);
var allXMP = xmpFile.getXMP();
// Release the file handle once the XMP packet has been read.
xmpFile.closeFile();
// Retrieve values from external links XMP.
var documentID = allXMP.getProperty(XMPConst.NS_XMP_MM, 'DocumentID', XMPConst.STRING);
var instanceID = allXMP.getProperty(XMPConst.NS_XMP_MM, 'InstanceID', XMPConst.STRING);
// Ensure we produce valid JSON...
// - When `instanceID` or `documentID` values equal `undefined` change to `null`.
// - When `instanceID` or `documentID` exist ensure it's a String.
instanceID = instanceID ? String(instanceID) : null;
documentID = documentID ? String(documentID) : null;
linksInfObj.links.push({
'name': linkFileName,
'path': linkFilepath,
'docId': documentID,
'insId': instanceID
});
}
return JSON.stringify(linksInfObj, null, 2);
}
if (loadXMPLibrary()) {
var linksJson = getLinksInfoAsJson(doc);
$.writeln(linksJson);
}
Output:
Running the script above will log JSON similar to the following example to your console:
{
"indd-name": "foobar.indd",
"location": "/path/to/the/document",
"links":[
{
"name": "one.psd",
"path": "/path/to/the/document/links/one.psd",
"docId": "5E3AE91C0E2AD0A57A0318E078A125D6",
"insId": "xmp.iid:0480117407206811AFFD9EEDCD311C32"
},
{
"name": "two.jpg",
"path": "/path/to/the/document/links/two.jpg",
"docId": "EDC4CCF902ED087F654B6AB54C57A833",
"insId": "xmp.iid:FE7F117407206811A61394AAF02B0DD6"
},
{
"name": "three.png",
"path": "/path/to/the/document/links/three.png",
"docId": null,
"insId": null
}
]
}
Sidenote: Modelling your JSON:
You'll have noticed that the JSON output (above) is structured differently to how you were attempting to structure it in your given example. The main difference is that you were using link filenames as property/key names, such as the following example:
Example of a problematic JSON structure
{
"one.psd": {
"docId": "5E3AE91C0E2AD0A57A0318E078A125D6",
"insId": "xmp.iid:0480117407206811AFFD9EEDCD311C32"
},
"two.jpg": {
"docId": "EDC4CCF902ED087F654B6AB54C57A833",
"insId": "xmp.iid:FE7F117407206811A61394AAF02B0DD6"
}
...
}
Producing JSON like this example isn't ideal because if you were to have two links, both with the same name, you would only ever report one of them. You cannot have two properties/keys that have the same name within an Object.
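A quick way to see the problem: assigning two links with the same name to the same key keeps only the last one, while pushing into an array keeps both. Plain JavaScript with made-up sample data:

```javascript
// Two made-up links that happen to share a file name.
var links = [
  { name: 'logo.png', path: '/a/logo.png' },
  { name: 'logo.png', path: '/b/logo.png' }
];

// Keyed by name: the second assignment overwrites the first.
var byName = {};
for (var i = 0; i < links.length; i++) {
  byName[links[i].name] = { path: links[i].path };
}

// Array of objects: every link is kept.
var asList = [];
for (var j = 0; j < links.length; j++) {
  asList.push({ name: links[j].name, path: links[j].path });
}

// byName has one key ('logo.png' -> '/b/logo.png'); asList has two entries.
```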
Edit:
As a response to the OP's comment:
Hi RobC, other than using #include 'json2.js', is there any other way to include external js file in the JSX file?
There are a couple of alternative ways as follows:
You could use $.evalFile(). For instance, replace #include "json2.js"; with the following two lines:
var json2 = File($.fileName).path + "/" + "json2.js";
$.evalFile(json2);
Note: This example assumes json2.js resides in the same folder as your .jsx file.
Alternatively, if you want to avoid having the additional json2.js file at all, you could add an IIFE (Immediately Invoked Function Expression) at the top of your .jsx file and then copy and paste the content of the json2.js file into it. For instance:
(function () {
// <-- Paste the content of `json2.js` here.
})();
Note: If code size is a concern then consider minifying the content of json2.js before pasting it into the IIFE.
I ran JSON-js through JavaScript Minifier and then pasted the result into my script.

Invalid argument error when converting an HTML string to a DOM element

I am trying to convert my HTML string to a DOM element but I am getting a script error: Invalid argument. My Internet Explorer version is 9. Please suggest any solution. Below is the code:
function open(htmlstring)
{
var parser = new DOMParser();
var doc = parser.parseFromString(htmlstring,"text/html");
}
Edit: If you're getting an invalid argument error, it might be that the argument you're passing is badly formatted HTML. In that case, you can wrap the call in a try...catch and handle the case where htmlstring is badly formatted.
You can use a DOMParser shim or polyfill.
/*
* DOMParser HTML extension
* 2012-09-04
*
* By Eli Grey, http://eligrey.com
* Public domain.
* NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
*/
/*! @source https://gist.github.com/1129031 */
/*global document, DOMParser*/
(function(DOMParser) {
"use strict";
var
DOMParser_proto = DOMParser.prototype
, real_parseFromString = DOMParser_proto.parseFromString
;
// Firefox/Opera/IE throw errors on unsupported types
try {
// WebKit returns null on unsupported types
if ((new DOMParser).parseFromString("", "text/html")) {
// text/html parsing is natively supported
return;
}
} catch (ex) {}
DOMParser_proto.parseFromString = function(markup, type) {
if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
var
doc = document.implementation.createHTMLDocument("")
;
if (markup.toLowerCase().indexOf('<!doctype') > -1) {
doc.documentElement.innerHTML = markup;
}
else {
doc.body.innerHTML = markup;
}
return doc;
} else {
return real_parseFromString.apply(this, arguments);
}
};
}(DOMParser));
from: https://gist.github.com/eligrey/1129031
Or just search for Internet Explorer 9 polyfills. There is even a complete ES6.js polyfill. The reason your code doesn't work in IE9 is that its implementation is very old, so many modern features don't work there.
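Note that the shim only takes over calls whose type argument matches /^\s*text\/html\s*(?:;|$)/i; any other type is passed unchanged to the native parseFromString. A quick sketch of what that pattern accepts, wrapped in a hypothetical isTextHtmlType function:

```javascript
// Same pattern as in the shim: "text/html", case-insensitive, with optional
// whitespace, optionally followed by parameters after ";".
function isTextHtmlType(type) {
  return /^\s*text\/html\s*(?:;|$)/i.test(type);
}

isTextHtmlType('text/html');                  // true
isTextHtmlType(' TEXT/HTML ; charset=utf-8'); // true
isTextHtmlType('text/xml');                   // false: handed to the native parser
isTextHtmlType('text/htmlx');                 // false
```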

OR operator in ExtendScript

I'm creating some scripts for InDesign to speed up the process.
I have created a script where a certain line, I think, should work but InDesign disagrees.
It fails on ("[Geen]"||"[None]") in the following:
app.changeGrepPreferences.appliedCharacterStyle = myDoc.characterStyles.item("[Geen]"||"[None]");
I expect it to change to the character style [Geen] or [None], depending on which one is available among the predefined character styles.
What am I doing wrong? This seems kinda basic.
Unfortunately, it is not that easy. If you use doc.characterStyles.item('foo'), it will still give you an [object CharacterStyle], even though that style does not exist.
var doc = app.activeDocument;
$.writeln(doc.characterStyles.item('foo'));
// writes [object CharacterStyle] into the console
What you can do is use a try{}catch(error){} block and ask for the name property of that object. In that case, InDesign will throw an error that you can catch. Then you can fall back to the default character style [None].
var doc = app.activeDocument;
try{
$.writeln(doc.characterStyles.item('foo').name);
}catch(e) {
$.writeln(e);
$.writeln(doc.characterStyles.item('[None]').name);
}
Edit: As mentioned by mdomino, you can use the isValid property.
var doc = app.activeDocument;
if(doc.characterStyles.item('foo').isValid === true) {
$.writeln('doc.characterStyles.item(\'foo\') exists');
} else {
$.writeln('use doc.characterStyles.item(\'[None]\') because ');
var defaultStyle = doc.characterStyles.item('[None]');
$.writeln(defaultStyle.name + ' is ' + defaultStyle.isValid);
}
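Why the original line fails: "[Geen]"||"[None]" is evaluated before item() is called, and since a non-empty string is truthy it always yields "[Geen]", so item() only ever sees that one name. To try several names in order, loop over them and check isValid. A sketch, assuming an InDesign-style collection whose item(name) always returns an object with an isValid flag (the helper `firstValidItem` is hypothetical; the test below uses a stand-in object so the logic can run outside InDesign):

```javascript
// `||` picks its first truthy operand, so item() only ever receives "[Geen]".
var requested = '[Geen]' || '[None]'; // '[Geen]'

// Hypothetical helper: return the first item whose name in `names` resolves to
// a valid object in `collection`, or null if none does.
function firstValidItem(collection, names) {
  for (var i = 0; i < names.length; i++) {
    var candidate = collection.item(names[i]);
    if (candidate.isValid) {
      return candidate;
    }
  }
  return null;
}

// In InDesign (not run here):
// app.changeGrepPreferences.appliedCharacterStyle =
//     firstValidItem(myDoc.characterStyles, ['[Geen]', '[None]']);
```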

How to get the text value of an XML node in IE8?

So I have an XML feed that returns a bunch of results. First I create an XML parser as outlined in faino's answer here.
The XML parses just fine. Every result looks like this:
<result>
<title>some title</title>
<bid>0.05123</bid>
<description>some desc</description>
</result>
So I have:
// parse
var xmlParser = returnXMLParser();
var resultsDoc = xmlParser(adXML.responseData); // #document
var listings = resultsDoc.getElementsByTagName('listing'); // returns 8-10
// get title node
var title = listings[0].getElementsByTagName('title')[0];
title.nodeType // 1
title.nodeName // "title"
Here's the problem, though: I have tried every property imaginable to get the inner text (textContent, innerText, innerHTML, nodeValue), and none of them seem to work in IE8.
The same script works perfectly fine in Chrome / FF using .textContent
Help!
A cross-browser backwards-compatible script:
function getXMLContent(obj,action)
{
//cross-browser get and set for xmlContent
if (obj)
{
if (action == "get") //get
{
// Test for existence: an empty textContent ("") is falsy but still valid
if (typeof obj.textContent !== "undefined")
{
return obj.textContent;
}
else
{
return obj.text;
}
}
else //set
{
if (typeof obj.textContent !== "undefined")
{
obj.textContent = action;
}
else
{
obj.text = action;
}
}
}
else
{
throw new Error("XML-Element doesn't exist.");
}
}
In my own AJAX calls where I retrieve XML, I always use this function to retrieve the content of the node.

JavaScript DOMParser access innerHTML and other properties

I am using the following code to parse a string into DOM:
var doc = new DOMParser().parseFromString(string, 'text/xml');
Where string is something like <!DOCTYPE html><html><head></head><body>content</body></html>.
typeof doc gives me object. If I do something like doc.querySelector('body') I get a DOM object back. But if I try to access any properties, like you normally can, it gives me undefined:
doc.querySelector('body').innerHTML; // undefined
The same goes for other properties, e.g. id. The attribute retrieval on the other hand goes fine doc.querySelector('body').getAttribute('id');.
Is there a magic function to have access to those properties?
Your current method fails because HTML properties are not defined for the given XML document. If you supply the text/html MIME type, the method should work.
var string = '<!DOCTYPE html><html><head></head><body>content</body></html>';
var doc = new DOMParser().parseFromString(string, 'text/html');
doc.body.innerHTML; // or doc.querySelector('body').innerHTML
// ^ Returns "content"
The code below enables the text/html MIME type for browsers that do not natively support it yet. It was retrieved from the Mozilla Developer Network:
/*
* DOMParser HTML extension
* 2012-02-02
*
* By Eli Grey, http://eligrey.com
* Public domain.
* NO WARRANTY EXPRESSED OR IMPLIED. USE AT YOUR OWN RISK.
*/
/*! @source https://gist.github.com/1129031 */
/*global document, DOMParser*/
(function(DOMParser) {
"use strict";
var DOMParser_proto = DOMParser.prototype
, real_parseFromString = DOMParser_proto.parseFromString;
// Firefox/Opera/IE throw errors on unsupported types
try {
// WebKit returns null on unsupported types
if ((new DOMParser).parseFromString("", "text/html")) {
// text/html parsing is natively supported
return;
}
} catch (ex) {}
DOMParser_proto.parseFromString = function(markup, type) {
if (/^\s*text\/html\s*(?:;|$)/i.test(type)) {
var doc = document.implementation.createHTMLDocument("")
, doc_elt = doc.documentElement
, first_elt;
doc_elt.innerHTML = markup;
first_elt = doc_elt.firstElementChild;
if (doc_elt.childElementCount === 1
&& first_elt.localName.toLowerCase() === "html") {
doc.replaceChild(first_elt, doc_elt);
}
return doc;
} else {
return real_parseFromString.apply(this, arguments);
}
};
}(DOMParser));
Try something like this:
const fragment = document.createRange().createContextualFragment(html);
where html is the string you want to convert. Note that this returns a DocumentFragment rather than a full Document.
Use element.getAttribute(attributeName) for XML/HTML elements
