Standard way of parsing HTML - javascript

I'm working on a project for my university in which i need to parse an html string into a document and add the children elements in the existing page. I have to use Javascript 1.6 with DOM Level
3 and the project must work with Firefox 32.0 and Chrome 37.0.2062.120, pretty old browsers. The problem is i have to use only standard methods and properties so i can't use innerHTML. These are my attempts so far:
I managed to parse html using the DOMParser object but i'm not sure if i'm allowed to use that (i found this document but it's not clear to me if DOMParser is standard or not), if this was the case it looks like the best option to me.
I also tried to parse html using:
var doc = document.implementation.createHTMLDocument(title);
doc.open();
doc.writeln(html);
doc.close();
The problem with this method is that it doesn't work with the version of mozilla i need to use. I also tried to use the document inside a dummy iframe pointing to about:blank but chrome then prevents me (for security reasons i believe) to add event handlers to any of the elements coming from that document, and i need to do that.

Related

Check where/which line or document a script gets called

I'm having a JavaScript debugging question. I would like to know, how it would be possible to find out in which file/line a new script is loaded and called. My website has several scripts which are appended via document.write(), and I would like to find a way to find the function call in all attached scripts of the website.
I would prefer either Firebug or Chrome Dev tools.
Thanks!
Neither Firebug nor the Firefox or the Chrome DevTools currently allow to debug code inserted via document.write(). I created bug 1122222 for the Firefox DevTools and issue 449269 for the Chrome DevTools requesting to be able to debug such scripts. As upcoming Firebug versions will be based on the Firefox DevTools, it will offer this feature once the Firefox bug is fixed, so there's no need to create a separate issue for it.
Until the above bugs are fixed you need to use another method to inject your script in order to be able to debug it within the browser.
Method 1: using eval()
You can use the eval() function to evaluate arbitrary code dynamically. Note that the eval() only evaluates JavaScript code, it must not be surrounded by any HTML.
Example:
eval("console.log('Hi!')");
Method 2: injecting a <script> tag
You can add a <script> tag to the page and then add contents to it.
Example:
var script = document.createElement("script");
script.textContent = "console.log('Hi!');";
document.body.appendChild(script);
Method 3: using new Function()
You can create a new function via the Function constructor.
Example:
(new Function("console.log('Hi!');"))();
Note that JavaScript won't be executed using innerHTML or insertAdjacentHTML() due to security reasons.

Parse HTML retrieved via AJAX in JavaScript

I'm attempting to write some JavaScript code (in particular, a Chrome extension) which does the following:
Retrieve some web page's contents via AJAX.
Get some content from that page by locating certain elements inside of the HTML string and getting their contents.
Do a thing with that data.
I have 1) and 3) working, but I'm having some trouble achieving step 2) in a reasonable way.
I currently have 2) implemented via jQuery(htmlString) and then using normal jQuery selectors and etc. to extract the data I want. The problem is that jQuery actually adds the retrieved HTML to the current page, loading and executing all external resources / scripts in the process. This is obviously bad.
So I'm looking for a way to get the text and HTML in certain tags inside my HTML string without:
Loading or executing ANY scripts or resources (images, CSS, etc.) referenced in the HTML string.
Trying to remove external resources with regular expressions, since we all know what happens when you parse [X]HTML with regex.
I believe that I can achieve what I want using jsdom and jQuery, since jsdom has a FetchExternalResources option which can be set to false. However, jsdom seems to only work in NodeJS, not in the browser.
Is there any reasonable way to do this?
You could use document.implementation.createHTMLDocument
This is an experimental technology
Because this technology's
specification has not stabilized, check the compatibility table for
the proper prefixes to use in various browsers. Also note that the
syntax and behavior of an experimental technology is subject to change
in future versions of browsers as the spec changes
Feature Chrome Firefox (Gecko) Internet Explorer Opera Safari
Basic support (Yes) 4.0 (2.0) [1] 9.0 (Yes) (Yes)
[1] The title parameter has only been made option in Firefox 23.
Javascript
$.ajax("http://www.html5rocks.com/en/tutorials/").done(function (htmlString) {
var doc = document.implementation.createHTMLDocument("");
doc.write(htmlString);
console.log(doc.getElementById('siteheader').textContent);
});
On jsFiddle
You can also take a look at DOMParser and XMLHttpRequest
Example using XMLHttpRequest
XMLHttpRequest originally supported only XML parsing. HTML parsing
support is a recent addition.
Feature Chrome Firefox (Gecko) Internet Explorer Opera Safari (WebKit)
Support 18 11 10 --- Not supported
Javascript
var xhr = new XMLHttpRequest();
xhr.onload = function () {
console.log(this.responseXML.getElementById('siteheader').textContent);
};
xhr.open("GET", "http://www.html5rocks.com/en/tutorials/");
xhr.responseType = "document";
xhr.send();
On jsFiddle

Automatically eval a local script on a specific URI in Firefox

I want to improve an existing website (I have no access to) by using my own javascript.
This means I have to add an < script > to the < head >. I am currently doing this by clicking on a bookmark which has something like this as its target:
javascript: if(document.createElement){
void(head=document.getElementsByTagName('head').item(0));
void(script=document.createElement('script'));
void(script.src='http://local/script.js');
void(script.type='text/javascript');
void(head.appendChild(script));
} // i added some spacer to make it readable
This works fine, but since i've got a lot of different scripts it gets complicated to organize them.
I am now looking for a way to automatically insert a defined script-uri if the top.location.href matches a given string.
Is there a way firefox can do something like this - maybe with the help of an add-on?
Userscripts are what you're looking for. Use Greasemonkey for Firefox to run your own script on specific sites.
https://addons.mozilla.org/en-US/developers/docs/sdk/1.12/modules/sdk/page-mod.html uses the add-on sdk, and provides precisely what you're looking for (i.e. an easy way to attach a script to a webpage if it matches a given domain). That gives you the option of making your addon a standalone one easily; moreover, if you were to extend its functionality significantly, the addon sdk would certainly be the way to go!

IE8 / JavaScript: Override native implementation of document.all.item?

I know this is crazy, but IE can drive one to do crazy things.
Here's the deal: we have a SharePoint 2007 site with Content Editor Web Parts. We have a contingent of users with Internet Explorer 8. We want to keep the site in IE8/IE8 Standards modes for better rendering of content. However, this configuration breaks the ability to open the Rich Text Editor window from the C.E. web part.
If we force IE8 into IE7 document mode or quirks mode, it works. Indeed, other sources online have suggested doing just that to fix the problem. But we'd really rather have everything run in standards mode.
Through some debugging, we found the source of the problem to be the use of document.all.index("<web_part_id>") JavaScript when retrieving a web part object on the page. In IE8 standards, this returns an object with most of the properties either empty, null or undefined; most notably, the id property is not set. If you were to use document.getElementById to retrieve the same ID, you get back a fully populated object. Likewise if IE8 is not in standards mode, you get back a mostly (but not completely) populated object -- but populated enough to avoid the script error, anyway.
All this code looks like it is injected dynamically into the SharePoint page, so that rules out simply replacing references to document.all. However, we got the crazy idea of redefining the document.all.item method to actually invoke document.getElementById. Our attempts so far to do this don't work, so perhaps someone might shed some light on what we're doing wrong (okay, there's lots wrong about this, but it's IE and SharePoint, right?).
Our initial attempt at an override looks like this:
<script type="text/javascript">
document.all.item = function(id) { return document.getElementById(id); }
</script>
This code is in the document HEAD, above any other script references (SharePoint or otherwise), but it does not appear to supersede the native code.
Thoughts, ideas, suggestions and criticisms welcome!
document.all is a HTMLCollection, so you may use HTMLCollection.prototype to change the behaviour:
if(document.all
&& !window.opera
&& typeof HTMLDocument=='object'
&& typeof HTMLCollection=='object')//filter IE8
{
HTMLCollection.prototype.item=
function(id)
{
return((this==document.all)
? document.getElementById(id)//document.all
: this[id]//maintain native behaviour for other collections
);
}
}
So you're saying you can't change the references to document.all.item because SP directly injects it to the html? So couldn't you use the DOM to replace it?
BTW, I feel your SharePoint pain. I've built SP workflows, and those are enough of a pain for me!

running an XPath expression SVG inside an HTML with javascript

I'm trying to run an xpath-expression over an svg which is embedded in html. I just cannot figure out how to set up the parameters. I want find elements that have an arbitary attribute from a given namespace. I use the following xpath expression:
var xpathexp = "//*[#*[namespace-uri()='"+this.typo7namespace+"']]";
I tested this expression and it worked as expected.
this is the code to find the result set:
var result = this.svgdocument.contentDocument.evaluate( xpathexp, this.svgdocument.documentElement, null, XPathResult.ANY_TYPE, null );
Could anybody tell me, or post a link to a tutorial, how to deal with the namspaces, the namespace resolvers??
Greetings...
Here's the Mozilla tutorial on using XPath:
https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaScript
Here's one on writing custom namespace resolvers:
https://developer.mozilla.org/en/Introduction_to_using_XPath_in_JavaScript#Implementing_a_User_Defined_Namespace_Resolver
I found these interfaces to be rather clunky, though, so I wrote an abstraction layer that would take the xpath string and a context node, and would return a regular js array. It works inside the browser and embedded in Java under Mozilla Rhino:
https://svn.apache.org/repos/asf/commons/sandbox/gsoc/2010/scxml-js/trunk/src/javascript/scxml/cgf/util/xpath.js
All of the above should work in all browsers except for IE6-9.
IE6-8 does not support SVG natively, so this should be less important to your question. For completeness, though, here's a good article describing XPath support in earlier IE8, including support for resolving namespaces:
http://www.nczonline.net/blog/2009/04/04/xpath-in-javascript-part-3/
Apparently, IE9 also does not include support for XPath in the browser, which is more problematic, as it does support SVG natively. Probably the best approach here is to use ActiveX to work with MSXML APIs:
IE9 selectSingleNode missing from beta, how to overcome this in JavaScript?

Categories