Chrome extension read innerHTML of the current page? - javascript

Hi this may be a silly question, but I can't find the answer anywhere.
I'm writing a chrome extension, all I need is to read in the html of the current page so I can extract some data from it.
here's what I have so far:
<script>
window.addEventListener("load", windowLoaded, false);
function windowLoaded() {
alert(document.innerHTML)
});
}
</script>
Can anybody tell me what I'm doing wrong?
thanks,

function windowLoaded() {
alert('<html>' + document.documentElement.innerHTML + '</html>');
}
addEventListener("load", windowLoaded, false);
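Notice that windowLoaded is defined before it is used. That is a style choice rather than a fix, because function declarations are hoisted and work in either order; what actually broke your code was the stray }); and the nonexistent document.innerHTML.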
Also notice how I am getting the innerHTML of document.documentElement, which is the html tag, then adding the html source tags around it.

I'm writing a chrome extension, all I need is to read in the html of
the current page so I can extract some data from it.
I think an important answer here is not the correct code to use to alert the innerHTML but how to get the data you need from what's already been rendered.
As pimvdb pointed out, your code isn't working because of a typo and needing document.documentElement.innerHTML, something you can diagnose in the Chrome console (Ctrl+Shift+I). But that's secondary to why you'd want the inner HTML in the first place. Whether you're looking for a certain node, specific text, how many <div> elements exist, the value of an ID, etc., I'd heavily recommend the use of a library like jQuery (vanilla JS works, but it can be verbose and unwieldy). Instead of reading in all the HTML and parsing it with string functions or regex, you probably want to take advantage of all the DOM parsing functionality already available to you.
In other words, something like this:
$("#some_id").val(); // jQuery
document.getElementById("some_id").value; // vanilla JS
is probably way safer, easier and more readable than something eminently breakable like this (probably a bit off here, but just to make a point):
innerHTML.match(/<[^>]+id="some_id"[^>]+value="(.*?)"[^>]*?>/i)[1];

Use document.documentElement.outerHTML. (Note that Firefox only added outerHTML in version 11; irrelevant in your case.) However, this is still not perfect, as it doesn't return nodes outside the root element (the doctype and possibly some comments or processing instructions). The document.innerHTML property appeared, AFAIK, in drafts of the HTML5 specification, but is not supported in any browser.
Just FYI, navigating to view-source:www.example.com also displays the entire markup (Chrome & Firefox). But I don't know whether you can work with it somehow.
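If the doctype matters, one option is to prepend it yourself. The following is a minimal sketch, not a complete serializer: serializeDocument is a hypothetical helper name, it only emits the doctype's name (ignoring any public/system identifiers), and it still skips comments that sit outside the root element.

```javascript
// Hypothetical helper: rebuild "doctype + root element" markup from a Document.
// Assumes `doc` exposes the standard `doctype` and `documentElement` properties.
function serializeDocument(doc) {
  // doc.doctype is null when the page has no doctype declaration.
  var doctype = doc.doctype ? '<!DOCTYPE ' + doc.doctype.name + '>\n' : '';
  return doctype + doc.documentElement.outerHTML;
}
```

In a content script this would be called as serializeDocument(document).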

window.addEventListener("load", windowLoaded, false);
function windowLoaded() {
alert(document.documentElement.innerHTML);
}
You had a } with no purpose, and the }); should just be }. These are syntax errors.
Also, it's document.documentElement.innerHTML, since it's not a property of document.

Related

Is my code running jQuery?

The codebase I've inherited is full of $() methods and $.() methods.
eg.
$("#buttona").show();
$("#buttonb").show();
or
bComment = $.trim($("#aComment" + i).val().replace(/"/g, "'").replace(/\,/g, " ").replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;"));
Which leads me to believe that this is jQuery.
For example that trim method, appears to be a jquery method (takes a string as a parameter) rather than a core javascript method.
However, I can't see any reference to the jQuery source in the code base. I did a search for 'jquery' over the entire codebase, and couldn't find any src = jquery...js references, which I understand to be necessary.
How is it that this code could be otherwise running jQuery?
I'm answering this one, since Arun P Johny isn't answering.
Yes you are right. That's jquery.
You can check its version with $.fn.jquery or jQuery.fn.jquery, which returns the version string.
The reason you didn't see a src = jquery..js might be that the script references were assigned dynamically. You can see which scripts were actually loaded by looking at how each script was resolved in your browser's developer tools (press F12).
But Arun.P.Johny answered it first ;)
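The version check above can be wrapped in a small test that also guards against other libraries that define $. This is only a sketch; detectJQuery is a made-up helper name, and it relies on nothing more than the fn.jquery version property mentioned above.

```javascript
// Hypothetical helper: report the jQuery version exposed on a global object,
// or null if nothing jQuery-like is found there.
function detectJQuery(root) {
  var candidate = root.jQuery || root.$;
  // Other libraries also define $, so check for jQuery's own version property.
  if (typeof candidate === 'function' && candidate.fn &&
      typeof candidate.fn.jquery === 'string') {
    return candidate.fn.jquery;
  }
  return null;
}
```

In the browser console you would call detectJQuery(window). To see where the library came from, document.scripts lists the script elements currently in the page, including ones that were added dynamically.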

Is it possible to get jQuery objects from an HTML string that's not in the DOM?

For example in javascript code running on the page we have something like:
var data = '<html>\n <body>\n I want this text ...\n </body>\n</html>';
I'd like to know whether it's possible to get the text in the body of that HTML string without throwing the whole HTML string into the DOM and selecting from there.
First, it's a string:
var arbitrary = '<html><body>\nSomething<p>This</p>...</body></html>';
Now jQuery turns it into an unattached DOM fragment, applying its internal .clean() method to strip away things like the extra <html>, <body>, etc.
var $frag = $( arbitrary );
You can manipulate this with jQuery functions, even if it's still a fragment:
alert( $frag.filter('p')[0].outerHTML ); // says "<p>This</p>"
Or of course just get the text content as in your question:
alert( $frag.text() ); // includes "This" in my contrived example
// along with line breaks and other text, etc
You can also later attach the fragment to the DOM:
$('div#something_real').append( $frag );
Where possible, it's often a good strategy to do complicated manipulation on fragments while they're unattached, and then slip them into the "real" page when you're done.
The correct answer to this question, in this exact phrasing, is NO.
If you write something like var a = $("<div>test</div>"), jQuery will add that div to the DOM, and then construct a jQuery object around it.
If you want to do without bothering the DOM, you will have to parse it yourself. Regular expressions are your friend.
It would be easiest, I think, to put that into the DOM and get it from there, then remove it from the DOM again.
jQuery itself is full of tricks like this. It's adding all sorts of stuff into the DOM all the time, including when you build something using $('<p>some html</p>'). So if you went down that road you'd still effectively be placing stuff into the DOM then removing it again, temporarily, except that it'd be jQuery doing it.
John Resig (jQuery author) created a pure JS HTML parser that you might find useful. An example from that page:
var dom = HTMLtoDOM("<p>Data: <input disabled>");
dom.getElementsByTagName("body").length == 1
dom.getElementsByTagName("p").length == 1
Buuuut... This question contains a constraint that I think you need to be more critical of. Rather than working around a hard-coded HTML string in a JS variable, can you not reconsider why it's that way in the first place? WHAT is that hard-coded string used for?
If it's just sitting there in the script, re-write it as a proper object.
If it's the response from an AJAX call, there is a perfectly good jQuery AJAX API already there. (Added: although jQuery just returns it as a string without any ability to parse it, so I guess you're back to square one there.)
Before throwing it in the DOM that is just a plain string.
You can sure use REGEX.
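Another route that avoids both jQuery and regular expressions is the browser's built-in DOMParser, which parses a string into a separate document that is never attached to the page. A rough sketch, assuming a browser that supports the 'text/html' type; bodyTextOf and its optional parser argument are names made up for illustration (the argument only exists so a stand-in parser can be supplied):

```javascript
// Hypothetical helper: return the trimmed body text of an HTML string
// without inserting anything into the live page.
function bodyTextOf(html, parser) {
  // Fall back to the browser's DOMParser when no parser is injected.
  var p = parser || new DOMParser();
  // parseFromString builds a standalone Document from the string.
  var doc = p.parseFromString(html, 'text/html');
  return doc.body ? doc.body.textContent.trim() : '';
}
```

With the data variable from the question, bodyTextOf(data) should give back the text inside the body.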

How safe is it to use document.body.innerHTML.replace?

Is running something like:
document.body.innerHTML = document.body.innerHTML.replace('old value', 'new value')
dangerous?
I'm worried that some browsers might screw up the whole page, and since this is JS code that will be placed on sites outside my control, which might be visited by who knows what browsers, I'd rather be careful.
My goal is only to look for an occurrence of a string in the whole body and replace it.
Definitely potentially dangerous - particularly if your HTML code is complex, or if it's someone else's HTML code (i.e. it's a CMS or you're creating reusable JavaScript). Also, it will destroy any event listeners you have set on elements on the page.
Find the text-node with XPath, and then do a replace on it directly.
Something like this (not tested at all):
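var i=0, ii, matches=xpath('//*[contains(text(),"old value")]/text()');
// XPathResult snapshots expose snapshotLength; IE's selectNodes list exposes length
ii=matches.snapshotLength||matches.length||0;
for(;i<ii;++i){
  var el=matches.snapshotItem?matches.snapshotItem(i):matches[i];
  // replace() returns a new string, so assign it back to the text node
  el.nodeValue=el.nodeValue.replace('old value','new value');
}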
Where xpath() is a custom cross-browser xpath function along the lines of:
function xpath(str){
if(document.evaluate){
return document.evaluate(str,document,null,6,null);
}else{
return document.selectNodes(str);
}
}
I agree with lucideer, you should find the node containing the text you're looking for, and then do a replace. JS frameworks make this very easy. jQuery for example has the powerful :contains('your text') selector
http://api.jquery.com/contains-selector/
If you want a rock-solid solution, you should iterate over the DOM and find the value to replace that way.
However, if 'old value' is a long string that could never be mixed up with a tag, attribute or attribute value, you are relatively safe just doing the replace.
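Iterating over the DOM could look roughly like the sketch below. It is untested against real pages; replaceInTextNodes is a made-up name, and skipping SCRIPT and STYLE elements is my own assumption about what you would want. It only touches text nodes, so markup, attributes and event listeners are left alone.

```javascript
// Hypothetical helper: replace every occurrence of oldValue inside text nodes
// under `node`, and return how many text nodes were changed.
function replaceInTextNodes(node, oldValue, newValue) {
  if (node.nodeType === 3) { // 3 === Node.TEXT_NODE
    if (node.nodeValue.indexOf(oldValue) === -1) {
      return 0;
    }
    // split/join replaces all occurrences without needing a regex.
    node.nodeValue = node.nodeValue.split(oldValue).join(newValue);
    return 1;
  }
  // Assumption: code and stylesheets should not be rewritten.
  if (node.nodeType === 1 &&
      (node.nodeName === 'SCRIPT' || node.nodeName === 'STYLE')) {
    return 0;
  }
  var changed = 0;
  var children = node.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    changed += replaceInTextNodes(children[i], oldValue, newValue);
  }
  return changed;
}
```

On a page it would be called as replaceInTextNodes(document.body, 'old value', 'new value').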

This javascript works in every browser EXCEPT for internet explorer!

The webpage is here:
http://develop.macmee.com/testdev/
I'm talking about when you click the ? on the left, it is supposed to open up a box with more content in it. It does that in every browser except IE!
function question()
{
$('.rulesMiddle').load('faq.php?faq=rules_main',function(){//load page into .rulesMiddle
var rulesa = document.getElementById('rulesMiddle').innerHTML;
var rules = rulesa.split('<div class="blockbody">');//split to chop off the top above rules
var rulesT = rules[1].split('<form class="block');//split to chop off below rules
rulesT[0] = rulesT[0].replace('class=','vbclass');//get rid of those nasty vbulletin defined classes
document.getElementById('rulesMiddle').innerHTML = rulesT[0];//readd the content back into the DIV
$('.rulesMain').slideToggle();//display the DIV
$('.rulesMain').center();//center DIV
$('.rulesMain').css('top','20px');//align with top
});
}
IE converts tag names in innerHTML contents into upper case, so you probably are not able to split the string this way, as string operations are case sensitive. Check what the contents really look like by running
alert(rulesa);
Andris is right. And that's not all. It'll also throw away the quotes in attributes.
It is completely unreliable to make any assumptions about the format of the string you get from innerHTML; the browser may output it in a variety of forms — some of which, in IE's case, are not even valid HTML. The chances of you getting back the same string that was originally parsed are very low.
In general: HTML-string-hacking is a shonky waste of time. Modify HTML elements using their node objects instead. You seem to be using jQuery, so you've got loads of utility functions to help you.
In any case you should not be loading the whole HTML page into #rulesMiddle. It includes a load of scripts and stylesheets and other header nonsense that can't go in there. jQuery allows you to pick which part of the document to insert; you seem to just want the first .blockbody element, so pick that:
$('#rulesMiddle').load('faq.php?faq=rules_main .blockbody:first', function(){
$('#rulesMiddle .blockrow').attr('class', '');
$('.rulesMain').slideToggle();
$('.rulesMain').css('top', '20px');
});
My IE debugger throws an error on your script when I click that button. On this line:
var rulesT = rules[1].split('<form class="block');//split to chop off below rules
IE stops processing the Javascript and says '1' is null or not an object
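That error is what you would expect if the separator is simply not found: split then returns a one-element array, so index 1 is undefined. A small illustration in plain JavaScript; the second string is only an assumed example of how an older IE might re-serialize the markup, not captured output:

```javascript
var separator = '<div class="blockbody">';

// Markup as it was written on the server.
var asWritten = '<p>intro</p><div class="blockbody">rules</div>';
// Assumed example of a re-serialized form (upper-case tags, no attribute quotes).
var reserialized = '<P>intro</P><DIV class=blockbody>rules</DIV>';

var partsWritten = asWritten.split(separator);
var partsReserialized = reserialized.split(separator);

console.log(partsWritten.length);         // 2
console.log(partsReserialized.length);    // 1
console.log(typeof partsReserialized[1]); // "undefined"
```

Calling .split on that undefined second element is what makes the script stop.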
Don't know if you've solved it, but it works on my ugly IE... (it's v8).
Btw: is it just me, or do pop-up windows, when open, really, really, really slow down that platform?

node selection and manipulation out of the dom (What is jQuery's trick ?)

Hi I would like to do dom selection and manipulation out of the dom.
The goal is to build my widget out of the dom and to insert it in the dom only once it is ready.
My issue is that getElementById is not supported on a document fragment. I also tried createElement and cloneNode, but it does not work either.
I am trying to do that in plain JS. I am used to doing this with jQuery, which handles it nicely. I tried to find the trick in the jQuery source, but no success so far...
Olivier
I have done something similar, but not sure if it will meet your needs.
Create a "holding area" such as a plain <span id="spanReserve"></span> or <td id="cellReserve"></td>. Then you can do something like this in JS function:
var holdingArea = document.getElementById('spanReserve');
holdingArea.innerHTML = widgetHTMLValue;
jQuery will try to use getElementById first, and if that doesn't work, it'll then search all the DOM elements using getAttribute("id") until it finds the one you need.
For instance, if you built the following DOM structure that isn't attached to the document and it was assigned to the javascript var widget:
<div id="widget">
<p><strong id="target">Hello</strong>, world!</p>
</div>
You could then do the following:
var target, i, all_elements;
// Collect all descendant elements of the div
all_elements = widget.getElementsByTagName("*");
for(i=0; i < all_elements.length; i++){
  if(all_elements[i].getAttribute("id") === "target"){
    target = all_elements[i];
    break;
  }
}
target.innerHTML = "Goodbye";
If you need more than just searching by ID, I'd suggest installing Sizzle rather than duplicating the Sizzle functionality. Assuming you have the ability to install another library.
Hope this helps!
EDIT:
what about something simple along these lines:
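DocumentFragment.prototype.getElementById = function(id) {
  // Note: only checks the fragment's direct children, not deeper descendants
  for(var i = 0; i < this.childNodes.length; i++){
    var n = this.childNodes[i];
    if(id == n.id){
      return n;
    }
  }
  return null;
};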
Why not just use jQuery or the selection API in whatever other lib you're using? AFAIK all the major libs support selection on fragments.
If you want to skip a larger lib like jQ/Prototype/Dojo/etc., then you could just use Sizzle: it's the selector engine that powers jQ and Dojo, and it's offered as a standalone. If that's out of the question as well, then I suppose you could dive into the Sizzle source and see what's going on. All in all, though, it seems like a lot of effort to avoid a few 100k, with the added probability that the code you come up with is going to be slower at runtime than all the work put into Sizzle or another open source library.
http://sizzlejs.com/
Oh, also... I think (guessing) jQ's trick is that the elements are not really out of the DOM. I could be wrong, but I think when you do something like:
$('<div></div>');
It's actually in the DOM document; it's just not part of the body/head nodes. Could be totally wrong about that though, it's just a guess.
So you got me curious, haha. I took a look at Sizzle... the answer is: it's not using DOM methods. It seems to use an algorithm that compares the various DOMNode properties mapped to types of selectors, unless I'm missing something... which is entirely possible :-)
However, as noted below in the comments, it seems Sizzle DOES NOT work on DocumentFragments... So back to square one :-)
Modern browsers (read: not IE) have the querySelector method in the Element API. You can use that to get an element by id within a DocumentFragment.
jQuery uses sizzle.js
What it does on DocumentFragments is: deeply loop through all the elements in the fragment checking if an element's attribute( in your case 'id' ) is the one you're looking for. To my knowledge, sizzle.js uses querySelector too, if available, to speed things up.
If you're looking for cross browser compatibility, which you probably are, you will need to write your own method, or check for the querySelector method.
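Writing your own method with a querySelector check might look something like this. Treat it as a sketch: findById is a hypothetical name, and the fallback is a plain depth-first walk rather than anything taken from Sizzle.

```javascript
// Hypothetical helper: find a descendant of `root` (element or fragment) by id.
function findById(root, id) {
  if (typeof root.querySelector === 'function') {
    // An attribute selector avoids CSS identifier escaping; only quotes and
    // backslashes inside the quoted value need a backslash.
    var escaped = id.replace(/["\\]/g, '\\$&');
    return root.querySelector('[id="' + escaped + '"]');
  }
  // Fallback: depth-first walk over element children (nodeType 1).
  var children = root.childNodes || [];
  for (var i = 0; i < children.length; i++) {
    var child = children[i];
    if (child.nodeType !== 1) {
      continue;
    }
    if (child.id === id) {
      return child;
    }
    var found = findById(child, id);
    if (found) {
      return found;
    }
  }
  return null;
}
```

For the widget case, findById(fragment, 'target') would stand in for the missing fragment.getElementById('target').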
It sounds like you are doing the right things. Not sure why it is not working out.
// if it is an existing element
var node = document.getElementById("footer").cloneNode(true);
// or if it is a new element use
// document.createElement("div");
// Here you would do manipulation of the element, setAttribute, add children, etc.
node.childNodes[1].childNodes[1].setAttribute("style", "color:#F00; font-size:128px");
document.documentElement.appendChild(node)
You really have two tools to work with: html(), or using the normal jQuery manipulation operators on an XML document and then inserting it into the DOM.
To create a widget, you can use html():
$('#target').html('<div><span>arbitrarily complex JS</span><input type="text" /></div>');
I assume that's not what you want. Therefore, look at the additional behaviors of the jQuery selector: when passed a second parameter, it can be its own XML fragment, and manipulation can happen on those documents. eg.
$('<div />').append('<span>').find('span').text('arbitrarily complex JS'). etc.
All the operators like append, appendTo, wrap, etc. can work on fragments like this, and then they can be inserted into the DOM.
A word of caution, though: jQuery uses the browser's native functions to manipulate this (as far as I can tell), so you do get different behaviors on different browsers. Make sure to use well-formed XML. I've even had it reject improperly formed HTML fragments. Worst case, though, go back and use string concatenation and the html() method.
