A bookmarklet-friendly JavaScript that grabs source?

I'm trying to build a simple bookmarklet that does a whole bunch of stuff based on the page's source code (which itself contains JavaScript).
Essentially, it takes a number of bits of data from the source, which it finds using regex queries and then manipulates.
I've got everything working apart from grabbing the source code. I just need some help figuring out the source bit.
So, what do I need to do to take the source code of the page I'm currently

document.documentElement.innerHTML will get you everything except the <html> tag itself and the doctype. However, this may not be the original source code, as the HTML may have been changed by a script. It may be better to fetch the source via Ajax (the third argument, false, makes the request synchronous, so it blocks the page until the response arrives):
var xhr = new XMLHttpRequest();
xhr.open("GET", location.href, false);
xhr.send();
var source = xhr.responseText;
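Once the source is available as a string, the regex step mentioned in the question could be kept in a small helper. This is only a sketch: the name extractAll and the sample pattern are made up for illustration and are not part of the answer above.

```javascript
// Hypothetical helper: collect the first capture group of every match
// (or the whole match when the pattern has no capture group).
function extractAll(source, pattern) {
  // Work on a copy with the global flag set, so exec() walks the string.
  var flags = pattern.flags.indexOf('g') === -1 ? pattern.flags + 'g' : pattern.flags;
  var re = new RegExp(pattern.source, flags);
  var results = [];
  var match;
  while ((match = re.exec(source)) !== null) {
    results.push(match[1] !== undefined ? match[1] : match[0]);
    // Guard against zero-length matches looping forever.
    if (match[0] === '') { re.lastIndex++; }
  }
  return results;
}

// Example: pull every src attribute value out of a source string.
var sample = '<img src="a.png"><img src="b.png">';
var srcValues = extractAll(sample, /src="([^"]*)"/);
```

In a bookmarklet, the source variable from the Ajax snippet would take the place of sample.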

Once you have the element (with something like document.getElementById()), you can read its .innerHTML property.
For example:
<html>
<head>
<title>Demo</title>
</head>
<body>
<div id="box">I want the code for this <span>html</span></div>
</body>
</html>
The javascript would run something like this
var data=document.getElementById('box').innerHTML;
Here's a demo in JSFiddle:
http://jsfiddle.net/LW2VH/

Related

Extend HTML file with script and override/extend some section tags

Is there an open-source client-side library that I can use to extend HTML?
For example, I need to add scripts to it, change some of the src values, add additional tags, and so on.
I found the following: https://www.npmjs.com/package/gulp-html-extend
but I'm not sure whether I can use it on the client (we don't use gulp in our project). By client I mean, for example, using it in jsFiddle.
The input should be HTML content plus some object/JSON with the new content, and the output should be the extended HTML.
If there is no open-source option and I need to develop it myself, are there guidelines I should follow for a good design?
UPDATE:
For example, suppose I have the following HTML document as a JS input variable.
THIS IS THE INPUT, WHICH I GET AS A STRING
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta charset="UTF-8">
<title>td</title>
<script id="test-ui-bootstrap"
src="resources/test-ui-core.js"
data-test-ui-libs="test.m"
data-test-ui-xx-bindingSyntax="complex"
data-test-ui-resourceroots='{"tdrun": "./"}'>
</script>
<link rel="stylesheet" type="text/css" href="css/style.css">
<script>
test.ui.get().attachInit(function() {
});
</script>
</head>
<body class="testUiBody" id="content">
</body>
</html>
For example, I need the following:
1.
If the file contains a script with the id "test-ui-bootstrap"
<script id="test-ui-bootstrap" ....
I want to add another script immediately after it, e.g. a script with an alert inside.
2.
To add an additional attribute inside the first script (the one with id="test-ui-bootstrap"), after
data-test-ui-libs="test.m"
add
data-test-ui-libs123 ="test.bbb"
3.
To modify the value of an existing attribute, e.g. change
src="resources/test-ui-core.js"
to
src="resources/aaaa/test-ui-core.js"
I get a string with HTML and I need to create a new string with the modified HTML. How can I do this in a clean way?
UPDATE 2
THIS IS THE OUTPUT AFTER THE HTML WAS CHANGED
<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta charset="UTF-8">
<title>td</title>
<script id="test-ui-bootstrap"
src="resources/aaaa/test-ui-core.js"
data-test-ui-libs="test.m"
data-test-ui-libs123 ="test.bbb"
data-test-ui-xx-bindingSyntax="complex"
data-test-ui-resourceroots='{"tdrun": "./"}'>
</script>
<script>
alert("test")
</script>
<link rel="stylesheet" type="text/css" href="css/style.css">
<script>
test.ui.get().attachInit(function() {
});
</script>
</head>
<body class="testUiBody" id="content">
</body>
</html>
You can create a sandboxed element outside of the DOM, then insert your HTML into it.
var sandbox = document.createElement('div');
sandbox.innerHTML = yourHTMLString;
The browser will parse your HTML, then you'll be able to traverse/modify it with the DOM APIs.
You can use it to find elements and add or change attributes.
var script = sandbox.querySelector('#test-ui-bootstrap');
script.setAttribute('data-test-ui-libs123', 'test.bbb');
script.setAttribute('src', 'resources/aaaa/test-ui-core.js');
Or insert new elements after existing ones.
var newScript = document.createElement('script');
newScript.textContent = 'your script contents';
script.parentNode.insertBefore(newScript, script.nextSibling);
As soon as you're ready to work with it as a string again, you can read it out as a property.
var html = sandbox.innerHTML;
Note: browsers may handle innerHTML differently, and you might find that they strip the <body> and <head> tags when you insert your HTML into the sandbox.
If this is the case, you can work around it with a hack.
var escapedTags = yourHTMLString
.replace(/body/ig, 'body$')
.replace(/head/ig, 'head$');
// now the browser won't recognize the tags
// and therefore won't strip them out.
sandbox.innerHTML = escapedTags;
// do some work
// ...
// don't forget to unescape them!
var unescapedTags = sandbox.innerHTML
.replace(/body\$/g, 'body')
.replace(/head\$/g, 'head');
This makes use of the fact that the browser won't understand what a <body$> or a <head$> tag is, so it just leaves it intact.
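One caveat with the replacements above is that /body/ig and /head/ig match those words anywhere in the string, including attribute values such as class="testUiBody". A narrower variant that only touches tag names might look like the sketch below. The helper names are hypothetical, and including <html> alongside <head> and <body> is an assumption.

```javascript
// Hypothetical helpers: rename only <html>, <head> and <body> tags,
// leaving words such as class="testUiBody" or <header> untouched.
var SPECIAL_TAG = /<(\/?)(html|head|body)(?=[\s>\/])/gi;
var ESCAPED_TAG = /<(\/?)(html|head|body)\$(?=[\s>\/])/gi;

function escapeSpecialTags(html) {
  // A replacer function avoids the special meaning of "$" in replacement strings.
  return html.replace(SPECIAL_TAG, function (match, slash, name) {
    return '<' + slash + name + '$';
  });
}

function unescapeSpecialTags(html) {
  return html.replace(ESCAPED_TAG, function (match, slash, name) {
    return '<' + slash + name;
  });
}
```

The escaped string would be assigned to sandbox.innerHTML, and unescapeSpecialTags would be applied to sandbox.innerHTML once the work is done.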
You can use DOMParser and XMLSerializer.
Most importantly, this is not a sandbox. It only uses a parser and a serializer, so it will not execute the scripts within the input until you inject the output into an actual DOM.
// HTML string to be modified
var strHTML = '<html>...</html>'; // your HTML
// We'll parse this string into DOM in memory.
var parser = new DOMParser(),
doc = parser.parseFromString(strHTML, 'text/html'),
// in this example, we'll get the script elements and change/set
// some attributes of the first and the content of the second
scripts = doc.getElementsByTagName('script');
scripts[0].setAttribute('data-test-ui-libs123', 'test.bbb');
scripts[0].setAttribute('src', 'resources/aaaa/test-ui-core.js');
scripts[1].innerHTML = 'alert("test")';
// now that we've modified the HTML, we can serialize it into string
var serializer = new XMLSerializer(),
outputHTML = serializer.serializeToString(doc);
Example Pen.
DOMParser and XMLSerializer on MDN.
Browser support: IE10+ and modern browsers.
jQuery.parseHTML()
The document.implementation.createHTMLDocument() API also does not execute scripts or fetch resources via HTTP (such as videos, images, etc). This is the approach used by jQuery.parseHTML() method. See source here.
From jQuery docs; security considerations:
Most jQuery APIs that accept HTML strings will run scripts that are included in the HTML. jQuery.parseHTML does not run scripts in the parsed HTML unless keepScripts is explicitly true. However, it is still possible in most environments to execute scripts indirectly, for example via the <img onerror> attribute. The caller should be aware of this and guard against it by cleaning or escaping any untrusted inputs from sources such as the URL or cookies. For future compatibility, callers should not depend on the ability to run any script content when keepScripts is unspecified or false.
INITIAL (Node.js)
I understand your question as follows: you want to parse an HTML string in a Node.js environment (you mentioned Gulp), extend it, and get the resulting string back.
First, you need to parse the string into a structure that you can query. There are several libraries available to achieve this. Cheerio.js was recommended and explained in a StackOverflow answer. Other solutions are also explained there. The library then provides you with an interface to the DOM of your HTML code. With Cheerio.js, you can access the DOM much as you would in jQuery. The official example from their GitHub page is shown below. In a similar manner, you can implement your logic by selecting elements and adding or modifying content. By calling the $.html() function, you get the modified structure back.
var cheerio = require('cheerio'),
$ = cheerio.load('<h2 class="title">Hello world</h2>');
$('h2.title').text('Hello there!');
$('h2').addClass('welcome');
$.html();
// => returns '<h2 class="title welcome">Hello there!</h2>'
If you want to use this logic in a Gulp build process, you need to wrap it into a Gulp plugin with Cheerio.js as a dependency. On this official GitHub readme file of Gulp it is explained in detail how you can create a Gulp plugin.
EDIT (Browser)
According to your edited question, I'll add this section about editing the HTML in the browser.
It is very convenient to use jQuery to modify a DOM in the browser. You can also modify a virtual DOM with jQuery. To do that, you just need to create the element but not append it to the real DOM. Unfortunately, the browser treats the following tags specially: <html>, <body>, <head> and <!DOCTYPE html>. As a workaround, you can edit those tags with a regular expression and rename them to something like <body_temp> and so on. You need a good regular expression that matches only tags and not content like class="testUiBody", which also contains the word body. The special behavior is described here in detail.
The following code makes all the desired changes in the HTML. You can test it in an updated JSFiddle. Just click the Submit button and you can see the changes. The upper textarea acts as HTML input and the lower one as HTML output.
var html = "<!DOCTYPE html><html><head><meta.....";
// replace html, head and body tag with html_temp, head_temp and body_temp
html = html.replace(/<!DOCTYPE HTML>/i, '<doctype></doctype>');
html = html.replace(/(<\/?(?:html)|<\/?(?:head)|<\/?(?:body))/ig, '$1_temp');
// wrap the dom into a <container>: the html() function returns only the contents of an element
html = "<container>"+html+"</container>";
// parse the HTML
var element = $(html);
// do your calculations on the parsed html
$("<script>alert(\"test\");<\/script>").insertAfter(element.find('#test-ui-bootstrap'));
element.find("#test-ui-bootstrap").attr('data-test-ui-libs123', "test.bbb");
element.find("#test-ui-bootstrap").attr('src', 'resources/aaaa/test-ui-core.js');
// reset the initial changes (_temp)
var extended_html = element.html();
extended_html = extended_html.replace(/<doctype><\/doctype>/, '<!DOCTYPE HTML>');
extended_html = extended_html.replace(/(<\/?html)_temp/ig, '$1');
extended_html = extended_html.replace(/(<\/?head)_temp/ig, '$1');
extended_html = extended_html.replace(/(<\/?body)_temp/ig, '$1');
// replace all " inside data-something=""
while(extended_html.match(/(<.*?\sdata.*?=".*?)(")(.*?".*?>)/g)) {
extended_html = extended_html.replace(/(<.*?\sdata.*?=".*?)(")(.*?".*?>)/g, "$1'$3");
}
// => extended_html contains now your edited HTML

Javascript changing innerHTML to a different pages content not working [duplicate]

This question already has answers here:
Replace innerhtml with external page
EDIT SOLVED:
Below works great. The suggested link marking this a duplicate does not solve the problem.
changeDrinks.innerHTML = '<object type="text/html" data="drinks.html"></object>';
What is wrong with my javascript below?
JS :
var changeDrinks = document.querySelector("#menuDropWine");
changeDrinks.innerHTML = 'drinks.html';
I wanted this to change the content of a div to drinks.html webpage. The div's content that is being changed looks like this...
<div id="menuDropWine" class="divBtn">
I've read a couple questions on here already about changing the content of a div. Some used ajax, others used jQuery, but I feel I should be able to just use innerHTML equals a link. Currently this is just changing my div's content to the literal text output of 'drinks.html'. I hope I'm just missing an a href reference or something, as I feel this solution should be simple. The only real reason I want to put this into my website is so it cleans the looks of my index.html to not have so much text content, by storing the text content in different links that just load on a click.
Thanks for your time.
You are misunderstanding the function of innerHTML; it accepts a string of HTML, creates the required DOM structure and injects it into the target element. The easiest way to achieve what you want is with an iframe, setting its src property to 'drinks.html'.
<iframe src="drinks.html" frameborder="0">Your browser doesn't support iframes</iframe>
Assuming that drinks.html is a full HTML document (has an <html>, <head> and <body> tag), this is really the best route. If drinks.html is a partial HTML document then you could look into using AJAX. I would suggest using jQuery and then you could easily do something like:
$('#menuDropWine').load('drinks.html');
The short answer to your question is that the element will be filled with the text drinks.html, which is not incorrect, but obviously not what you had in mind.
JavaScript is not psychic, so it doesn’t know whether you want to change the content or load from an external document, so this behaviour is perfectly natural.
If you want a simple solution, use an iframe, whose src attribute can be set in JavaScript as follows (this assumes the element with the id menuDropWine is now an <iframe> rather than a <div>):
var changeDrinks = document.getElementById("menuDropWine");
changeDrinks.src = 'drinks.html';
BTW note that if you know you are using an id, it is more efficient to use document.getElementById.
If you really want to use innerHTML, as the others have commented, you will have to use Ajax. A simple example, without jQuery, follows:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Test</title>
<script type="text/javascript">
window.onload=init;
function init() {
var test=document.getElementById('test');
var url='test.html';
load(test,url);
}
function load(element,url) {
var xhr=new XMLHttpRequest();
xhr.open('get', url, true);
// attach the handler before sending the request
xhr.onreadystatechange=function() {
// readyState 4 means the response has fully arrived
if (this.readyState==4) {
element.innerHTML=this.responseText;
}
};
xhr.send(null);
}
</script>
</head>
<body>
<h1>Test</h1>
<div id="test">Hello</div>
</body>
</html>
Here, test.html is expected to have your new content.

Display JSON in Javascript from Python Bottle

I am trying to access a mongodb record within a javascript function to display the document on a webpage. Using the Bottle framework with pymongo, I have tried to first encode the mongodb document as a JSON object to pass to my javascript function.
import json
import bottle
import pymongo
from bson import json_util
from bson.objectid import ObjectId

@bottle.route('/view/<_id>', method = 'GET')
def show_invoice(_id):
    client = pymongo.MongoClient("mongodb://localhost")
    db = client.orders
    collection = db.myorders
    result = collection.find_one({'_id': ObjectId(_id)})
    # serialize the document, including BSON types such as ObjectId
    temp = json.dumps(result, default=json_util.default)
    print("temp: " + temp)
    return bottle.template('invoice', rows = temp)
When I try to display the document within my HTML page with the javascript function, nothing happens. However, when I call the variable, rows, that I am trying to pass as {{rows}} within the body of the HTML it does display. It seems it is only the JS function that does not display anything.
<!DOCTYPE html>
<html>
<head>
<head>
<title>Invoice Report</title>
<script type="text/javascript">
function fillTable()
{
var obj = {{rows}};
document.write(obj);
}
</script>
</head>
</head>
<body onload="fillTable()">
<div class="invoice">
</div>
<h4>Rows from body</h4> {{rows}}
</body>
</html>
I tried to use jQuery to deserialize the JSON object rows with the function
jQuery.parseJSON(rows);
and even as
jQuery.parseJSON({{rows}});
I also tried to make the variable unescaped everywhere possible as {{!rows}}
So does anybody see what I am doing wrong? How do I take a mongodb document with pymongo, and use bottle to display it on a webpage? I realize that similar questions have been asked, but I can't seem to get anything I have found to work in my particular situation.
The issue isn't with bottle rendering your json, it's with using document.write().
Open a new tab in your browser, and point it to the url: 'about:blank'. This will give you a blank webpage. Now, right click and open your developer tools. Try running document.write('Stuff'); from that context. You shouldn't see any changes to the page.
Instead try:
var body = document.getElementsByTagName("body")[0];
body.innerHTML = "Stuff";
and note the difference.
There are of course, many other ways to achieve this effect, but this is the simplest without any requirements on external javascript libraries.
You can't have both an 'src' attribute and javascript code in the same tag. Place the fillTable function within a new script tag.
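For the display step itself, here is a rough sketch. It assumes the JSON reaches the script as a plain object literal, for instance through the unescaped {{!rows}} form mentioned in the question. Writing an object directly to the page would only show its default string form, so the sketch builds an HTML string from its keys instead. The helper names and the sample invoice are made up for illustration.

```javascript
// Hypothetical helper: escape text before placing it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: turn a parsed document into table rows,
// one row per top-level key. Nested values are shown as JSON text.
function renderRows(doc) {
  return Object.keys(doc).map(function (key) {
    var value = typeof doc[key] === 'object' ? JSON.stringify(doc[key]) : doc[key];
    return '<tr><th>' + escapeHtml(key) + '</th><td>' + escapeHtml(value) + '</td></tr>';
  }).join('');
}

// Stand-in for the object that the template would emit.
var invoice = { _id: { $oid: '507f1f77bcf86cd799439011' }, total: 42 };
var rowsHtml = renderRows(invoice);
```

In the page, rowsHtml could then be assigned to the innerHTML of a <table> placed inside the invoice div.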

How do I get Aptana/Firefox to execute my JavaScript rather than show the code?

I'm extremely new to coding and I'm reading a book on it. And I think I have the basics down on this little test project I'm doing, but whenever I test the page I just see the code I used. Here's the entirety of my code.
<script type = "text/javascript">;
//<![CDATA[
// from concat.html
var person = "" ;
person = prompt( "What is your name?") ;
alert("Hi there, " + person + "!");
//]]>
</script>
Honestly I don't know what the CDATA is for or what concat.html is.
How can I get Firefox to run my JavaScript rather than just show the code?
Try wrapping it in <html> to make the whole page get treated as HTML. Does the file have a .js extension, by any chance?
CDATA is to distinguish code from markup.
Put it in an HTML file.
So, first, save it as scriptname.html - you're embedding JavaScript within an HTML file.
Next, make it valid html - add <html> to the top and </html> to the bottom. And <head> and <body> tags where appropriate - if you don't know what those are, head over to any HTML site to look them up (www.diveintohtml5.org is nice, if you can follow it.)
It is better to install the Firebug plugin for Firefox, or to use another browser's JavaScript console. It will allow you to run your code:
http://www.w3resource.com/web-development-tools/execute-JavaScript-on-the-fly-with-Firebug.php

Extract JavaScript Generated Links in Delphi

I'm working on a piece of code in which I need to extract all links from a particular web page. I use the EmbeddedWB component because I need to show the current page as well. I have a simple page that is loaded into the EmbeddedWB and contains some scripts that generate some URLs using JavaScript's "document.write" function. In theory, I have something like this:
<html>
<body>
<a href=#>No problem Here<a/>
<script Language="JavaScript">
var random=Math.floor(Math.random()*11);
document.write(" I Can’t catch this link! ");
</script>
</body>
</html>
By using the function ViewPageLinksToStrings (LinksList: TStrings) of the component I get as expected the URL’s found in the source code, but my intention is to catch the links that are generated with JavaScript too.
What would be the best way to do this? Is there any library I can use?
Thank you for your time. John Marko
It looks like EmbeddedWB supports JavaScript, and I found this article in the forum. It contains code which reads the full (JavaScript-generated) DOM tree into a variable of type IHTMLDocument2, which is simplified here:
procedure MyProcedure(Sender: TObject);
var
Doc: IHTMLDocument2;
begin
EmbeddedWB1.Navigate('... some url ...');
while EmbeddedWB1.ReadyState < READYSTATE_INTERACTIVE do
Application.ProcessMessages;
Doc := EmbeddedWB1.Document as IHTMLDocument2;
...
