PhantomJS: Modifying the HTML DOM before opening it as a webpage

I need to process HTML files that include corrupted scripts via <script> tags.
I'm planning to remove all script tags present in the webpage via PhantomJS.
But when I open the webpage via webpage.open(), PhantomJS throws a parse error, since it cannot parse the JS content within the script tag.
Here is an example:
<html>
<head>
<script>
corrupted JS
if(dadadd
;
</script>
</head>
<body>
some content
</body>
</html>
Can someone suggest the right way to clean this webpage using PhantomJS?

It's not (easily) possible. You could download the static HTML (not by opening the page, but by making an Ajax request in page.evaluate()), change it according to your needs, and then assign it to page.content.
This still might not work, because as soon as you assign it to page.content, you're telling PhantomJS to interpret this source as a page from an unknown domain (about:blank). Since the page source contains all kinds of links/scripts/stylesheets with relative URLs (no domain name), you'll have to rewrite those too in order for the page to load its resources successfully.
It might be easier to just have a proxy between PhantomJS and the internet with a custom rule to adjust the page source to your needs.
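For the "change it according to your needs" step, a minimal sketch follows, assuming the raw HTML has already been downloaded as a string. The helper name stripScriptTags is hypothetical, and it uses regular expressions rather than a real HTML parser, so it assumes reasonably well-formed markup (for example, no literal "</script>" inside a script's own text).

```javascript
// Hypothetical helper: remove every <script> element from an HTML string,
// including its (possibly corrupted) body, so the result can be loaded
// without PhantomJS ever parsing that code.
// Regex-based sketch only; not a full HTML parser.
function stripScriptTags(html) {
  // A full element: opening tag, then lazily everything up to the first </script>.
  // [\s\S] is used instead of "." so the match can span newlines.
  const pairedScript = /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi;
  // Any opening tag left over without a matching close.
  const orphanOpenTag = /<script\b[^>]*>/gi;
  return html.replace(pairedScript, "").replace(orphanOpenTag, "");
}

// Example based on the page from the question.
const corruptedPage =
  "<html><head><script>\ncorrupted JS\nif(dadadd\n;\n</script></head>" +
  "<body>some content</body></html>";

console.log(stripScriptTags(corruptedPage));
// → <html><head></head><body>some content</body></html>

// Inside a PhantomJS script, the cleaned string would then be assigned with
//   page.content = stripScriptTags(rawHtml);
// where rawHtml is whatever the Ajax request returned (hypothetical name).
```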

Related

Why can't I get my script tag src's to work?

I have been coding up a localhost server, using (of course) a JavaScript file to do so, and I made it serve an HTML file. However, I noticed that when I use the localhost server to serve the HTML file, I get this error:
"GET http://localhost:3333/filetesting.js"
filetesting.js is that JS file; I'm also referencing other things, like websites. I reference it using the src attribute of a script tag.
I looked at the Network tab in the developer tools and it shows a 404 Not Found error. I'm trying to figure out how to reference my script tags' src without having localhost:3333 go before it.
When I run the HTML file without using the localhost, it works just fine when it comes to the script tag src's. If you do not entirely understand what I'm asking for, just ask.
Assuming that your script will always reside in the root level of your website, you can simply target it with the root-relative prefix /:
<script src="/filetesting.js"></script>
This will load your script from the site root, regardless of which folder the page referencing it lives in. For example, on http://localhost:3333/ it will load the file from http://localhost:3333/filetesting.js, and on http://localhost:3333/folder/ it will load the file from the same location.
If you move your files over to a proper website, it will still work the same way: www.example.com will look for the file at www.example.com/filetesting.js, and www.example.com/folder/ will look for the same file at www.example.com/filetesting.js.
Hope this helps! :)
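To see how the browser turns each src into a request URL, the resolution rule can be reproduced with the standard URL class (available in browsers and as a global in Node.js). The helper resolveSrc is hypothetical and only for illustration:

```javascript
// Hypothetical helper: resolve a script's src the way a browser does,
// relative to the URL of the page that contains the <script> tag.
function resolveSrc(src, pageUrl) {
  return new URL(src, pageUrl).href;
}

// Root-relative ("/..."): always resolved from the site root.
console.log(resolveSrc("/filetesting.js", "http://localhost:3333/"));
// → http://localhost:3333/filetesting.js
console.log(resolveSrc("/filetesting.js", "http://localhost:3333/folder/"));
// → http://localhost:3333/filetesting.js

// Plain relative path: resolved against the page's own folder instead.
console.log(resolveSrc("filetesting.js", "http://localhost:3333/folder/"));
// → http://localhost:3333/folder/filetesting.js
```

This only changes which URL is requested. The localhost server still has to answer a request for /filetesting.js with the file's contents; if it has no handler for that path, the request will return 404 no matter how the src is written.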

UWP: webview does not display page using navigateToString method

I am trying to use the webview element in a Universal app written in JavaScript. My aim is to browse some websites while adding some content of my own to their HTML documents.
First, I set the src attribute of the webview to www.example.com, and it browses the site. This was just to make sure the webview is capable of browsing the site.
Next, I tried getting the HTML and loading it into the webview using the navigateToString method, like this:
$.get(url, function (data) {
    webView.navigateToString(data);
});
This causes the page to load out of shape (apparently some .js or .css files are not loaded, or are blocked from running), or it doesn't load at all.
I wonder what the difference is between loading the page by its URL and loading its HTML manually like this. Is there a workaround for this problem?
Note: I'm new to both JS and HTML.
A web page is usually not made of a single HTML file. In order to make it work, you will have to retrieve not only the HTML but also the JavaScript and CSS files it references.
This can be tedious work.
If you are trying to open something from the web, the easiest way is to perform a regular navigate(), which takes the URI as a parameter and performs a "full" browse (as a browser would). The retrieval/loading of the CSS/JS will be done for you.
If you want to open a local page (local to your application), navigateToString() is a good path, but you will have to host all of the page's dependencies (CSS/JS files) locally or embed all the styles and code in the HTML page itself.
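Part of the reason a string-loaded page can break is that relative URLs such as css/site.css only have meaning relative to the URL the HTML came from. As a rough sketch of "retrieving the dependencies too", the hypothetical helper below lists the script and stylesheet URLs referenced by downloaded HTML, resolved against its original URL; those are the files that would need to be hosted locally or inlined. It is regex-based and assumes quoted attribute values.

```javascript
// Hypothetical helper: list external scripts and stylesheets referenced by an
// HTML string, as absolute URLs resolved against the page's original URL.
// Regex-based sketch; assumes attribute values are quoted.
function listDependencies(html, pageUrl) {
  const patterns = [
    /<script\b[^>]*\bsrc\s*=\s*["']([^"']+)["']/gi, // <script src="...">
    /<link\b[^>]*\bhref\s*=\s*["']([^"']+)["']/gi,  // <link href="..."> (CSS etc.)
  ];
  const urls = [];
  for (const pattern of patterns) {
    for (const match of html.matchAll(pattern)) {
      // Resolve relative paths the way the browser would have, had the page
      // been loaded from pageUrl rather than from a string.
      urls.push(new URL(match[1], pageUrl).href);
    }
  }
  return urls;
}

const html =
  '<link rel="stylesheet" href="css/site.css">' +
  '<script src="/js/app.js"></script>' +
  "<script>inlineCode();</script>"; // inline script: no src, so not listed

console.log(listDependencies(html, "http://www.example.com/page/"));
```

Scripts are listed before stylesheets simply because the patterns are applied in that order.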

Dynamically load & parse local HTML from within HTML?

A bit of an unusual setup:
I'm writing an HTML page that in turn loads another HTML page, parses it, analyzes it, and displays information about it.
The parsing is fairly easy using jQuery. I just need to figure out how to load the external page - that is, when page A is displayed in the browser, it needs to load page B, analyze page B, and display information about page B.
Both pages are local (not served via a web server).
Both load() and ajax() from jQuery run into the cross-origin permission issue:
XMLHttpRequest cannot load file://localhost/Users/me/test.html. Origin null is not allowed by Access-Control-Allow-Origin.
I can load the page with a script tag, but then I don't know how to access it so I can parse it:
<script type="text/html" src="test.html"></script>
Any ideas?
Have you thought about using JavaScript/jQuery to create an iframe? (You can use CSS to hide the iframe from the end user.) Then you can listen for the iframe's onload event and parse its content through the iframe's contentDocument property (I believe).
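A sketch of that suggestion in plain DOM code. loadIntoHiddenIframe is a hypothetical helper, and the document is passed in as a parameter only so the helper can be exercised outside a browser. One caveat the answer hedges on: contentDocument is null when the browser treats the framed page as cross-origin, and some browsers (Chrome, for example) treat pages opened from file:// URLs that way, so this may still fail for two local files.

```javascript
// Hypothetical helper: load `url` into a hidden iframe and hand its parsed
// document to `onParsed`. Browser-only in practice; `doc` is normally
// window.document and is a parameter here only to make testing possible.
function loadIntoHiddenIframe(doc, url, onParsed) {
  const frame = doc.createElement("iframe");
  frame.style.display = "none"; // keep the frame invisible to the user
  frame.onload = function () {
    // contentDocument is null if the framed page counts as cross-origin.
    onParsed(frame.contentDocument);
  };
  frame.src = url;
  doc.body.appendChild(frame); // appending starts the load
  return frame;
}
```

In page A it might be called as loadIntoHiddenIframe(document, "test.html", function (pageB) { ... }), and inside the callback jQuery can search page B with $(selector, pageB), after first checking that pageB is not null.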

Print JavaScript code from external file

I am trying to understand how to include JavaScript externally so the code prints to the page.
When I insert the JavaScript directly into the page code, it prints "hello"
<html>
<head>
<title></title>
</head>
<body>
<script type="text/javascript">document.write("hello");</script>
</body>
</html>
However, when I put that same code into external file say "javascript.js" and include it (src) in the html it does not print "hello"?
<html>
<head>
<title></title>
<script type="text/javascript" src="http://thewebsite.com/javascript.js"></script>
</head>
<body>
</body>
</html>
I am trying to understand how to get that external JavaScript file to run and print "hello".
How does XSS work, then, if a hacker were to include the following tag inside, say, a textarea to call his malicious script from a malicious server?
<script type="text/javascript" src="http://thewebsite.com/javascript.js"></script>
Here's what's in the "javascript.js" file:
<script type="text/javascript">
document.write("hello");
</script>
The file is on the same domain so Same Origin Policy should not apply here, and as mentioned, if I insert the code directly it works, but not when I try to include it as a separate file.
I thought including JavaScript as an external file should print the contents of the external file (i.e. "hello" in this case), as if it were inserted directly into the HTML page?
When I insert the JavaScript directly into the page code, it prints "hello"
Correct
However, when I put that same code into external file say "javascript.js" and include it (src) in the html it does not print "hello"?
If the content isn't being written then, presumably, an error is being thrown instead. Check the error console for your browser.
The problem is that you are including the HTML script tags in the JavaScript file. JavaScript files should contain only JavaScript.
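In other words, javascript.js should contain only the statement document.write("hello"); with no <script> wrapper. The sketch below (parsesAsJavaScript is a hypothetical helper) shows why the wrapped version fails: it compiles each version with new Function(), which parses the code without running it, and the version with the HTML tags throws a SyntaxError, the same kind of error the browser's console would report when loading the file.

```javascript
// Hypothetical helper: report whether some text is syntactically valid
// JavaScript. new Function() only compiles its body; the body would run only
// if the returned function were called, which never happens here.
function parsesAsJavaScript(source) {
  try {
    new Function(source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything other than a parse error is unexpected
  }
}

// The original javascript.js: HTML markup inside a .js file.
const broken = '<script type="text/javascript">\ndocument.write("hello");\n</script>';
// The corrected javascript.js: only the JavaScript statement.
const fixed = 'document.write("hello");';

console.log(parsesAsJavaScript(broken)); // false
console.log(parsesAsJavaScript(fixed)); // true
```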
The file is on the same domain so Same Origin Policy should not apply here
It doesn't. The Same Origin Policy just prevents JavaScript running (not loaded from) Origin A from reading data from Origin B. Since the data is included in the script itself, it would still be available, even if the script was loaded from Origin B.
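For reference, an origin is the combination of scheme, host, and port. The hypothetical helper below compares two URLs the way the Same Origin Policy does, using the standard URL class. For a <script src>, the code runs with the origin of the page that included it, not the origin it was downloaded from.

```javascript
// Hypothetical helper: two URLs share an origin only if scheme, host,
// and port all match; the path is irrelevant.
function sameOrigin(urlA, urlB) {
  return new URL(urlA).origin === new URL(urlB).origin;
}

console.log(sameOrigin("http://thewebsite.com/page.html", "http://thewebsite.com/javascript.js")); // true
console.log(sameOrigin("http://thewebsite.com/", "https://thewebsite.com/")); // false: scheme differs
console.log(sameOrigin("http://thewebsite.com/", "http://thewebsite.com:8080/")); // false: port differs
```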
I guess there is a policy enforced by browsers, called the Same Origin Policy, which makes sure that JS from different domains cannot access each other's data when loaded in a single page. Let's say you have a Google Ad with some JavaScript in it. It wouldn't be advisable for the script in the Google Ad to be able to access the data on your site (or vice versa; but of course Google Ads and the Like button are always embedded as iframes, so they are neatly separated anyway.)
If you could load the JS file as the src of an image, then I suppose you could achieve what you intend (if I am not wrong).
Edit: A JavaScript file cannot be used as the src of an img tag. You can only use it via the javascript: scheme.

Is external JavaScript source available to scripting context inside HTML page?

When an external JavaScript file is referenced,
<script type="text/javascript" src="js/jquery-1.4.4.min.js"></script>
is the JavaScript source (lines of code before interpretation) available from the DOM or window context in the current HTML page? That is, using only standard JavaScript, without any installed components or tools?
I know tools like Firebug trace into external sources, but Firebug is installed on the platform and likely has special abilities outside the context of the browser sandbox.
Nope. There's no JavaScript API for loading the true content of <script> tags. This is actually not an oversight, but rather a security feature: suppose I request the .json file that Gmail requests via AJAX to load your inbox by putting it in an external <script> tag. Many JSON documents (arrays, for example) are valid JavaScript (granted, without side effects), so it would run without error. Then, if I could inspect the content of the external script, I would be able to read your e-mail. (I'm almost certain that Gmail is more complex than that, but most sites are not.)
So, making up a few things about how Gmail works, here's how the attack would look:
<script id="inbox" type="text/javascript" src="http://mail.google.com/OMGYOURINBOX.json"></script>
<script type="text/javascript">
// Supposing a value called `externalScriptContent` existed on a script tag:
var inboxJSON = document.getElementById('inbox').externalScriptContent;
var messages = JSON.parse(inboxJSON);
for (var i in messages) {
    // Do something malicious with each e-mail message
    alert(messages[i].body);
}
</script>
If a script tag had the value externalScriptContent, I could just put whatever URL in for the src that I wanted, and then summon up the remote file's contents, effectively circumventing AJAX cross-origin restrictions. That'd be bad. We allow cross-origin requests for remote scripts because they are run and run only. They cannot be read.
Firebug has these permissions because Firefox extensions have the ability to inspect anything that the browser requests; normal pages, thankfully, do not.
However! Bear in mind that, if the script is on your domain, instead of writing it in <script src="…"></script> form, you can pull it up with an AJAX request then eval it to have access to the contents and still only request it once :)
You can read the src of the <script> tag and re-request the JS file with XMLHttpRequest; it will likely be served from the cache, and with the credentials of the current page. But unless both your requesting script and the script in the tag originate from the same domain, the browser will disallow this.
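A minimal sketch of both suggestions, using fetch() in place of the raw XMLHttpRequest the answers mention. loadScriptSource is a hypothetical helper, and fetchImpl is an optional parameter only so it can be tested without a network. As the answers say, reading the response only works for same-origin URLs (or servers that explicitly opt in with CORS headers such as Access-Control-Allow-Origin).

```javascript
// Hypothetical helper: download a script's source text and, optionally, run it.
// In a browser, fetchImpl defaults to the built-in fetch(); it is a parameter
// only so the helper can be exercised without a network.
async function loadScriptSource(url, options = {}) {
  const fetchImpl = options.fetchImpl || globalThis.fetch;
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error("Could not load " + url + " (HTTP " + response.status + ")");
  }
  const source = await response.text();
  if (options.run) {
    // Indirect eval executes the code in the global scope, roughly the way a
    // classic <script> element would, so the file is requested only once.
    (0, eval)(source);
  }
  // Unlike a <script src> element, the caller now has the actual code text.
  return source;
}
```

To read an existing tag's code, pass its src, e.g. loadScriptSource(document.querySelector('script[src$="jquery-1.4.4.min.js"]').src). To load a script once and keep its text, skip the <script> tag and call loadScriptSource("js/jquery-1.4.4.min.js", { run: true }).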
