Getting real source code with JavaScript?

OK, I don't use JavaScript enough to know, but is there a way to get the real source code of the page with it?
document.body.innerHTML, for example, gives some kind of "fixed up" version in which malformed tags have been removed.
I'm guessing using XMLHttpRequest on the original page might work, but that seems kind of stupid.

This happens because browsers parse the HTML into a DOM and don't keep the original markup in memory. What is returned to you is the browser's serialization of the current DOM back to HTML, which is the reason for the uppercase tags and the lack of self-closing tags where applicable.
An XMLHttpRequest would be the best way to go. In most cases, assuming the server doesn't send a no-cache header and the HTML page has finished downloading, the XMLHttpRequest will be almost instant because the file is fetched from the cache.
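A minimal sketch of that approach, assuming the page is served same-origin:

var xhr = new XMLHttpRequest();
xhr.open("GET", window.location.href, true);
xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
        // responseText is the markup as the server sent it,
        // before the browser's parser normalized it
        console.log(xhr.responseText);
    }
};
xhr.send();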

For accessing JS files of the same origin, XMLHttpRequest is quite fine. You can access any JS document in "raw" format using this technique, without the browser getting in the way (i.e. converting it to a DOM and back).
I am not sure I understand your comment about XMLHttpRequest being stupid: is it because you are worried about the potential duplication of work, i.e. fetching the code twice from the origin server?

I typically use Firebug when I want to peruse or copy source files.

Related

Cross-Domain AJAX issue trying to load JSON

I have a URL (that I am not in control of) that returns a JSON string. This JSON string contains a URL that I am trying to load with JavaScript/jQuery AJAX. What I'm running into when loading the JSON string is a cross-domain issue.
I know a couple of workarounds for fixing cross-domain issues:
Using JSONP by adding "callback=?" as a parameter (sketched just after this list).
Calling, for example, a PHP script and letting it load and return the JSON.
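For reference, the JSONP workaround in jQuery looks something like this (the URL here is hypothetical):

// jQuery treats "callback=?" as a JSONP request: it injects a <script>
// tag and substitutes a generated callback name for the "?"
$.getJSON("http://example.com/data.json?callback=?", function (data) {
    console.log(data);
});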
My issue is that JSONP is not supported, and I can't use the convenient method of a PHP script on the server because of the problem below:
The URL in the JSON data has one dynamically generated parameter. From what I understand when playing around with it, that parameter is defined by what loads the JSON, specifically the User-Agent string, but it also depends on the IP loading the JSON. This is important because the URL inside the JSON string will return 403 Forbidden if whatever loads the JSON does not match whatever loads the URL.
I hope I have explained my issue well, and I appreciate all the help I can get.
As Quentin said, it sounds like a security measure designed to stop you doing exactly what you are trying to do, so you can't do it. That, or you run the query from a server and reverse-engineer how the dynamically generated parameter is built. But note that you'll almost certainly be violating terms of service that the service in question seems to take seriously, so after a few days of it working, don't be surprised if it stops when they figure out what you're doing and patch it.

What does the "?" sign mean in a request for a static JS file?

I've seen that a lot and I just don't know what it means. This, for example:
<script src="http://server.com/file.js?y=2345678" type="text/javascript"></script>
If it is indeed possible to 'catch' the value of 'y' in the JavaScript file, how would that be done?
Thank you.
PS. I know what mod_rewrite is and that is not the answer, just in case :)
This is to force the browser not to cache the file, by making it believe that it is a dynamic file with a GET parameter rather than a static one.
This is often used to facilitate caching of the JS file. You set a far-future Expires header, which means the browser may cache it for a very long time. If you change something in the file, you also update the number in the query string, which makes the browser refetch the file. This works because the cache is keyed on the full URL, and the query string is part of that URL as far as the browser is concerned.
A similar approach is to use rewrite rules in the web server so that some part of the file name is ignored. Here's an Nginx rule to show what I mean:
rewrite ^/style\..*\.css$ /style.css;
I use this rule to have file names like style.42750cad6.css, which always point to the file style.css. The text in the middle is changed whenever I change style.css. The difference from the first approach is that this does not use the query string, so the caching will work in more browsers.
OK, the way I see it, it can be used in two ways:
It can be used to load the JS without caching.
Every request to the server may be logged (if logging is enabled), so if I am using it for analytics I can use a different parameter per location, and from the log I can analyse and get the details I need.
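As for actually reading the value of 'y' inside file.js: a minimal sketch, assuming the script is loaded with a normal synchronous script tag, is to inspect the src of the currently executing script:

// For synchronously loaded scripts, the last <script> in the document
// at execution time is the one currently running
var scripts = document.getElementsByTagName("script");
var src = scripts[scripts.length - 1].src;
var match = /[?&]y=([^&#]*)/.exec(src);
var y = match ? decodeURIComponent(match[1]) : null;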

ASP.NET - Overriding JavaScript and stylesheets cached by the browser

I'm working on an ASP.NET website. I need to make sure my JavaScript and CSS updates are immediately available to returning visitors, without them having to clear their cache to get the latest code. How can I force this when needed, and leave it alone on other occasions?
Thanks in advance.
Stack Overflow does this very elegantly, IMO. You can see in their source that a query-string variable is passed:
<script type="text/javascript" src="http://sstatic.net/js/stub.js?v=778aaf5a38e2"></script>
This will force a new load whenever that value changes.
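If you emit the tag from script rather than markup, the same idea can be sketched in plain JavaScript (the path and version token below are hypothetical; in practice the token would come from your build or deploy step):

// Inject a script tag whose URL carries a version token; changing the
// token on deploy forces the browser to refetch the file
var s = document.createElement("script");
s.src = "/js/site.js?v=778aaf5a38e2"; // hypothetical path and token
document.getElementsByTagName("head")[0].appendChild(s);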
In order to reliably force a refresh of cached content in the browser, you need to change the name / requested URL of the file somehow.
Often this is done by appending a query string to the URL (which the server typically ignores for a static file).
See When Does Browser Automatically Clear JavaScript Cache? for a more complete discussion.
You'll need to apply a no-cache header to those resources, or alternatively make the URL dynamic somehow.
see - http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

Is there a way to validate the HTML of a page after AJAX operations are performed on it?

I'm writing a web app that inserts and modifies HTML elements via AJAX using jQuery. It works very nicely, but I want to be sure everything is OK under the bonnet. When I inspect the source of the page in IE or Chrome, it shows me the original document markup, not what has changed since my AJAX calls.
I love using the W3C validator to check my markup, as it occasionally reminds me that I've forgotten to close a tag, etc. How can I use it to check the markup of my page after the original source served from the server has been changed via JavaScript?
Thank you.
Use the developer tools in Chrome to explore the DOM: they will show you all the HTML you've added in JavaScript.
You can then copy it and paste it into any validator you want.
Or, instead of inserting the code with jQuery, write it to the console first; that way the browser has not yet had a chance to close tags for you:
console.log(myHTML)
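On the first point, Chrome's console also has a shortcut for grabbing the live DOM. Note that copy() here is a DevTools console utility, not standard JavaScript, so it only works when typed into the console:

// Typed into the Chrome DevTools console: puts the serialized live DOM
// on the clipboard, ready to paste into a validator
copy(document.documentElement.outerHTML);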
Both previous answers make good points about the fact that the browser will 'fix' some of the HTML you insert into the DOM.
Back to your question: you could save the following as a bookmarklet in your browser. It will write out the contents of the DOM to a new window; copy and paste that into a validator.
javascript:window.open("").document.open("text/plain", "").write(document.documentElement.outerHTML);
If you're just concerned about well-formedness (missing closing tags and such), you probably just want to check the structure of the chunks AJAX is inserting. (Once it's part of the DOM, it's going to be well-formed... just not necessarily the structure you intended.) The simplest way to do that would probably be to attempt to parse it using an XML library (one with an HTML mode that can be made strict, if you're not using XHTML).
Actual validation (testing the "you can't put tag X inside tag Y" rules, which browsers generally don't care too much about) is a lot trickier and, depending on how much effort you're willing to put into it, may not be worth the trouble. (If you validate fragments in isolation, you'll get a lot of "this is just a fragment" false positives.)
Whichever you decide to use, you need to grab the AJAX responses before the browser parses them if you want a reliable test result. (While they're still just a string of text rather than a DOM tree)
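As a sketch of the well-formedness check (assuming a variable named fragment holds the raw AJAX response text, and that you're targeting XHTML-style strictness), the browser's own XML parser can do it:

// Parse the raw fragment as XML; a well-formedness error shows up as a
// <parsererror> element in the resulting document (Firefox/Chrome)
var doc = new DOMParser().parseFromString(fragment, "application/xml");
if (doc.getElementsByTagName("parsererror").length > 0) {
    console.warn("Fragment is not well-formed");
}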

Command line URL fetch with JavaScript capability

I use cURL in PHP and httplib2 in Python to fetch URLs.
However, some pages use JavaScript (AJAX) to retrieve data after the page has loaded and then overwrite a specific section of the page with it.
So, is there any command-line utility that can handle JavaScript?
To see what I mean, go to monster.com and try searching for a job.
You'll see that the list of jobs is fetched via AJAX afterward. So if I pulled in the page based on my keyword search, I would get the page with no jobs.
But via a browser it works.
You can use PhantomJS:
http://phantomjs.org
You can use it as below:

// Note: require("webpage") returns a module; call create() to get a page
var page = require("webpage").create();
page.open("http://monster.com", function (status) {
    if (status !== "success") {
        phantom.exit(1);
    }
    page.evaluate(function () {
        // your JavaScript code runs here, inside the page, e.g.:
        // $.ajax("....", function (result) { ... });
    });
    // for AJAX-driven pages you may need to wait (e.g. setTimeout) before exiting
    phantom.exit(0);
});
Get Firebug and see the URL for that AJAX request. You may then use cURL with that URL.
There are two ways to handle this. Write your screen scraper using a full browser-based client like WebKit, or go to the actual page, find out what the AJAX request is doing, and make that request directly. You then, of course, need to parse the results. Use Firebug to help you out.
Check out this post for more info on the subject. The upvoted answer suggests using a test tool to drive a real browser.
What's a good tool to screen-scrape with Javascript support?
I think env.js can handle <script> elements. It runs in the Rhino JavaScript interpreter and has its own XMLHttpRequest object, so you should be able to at least run the scripts manually (select all the <script> tags, get the .js files, and call eval) if it doesn't run them automatically. Be careful about running scripts you don't trust, though, since they can use any Java classes.
I haven't played with it since John Resig's first version, so I don't know much about how to use it, but there's a discussion group on Google Groups.
Maybe you could try and use features of HtmlUnit in your own utility?
HtmlUnit is a "GUI-less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc., just like you do in your "normal" browser.
It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.
It is typically used for testing purposes or to retrieve information from web sites.
Use LiveHttpHeaders, a plug-in for Firefox, to see all the URL details, and then use cURL with that URL.
LiveHttpHeaders shows all the information, such as the request method (POST or GET), the headers, the body, etc.
It also shows POST or GET parameters in the headers.
I think this may help you.
