How to get page source of page at another domain (in Javascript!) - javascript

I am trying to get the source page of a webpage on a different domain. I know this is easily done with PHP for example but I would like to do it in Javascript because I am getting results from a page and if I use a server-side language, the original website will block the calls since they come from the same IP. However, if the calls are done on the client side, it is like the user request the results each time (different user, different IP, no original site blocking me). Is there a way to do that (even if not in Javascript but client-side).
To clarify the code I want will be applied to an HTML page so I can get the results, style them, add/delete, etc then display them to the user.
Thank you.

Modern browsers support cross-domain AJAX calls but the target site has to allow them by using special headers in the reply. Apart from that, there is no pure Javascript solution AFAIK.

Could you use an iframe? You wont have direct access to the markup to due to cross-domain restrictions, but you can still display the third-party page to the user...
window.onload = function() {
var i = document.createElement("IFRAME");
i.setAttribute("name", "ID_HERE");
i.setAttribute("id", "ID_HERE");
i.setAttribute("src", "URL_HERE");
i.style.maxHeight = "300px";
i.style.width = "100%";
document.body.appendChild(i);
}

On Windows you can use HTA
An HTA can access anything crossdomain in an iframe for example

Related

Redirecting from a page when not accessed via an iframe without using JavaScript?

I need to redirect users if they try and access certain pages directly, i.e. not via the iframe I've provided. This is to stop them accessing other users' areas.
All the solutions I've found (which work, btw) use JavaScript - the current working script I have is
if (top == self) {
var newURL = 'http://www.exampleurlhere.co.uk/'
window.setTimeout('GotoIndex()',0);
}
function GotoIndex() { top.location.href = newURL; }
However, this will of course only work if the user has JavaScript enabled, which kind of scuppers me. Is there a way to achieve this server-side? I'm using aspx.
Thanks, Oli.
You can't check it severside as IFrame requests (in general) are no different from any other HTTP request. You can however use a GET parameter to indicate this is an IFrame
<iframe src="mypage?iframe=yes"></iframe>
Obviously this solution will only work if you're in control of the code that contains the IFrame, otherwise there is no way to do it server-side.

Reading document.links from an IFrame

EDIT:
Just a quick mention as to the nature of this program. The purpose of this program is for web inventory. Drawing different links and other content into a type of hierarchy. What I'm having trouble with is pulling a list of links from a webpage within an IFrame.
I get the feeling this one is gonna bite me hard. (other posts indicate relevance to xss and domain controls)
I'm just trying something with javascript and Iframes. Basically I have a panel with an IFrame inside that goes to whatever website you want it to. I'm trying to generate a list of links from the webpage within the Iframe. Its strictly read only.
Yet I keep coming up against the permission denied problem.
I understand this is there to stop cross site scripting attacks and the resolution seems to be to set the document domain to the host site.
JavaScript permission denied. How to allow cross domain scripting between trusted domains?
However I dont think this will work if I'm trying to go from site to site.
Heres the code I have so far, pretty simple:
function getFrameLinks()
{
/* You can all ignore this. This is here because there is a frame within a frame. It should have no effect ont he program. Just start reading from 'contentFrameElement'*/
//ignore this
var functionFrameElem = document.getElementById("function-IFrame");
console.log("element by id parent frame ");
console.log(functionFrameElem);
var functionFrameData = functionFrameElem.contentDocument;
console.log("Element data");
console.log(functionFrameData);
//get the content and turn it into a doc
var contentFrameElem = functionFrameData.getElementById("content-Frame")
console.log(contentFrameElem);
var contentFrameData = contentFrameElem.contentDocument;
console.log(contentFrameData);
//get the links
//var contentFrameLinks = contentFrameData.links;
var contentFrameLinks = contentFrameData.getElementsByTagName('a');
Goal: OK so due to this being illegal and very similar to XSS. Perhaps someone could point out a solution as to how to locally store the document. I dont seem to have any problems accessing document.links with internal pages in the frame.
Possibly some sort of temp database of cache. The simpler the solution the better.
If you want to read it just for your self and in your browser, you can write a simple proxy with php in your server. the most simple code:
<?php /* proxy.php */ readfile($_GET['url']); ?>
now set your iframe src to your proxy file:
<iframe src="http://localhost/proxy.php?url=http://www.google.com"
id="function-IFrame"></iframe>
now you can access the iframe content from your (local) server.
if you want set the url with a program remember to encode the url (urlencode in php or encodeURIComponent in js)
Here is a bookmarklet you can run on any page (assuming the links are not in an iframe)
javascript:var x=function(){var lnks=document.links,list=[];for (var i=0,n=lnks.length;i<n;i++) {var href = lnks[i].href; list.push(href)};if (list.length>0) { var w=window.open('','_blank');w.document.write(list.length+' links found<br/><ul><li>'+list.sort().join('</li><li>')+'</ul>');w.document.close()}};void(x());
the other way is for you (on Windows) to save your HTML with extension .HTA
Then you can grab whatever lives in the iFrame
You might be interested in using the YQL (Yahoo Query Language) to retrieve filtered results from remote urls..
example of retrieving all the links from the yahoo.com domain

Is it possible to use jQuery to grab the HTML of another web page into a div?

I am trying to integrate with the FireShot API to given a URL, grab HTML of another web page into a div then take a screenshot of it.
Some things I will need to do after getting the HTML
grab <link> & <script> from <head>
grab <body> into <div>
But 1st, it seems when I try to do a
$.get("http://google.com", function(data) { ... });
I get a 200 in firebug colored red. I think it has to do with sites not allowing you to grab their page with JS? Then is opening a window the best I can do? But how might I control the other page with jQuery or call fsapi on that page?
UPDATE
I tried to do something like below to do something when the new window is ready, but FireBug says "Permission denied to access property 'document'"
w = window.open($url.val());
setTimeout(function() { // if I dont do this, I always get about:blank, is there a better way around this?
$(w.document).ready(function() {
console.log(w.document.body);
});
}, 1000);
I believe the cross-site security setup within Javascript is basically blocking this. You'd likely have to proxy the content through your own domain.
There are a couple other options I think for break the cross-site security constraints, but I'm not sure I'd promote them.
If the "another page" locates within the same domain of your hosting page, yes, you can. Please refer to jQuery's $().load() API.
Otherwise, you're disallowed to do so by the browser's Cross-Site Security Policy. At this moment, you can choose to use iFrame instead of DIV.
Some jQuery plugins, e.g. thickbox provides ability to load pages to appropriate container automatically.
Unless I am correct, I do not believe you can AJAX a page cross domain (e.g. from domain1.com to domain2.com). To get around this, you can have a PHP "proxy" script that does the "getting" of the page and then pass it to JS.
For example, in JS you would get() http://mydomain.com/get/?domain=http://google.com and then do what you need to do!

Is there a way to mitigate downloading of resources (images/css and js files) with Javascript?

I have a html page on my localhost - get_description.html.
The snippet below is part of the code:
<input type="text" id="url"/>
<button id="get_description_button">Get description</button>
<iframe id="description_container" src="#"/>
When the button is clicked the src of the iframe is set to the url entered in the textbox. The pages fetched this way are very big with lots of linked files. What I am interested in the page is a block of text contained in a <div id="description"> element.
Is there a way to mitigate downloading of resources linked in the page that loads into the iframe?
I don't want to use curl because the data is only available to logged in users and the steps to take with curl to get the content is too complicated. The iframe is simple as I use this on a box which sends the right cookies to identify the request as coming from a logged in user, but the problem is that it is very wasteful to get nearly 1 MB of data to keep 1 KB of it and throw out the rest.
Edit
If the proposed method just works in Firefox it is fine, so I added Firefox tag. Also, it is possible that the answer actually is from the realm of Firefox add-on techniques, so I added that tag as well.
The problem is not that I cannot get at what I'm looking for, rather, the problem is the easy iframe method is wasteful.
I know that Firefox does allow loading only the text of a page. If you open a page and press Ctrl+U you are taken to 'view page source' window, There links behave as normal and are clickable, if you click on a link in source view, the source of the new page is loaded into the view source window, without the linked resources being downloaded, exactly what I'm trying to get. But I don't know how to access this behaviour.
Another example is the Adblock add-on. It somehow kills elements before they get loaded. With plain Javascript this is not possible. Because it only is triggered too late to intervene in good time.
The Same Origin Policy forbids any web page to access contents of any other web page in a different domain so basically you cannot do that.
However it seems that with some browsers it is allowed to access web pages content if you are trying to access it from a local web page which seems to be your case.
Safari, IE 6/7/8 are browser that allow a local web page to do so via XMLHttpRequest (source: Google Browser Security Handbook) so you may want to choose to use one of those browsers to do what you need (note that future versions of those browsers may not allow to do so anymore).
A part from this solution I only see two possibities:
If the web pages you need to fetch content from are somehow controlled by you, you can create a simpler interface to let other web pages to get the content you need (for example allowing JSONP requests).
If the web pages you need to fetch content from are not controlled by you the only solution I see is to fetch content server side logging in from the server directly (I know that you don't want to do so, but I don't see any other possibility if the previous I mentioned are not practicable)
Hope it helps.
Actually I've seen Cross Domain jQuery .load request before, here: http://james.padolsey.com/javascript/cross-domain-requests-with-jquery/
The author claims that codes like these found on that page
$('#container').load('http://google.com'); // SERIOUSLY!
$.ajax({
url: 'http://news.bbc.co.uk',
type: 'GET',
success: function(res) {
var headline = $(res.responseText).find('a.tsh').text();
alert(headline);
}
});
// Works with $.get too!
would work. (The BBC code might not work because of the recent redesign, but you get the idea)
Apparently it is using YQL wrapped into a jQuery plugin to do the trick. Now I cannot say I fully understand what he is doing there but it appears to work, and fits the bill. Once you load the data I suppose it is a simple matter of filtering out the data that you need.
If you prefer something that works at the browser level, may I suggest Mozilla's Jetpack framework for lightweight extensions. I've not yet read the documentations in its entirety but it should contain the APIs needed for this to work.
There are various ways to go about this in AJAX, I'm going to show the jQuery way for brevity as one option, though you could do this in vanilla JavaScript as well.
Instead of an <iframe> you can just use a container, let's say a <div> like this:
<div id="description_container"></div>
Then to load it:
$(function() {
$("#get_description_button").click(function() {
$("#description_container").load($("input").val() + " #description");
});
});
This uses the .load() method which takes a string in this format: .load("url selector"), then takes that element in the page and places it's content inside the container you're loading, in this case #description_container.
This is just the jQuery route, mainly to illustrate that yes, you can do what you want, but you don't have to do it exactly like this, just showing the concept is getting what you want from an AJAX request, rather than in an <iframe>.
Your description sounds like you are fetching pages from the same domain (you said that you need to be logged in and have session credentials) so have you tried to use async request via XMLHttpRequest? It might complain if the html on a page is particularly messed up but you chould still be able to get raw text via .responseText and extract what you need with a regex.

Dashboard Cross-domain AJAX with jquery

Hey everyone, I'm working on a widget for Apple's Dashboard and I've run into a problem while trying to get data from my server using jquery's ajax function. Here's my javascript code:
$.getJSON("http://example.com/getData.php?act=data",function(json) {
$("#devMessage").html(json.message)
if(json.version != version) {
$("#latestVersion").css("color","red")
}
$("#latestVersion").html(json.version)
})
And the server responds with this json:
{"message":"Hello World","version":"1.0"}
For some reason though, when I run this the fields on the widget don't change. From debugging, I've learned that the widget doesn't even make the request to the server, so it makes me think that Apple has some kind of external URL block in place. I know this can't be true though, because many widgets phone home to check for updates.
Does anyone have any ideas as to what could be wrong?
EDIT: Also, this code works perfectly fine in Safari.
As requested by Luca, here's the PHP and Javascript code that's running right now:
PHP:
echo $_GET["callback"].'({"message":"Hello World","version":"1.0"});';
Javascript:
function showBack(event)
{
var front = document.getElementById("front");
var back = document.getElementById("back");
if (window.widget) {
widget.prepareForTransition("ToBack");
}
front.style.display = "none";
back.style.display = "block";
stopTime();
if (window.widget) {
setTimeout('widget.performTransition();', 0);
}
$.getJSON('http://nakedsteve.com/data/the-button.php?callback=?',function(json) {
$("#devMessage").html(json.message)
if(json.version != version) {
$("#latestVersion").css("color","red")
}
$("#latestVersion").html(json.version)
})
}
In Dashcode click Widget Attributes then Allow Network Access make sure that option is checked. I've built something that simply refused to work, and this was the solution.
Cross-domain Ajax requests ( Using the XMLHttpRequest / ActiveX object ) are not allowed in the current standard, as per the W3C spec:
This specification does not include
the following features which are being
considered for a future version of
this specification:
Cross-site XMLHttpRequest;
However there's 1 technique of doing ajax requests cross-domain, JSONP, by including a script tag on the page, and with a little server configuration.
jQuery supports this, but instead of responding on your server with this
{"message":"Hello World","version":"1.0"}
you'll want to respond with this:
myCallback({"message":"Hello World","version":"1.0"});
myCallback must be the value in the "callback" parameter you passed in the $.getJSON() function. So if I was using PHP, this would work:
echo $_GET["callback"].'({"message":"Hello World","version":"1.0"});';
Apple has some kind of external URL block in place.
In your Info.plist you need to have the key AllowNetworkAccess set to true.
<key>allowNetworkAccess</key>
<true/>
Your code works in Safari because it is not constrained in the dashboard sever and it is not standards complient in that it DOES allow cross site AJAX. FF IS standards complient in that it DOES NOT allow cross site ajax.
If you are creating a dashboard widget, why don't you use the XMLHttpRequest Setup function in the code library of DashCode. Apple built these in so you don't need to install 3rd party JS libraries. I'm not sure about JSON support but perhaps starting here will lead you in a better direction.
So another solution is to create your own server side web service where you can control the CORS of, the users web browser can't access another site, but if you wrap that other site in your own web service (on the same domain) then it does not cause an issue.
Interesting that it works in Safari. As far as I know to do x-domain ajax requests you need to use the jsonp dataType.
http://docs.jquery.com/Ajax/jQuery.getJSON
http://bob.pythonmac.org/archives/2005/12/05/remote-json-jsonp/
Basically you need to add callback=? to your query string and jquery will automatically replace it with the correct method eg:
$.getJSON("http://example.com/getData.php?act=data&callback=?",function(){ ... });
EDIT: put the callback=? bit at the end of the query string just to be safe.

Categories