GET arbitrary URLs with JavaScript - javascript

I want to be able to access /robots.txt from a variety of sites using JavaScript. This is for a side project that tests the availability of sites, not all of which are under my control. I've tried this:
$.get(robotsUrl, function() {
console.log('success!');
}, "text")
.fail(function() {
console.log('failed :(');
});
However, this fails with
XMLHttpRequest cannot load https://my.test.url/robots.txt. Origin http://localhost:8000 is not allowed by Access-Control-Allow-Origin
MDN's page on Same-Origin-Policy says that it's possible to embed content with some elements, such as <script>, <iframe> <embed>. Could I load /robots.txt from an arbitrary site with any of these? Is there any other way I can access this file on other domains?

You could load it with any of them, you just won't be able to make the data available to JavaScript. That's rather the point of the Same Origin Policy.
If you want to get arbitrary data from arbitrary sites, you need to do it server side.

To get around a same origin policy, you need to either have control over the host site and set the allow-origin (not an option here), or load it by a method other than JavaScript (which JSONP does; it is loaded as a standard script).
That means you could display the robots.txt in an iframe, for example, by just setting its src attribute.
If you want to manipulate the contents in JavaScript, that won't work (even after you load the content in an iframe, you're still not allowed to interact with it). Your final option is to set up a proxy. Have a script on your server which when called will load the relevant file and redirect the content. It's not hard to do, but means your server will have higher traffic (and you'll need to lock it down so that it isn't used maliciously).

iframes won't let you peek at the content. You could show it to your user, but I'm guessing you want to analyze it with code.
You could do it on your server. Even if you just have a /cors/robots/domain.tld handler (and others for other files you need to access). This is probably the best way, if it's feasible for your situation.
AnyOrigin, is a free service allows you to make cross-origin requests.
$.getJSON('http://anyorigin.com/get?url=google.com/robots.txt&callback=?', function(data){
console.log(data.contents); // contents of Robots.txt
});

Pretty sure this is possible with Chrome by runnning the browser with the Same Origin Policy disabled: Disable same origin policy in Chrome.
It may be preferable to do something like this outside the context of a browser however, on the command line perhaps using something like CURL?

Related

Load external page and Replace text

Would it be possible to load an external page inside a container and replace text elements?
We work with ad campaigns and earn a percentage whenever a user signs up.
Can a script replace certain words? For instance “User” to “Usuario” or “Password” to “Contraseña” without affecting the original website or its functions.
Note: These links always pass through a redirection.
Example:
http://a2g-secure.com/?E=/0yTeQmWHoKOlN6zUciCXQwUzfnVGPGN&s1=
Note 2: Using an iframe is out of the question due to “Same-origin policy”.
I'm not sure if this answers your question, but you might find it useful.
(Perhaps you might give a step-by-step example of what you're trying to accomplish?)
If we assume that a browser attempts to retrieve page P from a proxy which first retrieves the content of page P from its actual home and then performs some transformation on its content before returning that page content to the browser, what you're describing is a Reverse HTTP Proxy and is a very well-known page serving technique.
Rather than performing complex transformations at the server (which require specialized knowledge of the page layout), this technique is usually used to inject a single line into the retrieved source that calls a JavaScript file to actually perform the required transformation at the browser.
So in essence:
Browser requests Page P from Proxy 1.
Proxy 1 retrieves the actual Page P from its real home, Server 2.
Proxy 1 adds the line <script src="//proxy1.com/transform.js"></script> to the source of Page P.
Proxy 1 then returns the modified source of Page P to Browser.
Once the Browser has received the page content, the JavaScript file is also retrieved, which can then modify the page contents in any way required.
This technique can be used to solve your "Same origin policy" issue by loading an iframe from a URL that points to the same server as that which provided the parent or owning page of the iframe which acts as proxy, like:
http://example.com/?proxy_target=//server2.com/pageP.html
Thus, the browser only "sees" content from a single server.
You would need to load the external page server-side, and then you can do whatever you want with it. You can do serverside string replacement, or you can do it later in javascript.
But, remember that as soon as you add a whole webpage into for example a div in your own page, the css from your page will affect it.
Plus, you would need to manipulate all the links in the documents, to have absolute urls. If the page depends on ajax, there is pretty much no way to accomplish what you want to do.
If on the other hand the pages you will be loading are static html, it is possible, though there are a lot of things you need to take care of before you can actually present the page to the user, like adjusting links, urls to stylesheets and so on.
It seems you are trying to localize a website on the fly, using your server as a proxy for that content. Does it make sense? If that's the case, depending on the size of your operation, there are several proxy translation services out there (I'll name them if needed).
Basically, they scrape a website, providing a way for you to translate and host the translated content. Of course, this depends on your relationship with the content providers. You should also take this into consideration, since modifying content, even for translation, can be a copyright problem.
All things considered, if you trust the provider's javascript, the solution involves scraping the content, as mentioned in other answers, and serving that modified content. You really need to trust the origin...
update per request
http://www.easyling.com
http://www.smartling.com
http://www.motionpoint.com
http://www.lionbridge.com/solutions/translation-proxy/
http://www.sajan.com/translation-proxy-technology-and-traditional-website-translation-understanding-your-options/
They are all aimed at enterprise-grade projects, but I would say Easyling is the most accessible.
Hope this helps.
Using the .load() callback function, this will replace the text
$(function(){
$("#Content").load("http://example.com?user=Usuario",function() {
$(this).html($(this).html().replace("user", +get param value+));
});
redirection u can use
// similar behavior as an HTTP redirect
window.location.replace("url");
// similar behavior as clicking on a link
window.location.href = "url";
The answer is NO, not without using a server-side proxy. For a really good overview of how to use a proxy, see this YUI page: https://developer.yahoo.com/javascript/howto-proxy.html (Be patient, as it will take time to load, but the illustrations are worth it!)
When I try to do this in jsfiddle to see what data that the 3 parameters contain, then the error below appears:
$(function() {
$(this).load('https://stackoverflow.com/questions/36003367/load-external-page-and-replace-text', function(responseText, textStatus, jqXHR){
debugger;
});
});
ERROR:
XMLHttpRequest cannot load Load external page and Replace text.
No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'https://fiddle.jshell.net' is therefore not allowed access.

transferring localstorage to another website [duplicate]

I am attempting to share data across subdomains using Safari. I would like to use an HTML5 database (specifically localStorage as my data is nothing but key-value pairs).
However, it seems as though data stored to example.com can not be accessed from sub.example.com (or vice versa). Is there any way to share a single database in this situation?
Update 2016
This library from Zendesk worked for me.
Sample:
Hub
// Config s.t. subdomains can get, but only the root domain can set and del
CrossStorageHub.init([
{origin: /\.example.com$/, allow: ['get']},
{origin: /:\/\/(www\.)?example.com$/, allow: ['get', 'set', 'del']}
]);
Note the $ for matching the end of the string. The regular expression in the above example will match origins such as valid.example.com, but not invalid.example.com.malicious.com.
Client
var storage = new CrossStorageClient('https://store.example.com/hub.html');
storage.onConnect().then(function() {
return storage.set('newKey', 'foobar');
}).then(function() {
return storage.get('existingKey', 'newKey');
}).then(function(res) {
console.log(res.length); // 2
}).catch(function(err) {
// Handle error
});
Check https://stackoverflow.com/a/39788742/5064633
There is simple way to use cross-domain anything, just create simple page that will be included as proxy iframe hosted on domain you try to access, send PostMessage to that iframe and inside iframe you do your LocalStorage database manipulation. Here is a link to article that do this with lcoalStorage. And here is demo that send message to different page in subdomain check the source code, it use iframe and PostMessage.
EDIT: New version of sysend.js library (used by above demo) use BroadcastChannel if browser support it, but still it require Iframe. Recent version also simplify using of Cross-Origin messages, you have html of the iframe in repo, that you can use (or you can use simple html file with single script tag with the lib) and in parent you just need to call one function sysend.proxy('https://example.com'); where example.com need to have proxy.html file (you can also use your own filename and different path).
Google Chrome blocks localStoage access from an iFrame in another domain by default,unless 3rd party cookie is enabled and so does Safari on iPhone...the only solution seems to be opening the parent domain on a different domain and then sending to to the Child via window.postMessage but looks ugly and shifty on phones...
Yes. This is how:
For sharing between subdomains of a given superdomain (e.g. foo.example.com vs bar.example.com vs example.com), there's a technique you can use in that situation. It can be applied to localStorage, IndexedDB, SharedWorker, BroadcastChannel, etc, all of which offer shared functionality between same-origin pages, but for some reason don't respect any modification to document.domain that would let them use the superdomain as their origin directly.
NOTE: This technique depends on setting document.domain to allow direct communication between iframes on different subdomains. That functionality has now been deprecated. (As of April 2021 it continues to work in all major browsers however. From Chrome v109 the feature will be disabled unless an Origin-Agent-Cluster: ?0 header is also sent.)
NOTE: Be aware that this technique removes the same-origin defences that block malicious script on a subdomain from affecting the main-domain window, or visa versa, potentially broadening the attack surface for XSS attacks. There are other security implications for shared hosting as well - see the MDN document.domain page for details.
(1) Pick one "main" domain to for the data to belong to: i.e. either https://foo.example.com or https://bar.example.com or https://example.com will hold your localStorage data. Let's say you pick https://example.com.
(2) Use localStorage normally for that chosen domain's pages.
(3) On all other https://*.example.com pages (the other domains), use JavaScript to set document.domain = "example.com"; (always the superdomain). Then also create a hidden <iframe>, and navigate it to some page on the chosen https://example.com domain (It doesn't matter what page, as long as you can insert a very little snippet of JavaScript on there. If you're creating the site, just make an empty page specifically for this purpose. If you're writing an extension or a Greasemonkey-style userscript and so don't have any control over pages on the example.com server, just pick the most lightweight page you can find and insert your script into it. Some kind of "not found" page would probably be fine).
(4) The script on the hidden iframe page need only (a) set document.domain = "example.com";, and (b) notify the parent window when this is done. After that, the parent window can access the iframe window and all its objects without restriction! So the minimal iframe page is something like:
<!doctype html>
<html>
<head>
<script>
document.domain = "example.com";
window.parent.iframeReady(); // function defined & called on parent window
</script>
</head>
<body></body>
</html>
If writing a userscript, you might not want to add externally-accessible functions such as iframeReady() to your unsafeWindow, so instead a better way to notify the main window userscript might be to use a custom event:
window.parent.dispatchEvent(new CustomEvent("iframeReady"));
Which you'd detect by adding a listener for the custom "iframeReady" event to your main page's window.
(NOTE: You need to set document.domain = example.com even if the iframe's domain is already example.com: Assigning a value to document.domain implicitly sets the origin's port to null, and both ports must match for the iframe and its parent to be considered same-origin. See the note here: https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy#Changing_origin)
(5) Once the hidden iframe has informed its parent window that it's ready, script in the parent window can just use iframe.contentWindow.localStorage, iframe.contentWindow.indexedDB, iframe.contentWindow.BroadcastChannel, iframe.contentWindow.SharedWorker instead of window.localStorage, window.indexedDB, etc. ...and all these objects will be scoped to the chosen https://example.com origin - so they'll have the this same shared origin for all of your pages!
The most awkward part of this technique is that you have to wait for the iframe to load before proceeding. So you can't just blithely start using localStorage in your DOMContentLoaded handler, for example. Also you might want to add some error handling to detect if the hidden iframe fails to load correctly.
Obviously, you should also make sure the hidden iframe is not removed or navigated during the lifetime of your page... OTOH I don't know what the result of that would be, but very likely bad things would happen.
And, a caveat: setting/changing document.domain can be blocked using the Feature-Policy header, in which case this technique will not be usable as described.
However, there is a significantly more-complicated generalization of this technique, that can't be blocked by Feature-Policy, and that also allows entirely unrelated domains to share data, communications, and shared workers (i.e. not just subdomains off a common superdomain). #jcubic already described it in their answer, namely:
The general idea is that, just as above, you create a hidden iframe to provide the correct origin for access; but instead of then just grabbing the iframe window's properties directly, you use script inside the iframe to do all of the work, and you communicate between the iframe and your main window only using postMessage() and addEventListener("message",...).
This works because postMessage() can be used even between different-origin Windows. But it's also significantly more complicated because you have to pass everything through some kind of messaging infrastructure that you create between the iframe and the main window, rather than just using the localStorage, IndexedDB, etc. APIs directly in your main window's code.

Is it possible to use jQuery to grab the HTML of another web page into a div?

I am trying to integrate with the FireShot API to given a URL, grab HTML of another web page into a div then take a screenshot of it.
Some things I will need to do after getting the HTML
grab <link> & <script> from <head>
grab <body> into <div>
But 1st, it seems when I try to do a
$.get("http://google.com", function(data) { ... });
I get a 200 in firebug colored red. I think it has to do with sites not allowing you to grab their page with JS? Then is opening a window the best I can do? But how might I control the other page with jQuery or call fsapi on that page?
UPDATE
I tried to do something like below to do something when the new window is ready, but FireBug says "Permission denied to access property 'document'"
w = window.open($url.val());
setTimeout(function() { // if I dont do this, I always get about:blank, is there a better way around this?
$(w.document).ready(function() {
console.log(w.document.body);
});
}, 1000);
I believe the cross-site security setup within Javascript is basically blocking this. You'd likely have to proxy the content through your own domain.
There are a couple other options I think for break the cross-site security constraints, but I'm not sure I'd promote them.
If the "another page" locates within the same domain of your hosting page, yes, you can. Please refer to jQuery's $().load() API.
Otherwise, you're disallowed to do so by the browser's Cross-Site Security Policy. At this moment, you can choose to use iFrame instead of DIV.
Some jQuery plugins, e.g. thickbox provides ability to load pages to appropriate container automatically.
Unless I am correct, I do not believe you can AJAX a page cross domain (e.g. from domain1.com to domain2.com). To get around this, you can have a PHP "proxy" script that does the "getting" of the page and then pass it to JS.
For example, in JS you would get() http://mydomain.com/get/?domain=http://google.com and then do what you need to do!

accessing func in iframe. local to online

HI, i got a simple html page, localy with an iframe. the iframe includes a generated page which got a javascript function. i know want to call that function. of course, im getting "permission denied". so since im new to js and all that stuff i dont know if it's actually possible to do that. give me some hints for searching or a nice solution.
i do cal lthe func like: parent.myiframe.myfunc();
I guess the page in the iframe resides on another server / domain. Modern browser do not allow "cross site scripting", see: http://de.wikipedia.org/wiki/Cross-Site_Scripting
If possible, move the site in the iframe to the same server. An alternative (workaround) would be to proxy the page on the local server, so that that for the client it seems to be loaded from the same domain.
Edit: This is also called a "Same Origin Policy". You can only call java script functions in a document that is:
from the same domain (www.mydomain.com)
from the same subdomain (mail.mydomain.com <- no go!)
both use the same port (p.Ex.
accessing a http://... document from
a http*s*:// document won't work).
There might be another workaround if you have access to the iframe's source:
Change the iframe domain to the same as the outer frame's, by applying:
document.domain = "domain.com";
in the iframe source (see http://ajaxian.com/archives/how-to-make-xmlhttprequest-calls-to-another-server-in-your-domain for more information).
Also there is a Draft for "Cross-Origin Resource Sharing" (http://www.w3.org/TR/cors/) that is already partially implemented in several browser, see: http://www.webdavsystem.com/ajax/programming/cross_origin_requests

How can I access the contents of a frame or iframe with JavaScript or JQuery?

I'm trying
$(document).ready(function(){
var myIFrame = document.getElementById("myIframe");
var content = myIFrame.contentWindow.document.body.innerHTML;
alert(
content
);
})
but I get an access denied error I guess because what I have loaded in the frame is Google. Is what I want to do even possible? The document must be an external one (like Google). And I would like to loop through all elements in the document. I'm a newbie on JS and JQuery so I might be missing a very basic thing.
You can't because of the Same Origin Policy, implemented in all common browsers.
The protocol, host, domain and port of two IFRAME pages must match(having the same origin) to allow javascript to communicate between them.
If you want to enable cross domain communication in javascript you need some cooperation from the other domain.
Now if you plan to read something less ambitious than google.com, and the other domain is ready to communicate with you, here are two techniques:
For modern browsers you can use parent.postMessage from the IFRAME and have a listener in the main page to receive a string message.
For older browsers you can use tricks like passing string data i.e. through windows.name or the window.location.hashBut with those tricks you will have to poll with a setInterval to check for changes.
No, this is called cross site scripting (XSS), and is disabled for security.
Try doing this:
$(document).ready(function(){
iframe = $('#myIframe');
alert($(iframe).contents());
});

Categories