extracting html from cross-domains - javascript

I am using jQuery to get the HTML from a specific domain. I know that you can't use AJAX because of the cross-domain policy, so I used $.getJSON and YQL, which worked great. But this only returns the body of the HTML; I want to retrieve the full HTML document, including the html, head and title tags.
Can I still do this using something else?

The same principle applies.
If you want to fetch data then it must either be:
Using JSON-P
From your own domain (as far as the browser is concerned)
You can proxy the entire document through your own domain with a little server-side programming (although you should take steps to prevent yourself from being used as a URL cloaker by spammers).
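One way to keep such a proxy from being abused is to check every requested URL against an allowlist before fetching it. The following is a minimal sketch of that check only, not a full proxy; the function name `resolveProxyTarget` and the hosts in `ALLOWED_HOSTS` are assumptions made for illustration.

```javascript
// Hypothetical allowlist: the only hosts this proxy is willing to fetch from.
const ALLOWED_HOSTS = new Set(["example.com", "www.example.com"]);

// Returns a normalized URL string if the target may be proxied, otherwise null.
function resolveProxyTarget(rawUrl, allowedHosts = ALLOWED_HOSTS) {
  let target;
  try {
    target = new URL(rawUrl); // throws on relative or malformed input
  } catch (err) {
    return null;
  }
  // Only plain web URLs; rejects file:, ftp:, javascript:, and so on.
  if (target.protocol !== "http:" && target.protocol !== "https:") {
    return null;
  }
  // Refuse any host that is not explicitly listed.
  if (!allowedHosts.has(target.hostname)) {
    return null;
  }
  return target.href;
}
```

The server-side handler would call this first and respond with an error whenever it returns null, so the proxy can only ever relay pages from hosts you have chosen.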

Related

Javascript detect which host it was loaded from

I have a Javascript library I'm working on. It can be self-hosted or run from another server. The script makes a number of AJAX calls and the preferred method is making POST requests to the same host as the including page. To allow for cross-domain calls it also supports JSONP, but this limits the amount of data that can be sent (~2K to safely accommodate most modern browsers' URL length limits).
Obviously the user including the script knows where they're getting it from and could manually select JSONP as needed, but in the interest of simplifying things, I'd like to detect, within the script itself, whether the script was loaded from the same host as the page including it or not.
I'm able to grab the script element with jQuery but doing a $('script').attr('src') is only returning a relative path (e.g. "/js/my-script.js" not "http://hostname.com/js/my-script.js") even when it's being loaded from a different host.
Is this possible and if so, how would I go about it?
Thanks in advance.
Don't use JSONP, use CORS headers.
But if you really want to do a JS check, use var t = $('script')[0].outerHTML.
Effect on my page:
[20:43:34.865] "<script src="http://www.google-analytics.com/ga.js" async="" type="text/javascript"></script>"
Checking location.host should do the trick.
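To make the comparison concrete, here is a minimal sketch that resolves the script URL against the page URL and compares origins. The helper name `isSameOrigin` is an assumption, and the sketch compares the full origin (scheme, host and port), which is slightly stricter than the host check described above. In a browser, the `src` property of a script element (as opposed to the raw attribute returned by `.attr('src')`) is already an absolute URL.

```javascript
// True when scriptSrc (absolute or relative) resolves to the same origin as pageUrl.
function isSameOrigin(scriptSrc, pageUrl) {
  const page = new URL(pageUrl);
  const script = new URL(scriptSrc, page); // relative paths resolve against the page
  return script.origin === page.origin;
}

// Browser-only usage; skipped when there is no DOM (for example under Node).
if (typeof document !== "undefined" && document.currentScript) {
  // .src (the property) is already absolute, unlike the raw src attribute.
  const useJsonp = !isSameOrigin(document.currentScript.src, location.href);
  console.log(useJsonp ? "cross-origin: fall back to JSONP" : "same origin: use POST");
}
```

`document.currentScript` is only set while the script is first executing, so the check has to run at load time rather than inside a later callback.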

YQL removes data within <script></script> tags from the requested page

I have code separated across two domains. Both domains and their code are trusted.
I wanted to access code on domain B via jQuery's .load() method from domain A, but learned that this is not possible because of the restriction on data access between domains.
YQL came up as an option and worked fine, but it doesn't bring back the data in <script></script> tags. I have examined the returned data, and it has all the HTML of the called page from domain B other than the script code.
I need to bring in the script code and execute it.
I couldn't find anything related in the YQL help (they only mention how to remove <script> from returned data), but in my case YQL itself removes the scripts.
What's happening at the URL you've provided is not so much that script tags are being stripped as that the YQL select returns only the body of the document by default, and your script is placed in the head.
You can get the head with a query such as this one:
where url="..." and xpath='/html/head/'
YQL does not strip <script> elements. load() uses innerHTML which doesn't cause <script> elements to execute.
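Because scripts inserted through innerHTML stay inert, one common workaround is to replace each of them with a freshly created script element, which the browser does run. The sketch below assumes the markup has already been placed in a container element and, as the question states, that the remote code is trusted; `reviveScripts` is a hypothetical helper name.

```javascript
// Replaces every <script> inside `container` with a newly created copy.
// Scripts built with createElement execute when inserted into the live
// document, unlike scripts that arrived through innerHTML.
function reviveScripts(container, doc = container.ownerDocument) {
  const revived = [];
  for (const old of Array.from(container.querySelectorAll("script"))) {
    const fresh = doc.createElement("script");
    // Carry over src, type and any other attributes.
    for (const attr of Array.from(old.attributes)) {
      fresh.setAttribute(attr.name, attr.value);
    }
    // Carry over inline code, if any.
    fresh.textContent = old.textContent;
    old.parentNode.replaceChild(fresh, old);
    revived.push(fresh);
  }
  return revived;
}
```

Only run this on markup you trust, since it executes whatever code the remote page contains.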

Load a remote XML into a web page and access its content via JavaScript

I'd like to know if it's possible to load a remote XML file through the <script> tag, and access the content using JavaScript.
As the XML is a result of an external website (I'm using TheTVDb API), I can't load it using AJAX.
I'm looking for something like the following, if it's possible (jQuery-like syntax):
<script id="xmlload" type="text/xml" src="..."></script>
<script type="text/javascript">
var xmlcontent = $('#xmlload').content();
// parse xmlcontent
</script>
I don't think that this is possible: you will need to use XMLHttpRequest (AJAX) to use an HTTP-based API. However, it might still be possible to make cross-site requests if the TheTVDb server allows this. See HTTP access control on MDN, which describes the relevant W3C specification (Cross-Origin Resource Sharing).
So if you haven't done so yet, I'd recommend you simply try whether making an AJAX request works. Otherwise, it might be a good idea to ask the TheTVDb folks whether they would be so kind as to implement the mentioned spec.
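For reference, implementing the spec on the server mostly comes down to answering with an Access-Control-Allow-Origin header that matches the requesting page's origin. The sketch below shows that decision in isolation; the function name and the origin in `TRUSTED_ORIGINS` are assumptions for illustration and say nothing about how TheTVDb's servers actually behave.

```javascript
// Hypothetical list of origins the API owner chooses to trust.
const TRUSTED_ORIGINS = ["http://my-site.example"];

// Returns the response headers to add for a request's Origin header,
// or an empty object when the origin is missing or not trusted.
function corsHeadersFor(requestOrigin, trustedOrigins = TRUSTED_ORIGINS) {
  if (!requestOrigin || !trustedOrigins.includes(requestOrigin)) {
    return {};
  }
  return {
    "Access-Control-Allow-Origin": requestOrigin,
    // Tell caches that the response differs per requesting origin.
    "Vary": "Origin",
  };
}
```

If the response carries no matching header, the browser still sends the request but refuses to hand the response to the calling script.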

Load external page and use its elements like I would use them normally?

Can I load an external page using JavaScript and convert it to a DOM structure so I can scrape it like I normally would?
Bad explanation, but code says more than a thousand words, I think. ;)
foobar = loadExternalPage('foobar.com');
foobar = convertToDOM(foobar);
headers = foobar.getElementsByClassName('header');
Thank you!
If the external page is on the same domain, then yes you can using XMLHttpRequest, then treating the response as HTML. Alternatively, load it into an iframe and access the resulting contentDocument.
For a page on another domain, however, it's a bit more complicated. You may want to look at PHP's DOMDocument, which you can use to parse HTML from any domain, and even pass it back to JavaScript if you make an AJAX call to your PHP script.
https://developer.mozilla.org/en/Code_snippets/HTML_to_DOM
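Once the HTML text is available (from a same-domain request or from your own proxy), the `convertToDOM` step from the question can be sketched with the browser's DOMParser. This is a minimal sketch under that assumption; `htmlToDocument` is a hypothetical name, and the parser class is passed as a parameter only so the function can be exercised outside a browser.

```javascript
// Parses an HTML string into a detached Document that can be queried
// with the usual DOM methods. Scripts in the parsed document do not run.
function htmlToDocument(html, Parser = globalThis.DOMParser) {
  if (typeof Parser !== "function") {
    throw new Error("DOMParser is not available in this environment");
  }
  return new Parser().parseFromString(html, "text/html");
}

// Browser-only usage; skipped when there is no DOM (for example under Node).
if (typeof document !== "undefined") {
  const foobar = htmlToDocument('<div class="header">Hello</div>');
  const headers = foobar.getElementsByClassName("header");
  console.log(headers.length);
}
```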

load cross domain xml by Javascript

Hi
Is it possible to load an XML file from a domain that differs from the script's domain with pure JavaScript and without using a php/asp/jsp/... script as a proxy?
Something like xmlHttpRequest but with ability to manage cross domain requests.
Thanks
You can use something called JSONP. I know the name sucks, because it's not really related to JSON. But this requires you have control over the other domain. You need to wrap your XML inside a function call, or assign it to a javascript variable:
func('<xml></xml>');
or
var myxml = '<xml></xml>';
So if your other domain returns one of these two formats, you can use the <script src="http://otherdomain/yourjsonp"></script> syntax in your html to load that data in JavaScript. It's a little hacky but a lot of people use it.
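On the other domain, producing the first of those formats could look roughly like the sketch below. The helper name `wrapAsJsonp` is an assumption; JSON.stringify is used only to turn the XML into a safely quoted JavaScript string, and the callback name is validated so that a caller cannot inject arbitrary code through it.

```javascript
// Accepts only a plain JavaScript identifier as the callback name.
const CALLBACK_NAME = /^[A-Za-z_$][\w$]*$/;

// Returns JavaScript source that calls `callbackName` with the XML as a string.
function wrapAsJsonp(callbackName, xml) {
  if (!CALLBACK_NAME.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  // JSON.stringify escapes quotes, backslashes and newlines inside the XML.
  return callbackName + "(" + JSON.stringify(xml) + ");";
}
```

The including page defines a global function with the agreed name before adding the script tag, and then parses the string it receives, for example with DOMParser.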
It is possible with yql! (Yahoo did it for you)
Go to this site and, in the "select from url='xxx'" query, simply replace the xxx with your XML URL. Use the URL generated in the text box below and do a simple XML request. You won't have any cross-domain problems.
