Scraping particular data from iframe - javascript

I want to load some HTML content into an iframe through an AJAX call to a different domain.
The same-origin policy keeps me from extracting this content and sending it to my script.
I have read numerous articles on how to work around this problem, but the bottom line is that I would need access to the HTML file I am trying to extract content from, and that is not the case here.
Do you have any ideas, hacks, or workarounds?
Thanks in advance.

This question is basically a duplicate of Simple Screen Scraping using jQuery.
Proxy the other website through your own server, using PHP or some other means. The content is then served from your own origin, so you no longer have cross-domain issues.
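As a rough illustration of the proxy idea, here is a minimal sketch in JavaScript (Node) rather than PHP. The names `ALLOWED_HOSTS` and `makeProxyHandler`, the `/proxy?url=...` route and the `example.com` host are assumptions made for this example. It also assumes a Node version with a global `fetch` (18 or later), or that you pass your own fetch function.

```javascript
// Hypothetical allow-list: only these hosts may be proxied.
const ALLOWED_HOSTS = new Set(["example.com"]);

// Build a request handler suitable for Node's http.createServer.
// fetchImpl is injectable so the handler can be exercised without network access.
function makeProxyHandler(fetchImpl = fetch) {
  return async function handle(req, res) {
    // Expect requests of the form /proxy?url=https://example.com/page
    const requestUrl = new URL(req.url, "http://localhost");
    let targetUrl;
    try {
      targetUrl = new URL(requestUrl.searchParams.get("url"));
    } catch {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Missing or invalid url parameter");
      return;
    }
    // Refuse anything outside the allow-list so this is not an open proxy.
    if (!ALLOWED_HOSTS.has(targetUrl.hostname)) {
      res.writeHead(403, { "Content-Type": "text/plain" });
      res.end("Host not allowed");
      return;
    }
    try {
      // Fetch the remote page server-side, where the same-origin policy does not apply.
      const upstream = await fetchImpl(targetUrl.href);
      const body = await upstream.text();
      res.writeHead(upstream.status, { "Content-Type": "text/html; charset=utf-8" });
      res.end(body);
    } catch (err) {
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream request failed");
    }
  };
}
```

Wired up with `http.createServer(makeProxyHandler())` on your own server, a page on your site could then request `/proxy?url=...` like any same-origin resource. Relative links, images and scripts in the returned HTML still point at the original site and would need rewriting.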

Related

How can I load an alternate URL if, upon trying to load a URL in a frame, it times out?

Say I have set up a page with 8 frames, each of which tries to load a different URL on my LAN. If one or more of these URLs (belonging to local web servers) fails to load, I get the ugly error screen. Is there any way to force the affected frame(s) to load an alternate URL when the request times out, or can't we catch that?
Thanks!
Your post lacks sufficient information to answer it fully.
This answer is just background and pointers to resources. I hope you find it helpful.
If your main page is at http://localhost/ and the iframes are on the same origin, http://localhost/, you should be able to read the iframes and their content with JavaScript.
If your page is at, let's say, http://mywebsite.com/ and you are trying to access http://localhost/, you'll run into a wall: web browsers prevent you from interacting with an iframe from another origin because of the same-origin policy.
https://en.wikipedia.org/wiki/Same-origin_policy
If you control http://localhost/, you can add response headers that relax the same-origin policy (CORS for short). CORS governs cross-origin requests such as AJAX; it does not by itself give scripts access to an iframe's document. I don't use iframes much anymore, I just use AJAX, which I'd recommend looking into because it can handle error pages exactly the way you want.
https://en.wikipedia.org/wiki/Cross-origin_resource_sharing
If you control both hosts, http://mywebsite.com/ and http://localhost/, you can put JavaScript on both pages and let them communicate with each other. But if you are getting error pages, it is unlikely you can control the error page responses.
https://en.wikipedia.org/wiki/Web_Messaging
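One way to apply the AJAX suggestion to the frame problem is to probe each URL with a timeout before assigning it to the frame. The following is only a sketch under assumptions: `chooseFrameUrl`, the 3000 ms default and the frame id in the usage comment are made up for this example. With `mode: "no-cors"` the response is opaque, so the probe can tell only that the server answered in time, not whether it returned an HTTP error page. Reading the status would require the LAN servers to send CORS headers.

```javascript
// Resolve to primaryUrl if it answers within timeoutMs, otherwise to fallbackUrl.
// fetchImpl is injectable for testing; in a browser the global fetch is used.
async function chooseFrameUrl(primaryUrl, fallbackUrl, timeoutMs = 3000, fetchImpl = fetch) {
  const controller = new AbortController();
  // Abort the request if the server has not answered in time.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // "no-cors" yields an opaque response: we learn only that the server answered.
    await fetchImpl(primaryUrl, { mode: "no-cors", signal: controller.signal });
    return primaryUrl;
  } catch (err) {
    // A network failure and a timeout (abort) both end up here.
    return fallbackUrl;
  } finally {
    clearTimeout(timer);
  }
}

// Browser usage, shown as comments because it needs a DOM (the id is hypothetical):
//   chooseFrameUrl("http://192.168.0.10/", "/offline.html")
//     .then((url) => { document.getElementById("frame1").src = url; });
```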

Is there any alternative to iframe, object or embed?

I want to include an HTML page from an external domain in my page. For SEO reasons, iframe, object and embed are not useful: they leave just a link in my source file. PHP's include function is very helpful, but it causes many problems for the UI. I want the content I get from include together with the look of an iframe.
How can I do that?
Thanks.
There's no reasonable alternative to <iframe>.
You could try to fetch the markup from the site on the server side and print that HTML inside a <div> on your own site, but I doubt it would work well: if the target site makes AJAX requests, opens WebSockets or the like, it should be secure enough to block those from any domain other than the ones it allows (i.e. its official domains).
If you are adding content from an external source, then it should really have zero impact on your SEO; as far as I am aware, it needs to be your own content. You could try scraping the external source and adding it to your page with Ajax, using $().load() or similar, but I wouldn't recommend it!

Can I grab specific page HTML code from another webpage through Javascript?

I've read that there are multiple methods for grabbing source code from another webpage via jQuery or cross-domain requests. What I want to do is grab just one div, whose content is different each time the page is loaded, rather than the source code as a whole. For example, the greater detail you see when you use 'inspect element' or a tool like Firebug to dive deeper into the page code.
Would I be using one of the same methods?
Yes.
If you control BOTH domains, you can add the CORS Access-Control-Allow-Origin header to allow cross-domain requests, and use a headless browser like PhantomJS to grab a cached version of the rendered HTML page.
If you don't control both domains, you will have to write a server-side proxy to grab the page and all its resources (you will have to parse the page to get or rewrite links to images, JavaScript files, stylesheets etc.), then run it through PhantomJS to create an HTML snapshot.
source:
https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS
http://phantomjs.org/
NOTE: despite my best efforts, stack overflow is absolutely convinced these links are code. Sorry for posting as code.
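Once the page's HTML text has been obtained (through CORS or a proxy), picking out a single div can be sketched as below. This assumes a browser environment with `DOMParser`. The function name `extractElementHtml` and the `#price` selector in the usage comment are made up for this example. It only sees markup present in the fetched HTML; content that the remote page adds later with its own scripts will be missing, which is the gap a headless browser fills.

```javascript
// Parse an HTML string and return the outer HTML of the first element
// matching selector, or null if nothing matches.
// parser is injectable; by default the browser's DOMParser is used.
function extractElementHtml(html, selector, parser = new DOMParser()) {
  const doc = parser.parseFromString(html, "text/html");
  const element = doc.querySelector(selector);
  return element ? element.outerHTML : null;
}

// Browser usage, shown as comments (the URL and selector are hypothetical):
//   fetch("/proxy?url=https://example.com/page")
//     .then((response) => response.text())
//     .then((html) => console.log(extractElementHtml(html, "#price")));
```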

Same Origin Policy and Facebook

Now I know there are a lot of resources about same origin policy, but I just want a straight up answer for my specific query as I am really struggling to understand.
I am using Facebook plugins on my website. These create iframes that are only visible in the DOM when I use Chrome's inspect element etc.
Is there a way that I can access these iframes' properties/attributes at all, or is it a resounding "NO CHANCE!"? I am spending far too much time on this and I just need a final verdict.
Thanks!
JavaScript doesn't see the iframe content. Chrome's inspector just loads two different websites at the same time, yours and the plugin's, so you can play with both of them.
Just curious, how would you like to change it?
In general, JavaScript cannot access iframe content from outside of the iframe, unless the page domain and the iframe domain share the same protocol and host and port. In your case, this could possibly be done using a proxy server to load the iframe content from your domain.
http://en.wikipedia.org/wiki/Same_origin_policy
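The protocol, host and port rule can be made concrete with a small check. This is a minimal sketch meant for http and https URLs; the function name `isSameOrigin` is an assumption for this example, and the browser's real check has more cases than this.

```javascript
// Two URLs share an origin when protocol, host and port all match.
// The URL class reports default ports (80 for http, 443 for https) as "",
// so "http://example.com" and "http://example.com:80" compare as equal.
function isSameOrigin(urlA, urlB) {
  const a = new URL(urlA);
  const b = new URL(urlB);
  return a.protocol === b.protocol && a.hostname === b.hostname && a.port === b.port;
}

console.log(isSameOrigin("http://example.com/a", "http://example.com:80/b")); // true
console.log(isSameOrigin("http://example.com/", "https://example.com/"));     // false (protocol)
console.log(isSameOrigin("http://example.com/", "http://example.com:8080/")); // false (port)
console.log(isSameOrigin("http://example.com/", "http://www.example.com/"));  // false (host)
```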

ideas/hacks for using javascript inside an iframe with remote content

I'm trying to throw together a proof of concept in Rails where I want to put a remote site's content inside an iframe and then use jQuery to modify the content in the iframe. I know that I can't really do this because of cross-site scripting protections, and I also know it's not a great design; it's just a hack as a proof of concept. But is there any way to scrape the HTML from the remote site and pipe that into the iframe?
Thanks!
The short answer is that you can't do this.
However, you can try some crazy solution like downloading the remote page with wget and then pointing the iframe at the saved file. But then the page is not really on a different domain.
BTW, you may want to have a look at https://developer.mozilla.org/en/DOM/window.postMessage; it may be helpful in some cases.
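For reference, a minimal sketch of how `window.postMessage` is typically used between a page and a frame. It only helps when the framed page cooperates by running your script, which is not the case for an arbitrary remote site. The name `makeMessageHandler` and the two example origins are assumptions for this illustration.

```javascript
// Build a "message" event handler that accepts messages only from trustedOrigin.
// Returns true when the message was accepted, false when it was ignored.
function makeMessageHandler(trustedOrigin, onData) {
  return function handleMessage(event) {
    // Ignore messages from any origin we do not explicitly trust.
    if (event.origin !== trustedOrigin) return false;
    onData(event.data);
    return true;
  };
}

// Browser wiring on the parent page, shown as comments because it needs a DOM:
//   window.addEventListener("message",
//     makeMessageHandler("https://remote.example", (data) => console.log(data)));
// Inside the framed page, which must cooperate:
//   window.parent.postMessage({ title: document.title }, "https://parent.example");
```

Checking `event.origin` on the receiving side and passing an explicit target origin on the sending side keeps other pages from injecting or reading the messages.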
