How to fetch JavaScript-heavy pages from a Chrome extension


I am developing an extension that fetches pages that the user is likely to access on a website. My extension uses jQuery.get() to fetch a page. This works correctly for a site like amazon.com.
But if the user logs in to Gmail and I try to fetch other pages, such as "account settings", I get an incomplete page. Somewhere in that page, I get the message:
"Your browser does not support Javascript or Javascript has been disabled. As your browser does not support Javascript or has Javascript disabled, we are not able to display the requested page."
Is there some way to fetch complete page in such cases?

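I ended up opening a new tab, loading the page in that tab, and then analyzing the page data with a content script. (jQuery.get() only downloads the raw HTML; the scripts in it never run, so pages that build their content with JavaScript come back incomplete.) The drawback is that the user sees the newly opened tab, but on the other hand the process is transparent to the user.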
If you are developing a Firefox extension with Jetpack, you can use page-worker, which loads a page invisibly and gives you access to its DOM.
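A minimal sketch of the background-tab approach for Chrome, written against the Manifest V3 APIs (an assumption: the answer above predates MV3). It opens the page with active: false so the tab does not take focus (it still shows up in the tab strip), waits for the tab's status to become 'complete', reads the rendered DOM with chrome.scripting.executeScript, and closes the tab. The helper name fetchRenderedPage is hypothetical, and the chrome object is passed in as chromeApi only so the logic can be tried outside the browser. It assumes the manifest grants the "tabs" and "scripting" permissions plus host permissions for the target site.

```javascript
// Sketch for a Manifest V3 background service worker (assumed setup).
// chromeApi is normally the global `chrome`; fetchRenderedPage is a hypothetical helper.
function fetchRenderedPage(chromeApi, url) {
  return new Promise((resolve, reject) => {
    let tabId = null;
    // Tabs that reported 'complete' before tabs.create() resolved with our id.
    const completedEarly = new Set();

    async function grabHtml() {
      chromeApi.tabs.onUpdated.removeListener(onUpdated);
      try {
        // The injected function runs inside the page, after its own scripts ran.
        const [injection] = await chromeApi.scripting.executeScript({
          target: { tabId },
          func: () => document.documentElement.outerHTML,
        });
        resolve(injection.result);
      } catch (err) {
        reject(err);
      } finally {
        // Close the helper tab; ignore errors if the user already closed it.
        chromeApi.tabs.remove(tabId).catch(() => {});
      }
    }

    function onUpdated(updatedTabId, changeInfo) {
      if (changeInfo.status !== 'complete') return;
      if (tabId === null) {
        completedEarly.add(updatedTabId);
      } else if (updatedTabId === tabId) {
        grabHtml();
      }
    }

    // Listen before creating the tab so a very fast load cannot be missed.
    chromeApi.tabs.onUpdated.addListener(onUpdated);
    chromeApi.tabs.create({ url, active: false }).then(
      (tab) => {
        tabId = tab.id;
        if (completedEarly.has(tabId)) grabHtml();
      },
      (err) => {
        chromeApi.tabs.onUpdated.removeListener(onUpdated);
        reject(err);
      },
    );
  });
}
```

Note that 'complete' only means the page's load event has fired; an app like Gmail may keep rendering afterwards, so a real version might need to wait for a specific element before reading the DOM.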

Related

How do I get the referrer of the first document loaded into a web browser, to determine how the web page was opened?

Background to explain the issue
I have a Java desktop application that, as part of a task, creates an HTML report and loads it directly into the user's preferred web browser. This is how it works on a PC or Mac.
The Java app can also run in remote mode; in this case the whole interface is in a web browser. The main purpose of this is that it can then be installed on a NAS and controlled remotely via PC, iPad, etc., and everything, including the HTML report, is loaded from the web server running the application.
The trouble is that Safari on Mac doesn't like HTML files being loaded directly into the web browser, and this (sometimes) prevents resources from loading and causes the HTML report to render incorrectly.
The solution is for my application to run a simple web server that always serves the report files, so the browser opens http://local:4567/report1.html rather than file://C:/application/reportslocation/report1.html, in all cases.
Cancel Button Problem
This works, but now I have another problem: the report has a Cancel button that should not be shown in desktop mode, because there the report is the first page and there is nothing to cancel back to, whereas in remote mode Cancel takes you back to the start page.
What I used to do was to check the currently loaded URL, and if it starts with file:, hide the button:
if (window.location.protocol === 'file:') {
    document.getElementById('return').style.visibility = 'hidden';
}
But now all files are served via HTTP, so I changed it to:
if (!document.referrer) {
    document.getElementById('return').style.visibility = 'hidden';
}
and it works for the first page of the report, because document.referrer is empty when the page is loaded from the desktop app.
However, there are multiple report pages, so when I click a link to open another page of the report, the Cancel button is incorrectly displayed, because document.referrer is no longer empty.
So my thought was, if I could get the original referrer when opening a page, I could then correctly check if the Cancel button should be displayed.
Also note that the reports may be created in desktop mode and viewed later in remote mode, or vice versa. The report is created only once and has to work in both cases.

If you are serving the pages over HTTP rather than from file: URLs, then I think it would be fine to use the browser's localStorage.
For example, on the first page:
if (!document.referrer) {
    localStorage.setItem('mode', 'desktop');
}
Then, on the following pages:
if (localStorage.getItem('mode') === 'desktop') {
    document.getElementById('return').style.visibility = 'hidden';
}
However, if you are using the file: protocol, browsers may create a separate store for each file, so the value won't be shared between pages.
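A small sketch of the localStorage idea, wrapped in a function so the logic can be checked on its own. The names isDesktopMode and MemoryStorage are hypothetical; storage is anything with getItem/setItem, which in the report pages would be window.localStorage. The key 'mode' and the value 'desktop' follow the answer.

```javascript
// Sketch of the localStorage idea; isDesktopMode and MemoryStorage are hypothetical names.
// storage is any object with the Storage interface (getItem/setItem),
// normally window.localStorage in the report pages.
function isDesktopMode(referrer, storage) {
  // Only the first page, opened directly by the desktop app, has an empty referrer.
  if (!referrer) {
    storage.setItem('mode', 'desktop');
  }
  return storage.getItem('mode') === 'desktop';
}

// Minimal in-memory stand-in for Storage, useful for testing outside a browser.
class MemoryStorage {
  constructor() {
    this.items = new Map();
  }
  getItem(key) {
    return this.items.has(key) ? this.items.get(key) : null;
  }
  setItem(key, value) {
    this.items.set(key, String(value));
  }
}

// In each report page (browser only):
// if (isDesktopMode(document.referrer, window.localStorage)) {
//   document.getElementById('return').style.visibility = 'hidden';
// }
```

Because localStorage persists, the flag stays set for that origin until it is cleared. If the desktop and remote servers could ever share an origin, sessionStorage (scoped to one tab, and kept while that tab navigates between report pages) may be a safer choice; that is an untested suggestion.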
Another, more hacky option is to carry the desktop/remote mode in your links (href) as a query parameter.
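A sketch of the query-parameter option, under the assumption that the parameter is called mode (a hypothetical name, as are currentMode and linkWithMode). The first page has no parameter yet, so it falls back to the referrer check; after that, every report link carries the mode forward, so it no longer matters that document.referrer is set.

```javascript
// Sketch of the query-parameter option; the parameter name `mode` is hypothetical.
// Decide the mode for the current page.
function currentMode(pageUrl, referrer) {
  const mode = new URL(pageUrl).searchParams.get('mode');
  if (mode) return mode;
  // First page: no parameter yet, so fall back to the referrer check.
  return referrer ? 'remote' : 'desktop';
}

// Resolve a (possibly relative) link against the current page and attach the mode.
function linkWithMode(href, pageUrl, mode) {
  const target = new URL(href, pageUrl);
  target.searchParams.set('mode', mode);
  return target.toString();
}

// In each report page (browser only), after the DOM is ready; the
// 'a.report-link' selector is a hypothetical class for links between report pages:
// const mode = currentMode(location.href, document.referrer);
// if (mode === 'desktop') document.getElementById('return').style.visibility = 'hidden';
// document.querySelectorAll('a.report-link').forEach((a) => {
//   a.href = linkWithMode(a.getAttribute('href'), location.href, mode);
// });
```

Unlike storage, the mode travels with the URL, so it works the same whether the pages are served over HTTP or opened from file: URLs.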

How to get a list of network requests made from JavaScript?

I am loading a new web page in a new tab from my JavaScript with the code below:
window.open('https://www.gmail.com', '_blank');
While the new tab is loading, I would like to read all the redirected URLs shown in the Network tab (Chrome DevTools). I tried to collect all the URLs using performance.getEntriesByType("resource"), but the issue is that my existing application refreshes.
To make the question clearer:
1) I have an existing application; let's say it is Gmail.
2) When I click a button (let's say Compose in Gmail), it has to open a new web page in a different tab.
3) When the new page has loaded completely, I would like to read all the URLs loaded in the Network tab of Chrome DevTools.
4) While I read those URLs, my existing page should not refresh.

I don't think that what you want is possible. See Can I programmatically open the devtools from a Google Chrome extension?
You could manually export the network data as a HAR file (which is JSON) and then JSON.parse() it in the console or in an external script. See Export data from Chrome developer tool.
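A small sketch of the HAR route: after exporting the Network panel's log as a .har file, this parses it and lists each request's URL, status and redirect target. The field names (log.entries, request.url, response.status, response.redirectURL) come from the HAR 1.2 format; summarizeHar and listRedirects are hypothetical helper names. You could paste it into the console and call it with the file's text, or run it in Node after reading the file with fs.readFileSync(path, 'utf8').

```javascript
// Sketch: summarize a HAR export. summarizeHar/listRedirects are hypothetical names;
// the field names follow the HAR 1.2 format.
function summarizeHar(harText) {
  const har = JSON.parse(harText);
  return har.log.entries.map((entry) => ({
    url: entry.request.url,
    status: entry.response.status,
    // redirectURL is an empty string when the response is not a redirect.
    redirectTo: entry.response.redirectURL || null,
  }));
}

// Keep only the requests that answered with a redirect.
function listRedirects(harText) {
  return summarizeHar(harText).filter((row) => row.redirectTo !== null);
}
```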
Edit:
Maybe you can modify the XMLHttpRequest object to add your own listeners around open() and send(). See How can I modify the XMLHttpRequest responsetext received by another function?
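A sketch of the XMLHttpRequest idea: wrap XMLHttpRequest.prototype.open so every XHR records the requested URL and, once it finishes, its responseURL (the final URL after any redirects). recordXhrUrls is a hypothetical name, and the class is passed in as a parameter (window.XMLHttpRequest in a browser) so the wrapper can be tried with a stand-in. It only sees XHRs made by the page it runs in; it cannot observe a tab opened with window.open on another origin, and it does not see fetch() calls, images, scripts or navigations.

```javascript
// Sketch: log the URLs requested through XMLHttpRequest in the current page.
// recordXhrUrls is a hypothetical helper; pass window.XMLHttpRequest in a browser.
function recordXhrUrls(XhrClass, log) {
  const originalOpen = XhrClass.prototype.open;
  XhrClass.prototype.open = function (method, url, ...rest) {
    const entry = { method, url: String(url), finalUrl: null };
    log.push(entry);
    // responseURL holds the final URL after redirects once the request ends.
    this.addEventListener('loadend', () => {
      entry.finalUrl = this.responseURL;
    });
    return originalOpen.call(this, method, url, ...rest);
  };
  // Return a function that restores the original open().
  return () => {
    XhrClass.prototype.open = originalOpen;
  };
}

// Usage in a page (browser only):
// const requests = [];
// const restore = recordXhrUrls(window.XMLHttpRequest, requests);
```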

How to download a page with JavaScript content

I have a website, www.website.com. A user opens website.com/article.html, which contains HTML text, images... and JavaScript content (which is different for every user).
My website is powered by WordPress. How can I download the final version (after the JavaScript has loaded and executed) of the pages opened by my users?
I want to do that because I want to know what content the JavaScript displays for each of them.
Can I use a PHP/JavaScript function, or is there a service that does this?

You'll need a headless browser like PhantomJS to visit the page, let the JavaScript run, and then extract the content.
There is a PHP bridge available at https://github.com/diggin/php-PhantomjsRunner, but I don't know whether it's any good.
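The same idea sketched with Puppeteer, a Node.js library that drives headless Chrome, swapped in for PhantomJS because PhantomJS development has since been suspended. renderPage is a hypothetical helper; the module is passed in (normally require('puppeteer')) so the function can be exercised with a stand-in. waitUntil: 'networkidle0' waits until there have been no network connections for 500 ms, which is only a rough signal that the page's scripts have finished.

```javascript
// Sketch: load a page in headless Chrome and return the DOM after scripts ran.
// renderPage is a hypothetical helper; `puppeteer` is normally require('puppeteer').
async function renderPage(puppeteer, url) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Treat the page as loaded once the network has been idle for 500 ms.
    await page.goto(url, { waitUntil: 'networkidle0' });
    // page.content() serializes the current DOM, including script-generated markup.
    return await page.content();
  } finally {
    await browser.close();
  }
}

// Usage (after `npm install puppeteer`):
// renderPage(require('puppeteer'), 'https://www.website.com/article.html')
//   .then((html) => console.log(html));
```

Keep in mind that the headless browser has its own cookies, so it sees what an anonymous visitor would see, not the personalized content a particular logged-in user gets.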

Access elements on an external page

I have an HTML page that is accessed via a link that passes an external page's URL as a query parameter, e.g.:
http://www.mydomain.com/mypage?external-page=encodedURL
It is the responsibility of my page to scrape some data from the URL it is handed.
How can I access the passed-in page using JavaScript/jQuery? I need to be able to pull out the content of certain classes and ids.
Is this a violation of the same-origin policy? If so, is there some other way to process an external page like this? It seems strange to me that I can load the page in a browser or with a terminal command and receive the content, but not from a JS file.

You can use a browser extension to scrape the external page and then either send the data to your site or display it within your page, where your page's JavaScript can access it via the DOM.
You can use a proxy on your domain that fetches the external page and hands it to your JavaScript, which also has its origin on your domain.
You can use an API that the external site makes accessible, if it offers one.
You can ask the owner of the external page (or change its code yourself, if you have access to it) to serve its pages with the header Access-Control-Allow-Origin: *
I think this is all you can do.
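A minimal sketch of the proxy option as a Node.js request handler. createProxyHandler, the ?url= parameter and the ALLOWED_HOSTS list are all hypothetical; the allowlist keeps the endpoint from becoming an open proxy that anyone can use to fetch arbitrary URLs through your server. fetchImpl would be the global fetch of Node 18+, and the response is assumed to be HTML.

```javascript
// Sketch of a same-origin proxy endpoint, e.g. GET /proxy?url=<encoded external URL>.
// createProxyHandler and ALLOWED_HOSTS are hypothetical; adjust the allowlist to your needs.
const ALLOWED_HOSTS = new Set(['www.example.com']);

function createProxyHandler(fetchImpl) {
  return async function handler(req, res) {
    const requestUrl = new URL(req.url, 'http://localhost');
    let targetUrl;
    try {
      targetUrl = new URL(requestUrl.searchParams.get('url'));
    } catch {
      res.writeHead(400);
      res.end('missing or invalid url parameter');
      return;
    }
    // Only fetch http(s) pages on allowlisted hosts.
    if (!/^https?:$/.test(targetUrl.protocol) || !ALLOWED_HOSTS.has(targetUrl.hostname)) {
      res.writeHead(403);
      res.end('host not allowed');
      return;
    }
    try {
      const upstream = await fetchImpl(targetUrl.toString());
      const body = await upstream.text();
      // Assumes the external page is HTML.
      res.writeHead(upstream.status, { 'Content-Type': 'text/html; charset=utf-8' });
      res.end(body);
    } catch {
      res.writeHead(502);
      res.end('could not fetch the external page');
    }
  };
}

// Usage (Node 18+):
// require('http').createServer(createProxyHandler(fetch)).listen(8080);
```

Your page's JavaScript can then request /proxy?url=... from its own origin. Note that the proxy fetches the page without the user's cookies, so it only sees what an anonymous visitor would see.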
EDIT: It only "seems strange" until you realize the intended difference between a user and a process. The user is not assumed to be malicious, but a process could be. For example, if a process had access to the external page, it could grab data from the user's logged-in Gmail session and transmit that data to a server. Since the user at the terminal is probably (but not always!) the one who logged in to that session, the user is not assumed to be malicious. But a script whose origin is some website the user navigates to should not be able to act with the same permissions as that user: the script is an agent too, and can take actions, but it is not created or directed by the user. That's the strongest reason for isolating origins and for the same-origin policy.
Example
Execution Context of Bookmarklets, and IFrames
If you inject JS into every page via a bookmarklet, the injected code behaves as if it has the same origin as the rest of the page, or at least as the "top frame" of that page, and it executes in the top frame's context. If there are nested iframes from other domains in the page, you will get an "unsafe attempt to access page x from " error if your bookmarklet tries to inject into them. This is because the bookmarklet's origin is that of the top page, and the top page can never access nested iframes on different domains anyway.
So if some part of the site you wish to scrape is in an iframe below the top frame, your bookmarklet will fail to get it.
Transmitting Data using a bookmarklet
If you want to take a URL entered on a page on your domain, grab data from that URL on another domain, and then display that data back on the same page, you need a way to get the data across. You could use a bookmarklet, but the flow would still involve some "user help". It would go something like this:
1) Load your domain's page, D. The user puts a URL into an input box and clicks submit.
2) JavaScript on D opens a new tab/window pointing to the user-provided URL.
3) The user clicks your scraping bookmarklet on that external page, which collects the desired data, X.
4) The desired data, X, is sent via Ajax to a "server", S, along with a session identifier, I.
5) Page D polls server S until it is notified that data with session identifier I has been grabbed; it then fetches that data and displays it on D.
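A sketch of the polling on page D from the last step of the flow above. The endpoint /scraped/<sessionId> is hypothetical; it is assumed to answer 204 while nothing has arrived and 200 with a JSON body once the bookmarklet has posted the data. waitForScrapedData is also a hypothetical name, and fetchImpl would be window.fetch in the page.

```javascript
// Sketch: page D polls server S until the data for session I has arrived.
// The '/scraped/<id>' endpoint and its 204/200 behaviour are assumptions.
async function waitForScrapedData(fetchImpl, sessionId, { intervalMs = 2000, maxAttempts = 30 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await fetchImpl('/scraped/' + encodeURIComponent(sessionId));
    if (response.status === 200) {
      return response.json(); // the data X collected by the bookmarklet
    }
    // Nothing yet: wait before asking again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error('No data received for session ' + sessionId);
}

// Usage on page D (browser only):
// waitForScrapedData(window.fetch.bind(window), sessionId)
//   .then((data) => { /* display data on D */ });
```

A fixed interval with a cap on attempts keeps the load on S bounded if the user never clicks the bookmarklet.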
This needs a server. You can't use local storage to transmit the information, since local storage is specific to a domain. There is an alternative that does not require a server, but it requires making a browser extension.
Transmitting data using a browser extension
The extension's "background page" is basically a local server shared by all the browser tabs: it allows information to be passed between tabs showing different domains. The "clients" in this setup are the "content scripts", which are loaded into every page (just like a bookmarklet, except that the user doesn't have to click anything to load it; it happens automatically). The flow would go like this:
1) Page D again. The user enters a URL in the input box and clicks submit, which triggers some code in the extension.
2) The extension's background page opens a new tab pointing to that URL.
3) A content script loads automatically into that tab and asks the background page what data it should get. It collects that data and sends it to the background page in a message (JSON).
4) The background page pushes the notification and the data to the content script on page D, which displays the information.
5) Optionally, the background page also sends the information to your server to save it in that user's datastore.
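A sketch of the background side of this flow for a Chrome extension, using chrome.runtime.onMessage, chrome.tabs.create and chrome.tabs.sendMessage. The message types ('start-scrape', 'get-job', 'scraped', 'scrape-result') and registerScrapeRouter are hypothetical names, and chromeApi is normally the global chrome object. In a Manifest V3 service worker the in-memory Map is lost if the worker is shut down, so chrome.storage.session may be a more robust place for the pending jobs.

```javascript
// Sketch of the background page/service worker routing messages between tabs.
// Message type names and registerScrapeRouter are hypothetical.
function registerScrapeRouter(chromeApi) {
  // scraper tab id -> { requesterTabId, selectors }
  const jobs = new Map();

  chromeApi.runtime.onMessage.addListener((message, sender, sendResponse) => {
    const fromTab = sender.tab && sender.tab.id;
    switch (message.type) {
      case 'start-scrape': // sent by the content script on page D
        chromeApi.tabs.create({ url: message.url, active: false }).then((tab) => {
          jobs.set(tab.id, { requesterTabId: fromTab, selectors: message.selectors });
        });
        break;
      case 'get-job': // the scraper tab's content script asks what to collect
        // A real version should handle a request that arrives before create() resolved.
        sendResponse(jobs.has(fromTab) ? { selectors: jobs.get(fromTab).selectors } : null);
        break;
      case 'scraped': { // the scraper tab's content script reports its data
        const job = jobs.get(fromTab);
        if (job) {
          jobs.delete(fromTab);
          chromeApi.tabs
            .sendMessage(job.requesterTabId, { type: 'scrape-result', data: message.data })
            .catch(() => {}); // page D may have been closed in the meantime
        }
        break;
      }
    }
    return false; // every response above is sent synchronously
  });
}
```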
The terms I use for the browser extension ("background page" and "content script") are mostly Google Chrome's, but the same concepts are available in Safari and Firefox as well. If you want to support IE, you're going to have to work out something else; IE10 does not even plan to support extensions.

If the external page and your page are on the same domain, you should be able to access the external page using JavaScript. Otherwise, JavaScript won't be allowed to access the external site: browsers block such cross-origin access under the same-origin policy.

JavaScript: read the HTML of a previously opened tab in the same window

I have a task and don't know where to start; I hope Stack Overflowers can give me some ideas.
I want to read, from my web page, the HTML source code of a tab that was opened earlier and is still open.
My approach was to grab the URL of the target page, send that URL to the server to do some processing, and then use the result in my web page. But I am facing the "same domain policy" on the server side. I know that JSONP can be used, but I must use POST in this case (for other reasons). So I think that if the tab (page) has been opened and is still open, there must be some way to read its HTML when my web page is opened.
The flow would be: Page1 is open; the user opens mywebpage.html in the same window; mywebpage.html finds that Page1 is open, then grabs its HTML source and uses it.
Thanks!
Edit:
This is the full story.
What I am planning to do is a Firefox plugin, with a button (myPluginButton) on the toolbar.
If the user clicks myPluginButton, the HTML code of the current page is sent to the server, the server parses it and generates a report, and then a new tab is opened to display this report.
My current approach is to read the HTML of the current page using newTabBrowser.contentDocument, send it to the server, and do the parsing on the server side. But this approach creates extra traffic. The more efficient way would be to send only the URL of the current page to the server, and read and parse the HTML on the server side. However, the same domain policy does not allow me to do this easily.
So my question is: when the user clicks myPluginButton to open a new tab, is it possible for this new tab to loop over all the open tabs in the browser, read their HTML content, and then generate the report? Since these tabs are still open, their HTML content must be stored somewhere (or am I wrong?).
Thanks.

Browsers have a built-in protection called the same-origin policy that prevents a page from reading content from another origin (a different scheme, domain, subdomain, port, ...).
If you want to access the current page, you can use a bookmarklet.
You ask your users to add it to their bookmarks bar; each time they want to use it, instead of opening a tab, they click the bookmark.
This loads your script into the page, with full access to read the page content.
And oddly enough, you can POST from this page to your domain by posting a FORM to an IFRAME hosted on your domain. But you won't be able to read the response of the POST. You can use setInterval with a JSONP call to your domain to find out whether the POST was successful.
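A sketch of the bookmarklet's POST-through-an-iframe trick: it builds a hidden iframe and a hidden form whose target is that iframe, puts the page's HTML in a field, and submits, so the current page does not navigate away. postPageToServer, the endpoint URL and the field name html are hypothetical; doc is the page's document. As a bookmarklet you would wrap the body as javascript:(function(){ ... })(). The target page's Content-Security-Policy (for example a form-action directive) may block the submission.

```javascript
// Sketch of a bookmarklet body that POSTs the current page's HTML to your domain.
// postPageToServer, the endpoint and the field name 'html' are hypothetical.
function postPageToServer(doc, endpoint) {
  const frameName = 'scrape-target-' + Date.now();

  // Hidden iframe that receives the POST response (you cannot read it).
  const iframe = doc.createElement('iframe');
  iframe.name = frameName;
  iframe.style.display = 'none';
  doc.body.appendChild(iframe);

  // Hidden form targeting the iframe, so the page itself stays put.
  const form = doc.createElement('form');
  form.method = 'POST';
  form.action = endpoint;
  form.target = frameName;
  form.style.display = 'none';

  const field = doc.createElement('textarea');
  field.name = 'html';
  field.value = doc.documentElement.outerHTML;
  form.appendChild(field);

  doc.body.appendChild(form);
  form.submit();
  return frameName;
}

// As a bookmarklet (endpoint is a placeholder):
// postPageToServer(document, 'https://your-domain.example/collect');
```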
