How to load only html from web pages in selenium - javascript

How to load only html from web pages in selenium?
I need only html of requested page without css and javascript.

If you need selenium for web-scraping, strictly speaking, you would still need the JavaScript and CSS files, since they can play a significant part in page load and rendering. For example, parts of a page can be loaded with additional AJAX calls or inserted by custom JavaScript logic.
Also, if you want only HTML part of a page, why do you need to involve a real browser?
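If the raw HTML is all you need, a plain HTTP request avoids the browser entirely. The following is a minimal sketch using only the Python standard library; the function name `fetch_html` and the User-Agent string are illustrative assumptions, and content generated by JavaScript will not be present in the result.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the response body as text. No CSS, JS, or images are requested."""
    # Some servers reject requests without a User-Agent; this value is arbitrary.
    request = Request(url, headers={"User-Agent": "html-only-fetch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_html("https://example.com/")` would return the markup exactly as the server sent it, before any script has run.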
If you still want to prevent js and css files from loading, you can configure certain permissions in Firefox through tweaking FirefoxProfile preferences, see:
Do not want images to load and CSS to render on Firefox in Selenium WebDriver tests with Python
FirefoxDriver: how to disable javascript,css and make sendKeys type instantly?
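As a rough sketch of the preference approach in Python: the preference names below are the ones commonly quoted in answers like those linked above, and recent Firefox releases may ignore or override some of them, so treat the values as assumptions to verify against your browser version. The helper names are hypothetical, and the sketch assumes Selenium 4, where preferences are set on `Options` rather than on a `FirefoxProfile`.

```python
# Preference names as quoted in the linked answers; newer Firefox builds
# may no longer honor all of them.
HTML_ONLY_PREFS = {
    "permissions.default.stylesheet": 2,  # 2 = block stylesheets
    "permissions.default.image": 2,       # 2 = block images
    "javascript.enabled": False,          # turn off script execution
}


def build_firefox_options(prefs=HTML_ONLY_PREFS):
    """Return Firefox options with the given preferences applied."""
    # Imported lazily so the module can be loaded without Selenium installed.
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options


def page_source_without_assets(url):
    """Load `url` in Firefox with the restrictive preferences, return its HTML."""
    from selenium import webdriver

    driver = webdriver.Firefox(options=build_firefox_options())
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()
```

Running `page_source_without_assets(...)` requires Firefox and geckodriver to be installed on the machine.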

Related

HtmlAgilityPack download webpage which loaded asynchronously by javascript

I am using HtmlAgilityPack and trying to load some web pages. Some of them are JavaScript-based and load asynchronously. Is there any way to load a web page after x seconds, or after making sure the page is completely loaded?
Html Agility Pack does not mimic the client-side calls that dynamically load content into the DOM. It is a headless scraper that downloads the static page returned by the server; if you want the dynamic content, you will have to mimic the calls made by the client browser yourself. If you do not want to emulate those calls, you can use something like Selenium instead of a headless scraper; the downside is that a browser will be opened on the host machine.
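With a browser-driving tool such as Selenium, "wait until the page has loaded" can be approximated by polling `document.readyState`. The sketch below is written in Python rather than C#, and `wait_for_ready` is a hypothetical helper; it only needs an object with an `execute_script` method, which Selenium drivers provide. Note that a `complete` ready state does not guarantee that later AJAX calls have finished, so pages that keep loading data may need an additional condition.

```python
import time


def wait_for_ready(driver, timeout=10.0, poll=0.25):
    """Poll document.readyState until it is 'complete' or the timeout expires.

    Returns True if the page reported 'complete' in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

After `driver.get(url)`, calling `wait_for_ready(driver)` and then reading `driver.page_source` yields the DOM as the browser has rendered it at that point.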

Can I grab specific page HTML code from another webpage through Javascript?

I've read that there are multiple methods for grabbing source code from another webpage, via jQuery or using cross-domain requests. What I want to do is grab a single div whose content is different each time the page is loaded, not the source code as a whole. In other words, I want the greater detail you see when you use 'inspect element' or a tool like Firebug to dive deeper into the page code.
Would I be using one of the same methods?
Yes.
If you control BOTH domains, you can add the CORS response header (Access-Control-Allow-Origin) to allow cross-domain requests, and use a headless browser like PhantomJS to grab a cached version of the rendered HTML page.
If you don't control both domains, you will have to write a server-side proxy to grab the page and all its resources (you will have to parse the page to get or rewrite links to images, JavaScript files, stylesheets, etc.), then run it through PhantomJS to create an HTML snapshot.
source:
https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
https://developer.mozilla.org/en-US/docs/Web/HTTP/Access_control_CORS
http://phantomjs.org/
NOTE: despite my best efforts, stack overflow is absolutely convinced these links are code. Sorry for posting as code.
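The "parse the page to get or rewrite links" step of such a proxy could look roughly like the Python sketch below. It only collects the resource URLs and resolves them against the page URL; fetching them and producing the snapshot are left out. The class and function names are hypothetical, and only `img`, `script`, and `link` tags are handled here.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts, and linked stylesheets."""

    # Tag name -> attribute that carries the resource URL.
    URL_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.URL_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                self.resources.append(urljoin(self.base_url, value))


def collect_resources(html, base_url):
    """Return the absolute resource URLs referenced by `html`, in document order."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.resources
```

A proxy would then request each collected URL server-side and rewrite the page so that the links point back at the proxy.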

Append javascript/html to page when navigating from a different page?

Alright, first off this is not a malicious question I'm asking. I have no intentions of using any info for ill gains.
I have an application that contains an embedded browser. This browser runs within the application's process, so I can't access it via Selenium WebDriver or anything like that. I know that it's possible to dynamically append scripts and html to loaded web pages via WebDriver, because I've done it.
In the embedded browser, I don't have access to the pages that get loaded. Instead, I can create my own html/javascript pages and execute them, to manipulate the application that houses the browser. I'm having trouble manipulating the existing pages within the browser.
Is there a way to dynamically add javascript to a page when you navigate to it and have it execute right after the page loads?
Something like
page1.navigateToUrl(executeThisScriptOnLoad)
page2 then executes the passed script.
I guess it is not possible to do this without knowledge of the destination site. You can, however, send data to the site and then use the eval() function to evaluate that data on the destination page.

Override to read local JS-file in web app wrapper

I'm looking into creating a web wrapper for an existing web app. Naturally, I want to make it as quick as possible.
Is it possible to host the JS-files locally, instead of having to download the file, without altering the existing web app?
Using a WebViewClient, you can prevent the JavaScript from being loaded from the web server (edit: only in API level 11 and higher, unfortunately). Or you can disable JavaScript, load the page, then enable JavaScript again. After the page is loaded, you can modify the DOM using javascript: URLs to load the scripts from a local URL (like file:///android_asset, off the top of my head).
You can also change the cache strategy of the WebView so that it will never fetch anything that is already fetched once before, which might also be what you want in this case. These are set in http://developer.android.com/reference/android/webkit/WebSettings.html, you could set it to LOAD_CACHE_ELSE_NETWORK in this case.

Disabling loading specific JavaScript files with Firefox

I am looking for a way to prevent loading a specific JavaScript file on a website for any website of choice, with Firefox.
For example:
Say I don't want to load jQuery (when loading the page, not afterwards 'disabling' it). I then want to be able to set that
http://ajax.googleapis.com/ajax/libs/jquery/1.5.2/jquery.min.js
should not be loaded. The browser should completely ignore this file so that I can debug other JavaScript on the website. I don't have access to the domain directly, which is why I am trying to do this via the browser.
So for clarity: :) I don't want to disable scripts from a certain domain, but want to be able to disable certain scripts. It can be that 10 scripts are on 1 domain, so killing all 10 of them is not what I want; in that case I want to prevent loading only one.
Is there a way to do so?
Several options:
Use the add-on "Adblock Plus". It will probably still access the JS file but will not execute it.
Use the add-on "Greasemonkey", which, when configured correctly, does not even touch the JS URL. But it's generally harder to configure correctly. ;)
Have a look at Firefox's built-in security policies: http://kb.mozillazine.org/Security_Policies Here you can block JavaScript at the URL level or even at the function level.
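Open your hosts file: C:\Windows\System32\drivers\etc\hosts (Windows) or /etc/hosts (Linux).
Add:
127.0.0.1 ajax.googleapis.com (address and host name separated by a tab)
Then reopen your browser.
This way the jQuery file will fail to load. Note that this blocks every request to ajax.googleapis.com, not just that one file.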
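To check that a hosts entry like this one is actually in place, a small parser can help. This is only a sketch in Python: `hosts_mapping` and `is_blocked` are hypothetical helpers, and treating 127.0.0.1, 0.0.0.0, and ::1 as "blocked" is an assumption about how such entries are usually written.

```python
def hosts_mapping(hosts_text):
    """Parse hosts-file text into a {hostname: address} dictionary."""
    mapping = {}
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Fields may be separated by tabs or spaces.
        address, *names = line.split()
        for name in names:
            # The first entry for a name wins, as in typical resolvers.
            mapping.setdefault(name.lower(), address)
    return mapping


def is_blocked(hosts_text, hostname):
    """Return True if `hostname` is redirected to a local or null address."""
    address = hosts_mapping(hosts_text).get(hostname.lower())
    return address in {"127.0.0.1", "0.0.0.0", "::1"}
```

For example, reading `/etc/hosts` into a string and calling `is_blocked(text, "ajax.googleapis.com")` reports whether the redirect is present.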
