Can I crawl the html data in my Chrome browser using Python? - javascript

My goal is to control my Chrome web browser from a Python program.
In particular, I'd like to directly get the HTML of the web page that I am currently viewing.
Currently I'm using a Chrome extension plus a Python program to do this.
See the diagram of my current system.
The Chrome extension copies [document.documentElement.innerHTML] to the clipboard,
and the Python program reads the clipboard data and does some work (mouse clicks or keystrokes).
I know about the webbrowser and urllib modules in Python.
However, the webbrowser module only opens pages in my browser,
and urllib does not interact directly with my Chrome browser.
Is there any way to get the HTML from my active Chrome browser directly? Like this?

Related

Can you render a selenium chrome instance in javascript (or other code language) to be displayed on a website?

Could you somehow render a Python instance that is running Selenium, using JavaScript or something else, and display that render on a website so that it can be viewed by someone using Chrome or another browser?

Can website js read web extension console.log

I am in the process of developing a web extension (for firefox) and use console.log a lot during the development process. I do not want my extension to be detected by the website itself, therefore my question:
Can js functionality of the website capture console.log output I generate from within a content script?
Thanks!

Get rendered HTML page file from URL input with browser

What would be the fastest and least resource-intensive (CPU, RAM) way to get a JavaScript-rendered HTML page and save it to disk, given a URL, using an ordinary browser (Google Chrome or Firefox) in headless mode?
The idea is to also have the browser's proxy options changed per request.
I'm well aware of Selenium, Puppeteer, PhantomJS and similar solutions. This needs to be done with a REAL browser, remotely managed through some API in a Linux environment.
I've found only JS API implementations for building add-ons, and haven't found any solutions except Remote browser, and I'm not sure whether that is still maintained.
Any pointers, snippets or whatever are more than welcome since I can't find anything.
Is it necessary for the JavaScript-rendered HTML page to be functional after it is saved?
Just take a screenshot using Python and save it to disk.

How to load JavaScript enabled response using axios or fetch API in JavaScript?

I am working on a personal project in which I want to read the whole HTML of a JavaScript-dependent webpage. For example, if I load this URL in a JavaScript-enabled web browser, this is what I get:
However, if I disable JavaScript in the browser, and load the same URL now, I get this:
This is pretty normal I know.
Now I am trying to load the HTML of the same link in JavaScript code using the axios HTTP client, and obviously the HTTP response contains the HTML of the JavaScript-disabled webpage.
I want to get the HTML (+JS) source of the same link as it appears when JavaScript is enabled. I don't know how to mimic a JavaScript-enabled web browser when working with HTTP clients like axios or the fetch API.
If you're trying to do this in the browser, you basically can't unless the site you're loading lets you do so (via CORS or similar). You'd have to load it into a window or iframe, wait for its JavaScript to run, and then access the resulting DOM. But accessing the DOM of a cross-origin page is disabled by default.
The only browser-based way I can think of is to write and install a browser extension, since when a user installs an extension, they can grant greater power to the extension than a web page normally has.
If you're trying to do this in a non-browser environment, you can use a headless browser like headless Chromium or similar. The browser-based restrictions don't apply.
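As a rough illustration of the headless route, here is a minimal Python sketch that shells out to headless Chrome and reads back the serialized DOM. It relies on the `--headless` and `--dump-dom` command-line switches of Chrome/Chromium; the executable names in `CHROME_CANDIDATES` and the helper names are assumptions and will need adjusting for your system.

```python
import shutil
import subprocess

# Candidate executable names (assumption: which one exists depends on the platform).
CHROME_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def find_chrome():
    """Return the path of the first Chrome/Chromium executable found on PATH, or None."""
    for name in CHROME_CANDIDATES:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_dump_dom_command(chrome_path, url):
    """Build the argument list that asks headless Chrome to print the page's DOM."""
    return [chrome_path, "--headless", "--disable-gpu", "--dump-dom", url]


def fetch_rendered_html(url, timeout=30):
    """Run headless Chrome once and return whatever it prints for --dump-dom."""
    chrome_path = find_chrome()
    if chrome_path is None:
        raise RuntimeError("No Chrome or Chromium executable found on PATH")
    result = subprocess.run(
        build_dump_dom_command(chrome_path, url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout
```

Calling `fetch_rendered_html("https://example.com")` should return the DOM as serialized by the browser after loading the page, which is usually closer to what a JavaScript-enabled browser shows than a plain HTTP response. Pages that keep loading content after the initial load may still come back incomplete.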

How can I Extract JavaScript Source on Chrome Extension or Console-Command?

HTML :
<html> <script src="http://someurl.com/jscript.js"></script></html>
I'm trying to extract the source code of jscript.js in a Chrome extension.
But no attribute of the DOM object holds the script's source.
Is there a way to extract the source code of JavaScript that is loaded on the page?
(Using a DOM object or some internal object, without re-downloading the script.)
The reason is that some web servers return different source code depending on the request (usually badly behaved servers do that). So if I download it with a different request, I can't get the same source that was loaded in the browser.
According to Is external JavaScript source available to scripting context inside HTML page?, it's not normally possible without redownloading since it's not exposed to the DOM.
An extension, however, can hook into information available to the browser.
The simplest would be to create a DevTools extension. It would only work when the DevTools are open on the page, but then you can easily access the source with chrome.devtools.inspectedWindow.getResources().
Somewhat harder, but one can use chrome.debugger API to achieve the same while DevTools are closed. It's a low-level API, but it allows doing everything DevTools can do. I don't have a ready example, but Debugger Protocol docs will help.
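To give a feel for the debugger protocol route, here is a small Python sketch of the messages involved. It assumes the usual flow: after `Debugger.enable`, the browser emits `Debugger.scriptParsed` events carrying a `scriptId` and `url`, and `Debugger.getScriptSource` returns the source for a given `scriptId`. The JSON envelope with an `id` field is the form used when talking to the protocol over a remote-debugging WebSocket; inside an extension you would pass the same method name and params to `chrome.debugger.sendCommand` instead. The helper names and the sample events are hypothetical.

```python
import json


def cdp_message(message_id, method, params=None):
    """Serialize one protocol command as a JSON text frame."""
    return json.dumps({"id": message_id, "method": method, "params": params or {}})


def script_ids_for_url(events, url):
    """Pick the scriptId of every Debugger.scriptParsed event whose url matches."""
    return [
        event["params"]["scriptId"]
        for event in events
        if event.get("method") == "Debugger.scriptParsed"
        and event.get("params", {}).get("url") == url
    ]


def source_requests(events, url, first_id=2):
    """Build one Debugger.getScriptSource command per matching script."""
    return [
        cdp_message(first_id + offset, "Debugger.getScriptSource", {"scriptId": script_id})
        for offset, script_id in enumerate(script_ids_for_url(events, url))
    ]


if __name__ == "__main__":
    # Illustrative events only; real ones arrive from the browser after Debugger.enable.
    sample_events = [
        {"method": "Debugger.scriptParsed",
         "params": {"scriptId": "7", "url": "http://someurl.com/jscript.js"}},
        {"method": "Page.loadEventFired", "params": {}},
    ]
    print(cdp_message(1, "Debugger.enable"))
    for request in source_requests(sample_events, "http://someurl.com/jscript.js"):
        print(request)
```

Because the source comes from the script the browser already parsed, this avoids the second request that could return different content.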
Neither is possible from a content script.
You can also go directly to the extension on your file system:
Where to find extensions installed folder for Google Chrome on Mac?
For example, on my MacBook:
pwd
output: ~/Library/Application\ Support/Google/Chrome/Default/Extensions/hkbhjllliedcceblibllaodamehmbfgm/1.7.1_0
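If you want to browse that folder from Python rather than the shell, a minimal sketch could look like the following. The default path mirrors the macOS location shown above and is an assumption for any other platform or Chrome profile; `list_extension_versions` is a hypothetical helper name.

```python
from pathlib import Path

# macOS location taken from the path shown above; other platforms and profiles differ.
DEFAULT_EXTENSIONS_DIR = (
    Path.home() / "Library" / "Application Support" / "Google" / "Chrome"
    / "Default" / "Extensions"
)


def list_extension_versions(extensions_dir=DEFAULT_EXTENSIONS_DIR):
    """Map each extension ID folder to the sorted names of its version subfolders."""
    root = Path(extensions_dir)
    if not root.is_dir():
        return {}
    return {
        ext.name: sorted(version.name for version in ext.iterdir() if version.is_dir())
        for ext in sorted(root.iterdir())
        if ext.is_dir()
    }


if __name__ == "__main__":
    for ext_id, versions in list_extension_versions().items():
        print(ext_id, versions)
```

Each version folder holds the extension's unpacked files, so its JavaScript sources can be read directly from disk.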
