Using Puppeteer as a semi-automated system (JavaScript)

I've got a certain task which I would normally accomplish with a Chrome extension.
The only thing I cannot do with a Chrome extension is take a screenshot of a node. I've been trying to use the chrome.debugger API, but I couldn't make it snap a proper screenshot of an element when the element extends beyond the viewport (even using the captureBeyondViewport property and some other tricks).
So my goal right now is to turn Puppeteer into a semi-automated tool. It should open a browser and work in a way similar to a Chrome extension: I have to be able to run certain code in any of the tabs of the Puppeteer browser instance, just like content scripts do. Once something happens (for example, a click on a certain element), Puppeteer should perform certain actions (e.g. take a screenshot of the node, a full-page screenshot, etc.). It must be able to cover multiple tabs and trigger the action only in the tab where the event occurred.
If anyone can give me a good option to try with a Chrome extension, that would be even better. The goal is to be able to capture a screenshot of an element even if it's big enough to extend beyond the viewport.
PS: html2canvas isn't an option because, on the website I need to work with, its screenshots lack images and generally look a bit off.
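A minimal sketch of the click-to-screenshot flow in Puppeteer might look like the following. This is an assumption-laden illustration, not a known solution: the alt-click picking convention, the `clipFromBox` helper, the `picked-node` id, and the file name are all my own inventions.

```javascript
// Sketch: expose a Node-side function to the page so an in-page click
// handler can trigger an element screenshot, similar to a content script.
// Assumes puppeteer is installed; selectors and paths are hypothetical.

// Pure helper: turn a bounding box into a screenshot clip, with optional padding.
function clipFromBox(box, pad = 0) {
  return {
    x: Math.max(0, box.x - pad),
    y: Math.max(0, box.y - pad),
    width: box.width + 2 * pad,
    height: box.height + 2 * pad,
  };
}

async function main() {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();

  // Node-side callback the page can invoke, like a content script messaging back.
  await page.exposeFunction('onElementPicked', async (selector) => {
    const handle = await page.$(selector);
    if (!handle) return;
    const box = await handle.boundingBox();
    // captureBeyondViewport lets the clip extend past the visible viewport.
    await page.screenshot({
      path: 'element.png',
      clip: clipFromBox(box, 4),
      captureBeyondViewport: true,
    });
  });

  // Injected on every navigation, like a content script: alt-click picks a node.
  await page.evaluateOnNewDocument(() => {
    document.addEventListener('click', (e) => {
      if (!e.altKey) return;
      e.preventDefault();
      e.target.id = e.target.id || 'picked-node'; // hypothetical tagging scheme
      window.onElementPicked('#' + e.target.id);
    }, true);
  });

  await page.goto('https://example.com');
}

// main(); // uncomment to run with puppeteer installed
```

To cover multiple tabs, the same exposeFunction/evaluateOnNewDocument setup would have to be repeated for each new page, e.g. from a browser.on('targetcreated') listener, so the screenshot fires only in the tab where the click happened.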

Related

How to block auto-scroll functionality of full page auto-scroll screenshot extensions?

I have a membership website where I want to prevent people from taking full page screenshots of the wall of text I have in my members area, using auto-scroll screenshot browser extensions like: Fireshot and GoFullPage.
I am looking for a script I can embed in my page that will break the auto-scroll functionality of these screenshot extensions. If the auto-scroll feature can somehow be broken while the screenshot is being taken, it will prevent the whole page from being captured by these extensions.
Not sure if it's possible, but it would definitely be very useful in stopping people from stealing the text content behind the paywall. I have enabled a script that prevents copying and right-clicking. Now the only thing they can do is take auto-scroll screenshots of the long page. If I can prevent that as well, the only thing pirates can do is take multiple manual screenshots of the long page and manually stitch them together in Photoshop. Just trying to make life difficult for them.
Screenshots are not controlled by the web browser; they are controlled by software running on the user's operating system and cannot be remotely controlled by a web server. There's a lot you can do to make it harder (JavaScript to capture the PrtSc key press and return false, flickering different quadrants of the screen at intervals too fast for human eyes to notice so that the whole screen never shows at once, etc.), but fundamentally anyone who's even vaguely tech-savvy can bypass this in about 5 seconds. Also, a Google search should have answered this for you in way less time than it took to ask.
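As a sketch of the (easily bypassed) key-capture idea mentioned in that answer: the `shouldSuppress` helper is my own naming, and as the answer says, none of this touches OS-level capture tools.

```javascript
// Decide whether a key event should be swallowed. Kept as a pure function
// so the decision logic is testable outside the browser; trivially bypassed,
// since screenshots happen at the OS level, outside the page.
function shouldSuppress(e) {
  // Note: in many browsers PrintScreen only fires on keyup, not keydown.
  return e.key === 'PrintScreen';
}

// Browser-only wiring (illustration; no effect on OS-level capture tools):
// document.addEventListener('keyup', (e) => {
//   if (shouldSuppress(e)) {
//     e.preventDefault();
//     return false;
//   }
// });
```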
You cannot block the Print Screen button, or the Snipping Tool in Windows, or the Grab application included with Mac OS, or any other tool on any other OS. I hope I helped 😊

JavaScript emulate window being in focus

For a project I need to be able to pull data from a minimized tab that displays the Instagram website. Everything works except for the "minimized" part. As soon as the window is minimized, and I refresh the page, there is no image data coming in. This is why I think it is necessary to "trick" the website into thinking that it's onscreen in order for it to load data.
Steps to reproduce:
open https://www.instagram.com/
right-click > Inspect (Developer Tools, please undock for windowed mode)
confirm that the _4rbun elements contain a srcset with the images
minimize Instagram tab
in the console, type location.reload() to reload the page
in dev tools, confirm that there are 0 or 1 _4rbun elements which however don't contain a picture
open the minimized tab
confirm that there are numerous _4rbun elements which all contain links in their srcsets
Similar behavior can be found in the /explore path, except that this one, on the contrary, at least contains links to the pages displaying the images (but still no direct link to the .jpg).
Is there anything I can do about this? Other websites use things like Store.js to keep track of their data, but I couldn't find anything like that in this website's window object, so either the "hacky" focus way or the much cleaner "Store" kind of way would be fine.
What I've tried is overriding hasFocus like this: document.hasFocus = function(){ return true; }; it does keep the value at true but doesn't seem to affect the website's behavior at all. I have also tried finding an object carrying the links etc., but this has been unsuccessful so far, too.
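One thing worth trying alongside the hasFocus override: a sketch, assuming the site checks the Page Visibility API (document.hidden / document.visibilityState) rather than focus. The `spoofVisibility` name is my own, and whether Instagram actually consults these properties is an open guess.

```javascript
// Redefine the visibility-related properties a page typically checks.
// In a browser you would pass in `document`; a plain object works for testing.
function spoofVisibility(doc) {
  Object.defineProperty(doc, 'hidden', {
    get: () => false,           // always claim the tab is visible
    configurable: true,
  });
  Object.defineProperty(doc, 'visibilityState', {
    get: () => 'visible',
    configurable: true,
  });
  return doc;
}

// In the browser console, roughly:
// spoofVisibility(document);
// document.hasFocus = function () { return true; };
```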

How to trigger a click event of a button with javascript on an arbitrary site

I need to navigate through a particular website, frequently, to get at some sub-page that is several layers beyond the front page and it is taking too much time to click and scroll and click and scroll to get at the desired final screen where I enter the search string. So, I would like to automate the process by making Javascript trigger the right button events to get me to the distant page where I can enter the search string manually.
So, I know the code needed to trigger the event:
document.getElementById('x').click();
but how can I implement this inside my browser, since this is not my own website?
If this spans different pages, then a web UI automation tool would probably be best (like Selenium - http://www.seleniumhq.org).
As @elcarns says, if you need to inject code into someone else's website, you can do so by opening the console (View --> Developer --> JavaScript Console in Chrome).
Another, more complex way to do it when you have to traverse several pages is to develop a plugin.
Type javascript:document.getElementById('x').click(); in the URL bar. You can probably make a bookmarklet for it as well.
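Since each click loads a new page, a console script or bookmarklet usually has to wait until the target element exists before clicking it. A small polling helper along these lines could back either approach; the name `clickWhenReady`, its defaults, and the injectable `doc` parameter are my own illustration.

```javascript
// Poll until `selector` matches, then click the element; resolves with it.
// `doc` defaults to the page document but is injectable for testing.
function clickWhenReady(selector, { interval = 200, timeout = 10000 } = {}, doc = document) {
  return new Promise((resolve, reject) => {
    const started = Date.now();
    function attempt(timer) {
      const el = doc.querySelector(selector);
      if (el) {
        if (timer) clearInterval(timer);
        el.click();
        resolve(el);
        return true;
      }
      if (Date.now() - started > timeout) {
        if (timer) clearInterval(timer);
        reject(new Error('Timed out waiting for ' + selector));
        return true;
      }
      return false;
    }
    if (attempt(null)) return; // element may already be present
    const timer = setInterval(() => attempt(timer), interval);
  });
}

// Bookmarklet form (paste the helper inline), e.g.:
// javascript:(function(){ /* helper here */ clickWhenReady('#x'); })();
```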

Difference in browser behavior when clicking a link vs. copy-pasting it?

I'm curious what sort of difference there is in browser behavior when loading a page by clicking through a link vs. copy-pasting said link into the browser bar. Is there a general difference in the load process that I should know about?
I ask because I am working on an application using the Google Maps API, in which the user is able to create custom map links with a GPS location defined in the URL. The links work fine when copy-pasted into the browser; however, if the link is clicked directly, the browser generates a 'stack exceeded' exception (appropriate for this site!).
While I understand that generally we like to see code examples on Stack Overflow, I'm going to refrain from that for now - perhaps if anyone knows about differences in the DOM loading process between click and copy-paste, I can use that to narrow in on the issue without bogging down the discussion with tangentially-related code.
The same issue occurs in both Firefox and Chrome.
Thanks!
Instead of copying and pasting the link, try right-clicking and copying the link address. Basically, the links on a web page can display different text than the URL they actually point to.

Automated conditional browsing in background with chrome extension

I am researching whether I might be able to use a Chrome extension to automate browsing and navigation (conditionally). My hope is that the extension can load a remote page (in the background) and inject JavaScript to evaluate the clickable links, click (by calling the click method) the appropriate link (chosen by some JavaScript logic), then repeat the process for the resulting page.
My JavaScript ability is not the problem, but I am struggling to discern whether (or not) a Chrome extension can load pages in the background and inject script into them (making the DOM accessible).
I would be pleased if anyone could confirm (or deny) the ability to do so - and if so, some helpful pointers on where I should research next.
@Rob W - it seems the experimental features fit the bill perfectly. But my first tests seem to show the features are still very experimental, i.e. no objects get returned from the callbacks:
background.html
function getAllosTabs(osTabs) {
    var x = osTabs;
    alert(x.length); // error: osTabs is undefined
}

function createOffScreenTabCallback(offscreenTab) {
    document.write("offscreen tab created");
    chrome.experimental.offscreenTabs.getAll(getAllosTabs);
    alert(offscreenTab); // error: offscreenTab is undefined
}

var ostab = chrome.experimental.offscreenTabs.create({ "url": "http://www.google.com" }, createOffScreenTabCallback);
alert(ostab); // error: ostab is undefined
Some further digging into the Chromium source code on GitHub revealed a limitation on creating an offscreenTab from the background page:
Note that you can't create offscreen tabs from background pages, since they
don't have an associated WebContents. The lifetime of offscreen tabs is tied
to their creating tab, so requiring visible tabs as the parent helps prevent
offscreen tab leaking.
So far it seems unlikely that I can create an extension that browses (automatically and conditionally) in the background, but I'll keep trying - perhaps creating it from a script in the popup might work. It won't run automatically at computer startup, but it will run when the browser is open and the user clicks the browser action.
Any further suggestions are highly welcome.
Some clarifications:
there's no "background" tabs except extension's background page (with iframes) but pages loaded in iframes can know they are being loaded in frames and can break or break at even the best counter-framebreaker scripts
offscreenTab is still experimental and is very much visible, as its intended use is different from what you need it for
content scripts, and chrome.tabs.update() are the only way handle the automated navigation part; aside being extremely harsh to program, problems and limitations are numerous, including CSP (Content-Security-Policy), their isolated context isolating event data, etc.
Alternatives... not many really. The thing is you're using your user's computer and browser to do your things and regardless of how dirty they are not, chrome's dev team still won't like it and will through many things at you and your extension (like the v2 manifest).
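For the content-script-plus-chrome.tabs.update() route mentioned above, the "evaluate clickable links" step can at least be kept as a pure function that the content script calls before clicking. A sketch under my own assumptions: the name `pickLink`, the keyword-matching heuristic, and the message shape are all hypothetical, not part of any extension API.

```javascript
// Pick the first anchor whose text matches a keyword; a content script would
// then call .click() on it, or send its href to the background page so the
// background can call chrome.tabs.update(). Pure, so testable outside Chrome.
function pickLink(anchors, keyword) {
  const needle = keyword.toLowerCase();
  for (const a of anchors) {
    if ((a.textContent || '').toLowerCase().includes(needle)) {
      return a;
    }
  }
  return null; // no matching link on this page
}

// In a content script, roughly (hypothetical message shape):
// const next = pickLink(document.querySelectorAll('a'), 'next page');
// if (next) chrome.runtime.sendMessage({ navigateTo: next.href });
```

Keeping the selection logic pure sidesteps some of the isolated-context pain: the content script only touches the DOM to collect anchors and to click, while the decision itself can be developed and tested in plain Node.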
You can use NPAPI to start another instance of Chrome: chrome.exe --load-extension=iMacros.