I'm trying to run a simple JavaScript file from the terminal (Ubuntu) that clicks a button on a website. However, I haven't been able to find out how, since I've learned that you can't interact with the browser from Node (for example, running commands such as window.location.href).
(source: ReferenceError: window is not defined at Object.<anonymous> Node.js).
For example, I'd like to be able to create a script (let's call it test.js) where when I run ./test.js or node test.js in the terminal, it will:
Go to www.google.com
Click on the "Images" button in the top right.
Here is how I understand it would be done:
window.location.href = "https://www.google.com"
document.getElementById('the id of the image button').click()
It seems extremely straightforward, but I am a beginner to JavaScript, am not aware of its limitations, and could well be wrong about Node. Could someone explain how I should go about doing something as simple as this? Thanks
EDIT: For clarification on the context, this is just a part of me trying to automate form submissions. I also want to be able to enter specified text into input fields and so on.
Node.js does not include a browser or any browser-like environment, which is what the code you posted needs in order to run. Fortunately, this is fairly straightforward to add with an extra Node.js library.
What you're looking for is Puppeteer. It's a Node.js library that downloads a compatible Chrome/Chromium build when installed and lets you remote-control that browser through some very easy Node.js functions/methods.
In a directory of your choosing, install puppeteer with npm like so:
npm install -S puppeteer
This will install the library locally into a node_modules/ directory.
Then, you'll need a single JavaScript file (like test.js in your example) in which you write code like the example in the Puppeteer README:
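const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.google.com');
  // Replace the placeholder with a CSS selector for the "Images" link.
  // The link navigates, so start waiting for the new page while clicking.
  await Promise.all([
    page.waitForNavigation(),
    page.click('the id of the "images" link or some selector'),
  ]);
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();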
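Since the EDIT in the question mentions filling in forms, the same `page` object can also type into inputs and submit. A minimal sketch, assuming Puppeteer's `page.type(selector, text)`, `page.click(selector)` and `page.waitForNavigation()`; the `fillAndSubmit` helper and every selector used with it are hypothetical placeholders, not taken from any real site. The helper only receives a page, so it can be tried with a stand-in object before pointing it at a real browser.

```javascript
// Hypothetical helper: type each value into the input matched by its CSS
// selector, then click the submit button and wait for the resulting page load.
// `page` is expected to be a Puppeteer Page (from browser.newPage()).
async function fillAndSubmit(page, fields, submitSelector) {
  for (const [selector, value] of Object.entries(fields)) {
    await page.type(selector, value); // simulates real key presses
  }
  // Start waiting for navigation before clicking so the event is not missed.
  await Promise.all([page.waitForNavigation(), page.click(submitSelector)]);
}
```

Inside the async function of the example above it would be called as `await fillAndSubmit(page, { '#name': 'Jane', '#email': 'jane@example.com' }, 'button[type=submit]');`, with the selectors replaced by the ones on the actual form.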
Is there a way to get a screenshot of another website's pages?
For example: you enter a URL in an input, hit Enter, and a script gives you a screenshot of that site. I managed to do it with headless browsers, but I fear they could take too many resources and too much time to launch (say, PhantomJS), since each time the input is used the headless browser would need to fetch the new data. I investigated Hotjar, which does something similar to what I'm looking for: it gives you a script that you must put into the page header (which is fine by me), and afterwards you get a preview. How does it work, and how can one replicate it?
Do you want a screenshot of your own page or someone else's?
Own page
Use Puppeteer or PhantomJS with every build of your site; this way you only run it when the site changes and always have a screenshot ready.
Foreign page
You have access to it (the owner runs your script)
Either try to get into their build pipeline and use the solution above,
or use the solution from "Using HTML5/Canvas/JavaScript to take in-browser screenshots".
You don't have any access
Use a long-running process that gives you a screenshot when asked.
Imagine a server with one URL endpoint: screenshot.example.com?facebook.com.
The long-running server has a Puppeteer/PhantomJS instance ready to go; when given a URL, it loads that page, takes the screenshot and sends it back. From the browser's point of view, this is just a slow image request.
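A minimal sketch of the long-running part, assuming a Puppeteer-style browser object (`newPage`, `goto`, `screenshot`, `close`): the browser is launched once and reused for every request, which addresses the launch-time concern in the question. The `?facebook.com` query format follows the example endpoint; `targetFromQuery` and `createScreenshotService` are hypothetical names, and the HTTP server wiring is left out.

```javascript
// Hypothetical helper: turn "/?facebook.com" (the query string of the screenshot
// endpoint) into an absolute URL, defaulting to https:// when no scheme is given.
function targetFromQuery(requestUrl) {
  const query = decodeURIComponent(new URL(requestUrl, 'http://localhost').search.slice(1));
  if (!query) {
    throw new Error('No target URL in query string');
  }
  const withScheme = /^https?:\/\//i.test(query) ? query : `https://${query}`;
  return new URL(withScheme).href; // throws on malformed URLs
}

// Hypothetical factory: launch the browser once (lazily) and reuse it for every
// request, instead of paying the browser start-up cost per screenshot.
function createScreenshotService(launchBrowser) {
  let browserPromise = null;
  return async function screenshot(targetUrl) {
    if (!browserPromise) {
      // Reset on failure so a later request can retry the launch.
      browserPromise = Promise.resolve(launchBrowser()).catch((err) => {
        browserPromise = null;
        throw err;
      });
    }
    const browser = await browserPromise;
    const page = await browser.newPage(); // one tab per request
    try {
      await page.goto(targetUrl, { waitUntil: 'networkidle2' });
      return await page.screenshot(); // image bytes (PNG by default)
    } finally {
      await page.close(); // free the tab; the browser stays running
    }
  };
}
```

In a real server, `launchBrowser` would be `() => puppeteer.launch()`, and the returned bytes would be written to the HTTP response with `Content-Type: image/png`.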
You can build this with Puppeteer:
install with: npm i puppeteer
save the following code to example.js
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
and run it with:
node example.js
I'm trying to fetch an entire webpage using JavaScript by plugging in the URL. However, the website is built as a Single Page Application (SPA) that uses JavaScript/Backbone.js to dynamically load most of its contents after rendering the initial response.
So for example, when I route to the following address:
https://connect.garmin.com/modern/activity/1915361012
And then enter this into the console (after the page has loaded):
var $page = $("html")
console.log("%c✔: ", "color:green;", $page.find(".inline-edit-target.page-title-overflow").text().trim());
console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
Then I'll get the dynamically loaded activity title as well as the statically loaded page footer:
However, when I try to load the webpage via an AJAX call with either $.get() or .load(), I only get the initial response (the same content I see with view-source):
view-source:https://connect.garmin.com/modern/activity/1915361012
So if I use either of the following AJAX calls:
// jQuery.get()
var url = "https://connect.garmin.com/modern/activity/1915361012";
jQuery.get(url, function(data) {
  var $page = $("<div>").html(data);
  console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
  console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});

// jQuery.load()
var url = "https://connect.garmin.com/modern/activity/1915361012";
var $page = $("<div>");
$page.load(url, function(data) {
  console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
  console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});
I'll still get the initial footer, but won't get any of the other page contents:
I've tried the solution here to eval() the contents of every script tag, but that doesn't appear robust enough to actually load the page:
jQuery.get(url, function(data) {
  var $page = $("<div>").html(data);
  $page.find("script").each(function() {
    var scriptContent = $(this).html(); // Grab the content of this tag
    eval(scriptContent); // Execute the content
  });
  console.log("%c✖: ", "color:red;", $page.find(".page-title").text().trim());
  console.log("%c✔: ", "color:green;", $page.find("footer .details").text().trim());
});
Q: Are there any options to fully load a webpage so that it can be scraped with JavaScript?
You will never be able to fully replicate by yourself what an arbitrary (SPA) page does.
The only way I see is using a headless browser such as PhantomJS, Headless Chrome, or Headless Firefox.
I wanted to try Headless Chrome so let's see what it can do with your page:
Quick check using internal REPL
Load that page with Chrome Headless (you'll need Chrome 59 on Mac/Linux, Chrome 60 on Windows), and find page title with JavaScript from the REPL:
% chrome --headless --disable-gpu --repl https://connect.garmin.com/modern/activity/1915361012
[0830/171405.025582:INFO:headless_shell.cc(303)] Type a Javascript expression to evaluate or "quit" to exit.
>>> $('body').find('.page-title').text().trim()
{"result":{"type":"string","value":"Daily Mile - Round 2 - Day 27"}}
NB: to get the chrome command working on a Mac, I did this beforehand:
alias chrome="'/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'"
Using programmatically with Node & Puppeteer
Puppeteer is a Node library (by Google Chrome developers) which provides a high-level API to control headless Chrome over the DevTools Protocol. It can also be configured to use full (non-headless) Chrome.
(Step 0 : Install Node & Yarn if you don't have them)
In a new directory:
yarn init
yarn add puppeteer
Create index.js with this:
const puppeteer = require('puppeteer');

(async () => {
  const url = 'https://connect.garmin.com/modern/activity/1915361012';
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Go to URL and wait until the network is (almost) idle:
  // 'networkidle2' = no more than 2 connections for at least 500 ms
  await page.goto(url, {waitUntil: 'networkidle2'});

  // Wait for the dynamically rendered title to show up
  await page.waitForSelector('.page-title');

  // Extract the results from the page (this callback runs inside the browser)
  const text = await page.evaluate(() => {
    const title = document.querySelector('.page-title');
    return title.innerText.trim();
  });

  console.log(`Found: ${text}`);
  await browser.close();
})();
Result:
$ node index.js
Found: Daily Mile - Round 2 - Day 27
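`page.waitForSelector` works because Puppeteer keeps checking the live, script-executing page until the element exists, whereas `$.get()` only ever sees the initial HTML. The polling idea can be sketched in plain JavaScript; this `waitFor` helper is hypothetical and is not how Puppeteer implements it. It is only useful for code running inside the rendered page itself (the browser console or a `page.evaluate` callback), never for HTML fetched with AJAX, where the page's scripts do not run.

```javascript
// Hypothetical helper: call `predicate` every `interval` ms until it returns a
// truthy value (resolve with it) or `timeout` ms have passed (reject).
function waitFor(predicate, { timeout = 30000, interval = 100 } = {}) {
  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeout;
    const check = () => {
      let value;
      try {
        value = predicate();
      } catch (err) {
        reject(err); // a throwing predicate ends the wait immediately
        return;
      }
      if (value) {
        resolve(value);
      } else if (Date.now() >= deadline) {
        reject(new Error(`waitFor: timed out after ${timeout} ms`));
      } else {
        setTimeout(check, interval); // try again later
      }
    };
    check();
  });
}
```

In the browser console on the activity page this would be used as `waitFor(() => document.querySelector('.page-title')).then((el) => console.log(el.textContent.trim()));`.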
First off: avoid eval - your content security policy should block it and it leaves you open to easy XSS attacks. Scraping bots definitely won't run it.
The problem you're describing is common to all SPAs - when a person visits they get your app shell script, which then loads in the rest of the content - all good. When a bot visits they ignore the scripts and return the empty shell.
The solution is server-side rendering. If you're using a JS renderer (say React) and Node.js on the server, you can fairly easily run the same rendering code on the server and serve the resulting HTML.
However, if you aren't then you'll need to run a headless browser on your server that executes all the JS a user would and then serves up the result to the bot.
Fortunately someone else has already done all the work here. They've put a demo online that you can try out with your site:
First, it helps to understand the concept of an SPA.
SPA stands for Single Page Application: the server only delivers one static HTML file. When the route changes, the page creates or modifies DOM nodes dynamically with JavaScript to give the effect of switching pages.
Therefore, if you use $.get(), the server responds with that static HTML shell, so you won't get the content you want.
To get the full content, there are two ways. The first is a headless browser, for example Headless Chrome or PhantomJS: it loads the page for you, and you can then read the DOM nodes of the loaded page. The second is SSR (Server-Side Rendering): with SSR, $.get() gives you the page's HTML directly, because the server responds with the rendered HTML of the corresponding page for each route.
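To make the SSR option concrete, here is a minimal sketch with Node's built-in `http` module: the server puts the title into the HTML it sends, so a plain `$.get()` already finds `.page-title`. The `/activity/:id` route, `loadActivity` and `/app.js` are hypothetical stand-ins; frameworks such as Nuxt.js do the same job with real components.

```javascript
const http = require('node:http');

// Escape text before interpolating it into HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Server-side render: the title is already in the HTML the server sends, so a
// crawler or $.get() sees it without running any JavaScript. The client script
// (hypothetical /app.js) can still take over once it loads in a real browser.
function renderActivityPage(activity) {
  return `<!DOCTYPE html>
<html>
  <head><title>${escapeHtml(activity.title)}</title></head>
  <body>
    <h1 class="page-title">${escapeHtml(activity.title)}</h1>
    <script src="/app.js"></script>
  </body>
</html>`;
}

// Hypothetical data source; a real app would query its database or API here.
function loadActivity(id) {
  return { id, title: `Activity ${id}` };
}

function createApp() {
  return http.createServer((req, res) => {
    const match = /^\/activity\/(\d+)$/.exec(req.url);
    if (!match) {
      res.writeHead(404, { 'Content-Type': 'text/plain' });
      res.end('Not found');
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
    res.end(renderActivityPage(loadActivity(match[1])));
  });
}
```

`createApp().listen(3000)` starts it; requesting `http://localhost:3000/activity/1915361012` then returns HTML that already contains the `.page-title` element.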
Reference:
SSR
SSR framework for Vue: Nuxt.js
PhantomJS
Node API of Headless Chrome
I assume this question might have been asked before but after hours of searching, I haven't found anything satisfying.
Here's my question: Is it possible to screenshot a fully rendered web page using JavaScript? A little like what most browsers do on Windows when you press Ctrl+P.
I have looked into a lot of alternative solutions like html2Canvas.js
but none suits my needs. The biggest issue is that my web page is almost entirely rendered on the client side using JavaScript. This is also why server-side solutions like PhantomJS are hardly applicable.
I need the screenshots to be printed as image or PDF.
Any idea?
Thanks.
Have you looked into Puppeteer by Google?
If you're able to run it on your server, it might be exactly what you're looking for. See their example code:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
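Since the question needs an image or a PDF, Puppeteer can produce both from the same page: `page.screenshot({ fullPage: true })` captures the whole scrollable page, and `page.pdf()` prints it (PDF generation has traditionally required headless mode). A minimal sketch; the `captureAsImageAndPdf` helper and the file names are hypothetical, and `page` is assumed to come from `browser.newPage()` as in the example code.

```javascript
// Hypothetical helper: load `url`, wait until the network is quiet so that
// client-side rendering has most likely finished, then save both formats.
async function captureAsImageAndPdf(page, url, { pngPath, pdfPath }) {
  // 'networkidle0' = no network connections for at least 500 ms.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // fullPage: capture the entire scrollable page, not just the viewport.
  await page.screenshot({ path: pngPath, fullPage: true });
  // printBackground: keep CSS backgrounds, which print output drops by default.
  await page.pdf({ path: pdfPath, format: 'A4', printBackground: true });
}
```

It would be called as `await captureAsImageAndPdf(page, 'https://example.com', { pngPath: 'page.png', pdfPath: 'page.pdf' });` before `browser.close()`.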
Is it possible to take a screenshot of a webpage with JavaScript and then submit that back to the server?
I'm not so concerned with browser security issues etc., as the implementation would be for an HTA. But is it possible?
Google is doing this in Google+ and a talented developer reverse engineered it and produced http://html2canvas.hertzen.com/ . To work in IE you'll need a canvas support library such as http://excanvas.sourceforge.net/
I have done this for an HTA by using an ActiveX control. It was pretty easy to build the control in VB6 to take the screenshot. I had to use the keybd_event API call because SendKeys can't do PrintScreen. Here's the code for that:
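Declare Sub keybd_event Lib "user32" _
    (ByVal bVk As Byte, ByVal bScan As Byte, ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

Public Const CaptWindow = 2

' Simulates Alt+PrintScreen: &H12 = VK_MENU (Alt), &H2C = VK_SNAPSHOT (Print Screen),
' dwFlags &H2 = KEYEVENTF_KEYUP (key release)
Public Sub ScreenGrab()
    keybd_event &H12, 0, 0, 0              ' Alt down
    keybd_event &H2C, CaptWindow, 0, 0     ' Print Screen down
    keybd_event &H2C, CaptWindow, &H2, 0   ' Print Screen up
    keybd_event &H12, 0, &H2, 0            ' Alt up
End Sub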
That only gets you as far as getting the window to the clipboard.
Another option, if the window you want a screenshot of is an HTA would be to just use an XMLHTTPRequest to send the DOM nodes to the server, then create the screenshots server-side.
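If the server runs Node.js (an assumption; the answer doesn't say what the server uses), the DOM sent from the HTA could be rendered with Puppeteer's `page.setContent()` and then captured. Relative URLs in that HTML would not resolve on their own, so this sketch injects a `<base>` tag first. `withBaseHref` and `renderSnapshot` are hypothetical helpers, `browser` is assumed to come from `puppeteer.launch()`, and the result is Chrome's rendering, which may differ from what the HTA's IE engine displayed.

```javascript
// Hypothetical helper: insert <base href> right after the opening <head> tag
// (or at the start if there is none) so relative URLs resolve against baseUrl.
function withBaseHref(html, baseUrl) {
  const tag = `<base href="${baseUrl.replace(/"/g, '&quot;')}">`;
  const headOpen = /<head(\s[^>]*)?>/i; // matches <head> but not <header>
  return headOpen.test(html)
    ? html.replace(headOpen, (match) => match + tag)
    : tag + html;
}

// Hypothetical helper: render posted HTML in a fresh tab and screenshot it.
async function renderSnapshot(browser, html, baseUrl) {
  const page = await browser.newPage();
  try {
    // Wait for the load event so images and stylesheets are in place.
    await page.setContent(withBaseHref(html, baseUrl), { waitUntil: 'load' });
    return await page.screenshot({ fullPage: true });
  } finally {
    await page.close();
  }
}
```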
Another possible solution that I've discovered is http://www.phantomjs.org/ which allows one to very easily take screenshots of pages and a whole lot more. Whilst my original requirements for this question aren't valid any more (different job), I will likely integrate PhantomJS into future projects.
I wonder whether this is possible by drawing all of the body's elements into a canvas and then using canvas2image?
http://www.nihilogic.dk/labs/canvas2image/
A possible way to do this, if you're running on Windows and have .NET installed, is:
// Requires: using System.Drawing; using System.Windows.Forms;
// The WebBrowser control must be created on an STA thread.
public Bitmap GenerateScreenshot(string url)
{
    // This method gets a screenshot of the webpage
    // rendered at its full size (height and width)
    return GenerateScreenshot(url, -1, -1);
}

public Bitmap GenerateScreenshot(string url, int width, int height)
{
    // Load the webpage into a WebBrowser control
    WebBrowser wb = new WebBrowser();
    wb.ScrollBarsEnabled = false;
    wb.ScriptErrorsSuppressed = true;
    wb.Navigate(url);
    while (wb.ReadyState != WebBrowserReadyState.Complete) { Application.DoEvents(); }

    // Set the size of the WebBrowser control;
    // -1 means "use the page's full scroll width/height"
    wb.Width = (width == -1) ? wb.Document.Body.ScrollRectangle.Width : width;
    wb.Height = (height == -1) ? wb.Document.Body.ScrollRectangle.Height : height;

    // Get a Bitmap representation of the webpage as it's rendered in the WebBrowser control
    Bitmap bitmap = new Bitmap(wb.Width, wb.Height);
    wb.DrawToBitmap(bitmap, new Rectangle(0, 0, wb.Width, wb.Height));
    wb.Dispose();
    return bitmap;
}
And then via PHP you can do:
exec("CreateScreenShot.exe -url http://.... -save C:/shots domain_page.png");
Then you have the screenshot in the server side.
This might not be the ideal solution for you, but it might still be worth mentioning.
Snapsie is an open-source ActiveX object that enables Internet Explorer screenshots to be captured and saved. Once the DLL file is registered on the client, you should be able to capture the screenshot and upload the file to the server from within JavaScript. Drawbacks: the DLL file must be registered on the client, and it only works with Internet Explorer.
We had a similar requirement for reporting bugs. Since it was for an intranet scenario, we were able to use browser addons (like Fireshot for Firefox and IE Screenshot for Internet Explorer).
This question is old but maybe there's still someone interested in a state-of-the-art answer:
You can use getDisplayMedia:
https://github.com/ondras/browsershot
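`getDisplayMedia` asks the user to pick a screen, window or tab and returns a live video stream; drawing one frame of it onto a canvas gives an actual screenshot, unlike DOM-based libraries. A minimal browser sketch (the `captureDisplayFrame` name is hypothetical): it must be triggered by a user action such as a click, the user has to approve the share prompt, and `mediaDevices`/`doc` are parameters only so the function can be tried with stand-ins.

```javascript
// Hypothetical helper: capture one frame of a user-selected screen/window/tab.
// Must run in a browser, in response to a user gesture (e.g. a click handler).
async function captureDisplayFrame(mediaDevices = navigator.mediaDevices, doc = document) {
  // Prompts the user to choose what to share; rejects if they cancel.
  const stream = await mediaDevices.getDisplayMedia({ video: true });
  try {
    // Play the stream in an off-screen <video> so a frame can be drawn.
    const video = doc.createElement('video');
    video.srcObject = stream;
    video.muted = true;
    await video.play(); // once playing, videoWidth/videoHeight are known

    // Copy the current frame onto a canvas of the same size.
    const canvas = doc.createElement('canvas');
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    canvas.getContext('2d').drawImage(video, 0, 0);

    // Encode as PNG; toBlob is callback-based, so wrap it in a Promise.
    return await new Promise((resolve) => canvas.toBlob(resolve, 'image/png'));
  } finally {
    // Stop sharing right away so the browser's sharing indicator goes away.
    stream.getTracks().forEach((track) => track.stop());
  }
}
```

A typical call site is a click handler, e.g. `button.addEventListener('click', async () => { const blob = await captureDisplayFrame(); /* upload or display the blob */ });`.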
SnapEngage uses a Java applet (1.5+) to take a browser screenshot. AFAIK, java.awt.Robot should do the job; the user just has to permit the applet to do it (once).
And I have just found a post about it:
Stack Overflow question JavaScript code to take a screenshot of a website without using ActiveX
Blog post How SnapABug works – and what they should do
I found that dom-to-image did a good job (much better than html2canvas). See the following question & answer: https://stackoverflow.com/a/32776834/207981
This question asks about submitting this back to the server, which should be possible, but if you're looking to download the image(s) you'll want to combine it with FileSaver.js, and if you want to download a zip with multiple image files all generated client-side take a look at jszip.
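For the original goal of submitting the image back to the server, a minimal sketch, assuming dom-to-image is loaded via a `<script>` tag (it exposes `domtoimage.toBlob(node)`) and a hypothetical `/upload-screenshot` endpoint that accepts a multipart field named `screenshot`. The `toBlob` and `fetchImpl` options exist only so the function can be swapped or tested.

```javascript
// Hypothetical helper: render `node` to a PNG Blob and POST it as multipart form data.
// Defaults assume dom-to-image's global `domtoimage` and the browser's `fetch`.
async function uploadScreenshot(node, {
  toBlob = (n) => domtoimage.toBlob(n),
  fetchImpl = (...args) => fetch(...args),
  endpoint = '/upload-screenshot', // hypothetical server endpoint
} = {}) {
  const blob = await toBlob(node);
  const form = new FormData();
  form.append('screenshot', blob, 'screenshot.png'); // field name + filename
  const response = await fetchImpl(endpoint, { method: 'POST', body: form });
  if (!response.ok) {
    throw new Error(`Upload failed: HTTP ${response.status}`);
  }
  return response;
}
```

In the page it would be called as `uploadScreenshot(document.body)`; the server then reads the `screenshot` file field like any other file upload.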
You can achieve that using HTA and VBScript: just call an external tool to do the screenshotting. I forgot its name, but Windows Vista ships with a screenshot tool, so you don't even need an extra install.
As for automating it, that depends entirely on the tool you use. If it has an API, I am sure you can trigger the screenshot and saving process through a couple of Visual Basic calls without the user noticing.
Since you mentioned HTA, I am assuming you are on Windows and (probably) know your environment (e.g. OS and version) very well.
If you are willing to do it on the server side, there are options like PhantomJS, which is now deprecated. The best way to go would be Headless Chrome with something like Puppeteer on Node.js. Capturing a web page with Puppeteer is as simple as:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  await page.screenshot({path: 'example.png'});
  await browser.close();
})();
However, this requires headless Chrome to be able to run on your servers, which has some dependencies and might not be suitable for restricted environments. (Also, if you are not using Node.js, you might need to handle installing/launching browsers yourself.)
If you are willing to use a SaaS service, there are many options, such as:
Restpack
UrlBox
Screenshot Layer
A great solution for screenshot taking in Javascript is the one by https://grabz.it.
They have a flexible and simple-to-use screenshot API which can be used by any type of JS application.
If you want to try it, first get the authorization app key + secret and the free SDK.
Then, in your app, the implementation steps would be:
<!-- include the grabzit.min.js library in the web page where you want the capture to appear -->
<script src="grabzit.min.js"></script>
<!-- use the key and the secret to log in and capture the URL -->
<script>
GrabzIt("KEY", "SECRET").ConvertURL("http://www.google.com").Create();
</script>
The screenshot can be customized with different parameters. For example:
<script>
GrabzIt("KEY", "SECRET").ConvertURL("http://www.google.com",
    {"width": 400, "height": 400, "format": "png", "delay": 10000}).Create();
</script>
That's all.
Then simply wait a short while and the image will automatically appear at the bottom of the page, without you needing to reload the page.
The screenshot mechanism has other features, which you can explore here.
It's also possible to save the screenshot locally. For that you will need to use the GrabzIt server-side API. For more info, check the detailed guide here.
As of April 2020, there is the GitHub library html2canvas:
https://github.com/niklasvh/html2canvas
(At the time: about 20K GitHub stars, a passing Azure Pipelines build, and about 1.3M downloads per month.)
Quote: "JavaScript HTML renderer. The script allows you to take 'screenshots' of webpages or parts of it, directly on the user's browser. The screenshot is based on the DOM and as such may not be 100% accurate to the real representation as it does not make an actual screenshot, but builds the screenshot based on the information available on the page."
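html2canvas returns a promise for a `<canvas>` (`html2canvas(element).then(canvas => ...)`), which can be turned into a PNG download. A minimal sketch; `downloadScreenshot` is a hypothetical helper, and `render`/`doc` are parameters only so it can be run with stand-ins outside a browser. The caveat from the quote still applies: the image is rebuilt from the DOM, not captured from the screen.

```javascript
// Hypothetical helper: render `element` with html2canvas and download it as PNG.
async function downloadScreenshot(element, {
  render = (el) => html2canvas(el), // html2canvas global from its <script> tag
  doc = document,
  filename = 'screenshot.png',
} = {}) {
  const canvas = await render(element);
  const link = doc.createElement('a');
  link.href = canvas.toDataURL('image/png'); // PNG encoded as a data: URL
  link.download = filename; // save the file instead of navigating to it
  doc.body.appendChild(link); // some browsers have required an attached link
  link.click();
  link.remove();
  return link;
}
```

Calling `downloadScreenshot(document.body)` from a button handler saves the rendered page as `screenshot.png`.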
I made a simple function that uses rasterizeHTML to build a svg and/or an image with page contents.
Check it out:
https://github.com/orisha/tdg-screen-shooter-pure-js