I'm trying to record a webpage using Node.js. I am using PhantomJS to take screenshots of the page and ffmpeg to convert them to video. The problem is that the page uses 3D transform CSS, and PhantomJS does not support 3D transforms (http://phantomjs.org/supported-web-standards.html), so everything appears static.
Is there any alternative to PhantomJS that supports 3D transforms? Or maybe a different approach?
It doesn't have to be Node.js; other languages like Python work too.
Here's the code I'm using right now:
var page = require("webpage").create();
page.viewportSize = { width: 500, height: 860 };

page.open("pageurl", function() {
    // Initial frame
    var frame = 0;
    // Render a frame every 25 milliseconds
    setInterval(function() {
        // Render an image with the frame number in the name
        page.render("frames/dragon" + frame++ + ".png", { format: "png" });
        // Exit after 100 frames
        if (frame > 100) {
            phantom.exit();
        }
    }, 25);
});
PhantomJS is no longer maintained, so I would suggest using something like Puppeteer, which uses headless Chrome and gives you the ability to do what you're requesting.
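For reference, a minimal sketch of the same frame-capture loop in Puppeteer might look like this (the URL, frame count, and 25 ms interval are carried over from your script; note that page.screenshot itself takes time, so the real interval between frames will be longer):

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setViewport({ width: 500, height: 860 });
    await page.goto('pageurl');
    // Capture 100 frames, roughly 25 ms apart
    for (let frame = 0; frame <= 100; frame++) {
        await page.screenshot({ path: 'frames/dragon' + frame + '.png' });
        await new Promise(resolve => setTimeout(resolve, 25));
    }
    await browser.close();
})();

The resulting frames can then be stitched into a video with ffmpeg, just as before.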
OBS supports this by way of the Chromium Embedded Framework. There is an API for OBS, or you can use CEF directly.
An alternative method that I use is the Tab Capture API, by way of a browser extension, sketched below.
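As a rough sketch of the extension approach (this assumes a Chrome extension that declares the "tabCapture" permission in its manifest; it is illustrative, not a complete extension):

// Runs in the extension's background context in response to a user action
chrome.tabCapture.capture({ audio: false, video: true }, (stream) => {
    // stream is a MediaStream of the current tab; record it with MediaRecorder
    const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });
    const chunks = [];
    recorder.ondataavailable = (e) => chunks.push(e.data);
    recorder.onstop = () => {
        const blob = new Blob(chunks, { type: 'video/webm' });
        // e.g. download the blob or upload it to a server
    };
    recorder.start();
});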
Have you tried Puppeteer? It uses Chrome and it's super fast.
There's a simple example in their README:
const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://example.com');
    await page.screenshot({ path: 'example.png' });
    await browser.close();
})();
Run this code here: https://try-puppeteer.appspot.com/
You can also find more information about the screenshot command in the docs: https://github.com/GoogleChrome/puppeteer/blob/v1.15.0/docs/api.md#pagescreenshotoptions
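For example, page.screenshot accepts options such as fullPage and clip, so you can capture the whole scrollable page or just a region:

// Capture the full scrollable page rather than just the viewport
await page.screenshot({ path: 'full.png', fullPage: true });

// Capture only a 200x100 region starting at the top-left corner
await page.screenshot({ path: 'region.png', clip: { x: 0, y: 0, width: 200, height: 100 } });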
Good luck!
I am currently developing a Node.js script that needs to launch a headful Chromium instance using Puppeteer and then take a screenshot of a page every 3 seconds. This is my code:
const puppeteer = require('puppeteer');

async function init() {
    // Headful launch: the user should still be able to use the browser
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();
    await page.goto('https://example.com');
    screenshot(page);
}

async function screenshot(page) {
    let buffer = await page.screenshot();
    let imageBuffer = buffer.toString('base64');
    // save imageBuffer to database
    setTimeout(screenshot, 3000, page);
}

init();
My current issue is that I need the user to still be able to navigate normally in the browser and on his computer, but this is impossible because:
The page lags when the screenshot is taken, as you can see in the following video: https://youtu.be/Tl2w-qKckkc
The browser window takes focus and goes on top of all other windows when the screenshot is taken.
I also tried using Playwright, but the same bug occurs when using it with Chromium. Can someone please help?
In Playwright, do the following:
// Affects all the platforms.
const page = await browser.newPage({ viewport: null });
// Local fix for those using Apple hardware with Retina displays.
const page = await browser.newPage({ deviceScaleFactor: 2 });
I posted a detailed reply at https://github.com/microsoft/playwright/issues/2576. Please feel free to follow up and ask questions / request features there!
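Putting that together, a minimal runnable sketch for Chromium might look like this (headful launch assumed, since the user should keep working in the browser):

const { chromium } = require('playwright');

(async () => {
    const browser = await chromium.launch({ headless: false });
    // viewport: null stops Playwright from forcing a fixed viewport size
    const page = await browser.newPage({ viewport: null });
    await page.goto('https://example.com');
    await page.screenshot({ path: 'example.png' });
    await browser.close();
})();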
Using Node.js, Chrome and Puppeteer headless on an Ubuntu server, I'm scraping a few different websites. One occasional task is to interact with the loaded page (click on a link to open another page and then possibly another click to accept the terms and such).
I can do all this just fine, but I'm trying to understand how it will work if I have multiple pages open simultaneously and am trying to interact with different loaded pages at the same time (overlapping times).
To visualize this, I'm thinking of how a user would do the same job. They'd have to open multiple browser windows, open the page in each, and switch between them to see and then click on links.
But using Puppeteer, we have a separate browser object; we don't need to see the window or page to know where to click. We can traverse it through the browser object and then click the desired element without looking (headless).
I'm thinking I should be able to handle multiple pages at the same time as long as I have the CPU and memory to handle them.
Does anyone have experience with Puppeteer interacting with multiple websites simultaneously? Anything I need to watch out for?
This is the problem the library puppeteer-cluster (I'm the author) addresses. It allows you to build a pool of pages (or browsers) to use and run tasks in.
You'll find several general code samples in the repository (and also on Stack Overflow). Let me address your specific use case of running different tasks with an example.
Code Sample
The following code creates two tasks:
crawl: Opens the page and extracts a URL, then starts the second task
screenshot: Takes a screenshot of the extracted URL
The process is started by queuing the crawl task with the URLs.
const { Cluster } = require('puppeteer-cluster');

(async () => {
    const cluster = await Cluster.launch({ // use four pages in parallel
        concurrency: Cluster.CONCURRENCY_PAGE,
        maxConcurrency: 4,
    });

    // We define two tasks
    const crawl = async ({ page, data: url }) => {
        await page.goto(url);
        const extractedURL = /* ... */; // extract a URL (or multiple) from the document somehow
        cluster.queue(extractedURL, screenshot);
    };
    const screenshot = async ({ page, data: url }) => {
        await page.goto(url);
        await page.screenshot();
    };

    // Crawl some pages
    cluster.queue('https://www.google.com/', crawl);
    cluster.queue('https://github.com/', crawl);

    // Wait until everything is done and close the cluster
    await cluster.idle();
    await cluster.close();
})();
This is a minimal example. I left out error handling, monitoring and the setup options.
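For instance, the cluster emits a taskerror event, so a minimal error handler (a sketch; adapt the logging to your needs) could look like:

// Log failed tasks instead of letting errors go unnoticed
cluster.on('taskerror', (err, data) => {
    console.log('Error crawling ' + data + ': ' + err.message);
});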
I can usually get 5 or so browsers going on a 4GB server. If you're just popping URLs off a queue, it's pretty straightforward:
const puppeteer = require('puppeteer');

let queue = [
    'http://www.amazon.com',
    'http://www.google.com',
    'http://www.facebook.com',
    'http://www.reddit.com',
];

const doQueue = async () => {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    let url;
    while (url = queue.shift()) {
        await page.goto(url);
        console.log(await page.title());
    }
    await browser.close();
};

[1, 2, 3].map(() => doQueue());
What I would like to do is load a page, get the content of something through an XPath, selector, or JS path, and then use that value in my program. How could I do that?
For instance, on this page, doing a request using the URL of the page and following this path (while also targeting the type somehow; here it is the class):
//*[@id="question-header"]/h1/a
would give me 'Load any url content and follow XPATH in JS',
as I am getting the text inside this:
Load any url content and follow XPATH in JS
If you need the most reliable way to get data from a web page, including data generated by JavaScript execution on the client side, you can use a headless browser. For example, the described task can be accomplished with Node.js and Puppeteer in the script below (selectors and XPath are supported, as well as the entire Web API, via evaluating code fragments in the browser context and exchanging data between the Node.js and browser contexts):
'use strict';

const puppeteer = require('puppeteer');

(async function main() {
    try {
        const browser = await puppeteer.launch();
        const [page] = await browser.pages();
        await page.goto('https://stackoverflow.com/questions/54847748/load-any-url-content-and-follow-xpath-in-js');
        const data = await page.evaluate(() => {
            return document.querySelector('#question-header > h1 > a').innerText;
        });
        console.log(data);
        await browser.close();
    } catch (err) {
        console.error(err);
    }
})();
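If you specifically want to use the XPath expression from the question, Puppeteer of that era also exposes page.$x; a sketch of the relevant lines:

// Evaluate the XPath and read the text of the first matching element
const [link] = await page.$x('//*[@id="question-header"]/h1/a');
const data = await page.evaluate(el => el.innerText, link);
console.log(data);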
Well, you could use something like:
document.getElementById('question-header').children[0].children[0].innerText;
It's not as dynamic as XPath (hence the redundancy of the children calls), but it should do the trick if you're facing a static structure. For Node.js there are several libraries that can do this as well, such as libxmljs or parse5; more on this here.
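For the static-HTML route, a rough sketch with libxmljs could look like this (it assumes the content is server-rendered and uses node-fetch to download the HTML; both package choices are illustrative):

const fetch = require('node-fetch');
const libxmljs = require('libxmljs');

(async () => {
    const res = await fetch('https://stackoverflow.com/questions/54847748/load-any-url-content-and-follow-xpath-in-js');
    const doc = libxmljs.parseHtml(await res.text());
    // Apply the XPath from the question directly to the parsed document
    const node = doc.get('//*[@id="question-header"]/h1/a');
    console.log(node.text());
})();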
I am trying to use Puppeteer to measure how fast a set of web sites loads in my environment. My focus is on the quality of the network connection and network speed, so I am happy to know the time taken for a page to load, for a layman's definition of load: when all images and HTML have been downloaded by the browser.
By using Puppeteer I can run the test repeatedly and measure the difference in load times precisely.
I can see that in 64.0.3240.0 (r508693) page.getMetrics and the 'metrics' event have landed, which should help me get what I am looking for.
But being a newbie in Node and JS, I am not sure how to read page.getMetrics or which of the key/value pairs gives useful information in my context.
My current pathetic attempt at reading metrics is as follows:
const puppeteer = require('puppeteer');

async function run() {
    const browser = await puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] });
    const page = await browser.newPage();

    page.on('load', () => console.log("Loaded: " + page.url()));

    await page.goto('https://google.com');
    const metrics = page.getMetrics();
    console.log(metrics.Documents, metrics.Frames, metrics.JSEventListeners);

    await page.goto('https://yahoo.com');
    await page.goto('https://bing.com');
    await page.goto('https://github.com/login');

    browser.close();
}

run();
Any help in getting this code to something more respectable is much appreciated :)
In recent versions you have page.metrics() available.
It will return an object with a bunch of numbers, including:
Timestamp: when the metrics sample was taken
LayoutDuration: combined duration of all page layouts
TaskDuration: combined duration of all tasks performed by the browser
Check out the docs for the full list.
You can use it like this:
await page.goto('https://github.com/login');
const gitMetrics = await page.metrics();
console.log(gitMetrics.Timestamp);
console.log(gitMetrics.TaskDuration);
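Since the goal is wall-clock load time, it may be simpler to read the browser's Navigation Timing data once the load event has fired; a sketch using the long-standing performance.timing API:

await page.goto('https://github.com/login', { waitUntil: 'load' });
// Pull Navigation Timing values out of the page context
const timing = await page.evaluate(() => {
    const t = window.performance.timing;
    return { navigationStart: t.navigationStart, loadEventEnd: t.loadEventEnd };
});
console.log('Load time (ms):', timing.loadEventEnd - timing.navigationStart);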
I'm looking for a way to take a screenshot of a long web page every time it changes. I would like to use Node.js for this. My question is about how to render the full page with images and save it to disk as an image file.
Most images on the web page are lazy loaded, so I guess I need to scroll down the entire page first, before taking a screenshot.
I tried different tools:
casperjs
node-webshot
phantomjs
All of them seem way too complicated, if not impossible, to even install. I didn't succeed with any of them.
casperjs seems like a really nice choice, but I can't get it to work within Node.js. It keeps complaining that casper.start() is not a valid method...
I got closest with node-webshot, but I did not manage to scroll down the page.
This is my code so far:
var webshot = require('webshot');

var options = {
    streamType: 'jpg',
    shotSize: {
        height: 'all'
    }
};

webshot('www.xx.com', 'xx.com.jpg', options, function(err) {
    // screenshot saved to 'xx.com.jpg'
});
BTW, I'm developing on a Mac; the finished Node app will run on a Linux server.
Any comments or experiences are appreciated!
Can't really help with installing CasperJS, since on Windows it works by simply running npm install casperjs -g.
I've put together a simple script to do screenshots:
var casper = require('casper').create();
casper.options.viewportSize = { width: 1600, height: 950 };

var wait_duration = 5000;
var url = 'http://stackoverflow.com/questions/33803790/capture-screen-shot-of-lazy-loaded-page-with-node-js';

console.log("Starting");

casper.start(url, function() {
    this.echo("Page loaded");
});

casper.then(function() {
    this.scrollToBottom();
    casper.wait(wait_duration, function() {
        casper.capture('screen.jpg');
        this.echo("Screen captured");
    });
});

casper.then(function() {
    this.echo("Exiting");
    this.exit();
});

casper.run();
The code is fairly straightforward:
Load the URL
Scroll to the bottom
Wait for a specific duration (wait_duration) for things to load
Take a screenshot
End
Hopefully, that works for you!
This code works for me with Node on macOS; save it as test.js and run node test.js from the CLI.
var webshot = require('webshot');

var options = {
    streamType: 'png',
    windowSize: {
        width: 1024,
        height: 768
    },
    shotSize: {
        width: 'all',
        height: 'all'
    }
};

webshot("blablabla.com", "bla-image.png", options, (err) => {
    if (err) {
        return console.log(err);
    }
    console.log('image saved successfully');
});
You can automate it via Selenium, http://webdriver.io/. Yes, it's mostly a testing engine, not a screenshot application, but you get full control over the browser automation and can watch the browser on your display while debugging. The steps are roughly as follows (a sketch follows below):
Start a Selenium server with, for example, Google Chrome
Load your page
Do the scrolling, clicking, everything else with webdriver.io
Take a picture when you think it's a good time
Close the session
A fast way to install Selenium with Node.js: https://github.com/vvo/selenium-standalone
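A rough sketch of those steps with webdriver.io (this assumes a Selenium server already running on localhost:4444 with Chrome available; the URL and wait time are placeholders):

const { remote } = require('webdriverio');

(async () => {
    const browser = await remote({
        hostname: 'localhost',
        port: 4444,
        capabilities: { browserName: 'chrome' },
    });
    await browser.url('http://www.example.com');
    // Scroll to the bottom so lazy-loaded images start loading
    await browser.execute(() => window.scrollTo(0, document.body.scrollHeight));
    await browser.pause(5000); // give the images time to load
    await browser.saveScreenshot('./screen.png');
    await browser.deleteSession();
})();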