Capture screen shot of lazy loaded page with Node.js - javascript

I'm looking for a way to take a screenshot of a long web page every time it changes, and I would like to use Node.js for this. My question is how to render the full page, including images, and save it to disk as an image file.
Most images on the page are lazy loaded, so I guess I need to scroll down the entire page before taking the screenshot.
I tried different tools:
casperjs
node-webshot
phantomjs
All of them seem far too complicated, if not impossible, to even install. I didn't succeed with any of them.
casperjs seems like a really nice choice, but I can't get it to work within Node.js. It keeps complaining that casper.start() is not a valid method...
I got closest with node-webshot, but I did not manage to scroll down the page.
This is my code so far:
var webshot = require('webshot');
var options = {
    shotSize: {
        height: 'all',
        streamType: 'jpg'
    }
};
webshot('www.xx.com', 'xx.com.jpg', options, function(err) {
    // screen shot saved to 'xx.com.jpg'
});
BTW I'm developing on a mac. The finished Node app will be on a linux server.
Any comments or experiences are appreciated!

Can't really help with installing CasperJS since on Windows it works by simply using npm install casperjs -g.
I've put up a simple script to do screenshots:
var casper = require('casper').create();
casper.options.viewportSize = {width: 1600, height: 950};
var wait_duration = 5000;
var url = 'http://stackoverflow.com/questions/33803790/capture-screen-shot-of-lazy-loaded-page-with-node-js';
console.log("Starting");
casper.start(url, function() {
    this.echo("Page loaded");
});
casper.then(function() {
    this.scrollToBottom();
    casper.wait(wait_duration, function() {
        casper.capture('screen.jpg');
        this.echo("Screen captured");
    });
});
casper.then(function() {
    this.echo("Exiting");
    this.exit();
});
casper.run();
The code is fairly straightforward:
Load the url
Scroll to the bottom
Wait for a specific duration (wait_duration) for stuff to load
Do a screenshot
End
Hopefully, that works for you!
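A single jump to the bottom may leave lazy-loaded images in the middle of a long page untouched, so one variation is to scroll in viewport-sized steps and pause at each one. The sketch below only covers the step calculation. `scrollOffsets` is a hypothetical helper, not part of CasperJS, and the page height is an assumed value that would have to be read from the page itself (for example `document.body.scrollHeight` inside `casper.evaluate`).

```javascript
// Hypothetical helper (not a CasperJS API): compute the vertical offsets to
// visit so that every part of the page enters the viewport at least once.
function scrollOffsets(pageHeight, viewportHeight) {
    if (pageHeight <= 0 || viewportHeight <= 0) {
        return [];
    }
    var offsets = [];
    // Last useful scroll position: the viewport bottom touches the page bottom.
    var lastOffset = Math.max(pageHeight - viewportHeight, 0);
    for (var y = 0; y < lastOffset; y += viewportHeight) {
        offsets.push(y);
    }
    offsets.push(lastOffset); // always finish exactly at the bottom
    return offsets;
}

// Example with the 950px viewport used above and an assumed 3000px page.
console.log(scrollOffsets(3000, 950)); // [ 0, 950, 1900, 2050 ]
```

Each offset could then be applied with `window.scrollTo(0, y)` inside `casper.evaluate`, followed by a `casper.wait`, before the final `casper.capture`.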

This code works for me with Node on OS X. Save it as test.js and run node test.js from the command line:
var webshot = require('webshot');
var options = {
    streamType: 'png',
    windowSize: {
        width: 1024,
        height: 768
    },
    shotSize: {
        width: 'all',
        height: 'all'
    }
};
webshot("blablabla.com", "bla-image.png", options, (err) => {
    if (err) {
        return console.log(err);
    }
    console.log('image saved successfully');
});

You can automate this with Selenium through http://webdriver.io/. It is primarily a testing tool rather than a screenshot application, but it gives you full control over the browser, and you can watch the browser on your display while debugging:
Start a Selenium server with, for example, Google Chrome
Load your page
Do the scrolling, clicking, and anything else with webdriver.io
Take a screenshot when the page is in the state you want
Close the session
A fast way to install Selenium with Node.js: https://github.com/vvo/selenium-standalone

Related

Google VR View for the Web

I'm trying to embed some 360 images on my site using Google VR View, but I'm having no luck getting anything to work. I'm following the Google provided documentation as a guide...
https://developers.google.com/vr/concepts/vrview-web
window.addEventListener('load', onVrViewLoad);
function onVrViewLoad() {
    var vrView = new VRView.Player('#vrview', {
        image: 'img/jtree.jpg',
        is_stereo: false
    });
}
I copied the example code, and am getting errors in the console (see attached screen shots)
Console Errors
Does anyone know of a tutorial that would better outline how to use this? Or possibly can someone shed some light on what I may be doing incorrectly?
You have to serve your HTML file from a web server instead of opening it directly from disk.
Enable CORS on that server: https://enable-cors.org/server.html.
For experimenting, I found the Chrome web server an easy way to enable CORS.
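If you serve the files from your own Node.js process, enabling CORS comes down to adding one response header. This is a minimal sketch, assuming a wildcard origin is acceptable for local experiments. `withCors`, `handleRequest`, and `startServer` are illustrative names, and the handler answers with a fixed body instead of reading the image files from disk.

```javascript
var http = require('http');

// Return a copy of the given response headers with a permissive CORS header.
// '*' allows any origin, which is only reasonable for local experiments.
function withCors(headers) {
    var result = { 'Access-Control-Allow-Origin': '*' };
    Object.keys(headers).forEach(function (name) {
        result[name] = headers[name];
    });
    return result;
}

// Placeholder request handler: a real one would stream the requested file.
function handleRequest(req, res) {
    res.writeHead(200, withCors({ 'Content-Type': 'text/plain' }));
    res.end('ok');
}

// Not called here so the sketch exits immediately; the port is an assumption.
function startServer(port) {
    return http.createServer(handleRequest).listen(port);
}
```

Calling `startServer(8080)` would make the handler reachable at http://localhost:8080/.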
I had absolutely no luck getting this to work with the instructions provided by Google - guess I'm not versed enough in coding. For me it only worked when I used the iframe, see https://www.museum-joanneum.at/spielwiese/360.
However, the view is still not exactly the same, the info-Tag is not layered over the image in the left lower corner like the demo on Google, but on top of the image and reduces the image height by about 30 pixels. Maybe that's related to the iframe since the instructions state, that the functionality isn't exactly the same as with the JavaScript API.
Also, for my images I had to select "false" for stereo in order to display correctly.
I hope this helps!
Looks like you are not setting up the directories properly. Is your web server set up so the root is the root of the repo? Are you also getting a 404 error? (looks like vrview.js is not being loaded)
As for places to get help with this, I recommend the vrview-web google group.
You need to add an element with this id to the web page:
<div id="vrview"></div>
The JavaScript below loads the image into that element:
var vrView;
var scenes = {
    petra: {
        image: 'images/petra.jpg',
        preview: 'images/petra-preview.jpg'
    }
};
function onLoad() {
    vrView = new VRView.Player('#vrview', {
        width: '100%',
        height: 480,
        image: 'images/blank.png',
        is_stereo: false,
        is_autopan_off: true
    });
    vrView.on('ready', onVRViewReady);
    vrView.on('modechange', onModeChange);
    vrView.on('getposition', onGetPosition);
    vrView.on('error', onVRViewError);
}
function loadScene(id) {
    console.log('loadScene', id);
    // Set the image
    vrView.setContent({
        image: scenes[id].image,
        preview: scenes[id].preview,
        is_autopan_off: true
    });
}
function onVRViewReady(e) {
    console.log('onVRViewReady');
    loadScene('petra');
}
function onModeChange(e) {
    console.log('onModeChange', e.mode);
}
function onVRViewError(e) {
    console.log('Error! %s', e.message);
}
function onGetPosition(e) {
    console.log(e);
}
window.addEventListener('load', onLoad);
These scripts only work when the page is served by a web server; otherwise the texture will not render. You can either put all your files in your WAMP server path and open them through the server, or create a web project in ASP.NET, add your files, and build the project, in which case Visual Studio takes care of the hosting.
For example:
Download this sample code
Add it to your WAMP server path, or create a project in Visual Studio and add these files
Open your index.html file in the browser

screen shot with phantomjs is not accurate as its seeing in browser

Hi, I am working with PhantomJS to capture a screenshot from a URL, but the result does not seem accurate.
PhantomJS version: 1.9.8, operating system: Ubuntu 14
With the code below I tried to capture a screenshot from a URL, but the result does not match the page.
Or am I doing something wrong?
Compare the header of the website with the screenshot: they do not look the same.
Result screen shot : http://www.awesomescreenshot.com/image/2275399/7cf995d2e287cb87c4ca4895b6b69934
Website which i am trying to capture: http://www.whiteboardexplainers.com/
var system = require("system");
// system.args[0] is the script name, so the URL is expected in system.args[1]
if (system.args.length > 1) {
    var page = require('webpage').create();
    page.viewportSize = {width: 1280, height: 1024};
    page.open(system.args[1], function() {
        // Derive the file name from the host part of the URL
        var pageTitle = system.args[1].replace(/http.*\/\//g, "").replace("www.", "").split("/")[0];
        var filePath = pageTitle + '.png';
        window.setTimeout(function () {
            page.evaluate(function() {
                document.body.bgColor = 'white';
            });
            page.render(filePath);
            console.log(filePath);
            phantom.exit();
        }, 200);
    });
}
You are using a very outdated PhantomJS version. Considering that, the screenshot looks very good. Upgrade to a modern PhantomJS version: 2.1.1 or, even better, the 2.5 beta. Get them here: PhantomJS downloads archive.
But even a modern version does not support playing videos, so that part will not work anyway.
It does not seem relevant in your case, but it often is: it is advisable to declare the user-agent string of a modern browser. Otherwise many sites serve a mobile version of their pages.
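Declaring a user agent in PhantomJS means setting `page.settings.userAgent` before calling `page.open`. The sketch below wraps that in a hypothetical helper, `applyDesktopDefaults`. The user-agent string itself is an assumption, and a plain object stands in for the page so that the snippet also runs outside PhantomJS.

```javascript
// A desktop user-agent string. The exact value is an assumption; any
// current desktop browser string would serve the same purpose.
var DESKTOP_USER_AGENT =
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36';

// Hypothetical helper: apply a desktop viewport and user agent to a
// PhantomJS page object (anything with `settings` and `viewportSize`).
function applyDesktopDefaults(page) {
    page.viewportSize = { width: 1280, height: 1024 };
    page.settings = page.settings || {};
    page.settings.userAgent = DESKTOP_USER_AGENT;
    return page;
}

// Stand-in object so the sketch runs outside PhantomJS; in a real script
// this would be require('webpage').create().
var page = applyDesktopDefaults({ settings: {} });
console.log(page.settings.userAgent);
```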

Retrieve html content of a page several seconds after it's loaded

I'm writing a Node.js script to automatically retrieve data from an online directory.
I had never done this before, so I chose JavaScript because it is a language I use every day.
Following the few tips I could find on Google, I use request with cheerio to easily access the DOM elements of the page.
I found and retrieved all the necessary information. The only missing step is getting the link to the next page, but that link is generated 4 seconds after the page loads and contains a hash, so this step is unavoidable.
What I would like to do is read the DOM of the page 4-5 seconds after it loads, so that I can get the link.
I looked on the internet, and a lot of advice says to use PhantomJS for this, but after many attempts I cannot get it to work with Node.
This is my code:
#!/usr/bin/env node
require('babel-register');
import request from 'request'
import cheerio from 'cheerio'
import phantom from 'node-phantom'
phantom.create(function(err,ph) {
    return ph.createPage(function(err,page) {
        return page.open(url, function(err,status) {
            console.log("opened site? ", status);
            page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function(err) {
                //jQuery Loaded.
                //Wait for a bit for AJAX content to load on the page. Here, we are waiting 5 seconds.
                setTimeout(function() {
                    return page.evaluate(function() {
                        var tt = cheerio.load($this.html())
                        console.log(tt)
                    }, function(err,result) {
                        console.log(result);
                        ph.exit();
                    });
                }, 5000);
            });
        });
    });
});
But I get this error:
return ph.createPage(function (page) {
^
TypeError: ph.createPage is not a function
Is what I am trying to do the best way to achieve this? If not, what is the simplest way? If so, where does my error come from?
If you don't have to use PhantomJS, you can use Nightmare instead.
It is a pretty neat library for problems like yours. It uses Electron as its browser, and you can run it with or without showing a window (you can also open the developer tools, as in Google Chrome).
Its one drawback: to run it on a server without a graphical interface, you must install at least a virtual framebuffer.
Nightmare has methods such as wait(cssSelector), which waits until a matching element appears on the page.
Your code would be something like:
const Nightmare = require('nightmare');
const nightmare = Nightmare({
    show: true, // will show browser window
    openDevTools: true // will open dev tools in browser window
});
const url = 'http://hakier.pl';
const selector = '#someElementSelectorWitchWillAppearAfterSomeDelay';
nightmare
    .goto(url)
    .wait(selector)
    // The callback passed to evaluate runs in the browser scope and cannot
    // see Node.js variables, so anything it needs (here: selector) must be
    // passed in as an extra argument.
    .evaluate(selector => {
        return {
            nextPage: document.querySelector(selector).getAttribute('href')
        };
    }, selector)
    .end() // close the Electron process once the result is available
    .then(extracted => {
        console.log(extracted.nextPage); // your extracted data from evaluate
    })
    .catch(error => {
        console.error('Extraction failed:', error);
    });
Happy hacking!

Where are screenshots from phantom.js saved?

Just starting out with Phantom.js after installing via Homebrew on my mac.
I'm trying out the examples to save screenshots of websites via https://github.com/ariya/phantomjs/wiki/Quick-Start
var page = require('webpage').create();
page.open('http://google.com', function () {
    page.render('google.png');
    phantom.exit();
});
But I don't see the images anywhere. Will they be in the same directory as the .js file?
PhantomJS usually renders the images to the same directory as the script that you're running. So yes, it should be in the same directory as the JavaScript file that you're running using PhantomJS.
EDIT
It appears that that particular example is flawed. The problem is that page.render(...); takes some time to render the page, but you're calling phantom.exit() before it has finished rendering. I was able to get the expected output by doing this:
var page = require('webpage').create();
page.open('http://google.com', function () {
    page.render('google.png');
    setTimeout(function() { phantom.exit(); }, 5000); // wait five seconds and then exit
});
Unfortunately this isn't ideal, so I was able to come up with something that's a hair better. I say a "hair", because I'm basically polling to see when the page has finished rendering:
var done = false; // flag that tells us if we're done
var page = require('webpage').create();
page.open('http://google.com', function (status) {
    // If the page loaded successfully...
    if (status === "success") {
        // Render the page
        page.render('google.png');
        console.log("Site rendered...");
    } else {
        console.log("Failed to load the page: " + status);
    }
    // Set the flag either way, so the polling loop below can exit
    done = true;
});
// Start polling every 100ms to see if we are done
var intervalId = setInterval(function() {
    if (done) {
        // If we are done, let's say so and exit.
        console.log("Done.");
        phantom.exit();
    } else {
        // If we're not done we're just going to say that we're polling
        console.log("Polling...");
    }
}, 100);
The code above works because the callback isn't immediately executed. So the polling code will start up and start to poll. Then when the callback is executed, we check to see the status of the page (we only want to render if we were able to load the page successfully). Then we render the page and set the flag that our polling code is checking on, to true. So the next time the polling code runs, the flag is true and so we exit.
This looks to be a problem with the way PhantomJS is running the webpage#render(...) call. I suspected that it was a non-blocking call, but according to the author in this issue, it is a blocking call. If I had to hazard a guess, perhaps the act of rendering is a blocking call, but the code that does the rendering might be handing off the data to another thread, which handles persisting the data to disk (so this part might be a non-blocking call). Unfortunately, this call is probably still executing when execution comes back to the main script and executes phantom.exit(), which means that the aforementioned asynchronous code never gets a chance to finish what it's doing.
I was able to find a post on the PhantomJS forums that deals with what you're describing. I can't see any issue that has been filed, so if you'd like you can go ahead and post one.
I have the very same issue as the author of this post, and none of the code examples worked for me. Kind of disorienting to have the second example in Phantom.js documentation not work. Installed using Home Brew on Snow Leopard.
I found a working example
var page = require("webpage").create();
var homePage = "http://www.google.com/";
page.settings.javascriptEnabled = false;
page.settings.loadImages = false;
page.open(homePage);
page.onLoadFinished = function(status) {
    var url = page.url;
    console.log("Status: " + status);
    console.log("Loaded: " + url);
    page.render("google.png");
    phantom.exit();
};
Just a quick help for people who come here looking for the directory where PhantomJS or CasperJS's screenshots are saved: it is in the scripts directory by default. However, you do have control.
If you want to control where they are saved you can just alter the filename like so:
page.render('screenshots/google.jpg'); // saves to scriptLocation/screenshots/
page.render('/opt/google.jpg'); // absolute path: saves to /opt/
Or if you use CasperJS you can use:
casper.capture('/opt/google.jpg',
    undefined,
    { // imgOptions
        format: 'jpg',
        quality: 25
    });
Hope this saves someone the trouble!
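To see where a given file name would end up, the two cases above can be mimicked in Node.js with the `path` module. This is only an illustration, assuming relative names are resolved against the directory the process was started from (which is the script directory when you launch it from there). `resolveOutputPath` and the directory names are made up for the example.

```javascript
var path = require('path');

// Hypothetical helper: show where a render target would land, assuming
// relative names are resolved against the working directory.
function resolveOutputPath(workingDir, target) {
    // path.posix keeps the output identical on every platform.
    return path.posix.resolve(workingDir, target);
}

// Relative name: ends up below the working directory.
console.log(resolveOutputPath('/home/me/project', 'screenshots/google.jpg'));
// -> /home/me/project/screenshots/google.jpg

// Absolute name: the working directory is ignored.
console.log(resolveOutputPath('/home/me/project', '/opt/google.jpg'));
// -> /opt/google.jpg
```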
I'm not sure if something has changed, but the example on http://phantomjs.org/screen-capture.html worked fine for me:
var page = require('webpage').create();
page.open('http://github.com/', function() {
    page.render('github.png');
    phantom.exit();
});
I am running phantomjs-2.1.1-windows.
However, what led me to this thread was initially the image file was never getting created on my disk just like the original question. I figured it was some kind of security issues so I logged into a VM and started by putting everything in the same directory. I was using a batch file to kick off phantomjs.exe with my .js file passed in as a parameter. With all of the files in the same directory on my VM, it worked great. Through trial and error, I found that my batch file had to be in the same directory as my .js file. Now all is well on my host machine as well.
So in summary...it was probably a security related problem for me. No need for adding any kind of timeout.
My answer to the original question would be IF you have everything set up correctly, the file will be in the same directory as the .js file that phantomjs.exe runs. I tried specifying a fully qualified path for the image file but that didn't seem to work.
The root cause is that page.render() may not be ready to render the image even during the onLoadFinished() event. You may need to wait upwards of several seconds before page.render() can succeed. The only reliable way I found to render an image in PhantomJS is to repeatedly invoke page.render() until the method returns true, indicating it successfully rendered the image.
Solution:
var page = require("webpage").create();
var homePage = "http://www.google.com/";
page.onLoadFinished = function(status) {
    var rendered, started = Date.now(), TIMEOUT = 30 * 1000; // 30 seconds
    // Keep calling page.render() until it returns true or the timeout expires
    while (!((rendered = page.render('google.png')) || Date.now() - started > TIMEOUT));
    if (!rendered) console.log("ERROR: Timed out waiting to render image.");
    phantom.exit();
};
page.open(homePage);
Borrowed from the rasterize.js example; it works more reliably for me than the accepted answer.
var page = require('webpage').create();
page.open('http://localhost/TestForTest/', function (status) {
    console.log("starting...");
    console.log("Status: " + status);
    if (status === "success") {
        // Give the page a moment to settle before rendering
        window.setTimeout(function () {
            page.render('myExample.png');
            phantom.exit();
        }, 200);
    } else {
        console.log("failed for some reason.");
        phantom.exit(1); // exit here too, otherwise the script hangs
    }
});
Plenty of good suggestions here. The one thing I'd like to add:
I was running a phantomjs docker image, and the default user was "phantomjs", not root. I was therefore trying to write to a location that I didn't have permission on (it was the pwd on the docker host)...
> docker run -i -t -v $(pwd):/pwd --rm wernight/phantomjs touch /pwd/foo.txt
touch: cannot touch '/pwd/foo.txt': Permission denied
The code(s) above all run without error, but if they don't have permission to write to the destination then they will silently ignore the request...
So for example, taking #vivin-paliath's code (the current accepted answer):
var done = false; //flag that tells us if we're done rendering
var page = require('webpage').create();
page.open('http://google.com', function (status) {
    //If the page loaded successfully...
    if(status === "success") {
        //Render the page
        page.render('google.png');
        console.log("Site rendered...");
        //Set the flag to true
        done = true;
    }
});
//Start polling every 100ms to see if we are done
var intervalId = setInterval(function() {
    if(done) {
        //If we are done, let's say so and exit.
        console.log("Done.");
        phantom.exit();
    } else {
        //If we're not done we're just going to say that we're polling
        console.log("Polling...");
    }
}, 100);
And running it as the default user produces:
docker run -i -t -v $(pwd):/pwd -w /pwd --rm wernight/phantomjs phantomjs google.js
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Site rendered...
Done.
But no google.png and no error. Simply adding -u root to the docker command solves this, and I get the google.png in my CWD.
For completeness, the final command:
docker run -u root -i -t -v $(pwd):/pwd -w /pwd --rm wernight/phantomjs phantomjs google.js

casperjs testing an internal site

I am trying to run a CasperJS test against an internal site. It is running in a pre-production environment. The code so far is:
var casper = require('casper').create({
    verbose: true,
    logLevel: "debug"
});
// listening to a custom event
casper.on('page.loaded', function() {
    this.echo('The page title is ' + this.getTitle());
    this.echo('value is: ' + this.getElementAttribute('input[id="edit-capture-amount"]', 'value'));
});
casper.start('https://preprod.uk.systemtest.com', function() {
    this.echo(this.getTitle());
    this.capture('frontpage.png');
    // emitting the custom event registered above
    this.emit('page.loaded');
});
casper.run();
As you can see, it's not much, but my problem is that the address is not reachable, and the capture shows a blank page. I'm not sure what I am doing wrong. I have checked the code with the CNN and Google URLs, and the title and screen capture work fine. I'm not sure how to make it work for an internal site.
I had the exact same problem. In my browser I could resolve the URL, but casperjs could not. All I got was about:blank for a web page.
Adding --ignore-ssl-errors=yes worked like a charm!
casperjs mytestjs //didn't work
casperjs --ignore-ssl-errors=yes mytestjs //worked perfectly!
Just to be sure.
Can you reach preprod.uk.systemtest.com from the computer on which casper runs ? For example with a ping or wget.
Is there any proxy between your computer and the preprod server ? Or is your system configured to pass through a proxy that should not be used for the preprod server ?
The casper code seems to be ok.
I know this should be a comment but I don't have enough reputation to post a comment.
Since CasperJS tests are run against localhost, some headers need to be defined in order to test a custom domain, subdomain, or host.
I experienced some problems when passing only the HOST header; for instance, snapshots were not taken properly.
I added two more headers and now my tests run properly:
casper.on('started', function () {
    var testHost = 'preprod.uk.systemtest.com';
    this.page.customHeaders = {
        'HOST': testHost,
        'HTTP_HOST': testHost,
        'SERVER_NAME': testHost
    };
});
var testing_url = 'http://localhost:8000/app_test.php';
casper.start(testing_url, function() {
    this.echo('I am using symfony, so this should show the homepage for the domain: preprod.uk.systemtest.com');
    this.echo('And the snapshot is also working');
    this.capture('casper_capture.png');
});
casper.run();
