Where are screenshots from phantom.js saved? - javascript

Just starting out with PhantomJS after installing it via Homebrew on my Mac.
I'm trying out the examples for saving screenshots of websites from https://github.com/ariya/phantomjs/wiki/Quick-Start:
var page = require('webpage').create();
page.open('http://google.com', function () {
page.render('google.png');
phantom.exit();
});
But I don't see the images anywhere. Will they be in the same directory as the .js file?

PhantomJS usually renders the images to the same directory as the script that you're running. So yes, it should be in the same directory as the JavaScript file that you're running using PhantomJS.
EDIT
It appears that that particular example is flawed. The problem is that page.render(...); takes some time to render the page, but you're calling phantom.exit() before it has finished rendering. I was able to get the expected output by doing this:
var page = require('webpage').create();
page.open('http://google.com', function () {
page.render('google.png');
setTimeout(function() { phantom.exit(); }, 5000); // wait five seconds, then exit
});
Unfortunately this isn't ideal, so I was able to come up with something that's a hair better. I say a "hair", because I'm basically polling to see when the page has finished rendering:
var done = false; //flag that tells us if we're done rendering
var page = require('webpage').create();
page.open('http://google.com', function (status) {
//If the page loaded successfully...
if(status === "success") {
//Render the page
page.render('google.png');
console.log("Site rendered...");
//Set the flag to true
done = true;
}
});
//Start polling every 100ms to see if we are done
var intervalId = setInterval(function() {
if(done) {
//If we are done, let's say so and exit.
console.log("Done.");
phantom.exit();
} else {
//If we're not done we're just going to say that we're polling
console.log("Polling...");
}
}, 100);
The code above works because the callback isn't immediately executed. So the polling code will start up and start to poll. Then when the callback is executed, we check to see the status of the page (we only want to render if we were able to load the page successfully). Then we render the page and set the flag that our polling code is checking on, to true. So the next time the polling code runs, the flag is true and so we exit.
This looks to be a problem with the way PhantomJS is running the webpage#render(...) call. I suspected that it was a non-blocking call, but according to the author in this issue, it is a blocking call. If I had to hazard a guess, perhaps the act of rendering is a blocking call, but the code that does the rendering might be handing off the data to another thread, which handles persisting the data to disk (so this part might be a non-blocking call). Unfortunately, this call is probably still executing when execution comes back to the main script and executes phantom.exit(), which means that the aforementioned asynchronous code never gets a chance to finish what it's doing.
I was able to find a post on the PhantomJS forums that deals with what you're describing. I can't see any issue that has been filed, so if you'd like you can go ahead and post one.

I have the very same issue as the author of this post, and none of the code examples worked for me. It is kind of disorienting to have the second example in the PhantomJS documentation not work. Installed using Homebrew on Snow Leopard.
I found a working example
var page = require("webpage").create();
var homePage = "http://www.google.com/";
page.settings.javascriptEnabled = false;
page.settings.loadImages = false;
page.open(homePage);
page.onLoadFinished = function(status) {
var url = page.url;
console.log("Status: " + status);
console.log("Loaded: " + url);
page.render("google.png");
phantom.exit();
};

Just a quick tip for people who come here looking for the directory where PhantomJS or CasperJS screenshots are saved: by default, it is the script's directory. However, you do have control.
If you want to control where they are saved you can just alter the filename like so:
page.render('screenshots/google.jpg'); // saves to scriptLocation/screenshots/
page.render('/opt/google.jpg'); // saves to /opt (an absolute path)
Or if you use CasperJS you can use:
casper.capture('/opt/google.jpg',
undefined,
{ // imgOptions
format: 'jpg',
quality: 25
});
Hope this saves someone the trouble!
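To make the relative versus absolute distinction above concrete, here is a small sketch in plain Node.js (not PhantomJS code). `resolveScreenshotPath` and the `/home/me/scripts` directory are hypothetical names used only for illustration, and `baseDir` stands for whichever directory PhantomJS treats as its base (the answers here report the script's directory).

```javascript
// Hypothetical helper, plain Node.js: shows how a file name like the ones
// passed to page.render() combines with a base directory.
// path.posix is used so the output looks the same on every platform.
const path = require('path');

function resolveScreenshotPath(baseDir, fileName) {
  // Absolute names are kept as-is; relative names are joined onto baseDir.
  return path.posix.resolve(baseDir, fileName);
}

console.log(resolveScreenshotPath('/home/me/scripts', 'google.png'));
console.log(resolveScreenshotPath('/home/me/scripts', 'screenshots/google.jpg'));
console.log(resolveScreenshotPath('/home/me/scripts', '/opt/google.jpg'));
```

Note that a subdirectory such as screenshots/ may need to exist and be writable for the file to appear.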

I'm not sure if something has changed, but the example on http://phantomjs.org/screen-capture.html worked fine for me:
var page = require('webpage').create();
page.open('http://github.com/', function() {
page.render('github.png');
phantom.exit();
});
I am running phantomjs-2.1.1-windows.
However, what led me to this thread was that, initially, the image file was never created on my disk, just like in the original question. I figured it was some kind of security issue, so I logged into a VM and started by putting everything in the same directory. I was using a batch file to kick off phantomjs.exe with my .js file passed in as a parameter. With all of the files in the same directory on my VM, it worked great. Through trial and error, I found that my batch file had to be in the same directory as my .js file. Now all is well on my host machine as well.
So in summary, it was probably a security-related problem for me. No need to add any kind of timeout.
My answer to the original question would be: if you have everything set up correctly, the file will be in the same directory as the .js file that phantomjs.exe runs. I tried specifying a fully qualified path for the image file, but that didn't seem to work.

The root cause is that page.render() may not be ready to render the image even during the onLoadFinished() event. You may need to wait upwards of several seconds before page.render() can succeed. The only reliable way I found to render an image in PhantomJS is to repeatedly invoke page.render() until the method returns true, indicating it successfully rendered the image.
Solution:
var page = require("webpage").create();
var homePage = "http://www.google.com/";
page.onLoadFinished = function(status) {
var rendered, started = Date.now(), TIMEOUT = 30 * 1000; // 30 seconds
while(!((rendered = page.render('google.png')) || Date.now() - started > TIMEOUT));
if (!rendered) console.log("ERROR: Timed out waiting to render image.")
phantom.exit();
};
page.open(homePage);
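The retry-until-true idea in this answer can be pulled out into a small reusable helper. The following is a sketch in plain JavaScript, not PhantomJS API: `retryUntil` is a hypothetical name, `attempt` stands in for a call such as page.render(...) that returns true on success, and the injectable `now` clock is an assumption added so the timeout path can be exercised without waiting.

```javascript
// Hypothetical helper: calls attempt() repeatedly until it returns a truthy
// value or timeoutMs has elapsed. Like the loop above, it blocks while
// retrying.
function retryUntil(attempt, timeoutMs, now) {
  now = now || Date.now;           // injectable clock, defaults to real time
  var started = now();
  var tries = 0;
  do {
    tries += 1;
    if (attempt()) {
      return { ok: true, tries: tries };
    }
  } while (now() - started <= timeoutMs);
  return { ok: false, tries: tries };
}

// Stand-in for page.render(): fails twice, then succeeds.
var calls = 0;
var result = retryUntil(function () {
  calls += 1;
  return calls === 3;
}, 1000);
console.log(result.ok, result.tries);
```

In a PhantomJS script, the `attempt` function would wrap the page.render('google.png') call, and phantom.exit() would follow once `retryUntil` returns.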

Borrowed from the rasterize.js example; it works more reliably than the accepted answer for me.
var page = require('webpage').create();
page.open('http://localhost/TestForTest/', function (status) {
console.log("starting...");
console.log("Status: " + status);
if (status === "success") {
window.setTimeout(function () {
page.render('myExample.png');
phantom.exit();
}, 200);
} else {
console.log("failed for some reason.");
phantom.exit(1); // exit on failure too, otherwise the script never terminates
}
});

Plenty of good suggestions here. The one thing I'd like to add:
I was running a phantomjs docker image, and the default user was "phantomjs", not root. I was therefore trying to write to a location that I didn't have permission on (it was the pwd on the docker host)...
> docker run -i -t -v $(pwd):/pwd --rm wernight/phantomjs touch /pwd/foo.txt
touch: cannot touch '/pwd/foo.txt': Permission denied
The code(s) above all run without error, but if they don't have permission to write to the destination then they will silently ignore the request...
So for example, taking @vivin-paliath's code (the current accepted answer):
var done = false; //flag that tells us if we're done rendering
var page = require('webpage').create();
page.open('http://google.com', function (status) {
//If the page loaded successfully...
if(status === "success") {
//Render the page
page.render('google.png');
console.log("Site rendered...");
//Set the flag to true
done = true;
}
});
//Start polling every 100ms to see if we are done
var intervalId = setInterval(function() {
if(done) {
//If we are done, let's say so and exit.
console.log("Done.");
phantom.exit();
} else {
//If we're not done we're just going to say that we're polling
console.log("Polling...");
}
}, 100);
And running it as the default user produces:
docker run -i -t -v $(pwd):/pwd -w /pwd --rm wernight/phantomjs phantomjs google.js
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Polling...
Site rendered...
Done.
But no google.png and no error. Simply adding -u root to the docker command solves this, and I get the google.png in my CWD.
For completeness, the final command:
docker run -u root -i -t -v $(pwd):/pwd -w /pwd --rm wernight/phantomjs phantomjs google.js
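Because the write fails silently, a pre-flight check on the output directory can surface the problem early. The following is a minimal sketch in plain Node.js rather than PhantomJS; `canWriteTo` and the non-existent path are hypothetical, and the check assumes the process runs as the same user that will later write the screenshot.

```javascript
// Hypothetical pre-flight check, plain Node.js: reports whether the current
// user may write into a directory before any screenshot is attempted.
const fs = require('fs');
const os = require('os');

function canWriteTo(dir) {
  try {
    // Throws if the directory is missing or not writable by this user.
    fs.accessSync(dir, fs.constants.W_OK);
    return true;
  } catch (err) {
    return false;
  }
}

console.log('temp dir writable:', canWriteTo(os.tmpdir()));
console.log('missing dir writable:', canWriteTo('/no/such/dir/for/this/sketch'));
```

Inside a PhantomJS script, another answer here notes that page.render() returns true on success, so logging its return value is a cheap way to notice a silent failure.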

Related

Retrieve html content of a page several seconds after it's loaded

I'm writing a Node.js script to automatically retrieve data from an online directory.
I had never done this before, so I chose JavaScript because it is a language I use every day.
Following the few tips I could find on Google, I use request with cheerio to easily access the DOM elements of the page.
I found and retrieved all the necessary information. The only missing step is recovering the link to the next page: it is generated 4 seconds after the page loads and contains a hash, so this step is unavoidable.
What I would like to do is read the page's DOM 4-5 seconds after it loads so that I can recover the link.
I looked around online and found a lot of advice to use PhantomJS for this, but I cannot get it to work with Node after many attempts.
This is my code :
#!/usr/bin/env node
require('babel-register');
import request from 'request'
import cheerio from 'cheerio'
import phantom from 'node-phantom'
phantom.create(function(err,ph) {
return ph.createPage(function(err,page) {
return page.open(url, function(err,status) {
console.log("opened site? ", status);
page.includeJs('http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js', function(err) {
//jQuery Loaded.
//Wait for a bit for AJAX content to load on the page. Here, we are waiting 5 seconds.
setTimeout(function() {
return page.evaluate(function() {
var tt = cheerio.load($this.html())
console.log(tt)
}, function(err,result) {
console.log(result);
ph.exit();
});
}, 5000);
});
});
});
});
But I get this error:
return ph.createPage(function (page) {
^
TypeError: ph.createPage is not a function
Is what I am doing the best way to achieve what I want? If not, what is the simplest way? If so, where does my error come from?
If you don't have to use PhantomJS, you can use Nightmare to do it.
It is a pretty neat library for problems like yours. It uses Electron as its web browser, and you can run it with or without showing a window (you can also open developer tools, as in Google Chrome).
Its one flaw: to run it on a server without a graphical interface, you must install at least a framebuffer.
Nightmare has a wait(cssSelector) method that waits until an element matching the selector appears on the page.
Your code would be something like:
const Nightmare = require('nightmare');
const nightmare = Nightmare({
show: true, // will show browser window
openDevTools: true // will open dev tools in browser window
});
const url = 'http://hakier.pl';
const selector = '#someElementSelectorWhichWillAppearAfterSomeDelay';
nightmare
.goto(url)
.wait(selector)
.evaluate(selector => {
return {
nextPage: document.querySelector(selector).getAttribute('href')
};
}, selector)
.then(extracted => {
console.log(extracted.nextPage); //Your extracted data from evaluate
});
// The selector argument passed after the callback is injected into the evaluate callback.
// Variables must be injected this way because the callback runs in the
// browser scope, where Node.js variables that were not injected are
// not accessible.
Happy hacking!

Capture screen shot of lazy loaded page with Node.js

I'm looking for a way to take a screenshot of a long web page every time it changes. I would like to use Node.js for this. My question is about how to render the full page with images and save it to disk as an image file.
Most images on the webpage are lazy loaded, so I guess that I need to scroll down the entire page before taking a screenshot.
I tried different tools:
casperjs
node-webshot
phantomjs
All of them seem way too complicated, if not impossible, to even install. I didn't succeed with any of them.
casperjs seems like a really nice choice, but I can't get it to work within Node.js. It keeps complaining that casper.start() is not a valid method...
I got closest with node-webshot, but I did not manage to scroll down the page.
This is my code so far:
var webshot = require('webshot');
var options = {
streamType: 'jpg', // top-level option, not part of shotSize
shotSize: {
height: 'all'
}
};
webshot('www.xx.com', 'xx.com.jpg', options, function(err) {
// screen shot saved to 'xx.com.jpg'
});
BTW I'm developing on a mac. The finished Node app will be on a linux server.
Any comments or experiences are appreciated!
Can't really help with installing CasperJS since on Windows it works by simply using npm install casperjs -g.
I've put up a simple script to do screenshots:
var casper = require('casper').create();
casper.options.viewportSize = {width: 1600, height: 950};
var wait_duration = 5000;
var url = 'http://stackoverflow.com/questions/33803790/capture-screen-shot-of-lazy-loaded-page-with-node-js';
console.log("Starting");
casper.start(url, function() {
this.echo("Page loaded");
});
casper.then(function() {
this.scrollToBottom();
casper.wait(wait_duration, function() {
casper.capture('screen.jpg');
this.echo("Screen captured");
});
});
casper.then(function() {
this.echo("Exiting");
this.exit();
});
casper.run();
The code is fairly straightforward:
Load the url
Scroll to the bottom
Wait for a specific duration (wait_duration) for stuff to load
Do a screenshot
End
Hopefully, that works for you!
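Some lazy loaders only fetch an image once it comes near the viewport, so a single jump to the bottom may skip images in the middle of the page. One possible refinement, sketched here under that assumption, is to scroll one viewport at a time. `scrollOffsets` is a hypothetical helper in plain JavaScript, not part of CasperJS; the heights are example values.

```javascript
// Hypothetical helper: returns the vertical scroll positions needed to walk
// a page of pageHeight pixels one viewport at a time, ending exactly at the
// bottom.
function scrollOffsets(pageHeight, viewportHeight) {
  if (!(viewportHeight > 0)) {
    throw new Error('viewportHeight must be positive');
  }
  var maxY = Math.max(pageHeight - viewportHeight, 0); // last reachable offset
  var offsets = [];
  for (var y = 0; y < maxY; y += viewportHeight) {
    offsets.push(y);
  }
  offsets.push(maxY);
  return offsets;
}

console.log(scrollOffsets(2500, 950));
```

Each offset could then be visited in its own step, with a short wait after each one, before the final capture.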
This code works for me with Node on OS X. Save it as test.js and run node test.js from the command line.
var webshot = require('webshot');
var options = {
streamType: 'png',
windowSize: {
width: 1024,
height: 768
},
shotSize: {
width: 'all',
height: 'all'
}
};
webshot("blablabla.com","bla-image.png",options,(err) => {
if(err){
return console.log(err);
}
console.log('image saved successfully');
});
You can automate it with Selenium via http://webdriver.io/. It is more of a testing engine than a screenshot application, but you get full control over the browser automation and can watch the browser on your display while debugging.
Start a Selenium server with, for example, Google Chrome
Load your page
Do the scrolling, clicking, and everything else with webdriver.io
Take a picture when you think it's a good time
Close the session
A fast way to install Selenium with Node.js: https://github.com/vvo/selenium-standalone

Failed to clear temp storage

Failed to clear temp storage: It was determined that certain files are unsafe for access within a Web application, or that too many calls are being made on file resources. SecurityError
I'm getting this error in the console. I have a script named script.js which makes AJAX calls to retrieve data from PHP.
Any idea why?
Here's my jQuery script
$(document).ready(function() {
var loading = false;
var docHeight = $(window).height();
$('.timeline').css({minHeight: docHeight});
function get_tl_post() {
if (loading==false) {
loading = true;
$.ajax({
type:"POST",
url:"timeline.php",
data:"data=instagram",
beforeSend:function(){
$('.loader').fadeIn("slow");
},
complete:function(){
loading = false;
$('.loader').fadeOut("slow");
},
success:function(data) {
if(data=="error")
{
get_tl_post();
}
$(data).hide().appendTo(".timeline").fadeIn(1000);
}
});
}
}
$(window).scroll(function(){
if ($(window).scrollTop() == $(document).height() - $(window).height()) {
get_tl_post();
}
});
});
This is due to network mapping of your resources.
In other words, you might have added a workspace folder in Chrome DevTools.
When you make changes to some files, DevTools makes a request to the file system. This works fine for a while. However, in some scenarios you remove your network mapping.
Then, when you open that web page in the browser, it may or may not ask to remap the network resources, yet it still tries to update the file system.
And that's when you get this error.
There is nothing wrong with your script.
The only solution I know of is clearing the cache and then restarting the system.
If the problem still persists, you can simply reinstall Chrome and this should be fixed.
Moreover, network mapping can sometimes cause several other issues as well.
For example, it can bloat a CSS file to a whopping 75MB or more. So take precautions when playing with network mapping.
Optionally, if you are on a Mac (or on Windows with sh commands available), run:
sudo find / -type f -size +50000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
Run this in your terminal to find the culprit files that are over 50MB. You can then remove them.
Note: the command above finds every individual file larger than 50MB and prints it in your terminal, one per line.
If I was to guess I would say your timeline.php script is always returning "error" so you are making too many calls recursively and the browser blocks them.
Try to eliminate the recursive function call and see if that will fix the problem.
Remove the following 3 lines and try again:
if (data == "error")
{
get_tl_post();
}
If your AJAX call fails for some reason, this could lead to too many recursive calls of get_tl_post().
I suggest that you use the error property for error handling, and to avoid situations of recursively calling your function. An idea could be to set a policy like "if the request failed/data are with errors, wait for an amount of time, then retry. If X retries are made, then show an error code and stop requesting".
Below is an example of untested code, in order to show you the idea:
var attempts = 0;
$.ajax({
//Rest of properties
success: function(data) {
if(data == "error") {
if(attempts < 3) {
setTimeout(function(){
get_tl_post();
++attempts;
}, 2000);
} else {
//output failure here.
}
}
//Rest of code....
}
});
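The "wait, retry, give up after X attempts" policy can also be written as a standalone helper. This is a sketch under stated assumptions, not jQuery API: `requestWithRetry` and `fakeSend` are hypothetical names, `send(callback)` stands in for the $.ajax call to timeline.php, and the injectable `schedule` parameter is an addition that defaults to setTimeout.

```javascript
// Hypothetical helper: retries `send` while it reports the "error" payload,
// at most maxAttempts times, waiting delayMs between attempts.
function requestWithRetry(send, maxAttempts, delayMs, onDone, schedule) {
  schedule = schedule || setTimeout;
  var attempts = 0;

  function tryOnce() {
    attempts += 1;
    send(function (data) {
      if (data !== 'error') {
        onDone(null, data);            // success: hand the payload back
      } else if (attempts >= maxAttempts) {
        onDone(new Error('gave up after ' + attempts + ' attempts'));
      } else {
        schedule(tryOnce, delayMs);    // wait, then try again
      }
    });
  }

  tryOnce();
}

// Stand-in for timeline.php: fails twice, then returns markup.
var responses = ['error', 'error', '<div>post</div>'];
function fakeSend(callback) {
  callback(responses.shift());
}

requestWithRetry(fakeSend, 3, 10, function (err, data) {
  console.log(err ? err.message : data);
});
```

Counting the attempt before each request, rather than inside the delayed callback, keeps the limit exact even if several responses arrive close together.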

casperjs testing an internal site

I am trying to run a CasperJS test for an internal site. It's running in a pre-production environment. The code so far is:
var casper = require('casper').create({
verbose: true,
logLevel: "debug"
});
// listening to a custom event
casper.on('page.loaded', function() {
this.echo('The page title is ' + this.getTitle());
this.echo('value is: '+ this.getElementAttribute
('input[id="edit-capture-amount"]',
'value'));
});
casper.start('https://preprod.uk.systemtest.com', function() {
this.echo(this.getTitle());
this.capture('frontpage.png');
// emitting a custom event
this.emit('page.loaded');
});
casper.run();
As you can see, it's not much, but my problem is that the address is not reachable, and the capture shows a blank page. I'm not sure what I am doing wrong. I have checked the code with the CNN and Google URLs, and the title and screen capture work fine. I'm not sure how to make it work for an internal site.
I had the exact same problem. In my browser I could resolve the URL, but casperjs could not. All I got was about:blank for a web page.
Adding --ignore-ssl-errors=yes worked like a charm!
casperjs mytestjs //didn't work
casperjs --ignore-ssl-errors=yes mytestjs //worked perfectly!
Just to be sure.
Can you reach preprod.uk.systemtest.com from the computer on which casper runs ? For example with a ping or wget.
Is there any proxy between your computer and the preprod server ? Or is your system configured to pass through a proxy that should not be used for the preprod server ?
The casper code seems to be ok.
I know this should be a comment but I don't have enough reputation to post a comment.
Since the CasperJS tests run against localhost, some headers need to be defined in order to test a custom domain/subdomain/host.
I experienced some problems when passing only the HOST header; for instance, snapshots were not taken properly.
I added two more headers and now my tests run properly:
casper.on('started', function () {
var testHost = 'preprod.uk.systemtest.com';
this.page.customHeaders = {
'HOST': testHost,
'HTTP_HOST': testHost,
'SERVER_NAME': testHost
};
});
var testing_url = 'http://localhost:8000/app_test.php';
casper.start(testing_url, function() {
this.echo('I am using symfony, so this should show the homepage for the domain: preprod.uk.systemtest.com');
this.echo('And the snapshot is also working');
this.capture('casper_capture.png');
});
casper.run();

