I'm trying to run a headless browser and execute JS scripts (a bot) inside it, but I want to drive it from PHP. Searching on Google, I found php-phantomjs, a PHP wrapper around PhantomJS. Please bear with me, I'm very new to this.
What I'm trying to do here is take a screenshot of an alert window (the screenshot isn't needed as such; it's just a way to check that the JS is executed before the screenshot is taken).
Here is my code:
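<?php
// file: app.php
require 'vendor/autoload.php'; // Composer autoloader (assumes php-phantomjs was installed with Composer)
use JonnyW\PhantomJs\Client as PhantomJsClient;
use JonnyW\PhantomJs\DependencyInjection\ServiceContainer;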
$client = PhantomJsClient::getInstance();
$client->isLazy();
$location = APP_PATH . '/partials/';
$serviceContainer = ServiceContainer::getInstance();
$procedureLoader = $serviceContainer->get('procedure_loader_factory')->createProcedureLoader($location);
$client->getProcedureLoader()->addLoader($procedureLoader);
$request = $client->getMessageFactory()->createCaptureRequest();
$response = $client->getMessageFactory()->createResponse();
$request->setViewportSize(1366, 768);
$request->setHeaders([
    'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36'
]);
$request->setTimeout(5000);
$request->setUrl('https://www.google.co.in');
$request->setOutputFile(APP_PATH . '/screen.jpg');
$client->send($request, $response);
I tried two custom scripts, as described in the list here: http://jonnnnyw.github.io/php-phantomjs/4.0/4-custom-scripts/
// file: page_on_resource_received.partial, page_open.partial
alert("Hello");
OUTPUT: It just shows the page, not the alert window.
To repeat, this is not about taking a screenshot; that's just to make sure the JS is executing.
I just want to execute my own JS scripts (bots, really), for example:
var form = document.getElementById('form');
var event = new Event('submit');
form.dispatchEvent(event);
or maybe use jQuery, and then return the resulting page to PHP as the response. If there is any other way to run bots from PHP in a headless browser, please mention it in your answers or comments.
PhantomJS is headless, that is, it has no GUI, so no dialog windows can be seen (or captured in a screenshot).
Try writing custom text to an element instead of calling alert, for example (getElementById takes the bare id, without the # used in CSS selectors):
document.getElementById("title").innerHTML = "It works!";
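PhantomJS does, however, hand alert() calls to the page.onAlert callback, so a PhantomJS-level script can log the alert text instead of looking for a window. A minimal ES5 sketch (PhantomJS 1.x has no ES6 support); attachAlertLogger is a hypothetical helper, and page is assumed to be the object returned by require('webpage').create():

```javascript
// Hypothetical helper: route window.alert() calls from the page to a sink
// function and keep every message for later inspection.
function attachAlertLogger(page, sink) {
  var seen = [];
  page.onAlert = function (msg) {
    seen.push(msg);        // remember the alert text
    sink('ALERT: ' + msg); // e.g. console.log in a real PhantomJS script
  };
  return seen;
}

// In a PhantomJS script (not runnable under plain Node):
// var page = require('webpage').create();
// attachAlertLogger(page, console.log);
// page.open(url, function () {
//   page.evaluate(function () { alert('Hello'); });
//   phantom.exit();
// });
```

If "ALERT: Hello" shows up on the console, the page's JavaScript ran.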
I'm trying to submit a simple form with PhantomJS.
Here is my code:
var webPage = require('webpage');
var page = webPage.create();
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';
page.onLoadFinished = function() {
    page.render("after_post.png");
    console.log("done2!");
    phantom.exit();
};
page.open('http://localhost/bimeh/phantom/testhtml.php', function(status) {
    page.includeJs("http://code.jquery.com/jquery-latest.min.js", function() {
        page.evaluate(function() {
            $("[name='xxx']").val('okk');
            page.render("pre_post.png");
            console.log('done1!');
            $('#subbtn').click();
        });
    });
});
The problem is that I don't get the pre_post.png image. Here is my output:
$ phantomjs test.js
done2!
It seems onLoadFinished is called before page.evaluate can do anything. Also, after_post.png shows the form as it was before the submit action.
I'm using PhantomJS 1.9.8 (I downgraded from 2.1 because it wasn't outputting errors, apparently due to a bug in Qt).
This is wrong:
page.evaluate(function() {
    page.render("pre_post.png"); // <----
});
page.evaluate is like loading a page in a browser and then running scripts in the developer tools console: there is no page variable in there. page belongs to the PhantomJS-level script, so call render outside evaluate:
page.open('http://localhost/bimeh/phantom/testhtml.php', function(status) {
    page.includeJs("http://code.jquery.com/jquery-latest.min.js", function() {
        page.render("pre_post.png");
        page.evaluate(function() {
            $("[name='xxx']").val('okk');
            $('#subbtn').click();
        });
    });
});
page.onLoadFinished is called every time a page finishes loading: the first time when PhantomJS opens the page, and the second time when the form is submitted. You may keep your function as it is; in that case, if the form is submitted, the screenshot of the original page will be overwritten by the second screenshot.
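If you want to keep both screenshots, the handler can count the loads and pick a file name per load. A sketch, assuming the two-load sequence just described; makeLoadFinishedHandler and the file names are illustrative, while page.render and phantom.exit are the PhantomJS calls already used in the question:

```javascript
// Returns an onLoadFinished handler that renders a different file for each
// page load (load 1: the original form, load 2: the page after submission)
// and calls onDone once every expected load has been captured.
function makeLoadFinishedHandler(page, fileNames, onDone) {
  var loads = 0;
  return function (status) {
    var name = fileNames[loads];
    loads += 1;
    if (name) {
      page.render(name); // PhantomJS: save a screenshot of the current page
    }
    if (loads === fileNames.length) {
      onDone(status); // e.g. function () { phantom.exit(); }
    }
  };
}

// In the PhantomJS script (not runnable under plain Node):
// page.onLoadFinished = makeLoadFinishedHandler(
//   page, ['before_submit.png', 'after_submit.png'],
//   function () { phantom.exit(); });
```

Because phantom.exit() now runs only after the last expected load, the first load no longer ends the script before the form is filled in.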
However, your form most likely won't be submitted, because buttons don't have a click() method in PhantomJS 1.x; it was added in 2.x.
Your script also lacks a crucial thing: error control. Please use the page.onError callback to catch any errors on the page (you can simply copy the function from here: http://phantomjs.org/api/webpage/handler/on-error.html).
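The handler on that page builds a message from msg and the trace array PhantomJS passes in (entries with file, line and function). A sketch of the same idea with the formatting split into a pure function, so it can be checked outside PhantomJS; formatPageError is a hypothetical name:

```javascript
// Format the (msg, trace) pair that PhantomJS passes to page.onError.
// Each trace entry has file, line and (possibly empty) function.
function formatPageError(msg, trace) {
  var lines = ['ERROR: ' + msg];
  if (trace && trace.length) {
    lines.push('TRACE:');
    trace.forEach(function (t) {
      lines.push(' -> ' + t.file + ': ' + t.line +
        (t['function'] ? ' (in function "' + t['function'] + '")' : ''));
    });
  }
  return lines.join('\n');
}

// In the PhantomJS script:
// page.onError = function (msg, trace) {
//   console.error(formatPageError(msg, trace));
// };
```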
I am currently running PhantomJS 1.9.8.0. It works fine and does what it should (in my case, going through the site and counting certain elements, to start with), but then I get this error:
PhantomJS is a headless WebKit with JavaScript API has stopped working.
The full error is:
Problem signature:
Problem Event Name: APPCRASH
Application Name: PhantomJS.exe
Application Version: 1.9.8.0
Application Timestamp: 5449270a
Fault Module Name: PhantomJS.exe
Fault Module Version: 1.9.8.0
Fault Module Timestamp: 5449270a
Exception Code: c0000005
Exception Offset: 00057976
OS Version: 6.1.7601.2.1.0.256.48
Locale ID: 2057
Additional Information 1: 8236
Additional Information 2: 823646afcac85a21ce127aeb0b347bb5
Additional Information 3: 137e
Additional Information 4: 137ec742f6481348348abf863da72fd4
It only seems to happen on one retailer site (Currys); I have it running on other retailers and it works fine there. It always breaks on Currys, and only since they updated their site. Any help would be much appreciated.
Here is a code snippet:
var options = new PhantomJSOptions();
options.AddAdditionalCapability("phantomjs.page.settings.userAgent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.124 Safari/537.36");
options.AddAdditionalCapability("phantomjs.page.settings.loadImages", "false");
options.AddAdditionalCapability("phantomjs.page.settings.resourceTimeout", "12000");
PhantomJSDriverService service = PhantomJSDriverService.CreateDefaultService(PhantomJSPath);
service.HideCommandPromptWindow = false; // set to true on deployment
using (PhantomJSDriver driver = new PhantomJSDriver(service, options))
{
    driver.Manage().Window.Size = new System.Drawing.Size(1920, 989);
    ITakesScreenshot screenShot = driver as ITakesScreenshot;
    IJavaScriptExecutor jse = driver as IJavaScriptExecutor;
    try
    {
        CAInformationProviderConfiguration.CAInformationSources.ForEach(source =>
        {
            DefaultCAInformationSourceResult result = new DefaultCAInformationSourceResult();
            result.CAInformationSource = source;
            try
            {
                driver.ExecutePhantomJS(
                    @"
                    var page = this;
                    page.onResourceRequested = function(requestData, networkRequest) {
                        //console.log(requestData.url);
                        if (requestData.url.match(/(.*ajax\.html.*)|(.*facebook.*)|(.*twitter.*)|(.*instagram.*)|(.*youtube.*)|(.*hotukdeals.*)|(.*pinterest.*)|(.*flix360.*)/)) {
                            networkRequest.abort();
                        }
                    };
                    page.onResourceReceived = function(response) {
                        //console.log('loaded ' + response.url + '\n' + response.stage);
                    };
                    ");
                //scrape
                driver.Navigate().GoToUrl(source.Url);
It then loops through the page, checking elements.
Edit
I upgraded to PhantomJS 2.0 and I am getting the same issue.
I should mention that I'm using PhantomJS with Selenium.
What is the best way to restart PhantomJS? I was thinking that restarting it after it has processed a certain number of rows might prevent the application from crashing.
RESOLUTION:
It seems that memory usage was building up, and when it went over 1 GB PhantomJS crashed. To stop this from happening, I ended up processing the rows in batches and restarting PhantomJS after each batch. Not ideal, but it fixes my issue for now.
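The batch-and-restart workaround, sketched in JavaScript rather than the question's C#. createDriver, processRow and driver.quit() are hypothetical stand-ins for new PhantomJSDriver(service, options), the per-row scraping code and driver.Quit(), and the batch size is an assumption you would tune by hand to stay below the point where memory use crashes PhantomJS:

```javascript
// Process rows in batches, starting a fresh driver (and so a fresh PhantomJS
// process) for each batch; quitting it releases the memory it accumulated.
function runInBatches(rows, batchSize, createDriver, processRow) {
  if (!(batchSize >= 1)) {
    throw new RangeError('batchSize must be at least 1');
  }
  var results = [];
  for (var start = 0; start < rows.length; start += batchSize) {
    var driver = createDriver(); // hypothetical: e.g. new PhantomJSDriver(...)
    try {
      var batch = rows.slice(start, start + batchSize);
      for (var i = 0; i < batch.length; i++) {
        results.push(processRow(driver, batch[i]));
      }
    } finally {
      driver.quit(); // always shut the process down, even if a row throws
    }
  }
  return results;
}
```

The try/finally matters here: without it, a single failing row would leave an orphaned PhantomJS process holding its memory.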
Thanks
I want to download a CSV file using CasperJS.
This is what I wrote:
var login_id = "my_user_id";
var login_password = "my_password";
var casper = require('casper').create();
casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 ');
casper.start("http://eoddata.com/symbols.aspx", function() {
    this.evaluate(function(id, password) {
        document.getElementById('ctl00_cph1_ls1_txtEmail').value = id;
        document.getElementById('ctl00_cph1_ls1_txtPassword').value = password;
        document.getElementById('ctl00_cph1_ls1_btnLogin').submit();
    }, login_id, login_password);
});
casper.then(function() {
    this.wait(3000, function() {
        this.echo("Waiting...");
    });
});
casper.then(function() {
    this.download("http://eoddata.com/Data/symbollist.aspx?e=NYSE", "nyse.csv");
});
casper.run();
I got nyse.csv, but the file was an HTML page for registering on the website.
It seems the login process fails. How can I log in correctly and save the CSV file?
Update 2015/05/13
Following @Darren's help, I wrote this:
casper.start("http://eoddata.com/symbols.aspx");
casper.waitForSelector("form input[name = ctl00$cph1$ls1$txtEmail ]", function() {
    this.fillSelectors('form', {
        'input[name = ctl00$cph1$ls1$txtEmail ]' : login_id,
        'input[name = ctl00$cph1$ls1$txtPassword ]' : login_password,
    }, true);
});
This code ends with the error "Wait timeout of 5000ms expired, exiting..".
As far as I understand, the error means that the CSS selector couldn't find the element. How can I fix this?
Update 2015/05/18
I wrote this:
casper.waitForSelector("form input[name = ctl00$cph1$ls1$txtEmail]", function() {
    this.fillSelectors('form', {
        'input[name = ctl00$cph1$ls1$txtEmail]' : login_id,
        'input[name = ctl00$cph1$ls1$txtPassword]' : login_password,
    }, true);
}, function() {
    fs.write("timeout.html", this.getHTML(), "w");
    casper.capture("timeout.png");
});
I checked timeout.html with Chrome Developer Tools and Firebug, and confirmed several times that the input element is there:
<input name="ctl00$cph1$ls1$txtEmail" id="ctl00_cph1_ls1_txtEmail" style="width:140px;" type="text">
How can I fix this? I have already spent several hours on this issue.
Update 2015/05/19
Thanks to Darren, Urarist and Artjom, I got rid of the timeout error, but there is still another problem.
The downloaded CSV file was still the registration HTML page, so I rewrote the code like this to find the cause:
casper.waitForSelector("form input[name ='ctl00$cph1$ls1$txtEmail']", function() {
    this.fillSelectors('form', {
        "input[name ='ctl00$cph1$ls1$txtEmail']" : login_id,
        "input[name ='ctl00$cph1$ls1$txtPassword']" : login_password,
    }, true);
});/*, function() {
    fs.write("timeout.html", this.getHTML(), "w");
    casper.capture("timeout.png");
});*/
casper.then(function() {
    fs.write("logined.html", this.getHTML(), "w");
});
In logined.html the user email is filled in correctly, but the password is not. Does anyone have an idea what causes this?
The trick is to log in successfully. There are multiple ways to log in. I've tried some, and the only one that works on this page is triggering the form submission with the Enter key. This is done with the PhantomJS page.sendEvent() function. The fields can be filled using casper.sendKeys().
var login_id = "my_user_id";
var login_password = "my_password";
var casper = require('casper').create();
casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 ');
casper.start("http://eoddata.com/symbols.aspx", function() {
    this.sendKeys("#ctl00_cph1_ls1_txtEmail", login_id);
    // keepFocus leaves the focus in the password field, so Enter submits the form
    this.sendKeys("#ctl00_cph1_ls1_txtPassword", login_password, {keepFocus: true});
    this.page.sendEvent("keypress", this.page.event.key.Enter);
});
// wait for the account page explicitly instead of relying on then()
casper.waitForUrl(/myaccount/, function() {
    this.download("http://eoddata.com/Data/symbollist.aspx?e=NYSE", "nyse.csv");
});
casper.run();
It seems necessary to wait for that specific page: CasperJS doesn't notice that a new page was requested, and for some reason then() doesn't wait for it.
Other ways that I tried were:
Filling and submitting the form with casper.fillSelectors()
Filling through the DOM with casper.evaluate() and submitting by clicking on the login button with casper.click()
Mixing all of the above.
At first glance your script looks reasonable. But there are a couple of ways to make it simpler, which should also make it more robust.
First, instead of your evaluate() call, use:
this.fillSelectors('form', {
    'input[name = id ]' : login_id,
    'input[name = pw ]' : login_password,
}, true);
The true parameter means "submit the form". (I guessed the field names, but I'm fairly sure you could keep using CSS IDs if you prefer.)
Even better, don't fill the form until you are sure it is there:
casper.waitForSelector("form input[name = id ]", function() {
    this.fillSelectors('form', {
        'input[name = id ]' : login_id,
        'input[name = pw ]' : login_password,
    }, true);
});
This matters if the login form is inserted dynamically by JavaScript (possibly even from an Ajax call), so that it doesn't exist yet when the page has finished loading.
The other change is to use one of the casper.waitForXXX() functions instead of casper.wait(), to make sure the CSV file link is there before you try to download it. A fixed 3-second wait goes wrong if the remote server takes longer than that to respond, and wastes time if it only takes 1 second.
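The difference is polling versus sleeping: a waitFor-style call checks a condition repeatedly and continues as soon as it holds, failing only after a timeout. A minimal ES5 sketch of that idea, not CasperJS's actual implementation; pollUntil is a hypothetical name:

```javascript
// Call `check` every `interval` ms. As soon as it returns true, call `then`;
// if `timeout` ms pass first, call `onTimeout` instead.
function pollUntil(check, then, onTimeout, timeout, interval) {
  var started = Date.now();
  (function tick() {
    if (check()) {
      then();                              // condition met: continue at once
    } else if (Date.now() - started >= timeout) {
      onTimeout();                         // give up after `timeout` ms
    } else {
      setTimeout(tick, interval);          // try again a little later
    }
  })();
}
```

In a Casper script you would use the built-in casper.waitForSelector() or casper.waitFor() rather than writing this yourself; the sketch only shows why they are both faster and safer than a fixed wait.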
UPDATE: A timeout on the waitFor lines tells you that the root of your problem is a selector that doesn't match anything on the page. This, I find, is the biggest time sink when writing Casper scripts. (I recently envisaged a tool that could automate finding a near miss, but couldn't get anyone else interested, and it is a bit too big a project for one person.) So your troubleshooting starting points are:
Add an error handler to the timing-out waitFor() command and take a screenshot (casper.capture()).
Dump the HTML (casper.getHTML()). If you know the ID of a parent div, pass its selector to narrow down how much you have to look through.
Open the page with Firebug (or the tool of your choice) and poke around to find what is there. (Remember that you can type a jQuery command, or a document.querySelector() call, in the console; it's a good way to find the correct selector interactively.)
Try with SlimerJS, instead of PhantomJS (especially if still using PhantomJS 1.x). It might be that the site uses some feature that is only supported in newer browsers.
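For selectors like the ones in the question, note that an unquoted CSS attribute value must be a valid identifier, and ctl00$cph1$ls1$txtEmail is not one because of the $ characters; quoting the value (as in the question's 2015/05/19 update) makes the selector valid. A sketch of a helper that always quotes; attrSelector is a hypothetical name, and newlines in the value are not handled:

```javascript
// Build tag[attr="value"] with the value in double quotes, escaping
// backslashes and double quotes, so characters such as `$` (common in
// ASP.NET control names) cannot make the selector invalid.
function attrSelector(tag, attr, value) {
  var escaped = String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return tag + '[' + attr + '="' + escaped + '"]';
}

// casper.waitForSelector(attrSelector('input', 'name', 'ctl00$cph1$ls1$txtEmail'), ...);
```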
I am using the Perl Selenium package, WWW::Selenium.
Trying to resize the browser window, I am getting a mysterious JavaScript error:
"Threw an exception: missing ; before statement".
Here is the code:
use strict;
use warnings;
use 5.014;
use autodie;
use warnings qw< FATAL utf8 >;
use Carp;
use Carp::Always;
use WWW::Selenium;
my $url = 'http://www.google.com'; #for example
my $sel = WWW::Selenium->new( host => 'localhost',
port => 4444,
browser => '*firefox F:\WIN 7 programs\Web & Internet\Firefox 8 bit\firefox.exe',
browser_url => $url,
);
$sel->open( $url );
$sel->wait_for_page_to_load(10000);
my $res = $sel->window_maximize(); # So far, this works fine
$res = $sel->get_eval( q{ WebDriver driver = ((WebDriverBackedSelenium) selenium).getWrappedDriver();
driver.manage().window().setSize(1040,720);} );
# (Following this: http://stackoverflow.com/questions/1522252/, Eli Colner's post)
The program then crashes here with:
"Threw an exception: missing ; before statement"
If I drop the first JavaScript line and keep only the second, namely:
$res = $sel->get_eval( q{driver.manage().window().setSize(1040,720);} );
it fails with: "driver not defined".
Any help would be appreciated. Thanks in advance,
Helen
Note: cross-posted here: http://www.perlmonks.org/?node_id=1092355
Your code contains invalid JavaScript because of a mistaken assumption about the SO thread you based it on:
How to resize/maximize Firefox window during launching Selenium Remote Control?
What makes you think Eli Corner's solution is JavaScript? It is Java (or possibly C#), because only those WebDriver (Selenium 2) language bindings expose a WebDriverBackedSelenium feature. The other bindings, including Perl, have no such option. So even if the syntax were correct, execution would fail, because that code is not JavaScript (or rather, the classes and objects it references don't exist in JavaScript).
As I see it, your options are:
Use real JavaScript: Dave Hunt's solution (in the same SO thread), adapted for Perl, should ideally work:
$sel->get_eval("window.resizeTo(1024, 768); window.moveTo(0,0);");
Use the Perl WebDriver binding to apply Eli Corner's solution (adapted for Perl), instead of the Selenium RC binding you are currently using. The Perl WebDriver binding is Selenium::Remote::Driver, not WWW::Selenium. There is no need for the WebDriverBackedSelenium part in Perl, but it does mean switching from Selenium RC to WebDriver, as there is no backward-compatibility support (you need Java or C# for that). You should then be able to do something like this:
use Selenium::Remote::Driver;
my $driver = Selenium::Remote::Driver->new;   # connects to a Selenium server on localhost:4444 by default
$driver->set_window_position(0, 0);
$driver->set_window_size(640, 480);
I am trying to run a Casper test against an internal site. It's running in a pre-production environment; the code so far is:
var casper = require('casper').create({
    verbose: true,
    logLevel: "debug"
});
// listening to a custom event
casper.on('page.loaded', function() {
    this.echo('The page title is ' + this.getTitle());
    this.echo('value is: ' + this.getElementAttribute('input[id="edit-capture-amount"]', 'value'));
});
casper.start('https://preprod.uk.systemtest.com', function() {
    this.echo(this.getTitle());
    this.capture('frontpage.png');
    // emitting the custom event
    this.emit('page.loaded');
});
casper.run();
As you can see, it's not much, but my problem is that the address is not reachable: the capture shows a blank page. I'm not sure what I'm doing wrong. I have checked the code with the CNN and Google URLs, and the title and screen capture work fine there. I'm not sure how to make it work for an internal site.
I had exactly the same problem. My browser could load the URL, but CasperJS could not; all I got was about:blank.
Adding --ignore-ssl-errors=yes worked like a charm!
casperjs mytestjs                           # didn't work
casperjs --ignore-ssl-errors=yes mytestjs   # worked perfectly!
Just to be sure:
Can you reach preprod.uk.systemtest.com from the computer on which Casper runs, for example with ping or wget?
Is there any proxy between your computer and the preprod server? Or is your system configured to go through a proxy that should not be used for the preprod server?
The casper code seems to be ok.
I know this should be a comment but I don't have enough reputation to post a comment.
Since CasperJS tests run against localhost, some headers need to be defined to test a custom domain/subdomain/host.
I had some problems when passing only the Host header; for instance, snapshots were not taken properly.
I added two more headers, and now my tests run properly:
var casper = require('casper').create();

casper.on('started', function () {
    var testHost = 'preprod.uk.systemtest.com';
    this.page.customHeaders = {
        'HOST': testHost,
        'HTTP_HOST': testHost,
        'SERVER_NAME': testHost
    };
});

var testing_url = 'http://localhost:8000/app_test.php';

casper.start(testing_url, function () {
    this.echo('I am using Symfony, so this should show the homepage for the domain: preprod.uk.systemtest.com');
    this.echo('And the snapshot is also working');
    this.capture('casper_capture.png');
});

casper.run();