How to download a csv file after login by using Casperjs - javascript

I want to download a CSV file using CasperJS.
This is what I wrote:
var login_id = "my_user_id";
var login_password = "my_password";
var casper = require('casper').create();
casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 ');
casper.start("http://eoddata.com/symbols.aspx",function(){
this.evaluate(function(id,password) {
document.getElementById('tl00_cph1_ls1_txtEmail').value = id;
document.getElementById('ctl00_cph1_ls1_txtPassword').value = password;
document.getElementById('ctl00_cph1_ls1_btnLogin').submit();
}, login_id, login_password);
});
casper.then(function(){
this.wait(3000, function() {
this.echo("Wating...");
});
});
casper.then(function(){
this.download("http://eoddata.com/Data/symbollist.aspx?e=NYSE","nyse.csv");
});
casper.run();
I got nyse.csv, but the file contained the HTML of the site's registration page.
It seems the login fails. How can I log in correctly and save the CSV file?
2015/05/13
Following @Darren's help, I wrote this:
casper.start("http://eoddata.com/symbols.aspx");
casper.waitForSelector("form input[name = ctl00$cph1$ls1$txtEmail ]", function() {
this.fillSelectors('form', {
'input[name = ctl00$cph1$ls1$txtEmail ]' : login_id,
'input[name = ctl00$cph1$ls1$txtPassword ]' : login_password,
}, true);
});
This code ends with the error Wait timeout of 5000ms expired, exiting..
As far as I understand, the error means that the CSS selector couldn't find the element. How can I fix this problem?
Update at 2015/05/18
I wrote like this:
casper.waitForSelector("form input[name = ctl00$cph1$ls1$txtEmail]", function() {
this.fillSelectors('form', {
'input[name = ctl00$cph1$ls1$txtEmail]' : login_id,
'input[name = ctl00$cph1$ls1$txtPassword]' : login_password,
}, true);
}, function() {
fs.write("timeout.html", this.getHTML(), "w");
casper.capture("timeout.png");
});
I checked timeout.html with Chrome Developer Tools and Firebug, and confirmed several times that the input element is there.
<input name="ctl00$cph1$ls1$txtEmail" id="ctl00_cph1_ls1_txtEmail" style="width:140px;" type="text">
How can I fix this problem? I have already spent several hours on this issue.
Update 2015/05/19
Thanks to Darren, Urarist and Artjom, I got rid of the timeout error, but there is still another error.
The downloaded CSV file was still the registration HTML page, so I rewrote the code like this to find the cause:
casper.waitForSelector("form input[name ='ctl00$cph1$ls1$txtEmail']", function() {
this.fillSelectors('form', {
"input[name ='ctl00$cph1$ls1$txtEmail']" : login_id,
"input[name ='ctl00$cph1$ls1$txtPassword']" : login_password,
}, true);
});/*, function() {
fs.write("timeout.html", this.getHTML(), "w");
casper.capture("timeout.png");
});*/
casper.then(function(){
fs.write("logined.html", this.getHTML(), "w");
});
In logined.html the user email was filled in correctly, but the password was not. Does anyone have a guess at the cause?

The trick is to successfully log in. There are multiple ways to log in. I've tried some, and the only one that works on this page is triggering the form submission with the Enter key. This is done with the PhantomJS page.sendEvent() function. The fields can be filled using casper.sendKeys().
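// Credentials as defined in the question
var login_id = "my_user_id";
var login_password = "my_password";
var casper = require('casper').create();
casper.userAgent('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36 ');
casper.start("http://eoddata.com/symbols.aspx", function(){
    // Type into both fields; keepFocus leaves the focus on the password field
    this.sendKeys("#ctl00_cph1_ls1_txtEmail", login_id);
    this.sendKeys("#ctl00_cph1_ls1_txtPassword", login_password, {keepFocus: true});
    // Press Enter in the focused field to submit the form
    this.page.sendEvent("keypress", this.page.event.key.Enter);
});
// Download only once the post-login URL has been reached
casper.waitForUrl(/myaccount/, function(){
    this.download("http://eoddata.com/Data/symbollist.aspx?e=NYSE", "nyse.csv");
});
casper.run();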
It seems that it is necessary to wait for that specific page. CasperJS doesn't notice that a new page was requested and the then() functionality is not used for some reason.
Other ways that I tried were:
Filling and submitting the form with casper.fillSelectors()
Filling through the DOM with casper.evaluate() and submitting by clicking on the login button with casper.click()
Mixing all of the above.

At first glance your script looks reasonable. But there are a couple of ways to make it simpler, which should also make it more robust.
First, instead of your evaluate() call, use fillSelectors():
this.fillSelectors('form', {
'input[name = id ]' : login_id,
'input[name = pw ]' : login_password,
}, true);
The true parameter means submit it. (I guessed the form names, but I'm fairly sure you could continue to use CSS IDs if you prefer.)
But, even better is to not fill the form until you are sure it is there:
casper.waitForSelector("form input[name = id ]", function() {
this.fillSelectors('form', {
'input[name = id ]' : login_id,
'input[name = pw ]' : login_password,
}, true);
});
This would be important if the login form is being dynamically placed there by JavaScript (possibly even from an Ajax call), so won't exist on the page as soon as the page is loaded.
The other change is to replace casper.wait() with one of the casper.waitForXXX() functions, to make sure the CSV file link is there before you try to download it. A fixed 3-second wait goes wrong if the remote server takes more than 3 seconds to respond, and wastes time if the remote server only takes 1 second to respond.
UPDATE: When you get a time-out on the waitFor lines it tells you the root of your problem is you are using a selector that is not there. This, I find, is the biggest time-consumer when writing Casper scripts. (I recently envisaged a tool that could automate trying to find a near-miss, but couldn't get anyone else interested, and it is a bit too big a project for one person.) So your troubleshooting start points will be:
Add an error handler to the timing-out waitFor() command and take a screenshot (casper.capture()).
Dump the HTML. If you know the ID of a parent div, you could give that, to narrow down how much you have to look for.
Open the page with Firebug (or the tool of your choice) and poke around to find what is there. (Remember you can type a jQuery command, or a document.querySelector() command, in the console, which is a good way to interactively find the correct selector.)
Try with SlimerJS, instead of PhantomJS (especially if still using PhantomJS 1.x). It might be that the site uses some feature that is only supported in newer browsers.
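One near-miss worth ruling out early is an unquoted attribute value. In the question, name values such as ctl00$cph1$ls1$txtEmail contain `$`, which is not allowed in an unquoted CSS attribute value, and the timeout went away once the value was quoted. The sketch below builds such selectors with the value always quoted; attrSelector is a hypothetical helper written for this example, not part of CasperJS.

```javascript
// Hypothetical helper (not a CasperJS API): build tag[attr="value"] with the
// value always quoted, so characters such as "$" cannot break the selector.
function attrSelector(tag, attr, value) {
  // Escape backslashes first, then double quotes, to keep the CSS string valid.
  var escaped = String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return tag + '[' + attr + '="' + escaped + '"]';
}

var emailSelector = attrSelector('input', 'name', 'ctl00$cph1$ls1$txtEmail');
var passwordSelector = attrSelector('input', 'name', 'ctl00$cph1$ls1$txtPassword');

console.log(emailSelector);    // input[name="ctl00$cph1$ls1$txtEmail"]
console.log(passwordSelector); // input[name="ctl00$cph1$ls1$txtPassword"]
```

In a CasperJS script the two strings would be passed to casper.waitForSelector() and used as the keys of the object given to fillSelectors().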

Related

submitting simple form and capturing before and after screen

I'm trying to submit a simple form with PhantomJS.
Here is my code:
var webPage = require('webpage');
var page = webPage.create();
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36';
page.onLoadFinished = function(){
page.render("after_post.png");
console.log("done2!" );
phantom.exit();
};
page.open('http://localhost/bimeh/phantom/testhtml.php', function(status) {
page.includeJs("http://code.jquery.com/jquery-latest.min.js", function() {
page.evaluate(function() {
$("[name='xxx']").val('okk');
page.render("pre_post.png");
console.log('done1!');
$('#subbtn').click();
});
});
});
The problem is that I don't get the pre_post.png image. Here is my output:
$ phantomjs test.js
done2!
It seems onLoadFinished is called before page.evaluate can do anything. Also, after_post.png shows the form as it was before the submit action.
I'm using PhantomJS 1.9.8 (I downgraded from 2.1 because it wasn't outputting errors, apparently due to some bug in Qt).
This is wrong:
page.evaluate(function() {
page.render("pre_post.png"); // <----
});
page.evaluate works as if you had loaded the page in a browser and then run a script in the developer tools console. There is no variable page in that context; page belongs to the PhantomJS-level script:
page.open('http://localhost/bimeh/phantom/testhtml.php', function(status) {
page.includeJs("http://code.jquery.com/jquery-latest.min.js", function() {
page.render("pre_post.png");
page.evaluate(function() {
$("[name='xxx']").val('okk');
$('#subbtn').click();
});
});
});
page.onLoadFinished is called every time a page has finished loading: the first time when PhantomJS opens the page, and the second time when the form is submitted. You may keep your function as it is; in that case, if the form is submitted, the first screenshot of the original page will be overwritten by the second screenshot.
However, most likely your form won't be submitted, because buttons don't have a click method in PhantomJS 1.x; it was added in 2.x.
Your script also lacks a crucial thing: error control. Please use the page.onError callback to catch any errors on the page (you may simply copy the function from here: http://phantomjs.org/api/webpage/handler/on-error.html )
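As a minimal sketch of that error control: the function below turns the (msg, trace) arguments that PhantomJS passes to page.onError into one readable string. The name formatPageError is made up for this example, and the trace entries are assumed to carry file, line and an optional function field.

```javascript
// Hypothetical helper: format the arguments of page.onError as one string.
// Assumes each trace entry has `file`, `line` and optionally `function`.
function formatPageError(msg, trace) {
  var lines = ['ERROR: ' + msg];
  if (trace && trace.length) {
    lines.push('TRACE:');
    trace.forEach(function (t) {
      lines.push(' -> ' + t.file + ': ' + t.line +
        (t.function ? ' (in function "' + t.function + '")' : ''));
    });
  }
  return lines.join('\n');
}

// In a PhantomJS script, `page` comes from require('webpage').create().
// The typeof guard only lets this file load outside PhantomJS as well.
if (typeof page !== 'undefined') {
  page.onError = function (msg, trace) {
    console.error(formatPageError(msg, trace));
  };
}
```

With the handler attached, a JavaScript error raised inside the page is printed on the console instead of being dropped silently.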

How to execute Javascript in headless chrome using php-phantomjs?

I'm trying to run a headless browser and execute JS scripts (a bot) within it, but I want to drive it from PHP. Searching on Google, I found php-phantomjs, a PHP wrapper for PhantomJS. Please bear with me, I'm very new to this stuff.
What I'm trying to do here is take a screenshot of an alert window (this is not necessary in itself; it is just to test that the JS is executed before the screenshot is taken).
Here is my code:
// file: app.php
$client = PhantomJsClient::getInstance();
$client->isLazy();
$location = APP_PATH . '/partials/';
$serviceContainer = ServiceContainer::getInstance();
$procedureLoader = $serviceContainer->get('procedure_loader_factory')->createProcedureLoader($location);
$client->getProcedureLoader()->addLoader($procedureLoader);
$request = $client->getMessageFactory()->createCaptureRequest();
$response = $client->getMessageFactory()->createResponse();
$request->setViewportSize(1366, 768);
$request->setHeaders([
'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36'
]);
$request->setTimeout(5000);
$request->setUrl('https://www.google.co.in');
$request->setOutputFile(APP_PATH . '/screen.jpg');
$client->send($request, $response);
Tried two custom scripts as given in the list here: http://jonnnnyw.github.io/php-phantomjs/4.0/4-custom-scripts/
// file: page_on_resource_received.partial, page_open.partial
alert("Hello");
OUTPUT: It just shows the page, not the alert window.
I repeat, it's not about taking a screenshot; that's just to be sure that the JS is executing.
I just want to execute my JS scripts (better say bots) like:
var form = document.getElementById('form');
var event = new Event('submit');
form.dispatchEvent(event);
or maybe use jQuery, and then return the output of that page to PHP as a response. So, if there is any other way to run bots from PHP in a headless browser, please mention it in your answers or comments.
PhantomJS is headless, that is, it doesn't have a GUI. Therefore no window dialogs can be seen.
Try writing custom text to an element instead of calling alert, like
document.getElementById("title").innerHTML = "It works!";

Automate daily csv file download from website button click

I would like to automate the process of visiting a website, clicking a button, and saving the file. The only way to download the file on this site is to click a button. You can't navigate to the file using a url.
I have been trying to use phantomjs and casperjs to automate this process, but haven't had any success.
I recently tried to use brandon's solution here
Grab the resource contents in CasperJS or PhantomJS
Here is my code for that
var fs = require('fs');
var cache = require('./cache');
var mimetype = require('./mimetype');
var casper = require('casper').create();
casper.start('http://www.example.com/page_with_download_button', function() {
});
casper.then(function() {
this.click('#download_button');
});
casper.on('resource.received', function (resource) {
"use strict";
for(i=0;i < resource.headers.length; i++){
if(resource.headers[i]["name"] == "Content-Type" && resource.headers[i]["value"] == "text/csv; charset-UTF-8;"){
cache.includeResource(resource);
}
}
});
casper.on('load.finished', function(status) {
for(i=0; i< cache.cachedResources.length; i++){
var file = cache.cachedResources[i].cacheFileNoPath;
var ext = mimetype.ext[cache.cachedResources[index].mimetype];
var finalFile = file.replace("."+cache.cacheExtension,"."+ext);
fs.write('downloads/'+finalFile,cache.cachedResources[i].getContents(),'b');
}
});
casper.run();
I think the problem could be caused by my cachePath being incorrect in cache.js
exports.cachePath = 'C:/Users/username/AppData/Local/Ofi Labs/PhantomJS';
Should I be using something in addition to the backslashes to define the path?
When I try
casperjs --disk-cache=true export_script.js
Nothing is downloaded. After a little debugging I have found that cache.cachedResources is always empty.
I would also be open to solutions outside of phantomjs/casperjs.
UPDATE
I am no longer trying to accomplish this with CasperJS/PhantomJS.
I am using the chrome extension Tampermonkey suggested by dandavis.
Tampermonkey was extremely easy to figure out.
I installed Tampermonkey, navigated to the page with the download link, and then clicked New Script under tampermonkey and added my javascript code.
document.getElementById("download_button").click();
Now every time I navigate to the page in my browser, the file is downloaded. I then created a batch script that looks like this
set date=%DATE:~10,4%_%DATE:~4,2%_%DATE:~7,2%
chrome "http://www.example.com/page-with-dl-button"
timeout 10
move "C:\Users\user\Downloads\export.csv" "C:\path\to\dir\export_%date%.csv"
I set that batch script to run nightly using the windows task scheduler.
Success!
Your button most likely issues a POST request to the server.
In order to track it:
Open Network tab in Chrome developer tools
Navigate to the page and hit the button.
Notice which request led to file download. Right click on it and copy as cURL
Run copied cURL
Once you have cURL working, you can schedule downloads using cron or Task Scheduler, depending on the operating system you are using.
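For anyone staying with the CasperJS approach from the question, one thing to check is the header comparison in the resource.received handler. It tests for the exact string "text/csv; charset-UTF-8;", while servers normally send something like "text/csv; charset=UTF-8", so the test may never match. That is only a guess at why cachedResources stays empty; the thread does not confirm it. A more forgiving check, with isCsvResponse as a hypothetical helper name, could look like this:

```javascript
// Hypothetical helper: return true when a header list declares a CSV body.
// `headers` is an array of {name, value} objects, the shape of resource.headers
// in the question's resource.received handler.
function isCsvResponse(headers) {
  for (var i = 0; i < headers.length; i++) {
    var name = String(headers[i].name).toLowerCase();
    if (name !== 'content-type') {
      continue;
    }
    // Compare only the media type and ignore parameters such as charset.
    var mediaType = String(headers[i].value).split(';')[0].trim().toLowerCase();
    if (mediaType === 'text/csv') {
      return true;
    }
  }
  return false;
}

console.log(isCsvResponse([{ name: 'Content-Type', value: 'text/csv; charset=UTF-8' }])); // true
console.log(isCsvResponse([{ name: 'Content-Type', value: 'text/html' }]));               // false
```

Inside the handler, the loop over resource.headers could then be replaced by a single call such as isCsvResponse(resource.headers).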

Wait for a web page alert in CasperJS

I'm a newcomer to CasperJS and after a couple hours I can login and navigate a few webpages with it, but I'm stumped by the alert message on this website: https://www.macysliquidation.com/
I need to get rid of the alert so I can login.
My simple (non-working) code is:
var casper = require('casper').create();
casper.userAgent('Mozilla/12.0 (compatible; MSIE 6.0; Windows NT 5.1)');
casper.on('remote.alert', function(message) {
this.echo('alert message: ' + message);
// how do i get rid of the popup??
this.thenClick();
});
casper.start('https://www.macysliquidation.com/');
casper.then(function() {
// login here
this.sendKeys('#txtUsername','username');
this.sendKeys('#txtPassword','password');
this.thenClick('#btnLogin');
});
casper.run(function() {
// see what went on
this.capture('page.png');
this.echo('done').exit();
});
Until the alert is dismissed, the login controls aren't visible/available. So the above JS returns
Cannot get informations from #txtUsername: element not found
As you already noticed, the function casper.waitForAlert() is available since version 1.1-beta4. You can copy the function from the code if you don't have the time to upgrade:
casper.waitForAlert = function(then, onTimeout, timeout) {
...
};
Problem:
Alerts and confirms just happen; they don't stop the execution in PhantomJS and CasperJS. They are also not part of the page and cannot be clicked on.
If you had registered for the error events in CasperJS (listening to resource.error, page.error and remote.message is always a good idea), you would have seen that a specific resource error was thrown:
{"errorCode":6,"errorString":"SSL handshake failed","id":1,"url":"https://www.macysliquidation.com/"}
If you had checked the status of the page, you would have seen that it wasn't loaded.
Solution:
Run CasperJS with --ignore-ssl-errors=true and depending on your PhantomJS version with --ssl-protocol=tlsv1. More information here.
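Registering the three events mentioned under "Problem" can be sketched as below. attachDebugListeners and its log callback are names made up for this example; the function only needs an object with an on(name, handler) method, so a small stand-in emitter is used here to let the sketch run without CasperJS.

```javascript
// Hypothetical helper: log the three events that usually explain a failed load.
// `emitter` is anything with on(name, handler), such as a casper instance.
function attachDebugListeners(emitter, log) {
  emitter.on('resource.error', function (resourceError) {
    log('resource.error: ' + JSON.stringify(resourceError));
  });
  emitter.on('page.error', function (msg, trace) {
    log('page.error: ' + msg);
  });
  emitter.on('remote.message', function (msg) {
    log('remote.message: ' + msg);
  });
}

// Stand-in emitter for demonstration only; it just records the handlers.
var handlers = {};
var fakeCasper = { on: function (name, fn) { handlers[name] = fn; } };
var logged = [];
attachDebugListeners(fakeCasper, function (line) { logged.push(line); });

// Simulate the SSL failure reported above.
handlers['resource.error']({ errorCode: 6, errorString: 'SSL handshake failed' });
console.log(logged[0]);
```

In a real script the call would be attachDebugListeners(casper, function (line) { casper.echo(line); }); placed before casper.start().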

casperjs testing an internal site

I am trying to run a CasperJS test for an internal site. It's running in a pre-production environment. The code so far is:
var casper = require('casper').create({
verbose: true,
loglevel:"debug"
});
// listening to a custom event
casper.on('page.loaded', function() {
this.echo('The page title is ' + this.getTitle());
this.echo('value is: '+ this.getElementAttribute
('input[id="edit-capture-amount"]',
'value'));
});
casper.start('https://preprod.uk.systemtest.com', function() {
this.echo(this.getTitle());
this.capture('frontpage.png');
// emitting a custom event
this.emit('age.loaded.loaded');
});
casper.run();
As you can see it's not much, but my problem is that the address is not reachable. The capture also shows a blank page. I'm not sure what I am doing wrong. I have checked the code with the CNN and Google URLs, and the title and screen capture work fine. I'm not sure how to make it work for an internal site.
I had the exact same problem. In my browser I could resolve the URL, but CasperJS could not. All I got was about:blank for a web page.
Adding --ignore-ssl-errors=yes worked like a charm!
casperjs mytestjs //didn't work
casperjs --ignore-ssl-errors=yes mytestjs //worked perfectly!
Just to be sure.
Can you reach preprod.uk.systemtest.com from the computer on which CasperJS runs? For example, with ping or wget.
Is there any proxy between your computer and the preprod server? Or is your system configured to pass through a proxy that should not be used for the preprod server?
The casper code seems to be ok.
I know this should be a comment but I don't have enough reputation to post a comment.
Since the CasperJS tests run against localhost, some headers need to be defined in order to test a custom domain/subdomain/host.
I experienced some problems when passing only the HOST header, for instance, snapshots were not taken properly.
I added 2 more headers and now my tests run properly:
casper.on('started', function () {
var testHost = 'preprod.uk.systemtest.com';
this.page.customHeaders = {
'HOST': testHost,
'HTTP_HOST': testHost,
'SERVER_NAME': testHost
};
});
var testing_url = 'http://localhost:8000/app_test.php';
casper.start(testing_url, function() {
this.echo('I am using symfony, so this should have to show the homepage for the domain: preprod.uk.systemtest.com');
this.echo('And the snapshot is also working');
this.capture('casper_capture.png');
});
