I'm new to CasperJS. How come this.echo(this.getTitle()); works but console.log("Page Title ", document.title); doesn't? Also why isn't my document.querySelector working? Does anyone have a good explanation? Where in the CasperJS documentation can I find the answer?
Here's my code:
var casper = require('casper').create();
var url = 'http://www.example.com/';
casper.start(url, function() {
    this.echo(this.getTitle()); // works
    this.echo(this.getCurrentUrl()); // works
});
casper.then(function(){
    this.echo(this.getCurrentUrl()); // works
    console.log("this is URL: ", document.URL); // doesn't work
    console.log("Page Title ", document.title); // doesn't work
    var paragraph = document.querySelectorAll('p')[0].innerHTML;
    console.log(paragraph); // doesn't work
});
casper.run();
EDIT:
I'm using casper.thenEvaluate and casper.evaluate now and it's still not working. Any ideas?
var casper = require('casper').create();
var url = 'http://www.example.com/';
casper.start(url, function() {
    this.echo(this.getTitle()); // works
    this.echo(this.getCurrentUrl()); // works
    console.log('page loaded: '); // works
});
casper.thenEvaluate(function(){
    var paragraph = document.querySelectorAll('p')[0].innerHTML; // doesn't work
    console.log(paragraph); // doesn't work
    console.log("Page Title ", document.title); // doesn't work
});
casper.run();
You have to call functions that depend on document with this.evaluate:
var paragraph = this.evaluate(function() {
    // the DOM property is innerHTML (capital HTML); innerHtml is undefined
    return document.querySelector('p').innerHTML;
});
When in doubt, consult the docs.
CasperJS has inherited the split between DOM context (page context) and the outer context from PhantomJS. You can only access the sandboxed DOM context through casper.evaluate(). document inside of the evaluate() callback is the variable that you would expect in normal JavaScript, but there is also a document outside of evaluate() which is only a dummy object and doesn't provide access to the DOM of the page.
If you want to access DOM properties, then you need to use evaluate().
var title = casper.evaluate(function(){
    return document.title;
});
But this won't work for DOM nodes, because only primitive objects can be passed out of the DOM context. The PhantomJS documentation says the following:
Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.
Closures, functions, DOM nodes, etc. will not work!
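To see why DOM nodes and closures don't make it out, it can help to model the boundary as a JSON round trip, which is what the rule of thumb above suggests. The sketch below is plain JavaScript runnable in Node, not CasperJS; crossBoundary is a hypothetical helper, and real PhantomJS serialization may differ in edge cases.

```javascript
// A rough model of what crosses the evaluate() boundary, ASSUMING it behaves
// like a JSON round trip. Illustration only, not CasperJS/PhantomJS code.
function crossBoundary(value) {
  const json = JSON.stringify(value);
  // JSON.stringify() returns undefined for a function (or undefined itself),
  // so this model treats that case as "nothing usable came back".
  return json === undefined ? null : JSON.parse(json);
}

const copy = crossBoundary({
  title: 'Example Domain',  // string: survives
  tags: ['p', 'h1'],        // array of primitives: survives
  loadedAt: new Date(0),    // Date: arrives as an ISO string, not a Date
  onClick: function () {},  // function: silently dropped
});

console.log(Object.keys(copy));             // [ 'title', 'tags', 'loadedAt' ]
console.log(typeof copy.loadedAt);          // string
console.log(crossBoundary(function () {})); // null
```

Anything that is not plain data either changes shape or disappears, which is why the answers here return small objects built from node properties instead of the nodes themselves.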
If you want to use document.querySelector(), then you need to produce a representation of a DOM node that can be passed outside:
var form = casper.evaluate(function() {
    var f = document.querySelector('form');
    return { html: f.innerHTML, action: f.action };
});
casper.echo(JSON.stringify(form, undefined, 4));
You can also use all of the available CasperJS functions that can provide representations of DOM nodes such as casper.getElementsInfo().
Also, have a look at Understanding the evaluate function in CasperJS.
this.getTitle() calls the getTitle() method of the Casper object, which runs in the Casper context, hence it produces the expected result.
However, 'document' is not available in the Casper context. The underlying reason is that Casper drives PhantomJS, which is a web browser, so 'document' only exists in the browser, one level "deeper" than the code that runs in the Casper context. There is no direct way to share variables between the two environments, but you can pass values across as parameters, which copies them.
The "bridge" between the two environments (Casper and Phantom) is Casper's 'evaluate' function. Everything inside the function passed to 'evaluate' as a parameter gets executed in the browser context, not in the Casper context. That's an important distinction. The documentation is available here, as noted by Blender:
http://docs.casperjs.org/en/latest/modules/casper.html#evaluate
Example below:
casper.evaluate(function(username, password) {
    document.querySelector('#username').value = username;
    document.querySelector('#password').value = password;
    document.querySelector('#submit').click();
}, 'sheldon.cooper', 'b4z1ng4');
In the given example you can see how to pass "username" and "password" parameters from Casper environment to the browser (page) environment.
The anonymous "function(username,password)" will get executed within the browser. Therefore, you can use 'document' inside it.
You can also pass a value back, which can be picked up on the Casper side, e.g.:
var result = casper.evaluate(function run_in_browser(){
    return document.title;
});
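The same split can be reproduced outside CasperJS to see how it behaves. The sketch below uses Node's built-in vm module as an analogy: page plays the browser context (with a fake document, an assumption for illustration) and evaluate is a hypothetical helper modelled on casper.evaluate(), not CasperJS's actual implementation. Only the arguments and the return value cross the boundary, and both are copied.

```javascript
const vm = require('vm');

// A separate global object standing in for the page. This `document` is a
// hypothetical fake, not a real DOM.
const page = vm.createContext({
  document: { title: 'Example Domain', URL: 'http://www.example.com/' },
});

// Hypothetical evaluate(fn, ...args): the function travels as source text,
// and the arguments and result are copied as JSON, so nothing is shared.
function evaluate(fn, ...args) {
  const code = '(' + fn.toString() + ').apply(null, ' + JSON.stringify(args) + ')';
  const json = JSON.stringify(vm.runInContext(code, page));
  return json === undefined ? null : JSON.parse(json);
}

// Outer ("Casper") side: plain Node has no `document` at all.
console.log(typeof document); // undefined

// Value out: read the page title.
const title = evaluate(function () {
  return document.title;
});
console.log(title); // Example Domain

// Values in: the argument is copied into the page context.
const label = evaluate(function (user) {
  return user + ' is on ' + document.URL;
}, 'sheldon.cooper');
console.log(label); // sheldon.cooper is on http://www.example.com/
```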
Try this.echo(this.fetchText('p')); to get the text content of the p elements (fetchText() returns text, not innerHTML). Refer to the documentation.
Related
I would like to see the content of a closure in JavaScript.
In the following code, I would like to see the closure of the function closure_F returned by the anonymous function. The local scope of the anonymous function must be stored somewhere, and I would like to see where it is stored. How can this be done in Node or in the browser?
var closure_F = (function(){
    var info = "info-string";
    var f1 = function(){
        console.log(info);
    };
    return f1;
}());
closure_F(); // logs 'info-string' as expected.
console.log(closure_F); // This did not provide any valuable information.
WAY 1: Internal property [[Scope]]
You can modify your code by adding console.dir and then run it in the Chrome Dev Console:
var closure_F = (function(){
    var info = "info-string";
    var f1 = function(){
        console.log(info);
    };
    return f1;
}());
closure_F();
console.dir(closure_F);
// console.dir prints all the properties of a specified JavaScript object
If you open the console you will see that it prints all the properties of the object (function), including the internal property [[Scopes]].
This internal property [[Scopes]] contains the scopes surrounding closure_F, including its closure.
Note: [[Scope]] is an internal implementation detail of JS and cannot be accessed programmatically from within the program.
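Because [[Scopes]] is only visible to debugging tools, ordinary reflection on the function turns up nothing. The sketch below (plain JavaScript, runnable in Node or a browser console) checks that, and shows the usual workaround of exposing captured state explicitly; the peek accessor is a hypothetical name, not a built-in.

```javascript
var closure_F = (function () {
  var info = 'info-string';
  return function f1() {
    return info;
  };
}());

// Reflection only sees the function object's own properties
// (length, name, prototype, ...); the captured `info` is not among them.
const names = Object.getOwnPropertyNames(closure_F);
console.log(names.includes('info')); // false
console.log(closure_F.info);         // undefined

// If the program itself needs to read captured state, it has to expose it
// deliberately, here through a hypothetical peek() accessor.
var inspectable = (function () {
  var info = 'info-string';
  function f1() { return info; }
  f1.peek = function () { return { info: info }; };
  return f1;
}());
console.log(inspectable.peek()); // { info: 'info-string' }
```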
WAY 2: Setting a breakpoint - debugger
Another way to see the closure of a function is to add a debugger statement, which creates a breakpoint in the function whose closure you want to inspect.
As an example you can run this in the console:
function createClosure (){
    var secret = "shhhhh";
    return function inner(){
        debugger;
        console.log(secret);
    };
}
var innerFunction = createClosure();
innerFunction();
Reference: ECMAScript specification (www.ecma-international.org), [[Scope]], Table 9.
The local scope of the anonymous function must be stored somewhere
That's an implementation detail of the JavaScript runtime. It isn't stored anywhere that is exposed to the JavaScript program.
How can this be done in Node or in the browser?
Dedicated debugging tools can inspect the data there. Set a breakpoint on the console.log call.
Note that optimisations mean that only variables used within the returned function will be visible.
I'm using PhantomJS v2.0 and CasperJS 1.1.0-beta3. I want to query a specific part inside the page DOM.
Here is the code that did not work:
function myfunc()
{
    return document.querySelector('span[style="color:#50aa50;"]').innerText;
}
var del=this.evaluate(myfunc());
this.echo("value: " + del);
And here the code that did work:
var del=this.evaluate(function()
{
    return document.querySelector('span[style="color:#50aa50;"]').innerText;
});
this.echo("value: " + del);
It seems to be the same, but it behaves differently, and I don't understand why.
And here is code that also worked:
function myfunc()
{
    return document.querySelector('span[style="color:#50aa50;"]').innerText;
}
var del=this.evaluate(myfunc);
this.echo("value: " + del);
The difference here is that I pass myfunc without the '()'.
Can anyone explain the reason?
The problem is this:
var text = this.evaluate(myfunc());
Functions in JavaScript are first-class citizens. You can pass them into other functions. But that's not what you are doing here: you call the function and pass its result into evaluate(), and the result is not a function.
Also, casper.evaluate() is the gateway to the page context, and only the page context has access to the document. When you call the function (with ()), it runs before casper.evaluate() does, so it tries to access the document outside the page context, where that is not possible.
The difference with casper.evaluate(function(){...}); is that the anonymous function is defined and passed into evaluate() without being called.
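The difference can be reproduced in Node with a hypothetical evaluate() that, like casper.evaluate(), runs a function inside a separate context. Here that context is a vm context with a made-up document stub; both are assumptions for illustration. Passing myfunc ships the function into the page context; passing myfunc() calls it first, in the outer context. In Node that surfaces as a ReferenceError; in CasperJS the outer document is a dummy object, so the exact error may differ.

```javascript
const vm = require('vm');

// Fake page context: document.querySelector is a made-up stub, not a DOM.
const page = vm.createContext({
  document: {
    querySelector: function (selector) {
      return { innerText: 'text of ' + selector };
    },
  },
});

// Hypothetical evaluate(): expects a function and runs its source in the page.
function evaluate(fn) {
  if (typeof fn !== 'function') {
    throw new TypeError('evaluate() expects a function, got ' + typeof fn);
  }
  return vm.runInContext('(' + fn.toString() + ')()', page);
}

function myfunc() {
  return document.querySelector('span').innerText;
}

// Passing the function: it runs inside the page context.
console.log(evaluate(myfunc)); // text of span

// Calling it first: myfunc() runs in the outer context before evaluate()
// is even invoked, and `document` does not exist there.
try {
  evaluate(myfunc());
} catch (e) {
  console.log(e.name); // ReferenceError
}
```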
There are cases where a function should be called instead of passed, for example when currying. But this is not applicable to casper.evaluate(), because it is sandboxed: the function that finally runs in casper.evaluate() cannot use variables from outside, so it must be self-contained. So the following code will also not work:
function myFunc2(a){
    return function(){
        // a is from outer scope so it will be inaccessible in `evaluate`
        return a;
    };
}
casper.echo(casper.evaluate(myFunc2("asd"))); // null
You should use
var text = this.evaluate(myfunc);
to pass a previously defined function to run in the page context.
It's also best to avoid variable names like del that are easily confused with reserved keywords such as delete.
My company allows us to write code in a javascript editor online. Other libraries are preloaded, so the code we write has access to these libraries.
Specifically, we can use Underscore.js and jQuery.js functions in our code. We can also use our very own library Graphie.js.
In an effort to save myself time, I have slowly built up my own personal set of functions which I copy and paste into every code I write. That set of functions is now so long that I want to fetch it externally (in order to save space, etc).
$.getScript( 'url/to/myfunctions.js' )
I tried the above code, but it was too good to be true. This jQuery function getScript seems to run myfunctions as their own independent unit. This fails because myfunctions use our Graphie.js functions within them.
$.get( 'url/to/myfunctions', eval )
The code above fetches and successfully evals my code (I configured my server to do so). Also too good to be true. Any jQuery and Underscore functions in my code actually work, but any Graphie functions in my code cause an error.
Instead of
$.get( 'url/to/myfunctions', eval );
try
$.get( 'url/to/myfunctions', function(code) { eval(code); } );
This way eval is called directly inside your own function, so the code is evaluated in the same scope as the rest of your code. When eval itself is passed to $.get and called from there, it is an indirect eval, which runs in the global scope instead. After the code has been fetched and executed, you can continue with the execution of the rest of your code:
$.get( 'url/to/myfunctions', function(code) {
    eval(code);
    callback();
});
function callback() {
    // Your code goes here
}
Explanation
For the purpose of the explanation, let's use this simplified model of the environment, in which your code is being executed:
// JQuery is defined in the global scope
var $ = {
    get: function( url, fn ) {
        var responses = {
            "url/to/myfunctions": "try {\
if(graphie) log('Graphie is visible.');\
} catch (e) {\
log('Graphie is not visible. (' + e + ')');\
}"
        };
        fn( responses[url] );
    }
};
(function() {
    // Graphie is defined in a local scope
    var graphie = {};
    (function() {
        // Your code goes here
        $.get( "url/to/myfunctions", eval );
        $.get( "url/to/myfunctions", function(code) { eval (code); } );
    })();
})();
The output: <ol id="output"></ol>
<script>
function log(msg) {
    var el = document.createElement("li");
    el.appendChild(document.createTextNode(msg));
    output.appendChild(el);
}
</script>
As you can see, the function passed to $.get gets executed inside the body of $.get. If you pass eval itself to $.get, it is invoked indirectly and evaluates the code in the global scope, where the local variable graphie is not visible. By wrapping eval in an anonymous function, you call it directly inside your own scope, so the evaluated code can see the local variable graphie through the closure.
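The direct/indirect distinction can be checked in Node. In ES5 and later engines, eval invoked through a reference (as when another function calls it for you) is an indirect eval and uses the global scope, while eval called by name inside your own function is a direct eval and sees the enclosing local scope. fakeGet is a hypothetical synchronous stand-in for $.get.

```javascript
// Hypothetical stand-in for $.get: hands the "downloaded" code to fn.
function fakeGet(code, fn) {
  return fn(code);
}

function yourCode() {
  var graphie = {}; // local variable, like Graphie in the editor's scope

  // eval passed by reference and called by fakeGet: an indirect eval,
  // which runs in the global scope and cannot see `graphie`.
  var viaReference = fakeGet('typeof graphie', eval);

  // eval called by name inside your own function: a direct eval,
  // which runs in the enclosing scope and can see `graphie`.
  var viaWrapper = fakeGet('typeof graphie', function (code) {
    return eval(code);
  });

  return [viaReference, viaWrapper];
}

console.log(yourCode()); // [ 'undefined', 'object' ]
```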
I'd advise against using eval. Instead, you can follow this model.
First in your myFunctions.js, wrap all your code into a single function.
(function(_, $, graphie) {
    // declare all your functions here which make use of the parameters
}) // we will be calling this anonymous function later with parameters
Then after getting the script you could do
$.get( 'url/to/myfunctions', function(fn){
    var el = document.createElement('script');
    el.type = 'text/javascript';
    el.text = fn + '(_, jQuery, Graphie);';
    document.head.appendChild(el);
});
Note that I've used Graphie as the argument, but I'm not sure it is the right name, so put your actual Graphie variable there.
Assuming that you have ajax access to this script (since that is what $.get is doing in your sample code shown), you could attempt to use jQuery's .html() to place the script which should execute it with the page's variable environment.
$.ajax({
    url: 'url/to/myfunctions.js',
    type: 'GET',
    success: function (result) {
        var script = '<scr'+'ipt>'+result+'</scr'+'ipt>';
        var div = $("<div>");
        $("body").append(div);
        div.html(script);
    }
});
Internally, this script will end up being executed by jQuery's globalEval function. https://github.com/jquery/jquery/blob/1.9.1/src/core.js#L577
// Evaluates a script in a global context
// Workarounds based on findings by Jim Driscoll
// http://weblogs.java.net/blog/driscoll/archive/2009/09/08/eval-javascript-global-context
globalEval: function( data ) {
    if ( data && jQuery.trim( data ) ) {
        // We use execScript on Internet Explorer
        // We use an anonymous function so that context is window
        // rather than jQuery in Firefox
        ( window.execScript || function( data ) {
            window[ "eval" ].call( window, data );
        } )( data );
    }
}
I also asked a question related to this here: Why is it that script will run from using jquery's html but not from using innerHTML?
Thanks to everyone's help, here is the solution that worked...
The myfunctions.js file has to be wrapped in a function:
function everything(_,$,Graphie){
    // every one of myfunctions now must be attached to the Graphie object like this:
    Graphie.oneOfMyFunctions = function(input1,input2,etc){
        // content of oneOfMyFunctions
    }
    // the rest of myfunctions, etc.
}
Then in my code I can retrieve it with:
$.get( '//path/to/myfunctions', eval )
everything(_,jQuery,mygraphievar);
Somehow, the code being evaled didn't have access to the global variable mygraphievar, which is why it had to be passed in and NOT part of the evaled code (here Amit made a small error).
Also, the everything function is executed OUTSIDE of the $.get() so that the changes to mygraphievar are made before any other code below gets executed.
One should notice that $.get() is actually an asynchronous function and will not call eval until after other code is executed. This causes the code to fail the very first time I run it, but after the first time the functions get saved in memory and then everything works correctly. The proper solution would be to write ALL of the code I want to execute in the callback function of the $.get(), but I was lazy.
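The ordering problem described above can be reproduced with a hypothetical asynchronous stand-in for $.get, with setTimeout playing the network. The sketch follows the author's approach (evaluate the file in the global scope, then call everything()), but does both inside the callback, so the helpers exist before anything uses them. The file contents and mygraphievar are placeholders.

```javascript
// Hypothetical stand-in for $.get: delivers the file contents asynchronously.
function fakeGet(url, callback) {
  const files = {
    'url/to/myfunctions':
      'function everything(_, $, Graphie) {' +
      '  Graphie.oneOfMyFunctions = function (a, b) { return a + b; };' +
      '}',
  };
  setTimeout(function () { callback(files[url]); }, 0);
}

const mygraphievar = {}; // placeholder for the editor's Graphie object

fakeGet('url/to/myfunctions', function (code) {
  // Indirect eval, like passing eval to $.get: defines `everything` globally.
  (0, eval)(code);
  everything(null, null, mygraphievar);
  // Safe here: the helpers were attached just above.
  console.log(mygraphievar.oneOfMyFunctions(2, 3)); // 5
});

// This line runs first, before the "response" arrives.
console.log(typeof mygraphievar.oneOfMyFunctions); // undefined
```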
One should also know that a slightly simpler solution is possible with $.getScript() but I don't have time to verify it.
This is a bit of a tricky question.
I am very familiar with javascript, however I am on a project that auto-crawls a website using PhantomJS and CasperJS. These are entirely new subjects to me.
I was able to figure out how to use Casper to navigate, log in to pages, etc., but it is unwieldy, as the general flow seems to be:
casper.start('http://google.fr/');
casper.then(function() {
    this.echo("I'm in your google.");
});
casper.then(function() {
    this.echo('Now, let me write something');
});
casper.then(function() {
    this.echo('Oh well.');
});
casper.run();
My problem with this is that I want to do all sorts of things with the website, depending on what data is gotten with it. I can't pre-layout the sequence of navigations and not have it change. I hope this makes sense.
To solve this, I created a Javascript Navigator object with builtin functions. My general concept was:
navigator.logIn(function()
{
    navigator.actionA(parameters, function()
    {
        if (navigator.data.a == navigator.data.b) {
            navigator.actionB();
        } else {
            navigator.actionC();
        }
    });
});
And embedded in each of these functions would be casper functions.
Here is a shortened version of my actual code, and where things started getting funky:
var casper = require('casper').create({
    clientScripts: [ 'jquery.min.js' ],
    onError: function(self, m) {
        console.log('FATAL:' + m);
        self.exit();
    },
});
var navigator = new _Navigator();
function _Navigator() { }
_Navigator.prototype.logIn = function(aCallback)
{
    var self = this;
    casper.start('https://website/login.asp', function()
    {
        if (1 == 1) {
            this.evaluate(function() {
                $("input[name=blah]").val('blahblah');
            });
            // ... A LOT MORE CODE
            aCallback();
        }
    });
}
_Navigator.prototype.search = function(aDataSet, aCallback)
{
    var self = this;
    console.log('this works');
    casper.then(function(){
        console.log('this works');
    });
    var firstName = 'foobar';
    casper.then(function(){
        console.log('this works');
        this.evaluate(function()
        {
            console.log('this no longer works!!');
            $('input[id=blah]').val(firstName);
            aCallback();
        });
    });
}
navigator.logIn(function() {
    // LOG IN RUNS, AND CALLS BACK SUCCESSFULLY...
    navigator.search({'dataset'}, function()
    {
        console.log('This never runs');
    });
});
casper.run();
You'll notice that in the navigator.logIn function I call casper.start(). Inside it, the evaluate function works fine, but then I call a callback function within that casper.start() step. In my callback, I call the next function, navigator.search, which I suppose is still technically executing inside casper.start?
When I try running casper.evaluate within this new function called by the first callback function, everything seems to behave fine with the exception that casper.evaluate no longer works! It seems to eat the function, not printing any console logs or anything.
I have tried everything on this. I am not sure how to do this correctly. Does anyone have any suggestions on what I am doing wrong? Thanks.
I know this is quite old, but: What's going on here is a combination of two issues:
casper.evaluate() seems to eat all errors within the current stack - onError won't run from inside an .evaluate() callback.
Functions used in .evaluate() are not standard closures: they're sandboxed and have no access to variables outside their scope unless those are passed as explicit arguments to casper.evaluate(). So in the evaluated function where you call aCallback() there's no aCallback in scope (the same goes for firstName), and the function will fail (silently) with a ReferenceError.
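A Node sketch of the fix: pass firstName in as an argument, return plain data, and call aCallback in the outer context. The vm context, the form object and evaluate are hypothetical stand-ins modelled on casper.evaluate(fn, arg1, ...), not CasperJS itself. One difference from CasperJS: here the ReferenceError propagates, whereas, as noted above, casper.evaluate() swallows it.

```javascript
const vm = require('vm');

// Fake page context with a made-up form field (stand-in for the real page).
const page = vm.createContext({ form: { blah: '' } });

// Hypothetical evaluate(): runs fn's *source* in the page and copies the
// arguments in as JSON, so closure variables never come along.
function evaluate(fn, ...args) {
  const code = '(' + fn.toString() + ').apply(null, ' + JSON.stringify(args) + ')';
  return vm.runInContext(code, page);
}

function search(aCallback) {
  const firstName = 'foobar';

  // Broken: firstName exists only in the outer scope.
  let error = null;
  try {
    evaluate(function () {
      form.blah = firstName;
    });
  } catch (e) {
    error = e.name;
  }

  // Working: pass firstName explicitly, return data, call aCallback outside.
  const value = evaluate(function (name) {
    form.blah = name;
    return form.blah;
  }, firstName);
  aCallback(error, value);
}

search(function (error, value) {
  console.log(error); // ReferenceError
  console.log(value); // foobar
});
```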
casper.evaluate() is like a window onto the headless browser session.
Anything that happens in functions passed to evaluate doesn't appear on your local console.
However, you can either log any value returned from evaluate or print all output by setting up a listener:
casper.on('remote.message', function(message) {
    console.log(message);
});
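Why the page-side console.log() output disappears: the page has its own console, and its messages only reach yours if something forwards them. The Node sketch below models this with the vm module and an EventEmitter. fakeCasper, evaluate and the fake document are hypothetical stand-ins; only the 'remote.message' event name comes from CasperJS.

```javascript
const vm = require('vm');
const EventEmitter = require('events');

// Hypothetical stand-in for the casper object: only the event part is modelled.
const fakeCasper = new EventEmitter();

// The page context gets its OWN console. Nothing it logs reaches the outer
// console unless it is forwarded, which is the role of 'remote.message'.
const page = vm.createContext({
  document: { title: 'Example Domain' }, // fake document (assumption)
  console: {
    log: function (...parts) {
      fakeCasper.emit('remote.message', parts.join(' '));
    },
  },
});

function evaluate(fn) {
  return vm.runInContext('(' + fn.toString() + ')()', page);
}

const seen = [];
fakeCasper.on('remote.message', function (message) {
  seen.push(message);
  console.log('[page] ' + message);
});

evaluate(function () {
  console.log('Page Title', document.title);
});
// prints: [page] Page Title Example Domain
```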
Possible Duplicate:
Javascript OOP return value from function
I have a class defined like this
function SocialMiner(tabUrl)
{
    var verbose=true;
    var profileArray=new Array();
    this.tabUrl=tabUrl;
    this.getTabUrl=function(callback)
    {
        chrome.tabs.getSelected(null, function(tab)
        {
            callback(tab.url);
        });
    }
    this.setTabUrlValue=function(pageUrl)
    {
        this.tabUrl=pageUrl;
        console.log("22"+this.tabUrl); //this statement shows url correctly
    }
}
When I call the method like this:
miner.getTabUrl(miner.setTabUrlValue);
miner.logToConsole("1"+miner.tabUrl); //This statement returns undefined
The console.log inside the callback correctly outputs the URL; however, the tabUrl property of the miner object is undefined, as seen in the second console.log. Why is that?
The solution is to save a reference to this within the constructor (available later on via closure):
var that = this; //in the top of the SocialMiner constructor function
and in setTabUrlValue use:
that.tabUrl=pageUrl;
I suspect that running a method as a plain function (a callback) loses its binding, i.e. it no longer knows which object this refers to. In other words, it runs as an ordinary function call, not as a method of the instance (in non-strict code, this is then the global object). A variable referencing this in the constructor scope remains available to the function through the closure, and it points to the right this, the instance that was created.
You could also force callback to run in the current instance scope like this:
callback.call(this,tab.url);
In that case you can leave this.tabUrl=pageUrl; as it is.
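A runnable Node sketch of the lost binding and two fixes. getSelected is a hypothetical synchronous stand-in for chrome.tabs.getSelected (the real API calls back asynchronously). setWithThis uses a function-level 'use strict' so the lost binding throws instead of silently writing to the global object. Function.prototype.bind (ES5) is a third option not mentioned in the answers above.

```javascript
// Hypothetical stand-in for chrome.tabs.getSelected: calls back with a fake tab.
function getSelected(windowId, callback) {
  callback({ url: 'http://example.com/' });
}

function SocialMiner() {
  var that = this; // captured while `this` is still the new instance
  this.tabUrl = undefined;
  this.getTabUrl = function (callback) {
    getSelected(null, function (tab) {
      callback(tab.url); // plain call: inside callback, `this` is not the miner
    });
  };
  // Relies on `this`; strict mode makes `this` undefined in a plain call.
  this.setWithThis = function (pageUrl) {
    'use strict';
    this.tabUrl = pageUrl;
  };
  // Relies on the captured `that`.
  this.setWithThat = function (pageUrl) {
    that.tabUrl = pageUrl;
  };
}

var a = new SocialMiner();
try {
  a.getTabUrl(a.setWithThis);
} catch (e) {
  console.log(e instanceof TypeError); // true: `this` was undefined
}

var b = new SocialMiner();
b.getTabUrl(b.setWithThat);
console.log(b.tabUrl); // http://example.com/

// bind() fixes `this` at the call site instead.
var c = new SocialMiner();
c.getTabUrl(c.setWithThis.bind(c));
console.log(c.tabUrl); // http://example.com/
```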
Here is a simplified version of your code. The methods return this so that a property of the instance can be referenced directly (see the console.log on the last line):
function Some(){
    var that = this; // note: not used in this example
    this.getA = function(callback){
        var someval = 'foobar'; // declared with var to avoid an implicit global
        callback.call(this,someval);
        return this;
    };
    this.getB = function(val){
        this.val = val;
        return this;
    };
}
var some = new Some;
console.log( some.getA(some.getB).val ); //=> foobar
Taking a look at your code again, I think you're losing scope twice, because callback is called from within another callback. That's why I think your code at that spot should be:
chrome.tabs.getSelected(
    null,
    function(tab) {
        callback.call(that,tab.url); //< use that here
    }
);
Furthermore, in your code on GitHub, I don't see any instantiation of the miner instance.
this is a tricky beast in JavaScript and, as others have pointed out, is the key to the issue. The problem with using this everywhere is that its value can change depending on who calls the function and how (for example, see the call and apply methods in JavaScript). I'm guessing that if you wrote the value of this to the console in the callback from the chrome.tabs.getSelected function, you'd find it isn't your miner any more.
The solution is to capture a reference to the this that you're actually interested in when you know for sure it's the right one, and then use that reference from then on. It might make more sense to see it commented inline in your example:
function SocialMiner(tabUrl)
{
    //At this point we know "this" is our miner object, so let's store a
    //reference to it in some other (not so transient) variable...
    var that = this;
    var verbose=true;
    var profileArray=new Array();
    this.tabUrl=tabUrl;
    this.getTabUrl=function(callback)
    {
        chrome.tabs.getSelected(null, function(tab)
        {
            //at this point "this" is whatever the "chrome.tabs.getSelected"
            //method has decided it is (probably a reference to the tab or something)
            callback(tab.url);
        });
    }
    this.setTabUrlValue=function(pageUrl)
    {
        //because this can be called from anywhere, including the chrome callback
        //above, who knows what "this" refers to here (but "that" is definitely
        //still your miner)
        that.tabUrl=pageUrl;
        console.log("22"+that.tabUrl);
    }
}
You can see how much this shifts around in libraries that use callbacks heavily like jQuery, where often this is set to convenient values, but certainly not the same this that was logically in scope when you made the initial call.
EDIT: Looking at the full source (and example) you posted, this is just a timing issue: chrome.tabs.getSelected returns asynchronously, after your "second" call to log has already gone through...
console.log("5");
miner.getTabUrl(miner.setTabUrlValue); //setTabUrlValue is logging with '22'
console.log("6");
miner.logToConsole("1"+miner.tabUrl);
console.log("7");
// Output:
5
6
1 undefined //the chrome.tabs.getSelected hasn't returned yet...
7
22 http://url //now it has (so if you tried to use miner.tabUrl now you'd be all good...
The solution is to put all the stuff after the get/set into the callback, since you don't want anything happening until after that tabUrl is finished being set... so something like this:
console.log("5");
miner.getTabUrl(function(pageUrl) {
    miner.setTabUrlValue(pageUrl);
    console.log("6");
    miner.logToConsole("1"+miner.tabUrl);
    console.log("7");
});
Hopefully that will see you getting your results in the order you expect them.
I think this happens because closure vars do not survive a function call.