I am trying to use PhantomJS on a page that always requires HTTP basic authentication, for example, this page
http://alexturpin.net/auth (test:rosebud)
Using the following code
var webpage = require('webpage');
page = webpage.create();
page.settings = {
    userName: "test",
    password: "rosebud"
};
page.open("http://alexturpin.net/auth/", function(status) {
    console.log(status);
    var retval = page.evaluate(function() {
        return "test";
    });
    console.log(retval);
});
I get this output
$ phantomjs test.js
success
null
Whatever I try, evaluate will keep returning null, even though the page seems to have been opened fine because status contains "success".
If I decide to open a page with no basic auth, like http://alexturpin.net/noauth, I still get the same results. Only when I finally remove the authentication settings altogether before opening the page does it work.
The authentication settings seem to conflict with evaluate. Is this a bug in PhantomJS, or did I miss something?
With that syntax you are wiping out the settings object, replacing it with one that contains only userName and password. Set them like this instead:
page.settings.userName = "test";
page.settings.password = "rosebud";
Probably PhantomJS should handle that better, i.e. not rely on the settings object for defaults.
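To make the difference concrete, here is a minimal sketch in plain JavaScript, with no PhantomJS involved. The makeSettings helper and its default values are assumptions standing in for whatever defaults a real page object carries; the point is only that replacing an object discards its existing properties, while assigning to individual properties keeps them.

```javascript
// Hypothetical stand-in for the defaults a page object starts with
// (assumed values, for illustration only).
function makeSettings() {
    return { javascriptEnabled: true, loadImages: true };
}

// Replacing the whole object: every default is gone.
var replaced = makeSettings();
replaced = { userName: "test", password: "rosebud" };
console.log(replaced.javascriptEnabled); // undefined

// Assigning individual properties: the defaults stay in place.
var extended = makeSettings();
extended.userName = "test";
extended.password = "rosebud";
console.log(extended.javascriptEnabled); // true
```

If a setting such as javascriptEnabled ends up undefined, that would plausibly explain why evaluate returns null even though the page loads.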
This is a closure issue; try putting your console.log in a callback.
I have a script main.js that consists of the following code:
function showListing() {
    var url = $("#url")[0].value
    var listingID = extractListingID(url)
    if (listingID) {
        alert(`ListingID is ${listingID}`)
        location.href = `/listing/${listingID}`
    }
}
function extractListingID(url) {
    var found = url.match(/\d{9,}/)
    if (found)
        return found[0]
}
I expect it to redirect to /listing/XXX, where XXX is the value returned from extractListingID(). However, it redirects to /? (e.g., 127.0.0.1:5000/? on the debug server).
Here's the network log from the devtools:
The server is a locally run python/flask application that successfully returns a page when I open it directly at https://127.0.0.1:5000/listing/XXX.
Any idea why that happens?
It turns out the script was called from a button that didn't have a type set. So, after pressing this button, the showListing() function started along with the standard submit process that interfered with it.
After setting the button to type="button" everything works fine.
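An alternative to changing the markup, sketched here as an assumption rather than the fix described above, is to cancel the default submit from inside the click handler with event.preventDefault(). The function name handleListingClick and the element ids in the comment are hypothetical.

```javascript
// Hypothetical handler: cancels the implicit form submit, then returns
// the path to navigate to (or null when no listing ID is found).
function handleListingClick(event, url) {
    event.preventDefault(); // stop the enclosing form from submitting
    var found = url.match(/\d{9,}/);
    return found ? "/listing/" + found[0] : null;
}

// In a browser this might be wired up as follows (element ids are assumed):
// document.getElementById("show").addEventListener("click", function (e) {
//     var target = handleListingClick(e, document.getElementById("url").value);
//     if (target) location.href = target;
// });
```

Setting type="button" is still the simpler option when the button is never meant to submit the form.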
In Chrome I populated an online mapping tool (Kumu) with a JSON file from the JS console using:
Workflows.setCurrentMapSource("MY_JSON_LINK");
where MY_JSON_LINK was:
https://XXXXXX/json?key=MTE3.DI4LYA.ZrzRFJ5o7Q5m3nLe6d6JGFISdKI
But the link is no longer active, so when I go to the Kumu page I get the error:
Unable to open map
Is there a way to break the connection from the JS Console? I have searched but have not found anything that works
Thanks
I'm on my phone so I can't give you the full code, but what you can do is override the XMLHttpRequest methods; then you can manipulate any request made on the page.
But this must of course be done BEFORE the requests are done so you'll probably need Tampermonkey userscript. Example:
const originalOpen = XMLHttpRequest.prototype.open;
XMLHttpRequest.prototype.open = function () {
    // do what you need
    return originalOpen.apply(this, arguments);
};
So for example if you want to protect some link from being accessed, you can do this:
const originalOpen = XMLHttpRequest.prototype.open;
// The "?" before "key" is escaped so that it matches the literal query-string separator
const REGEX_TEST_URL = /https?:\/\/XXXXXX\/json\?key=(.*?)/;
XMLHttpRequest.prototype.open = function (method, url) {
    console.log("Open: ", url);
    // If you want to kill access to that URL
    if (REGEX_TEST_URL.test(url)) {
        throw new Error("Blocked loading of URL " + url);
    }
    // Otherwise allow normal operation to proceed
    return originalOpen.apply(this, arguments);
};
You can test this even here on stackoverflow.
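One detail worth checking in URL patterns like this: in a regular expression, ? is a quantifier, so it has to be escaped to match the literal ? that starts a query string. The short check below uses the same placeholder host XXXXXX and a made-up key value.

```javascript
// Placeholder URL; the key value "abc" is made up for this check.
var url = "https://XXXXXX/json?key=abc";

// Unescaped "?" makes the preceding "n" optional, so the literal "?"
// in the URL is never matched.
var unescaped = /https?:\/\/XXXXXX\/json?key=/;

// Escaped "\?" matches the literal question mark of the query string.
var escaped = /https?:\/\/XXXXXX\/json\?key=/;

console.log(unescaped.test(url)); // false
console.log(escaped.test(url));   // true
```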
I am trying to automate a site in a WPF application with WebBrowser control.
The site checks for the javascript window.name in each page and throws an error if this does not match with the preset value.
Look at the sample below.
var id = "1234";
if (window.name != id)
{
    window.open("home.html", id)
}
Is there a way to get this value and set it when I create a new WebBrowser object?
I tried the following and my problem is resolved. Hope this may help somebody.
I first navigated the page to a blank page with this code.
var html = string.Format(
    "<html><body><h4>Opening ...</h4><script type='text/javascript'>window.open('about:blank', '{0}');</script></body></html>",
    popupWindowName);
var w = new Browser();
w.NavigateToString(html);
Then, in the page's load-completed event, I navigated to the original URL.
w.Navigate("https://somesite.com/page.aspx",
    null, null, h);
The popup window name was changed to what I wanted and the session continued correctly. This is not really a solution to the problem I faced; it is more of a workaround.
I also had to deal with the popups that kept coming, so I handled the NewWindow2 event to take care of them.
I am trying to match a token (string token) in the RSS feed using casperjs waitFor() but it does not seem to work. There are other ways (not using polling) to get around but I need to poll for it. Here is the code snippet:
casper.then(function() {
    this.waitFor(function matchToken() {
        return this.evaluate(function() {
            if(!this.resourceExists(token)) {
                this.reload();
                return false;
            }
            return true;
        });
    });
});
The updates to rss url are not dynamic and hence, a refresh would be needed to check for the token. But it seems (from the access log) that I am not getting any hits (reload not working) on the rss url. Ideally, I would want to refresh the page if it doesn't see the token and then check for the token again & it should keep doing that until the waitFor times out.
I also tried using assertTextExists() instead of resourceExists() but even that did not work.
I am using PhantomJS (1.9.7) & the url is: https://secure.hyper-reach.com:488/rss/323708
The token I am looking for is item/272935. If you look at the URL mentioned above, you will find it in each guid tag. I include "item/" as part of the token so that it doesn't incorrectly match any other numbers.
evaluate() is the sandboxed page context. Anything inside of it doesn't have access to variables defined outside and this refers to window of the page and not casper. You don't need the evaluate() function here, since you don't access the page context.
The other thing is that casper.resourceExists() works on the resource meta data such as URL and request headers. It seems that you want to check the content of the resource. If you used casper.thenOpen() or casper.open() to open the RSS feed, then you can check with casper.getPageContent(), if the text exists.
The actual problem with your code is that you mix synchronous and asynchronous code in a way that won't work. waitFor() is the wrong tool for the job, because you need to reload in the middle of its execution, but the check function is called so fast that there probably won't be a complete page load to actually test it.
You need to recursively check whether the document is changed to your liking.
var tokenTrials = 0,
    tokenFound = false;
function matchToken(){
    if (this.getPageContent().indexOf(token) === -1) {
        // token was not found
        tokenTrials++;
        if (tokenTrials < 50) {
            this.reload().wait(1000).then(matchToken);
        }
    } else {
        tokenFound = true;
    }
}
casper.then(matchToken).then(function(){
    test.assertTrue(tokenFound, "Token was found after " + tokenTrials + " trials");
});
I am trying to learn PhantomJS. I would appreciate it if you could help me understand why the code below gives me an error (shown below) and help me fix it. I am trying to execute some JavaScript on a page using PhantomJS. The lines inside the evaluate function work well when I enter them in the Chrome console, i.e., they give the expected result (document.title).
Thank you.
PhantomJS Code
var page = require('webpage').create();
var url = 'http://www.google.com';
page.open(url, function(status) {
    var title = page.evaluate(function(query) {
        document.querySelector('input[name=q]').setAttribute('value', query);
        document.querySelector('input[name="btnK"]').click();
        return document.title;
    }, 'phantomJS');
    console.log(title);
    phantom.exit()
})
Error
TypeError: 'null' is not an object (evaluating 'document.querySelector('input[name="btnK"]').click')
phantomjs://webpage.evaluate():4
phantomjs://webpage.evaluate():7
phantomjs://webpage.evaluate():7
null
Edit 1: In response to Andrew's answer
Andrew, it is strange but on my computer, the button is an input element. The following screenshot shows the result on my computer.
Edit 2: click event unreliable
Sometimes, the following click event works, sometimes it does not.
document.querySelector('input[name="btnK"]')
Not clear to me what is happening.
About the answer
For future readers: in addition to the answer, the gist by Artjom B. is helpful in understanding what is happening. However, for a more robust solution, I think something like the waitfor.js example will have to be used (as suggested in the answer). I hope it is okay to copy and paste Artjom B.'s gist here. While the gist below works (with form submit), it is still not clear to me why it does not work if I try to simulate a click on the input button. If anyone can clarify that, it would be great.
// Gist by Artjom B.
var page = require('webpage').create();
var url = 'http://www.google.com';
page.open(url, function(status) {
    var query = 'phantomJS';
    page.evaluate(function(query) {
        document.querySelector('input[name=q]').value = query;
        document.querySelector('form[action="/search"]').submit();
    }, query);
    setTimeout(function(){
        var title = page.evaluate(function() {
            return document.title;
        });
        console.log(title);
        phantom.exit();
    }, 2000);
});
Google uses a form for submitting its queries. It's also highly likely that Google has changed the prototype methods for its search buttons, so it's not really the best site for testing web scraping.
The easiest way to do this is to actually perform a form submit, which slightly tweaks your example.
var page = require('webpage').create();
var url = 'http://www.google.com';
page.open(url, function(status) {
    var query = 'phantomJS';
    var title = page.evaluate(function(query) {
        document.querySelector('input[name=q]').value = query;
        document.querySelector('form[action="/search"]').submit();
        return document.title;
    }, query);
    console.log(title);
    phantom.exit();
});
Note that the response to this call is asynchronous, so reading the title immediately will likely fail or return a stale value: you need to account for the time the page takes to load before looking up data (see the waitfor.js example among the PhantomJS examples).
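As a rough illustration of that polling idea, here is a minimal sketch of a wait helper. It is not a copy of the waitfor.js example; the function name, parameter names and default interval are assumptions made for this sketch. It relies only on setInterval and Date.now, so it can be tried outside PhantomJS as well.

```javascript
// Hypothetical helper: polls testFx until it returns true or timeoutMs elapses.
// onReady runs once the condition holds; onTimeout runs if it never does.
function waitFor(testFx, onReady, onTimeout, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (testFx()) {
            clearInterval(timer);
            onReady();
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            onTimeout();
        }
    }, intervalMs || 250); // assumed default polling interval
    return timer;
}

// Possible use inside page.open after the form submit (assumed condition):
// waitFor(function () {
//     return page.evaluate(function () {
//         return document.title.indexOf('phantomJS') !== -1;
//     });
// }, function () {
//     console.log(page.evaluate(function () { return document.title; }));
//     phantom.exit();
// }, function () {
//     console.log('Timed out waiting for the results page');
//     phantom.exit(1);
// }, 5000);
```

Compared with a fixed setTimeout, polling finishes as soon as the condition holds and reports a timeout explicitly instead of silently reading stale data.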
You can open google.com and try document.querySelector('input[name="btnK"]') in the console; it returns null.
Actually, try replacing input with button:
document.querySelector('button[name="btnK"]')