Checking whether a website uses the showDirectoryPicker JS function with Puppeteer - javascript

Hello, I want to check with Puppeteer whether a website uses the showDirectoryPicker function.
Currently my code looks like this:
'use strict';
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch({ headless:false,executablePath: '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome', });
const [page] = await browser.pages();
await page.goto('https://example.com');
console.log(await page.evaluate(() => typeof showDirectoryPicker === 'function'));
await browser.close();
} catch (err) {
console.error(err);
}
})();
Currently this statement
console.log(await page.evaluate(() => typeof showDirectoryPicker === 'function'));
returns true for every website, since showDirectoryPicker is a valid JS function that the browser itself provides. However, I want to get true only if the analyzed website actually uses the showDirectoryPicker function.

If I understand your question correctly, you are trying to evaluate if the page calls the showDirectoryPicker() method, not if the browser supports it. One way to approach this would be to override the method with your own implementation that then reports back to Puppeteer if it gets called by the page. See my StackOverflow answer on overriding a function with a variant that logs whenever it gets called. You can then catch this log output with Puppeteer:
page.on('console', (message) => {
/*
Check that the message is what your overridden
custom variant logs.
*/
});
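The override idea can be sketched as follows. This is a minimal sketch, not the code from the linked answer: `PICKER_MARKER`, `installPickerSpy` and `watchForPicker` are hypothetical names, and it assumes the wrapper is injected with `page.evaluateOnNewDocument()` before `page.goto()` so that it is in place before any page script runs.

```javascript
// Hypothetical marker string that the wrapper logs when the page calls the picker.
const PICKER_MARKER = '__showDirectoryPicker_called__';

// Runs inside the page: wraps showDirectoryPicker so every call is logged.
function installPickerSpy(marker) {
  const original = window.showDirectoryPicker;
  window.showDirectoryPicker = function (...args) {
    console.log(marker);
    // Fall through to the real implementation when the browser has one.
    if (typeof original === 'function') return original.apply(this, args);
    return Promise.reject(new Error('showDirectoryPicker is not available'));
  };
}

// Runs in Node: `page` is a Puppeteer Page; call this before page.goto().
async function watchForPicker(page, onCalled) {
  page.on('console', (message) => {
    if (message.text() === PICKER_MARKER) onCalled();
  });
  await page.evaluateOnNewDocument(installPickerSpy, PICKER_MARKER);
}
```

Note that a site will typically call showDirectoryPicker() only in response to a user gesture such as a click, so the callback may never fire unless the script also interacts with the page.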

Related

Puppeteer's `page.evaluate` result differs from devTools console

I need to check for a certain service worker being registered. Unfortunately, page.evaluate returns undefined no matter what I do.
let page = await chrome.newPage();
await page.goto('http://127.0.0.1:8089/');
await page.waitFor(10000);
const isCorrectSW = await page.evaluate(async () => {
await navigator
.serviceWorker
.getRegistrations()
.then(registrations =>
registrations[0].active.scriptURL.endsWith('/target.js')
);
});
console.log(isCorrectSW);
isCorrectSW ends up being undefined, but if I enable devtools and run the same statement in the Chromium instance's devtools, I get the correct result. I can also observe the service worker attached in the browser's dev tools.
Is this a Puppeteer bug, or am I doing something incorrectly?
According to the documentation, page.evaluate resolves to undefined when the function passed to it returns a non-serializable value.
In your scenario, the function you are passing into page.evaluate does not return anything: it has a block body that awaits the promise chain but contains no return statement.
Since you are already using async, you can change the function you are passing to:
async () => {
const registrations = await navigator.serviceWorker.getRegistrations()
return registrations[0].active.scriptURL.endsWith('/target.js')
}
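The difference can be reproduced in plain Node without a browser. In this sketch `getRegistrations` is a hypothetical stand-in for `navigator.serviceWorker.getRegistrations()`, returning made-up data:

```javascript
// Hypothetical stand-in for navigator.serviceWorker.getRegistrations().
const getRegistrations = async () => [
  { active: { scriptURL: 'http://127.0.0.1:8089/target.js' } },
];

// Block body without `return`: the async function resolves to undefined.
const withoutReturn = async () => {
  await getRegistrations().then((regs) =>
    regs[0].active.scriptURL.endsWith('/target.js')
  );
};

// Same logic with an explicit `return`.
const withReturn = async () => {
  const regs = await getRegistrations();
  return regs[0].active.scriptURL.endsWith('/target.js');
};

Promise.all([withoutReturn(), withReturn()]).then(([a, b]) => {
  console.log(a, b); // prints: undefined true
});
```

The DevTools console prints the value of the last expression, which is probably why the same statement appeared to work there.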

Puppeteer execute command in DevTools Console

So I have a line which I can paste manually into the DevTools console in a browser. Is there any way to make Puppeteer execute it? After searching I haven't found anything; sorry if this has been answered already, I am quite new.
For those who care, it's a line to buy a listing of an item. Example:
BuyMarketListing('listing', '3555030760772417847', 730, '2', '24716958303')
It looks like you're looking for page.evaluate(), which is described in the Puppeteer documentation. You can pass in a string or an anonymous function containing the lines you want to evaluate in the page.
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.evaluate(() => { /* insert lines here */ }); // page.evaluate() runs the lines in the page context, as if typed into the browser console
await browser.close();
})();
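If the values come from Node rather than being hard-coded, they can be passed as extra arguments to page.evaluate(), as long as they are serializable. A minimal sketch, assuming `BuyMarketListing` is a global function defined by the page itself (as in the question); `buyListing` is a hypothetical helper name:

```javascript
// `page` is a Puppeteer Page; `listingArgs` is an array of serializable values.
async function buyListing(page, listingArgs) {
  // Arguments after the function are serialized and handed to it inside the page.
  return page.evaluate((args) => {
    // BuyMarketListing is assumed to exist on the page, as in the question.
    return window.BuyMarketListing(...args);
  }, listingArgs);
}

// Usage, with the example values from the question:
// await buyListing(page, ['listing', '3555030760772417847', 730, '2', '24716958303']);
```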

Is Puppeteer making my NodeJS process exit?

I'm playing with Puppeteer and wrote this example (should probably never happen in production, but still):
const puppeteer = require('puppeteer');
(async () => {
// creating browser instance and closing it
const browser = await puppeteer.launch({ headless: false })
browser.disconnect()
await browser.close()
console.log('first check') // everything is ok here, message is printed
// opening page on the closed browser instance. Just for testing purposes
const page = await browser.newPage()
await page.goto('http://google.com')
// never printed
console.log('second check')
})()
So basically, I am trying to create a new page on a closed browser instance. Obviously, no page opens because the browser instance is closed, but I am expecting some error. Instead nothing happens and the second console.log is never executed!
Question: if no error is thrown, why does the program never reach the second console.log? Does Puppeteer somehow close the process of my NodeJS application? Or am I missing something?
puppeteer version: latest - 5.3.1 (also 3.0.0)
By the way, if I use an earlier Puppeteer version (2.0.0), the same code fails with an error, as I expect:
Error: WebSocket is not open: readyState 2 (CLOSING)
Update.
After debugging the internals of Puppeteer a bit, I found out the following:
They have a Connection class with a map of callbacks as a property. Whenever we call the newPage method, a message with a new id is sent and a corresponding Promise is created. This Promise's resolve and reject functions are stored in the callbacks map:
send(method, ...paramArgs) {
const params = paramArgs.length ? paramArgs[0] : undefined;
const id = this._rawSend({ method, params });
return new Promise((resolve, reject) => {
this._callbacks.set(id, { resolve, reject, error: new Error(), method });
});
}
Then, the Connection class has the _onMessage(message) callback. Whenever some data (message) is received, they inspect the message to find out whether it is an OK or an ERROR message, and then invoke the stored resolve or reject callback.
But since the browser instance in my example is already closed, the message never arrives and the Promise is neither resolved nor rejected.
After a little research, I found out that NodeJS does not keep the process alive for such a pending Promise: once nothing else is scheduled on the event loop, the process simply exits. Example:
(async () => {
const promise = new Promise((resolve, reject) => {
if (true === false) {
resolve(13) // this will never happen
}
})
const value = await promise
console.log(value) // we never come here
})()
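One way to guard against a Promise that never settles, like the one in the example above, is to race it against a timer. This is a sketch only: `withTimeout` is a hypothetical helper, not part of Puppeteer or Node.

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; clear the timer so it does not linger.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage with a Promise that never settles:
const never = new Promise(() => {});
withTimeout(never, 100, 'newPage').catch((err) => console.error(err.message));
// prints "newPage timed out after 100 ms"
```

The pending timer also keeps the event loop busy, so the process stays alive long enough for the rejection to be reported.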
I agree that this seems to be a bug. I see the issue you made and added a potential fix.
Adding this as the first thing in Connection.send() seems to fix the issue:
if (this._closed)
return Promise.reject(new Error(`Protocol error (${method}): Target closed.`));
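To see the effect of the guard above in isolation, here is a toy model of the callback-map pattern. `ToyConnection` is a simplified, hypothetical class written for illustration; it is not Puppeteer's real Connection implementation.

```javascript
class ToyConnection {
  constructor() {
    this._lastId = 0;
    this._callbacks = new Map();
    this._closed = false;
  }

  send(method) {
    // The guard: fail fast instead of returning a Promise nobody will settle.
    if (this._closed) {
      return Promise.reject(new Error(`Protocol error (${method}): Target closed.`));
    }
    const id = ++this._lastId;
    return new Promise((resolve, reject) => {
      this._callbacks.set(id, { resolve, reject, method });
    });
  }

  // Simulates a response arriving from the browser.
  onMessage({ id, result, error }) {
    const callback = this._callbacks.get(id);
    if (!callback) return;
    this._callbacks.delete(id);
    if (error) callback.reject(new Error(error));
    else callback.resolve(result);
  }

  close() {
    this._closed = true;
    // Reject everything still in flight so callers are not left hanging.
    for (const { reject, method } of this._callbacks.values()) {
      reject(new Error(`Protocol error (${method}): Target closed.`));
    }
    this._callbacks.clear();
  }
}
```

With the guard in place, a send() after close() rejects immediately, so a surrounding try/catch or .catch() handler gets a chance to run.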
In the meantime, I have added this to my code so at least it doesn't die silently with no indication that it failed:
process.on('beforeExit', (code) => {
// beforeExit fires when the event loop runs out of work, but not on an explicit process.exit()
console.log('We seem to be exiting purely because there are no more awaits scheduled instead of having reached an exit. Assuming this is bad behavior from the browser process. Previous exit code: ', code);
process.exit(1);
});
//my code goes here
asdf()
process.exit(0);//will exit without triggering the beforeExit message.
Honestly the behavior of Node in silently exiting seems like it is a little lacking. You can set an exitCode, but having a program completely able to run up to an await then die silently without triggering exception handlers or finally blocks is a little gross.
You don't see any error probably because you don't wait for the async function to settle. If you attach a catch handler most likely you'll catch the error:
const puppeteer = require('puppeteer');
(async () => {
// creating browser instance and closing it
const browser = await puppeteer.launch({ headless: false })
browser.disconnect()
await browser.close()
console.log('first check') // everything is ok here, message is printed
// opening page on the closed browser instance. Just for testing purposes
const page = await browser.newPage()
await page.goto('http://google.com')
// never printed
console.log('second check')
})()
.then(() => console.log('done'))
.catch(e => console.error(e)); // <= HERE
Or use try/catch:
const puppeteer = require("puppeteer");
(async () => {
try {
// creating browser instance and closing it
const browser = await puppeteer.launch({ headless: false });
browser.disconnect();
await browser.close();
console.log("first check"); // everything is ok here, message is printed
// opening page on the closed browser instance. Just for testing purposes
const page = await browser.newPage();
await page.goto("http://google.com");
// never printed
console.log("second check");
} catch (e) {
console.error(e);
}
})();

Puppeteer evaluate function

I'm new to Puppeteer and I'm trying to understand how it actually works through some examples.
What I'm trying to do in this example is extract the number of views of a YouTube video. I've written a JS line in the Chrome console that lets me extract this information:
document.querySelector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer').innerText
Which worked well. However, when I did the same with my Puppeteer code, it doesn't find the element I queried.
const puppeteer = require('puppeteer')
const getData = async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://www.youtube.com/watch?v=T5GSLc-i5Xo')
await page.waitFor(1000)
const result = await page.evaluate(() => {
let views = document.querySelector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer').innerText
return {views}
})
browser.close()
return result
}
getData().then(value => {
console.log(value)
})
I finally did it using the ytInitialData object. However, I'd like to understand why my first code didn't work.
Thanks
It seems that waiting for 1000 ms is not enough.
Try your solution with https://try-puppeteer.appspot.com/ and you will see.
However, if you try the following solution, you will get the correct result:
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.youtube.com/watch?v=T5GSLc-i5Xo');
await page.waitForSelector('span.view-count');
const views = await page.evaluate(() => document.querySelector('span.view-count').textContent);
console.log('Number of views: ' + views);
await browser.close();
Do not use a hand-made timeout to wait for a page to load, unless you are testing whether the page can load within that amount of time. Unlike Selenium, where you sometimes have no choice other than using a timeout, with Puppeteer you should always look for some function you can await instead of guessing a "good" timeout. As answered by Milan Hlinák, look into the page's HTML and find an element you can wait on. Usually, you wait for the HTML element(s) your test requires in order to work properly. In your case, that is span.view-count:
await page.waitForSelector('span.view-count');
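Wrapped into a reusable function, the wait-then-read pattern might look like the sketch below. `getViewCount` is a hypothetical helper name, and the 15 second default timeout is an arbitrary assumption.

```javascript
// `page` is a Puppeteer Page that has already navigated to the video URL.
async function getViewCount(page, selector = 'span.view-count', timeout = 15000) {
  // Resolves as soon as the element exists; rejects if it has not appeared after `timeout` ms.
  await page.waitForSelector(selector, { timeout });
  // $eval runs the callback in the page against the first matching element.
  return page.$eval(selector, (el) => el.textContent.trim());
}

// Usage:
// const views = await getViewCount(page);
// console.log('Number of views: ' + views);
```

Unlike a fixed sleep, this returns as soon as the element is present, and it fails with an explicit error rather than an undefined result when the element never shows up.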

How to emulate mouseover or run JS function on page with PhantomJS in NodeJS

Stack: NodeJS, PhantomJS, content parsing with Cheerio.
I need to parse a webpage that contains a dynamically loaded div (a hint). The event can be attached to many table td's.
When I mouse over a specific td, I see an orange block with data. It is loaded dynamically by a function, like this:
onmouseover="page.hist(this,'P-0.00-0-0','355svxv498x0x0',417,event,0,1)"
I can view this info only after the page is loaded. I need it for one specific row only, Marathonbet.
When the function runs, the text is loaded into another div (id='tooltip') and shown to the user.
I use phantom to parse the content of this page, and everything is OK with static values, but how can I get this dynamically generated block into my rendered web page inside a node router?
I see 2 ways:
Option 1: emulate a mouse move to the right coordinates to show the needed text, but there is a problem: how can I know the coordinates?
Option 2: call the function after the page is loaded, since I know its codes ('355svxv498x0x0',417), but how can I run this function from node, from phantom?
Here is some code that receives the static page content in my router:
```
phantom.create(config.phantomParams).then(ph => {
_ph = ph;
return _ph.createPage();
}).then(page => {
_page = page;
return _page.on('onConsoleMessage', function (msg) {
console.log(msg);
});
}).then(() => {
return _page.on('viewportSize', {width: 1920, height: 1080});
}).then(() => {
return _page.on('dpi', 130)
}).then(() => {
_page.setting('userAgent', config.userAgent);
return _page.open(matchLink);
}).then(() => {
return _page.property('content');
}).then(content => {
let $ = cheerio.load(content);
// working with content and get needed elements
console.log($.html());
}).then(() => {
_page.close();
_ph.exit();
});
```
Should I use Casper/Spooky, or can anyone explain how to do it in this case?
Update: trying with Puppeteer, the code is:
```
let matchLink = 'http://www.oddsportal.com/soccer/world/club-friendly/san-carlos-guadalupe-xnsUg7zB/';
(async () => {
const browser = await puppeteer.launch({
args: [
'--proxy-server=46.101.167.43:80',
]});
const page = await browser.newPage();
await page.setUserAgent(config.userAgent); // browser.userAgent() only reads the user agent; page.setUserAgent() sets it
await page.setViewport({width: 1440, height: 960});
await page.goto(matchLink);
await page.evaluate(() => page.hist(this,'P-0.00-0-0','355svxv464x0x7omg7',381,event,0,1));
let bodyHTML = await page.evaluate(() => document.body.innerHTML);
console.log(bodyHTML);
await page.screenshot({path: 'example.png'});
await browser.close();
})();
```
I get:
```
(node:8591) UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'stopPropagation' of undefined
at toolTip (http://www.oddsportal.com/res/x/global-180713073352.js:1:145511)
at TableSet.historyTooltip (http://www.oddsportal.com/res/x/global-180713073352.js:1:631115)
at PageEvent.PagePrototype.hist (http://www.oddsportal.com/res/x/global-180713073352.js:1:487314)
at __puppeteer_evaluation_script__:1:13
at ExecutionContext.evaluateHandle (/home/gil/Projects/oddsbot/node_modules/puppeteer/lib/ExecutionContext.js:97:13)
at <anonymous>
at process._tickCallback (internal/process/next_tick.js:188:7)
```
The error is thrown in the target site's JS file; maybe it has something to do with the request.
Since you're open to suggestions, I propose Puppeteer. It's a native node.js module that opens pages in the newest Chromium (especially useful since PhantomJS is very outdated) and is close to PhantomJS in terms of how you do things.
If you also use node.js 8.x, async/await syntax is available for working with promises and it makes scraping with puppeteer a breeze.
So to run that function in puppeteer you would run
await page.evaluate(() => page.hist(this,'P-0.00-0-0','355svxv498x0x0',417,event,0,1) );
Update
Puppeteer has lots of convenience helpers. One of them is page.hover, which literally hovers the pointer over an element:
await page.hover('td.some_selector');
But should you want to continue using Phantomjs and the excellent phantom module, you can:
_page.evaluate(function() {
page.hist(this,'P-0.00-0-0','355svxv498x0x0',417,event,0,1)
})
Documentation for page.evaluate: http://phantomjs.org/api/webpage/method/evaluate.html
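With Puppeteer, the hover approach can be combined with a wait for the tooltip div. The stack trace in the question suggests the direct call fails because `event` is undefined inside page.evaluate; hovering should dispatch real mouse events, so the inline onmouseover handler receives an event object. A minimal sketch, where `readTooltip` is a hypothetical helper, `cellSelector` is a placeholder you have to supply, and '#tooltip' is the id mentioned in the question:

```javascript
// `page` is a Puppeteer Page that has already loaded the match page.
async function readTooltip(page, cellSelector) {
  // Moves the mouse over the element, which triggers its mouseover handler.
  await page.hover(cellSelector);
  // Wait until the tooltip div is present and visible before reading it.
  await page.waitForSelector('#tooltip', { visible: true });
  return page.$eval('#tooltip', (el) => el.innerText);
}

// Usage (the selector is a placeholder, not taken from the real page):
// const text = await readTooltip(page, 'td.some_selector');
```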
