I'm playing with Puppeteer and wrote this example (should probably never happen in production, but still):
const puppeteer = require('puppeteer');
(async () => {
// creating browser instance and closing it
const browser = await puppeteer.launch({ headless: false })
browser.disconnect()
await browser.close()
console.log('first check') // everything is ok here, message is printed
// opening page on the closed browser instance. Just for testing purposes
const page = await browser.newPage()
await page.goto('http://google.com')
// never printed
console.log('second check')
})()
So basically, I am trying to create a new page on a closed browser instance. Obviously, no page opens because the browser instance is closed. But I am expecting some error. Instead, nothing happens and the second console.log is never executed!
Question: if no error is thrown, why does the program never reach the second console.log? Does Puppeteer somehow close the process of my Node.js application? Or am I missing something?
puppeteer version: latest - 5.3.1 (also 3.0.0)
By the way, if I use an earlier Puppeteer version (2.0.0), the same code fails with an error, as I expect:
Error: WebSocket is not open: readyState 2 (CLOSING)
Update.
After debugging the internals of Puppeteer a bit, I found out the following:
They have a Connection class with a map of callbacks as a property. Whenever we call the newPage method, a message with a new id is sent over the connection and a new corresponding Promise is created. The promise's resolve and reject functions are stored in the callbacks map:
send(method, ...paramArgs) {
const params = paramArgs.length ? paramArgs[0] : undefined;
const id = this._rawSend({ method, params });
return new Promise((resolve, reject) => {
this._callbacks.set(id, { resolve, reject, error: new Error(), method });
});
}
Then, the Connection class has an _onMessage(message) callback. Whenever some data (a message) is received, they inspect the message to find out whether it is an OK or an ERROR message, and then invoke the stored resolve or reject callback accordingly.
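Roughly, that mechanism looks like this (a simplified sketch, not Puppeteer's actual _onMessage source):
_onMessage(message) {
  // sketch: look up the pending Promise for this message id
  const object = JSON.parse(message);
  const callback = this._callbacks.get(object.id);
  if (!callback) return;
  this._callbacks.delete(object.id);
  if (object.error) {
    callback.reject(new Error(object.error.message)); // ERROR message
  } else {
    callback.resolve(object.result); // OK message
  }
}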
But since the browser instance in my example is already closed, the message never arrives and the Promise is neither resolved nor rejected.
And after a little research, I found out that Node.js does not flag such forever-pending Promises; the process simply exits once nothing else is scheduled. Example:
(async () => {
const promise = new Promise((resolve, reject) => {
if (true === false) {
resolve(13) // this will never happen
}
})
const value = await promise
console.log(value) // we never come here
})()
I agree that this seems to be a bug. I saw the issue you filed and added a potential fix there.
Adding this as the first thing in Connection.send() seems to fix the issue:
if (this._closed)
return Promise.reject(new Error(`Protocol error (${method}): Target closed.`));
In the meantime, I have added this to my code so that at least it doesn't die silently with no indication that it failed:
process.on('beforeExit', (code) => {
//beforeExit will run if we run out of callbacks, but not on an explicit exit()
console.log('We seem to be exiting purely because there are no more awaits scheduled, instead of having reached an explicit exit. Assuming this is bad behavior from the browser process. previous exit code: ', code);
process.exit(1);
});
//my code goes here
asdf()
process.exit(0);//will exit without triggering the beforeExit message.
Honestly, Node's behavior of exiting silently seems a little lacking. You can set an exitCode, but having a program able to run right up to an await and then die silently, without triggering exception handlers or finally blocks, is a little gross.
You probably don't see any error because you don't wait for the async function to settle. If you attach a catch handler, you'll most likely catch the error:
const puppeteer = require('puppeteer');
(async () => {
// creating browser instance and closing it
const browser = await puppeteer.launch({ headless: false })
browser.disconnect()
await browser.close()
console.log('first check') // everything is ok here, message is printed
// opening page on the closed browser instance. Just for testing purposes
const page = await browser.newPage()
await page.goto('http://google.com')
// never printed
console.log('second check')
})()
.then(() => console.log('done'))
.catch(e => console.error(e)); // <= HERE
Or use try/catch:
const puppeteer = require("puppeteer");
(async () => {
try {
// creating browser instance and closing it
const browser = await puppeteer.launch({ headless: false });
browser.disconnect();
await browser.close();
console.log("first check"); // everything is ok here, message is printed
// opening page on the closed browser instance. Just for testing purposes
const page = await browser.newPage();
await page.goto("http://google.com");
// never printed
console.log("second check");
} catch (e) {
console.error(e);
}
})();
Related
Only 'text' is output to the console; 'text2' and 'text3' are not, because the thread exits first. This is the most simplified version of the real project's structure. I can't figure out why this is happening or what to do about it.
This is the thread handler:
async function func1() {
await console.log('text2')
}
async function func() {
await console.log('text')
await func1();
await console.log('text3')
}
async function handler() {
await func()
await process.exit(123)
}
handler();
The only thing I can add is that the code above is contained in a handler.js file and is run like this:
const {Worker} = require('worker_threads')
const myWorker = new Worker('./handler.js')
myWorker.on('exit', (data) => {
console.log('Worker exit: ' + data)
})
And the output is the following:
[screenshot of console output]
Despite the lack of a true, reproducible example for your use case, this question is interesting and the answer wasn't obvious to find. I am not as read up on the worker_threads API as I'd like to be, but I'll offer my two cents.
After doing some research and testing (and not really having much to go on in terms of your specific use case), I believe it is because you are calling process.exit inside the worker thread. Since worker threads add their console statements to the main thread's call stack, it takes some time before they all run. Calling process.exit here must be removing any remaining operations from that worker on the main thread's call stack before they have a chance to run.
When you remove process.exit, all log statements run and the worker exits naturally.
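For illustration, a sketch of handler.js with that call removed (same func/func1 as in the question):
// handler.js – sketch with process.exit removed
async function handler() {
  await func(); // logs 'text', 'text2' and 'text3'
  // no process.exit(123): the worker exits on its own once its event loop is empty,
  // and the main thread's 'exit' listener still fires (with code 0)
}
handler();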
If you need to close the worker thread at a particular point in time (which I think is the real question here), you might be better off sending a message back to the main thread using the parentPort.postMessage() method and then having the main thread terminate the worker:
// handler.js
const {parentPort} = require('worker_threads');
async function func1() {
await console.log('text2')
}
async function func() {
await console.log('text')
await func1();
await console.log('text3')
}
async function handler() {
await func()
parentPort.postMessage({kill: true});
}
handler();
Then listen for that message event and terminate the worker from the main thread:
// index.js
const {Worker} = require('worker_threads')
const myWorker = new Worker('./handler.js')
myWorker.on('exit', (data) => {
console.log('Worker exit: ' + data)
})
myWorker.on('message', async msg => {
if (msg.kill) {
console.log('killing worker with code');
await myWorker.terminate();
}
})
There are some gotchas here: the terminate method terminates the thread "as soon as possible", you are relying on events, and you will need to prevent the worker from continuing on while the main thread executes the terminate method (see the sketch below). However, without more information I can't be of much help. From our comments, it might also be worth looking into child processes for this. Hope this helps.
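A minimal sketch of that last point, assuming the same handler.js as above:
// handler.js – sketch: keep the worker from running past the kill request
async function handler() {
  await func();
  parentPort.postMessage({kill: true});
  // park here forever; the main thread will call terminate() on this worker
  await new Promise(() => {});
}
handler();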
I am evaluating the codeToBeEvaluated function in the browser using Puppeteer. codeToBeEvaluated has a while loop which is supposed to show an alert (see LINE B) every 10 seconds.
The problem is that when I run this script, I don't see any alerts. Code execution doesn't wait for 10 seconds at LINE A; rather, the process exits immediately. I have no idea why, and I am really curious why it's not working.
Interestingly, when I run only the codeToBeEvaluated function in the browser's console, it works perfectly fine and I see the alerts that I am supposed to see.
It would be awesome if someone could explain this behaviour. I am not looking for any workarounds to show up alerts. I just want to understand this behaviour.
const puppeteer = require("puppeteer");
// this function will be executed in the browser
// it should create an alert after some time interval
const codeToBeEvaluated = async () => {
let index_2 = 0;
// a promise that resolves after ms*1000 seconds
async function sleep(ms) {
return new Promise((resolve) => {
setTimeout(resolve, ms);
});
}
// await within function's while loop
async function whileWrapper() {
while (index_2 < 10) {
await sleep(10000); // LINE A: execution doesn't stop at all :/
alert("hello " + index_2); // LINE B: does NOT execute at all
index_2++;
}
}
whileWrapper();
};
async function main(url) {
const browser = await puppeteer.launch({
// headless: false,
devtools: true,
// args: ["--no-sandbox", "--disable-setuid-sandbox"],
});
const page = await browser.newPage();
await page.goto(url);
await page.evaluate(codeToBeEvaluated);
browser.close();
}
main("https://www.google.com/search?q=hello");
The reason for the difference between executing codeToBeEvaluated manually in DevTools vs. from your Puppeteer script is that:
in the DevTools console the script has unlimited time to execute a longer async command (unless you quickly close the browser while it is still running);
in your Puppeteer script you have other commands following page.evaluate, like browser.close (which I advise you to put after an await, as it returns a promise!), so the browser is closed before the function can finish.
You also need to await whileWrapper()'s promise, so changing the last line of codeToBeEvaluated to the following will make it behave as you expected:
await whileWrapper();
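And, for the browser.close point above, a sketch of main() with that await added (otherwise unchanged from your code):
async function main(url) {
  const browser = await puppeteer.launch({ devtools: true });
  const page = await browser.newPage();
  await page.goto(url);
  await page.evaluate(codeToBeEvaluated); // now resolves only after whileWrapper finishes
  await browser.close(); // note the added await
}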
I need to check for a certain service worker being registered. Unfortunately, page.evaluate returns undefined no matter what I do.
let page = await chrome.newPage();
await page.goto('http://127.0.0.1:8089/');
await page.waitFor(10000);
const isCorrectSW = await page.evaluate(async () => {
await navigator
.serviceWorker
.getRegistrations()
.then(registrations =>
registrations[0].active.scriptURL.endsWith('/target.js')
);
});
console.log(isCorrectSW);
isCorrectSW ends up being undefined, but if I enable devtools and run the same statement in the Chromium instance's devtools, I get the correct result. I can also observe the service worker attached in the browser's dev tools.
Is this a Puppeteer bug, or am I doing something incorrectly?
According to the documentation, page.evaluate returns undefined when the function passed returns a non-serializable value.
In your scenario, the function you are passing into page.evaluate does not return anything.
Since you are already using async, you can change the function you pass to:
async () => {
const registrations = await navigator.serviceWorker.getRegistrations()
return registrations[0].active.scriptURL.endsWith('/target.js')
}
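Plugged back into your snippet, that would look like this (a sketch based on your original code):
const isCorrectSW = await page.evaluate(async () => {
  const registrations = await navigator.serviceWorker.getRegistrations();
  return registrations[0].active.scriptURL.endsWith('/target.js');
});
console.log(isCorrectSW); // true or false instead of undefined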
Hello, I want to check with Puppeteer whether a website has the showDirectoryPicker function.
Currently my code looks like this:
'use strict';
const puppeteer = require('puppeteer');
(async function main() {
try {
const browser = await puppeteer.launch({ headless:false,executablePath: '/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome', });
const [page] = await browser.pages();
await page.goto('https://example.com');
console.log(await page.evaluate(() => typeof showDirectoryPicker === 'function'));
await browser.close();
} catch (err) {
console.error(err);
}
})();
Currently this statement
console.log(await page.evaluate(() => typeof showDirectoryPicker === 'function'));
returns true for every website, since it is a valid JS function in the browser. However, I want to get true only if the analyzed website actually uses the showDirectoryPicker function.
If I understand your question correctly, you are trying to evaluate if the page calls the showDirectoryPicker() method, not if the browser supports it. One way to approach this would be to override the method with your own implementation that then reports back to Puppeteer if it gets called by the page. See my StackOverflow answer on overriding a function with a variant that logs whenever it gets called. You can then catch this log output with Puppeteer:
page.on('console', (message) => {
/*
Check that the message is what your overridden
custom variant logs.
*/
});
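For example, a sketch of that approach (the marker string and the override are illustrative here, not the exact code from the linked answer), using page.evaluateOnNewDocument so the override is installed before any page script runs:
// Override showDirectoryPicker before the page's own scripts run.
await page.evaluateOnNewDocument(() => {
  const original = window.showDirectoryPicker;
  window.showDirectoryPicker = (...args) => {
    console.log('__showDirectoryPicker_called__'); // hypothetical marker string
    return original.apply(window, args);
  };
});

// Catch the marker in Puppeteer.
page.on('console', (message) => {
  if (message.text() === '__showDirectoryPicker_called__') {
    console.log('The page called showDirectoryPicker()');
  }
});

await page.goto('https://example.com');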
I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously; they all fail, whereas only some fail when I use execute directly.
I've taken the code straight from the examples and replaced execute with queue, which I expected to work, except that the code doesn't wait for the result. Is there a way to achieve this anyway?
So this works:
const screen = await cluster.execute(req.query.url);
But this breaks:
const screen = await cluster.queue(req.query.url);
Here's the full example with queue:
const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');
(async () => {
const cluster = await Cluster.launch({
concurrency: Cluster.CONCURRENCY_CONTEXT,
maxConcurrency: 2,
});
await cluster.task(async ({ page, data: url }) => {
// make a screenshot
await page.goto('http://' + url);
const screen = await page.screenshot();
return screen;
});
// setup server
app.get('/', async function (req, res) {
if (!req.query.url) {
return res.end('Please specify url like this: ?url=example.com');
}
try {
const screen = await cluster.queue(req.query.url);
// respond with image
res.writeHead(200, {
'Content-Type': 'image/jpg',
'Content-Length': screen.length //variable is undefined here
});
res.end(screen);
} catch (err) {
// catch error
res.end('Error: ' + err.message);
}
});
app.listen(3000, function () {
console.log('Screenshot server listening on port 3000.');
});
})();
What am I doing wrong here? I'd really like to use queuing because without it every incoming request appears to slow down all the other ones.
Author of puppeteer-cluster here.
Quote from the docs:
cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.
cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.
When to use which function:
Use cluster.queue if you want to queue a large number of jobs (e.g. a list of URLs). The task function needs to take care of storing the results, by printing them to the console or storing them in a database.
Use cluster.execute if your task function returns a result. This will still queue the job, so this is like calling queue in addition to waiting for the job to finish. In this scenario, there is most often an "idling cluster" present which is used when a request hits the server (like in your example code).
So you definitely want to use cluster.execute, as you want to wait for the results of the task function. The reason you do not see any errors is (as quoted above) that errors from the cluster.queue function are emitted via a taskerror event, while cluster.execute errors are thrown directly (the Promise is rejected). Most likely your jobs fail in both cases, but it is only visible for cluster.execute.
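If you do stick with cluster.queue for fire-and-forget jobs, you can surface those failures by listening for the taskerror event; a minimal sketch:
// errors from cluster.queue(...) jobs are reported here, not at the call site
cluster.on('taskerror', (err, data) => {
  console.log(`Error while processing ${data}: ${err.message}`);
});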