Puppeteer wait until page is completely loaded - javascript

I am working on creating a PDF from a web page.
The application I am working on is a single-page application.
I tried many of the options and suggestions on https://github.com/GoogleChrome/puppeteer/issues/1412
but none of them work.
const browser = await puppeteer.launch({
executablePath: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
ignoreHTTPSErrors: true,
headless: true,
devtools: false,
args: ['--no-sandbox', '--disable-setuid-sandbox']
});
const page = await browser.newPage();
await page.goto(fullUrl, {
waitUntil: 'networkidle2'
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitFor(2000);
await page.pdf({
path: outputFileName,
displayHeaderFooter: true,
headerTemplate: '',
footerTemplate: '',
printBackground: true,
format: 'A4'
});
What I want is to generate the PDF report as soon as the page is completely loaded.
I don't want to write any kind of delay, i.e. await page.waitFor(2000);
I cannot use waitForSelector because the page has charts and graphs which are rendered after calculations.
Any help will be appreciated.

You can use page.waitForNavigation() to wait for the new page to load completely before generating a PDF:
await page.goto(fullUrl, {
waitUntil: 'networkidle0',
});
await page.type('#username', 'scott');
await page.type('#password', 'tiger');
await page.click('#Login_Button');
await page.waitForNavigation({
waitUntil: 'networkidle0',
});
await page.pdf({
path: outputFileName,
displayHeaderFooter: true,
headerTemplate: '',
footerTemplate: '',
printBackground: true,
format: 'A4',
});
If there is a certain element that is generated dynamically that you would like included in your PDF, consider using page.waitForSelector() to ensure that the content is visible:
await page.waitForSelector('#example', {
visible: true,
});

The networkidle events do not always indicate that the page has completely loaded. There could still be a few JS scripts modifying the content on the page. Watching for the browser to stop modifying the HTML source seems to yield better results. Here's a function you could use:
const waitTillHTMLRendered = async (page, timeout = 30000) => {
const checkDurationMsecs = 1000;
const maxChecks = timeout / checkDurationMsecs;
let lastHTMLSize = 0;
let checkCounts = 1;
let countStableSizeIterations = 0;
const minStableSizeIterations = 3;
while(checkCounts++ <= maxChecks){
let html = await page.content();
let currentHTMLSize = html.length;
let bodyHTMLSize = await page.evaluate(() => document.body.innerHTML.length);
console.log('last: ', lastHTMLSize, ' <> curr: ', currentHTMLSize, " body html size: ", bodyHTMLSize);
if(lastHTMLSize != 0 && currentHTMLSize == lastHTMLSize)
countStableSizeIterations++;
else
countStableSizeIterations = 0; //reset the counter
if(countStableSizeIterations >= minStableSizeIterations) {
console.log("Page rendered fully..");
break;
}
lastHTMLSize = currentHTMLSize;
await page.waitForTimeout(checkDurationMsecs);
}
};
You could use this after the page load / click function call and before you process the page content. e.g.
await page.goto(url, {'timeout': 10000, 'waitUntil':'load'});
await waitTillHTMLRendered(page)
const data = await page.content()
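The polling idea above is not tied to HTML length. As a rough sketch, the same loop can be written once for any value you can read from the page. The helper name waitUntilStable and its option names are made up for this illustration, and unlike the function above it throws when the timeout is reached instead of returning silently.

```javascript
// Hypothetical generalisation of the polling approach (not a Puppeteer API):
// resolve once `getValue` has returned the same value for `stableChecks`
// consecutive polls, or throw when `timeout` milliseconds have passed.
async function waitUntilStable(getValue, { interval = 1000, stableChecks = 3, timeout = 30000 } = {}) {
  const deadline = Date.now() + timeout;
  let last;            // value seen on the previous poll
  let stableCount = 0; // consecutive polls with an unchanged value
  let first = true;    // the first poll has nothing to compare against
  while (Date.now() < deadline) {
    const current = await getValue();
    stableCount = !first && current === last ? stableCount + 1 : 0;
    if (stableCount >= stableChecks) return current;
    last = current;
    first = false;
    await new Promise(resolve => setTimeout(resolve, interval));
  }
  throw new Error(`value did not stabilise within ${timeout} ms`);
}
```

With Puppeteer this could be called as await waitUntilStable(() => page.evaluate(() => document.body.innerHTML.length)); any other cheap, comparable measurement would work the same way.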

In some cases, the best solution for me was:
await page.goto(url, { waitUntil: 'domcontentloaded' });
Some other options you could try are:
await page.goto(url, { waitUntil: 'load' });
await page.goto(url, { waitUntil: 'domcontentloaded' });
await page.goto(url, { waitUntil: 'networkidle0' });
await page.goto(url, { waitUntil: 'networkidle2' });
You can check this at puppeteer documentation:
https://pptr.dev/#?product=Puppeteer&version=v11.0.0&show=api-pagewaitfornavigationoptions

I always like to wait for selectors, as many of them are a great indicator that the page has fully loaded:
await page.waitForSelector('#blue-button');

In the latest Puppeteer version, networkidle2 worked for me:
await page.goto(url, { waitUntil: 'networkidle2' });

Wrap the page.click and page.waitForNavigation in a Promise.all
await Promise.all([
page.click('#submit_button'),
page.waitForNavigation({ waitUntil: 'networkidle0' })
]);
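The point of Promise.all here is ordering: the navigation listener has to be registered before the click fires, otherwise a fast navigation can finish before waitForNavigation() starts listening. A minimal sketch of a reusable wrapper, assuming a helper name (clickAndWaitForNavigation) and a default option that are not part of Puppeteer:

```javascript
// Hypothetical helper (not a Puppeteer API): click something that triggers a
// navigation and wait for that navigation without racing against it.
async function clickAndWaitForNavigation(page, selector, options = { waitUntil: 'networkidle0' }) {
  // Create the navigation promise first so the listener is registered
  // before the click is dispatched.
  const navigation = page.waitForNavigation(options);
  const [response] = await Promise.all([navigation, page.click(selector)]);
  // The main resource response, or null for anchor / History API navigations.
  return response;
}
```

For the login flow in the question, this would be used as await clickAndWaitForNavigation(page, '#Login_Button'); before calling page.pdf().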

I encountered the same issue with networkidle when I was working on an offscreen renderer. I needed a WebGL-based engine to finish rendering and only then make a screenshot. What worked for me was a page.waitForFunction() method. In my case the usage was as follows:
await page.goto(url);
await page.waitForFunction("renderingCompleted === true")
const imageBuffer = await page.screenshot({});
In the rendering code, I was simply setting the renderingCompleted variable to true, when done. If you don't have access to the page code you can use some other existing identifier.
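As a rough sketch, the same wait can be written with a function predicate and an explicit timeout instead of a string. The helper name waitForGlobalFlag and the 30 second default are assumptions for illustration. It also assumes the page sets the flag as a property of window (for example window.renderingCompleted = true); a flag declared with let or const at the top level is not visible through window.

```javascript
// Hypothetical helper: wait until the page sets a global flag to true.
function waitForGlobalFlag(page, flagName = 'renderingCompleted', timeout = 30000) {
  return page.waitForFunction(
    // Runs inside the browser; `name` arrives as a serialised argument.
    name => window[name] === true,
    { timeout },
    flagName
  );
}
```

Usage would look like await waitForGlobalFlag(page); followed by page.screenshot() or page.pdf().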

You can also use the following to ensure all elements have rendered:
await page.waitFor('*')
Reference: https://github.com/puppeteer/puppeteer/issues/1875

As of December 2020, the waitFor function is deprecated, as the warning inside the code tells:
waitFor is deprecated and will be removed in a future release. See
https://github.com/puppeteer/puppeteer/issues/6214 for details and how
to migrate your code.
You can use:
function sleep(millisecondsCount) {
  if (!millisecondsCount) {
    return;
  }
  // Resolve after the given number of milliseconds.
  return new Promise(resolve => setTimeout(resolve, millisecondsCount));
}
And use it:
(async () => {
await sleep(1000);
})();

Keeping in mind the caveat that there's no silver bullet to handle all page loads, one strategy is to monitor the DOM until it's been stable (i.e. has not seen a mutation) for more than n milliseconds. This is similar to the network idle solution but geared towards the DOM rather than requests and therefore covers a different subset of loading behaviors.
Generally, this code would follow a page.waitForNavigation({waitUntil: "domcontentloaded"}) or page.goto(url, {waitUntil: "domcontentloaded"}), but you could also wait for it alongside, say, waitForNetworkIdle() using Promise.all() or Promise.race().
Here's a simple example:
const puppeteer = require("puppeteer"); // ^14.3.0
const waitForDOMStable = (
page,
options={timeout: 30000, idleTime: 2000}
) =>
page.evaluate(({timeout, idleTime}) =>
new Promise((resolve, reject) => {
setTimeout(() => {
observer.disconnect();
const msg = `timeout of ${timeout} ms ` +
"exceeded waiting for DOM to stabilize";
reject(Error(msg));
}, timeout);
const observer = new MutationObserver(() => {
clearTimeout(timeoutId);
timeoutId = setTimeout(finish, idleTime);
});
const config = {
attributes: true,
childList: true,
subtree: true
};
observer.observe(document.body, config);
const finish = () => {
observer.disconnect();
resolve();
};
let timeoutId = setTimeout(finish, idleTime);
}),
options
)
;
const html = `<!DOCTYPE html><html lang="en"><head>
<title>test</title></head><body><h1></h1><script>
(async () => {
for (let i = 0; i < 10; i++) {
document.querySelector("h1").textContent += i + " ";
await new Promise(r => setTimeout(r, 1000));
}
})();
</script></body></html>`;
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.setContent(html);
await waitForDOMStable(page);
console.log(await page.$eval("h1", el => el.textContent));
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
For pages that continually mutate the DOM more often than the idle value, the timeout will eventually trigger and reject the promise, following the typical Puppeteer fallback. You can set a more aggressive overall timeout to fit your needs or tailor the logic to ignore (or only monitor) a particular subtree.
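To wait for the DOM and the network together, as mentioned earlier, one possible sketch uses Promise.all. The combinator name waitForNetworkAndDOM is made up for this example, and the DOM-stability function is passed in as a parameter (domStable) so the snippet does not depend on the code above being in scope.

```javascript
// Hypothetical combinator: resolve only when both the network and the DOM
// have been quiet. `domStable` is expected to have the same shape as
// waitForDOMStable(page, {timeout, idleTime}).
async function waitForNetworkAndDOM(page, domStable, { idleTime = 500, timeout = 30000 } = {}) {
  await Promise.all([
    page.waitForNetworkIdle({ idleTime, timeout }),
    domStable(page, { timeout, idleTime }),
  ]);
}
```

Swapping Promise.all for Promise.race would instead continue as soon as either condition is met, which is faster but less strict.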

Answers so far haven't mentioned a critical fact: it's impossible to write a one-size-fits-all waitUntilPageLoaded function that works on every page. If it were possible, Puppeteer would surely provide it.
Such a function can't rely on a timeout, because there's always some page that takes longer to load than that timeout. As you extend the timeout to reduce the failure rate, you introduce unnecessary delays when working with fast pages. Timeouts are generally a poor solution, opting out of Puppeteer's event-driven model.
Waiting for idle network requests might not always work if the responses involve long-running DOM updates that take longer than 500ms to trigger a render.
Waiting for the DOM to stop changing might miss slow network requests, long-delayed JS triggers, or ongoing DOM manipulation that might cause the listener never to settle, unless specially handled.
And, of course, there's user interaction: captchas, prompts and cookie/subscription modals that need to be clicked through and dismissed before the page is in a sensible state for a full-page screenshot (for example).
Since every page has different, arbitrary JS behavior, the typical approach is to write event-driven logic that works for a specific page. Making precise, directed assumptions is much better than cobbling together a boatload of hacks that tries to solve every edge case.
If your use case is to write a load event that works on every page, my suggestion is to use some combination of the tools described here that is most balanced to meet your needs (speed vs. accuracy, development time/code complexity vs. accuracy, etc). Use fail-safes for everything rather than blindly assuming all pages will cooperate with your assumptions. Think hard about the extent to which you really need to handle every web page. Prepare to compromise and accept some degree of failure you can live with.
Here's a quick rundown of the strategies you can mix and match to wait for loads to fit your needs:
page.goto() and page.waitForNavigation() default to the load event, which "is fired when the whole page has loaded, including all dependent resources such as stylesheets and images" (MDN), but this is often too pessimistic; there's no need to wait for a ton of data you don't care about. Often the data is available without waiting for all external resources, so domcontentloaded should be faster. See my post Avoiding Puppeteer Antipatterns for further discussion.
On the other hand, if there are JS-triggered network requests after load, you'll miss that data. Hence networkidle2 and networkidle0, which wait until there have been no more than 2 or 0 active network requests for 500 ms. The motivation for the 2 version is that some sites keep ongoing requests open, which would cause networkidle0 to time out.
If you're waiting for a specific network response that might have a payload (or, for the general case, implementing your own network idle monitor), use page.waitForResponse(). page.waitForRequest(), page.waitForNetworkIdle() and page.on("request", ...) are also useful here.
If you're waiting for a particular selector to be visible, use page.waitForSelector(). If you're waiting for a load on a specific page, identify a selector that indicates the state you want to wait for. Generally speaking, for scripts specific to one page, this is the main tool to wait for the state you want, whether you're extracting data or clicking something. Frames and shadow roots thwart this function.
page.waitForFunction() lets you wait for an arbitrary predicate, for example, checking that the page's HTML or a specific list is a certain length. It's also useful for quickly dipping into frames and shadow roots to wait for predicates that depend on nested state. This function is also handy for detecting DOM mutations.
The most general tool is page.evaluate(), which plugs code into the browser. You can put just about any conditions you want here; most other Puppeteer functions are convenience wrappers for common cases you could implement by hand with evaluate.
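As one illustration of the page.waitForResponse() strategy from the rundown above, here is a minimal sketch that waits for a specific response and reads its payload. The helper name waitForJsonResponse and the idea of matching on a URL fragment are assumptions for the example; the right predicate depends on the page you are automating.

```javascript
// Hypothetical helper: wait for the first successful (2xx) response whose
// URL contains `urlFragment`, then parse its body as JSON.
async function waitForJsonResponse(page, urlFragment, timeout = 30000) {
  const response = await page.waitForResponse(
    res => res.url().includes(urlFragment) && res.ok(),
    { timeout }
  );
  return response.json();
}
```

As with navigations, start this wait before triggering the request, for example by putting it in a Promise.all together with the click that causes it.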

I can't leave comments, but I made a python version of Anand's answer for anyone who finds it useful (i.e. if they use pyppeteer).
from pyppeteer.page import Page
async def waitTillHTMLRendered(page: Page, timeout: int = 30000):
    check_duration_m_secs = 1000
    max_checks = timeout / check_duration_m_secs
    last_HTML_size = 0
    check_counts = 1
    count_stable_size_iterations = 0
    min_stable_size_iterations = 3
    while check_counts <= max_checks:
        check_counts += 1
        html = await page.content()
        currentHTMLSize = len(html)
        if last_HTML_size != 0 and currentHTMLSize == last_HTML_size:
            count_stable_size_iterations += 1
        else:
            count_stable_size_iterations = 0  # reset the counter
        if count_stable_size_iterations >= min_stable_size_iterations:
            break
        last_HTML_size = currentHTMLSize
        await page.waitFor(check_duration_m_secs)

For me, { waitUntil: 'domcontentloaded' } is always my go-to.
I found that networkidle doesn't work well...

Related

How can I get Puppeteer to take PDF of new page? [duplicate]

I submit a form using the following code and i want Puppeteer to wait page load after form submit.
await page.click("button[type=submit]");
//how to wait until the new page loads before taking screenshot?
// i don't want this:
// await page.waitFor(1*1000); //← unwanted workaround
await page.screenshot({path: 'example.png'});
How to wait for page load with puppeteer?
You can wait for navigation asynchronously to avoid getting null on redirection,
await Promise.all([
page.click('button[type=submit]'),
page.waitForNavigation({waitUntil: 'networkidle2'})
]);
This will help you if the page.click already triggers a navigation.
await page.waitForNavigation();
According to the Official Documentation, you should use:
page.waitForNavigation(options)
options <Object> Navigation parameters which might have the following properties:
timeout <number> Maximum navigation time in milliseconds, defaults to 30 seconds, pass 0 to disable timeout. The default value can be changed by using the page.setDefaultNavigationTimeout(timeout) method.
waitUntil <string|Array<string>> When to consider navigation succeeded, defaults to load. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
load - consider navigation to be finished when the load event is fired.
domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
returns: <Promise<[?Response]>> Promise which resolves to the main resource response. In case of multiple redirects, the navigation will resolve with the response of the last redirect. In case of navigation to a different anchor or navigation due to History API usage, the navigation will resolve with null.
Readability:
You can use page.waitForNavigation() to wait for a page to navigate:
await page.waitForNavigation();
Performance:
But since page.waitForNavigation() is a shortcut for page.mainFrame().waitForNavigation(), we can use the following for a minor performance enhancement:
await page._frameManager._mainFrame.waitForNavigation();
Sometimes even using await page.waitForNavigation() will still result in a Error: Execution context was destroyed, most likely because of a navigation.
In my case, it was because the page was redirecting multiple times. The API says the default waitUntil option is Load—this required me to wait for navigation each redirect (3 times).
Using only a single instance of page.waitForNavigation with the waitUntil option networkidle2 worked well in my case:
await button.click();
await page.waitForNavigation({waitUntil: 'networkidle2'});
Finally, the API suggests using Promise.all to prevent a race condition. I haven't needed this but provide it for completeness:
await Promise.all([button.click(), page.waitForNavigation({waitUntil:'networkidle2'})])
If all else fails, you can use page.waitForSelector as recommended on a Puppeteer github issue—or in my case, page.waitForXPath()
I know it is bit late to answer this. It may be helpful for those who are getting below exception while doing waitForNavigation.
(node:14531) UnhandledPromiseRejectionWarning: TimeoutError:
Navigation Timeout Exceeded: 30000ms exceeded
at Promise.then (/home/user/nodejs/node_modules/puppeteer/lib/LifecycleWatcher.js:142:21)
at -- ASYNC --
at Frame. (/home/user/nodejs/node_modules/puppeteer/lib/helper.js:111:15)
at Page.waitForNavigation (/home/user/nodejs/node_modules/puppeteer/lib/Page.js:649:49)
at Page. (/home/user/nodejs/node_modules/puppeteer/lib/helper.js:112:23)
at /home/user/nodejs/user/puppeteer/example7.js:14:12
at
The correct code that worked for me is as below.
await page.click('button[id=start]', {waitUntil: 'domcontentloaded'});
Similarly if you are going to a new page, code should be like
await page.goto('here goes url', {waitUntil: 'domcontentloaded'});
I suggest wrapping page.goto in a wrapper that waits for everything to load.
This is my wrapper:
async function loadUrl(page, url) {
  try {
    await page.goto(url, {
      timeout: 20000,
      waitUntil: ['load', 'domcontentloaded', 'networkidle0', 'networkidle2']
    });
  } catch (error) {
    throw new Error("url " + url + " not loaded -> " + error);
  }
}
now you can use this with
await loadUrl(page, "https://www.google.com")
None of the above answers solved my issue. Sometimes waitForNavigation just times out. I came up with another solution using waitForFunction, checking whether the document is in the ready state.
await page.waitForFunction(() => document.readyState === "complete");
await Promise.all([
page.click(selectors.submit),
page.waitForNavigation({ waitUntil: 'networkidle0' }),
]);
This would be the first option to try, as it waits for network activity to finish: the navigation is considered done once there have been no network connections for at least 500 ms.
you can also use
await page.waitForNavigation({ waitUntil: 'load' })
or else, you can use
await page.waitForResponse(response => response.ok())
This function can also be used in various places. Note that it resolves as soon as the first response with an OK status (200-299) arrives; it does not wait for every call to succeed, so it is most useful with a predicate that also checks for a specific URL.
This worked for me:
await Promise.all([
page.goto(URL),
page.waitForNavigation({ waitUntil: 'networkidle0' }),
]);
console.log('page loaded')
For some reason I was not able to click the button (it is handled by an onclick event, not by a form):
<button onclick="someFunction();" class="button button2">Submit</button>
The problem was that the page was rendered on the server side, so the button didn't exist yet when I waited for it with await page.waitForSelector('button.button2')
The solution was to bind page.goto(URL) and page.waitForNavigation({ waitUntil: 'networkidle0' }) in Promise
await Promise.all([
page.goto(URL),
page.waitForNavigation({ waitUntil: 'networkidle0' }),
]);
console.log('page loaded')
await page.waitForSelector('button.button2')
console.log('button is here');
If submitting the form opens some other page, then you may just want to wait for a selector on that page. I have often had issues using page.waitForNavigation() since its options don't really ensure we have effectively navigated to another page.
// login page
await page.click("#login");
// homepage, after login
await page.waitForSelector("#home", {visible: true}); // or page.waitForXPath()
Of course, you can increase the wait time for the selector.
This works for me
Puppeteer version: 19.2.2
page.click(".clickable-selector");
await page.waitForNavigation({ waitUntil: "load" });
Note:
If you do this inside a loop (scraping page 1, clicking through to page 2, scraping page 2, and so on):
await page.waitForSelector(".clickable-selector", { visible: true });
Wait for this clickable selector before doing any other scraping on the page.
I ran into a scenario, where there was the classic POST-303-GET and an input[type=submit] was involved. It seems that in this case, the click of the button won't resolve until after the associated form's submission and redirection, so the solution was to remove the waitForNavigation, because it was executed after the redirection and thus was timing out.
Please try
await page.waitForNavigation()
or
await page.waitForSelector("#indecator_of_any_element_of_you_are_waiting_for")

How to click a list item with Puppeteer?

I'm new to Puppeteer and I'm trying to click on an option in a dropdown menu (the Mr element).
I've tried using await page.click('.mat-option ng-star-inserted mat-active');
and also
await page.select('#mat-option-0');
Here is my code. Would anyone be able to help me fix this issue and understand how to resolve it in the future? I'm not too sure which methods to use with each element. I think it happens every time I introduce a class with spaces in the name; could that be the issue?
Also, does anyone have any best practices for coding things like this?
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.game.co.uk/en/-2640058?cm_sp=NintendoFormatHub-_-Accessories-_-espot-_-PikaCase');
await console.log('Users navigated to site :)');
await page.waitFor(2300);
await page.click('.cookiePolicy_inner--actions');
await page.waitFor(1000);
await page.click('.addToBasket');
await page.waitFor(1300);
await page.click('.secure-checkout');
await page.waitFor(2350);
await page.click('.cta-large');
await page.waitFor(1200);
await page.goto('https://checkout.game.co.uk/contact');
await page.waitFor(500);
await page.click('.mat-form-field-infix');
await page.waitForSelector('.ng-tns-c17-1 ng-trigger ng-trigger-transformPanel mat-select-panel mat-primary');
await page.click('.mat-option ng-star-inserted mat-active');
})();
There are a couple of issues with the script, let's see them:
you are using waitFor() with a number of milliseconds; this is brittle because you never know whether some action will take longer, and if it does not, you waste time. You can replace these waits with waitForSelector(). In fact, if you use VSCode (and perhaps other IDEs), it will notify you that this method is deprecated; don't ignore these warnings.
when I use DevTools, no element is returned for the .mat-option ng-star-inserted mat-active selector, but I can find the desired element with the #mat-option-0 selector. Alternatively, I can use the longer version, but I have to put a dot (.) before each class and delete the spaces between them, like so: .mat-option.ng-star-inserted.mat-active (see a CSS selector reference for details). The point is that with spaces, the selector looks for descendants, which is not what you want.
These two changes should give you what you need; when I ran it on my side, Mr. was selected.
I got there with this script:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.game.co.uk/en/-2640058?cm_sp=NintendoFormatHub-_-Accessories-_-espot-_-PikaCase');
await console.log('Users navigated to site :)');
await page.waitForSelector('.cookiePolicy_inner--actions');
await page.click('.cookiePolicy_inner--actions');
await page.waitForSelector('.addToBasket');
await page.click('.addToBasket');
await page.waitForSelector('.secure-checkout');
await page.click('.secure-checkout');
await page.waitForSelector('.cta-large');
await page.click('.cta-large');
await page.goto('https://checkout.game.co.uk/contact');
await page.waitForSelector('.mat-form-field-infix');
await page.click('.mat-form-field-infix');
await page.waitForSelector('#mat-option-0');
await page.click('#mat-option-0');
})();
However, this is still not ideal because:
you handle the cookie bar with clicks, try to find a way without clicking; perhaps injecting a cookie that disables the cookie bar (if possible)
the code is one big piece that is perhaps ok for now and this example but might become unmaintainable if you keep adding lines to it; try to reuse code in functions and methods
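Both points can be sketched in a few lines. The two helpers below are hypothetical names for illustration, not Puppeteer APIs: toClassSelector builds a compound class selector from a class attribute value (assuming plain class names that need no CSS escaping), and waitAndClick folds the repeated wait-then-click pair into one function.

```javascript
// Turn a class attribute value such as "a b c" into the compound
// selector ".a.b.c", which matches one element carrying all three classes.
function toClassSelector(classAttribute) {
  return classAttribute
    .trim()
    .split(/\s+/)
    .map(name => '.' + name)
    .join('');
}

// Hypothetical helper: wait for an element to be visible, then click it.
async function waitAndClick(page, selector, timeout = 30000) {
  const handle = await page.waitForSelector(selector, { visible: true, timeout });
  await handle.click();
}
```

The last step of the script could then read await waitAndClick(page, toClassSelector('mat-option ng-star-inserted mat-active')); and each earlier wait/click pair collapses to a single waitAndClick call.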

Puppeteer evaluate function

I'm new to Puppeteer and I'm trying to understand how it actually works through some examples:
So basically what I'm trying to do in this example is to extract the number of views of a YouTube video. I've written a JS line in the Chrome console that lets me extract this information:
document.querySelector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer').innerText
That worked well. However, when I did the same in my Puppeteer code, it doesn't recognize the element I queried.
const puppeteer = require('puppeteer')
const getData = async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://www.youtube.com/watch?v=T5GSLc-i5Xo')
await page.waitFor(1000)
const result = await page.evaluate(() => {
let views = document.querySelector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer').innerText
return {views}
})
browser.close()
return result
}
getData().then(value => {
console.log(value)
})
I finally did it using ytInitialData object. However I'd like to understand the reason why my first code didn't work.
Thanks
It seems that wait for 1000 is not enough.
Try your solution with https://try-puppeteer.appspot.com/ and you will see.
However if you try the following solution, you will get the correct result
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.youtube.com/watch?v=T5GSLc-i5Xo');
await page.waitForSelector('span.view-count');
const views = await page.evaluate(() => document.querySelector('span.view-count').textContent);
console.log('Number of views: ' + views);
await browser.close();
Do not use a hand-made timeout to wait for a page to load, unless you are testing whether the page can load within that amount of time. Unlike Selenium, where sometimes you have no choice other than a timeout, with Puppeteer you should always be able to find some await function to use instead of guessing a "good" timeout. As answered by Milan Hlinák, look into the page's HTML and figure out some HTML element you can wait on instead of using a timeout. Usually, wait for the HTML element(s) your test requires in order to work properly. In your case, that is span.view-count, as already answered by Milan Hlinák:
await page.waitForSelector('span.view-count');
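Since waitForSelector() resolves to a handle for the matched element, the wait and the read can be combined. A minimal sketch, assuming a helper name (getTextWhenReady) and a default timeout chosen only for illustration:

```javascript
// Hypothetical helper: wait for `selector` to appear, then return its
// trimmed text content, read from the element handle that was waited on.
async function getTextWhenReady(page, selector, timeout = 30000) {
  const handle = await page.waitForSelector(selector, { timeout });
  return handle.evaluate(el => el.textContent.trim());
}
```

For the example above this would be const views = await getTextWhenReady(page, 'span.view-count'); and a missing element surfaces as a timeout error rather than a null dereference inside page.evaluate().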

Wait for actions to finish before executing again in puppeteer

I have a puppeteer script that inputs some text into a field, submits the query, and processes the results.
Currently, the script only processes 1 search term at a time, but I need it to be able to process an array of items consecutively.
I figured I would just put the code in a loop (see code below), however, it just types in all the items from the array at once into the field and doesn't execute the code block for each search term:
for (const search of searchTerms) {
await Promise.all([
page.type('input[name="q"]', 'in:spam ' + search + String.fromCharCode(13)),
page.waitForNavigation({
waitUntil: 'networkidle2'
})
]);
const count = await page.evaluate((sel) => {
return document.querySelectorAll(sel)[1].querySelectorAll('tr').length;
}, 'table[id^=":"]');
if (count > 0) {
const more = await page.$x('//span[contains(@class, "asa") and contains(@class, "bjy")]');
await more[1].click();
await page.waitFor(1250);
const markRead = await page.$x('//div[text()="Mark all as read"]');
await markRead[0].click();
const selectAll = await page.$x('//span[@role="checkbox"]');
await selectAll[1].click();
const move = await page.$x('//div[@act="8"]');
await move[0].click();
await page.waitFor(5000);
}
}
I tried using a recursion function from Nodejs Synchronous For each loop
I also tried using a function generator with yields, as well as promises and even tried the eachSeries function from the async package from this post Nodejs Puppeteer Wait to finish all code from loop
Nothing I tried was successful. Any help would be appreciated, thanks!
There is no way to visit two websites at same time with same tab. You can try it on your browser to make sure.
Jokes aside, if you want to search multiple items, you have to create a page or tab for that.
for (const search of searchTerms) {
const newTab = await browser.newPage()
// other modified code here
}
... wait that will still search one by one. But if you use a map with concurrency limit, it will work well.
We can use p-all for this.
const pAll = require('p-all');
const actions = []
for (const search of searchTerms) {
actions.push(async()=>{
const newTab = await browser.newPage()
// other modified code here
})
}
pAll(actions, {concurrency: 2}) // <-- set how many to search at once
So we are looping thru each term, and adding a new promise on the action list. Adding functions won't take much time. And then we can run the promise chain.
You will still need to modify the code above to have what you desire.
Peace!
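For readers who would rather not add a dependency, the concurrency-limit idea can be sketched in plain JavaScript. This is a hypothetical stand-in written for illustration, not how p-all is implemented; the name runWithConcurrency is made up.

```javascript
// Run async task functions with at most `limit` of them in flight at once.
// Results are returned in the same order as `tasks`.
async function runWithConcurrency(tasks, limit = 2) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start
  async function worker() {
    // Each worker keeps pulling the next unstarted task until none remain.
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workers = Array.from({ length: Math.min(limit, tasks.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Calling it with a limit of 1 runs the tasks strictly one after another, which is the behaviour the question asks for; a higher limit processes several search terms at once, each in its own tab.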

Puppeteer wait page load after form submit

I submit a form using the following code and i want Puppeteer to wait page load after form submit.
await page.click("button[type=submit]");
//how to wait until the new page loads before taking screenshot?
// i don't want this:
// await page.waitFor(1*1000); //← unwanted workaround
await page.screenshot({path: 'example.png'});
How to wait for page load with puppeteer?
You can wait for navigation asynchronously to avoid getting null on redirection,
await Promise.all([
page.click('button[type=submit]'),
page.waitForNavigation({waitUntil: 'networkidle2'})
]);
This will help you if the page.click already triggers a navigation.
await page.waitForNavigation();
According to the Official Documentation, you should use:
page.waitForNavigation(options)
options <Object> Navigation parameters which might have the following properties:
timeout <number> Maximum navigation time in milliseconds, defaults to 30 seconds, pass 0 to disable timeout. The default value can be changed by using the page.setDefaultNavigationTimeout(timeout) method.
waitUntil <string|Array<string>> When to consider navigation succeeded, defaults to load. Given an array of event strings, navigation is considered to be successful after all events have been fired. Events can be either:
load - consider navigation to be finished when the load event is fired.
domcontentloaded - consider navigation to be finished when the DOMContentLoaded event is fired.
networkidle0 - consider navigation to be finished when there are no more than 0 network connections for at least 500 ms.
networkidle2 - consider navigation to be finished when there are no more than 2 network connections for at least 500 ms.
returns: <Promise<[?Response]>> Promise which resolves to the main resource response. In case of multiple redirects, the navigation will resolve with the response of the last redirect. In case of navigation to a different anchor or navigation due to History API usage, the navigation will resolve with null.
Readability:
You can use page.waitForNavigation() to wait for a page to navigate:
await page.waitForNavigation();
Performance:
But since page.waitForNavigation() is a shortcut for page.mainFrame().waitForNavigation(), we can use the following for a minor performance enhancement:
await page._frameManager._mainFrame.waitForNavigation();
Sometimes even using await page.waitForNavigation() will still result in a Error: Execution context was destroyed, most likely because of a navigation.
In my case, it was because the page was redirecting multiple times. The API says the default waitUntil option is Load—this required me to wait for navigation each redirect (3 times).
Using only a single instance of page.waitForNavigation with the waitUntil option networkidle2 worked well in my case:
await button.click();
await page.waitForNavigation({waitUntil: 'networkidle2'});
Finally, the API suggests using a Promise.All to prevent a race condition. I haven't needed this but provide it for completeness:
await Promise.all([button.click(), page.waitForNavigation({waitUntil:'networkidle2'})])
If all else fails, you can use page.waitForSelector() as recommended on a Puppeteer GitHub issue or, in my case, page.waitForXPath().
I know it is a bit late to answer this, but it may help those who get the exception below when calling waitForNavigation.
(node:14531) UnhandledPromiseRejectionWarning: TimeoutError:
Navigation Timeout Exceeded: 30000ms exceeded
at Promise.then (/home/user/nodejs/node_modules/puppeteer/lib/LifecycleWatcher.js:142:21)
at -- ASYNC --
at Frame. (/home/user/nodejs/node_modules/puppeteer/lib/helper.js:111:15)
at Page.waitForNavigation (/home/user/nodejs/node_modules/puppeteer/lib/Page.js:649:49)
at Page. (/home/user/nodejs/node_modules/puppeteer/lib/helper.js:112:23)
at /home/user/nodejs/user/puppeteer/example7.js:14:12
at
The code that worked for me is below. Note that waitUntil is not a documented option of page.click(), so it is probably ignored there.
await page.click('button[id=start]', {waitUntil: 'domcontentloaded'});
Similarly, if you are going to a new page, the code should be:
await page.goto('here goes url', {waitUntil: 'domcontentloaded'});
I suggest wrapping page.goto in a helper that waits for everything to load.
This is my wrapper:
// Navigate to url and wait for every lifecycle event; fail after 20 seconds.
async function loadUrl(page, url) {
try {
await page.goto(url, {
timeout: 20000,
waitUntil: ['load', 'domcontentloaded', 'networkidle0', 'networkidle2']
})
} catch (error) {
throw new Error("url " + url + " not loaded -> " + error)
}
}
Now you can use it with:
await loadUrl(page, "https://www.google.com")
None of the above answers solved my issue. Sometimes waitForNavigation just times out. I came up with another solution that uses waitForFunction to check whether the document is in the ready state.
await page.waitForFunction(() => document.readyState === "complete");
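The same check can be wrapped with an explicit timeout, as in the sketch below. The helper name waitForDocumentComplete is hypothetical and page is assumed to be a Puppeteer Page. Keep in mind that readyState becomes "complete" around the load event, so content rendered later by scripts (charts, for example) may still need a separate wait.

```javascript
// Hypothetical helper: wait until the document reports readyState "complete".
// `page` is assumed to be a Puppeteer Page; `timeout` is in milliseconds.
async function waitForDocumentComplete(page, timeout = 30000) {
  // The arrow function is serialized and evaluated inside the page, not in Node.
  await page.waitForFunction(() => document.readyState === 'complete', { timeout });
}
```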
await Promise.all([
page.click(selectors.submit),
page.waitForNavigation({ waitUntil: 'networkidle0' }),
]);
This should be the first option to try: networkidle0 treats the navigation as finished once there have been no network connections for at least 500 ms.
You can also use
await page.waitForNavigation({ waitUntil: 'load' })
or, alternatively,
await page.waitForResponse(response => response.ok())
waitForResponse resolves as soon as one response satisfies the predicate, here the first response whose status is OK (200-299). It does not wait for every request to succeed, so it is most useful with a predicate that also matches a specific URL.
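Because page.waitForResponse() resolves with the first response for which the predicate returns true, narrowing the predicate to one request tends to be more useful than a bare response.ok(). A minimal sketch, assuming page is a Puppeteer Page; the helper name waitForOkResponse and the URL fragment '/api/report' in the usage line are hypothetical.

```javascript
// Hypothetical helper: wait for the first successful (2xx) response whose URL
// contains `urlFragment`. `page` is assumed to be a Puppeteer Page.
function waitForOkResponse(page, urlFragment, timeout = 30000) {
  return page.waitForResponse(
    (response) => response.url().includes(urlFragment) && response.ok(),
    { timeout },
  );
}

// Possible usage inside an async function:
//   const response = await waitForOkResponse(page, '/api/report');
```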
This works for me
Puppeteer version: 19.2.2
page.click(".clickable-selector");
await page.waitForNavigation({ waitUntil: "load" });
Note:
If you do this inside a loop (scraping page 1, clicking through to page 2, scraping page 2, and so on):
await page.waitForSelector(".clickable-selector", { visible: true });
Wait for this clickable selector before doing any other scraping on the page.
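The loop described in the note could look roughly like the sketch below. The function scrapePages and its scrapeFn callback are hypothetical; page is assumed to be a Puppeteer Page, and the click is paired with the navigation wait via Promise.all rather than left un-awaited.

```javascript
// Hypothetical helper: walk through paginated results.
// `scrapeFn` is a caller-supplied async function that extracts data from the current page.
async function scrapePages(page, nextSelector, maxPages, scrapeFn) {
  const results = [];
  for (let i = 0; i < maxPages; i++) {
    // Make sure the "next" control is rendered before touching the page.
    await page.waitForSelector(nextSelector, { visible: true });
    results.push(await scrapeFn(page));
    if (i === maxPages - 1) break; // last requested page: nothing more to click
    await Promise.all([
      page.waitForNavigation({ waitUntil: 'load' }),
      page.click(nextSelector),
    ]);
  }
  return results;
}
```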
This worked for me:
await Promise.all([
page.goto(URL),
page.waitForNavigation({ waitUntil: 'networkidle0' }),
]);
console.log('page loaded')
For some reason I was not able to click a button (it is handled by an onclick event and is not inside a form):
<button onclick="someFunction();" class="button button2">Submit</button>
The problem was that the page was rendered on the server side, so the button did not exist yet when I waited for it with await page.waitForSelector('button.button2').
The solution was to combine page.goto(URL) and page.waitForNavigation({ waitUntil: 'networkidle0' }) in a Promise.all:
await Promise.all([
page.goto(URL),
page.waitForNavigation({ waitUntil: 'networkidle0' }),
]);
console.log('page loaded')
await page.waitForSelector('button.button2')
console.log('button is here');
If submitting the form opens some other page, then you may just want to wait for a selector on that page. I have often had issues using page.waitForNavigation(), since its options don't really ensure that we have effectively navigated to another page.
// login page
await page.click("#login");
// homepage, after login
await page.waitForSelector("#home", {visible: true}); // or page.waitForXPath()
Of course, you can increase the wait time for the selector.
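A sketch of the same idea with a longer wait, using the timeout option of page.waitForSelector() (in milliseconds). The helper name clickAndWaitForSelector and the 60-second default are assumptions for illustration; page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: click something, then wait for an element that only
// exists on the destination page. `page` is assumed to be a Puppeteer Page.
async function clickAndWaitForSelector(page, clickSelector, readySelector, timeout = 60000) {
  await page.click(clickSelector);
  // Resolves with the element handle once it is present and visible;
  // rejects if that does not happen within `timeout` milliseconds.
  return page.waitForSelector(readySelector, { visible: true, timeout });
}

// Possible usage inside an async function:
//   await clickAndWaitForSelector(page, '#login', '#home');
```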
I ran into a scenario with the classic POST-303-GET flow, where an input[type=submit] was involved. It seems that in this case the click does not resolve until after the form's submission and redirection, so the solution was to remove the waitForNavigation call: it was being executed after the redirection had already happened and was therefore timing out.
Please try
await page.waitForNavigation()
or
await page.waitForSelector("#indicator_of_any_element_you_are_waiting_for")
