Puppeteer - Wait for network requests to complete after page.select() - javascript

Is there a way to wait for network requests to resolve after performing an action on a page, before performing a new action in Puppeteer?
I need to interact with a select menu on the page using page.select(), which causes dynamic images and fonts to load into the page. I need to wait for these requests to complete before executing the next action.
--
Caveats:
I cannot reload the page or go to a new url.
I do not know what the request types might be, or how many there will be
--
// launch puppeteer
const browser = await puppeteer.launch({});
// load new page
const page = await browser.newPage();
// go to URL and wait for initial requests to resolve
await page.goto(pageUrl, {
waitUntil: "networkidle0"
});
// START LOOP
for (let value of lotsOfValues) {
// interact with select menu
await page.select('select', value);
// wait for network requests to complete (images, fonts)
??
// screenshot page with new content
await pageElement.screenshot({
type: "jpeg",
quality: 100
});
} // END LOOP
// close
await browser.close();

The answer lies in using page.setRequestInterception(true) and monitoring subsequent requests, waiting for them to resolve before moving on to the next task (thanks @Guarev for the pointer in the right direction).
This module (https://github.com/jtassin/pending-xhr-puppeteer) does exactly that, but for XHR requests. I modified it to look for 'image' and 'font' types.
Final code looks something like this:
// launch puppeteer
const browser = await puppeteer.launch({});
// load new page
const page = await browser.newPage();
// go to URL and wait for initial requests to resolve
await page.goto(pageUrl, {
waitUntil: "networkidle0"
});
// enable this here because we don't want to watch the initial page asset requests (which page.goto above triggers)
await page.setRequestInterception(true);
// custom version of pending-xhr-puppeteer module
let monitorRequests = new PuppeteerNetworkMonitor(page);
// START LOOP
for (let value of lotsOfValues) {
// interact with select menu
await page.select('select', value);
// wait for network requests to complete (images, fonts)
await monitorRequests.waitForAllRequests();
// screenshot page with new content
await pageElement.screenshot({
type: "jpeg",
quality: 100
});
} // END LOOP
// close
await browser.close();
NPM Module
class PuppeteerNetworkMonitor {
constructor(page) {
this.promises = [];
this.page = page;
this.resourceType = ['image', 'font'];
this.pendingRequests = new Set();
this.finishedRequestsWithSuccess = new Set();
this.finishedRequestsWithErrors = new Set();
page.on('request', (request) => {
request.continue();
if (this.resourceType.includes(request.resourceType())) {
this.pendingRequests.add(request);
this.promises.push(
new Promise(resolve => {
request.resolver = resolve;
}),
);
}
});
page.on('requestfailed', (request) => {
if (this.resourceType.includes(request.resourceType())) {
this.pendingRequests.delete(request);
this.finishedRequestsWithErrors.add(request);
if (request.resolver) {
request.resolver();
delete request.resolver;
}
}
});
page.on('requestfinished', (request) => {
if (this.resourceType.includes(request.resourceType())) {
this.pendingRequests.delete(request);
this.finishedRequestsWithSuccess.add(request);
if (request.resolver) {
request.resolver();
delete request.resolver;
}
}
});
}
async waitForAllRequests() {
if (this.pendingRequestCount() === 0) {
return;
}
await Promise.all(this.promises);
}
pendingRequestCount() {
return this.pendingRequests.size;
}
}
module.exports = PuppeteerNetworkMonitor;
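The bookkeeping in the class above (a set of pending items, each paired with a promise that is resolved when the item finishes or fails) can be reduced to a small, framework-free sketch. This is an illustration of the pattern only; the class name and methods (PendingTracker, track, settle) are hypothetical, not part of Puppeteer or the module linked above.

```javascript
// Minimal sketch of the pending-request bookkeeping, stripped of Puppeteer:
// track(id) registers a pending item and creates a promise for it,
// settle(id) resolves that promise, and waitForAll() resolves once
// every tracked item has been settled.
class PendingTracker {
  constructor() {
    this.pending = new Map(); // id -> resolve function
    this.promises = [];
  }
  track(id) {
    this.promises.push(
      new Promise((resolve) => {
        this.pending.set(id, resolve);
      }),
    );
  }
  settle(id) {
    const resolve = this.pending.get(id);
    if (resolve) {
      this.pending.delete(id);
      resolve();
    }
  }
  async waitForAll() {
    if (this.pending.size === 0) return;
    await Promise.all(this.promises);
  }
}
```

In the real class, `track` happens in the 'request' handler and `settle` in the 'requestfinished'/'requestfailed' handlers.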

For anyone still interested in the solution @danlong posted above but who wants it in a more modern form, here is the TypeScript version:
import { HTTPRequest, Page, ResourceType } from "puppeteer";
export class PuppeteerNetworkMonitor {
page: Page;
resourceType: ResourceType[] = [];
promises: Promise<unknown>[] = [];
pendingRequests = new Set();
finishedRequestsWithSuccess = new Set();
finishedRequestsWithErrors = new Set();
constructor(page: Page, resourceType: ResourceType[]) {
this.page = page;
this.resourceType = resourceType;
page.on(
"request",
async (
request: HTTPRequest & { resolver?: (value?: unknown) => void },
) => {
await request.continue();
if (this.resourceType.includes(request.resourceType())) {
this.pendingRequests.add(request);
this.promises.push(
new Promise((resolve) => {
request.resolver = resolve;
}),
);
}
},
);
page.on(
"requestfailed",
(request: HTTPRequest & { resolver?: (value?: unknown) => void }) => {
if (this.resourceType.includes(request.resourceType())) {
this.pendingRequests.delete(request);
this.finishedRequestsWithErrors.add(request);
if (request.resolver) {
request.resolver();
delete request.resolver;
}
}
},
);
page.on(
"requestfinished",
(request: HTTPRequest & { resolver?: (value?: unknown) => void }) => {
if (this.resourceType.includes(request.resourceType())) {
this.pendingRequests.delete(request);
this.finishedRequestsWithSuccess.add(request);
if (request.resolver) {
request.resolver();
delete request.resolver;
}
}
},
);
}
async waitForAllRequests() {
if (this.pendingRequestCount() === 0) {
return;
}
await Promise.all(this.promises);
}
pendingRequestCount() {
return this.pendingRequests.size;
}
}
I did change one thing: instead of hard-coding which resource types to look for in the network requests, I pass the resource types as a constructor argument. That should make this class more generic.
I've tested this code with my API that uses Puppeteer, and it works great.
For the usage of this class, it would be similar to what @danlong posted above:
// other necessary puppeteer code here...
const monitorNetworkRequests = new PuppeteerNetworkMonitor(page, ["image"]);
await monitorNetworkRequests.waitForAllRequests();

Related

How to run multiple browsers in Playwright concurrently in Typescript

I'm writing a script for a small PoC that runs a Playwright script. The script opens a browser and a page, fills in and submits a form, and then closes. Now I want to perform this multiple times, so I've created a function that returns a promise (that's the Playwright script), and I execute it several times at once using Promise.allSettled(). What happens is that the first run goes fine, but all subsequent runs fail with the message: "browserType.launch: Timeout 180000ms exceeded"
This is the Playwright function:
import { chromium } from "playwright-chromium";
async function service(term: string, lat: number, lng: number) {
return new Promise(async (resolve, reject) => {
try {
const browser = await chromium.launch();
const context = await browser.newContext();
const page = await browser.newPage();
await page.goto("www.some-url.com", { waitUntil: "networkidle" });
resolve("Success!");
} catch (error) {
reject(error);
}
});
}
export default service;
And this is the method calling the function and thus executing the function a couple of times:
const isFulfilled = <T>(
input: PromiseSettledResult<T>
): input is PromiseFulfilledResult<T> => input.status === "fulfilled";
const isRejected = (
input: PromiseSettledResult<unknown>
): input is PromiseRejectedResult => input.status === "rejected";
const array = await constructArray();
const tasks = array.map(async (element) => {
return await playwrightScript();
});
const responses = await Promise.allSettled(tasks);
const fulfilled = responses.filter(isFulfilled);
const rejected = responses.filter(isRejected);
Is it possible to achieve what I want like this? And how should I tackle this?

Http requests being dropped in Chrome Extension

Summary:
I've built a chrome extension that reaches out to external API to fetch some data. Sometimes that data returns quickly, sometimes it takes 4 seconds or so. I'm often doing about 5-10 in rapid succession (this is a scraping tool).
Previously, a lot of requests were dropped because the Manifest V3 service worker shuts down at random. I thought I had resolved that. Then I realized there was a race condition because local storage doesn't have a proper queue.
Current error: even with all these fixes, requests are still being dropped. The external API successfully returns the correct data, but the extension never seems to receive it. Hoping someone can point me in the right direction.
Relevant code attached, I imagine it will help someone dealing with these queue and service worker issues.
Local Storage queue
let writing: Map<string, Promise<any>> = new Map();
let updateUnsynchronized = async (ks: string[], f: Function) => {
let m = await new Promise((resolve, reject) => {
chrome.storage.local.get(ks, res => {
let m = {};
for (let k of ks) {
m[k] = res[k];
}
maybeResolveLocalStorage(resolve, reject, m);
});
});
// Guaranteed to have not changed in the meantime
let updated = await new Promise((resolve, reject) => {
let updateMap = f(m);
chrome.storage.local.set(updateMap, () => {
maybeResolveLocalStorage(resolve, reject, updateMap);
});
});
console.log(ks, 'Updated', updated);
return updated;
};
export async function update(ks: string[], f: Function) {
let ret = null;
// Global lock for now
await navigator.locks.request('global-storage-lock', async lock => {
ret = await updateUnsynchronized(ks, f);
});
return ret;
}
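The navigator.locks request above serializes the read-modify-write cycles so that concurrent calls can't interleave. Outside the browser, the same serialization idea can be sketched as a promise-chain mutex; `withLock` and `increment` below are hypothetical names for illustration, not part of any API in the extension.

```javascript
// A promise-chain mutex: each task is appended to a shared chain,
// so async tasks run strictly one at a time, like the global
// storage lock above.
let chain = Promise.resolve();

function withLock(task) {
  const run = chain.then(() => task());
  // Keep the chain alive even if a task rejects.
  chain = run.catch(() => {});
  return run;
}

// Simulated read-modify-write with an artificial async gap; without
// the lock, concurrent calls would read the same value and lose updates.
const storage = { count: 0 };
async function increment() {
  const current = storage.count;               // read
  await new Promise((r) => setTimeout(r, 10)); // async gap
  storage.count = current + 1;                 // write
}
```

Running three `withLock(increment)` calls concurrently yields a count of 3; three unlocked `increment()` calls started together would yield 1.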
Here's the main function
export async function appendStoredScrapes(
scrape: any,
fromHTTPResponse: boolean
) {
let updated = await update(['urlType', 'scrapes'], storage => {
const urlType = storage.urlType;
const scrapes = storage.scrapes;
const {url} = scrape;
if (fromHTTPResponse) {
// We want to make sure that the url type at time of scrape, not time of return, is used
scrapes[url] = {...scrapes[url], ...scrape};
} else {
scrapes[url] = {...scrapes[url], ...scrape, urlType};
}
return {scrapes};
});
chrome.action.setBadgeText({text: `${Object.keys(updated['scrapes']).length}`});
}
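The merge logic inside appendStoredScrapes (spread-merging the new scrape over the stored one, and tagging urlType only when the scrape did not come from an HTTP response, so the type recorded is the one in force at scrape time) can be isolated as a pure function. `mergeScrape` is a hypothetical name introduced here for illustration, not part of the extension code.

```javascript
// Pure sketch of the merge in appendStoredScrapes: merge a scrape into
// the map keyed by url. When the data comes back from an HTTP response,
// keep the urlType that was recorded at scrape time; otherwise stamp
// the current urlType onto the entry.
function mergeScrape(scrapes, urlType, scrape, fromHTTPResponse) {
  const { url } = scrape;
  const merged = fromHTTPResponse
    ? { ...scrapes[url], ...scrape }
    : { ...scrapes[url], ...scrape, urlType };
  return { ...scrapes, [url]: merged };
}
```

Keeping the merge pure makes the "urlType at time of scrape, not time of return" rule easy to test in isolation from chrome.storage.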
Keeping the service worker alive
let defaultKeepAliveInterval = 20000;
// To avoid GC
let channel;
// To be run in content scripts
export function contentKeepAlive(name : string) {
channel = chrome.runtime.connect({ name });
channel.onDisconnect.addListener(() => contentKeepAlive(name));
channel.onMessage.addListener(msg => { });
}
let deleteTimer = (chan : any) => {
if (chan._timer) {
clearTimeout(chan._timer);
delete chan._timer;
}
}
let backgroundForceReconnect = (chan : chrome.runtime.Port) => {
deleteTimer(chan);
chan.disconnect();
}
// To be run in background scripts
export function backgroundKeepAlive(name : string) {
chrome.runtime.onConnect.addListener(chan => {
if (chan.name === name) {
channel = chan;
channel.onMessage.addListener((msg, chan) => { });
channel.onDisconnect.addListener(deleteTimer);
channel._timer = setTimeout(backgroundForceReconnect, defaultKeepAliveInterval, channel);
}
});
}
// "Always call sendResponse() in your chrome.runtime.onMessage listener even if you don't need
// the response. This is a bug in MV3." — https://stackoverflow.com/questions/66618136/persistent-service-worker-in-chrome-extension
export function defaultSendResponse (sendResponse : Function) {
sendResponse({ farewell: 'goodbye' });
}
Relevant parts of background.ts
backgroundKeepAlive('extension-background');
let listen = async (request, sender, sendResponse) => {
try {
if (request.message === 'SEND_URL_DETAIL') {
const {url, website, urlType} = request;
await appendStoredScrapes({url}, false);
let data = await fetchPageData(url, website, urlType);
console.log(data, url, 'fetch data returned background');
await appendStoredScrapes(data, true);
defaultSendResponse(sendResponse);
} else if (request.message === 'KEEPALIVE') {
sendResponse({isAlive: true});
} else {
defaultSendResponse(sendResponse);
}
} catch (e) {
console.error('background listener error', e);
}
};
chrome.runtime.onMessage.addListener(function (request, sender, sendResponse) {
listen(request, sender, sendResponse);
});

Progressive Web App doesn't fetch my cached files

I am trying to turn my weather app into a PWA, and I want to show an offline page if the user loses the connection.
So I've managed to put the HTML and related resources (like scripts and SVGs) into the browser's cache, but when I go offline, only the HTML page loads, and not the other assets...
Here are the files that are in the cache:
And here are the errors that appear in the console and the network tab when I go offline:
As you can see, the only things loaded are the KUTE.js library (which doesn't work even though it is apparently loaded?), which comes from a CDN, and the assets imported by the CSS (I put the CSS directly in my HTML page).
--- If you're wondering what the "en" file is: I made a translation system with Express, EJS and cookies, so when you go to /en or /fr in the URL, the page is translated into English or French. ---
Finally, here is the code of my service worker :
const OFFLINE_VERSION = 1;
const CACHE_NAME = "offline";
const OFFLINE_URL = "offline.html";
const BASE = location.protocol + "//" + location.host;
const CACHED_FILES = [
"https://cdn.jsdelivr.net/npm/kute.js#2.1.2/dist/kute.min.js",
`${BASE}/src/favicon/favicon.ico`,
`${BASE}/src/favicon/android-chrome-192x192.png`,
`${BASE}/src/favicon/android-chrome-512x512.png`,
`${BASE}/src/favicon/apple-touch-icon.png`,
`${BASE}/src/favicon/favicon-16x16.png`,
`${BASE}/src/favicon/favicon-32x32.png`,
`${BASE}/src/svg/layered-waves.svg`,
`${BASE}/js/background.js`,
`${BASE}/js/animation-blob.js`
];
self.addEventListener('install', (event) => {
event.waitUntil((async() => {
const cache = await caches.open(CACHE_NAME);
await Promise.all(
[...CACHED_FILES, OFFLINE_URL].map((path) => {
return cache.add(new Request(path, {cache: "reload"}));
})
);
})());
self.skipWaiting();
});
self.addEventListener('activate', (event) => {
event.waitUntil((async () => {
if ("navigationPreload" in self.registration) {
await self.registration.navigationPreload.enable();
}
})());
self.clients.claim();
});
self.addEventListener('fetch', (event) => {
if(event.request.mode === "navigate") {
event.respondWith((async() => {
try {
const preloadResponse = await event.preloadResponse;
if(preloadResponse) {
return preloadResponse;
}
return await fetch(event.request);
} catch(e) {
const cache = await caches.open(CACHE_NAME);
return await cache.match(OFFLINE_URL);
}
})());
}
});
It's the "regular" code for creating an offline page, except that I add multiple files to the cache.
So do you know why I can't fetch my other cached files?
Thank you in advance!
This function acts as a proxy that decides whether data should be fetched from the cache or from the network:
self.addEventListener('fetch', (event) => {
if(event.request.mode === "navigate") {
event.respondWith((async() => {
try {
const preloadResponse = await event.preloadResponse;
if(preloadResponse) {
return preloadResponse;
}
return await fetch(event.request);
} catch(e) {
const cache = await caches.open(CACHE_NAME);
return await cache.match(OFFLINE_URL);
}
})());
}
});
But the line if(event.request.mode === "navigate") restricts that behavior to navigation requests (fired when the browser loads a new page). So you need a minor change to make it work; you can try something like this:
self.addEventListener('fetch', (event) => {
event.respondWith((async() => {
try {
const preloadResponse = await event.preloadResponse;
if(preloadResponse) {
return preloadResponse;
}
return await fetch(event.request);
} catch(e) {
const cache = await caches.open(CACHE_NAME);
return await cache.match(event.request);
}
})());
});
This will normally make it work, but note that every request will now pass through the service worker, not only navigation requests.

Puppeteer reload the page until some specific style changed

I want to open the web browser and keep reloading the page until the reloaded page has a different style, then move on to the next functions; until then, keep reloading.
Let's say I have a P tag; the page should keep being reloaded while its display is block:
<p id="notYetStarted" style="display: block;">You need to reload the page if u can read me!</p>
But stop reloading the page once the display property of the P tag is display: none; (and in that case, instead of reloading, continue executing the other code):
<p id="notYetStarted" style="display: none;">You need to reload the page if u can read me!</p>
I tried to use a recursive function, but it's not working:
(async () => {
try {
//init a browser tab and wait until completely loaded then go to next step
const browser = await puppeteer.launch({headless:false, args: ['--no-sandbox'] });
const page = await browser.newPage();
await page.setViewport({width:1366, height: 768})
await page.goto(url, { waitUntil: 'networkidle2' });
// wait for Recursive function to be resolve
await checkPTag(page)
// we are here because p.display:none
// continue execute other codes :)
}catch(err) {
console.log(err)
}
})();
const checkPTag = (page) => {
return new Promise(async (resolve, reject) => {
//search the dom for p tag and check it's display property
let result = await isPTagAvailable(page)
if(result === 'not started') {
//reload the page cz p.display:block
await page.reload({ waitUntil: ["networkidle0", "domcontentloaded"] })
//Recursive calling again
await checkPTag(page)
}else if(result === 'started') {
//no need reload the page cz p.none
resolve('started')
}
})
}
const isPTagAvailable = (page) => {
return new Promise (async (resolve, reject) => {
await page.waitForSelector('#body');
const pTags = await page.$$eval(
'#body',
nodes =>
nodes.map(element => {
const p = element.querySelector('p#notYetStarted');
console.log(p)
return JSON.parse(JSON.stringify(getComputedStyle(element, null).display));
} )
);
const pDisplay = pTags[0]
if(pDisplay === 'block') {
resolve('not started')
}else {
resolve('started')
}
})
}
The above code opens a web browser, waits until the DOM is completely loaded, and gets the display value of the P tag. Since it is block, it reloads the page; so far so good. But when the display value changes to none, it still tries to reload the page.
Sorry for the long code.
I think your code is just being served the same cached response as the first request. So you should add a random number to the end of the URL to make sure the response isn't the cached copy of the first response.
const puppeteer = require ('puppeteer')
const urlPage = 'http://localhost/testing/test_display_none.html'
;(async () => {
const browser = await puppeteer.launch ({
headless: false,
devtools: false
})
const [page] = await browser.pages ()
page.setDefaultNavigationTimeout(0)
const functionToExecute = async () => {
// Code to run if P tag display is none (hidden)
console.log ('P tag display = none\n Executing next defined function...')
}
const ifTagPdisplayed = async () => {
const openPage = await page.goto ( urlPage + '?r=' + Date.now() , { waitUntil: 'networkidle2', timeout: 0 } )
const elemExist = await page.waitForSelector ('#notYetStarted', { timeout: 0 })
const getDisplay = await page.evaluate ( () => document.querySelector('#notYetStarted').style.display === 'none' )
if ( !getDisplay ) {
await ifTagPdisplayed ()
} else {
await functionToExecute ()
}
}
await ifTagPdisplayed ()
})()
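The reload loop in the answer above is a general retry-until pattern: repeat an async check until it succeeds, then move on. Stripped of Puppeteer, it can be sketched as below; `retryUntil` is a hypothetical helper, and the predicate stands in for the page.goto + page.evaluate pair.

```javascript
// Generic retry-until sketch of the reload loop: call check() until it
// returns true or attempts run out, optionally waiting between tries.
async function retryUntil(check, { attempts = 10, delayMs = 0 } = {}) {
  for (let i = 0; i < attempts; i++) {
    if (await check()) return true;
    if (delayMs) await new Promise((r) => setTimeout(r, delayMs));
  }
  return false;
}
```

Capping the attempts (unlike the unbounded recursion in the question) guards against reloading forever if the style never changes.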

How to store every network request into an array given the loop behaviour of request.continue()?

I'm trying to get all the network requests when a page is accessed and store them into an array.
My code looks like this:
await page.setRequestInterceptionEnabled(true);
page.on('request', request => {
if(request.url) {
var networkRequests = request.url;
var networkArray = [];
for (var i = 0; i < networkRequests; i++) {
networkArray.push(networkRequests[i]);
}
console.log(networkArray);
console.log(typeof networkArray);
request.continue();
} else {
request.abort();
}
});
await page.goto('http://www.example.com', {waitUntil: 'networkidle'});
I find that the problem is around request.continue(): the 'request' handler fires once per fetched request, and each call yields that request's URL as a string, so I end up with many separate strings.
The problem is that I couldn't manage to collect all those strings into one array so I can use them later. I tried several for loops but didn't succeed.
A quick fix has been found in the meantime:
const puppeteer = require('puppeteer');
function extractRequests(url) {
return new Promise((resolve, reject) => {
(async() => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setRequestInterceptionEnabled(true);
let result = [];
page.on('request', request => {
if (request.url) {
var networkRequests = request.url;
result.push(networkRequests);
request.continue();
} else {
request.abort();
}
});
page.goto(url, {
waitUntil: 'networkidle'
})
.then( _=> setTimeout( _=> resolve(result), 1000));
})();
});
}
extractRequests('http://example.com').then(requests => {
console.log(requests.filter(x => x.includes('event-name') && x.includes('other-event-name')));
process.exit(0);
});
