This question already has answers here:
Chrome stalls when making multiple requests to same resource?
(3 answers)
Closed last month.
The fetch request in JavaScript:
(async () => {
  try {
    const urls = [
      "http://localhost:3001/foo",
      "http://localhost:3001/foo",
      "http://localhost:3001/foo",
      "http://localhost:3001/foo",
    ];
    const requests = urls.map((url) => fetch(url));
    const responses = await Promise.all(requests);
    const errors = responses.filter((response) => !response.ok);
  } catch (err) {
    console.error(err);
  }
})();
I want the requests to fire in parallel, but the Network panel in Chrome DevTools shows them stalled, running one after another.
When I disable the cache in DevTools, they do run in parallel.
I'm confused about how to make this work without disabling the cache in DevTools.
Thanks
I've found the answer in this post:
Chrome stalls when making multiple requests to same resource?
Apparently Chrome holds back duplicate requests to the same cacheable resource, waiting for the first response so it can potentially serve the others from cache.
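Two common workarounds (not from the linked thread, just a sketch): either opt those requests out of the HTTP cache, or give each one a unique URL so Chrome does not treat them as the same cacheable resource. The burst query parameter below is made up and assumes the server ignores it.
(async () => {
  const urls = Array(4).fill("http://localhost:3001/foo");

  // Option 1: bypass the HTTP cache so Chrome doesn't hold back the duplicates.
  const requests = urls.map((url) => fetch(url, { cache: "no-store" }));

  // Option 2 (alternative): make each URL unique with a throwaway parameter.
  // const requests = urls.map((url, i) => fetch(`${url}?burst=${i}`));

  const responses = await Promise.all(requests);
  const errors = responses.filter((response) => !response.ok);
  console.log(`${errors.length} of ${responses.length} requests failed`);
})();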
Related
Here's the scoop.
I'm trying to use Puppeteer v18.0.5 with the bundled Chromium browser against a specific website, on Node v16.16.0. However, when I enable request interception via page.setRequestInterception(true), all of the HTTP requests for image resources are lost: my handler is invoked far less often while intercepting than when not intercepting, and the page never fires any requests for images. When I disable the interception, the page loads normally. Yes, I know about invoking continue() on all requests; I'm already doing that in the request handler on the page.
I've also pored over the Puppeteer issues pages and found similar symptoms reported against earlier Puppeteer versions, but those were all different issues that have since been resolved. This seems unique.
I've looked through Puppeteer source code as well as CDP events to try and find any explanation, but have found none.
As an important note for anyone trying to reproduce this, you must be proxied through a server in the London general area in order to successfully load this site.
Here's my code to reproduce:
const puppeteer = require('puppeteer');

(async () => {
  const options = {
    browserWidth: 1366,
    browserHeight: 983,
    intercepting: false
  };
  const browser = await puppeteer.launch({
    args: [`--window-size=${options.browserWidth},${options.browserHeight}`],
    defaultViewport: { width: options.browserWidth, height: options.browserHeight },
    headless: false
  });
  const page = (await browser.pages())[0];
  page.on('request', async (request) => {
    console.log(`Request: ${request.method()} | ${request.url()} | ${request.resourceType()} | ${request._requestId}`);
    if (options.intercepting) await request.continue();
  });
  await page.setRequestInterception(options.intercepting);
  await page.goto('https://vegas.williamhill.com', { waitUntil: 'networkidle2', timeout: 65000 });
  // Give a moment to view the page in headful mode before closing the browser.
  await new Promise(resolve => setTimeout(resolve, 5000));
  await browser.close();
})();
Here's what the page looks like with intercepting disabled:
Expected Page Load
Here's what the page looks like with intercepting enabled and continuing all requests.
Page load while intercepting and continuing all requests
With request interception disabled my handler is invoked for 104 different requests. But with the interception enabled it's only invoked 22 times. I'm not hitting a navigation timeout as the .goto() method returns before my timeout each time.
Any insight into what configuration/strategy I'm missing would be immensely appreciated.
Maybe you are intercepting some JavaScript files that initiate the requests you are not seeing?
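If it helps to narrow that down, here is a small diagnostic sketch (reusing the page object from the reproduction script) that counts finished and failed requests, so the totals can be compared with interception on and off:
let finished = 0;
let failed = 0;

page.on('requestfinished', () => {
  finished += 1;
});

page.on('requestfailed', (request) => {
  failed += 1;
  // failure() can return null, so guard the access to errorText.
  const reason = request.failure() ? request.failure().errorText : 'unknown';
  console.log(`Failed: ${request.resourceType()} ${request.url()} (${reason})`);
});

page.on('close', () => {
  console.log(`Requests finished: ${finished}, failed: ${failed}`);
});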
I would like to understand DoS (denial-of-service) attacks better, and I would like to know what my options are for learning about them with an example.
I have a basic Express server:
app.get('/ping', (req, res) => {
  res.send({ pong: 'pong', time: new Date().valueOf(), memory: process.memoryUsage() })
})
I will separately create some JavaScript code that will make multiple requests to the server, but I don't know how to devise strategies to try and bring down the server (consider that this is all running on localhost).
I want to see what the upper limit on the number of requests is when testing this locally. I am experiencing what is described here: Sending thousands of fetch requests crashes the browser. Out of memory.
The suggestions on that thread are more along the lines of "the browser is running out of memory" and that I should "throttle requests", but I am actively trying to max out the requests the browser can make without crashing. So far my observation is that the server does not have any difficulty (so maybe I should also make requests from my phone and tablet?).
The code I run in the browser isn't much more than:
const makeRequestAndAlogTime = () => {
  const startTime = new Date().valueOf();
  fetch('http://localhost:4000/ping')
    .then(async (response) => {
      const { time, memory } = await response.json();
      console.log({
        startTime: 0,
        processTime: time - startTime,
        endTime: new Date().valueOf() - startTime,
        serverMemory: memory,
        browserMemory: performance['memory']
      })
    })
}

for (let x = 0; x < 100; x++) {
  makeRequestAndAlogTime()
}
Depending on the number of iterations in the for loop, performance gets slower and eventually the browser crashes (as expected). Is there a way I could automate determining the upper limit of requests I can make in my browsers?
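One rough way to automate that probe (a sketch; findRequestCeiling and the batch sizes are made up for illustration, and the endpoint is the /ping route from above) is to keep doubling the batch size until a batch produces rejected fetches:
const findRequestCeiling = async (url, start = 100, max = 100000) => {
  for (let batch = start; batch <= max; batch *= 2) {
    // Fire the whole batch at once and wait for every result, success or failure.
    const results = await Promise.allSettled(
      Array.from({ length: batch }, () => fetch(url))
    );
    const failures = results.filter((r) => r.status === 'rejected').length;
    console.log({ batch, failures, browserMemory: performance['memory'] });
    if (failures > 0) return batch; // first batch size that showed errors
  }
  return max; // no failures within the tested range
};

findRequestCeiling('http://localhost:4000/ping')
  .then((ceiling) => console.log('approximate ceiling:', ceiling));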
There's a very common problem I have seen from many people who serve different versions of their site for mobile and desktop; many themes have this feature. The issue is that Cloudflare caches the same page regardless of the user's device, causing mix-ups and inconsistencies between the desktop and mobile versions.
The most common solution is to move the mobile version to a separate URL, but in my case I want to use the same URL and make Cloudflare's cache work properly for both desktop and mobile.
I found a very nice guide showing how to fix this issue; however, the worker code seems to be outdated, and I had to modify some parts to make it work.
I created a new subdomain for my workers and then assigned the route to my site so it starts running.
The worker is caching everything; however, it does not have the desired behavior of keeping separate cached versions per device.
async function run(event) {
  const { request } = event;
  const cache = caches.default;

  // Read the user agent of the request (it may be missing, so default to '').
  const ua = request.headers.get('user-agent') || '';
  let uaValue;
  if (ua.match(/mobile/i)) {
    uaValue = 'mobile';
  } else {
    uaValue = 'desktop';
  }
  console.log(uaValue);

  // Construct a new request object which distinguishes the cache key by
  // device type.
  const url = new URL(request.url);
  url.searchParams.set('ua', uaValue);
  const newRequest = new Request(url, request);

  let response = await cache.match(newRequest);
  if (!response) {
    // Use the original request object when fetching the response from the
    // server to avoid passing on the query parameters to our backend.
    response = await fetch(request, { cf: { cacheTtl: 14400 } });
    // Store the cached response with our extended query parameters.
    event.waitUntil(cache.put(newRequest, response.clone()));
  }
  return response;
}

addEventListener('fetch', (event) => {
  event.respondWith(run(event));
});
It is indeed detecting the right user agent, but it should be keeping two separate cached versions according to the assigned query string.
I think maybe I'm missing some configuration; I don't know why it's not working as expected. As it is right now, I still get my mobile and desktop cache versions mixed up.
The problem here is that fetch() itself already does normal caching, independent of your use of the Cache API around it. So fetch() might still return a cached response that is for the wrong UA.
If you could make your back-end ignore the query parameter, then you could include the query in the request passed to fetch(), so that it correctly caches the two results differently. (Enterprise customers can use custom cache keys as a way to accomplish this without changing the URL.)
If you do that, then you can also remove the cache.match() and cache.put() calls since fetch() itself will handle caching.
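A minimal sketch of that simplification, assuming the back-end really does ignore the extra ua query parameter; the fetch event listener stays the same as before:
async function run(event) {
  const { request } = event;

  // Classify the device from the user agent (default to desktop if missing).
  const ua = request.headers.get('user-agent') || '';
  const uaValue = ua.match(/mobile/i) ? 'mobile' : 'desktop';

  // Put the device type into the URL so fetch() caches the variants separately.
  const url = new URL(request.url);
  url.searchParams.set('ua', uaValue);
  const newRequest = new Request(url, request);

  // No explicit cache.match()/cache.put() needed: fetch() handles the caching.
  return fetch(newRequest, { cf: { cacheTtl: 14400 } });
}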
Using Node.js and headless Chrome with Puppeteer on an Ubuntu server, I'm scraping a few different websites. One of the occasional tasks is to interact with the loaded page (click on a link to open another page and then possibly another click to accept the terms and such).
I can do all this just fine, but I'm trying to understand how it will work if I have multiple pages open simultaneously and I'm interacting with different loaded pages at the same time (overlapping times).
To visualize this, I'm thinking about how a user would do the same job: they'd have to open multiple browser windows, load the pages, and switch between them to look at each one and then click on links.
But with Puppeteer we have a separate browser object, and we don't need to see the window or page to know where to click. We can traverse the page through the browser object and then click the desired element without looking (headless).
I'm thinking I should be able to do multiple pages at the same time as long as I have CPU and memory available to handle them.
Does anyone have any experience with puppeteer interacting with multiple websites simultaneously? Anything I need to watch out for?
This is the problem the library puppeteer-cluster (I'm the author) is addressing. It allows you to build a pool of pages (or browsers) to use and run tasks inside.
You'll find several general code samples in the repository (and also on Stack Overflow). Let me address your specific use case of running different tasks with an example.
Code Sample
The following code creates two tasks:
crawl: Opens the page and extracts a URL, which is then used to start the second task
screenshot: Takes a screenshot of the extracted URL
The process is started by queuing the crawl task with the URLs.
const { Cluster } = require('puppeteer-cluster');

(async () => {
  // Use four pages in parallel
  const cluster = await Cluster.launch({
    concurrency: Cluster.CONCURRENCY_PAGE,
    maxConcurrency: 4,
  });

  // We define two tasks
  const crawl = async ({ page, data: url }) => {
    await page.goto(url);
    const extractedURL = /* ... */; // extract a URL (or multiple) from the document somehow
    cluster.queue(extractedURL, screenshot);
  };

  const screenshot = async ({ page, data: url }) => {
    await page.goto(url);
    await page.screenshot();
  };

  // Crawl some pages
  cluster.queue('https://www.google.com/', crawl);
  cluster.queue('https://github.com/', crawl);

  // Wait until everything is done and close the cluster
  await cluster.idle();
  await cluster.close();
})();
This is a minimal example. I left out error handling, monitoring and the setup options.
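For completeness, a sketch of those left-out pieces using puppeteer-cluster's monitor option and taskerror event; the timeout and retry values are arbitrary examples:
const cluster = await Cluster.launch({
  concurrency: Cluster.CONCURRENCY_PAGE,
  maxConcurrency: 4,
  monitor: true,        // print a live progress overview to the console
  timeout: 30 * 1000,   // per-task timeout in milliseconds
  retryLimit: 2,        // retry a failed task up to two times
});

// Log tasks that still fail after the retries instead of crashing the cluster.
cluster.on('taskerror', (err, data) => {
  console.error(`Task failed for ${data}: ${err.message}`);
});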
I can usually get 5 or so browsers going on a 4 GB server. If you're just popping URLs off a queue, it's pretty straightforward:
const puppeteer = require('puppeteer');

let queue = [
  'http://www.amazon.com',
  'http://www.google.com',
  'http://www.fabebook.com',
  'http://www.reddit.com',
];

const doQueue = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  let url;
  // Each worker keeps pulling URLs until the shared queue is empty.
  while (url = queue.shift()) {
    await page.goto(url);
    console.log(await page.title());
  }
  await browser.close();
};

// Start three workers that share the same queue.
[1, 2, 3].map(() => doQueue());
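If it helps, a slightly hardened variant of the same worker (just a sketch; the doQueueSafely name is only for illustration): a try/catch so one failing URL doesn't stop that worker, and a finally so the browser is always closed.
const doQueueSafely = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  let url;
  try {
    while (url = queue.shift()) {
      try {
        await page.goto(url);
        console.log(await page.title());
      } catch (err) {
        // Skip the bad URL and keep draining the queue.
        console.error(`Failed on ${url}: ${err.message}`);
      }
    }
  } finally {
    await browser.close();
  }
};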
I am trying to upload a file using Puppeteer and browserWSEndpoint; the error message I am getting is:
"Uncaught (in promise) Error: File chooser handling does not work with multiple connections to the same page".
Here is my code:
const puppeteer = require('puppeteer');

async function getTest() {
  const browser = await puppeteer.connect({
    browserWSEndpoint: 'wss://chrome.browserless.io'
  });
  const page = (await browser.pages())[0];
  await page.goto('https://someWebSite');
  // DO STUFF
  console.log("before upload"); // code runs until here
  const [fileChooser] = await Promise.all([
    page.waitForFileChooser(),
    page.click('#uploadTrigger'),
  ]);
  await fileChooser.accept(['C:\\myProgram\\pic.jpg']);
  await page.click('#edit-submit');
}
getTest().then(console.log);
I must mention that if I don't use browserWSEndpoint, and use this code at the beginning instead, everything works fine.
const browser = await puppeteer.launch({headless: false, defaultViewport:null});
Honestly, I am pretty lost with browserWSEndpoint. I used info from this post: How to run Puppeteer code in any web browser?
which led me to browserless.io; I copied the code and it works.
Now this is my precise question: the error says it does not work with multiple connections to the same page. How exactly am I making multiple connections? Maybe I can resolve that, and then I could use const [fileChooser].
My main issue is that I need to upload a file using browserless.
Others seem to have the same problem according to https://github.com/GoogleChrome/puppeteer/issues/4783, but launching Chromium myself is not an option if I want to use browserless.
If you are the only client connected to that browser, then you must be connected to a browser that doesn't support the file chooser. You should connect to Chromium 77.0.3844.0 (r674921) or higher.
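If upgrading the remote Chromium is not an option, a possible workaround is to set the file on the <input type="file"> element directly with elementHandle.uploadFile(), which does not go through the file chooser at all. This is only a sketch, and the input[type="file"] selector is an assumption about the page's markup.
const input = await page.$('input[type="file"]');
if (input) {
  // uploadFile sets the file on the input without opening a file chooser.
  await input.uploadFile('C:\\myProgram\\pic.jpg');
  await page.click('#edit-submit');
}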