I'm trying to scrape YouTube Shorts from a specific YouTube Channel, using Puppeteer running on MeteorJs Galaxy.
Here's the code I've written so far:
import puppeteer from 'puppeteer';
import { YouTubeShorts } from '../imports/api/youTubeShorts'; //meteor mongo local instance
let URL = 'https://www.youtube.com/#ummahtoday1513/shorts'
const processShortsData = (iteratedData) => {
  let documentExist = YouTubeShorts.findOne({ videoId:iteratedData.videoId })
  if(documentExist === undefined) { //undefined meaning this incoming shorts is a new one
    YouTubeShorts.insert({
      videoId: iteratedData.videoId,
      title: iteratedData.title,
      thumbnail: iteratedData.thumbnail,
      height: iteratedData.height,
      width: iteratedData.width
    })
  }
}
const fetchShorts = () => {
  puppeteer.launch({
    headless:true,
    args:[
      '--no-sandbox',
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',
      '--single-process'
    ]
  })
  .then( async function(browser){
    async function fetchingData(){
      new Promise(async function(resolve, reject){
        const page = await browser.newPage();
        await Promise.all([
          await page.setDefaultNavigationTimeout(0),
          await page.waitForNavigation({waitUntil: "domcontentloaded"}),
          await page.goto(URL, {waitUntil:["domcontentloaded", "networkidle2"]}),
          await page.waitForSelector('ytd-rich-grid-slim-media', { visible:true }),
          new Promise(async function(resolve,reject){
            page.evaluate(()=>{
              const trialData = document.getElementsByTagName('ytd-rich-grid-slim-media');
              const titles = Array.from(trialData).map(i => {
                const singleData = {
                  videoId: i.data.videoId,
                  title: i.data.headline.simpleText,
                  thumbnail: i.data.thumbnail.thumbnails[0].url,
                  height: i.data.thumbnail.thumbnails[0].height,
                  width: i.data.thumbnail.thumbnails[0].width,
                }
                return singleData
              })
              resolve(titles);
            })
          }),
        ])
        await page.close()
      })
      await browser.close()
    }
    async function fetchAndProcessData(){
      const datum = await fetchingData()
      console.log('DATUM:', datum)
    }
    await fetchAndProcessData()
  })
}
fetchShorts();
I am struggling with two things here:
Async, await, and promises, and
Finding the reason why Puppeteer outputs the ProtocolError: Protocol error (Target.createTarget): Target closed. error in the console.
I'm new to puppeteer and trying to learn from various examples on StackOverflow and Google in general, but I'm still having trouble getting it right.
A general word of advice: code slowly and test frequently, especially when you're in an unfamiliar domain. Try to minimize problems so you can understand what's failing. There are many issues here, giving the impression that the code was written in one fell swoop without incremental validation. There's no obvious entry point to debugging this.
Let's examine some failing patterns.
First, basically never use new Promise() when you're working with a promise-based API like Puppeteer. This is discussed in the canonical What is the explicit promise construction antipattern and how do I avoid it? so I'll avoid repeating the answers there.
Second, don't mix async/await and then. The point of promises is to flatten code and avoid pyramids of doom. If you find you have 5-6 deeply nested functions, you're misusing promises. In Puppeteer, there's basically no need for then.
Third, setting timeouts to infinity with page.setDefaultNavigationTimeout(0) suppresses errors. It's fine if you want a long delay, but if a navigation is taking more than a few minutes, something is wrong and you want an error so you can understand and debug it rather than having the script wait silently until you kill it, with no clear diagnostics as to what went wrong or where it failed.
Fourth, watch out for pointless calls to waitForNavigation. Code like this doesn't make much sense:
await page.waitForNavigation(...);
await page.goto(...);
What navigation are you waiting for? This seems ripe for triggering timeouts, or worse yet, infinite hangs after you've set navigations to never time out.
Fifth, avoid premature abstractions. You have various helper functions but you haven't established functionally correct code, so these just add to the confused state of affairs. Start with correctness, then add abstractions once the cut points become obvious.
Sixth, avoid Promise.all() when all of the contents of the array are sequentially awaited. In other words:
await Promise.all([
  await foo(),
  await bar(),
  await baz(),
  await quux(),
  garply(),
]);
is identical to:
await foo();
await bar();
await baz();
await quux();
await garply();
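To make the sixth point concrete, here is a small sketch that needs no browser. The task and sleep helpers are hypothetical stand-ins for Puppeteer calls; the log shows that inner awaits force each task to finish before the next one starts, while dropping them lets Promise.all run the tasks concurrently.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical task: records when it starts and finishes in a shared log.
const task = async (name, log) => {
  log.push(`start ${name}`);
  await sleep(20);
  log.push(`end ${name}`);
};

// Each inner `await` finishes before the next array element is even created.
const sequentialInDisguise = async () => {
  const log = [];
  await Promise.all([await task("a", log), await task("b", log)]);
  return log;
};

// Without the inner awaits, both tasks start before either finishes.
const concurrent = async () => {
  const log = [];
  await Promise.all([task("a", log), task("b", log)]);
  return log;
};

(async () => {
  console.log((await sequentialInDisguise()).join(", "));
  // start a, end a, start b, end b
  console.log((await concurrent()).join(", "));
  // start a, start b, end a, end b
})();
```

In a scraping script the steps usually depend on each other (navigate, then wait, then extract), so plain sequential awaits are normally the right choice anyway.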
Seventh, always return promises if you have them:
const fetchShorts = () => {
  puppeteer.launch({
    // ..
should be:
const fetchShorts = () => {
  return puppeteer.launch({
    // ..
This way, the caller can await the function's completion. Without it, it gets launched into the void and can never be connected with the caller's flow.
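As a rough illustration of the difference, the sketch below uses a hypothetical doWork function in place of puppeteer.launch().then(...). The names are made up for the example; the point is only what the caller receives in each case.

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Hypothetical stand-in for puppeteer.launch().then(...): resolves after a delay.
const doWork = () => sleep(20).then(() => "done");

// Missing `return`: the promise is created but the caller receives undefined.
const withoutReturn = () => {
  doWork();
};

// With `return`, the caller gets the promise and can await its result.
const withReturn = () => {
  return doWork();
};

(async () => {
  console.log(await withoutReturn()); // undefined, and the work is still running
  console.log(await withReturn()); // done
})();
```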
Eighth, evaluate doesn't have access to variables in Node, so this pattern doesn't work:
new Promise(resolve => {
  page.evaluate(() => resolve());
});
Instead, avoid the new promise antipattern and use the promise that Puppeteer already returns to you:
await page.evaluate(() => {});
Better yet, use $$eval here since it's an abstraction of the common pattern of selecting elements first thing in evaluate.
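If the eighth point seems surprising, this sketch may help. It is only a simulation built on Node's vm module, not Puppeteer's real implementation: fakeEvaluate is a hypothetical helper that turns the callback into source text and runs it in a fresh context, which is roughly what happens when evaluate ships a function to the browser. Variables from the enclosing Node scope don't come along, but explicit arguments do.

```javascript
const vm = require("vm");

// Rough simulation (an assumption, not Puppeteer's real implementation):
// stringify the function and run it in a fresh context with only `args` visible.
const fakeEvaluate = (fn, ...args) =>
  vm.runInNewContext(`(${fn.toString()})(...args)`, {args});

const nodeSideValue = 42;

let closureError;
try {
  fakeEvaluate(() => nodeSideValue); // the closure variable is not shipped along
} catch (err) {
  closureError = err.name;
}
console.log(closureError); // ReferenceError

// Values passed as explicit arguments do make it across.
console.log(fakeEvaluate(x => x + 1, nodeSideValue)); // 43
```

In real Puppeteer code the equivalent is to pass serializable values as extra arguments, as in page.evaluate(fn, ...args).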
Putting all of this together, here's a rewrite:
const puppeteer = require("puppeteer"); // ^19.6.3
const url = "<Your URL>";
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await page.waitForSelector("ytd-rich-grid-slim-media");
  const result = await page.$$eval("ytd-rich-grid-slim-media", els =>
    els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
      videoId,
      title: headline.simpleText,
      thumbnail: thumbnails[0].url,
      height: thumbnails[0].height,
      width: thumbnails[0].width,
    }))
  );
  console.log(result);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Note that I ensure browser cleanup with finally so the process doesn't hang in case the code throws.
Now, all we want is a bit of text, so there's no sense in loading much of the extra stuff YouTube downloads. You can speed up the script by blocking anything unnecessary to your goal:
const [page] = await browser.pages();
await page.setRequestInterception(true);
page.on("request", req => {
  if (
    req.url().startsWith("https://www.youtube.com") &&
    ["document", "script"].includes(req.resourceType())
  ) {
    req.continue();
  }
  else {
    req.abort();
  }
});
// ...
Note that ["domcontentloaded", "networkidle2"] is basically the same as "networkidle2" since "domcontentloaded" will happen long before "networkidle2". But please avoid "networkidle2" here since all you need is some text, which doesn't depend on all network resources.
Once you've established correctness, if you're ready to factor this to a function, you can do so:
const fetchShorts = async () => {
  const url = "<Your URL>";
  let browser;
  try {
    browser = await puppeteer.launch();
    const [page] = await browser.pages();
    await page.goto(url, {waitUntil: "domcontentloaded"});
    await page.waitForSelector("ytd-rich-grid-slim-media");
    return await page.$$eval("ytd-rich-grid-slim-media", els =>
      els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
        videoId,
        title: headline.simpleText,
        thumbnail: thumbnails[0].url,
        height: thumbnails[0].height,
        width: thumbnails[0].width,
      }))
    );
  }
  finally {
    await browser?.close();
  }
};
fetchShorts()
  .then(shorts => console.log(shorts))
  .catch(err => console.error(err));
But keep in mind, making the function responsible for managing the browser resource hampers its reusability and slows it down considerably. I usually let the caller handle the browser and make all of my scraping helpers accept a page argument:
const fetchShorts = async page => {
  const url = "<Your URL>";
  await page.goto(url, {waitUntil: "domcontentloaded"});
  await page.waitForSelector("ytd-rich-grid-slim-media");
  return await page.$$eval("ytd-rich-grid-slim-media", els =>
    els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
      videoId,
      title: headline.simpleText,
      thumbnail: thumbnails[0].url,
      height: thumbnails[0].height,
      width: thumbnails[0].width,
    }))
  );
};
(async () => {
  let browser;
  try {
    browser = await puppeteer.launch();
    const [page] = await browser.pages();
    console.log(await fetchShorts(page));
  }
  catch (err) {
    console.error(err);
  }
  finally {
    await browser?.close();
  }
})();
I am trying to type into YouTube's search input using Puppeteer.
Code as follows:
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://youtube.com');
  await page.type('#search','a');
  ...
Here is the error I get:
throw new Error('Evaluation failed: ' + (0, util_js_1.getExceptionMessage)(exceptionDetails));
^
Error: Evaluation failed: Error: Cannot focus non-HTMLElement
at pptr://__puppeteer_evaluation_script__:3:23
at ExecutionContext._ExecutionContext_evaluate (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:286:15)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async ExecutionContext.evaluate (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:117:16)
at async ElementHandle.evaluate (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/JSHandle.js:105:16)
at async ElementHandle.focus (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ElementHandle.js:486:9)
at async ElementHandle.type (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ElementHandle.js:516:9)
at async DOMWorld.type (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:449:9)
at async /Users/benjaminrubin/Documents/Software Dev Education/Scraping with Node JS/youtubeScrape.js:60:9
I could not figure out what exactly is wrong. Several examples across the web use the exact same format. What exactly does 'Cannot focus non-HTMLElement' mean?
This is a tricky one. Google sites are notorious for breaching the "one id on a page" rule, so there are actually two elements with the id search:
<ytd-searchbox id="search"> <!-- the one you are actually selecting -->
  ... bunch of nodes ...
  <input id="search"> <!-- the one you think you're selecting -->
await page.type('#search','a'); types into ytd-searchbox, which isn't a standard HTML element, so Puppeteer fails with the Error: Cannot focus non-HTMLElement error.
The fix is to use input#search instead:
const puppeteer = require("puppeteer"); // ^19.1.0
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.goto("https://youtube.com", {waitUntil: "domcontentloaded"});
  await page.type("input#search", "hello world");
  await page.screenshot({path: "youtube.png"});
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
Although the above solution may work, this is a good example of where simply encoding your search as a URL parameter and navigating directly to the results page is easier and more efficient:
const puppeteer = require("puppeteer");
let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  const q = encodeURIComponent("your search here");
  const url = `https://www.youtube.com/results?search_query=${q}`;
  await page.goto(url, {waitUntil: "networkidle2"});
  await page.screenshot({path: "youtube.png"});
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
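The encodeURIComponent call matters as soon as the query contains characters such as & or spaces. Here is a browser-free sketch of the difference; searchUrl is a hypothetical helper and the query string is made up for the example.

```javascript
// Hypothetical helper: builds a YouTube results URL for an arbitrary query.
const searchUrl = query =>
  `https://www.youtube.com/results?search_query=${encodeURIComponent(query)}`;

const encoded = searchUrl("cats & dogs");
console.log(encoded);
// https://www.youtube.com/results?search_query=cats%20%26%20dogs

// Round trip: the server side sees the full query.
console.log(new URL(encoded).searchParams.get("search_query")); // cats & dogs

// Without encoding, the & splits the query into two parameters.
const raw = "https://www.youtube.com/results?search_query=cats & dogs";
console.log(JSON.stringify(new URL(raw).searchParams.get("search_query"))); // "cats "
```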
My code below gives the error "TypeError: response.postData is not a function".
const puppeteer = require('puppeteer-extra')
const StealthPlugin = require('puppeteer-extra-plugin-stealth')
var url = ("https://www.adidas.com.tr/yeezy/product/GY1759")
function main(){
  puppeteer.use(StealthPlugin())
  puppeteer.launch({ headless: false }).then(async browser => {
    const page = await browser.newPage()
    await page.goto(url)
    await page.mouse.click(1000, 40);
    await page.on('response', response => {
      if (response.url() === "https://www.adidas.com.tr/rhsYl92ry/P4YX/xsLtQ/uJuJSkf31r/JGkkPDcC/QGBAD/nlbSQI"){
        console.log(response.postData())
      }
    });
  })
}
main()
I want to get the request's post data. How can I do that?
Error = "TypeError: response.postData is not a function"
There are a couple of potential misunderstandings here:
page.on doesn't return a promise, it registers a callback that runs asynchronously upon the event firing. It's misleading to await page.on because it makes it seem like your code will wait for something, but it won't. You'd need to promisify the handler if you want your code to wait for it, or chain all code that depends on the result from the callback.
Doing click(), then on() is probably not what you want. Usually, you register the on() event listener, then trigger the event that fires the request you want to capture. Otherwise, the response might arrive before you get a chance to set up your on() listener.
Answering your main question, you can access the original request on the response to get the post data: response.request().postData(). Note that this is form data only, not a JSON payload: adding a "Content-Type": "application/json" header and a proper JSON body results in undefined. See issue #5178, which shows that adding an && response.request().method() === "POST" check avoids triggering on the preflight OPTIONS request. You might also want to use page.on("request", ...) directly, but it's not clear what you ultimately aim to accomplish.
Since your code isn't working for me (a hard-coded coordinate click is very brittle; prefer making a selection then clicking the element that you've targeted), I'll share a minimal, runnable example that you can extrapolate to your needs:
const puppeteer = require("puppeteer"); // ^13.5.1
const html = `
<body>
<script>
fetch("https://httpbin.org/post", {
  method: "POST",
  body: "foo=bar",
});
</script>
</body>
`;
let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const responsePromise = new Promise(resolve => {
    const handler = response => {
      if (response.url() === "https://httpbin.org/post") {
        // off() needs the event name as well as the handler
        page.off("response", handler);
        resolve(response.request().postData());
      }
    };
    page.on("response", handler);
  });
  await page.setContent(html);
  const postData = await responsePromise;
  console.log(postData); // => foo=bar
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
Note that all of this is roundabout and mainly for illustrative purposes. A cleaner way is to use page.waitForResponse which doesn't require promisification like on() does:
// same html and require as above
let browser;
(async () => {
  browser = await puppeteer.launch({headless: true});
  const [page] = await browser.pages();
  const responsePromise = page.waitForResponse(response =>
    response.url() === "https://httpbin.org/post"
  );
  await page.setContent(html);
  const response = await responsePromise;
  console.log(response.request().postData()); // => foo=bar
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
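The ordering point (register the listener, then trigger the action) can also be seen without a browser. In this sketch a plain Node EventEmitter stands in for the Puppeteer page, and waitForEvent is a hypothetical helper that promisifies a listener the same way the longer example above does.

```javascript
const {EventEmitter} = require("events");

// Hypothetical helper: resolves with the first event payload matching `predicate`.
const waitForEvent = (emitter, name, predicate) =>
  new Promise(resolve => {
    const handler = payload => {
      if (predicate(payload)) {
        emitter.off(name, handler); // off needs the event name and the handler
        resolve(payload);
      }
    };
    emitter.on(name, handler);
  });

const emitter = new EventEmitter(); // stand-in for a Puppeteer page

// Register first...
const responsePromise = waitForEvent(emitter, "response", r => r.url === "/post");

// ...then trigger. Events fired before registration would be missed.
emitter.emit("response", {url: "/other"});
emitter.emit("response", {url: "/post", body: "foo=bar"});

responsePromise.then(r => console.log(r.body)); // foo=bar
```

If the two emit calls were moved above the waitForEvent call, the promise would never resolve, which is the same failure mode as calling click() before on().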
Reading the Response type definition for Puppeteer, it looks like there are other methods you can use to read the response body: either await response.json() or (await response.buffer()).toString().
I'm playing around with Puppeteer to learn a bit about automation in the browser. I wanted to open the Chromium browser visibly, so not in headless mode. I set the headless launch option to false, but it's still not opening Chromium.
I tried the no-sandbox args, and I even disabled the --disable-extensions flag in the args, but nothing helped.
There are no errors in the terminal; it just doesn't launch.
Here is my code:
const puppeteer = require ("puppeteer");
async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = browser.newPage();
  await page.goto("https://google.de");
  await browser.close();
};
Any idea why chromium is not opening? Also there are no logs about errors...
Problem
You are not calling the function, you are just defining it via async () => { ... }. This is why you are not getting any errors, as the function is not executed. In addition, as the other answer already said, you are missing an await.
Solution
Your code should look like this:
(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage(); // missing await
  await page.goto("https://google.de");
  await browser.close();
})(); // Here, we actually call the function
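The difference between defining and calling the function is easy to check in isolation. This minimal sketch uses a made-up counter instead of a browser:

```javascript
let calls = 0;

// Defined but never invoked: this statement creates a function and discards it.
async () => {
  calls++;
};
console.log(calls); // 0

// Wrapped in parentheses and followed by (), the function actually runs.
(async () => {
  calls++;
})();
console.log(calls); // 1
```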
newPage() returns a promise, so you should await it:
const puppeteer = require ("puppeteer");
async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("https://google.de");
  await browser.close();
};
I am using Puppeteer's headless browser to make a WebRTC call. At the end of the call, I want to collect statistics such as bandwidth, jitter, ICE details, etc.
From what I have been able to gather from Google searches, the stats data can be retrieved using the getStats API.
But I could not find any example of how to call the getStats API from a Puppeteer script.
My code looks as follows:
const puppeteer = require('puppeteer');
const sleep = (waitTimeInMs) => new Promise(resolve => setTimeout(resolve, waitTimeInMs));
(async () => {
  const browser = await puppeteer.launch({headless: false});
  const page = await browser.newPage();
  await page.goto('https://janus.conf.meetecho.com/videocalltest.html');
  await page.waitForSelector('#start');
  await page.click('[id=start]');
  await page.waitForSelector('#username', { visible: true });
  await page.type('input[id="username"]', 'user1');
  await page.click('button[id=register]');
  await page.waitFor(5000);
  await page.type('input[id=peer]', 'user0');
  await page.click('button[id=call]');
  await sleep(16000);
  await page.click('button[id=start]');
  await sleep(3000);
  await browser.close();
})();
Just before browser.close(), I want to retrieve the stats data. Can you please help me understand how to use the getStats API in this context?
Is there a better way to get the stats data than the getStats API?
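You can use evaluate to get the WebRTC stats. Since getStats() resolves to a Map-like RTCStatsReport, convert it to a plain array inside the callback so it survives serialization back to Node:
const result = await page.evaluate(async () => {
  const stats = await videocall.webrtcStuff.pc.getStats();
  return [...stats.values()];
});
console.log(result);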
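One caveat worth checking: getStats() resolves to an RTCStatsReport, which is Map-like, and Map-like objects generally don't survive being returned from evaluate. The sketch below uses a plain Map with made-up values as a stand-in, and JSON.stringify as an approximation of the serialization step (an assumption; Puppeteer's actual mechanism differs in detail), to show why converting to an array of plain objects helps.

```javascript
// Stand-in for an RTCStatsReport, which is Map-like (assumed shape, made-up values).
const report = new Map([
  ["T01", {type: "transport", bytesSent: 1200}],
  ["IT01", {type: "inbound-rtp", jitter: 0.004}],
]);

// A Map has no enumerable own properties, so JSON-style serialization loses everything.
console.log(JSON.stringify(report)); // {}

// Converting to an array of plain objects first preserves the data.
const plain = [...report.values()];
console.log(JSON.stringify(plain));
// [{"type":"transport","bytesSent":1200},{"type":"inbound-rtp","jitter":0.004}]
```

Inside evaluate, the same conversion would be applied to the result of getStats() before returning it.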