Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 11 hours ago.
Improve this question
I'm trying to scrape YouTube Shorts from a specific YouTube Channel, using Puppeteer running on MeteorJs Galaxy.
Here's the code that I've done so far:
import puppeteer from 'puppeteer';
import { YouTubeShorts } from '../imports/api/youTubeShorts'; //meteor mongo local instance
let URL = 'https://www.youtube.com/#ummahtoday1513/shorts'
const processShortsData = (iteratedData) => {
let documentExist = YouTubeShorts.findOne({ videoId:iteratedData.videoId })
if(documentExist === undefined) { //undefined meaning this incoming shorts in a new one
YouTubeShorts.insert({
videoId: iteratedData.videoId,
title: iteratedData.title,
thumbnail: iteratedData.thumbnail,
height: iteratedData.height,
width: iteratedData.width
})
}
}
const fetchShorts = () => {
puppeteer.launch({
headless:true,
args:[
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--single-process'
]
})
.then( async function(browser){
async function fetchingData(){
new Promise(async function(resolve, reject){
const page = await browser.newPage();
await Promise.all([
await page.setDefaultNavigationTimeout(0),
await page.waitForNavigation({waitUntil: "domcontentloaded"}),
await page.goto(URL, {waitUntil:["domcontentloaded", "networkidle2"]}),
await page.waitForSelector('ytd-rich-grid-slim-media', { visible:true }),
new Promise(async function(resolve,reject){
page.evaluate(()=>{
const trialData = document.getElementsByTagName('ytd-rich-grid-slim-media');
const titles = Array.from(trialData).map(i => {
const singleData = {
videoId: i.data.videoId,
title: i.data.headline.simpleText,
thumbnail: i.data.thumbnail.thumbnails[0].url,
height: i.data.thumbnail.thumbnails[0].height,
width: i.data.thumbnail.thumbnails[0].width,
}
return singleData
})
resolve(titles);
})
}),
])
await page.close()
})
await browser.close()
}
async function fetchAndProcessData(){
const datum = await fetchingData()
console.log('DATUM:', datum)
}
await fetchAndProcessData()
})
}
fetchShorts();
I am struggling with two things here:
Async, await, and promises, and
Finding reason behind why Puppeteer output the ProtocolError: Protocol error (Target.createTarget): Target closed. error in the console.
I'm new to puppeteer and trying to learn from various examples on StackOverflow and Google in general, but I'm still having trouble getting it right.
A general word of advice: code slowly and test frequently, especially when you're in an unfamiliar domain. Try to minimize problems so you can understand what's failing. There are many issues here, giving the impression that the code was written in one fell swoop without incremental validation. There's no obvious entry point to debugging this.
Let's examine some failing patterns.
First, basically never use new Promise() when you're working with a promise-based API like Puppeteer. This is discussed in the canonical What is the explicit promise construction antipattern and how do I avoid it? so I'll avoid repeating the answers there.
Second, don't mix async/await and then. The point of promises is to flatten code and avoid pyramids of doom. If you find you have 5-6 deeply nested functions, you're misusing promises. In Puppeteer, there's basically no need for then.
Third, setting timeouts to infinity with page.setDefaultNavigationTimeout(0) suppresses errors. It's fine if you want a long delay, but if a navigation is taking more than a few minutes, something is wrong and you want an error so you can understand and debug it rather than having the script wait silently until you kill it, with no clear diagnostics as to what went wrong or where it failed.
Fourth, watch out for pointless calls to waitForNavigation. Code like this doesn't make much sense:
await page.waitForNavigation(...);
await page.goto(...);
What navigation are you waiting for? This seems ripe for triggering timeouts, or worse yet, infinite hangs after you've set navs to never timeout.
Fifth, avoid premature abstractions. You have various helper functions but you haven't established functionally correct code, so these just add to the confused state of affairs. Start with correctness, then add abstractions once the cut points become obvious.
Sixth, avoid Promise.all() when all of the contents of the array are sequentially awaited. In other words:
await Promise.all([
await foo(),
await bar(),
await baz(),
await quux(),
garply(),
]);
is identical to:
await foo();
await bar();
await baz();
await quux();
await garply();
Seventh, always return promises if you have them:
const fetchShorts = () => {
puppeteer.launch({
// ..
should be:
const fetchShorts = () => {
return puppeteer.launch({
// ..
This way, the caller can await the function's completion. Without it, it gets launched into the void and can never be connected with the caller's flow.
Eighth, evaluate doesn't have access to variables in Node, so this pattern doesn't work:
new Promise(resolve => {
page.evaluate(() => resolve());
});
Instead, avoid the new promise antipattern and use the promise that Puppeteer already returns to you:
await page.evaluate(() => {});
Better yet, use $$eval here since it's an abstraction of the common pattern of selecting elements first thing in evaluate.
Putting all of this together, here's a rewrite:
const puppeteer = require("puppeteer"); // ^19.6.3
const url = "<Your URL>";
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.waitForSelector("ytd-rich-grid-slim-media");
const result = await page.$$eval("ytd-rich-grid-slim-media", els =>
els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
videoId,
title: headline.simpleText,
thumbnail: thumbnails[0].url,
height: thumbnails[0].height,
width: thumbnails[0].width,
}))
);
console.log(result);
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
Note that I ensure browser cleanup with finally so the process doesn't hang in case the code throws.
Now, all we want is a bit of text, so there's no sense in loading much of the extra stuff YouTube downloads. You can speed up the script by blocking anything unnecessary to your goal:
const [page] = await browser.pages();
await page.setRequestInterception(true);
page.on("request", req => {
if (
req.url().startsWith("https://www.youtube.com") &&
["document", "script"].includes(req.resourceType())
) {
req.continue();
}
else {
req.abort();
}
});
// ...
Note that ["domcontentloaded", "networkidle2"] is basically the same as "networkidle2" since "domcontentloaded" will happen long before "networkidle2". But please avoid "networkidle2" here since all you need is some text, which doesn't depend on all network resources.
Once you've established correctness, if you're ready to factor this to a function, you can do so:
const fetchShorts = async () => {
const url = "<Your URL>";
let browser;
try {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.waitForSelector("ytd-rich-grid-slim-media");
return await page.$$eval("ytd-rich-grid-slim-media", els =>
els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
videoId,
title: headline.simpleText,
thumbnail: thumbnails[0].url,
height: thumbnails[0].height,
width: thumbnails[0].width,
}))
);
}
finally {
await browser?.close();
}
};
fetchShorts()
.then(shorts => console.log(shorts))
.catch(err => console.error(err));
But keep in mind, making the function responsible for managing the browser resource hampers its reusability and slows it down considerably. I usually let the caller handle the browser and make all of my scraping helpers accept a page argument:
const fetchShorts = async page => {
const url = "<Your URL>";
await page.goto(url, {waitUntil: "domcontentloaded"});
await page.waitForSelector("ytd-rich-grid-slim-media");
return await page.$$eval("ytd-rich-grid-slim-media", els =>
els.map(({data: {videoId, headline, thumbnail: {thumbnails}}}) => ({
videoId,
title: headline.simpleText,
thumbnail: thumbnails[0].url,
height: thumbnails[0].height,
width: thumbnails[0].width,
}))
);
};
(async () => {
let browser;
try {
browser = await puppeteer.launch();
const [page] = await browser.pages();
console.log(await fetchShorts(page));
}
catch (err) {
console.error(err);
}
finally {
await browser?.close();
}
})();
I am trying to type into YouTube's search input using Puppeteer.
Code as follows:
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://youtube.com');
await page.type('#search','a');
...
Here is the error I get:
throw new Error('Evaluation failed: ' + (0, util_js_1.getExceptionMessage)(exceptionDetails));
^
Error: Evaluation failed: Error: Cannot focus non-HTMLElement
at pptr://__puppeteer_evaluation_script__:3:23
at ExecutionContext._ExecutionContext_evaluate (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:286:15)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async ExecutionContext.evaluate (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:117:16)
at async ElementHandle.evaluate (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/JSHandle.js:105:16)
at async ElementHandle.focus (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ElementHandle.js:486:9)
at async ElementHandle.type (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/ElementHandle.js:516:9)
at async DOMWorld.type (/Users/benjaminrubin/node_modules/puppeteer/lib/cjs/puppeteer/common/DOMWorld.js:449:9)
at async /Users/benjaminrubin/Documents/Software Dev Education/Scraping with Node JS/youtubeScrape.js:60:9
I could not figure out what exactly is wrong. Several examples across the web use the exact same format. What exactly does 'Cannot focus non-HTMLElement' mean?
This is a tricky one. Google sites are notorious for breaching the "one id on a page" rule, so there's actually two elements with the id search:
<ytd-searchbox id="search"> <!-- the one you are actually selecting -->
... bunch of nodes ...
<input id="search"> <!-- the one you think you're selecting -->
await page.type('#search','a'); types into ytd-searchbox, which isn't a standard HTML element, so Puppeteer fails with the Error: Cannot focus non-HTMLElement error.
The fix is to use input#search instead:
const puppeteer = require("puppeteer"); // ^19.1.0
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.goto("https://youtube.com", {waitUntil: "domcontentloaded"});
await page.type("input#search", "hello world");
await page.screenshot({path: "youtube.png"});
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
Although the above solution may work, this is a good example of where simply encoding your search as a URL parameter and navigating directly to the results page is easier and more efficient:
const puppeteer = require("puppeteer");
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
const q = encodeURIComponent("your search here");
const url = `https://www.youtube.com/results?search_query=${q}`;
await page.goto(url, {waitUntil: "networkidle2"});
await page.screenshot({path: "youtube.png"});
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
I have a simple piece of code
describe('My First Puppeeteer Test', () => {
it('Should launch the browser', async function() {
const browser = await puppeteer.launch({ headless: false})
const page = await browser.newPage()
await page.goto('https://github.com/login')
await page.type('#login_field', testLogin)
await page.type('#password', testPassword)
await page.click('[name="commit"]')
await page.waitForNavigation()
let [element] = await page.$x('//h3[#class="text-normal"]')
let helloText = await page.evaluate(element => element.textContent, element);
console.log(helloText);
browser.close();
})
})
Everything worked before but today I get an error + my stacktrace:
Error: Evaluation failed: TypeError: Cannot read properties of undefined (reading 'textContent')
at puppeteer_evaluation_script:1:21
at ExecutionContext._evaluateInternal (node_modules\puppeteer\lib\cjs\puppeteer\common\ExecutionContext.js:221:19)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at async ExecutionContext.evaluate (node_modules\puppeteer\lib\cjs\puppeteer\common\ExecutionContext.js:110:16)
at async Context. (tests\example.tests.js:16:22)
How I can resolve this?
Kind regards
While I haven't tested the code due to the login and I assume your selectors are correct, the main problem is almost certainly that
await page.click('[name="commit"]')
await page.waitForNavigation()
creates a race condition. The docs clarify:
Bear in mind that if click() triggers a navigation event and there's a separate page.waitForNavigation() promise to be resolved, you may end up with a race condition that yields unexpected results. The correct pattern for click and wait for navigation is the following:
const [response] = await Promise.all([
page.waitForNavigation(waitOptions),
page.click(selector, clickOptions),
]);
As a side point, it's probably better to do waitForXPath rather than $x, although this seems less likely the root problem. Don't forget to await all promises such as browser.close().
const puppeteer = require("puppeteer");
let browser;
(async () => {
browser = await puppeteer.launch({headless: true});
const [page] = await browser.pages();
await page.goto('https://github.com/login');
await page.type('#login_field', testLogin);
await page.type('#password', testPassword);
// vvvvvvvvvvv
await Promise.all([
page.click('[name="commit"]'),
page.waitForNavigation(),
]);
const el = await page.waitForXPath('//h3[#class="text-normal"]');
// ^^^^^^^^^^^^
//const el = await page.waitForSelector("h3.text-normal"); // ..or
const text = await el.evaluate(el => el.textContent);
console.log(text);
//await browser.close();
//^^^^^ missing await, or use finally as below
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
Additionally, if you're using Jest, once you get things working, you might want to move the browser and page management to beforeEach/afterEach or beforeAll/afterAll blocks. It's faster to use the same browser instance for all test cases, and pages can be opened and closed before/after each case.
I'm playing around with puppeteer to learn a bit about automation in the browser. I wanted to open the chromium browser visable so not in headless. I set the launch option to false, but it's still not opening Chromium.
I tried to use no sandbox args, i did even deflag the --disable-extensions in the args, but nothing helped..
There are no errors in the terminal, it just doesn't launch.
Here is my code:
const puppeteer = require ("puppeteer");
async () => {
const browser = await puppeteer.launch({ headless: false });
const page = browser.newPage();
await page.goto("https://google.de");
await browser.close();
};
Any idea why chromium is not opening? Also there are no logs about errors...
Problem
You are not calling the function, you are just defining it via async () => { ... }. This is why you are not getting any errors, as the function is not executed. In addition, as the other answer already said, you are missing an await.
Solution
Your code should look like this:
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage(); // missing await
await page.goto("https://google.de");
await browser.close();
})(); // Here, we actually call the function
newPage() returns a promise so you should await it
const puppeteer = require ("puppeteer");
async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto("https://google.de");
await browser.close();
};
New to JavaScript and trying to understand how to run the following simple test, which loads the google home page, and gets the title. This title is then tested.
const puppeteer = require("puppeteer");
var page_title = "blank";
assert = require("assert");
async function run() {
const browser = await puppeteer.launch({ headless: true });
const page = await browser.newPage();
await page.goto("http://www.google.co.uk");
page_title = await page.title();
console.log("Page Title: ", page_title)
await browser.close();
}
run();
describe("Google", function() {
it("Title contains Google", async function() {
assert.equal(page_title, "Google");
});
});
The issue is the describe/it block runs before the page_title is obtained. Please could someone advise how I should actually be structuring this?
You just need read the mocha documentation. No need to digging deeper, async code located on the TOC.
mocha offer 3 ways:
callback
Simply invoke the callback when your test is complete. By adding a callback (usually named done) to it().
promise
async and await
So it revised like this with async and await :
const puppeteer = require("puppeteer");
var page_title = "blank";
assert = require("assert");
describe("Google", function() {
// this.timeout(0);
it("Title contains Google", async ()=> {
const browser = await puppeteer.launch(); //headless by default
const page = await browser.newPage();
await page.goto("http://www.google.co.uk");
page_title = await page.title();
console.log("Page Title: ", page_title);
assert.equal(page_title, "Google");
await browser.close()
});
});
My advice is quick reading on every explanation on TOC, and read brief explanation async and await