puppeteer trouble. Or, at least, javascript trouble; you decide, please - javascript

Can someone please explain what might be going wrong here:
await page.click('some-selector-that-devtools-confirms-is-definitely-there')
let peeps = await page.evaluate(() =>
{
document.querySelector('some-selector-that-devtools-confirms-is-definitely-there')
}
);
console.log('classes: '+peeps.classList)
I've tried page.wait...., to no avail, same error
Error
TypeError: Cannot read property 'classList' of undefined
Incidentally, is there a best practice for finding out if an element has a certain css class?

You have two problems here.
The evaluate callback has a block body with no return statement, so the result of document.querySelector('some-selector-that-devtools-confirms-is-definitely-there') is discarded and peeps is undefined.
Even with a return, page.evaluate() can only pass serializable values back to Node, so it is not possible to return an element or a NodeList from the page environment with this method.
Example that returns the class list from page.evaluate():
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://google.com", { waitUntil: "networkidle2" });
const classList = await page.evaluate(() => {
return [...document.querySelector("div").classList];
});
console.log(classList);
await browser.close();
})();
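The first point is plain JavaScript behavior and can be reproduced in Node.js without a browser. The sketch below is illustrative only: `fakeElement`, `withoutReturn`, `withReturn`, and `conciseBody` are hypothetical names standing in for the element and for the callback passed to page.evaluate().

```javascript
// Hypothetical stand-in for the element that querySelector would find.
const fakeElement = { classList: ["btn", "active"] };

// Block body with no return statement: the expression is evaluated and discarded.
const withoutReturn = () => {
  fakeElement.classList;
};

// Block body with an explicit return.
const withReturn = () => {
  return [...fakeElement.classList];
};

// Concise body: the expression itself is the return value.
const conciseBody = () => [...fakeElement.classList];

console.log(withoutReturn()); // undefined
console.log(withReturn());
console.log(conciseBody());
```

Either of the last two forms would work as the evaluate callback; the first one is the shape used in the question.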

There are two main problems with your code:
Your evaluate method is not returning anything;
You need to access the classList inside the evaluate method
Here's an example:
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://stackoverflow.com/");
const classes = await page.evaluate(() => {
return [...document.querySelector("body").classList]; // spread into a plain array so it serializes cleanly
});
console.log(classes);
await browser.close();
})();
As an alternative approach, you could use getProperty("className"):
const puppeteer = require("puppeteer");
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto("https://stackoverflow.com/");
const el = await page.$("body");
const className = await el.getProperty("className");
const classes = (await className.jsonValue()).split(" "); // jsonValue() reads the value without touching private fields
console.log(classes);
await browser.close();
})();
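For the side question about finding out whether an element has a certain CSS class, one option is to do the check inside the page and return only a boolean, which is serializable. This is a minimal sketch, assuming a Puppeteer `page` is passed in; `hasClass` is a hypothetical helper name, and the selector and class name in the usage comment are placeholders.

```javascript
// Hypothetical helper: resolves to true if the first element matching
// `selector` has `className` in its classList.
// page.$eval runs the callback in the page and throws if nothing matches.
async function hasClass(page, selector, className) {
  return page.$eval(
    selector,
    (el, cls) => el.classList.contains(cls),
    className
  );
}

// Usage inside an async function, with placeholder values:
//   const active = await hasClass(page, "button.submit", "active");
//   console.log(active);
```

Passing the class name as an extra argument matters because the callback runs in the browser and cannot see variables from the Node scope.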

Related

Puppeteer getting Error: Evaluation failed: TypeError: Cannot read properties of null (reading 'innerText') when returning multiple values

I'm using Puppeteer to get some data from a site. I need to return multiple values from the page, but for some reason I can only return one at a time. Whenever I try to return multiple values (as in the code below), I get the following error, and I can't find out why: Error: Evaluation failed: TypeError: Cannot read properties of null (reading 'innerText')
Code
(async () => {
try {
const chromeBrowser = await puppeterr.launch({ headless: true });
const page = await chromeBrowser.newPage();
await page.goto("https://www.sec.gov/edgar/search/#/category=form-cat2", {timeout: 0});
const getInfo = await page.evaluate(() => {
const secTableEN = document.querySelector(".table td.entity-name");
const secTableFiled = document.querySelector(".table td.entity-filed");
const secTableLink = document.querySelector(".table td.filetype");
return {
secTableEN: secTableEN.innerText,
secTableFiled: secTableFiled.innerText,
};
})
console.log(getInfo);
await page.close();
await chromeBrowser.close();
} catch (e) {
console.error(e)
}
})();
Two problems:
The page loads the data dynamically, so you should waitForSelector before querying.
.entity-filed should be .filed.
const puppeteer = require("puppeteer"); // ^19.0.0
const url = "<your URL>";
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
const $ = (...args) => page.waitForSelector(...args);
const text = async (...args) =>
(await $(...args)).evaluate(el => el.textContent.trim());
await page.goto(url, {waitUntil: "domcontentloaded"});
const info = {
secTableEN: await text(".table td.entity-name"),
secTableFiled: await text(".table td.filed"),
secTableLink: await text(".table td.filetype"),
};
console.log(info);
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
As an aside, I wouldn't use {timeout: 0}. If the page doesn't load after a minute or so, something is wrong and you should probably report an error rather than hang the script forever.
Another approach is to avoid the DOM and simply intercept the API response with the payload you're interested in:
// ... same boilerplate as above ...
browser = await puppeteer.launch();
const [page] = await browser.pages();
const resP = page.waitForResponse(res =>
res.url() === "https://efts.sec.gov/LATEST/search-index"
);
await page.goto(url, {waitUntil: "domcontentloaded"});
const res = await resP;
const data = JSON.parse(await res.text());
const hit = data.hits.hits[0]._source;
const info = {
secTableEN: hit.display_names[0],
secTableFiled: hit.file_date,
secTableLink: hit.file_type // slightly different output than from the DOM
};
console.log(info);
// ...
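The interception approach indexes data.hits.hits[0] directly, which throws if the search returns no hits. The sketch below is one way to guard that step. `extractFirstHit` is a hypothetical helper, the field names are the ones used in the code above, and the sample payload is made up purely to exercise the function.

```javascript
// Hypothetical helper: pull the fields of interest out of the parsed
// search-index payload, or return null when there is no first hit.
function extractFirstHit(data) {
  const hit = data?.hits?.hits?.[0]?._source;
  if (!hit) return null;
  return {
    secTableEN: hit.display_names?.[0] ?? null,
    secTableFiled: hit.file_date ?? null,
    secTableLink: hit.file_type ?? null,
  };
}

// Made-up payload in the same shape, for illustration only.
const sample = {
  hits: {
    hits: [
      {
        _source: {
          display_names: ["Example Corp"],
          file_date: "2023-01-01",
          file_type: "8-K",
        },
      },
    ],
  },
};

console.log(extractFirstHit(sample));
console.log(extractFirstHit({ hits: { hits: [] } })); // null
```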

Having trouble scraping a particular element on a website using Puppeteer

I am trying to scrape the key features part of the website with the URL of: "https://www.alpinestars.com/products/stella-missile-v2-1-piece-suit-1" using puppeteer - however, whenever I try to use a selector that works on the chrome console for the website the output for my code is always an empty array or object. For example both document.querySelectorAll("#key\ features > p") and document.getElementById('key features') both return as empty arrays or objects when I output it through my code but work via chrome console.
I have attached my code below:
const puppeteer = require('puppeteer');
async function getDescripData(url) {
const browser = await puppeteer.launch({headless: true});
const page = await browser.newPage();
await page.goto(url);
const descripFeatures = await page.evaluate(() => {
const tds = Array.from(document.getElementById('key features'))
console.log(tds)
return tds.map(td => td.innerText)
});
console.log(descripFeatures)
await browser.close();
return {
features: descripFeatures
}
}
How should I go about overcoming this issue?
Thanks in advance!
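The problem is the Array.from call: document.getElementById returns a single element, not an iterable collection, so Array.from produces an empty array and there is nothing to map over.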
This works for me:
const puppeteer = require('puppeteer');
const url = 'https://www.alpinestars.com/products/stella-missile-v2-1-piece-suit-1';
(async () => {
const browser = await puppeteer.launch({
headless: false,
defaultViewport: null,
args: ['--start-maximized'],
devtools: true
});
const page = (await browser.pages())[0];
await page.goto(url);
const descripFeatures = await page.evaluate(() => {
const tds = document.getElementById('key features').innerText;
return tds.split('• ');
});
console.log(descripFeatures)
await browser.close();
})();
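The empty result from the question can be reproduced in plain Node.js. In this sketch, `singleElement` is a hypothetical plain object standing in for the one element that getElementById returns, and `collection` stands in for an array-like result such as the one from querySelectorAll.

```javascript
// Stand-in for a single DOM element: no length property, not iterable.
const singleElement = { innerText: "• Feature one • Feature two" };

// Stand-in for an array-like collection of elements.
const collection = {
  0: { innerText: "Feature one" },
  1: { innerText: "Feature two" },
  length: 2,
};

// Array.from on a non-iterable object without a length yields an empty array.
console.log(Array.from(singleElement)); // []

// Array.from on an array-like object copies its indexed entries.
console.log(Array.from(collection).map((el) => el.innerText)); // [ 'Feature one', 'Feature two' ]
```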

storing page to variable in puppeteer doesn't work

I'm trying to store the page result in a variable so I can use it to access another page, but I get the error "TypeError: Cannot read property 'waitForSelector' of undefined".
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.instagram.com/accounts/login/');
await page.waitForSelector('input[name="username"]');
await page.type('input[name="username"]', 'username');
await page.type('input[name="password"]', 'password');
const mainPage = await page.click('button[type="submit"]');
await mainPage.pdf({path: 'page.pdf', format: 'A4'});
mainPage.goto('https://www.instagram.com/direct/inbox/');
mainPage.waitForSelector('button[name="Send Message"]');
//some additional code
})();
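page.click() does not return a page; it resolves to undefined, which is why calls on mainPage fail. Keep using page, and wait for the navigation that the click triggers:
await Promise.all([
  page.waitForNavigation(),
  page.click('button[type="submit"]'),
]);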
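Applied to the question's code, the same `page` object is reused once the login navigation has finished. This is a sketch under assumptions: `loginAndOpenInbox` is a hypothetical function name, the selectors and URLs are copied from the question, and the credentials are placeholders.

```javascript
// Hypothetical helper: logs in, then keeps using the same `page`.
async function loginAndOpenInbox(page) {
  await page.goto("https://www.instagram.com/accounts/login/");
  await page.waitForSelector('input[name="username"]');
  await page.type('input[name="username"]', "username"); // placeholder
  await page.type('input[name="password"]', "password"); // placeholder

  // Start waiting for the navigation before clicking, so it is not missed.
  await Promise.all([
    page.waitForNavigation(),
    page.click('button[type="submit"]'),
  ]);

  // `page` now refers to the post-login document.
  await page.pdf({ path: "page.pdf", format: "A4" });
  await page.goto("https://www.instagram.com/direct/inbox/");
  await page.waitForSelector('button[name="Send Message"]');
}

// Usage inside an async function:
//   const page = await browser.newPage();
//   await loginAndOpenInbox(page);
```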

Is puppeteer supplying real time data

I'm trying to scrape live scores every time the score changes. Can Puppeteer do this? If it can, what should I add to this code so that it returns live data?
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('site to go');
await page.waitForSelector('input[name="username"]');
await page.type('input[name="username"]', 'username');
await page.type('input[name="password"]', 'password');
await page.click('button[type="submit"]');
let score = await page.evaluate(() => document.getElementById("scores").innerHTML);
})();
You could use exposeFunction to register a callback function:
await page.exposeFunction('newScore', s => console.log(s));
Then you can call that function on the DOMSubtreeModified event:
await page.evaluate(() => {
  const element = document.getElementById('scores'); // the original referenced an undefined `element`
  element.addEventListener('DOMSubtreeModified', () => newScore(element.innerHTML));
});
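DOMSubtreeModified is a deprecated mutation event, so a MutationObserver may be a more durable choice. The sketch below is a variant of the same idea under stated assumptions: `watchScores` is a hypothetical helper, `newScore` is the exposed function name from the answer, and `scores` is the element id from the question.

```javascript
// Hypothetical helper: calls `onScore` in Node whenever #scores changes in the page.
async function watchScores(page, onScore) {
  // Make a Node-side callback callable from the page as window.newScore.
  await page.exposeFunction("newScore", onScore);

  // Runs in the browser: observe the scores element and report its markup.
  await page.evaluate(() => {
    const element = document.getElementById("scores");
    const observer = new MutationObserver(() => {
      window.newScore(element.innerHTML);
    });
    observer.observe(element, {
      childList: true,
      subtree: true,
      characterData: true,
    });
  });
}

// Usage inside an async function, after logging in:
//   await watchScores(page, (html) => console.log(html));
```

The script has to keep the browser open for as long as updates are wanted, since closing it stops the observer.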

Puppeteer unable to use get property

Cannot read property 'getProperty' of undefined is the error that I get.
const puppeteer = require('puppeteer');
async function scrapeUdemy(url) {
try {
const browser = await puppeteer.launch({headless: false, slowmo: 250});
const page = await browser.newPage()
await page.goto(url)
const [el] = await page.$x('//*[@id="udemy"]/div[1]/div[4]/div/div/div[2]/div/div/div[1]/a/div[1]/div[1]');
const txt = await el.getProperty('textContent');
const rawTxt = await src.jsonValue();
console.log({srcTxt});
browser.close();
}
catch(err) {
console.log(err.message);
}
}
scrapeUdemy('https://www.udemy.com/user/eren-cem-salta/')
I tried using other versions but does not work. It is not working with the catch block too.
The element that you want to get is loaded with AJAX after the page started and you have to wait until it appears in the DOM:
await page.waitForSelector('[data-purpose="course-card-container"] div.udlite-heading-sm');
And why not use the same selector to get all of the cards:
const titles = await page.evaluate(() => {
const nodes = document.querySelectorAll(
'[data-purpose="course-card-container"] div.udlite-heading-sm'
);
return [...nodes].map((node) => node.textContent);
})
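The error message itself follows from how the question destructures the result. page.$x resolves to an array of element handles, and that array is empty when nothing matches yet. This plain JavaScript sketch uses a hypothetical empty `matches` array to stand in for that result.

```javascript
// Stand-in for the result of page.$x when no node matches the XPath.
const matches = [];

// Destructuring the first item of an empty array yields undefined.
const [el] = matches;
console.log(el); // undefined

// Guarding before use avoids "Cannot read property 'getProperty' of undefined".
const message = el ? "element found" : "no element matched; wait for it first";
console.log(message);
```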
