I'm using the following in Puppeteer to try to return the number in the inner text of the element below. I've tried many different ways but keep getting an empty object returned. What am I doing wrong?
if (await page.$('.s-pagination-item.s-pagination-next.s-pagination-button.s-pagination-separator') !== null) {
  var lastPageNumber = await page.evaluate(() => document.querySelector('s-pagination-item.s-pagination-disabled'), a => a.innerText);
} else {
  var lastPageNumber = 1;
}
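Your callback returns the DOM element itself, which can't be meaningfully serialized back to Node, which is why you end up with an empty object. Extra arguments to evaluate are passed into the callback as parameters; they are not applied to its return value. To get the innerText value, read it inside the callback and return that string. Also, s-pagination-item is a class, so the selector needs a leading dot; without it, querySelector looks for an <s-pagination-item> element:
var lastPageNumber = await page.evaluate(
  () => document.querySelector('.s-pagination-item.s-pagination-disabled').innerText
);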
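As a rough, Node-only analogy (not Puppeteer's actual serialization code): DOM elements expose their data through getters on the prototype rather than as own enumerable properties, which is why JSON.stringify(document.body) typically gives "{}" in a browser. The hypothetical FakeElement class below mimics that shape, so serializing it produces an empty object while its innerText string serializes fine.

```javascript
// Hypothetical stand-in for a DOM element: its data lives in a private
// field and is exposed only through a prototype getter, so the instance
// has no own enumerable properties for JSON.stringify to pick up.
class FakeElement {
  #text;

  constructor(text) {
    this.#text = text;
  }

  get innerText() {
    return this.#text;
  }
}

const el = new FakeElement("42");
console.log(JSON.stringify(el));           // {}
console.log(JSON.stringify(el.innerText)); // "42"
```

Returning the string (el.innerText) rather than the element is what lets the value survive the trip back to Node.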
So I'm having an issue trying to scrape a web table. I'm able to extract table nodes one at a time using 'firstChild' and 'lastElementChild'. My problem is that I want to extract all the child nodes (rows/cells) into a map or array so I can iterate over them and extract the data in a loop.
NOTE: I'm using Puppeteer, so this runs inside an async function.
Here is a code snippet:
const [table] = await page.$x(xpath);
const tbody = await table.getProperty('lastElementChild'); //<-- in this case tbody is lastchild
const rows = Array.from(await tbody.getProperties('childNodes')); // <-- LINE OF THE PROBLEM
const cell = await rows.getProperty('firstChild') // <-- using firstChild for testing (ideally 'childNodes' with forEach())
const data = await cell.getProperty('innerText');
const txt = await data.jsonValue();
console.log(txt);
I found another way. Here is the solution:
const row = await page.evaluate(() => {
  const row = document.querySelector('.fluid-table__row'); // <-- this refers to an HTML class
  // children holds only element nodes; childNodes would also include
  // the whitespace text nodes between cells
  return Array.from(row.children, cell => cell.textContent);
});
console.log(row);
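If the target is a real <table> element (an assumption; the question only mentions tbody), a minimal sketch can use the standard table.rows and row.cells collections to grab every row at once instead of just the first. The function name tableToRows is hypothetical. Because it only touches its argument, it could be passed straight to Puppeteer, e.g. await page.$eval('table', tableToRows) with a placeholder selector; here it is exercised with plain objects so it runs without a browser.

```javascript
// Collects the trimmed text of every cell, row by row.
// table.rows holds only <tr> elements and row.cells only <td>/<th>
// elements, so whitespace text nodes never show up (unlike childNodes).
function tableToRows(table) {
  return Array.from(table.rows, row =>
    Array.from(row.cells, cell => cell.textContent.trim())
  );
}

// Plain objects shaped like a tiny two-row table, for running in Node.
const mockTable = {
  rows: [
    { cells: [{ textContent: " Name " }, { textContent: "Score" }] },
    { cells: [{ textContent: "Alice" }, { textContent: " 3 " }] },
  ],
};
console.log(tableToRows(mockTable)); // [ [ 'Name', 'Score' ], [ 'Alice', '3' ] ]
```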
I am trying to get the element for day 18 and check whether it has the disabled class.
<div class="react-datepicker__day react-datepicker__day--tue" aria-label="day-16" role="option">16</div>
<div class="react-datepicker__day react-datepicker__day--wed react-datepicker__day--today" aria-label="day-17" role="option">17</div>
<div class="react-datepicker__day react-datepicker__day--thu react-datepicker__day--disabled" aria-label="day-18" role="option">18</div>
This is my code; assume
this.xpath = 'xpath=.//*[contains(@class, "react-datepicker__day") and not (contains(@class, "outside-month")) and ./text()="18"]'
async isDateAvailable () {
  const dayElt = await this.page.$(this.xpath)
  console.log(dayElt.classList.contains('disabled')) // this should return true
}
I can't seem to make it work. The error says TypeError: Cannot read property 'contains' of undefined. Can you help me figure out what I'm doing wrong?
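Looks like you can just write
await expect(page.locator('.selector-name')).toHaveClass(/target-class/)
The slashes matter: /target-class/ is a regular expression, which passes if target-class appears anywhere in the class attribute. A plain string would have to match the entire class attribute exactly.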
To check several classes in one call, I use this helper (the built-in assertion didn't work for me: https://playwright.dev/docs/test-assertions#locator-assertions-to-have-class):
import { expect } from '@playwright/test'
import type { Locator } from '@playwright/test'
async function expectHaveClasses(locator: Locator, className: string) {
  // get the current classes of the element
  const attrClass = await locator.getAttribute('class')
  // split on any run of whitespace so repeated spaces don't create empty entries
  const elementClasses: string[] = attrClass ? attrClass.split(/\s+/).filter(Boolean) : []
  const targetClasses: string[] = className.split(/\s+/).filter(Boolean)
  // every target class should be present in the current class list
  const isValid = targetClasses.every(classItem => elementClasses.includes(classItem))
  expect(isValid).toBeTruthy()
}
In className you can pass several space-separated classes:
await expectHaveClasses(page.locator('.item'), 'class-a class-b')
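The core check in that helper doesn't need Playwright at all. Here is a minimal sketch of it as a plain function (the name hasAllClasses is hypothetical) that splits on any whitespace. It also shows why the datepicker question's contains('disabled') check can't succeed: the day-18 element's class is react-datepicker__day--disabled, and class matching works on whole tokens.

```javascript
// Returns true if every space-separated class in `wanted` is present in
// `classAttr`, the raw value of an element's class attribute (may be null).
function hasAllClasses(classAttr, wanted) {
  const present = new Set((classAttr ?? "").split(/\s+/).filter(Boolean));
  return wanted.split(/\s+/).filter(Boolean).every(c => present.has(c));
}

const day18 =
  "react-datepicker__day react-datepicker__day--thu react-datepicker__day--disabled";
console.log(hasAllClasses(day18, "react-datepicker__day--disabled")); // true
console.log(hasAllClasses(day18, "disabled"));                        // false
console.log(hasAllClasses(null, "class-a"));                          // false
```

With Playwright you would feed it the result of await locator.getAttribute('class').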
You have to evaluate it inside the browser. $ returns an ElementHandle, which is a Node-side wrapper around the browser's DOM element; it has no classList property, which is why you get undefined. You have to call e.g. evaluate on it, or simply use $eval, which looks up the element and passes it into a callback that gets executed inside the browser's JavaScript engine. Also note that the disabled day's class is react-datepicker__day--disabled, so classList.contains('disabled') would return false even inside the browser; check for the full class name. Something like this would work:
// @ts-check
const playwright = require("playwright");
(async () => {
  const browser = await playwright.chromium.launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.setContent(`
    <div id="a1" class="foo"></div>
  `)
  // the callback runs in the browser, where el is a real DOM element
  console.log(
    await page.$eval("#a1", el => el.classList.contains("foo"))
  ) // true
  await browser.close();
})();
I am trying to scrape a single-page website. There are multiple selection combinations that result in different search redirects. I wrote a for loop in the page.evaluate callback to click through the different selections and click the search button for each one. However, I got the error: Converting circular structure to JSON Are you passing a nested JSHandle?
Please help!
My current version of code looks like this:
const res = await page.evaluate(async (i, courseCountArr, page) => {
  for (let j = 1; j < courseCountArr[i]; j++) {
    await document.querySelectorAll('.btn-group > button, .bootstrap-select > button')['1'].click() // click on school drop down
    await document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a')[`${j}`].click() // click on each school option
    await document.querySelectorAll('.btn-group > button, .bootstrap-select > button')['2'].click() // click on subject drop down
    const subjectLen = document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a').length // length of the subject drop down
    for (let k = 1; k < subjectLen; k++) {
      await document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a')[`${k}`].click() // click on each subject option
      document.getElementById('buttonSearch').click() //click on search button
      page.waitForSelector('.strong, .section-body')
      return document.querySelectorAll('.strong, .section-body').length
    }
  }
}, i, courseCountArr, page);
Why the error happens
You haven't shown enough code to reproduce the problem (is courseCountArr an array of ElementHandles? Passing page to evaluate won't work either, since page is a Node object), but here's a minimal reproduction that shows the likely pattern:
const puppeteer = require("puppeteer");
let browser;
(async () => {
  const html = `<ul><li>a</li><li>b</li><li>c</li></ul>`;
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);
  // ...
  const nestedHandle = await page.$$("li"); // $$ selects all matches
  await page.evaluate(els => {}, nestedHandle); // throws
  // ...
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close())
;
The output is
TypeError: Converting circular structure to JSON
--> starting at object with constructor 'BrowserContext'
| property '_browser' -> object with constructor 'Browser'
--- property '_defaultContext' closes the circle Are you passing a nested JSHandle?
at JSON.stringify (<anonymous>)
Why is this happening? All code inside the callback to page.evaluate (and its family: evaluateHandle, $eval, $$eval) is executed programmatically by Puppeteer inside the browser, much as if you had typed it into the browser console. The browser is a separate environment from Node, which is where Puppeteer and the ElementHandles live. To bridge the inter-process gap, the callback, its parameters and its return value are serialized and deserialized.
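The stack trace above shows the arguments passing through JSON.stringify. As a rough Node-only analogy (Puppeteer's real transport goes over the DevTools protocol, so this is a simplification), a JSON round trip shows which kinds of values survive: plain data does, functions silently vanish, and an object graph with a cycle, like the Browser and BrowserContext objects in the error message, throws the same TypeError. The browserLike and contextLike objects are hypothetical stand-ins.

```javascript
// Serialize and deserialize, as a stand-in for crossing the Node/browser gap.
function roundTrip(value) {
  return JSON.parse(JSON.stringify(value));
}

// Plain data survives unchanged.
console.log(roundTrip({ i: 2, counts: [3, 5] })); // { i: 2, counts: [ 3, 5 ] }

// Functions are dropped without an error.
console.log(roundTrip({ fn: () => 1 })); // {}

// Two objects pointing at each other, loosely mimicking Browser <-> BrowserContext.
const browserLike = {};
const contextLike = { _browser: browserLike };
browserLike._defaultContext = contextLike;

try {
  roundTrip(browserLike);
} catch (err) {
  console.log(err.name);                   // TypeError
  console.log(err.message.split("\n")[0]); // Converting circular structure to JSON
}
```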
The consequence is that you can't access any Node state from inside the browser, as you're attempting with page.waitForSelector('.strong, .section-body'). page lives in a totally different process from the browser. (As an aside, document.querySelectorAll and .click() are purely synchronous, so there's no point in awaiting them.)
Puppeteer ElementHandles are complex structures used to hook into the page's DOM; they can't be serialized and passed to the page as you're trying to do. Puppeteer has to perform the translation under the hood: any ElementHandle passed to evaluate (or that has .evaluate() called on it) is followed to the DOM node it represents in the browser, and that DOM node is what your callback is invoked with. As of the time of writing, Puppeteer can't do this with nested ElementHandles, such as an array of handles.
Possible fixes
In the above code, if you change .$$ to .$, you'll retrieve only the first <li>. This singular, non-nested ElementHandle can be converted to an element:
// ...
const handle = await page.$("li");
const val = await page.evaluate(el => el.innerText, handle);
console.log(val); // => a
// ...
Or:
const handle = await page.$("li");
const val = await handle.evaluate(el => el.innerText);
console.log(val); // => a
Making this work on your example is a matter of one of the following, depending on your use case and goals with the code: swapping the loop and the evaluate call so that you access courseCountArr[i] in Puppeteer land; unpacking the nested ElementHandles into separate parameters to evaluate; or moving most of your in-browser click calls back to Puppeteer.
You could apply the evaluate call to each ElementHandle:
const nestedHandles = await page.$$("li");
for (const handle of nestedHandles) {
const val = await handle.evaluate(el => el.innerText);
console.log(val); // a b c
}
To get an array of results, you could do:
const nestedHandles = await page.$$("li");
const vals = await Promise.all(
nestedHandles.map(handle => handle.evaluate(el => el.innerText))
);
console.log(vals); // [ 'a', 'b', 'c' ]
You can also unpack the ElementHandles into arguments for evaluate and use the (...els) parameter list in the callback:
const nestedHandles = await page.$$("li");
const vals = await page.evaluate((...els) =>
els.map(e => e.innerText),
...nestedHandles
);
console.log(vals); // => [ 'a', 'b', 'c' ]
If you have other arguments in addition to the handles, you can do:
const nestedHandle = await page.$$("li");
const vals = await page.evaluate((foo, bar, ...els) =>
els.map(e => e.innerText + foo + bar)
, 1, 2, ...nestedHandle);
console.log(vals); // => [ 'a12', 'b12', 'c12' ]
or:
const nestedHandle = await page.$$("li");
const vals = await page.evaluate(({foo, bar}, ...els) =>
els.map(e => e.innerText + foo + bar)
, {foo: 1, bar: 2}, ...nestedHandle);
console.log(vals); // => [ 'a12', 'b12', 'c12' ]
Another option is $$eval, which selects all matching elements and then runs a callback in browser context with the array of those elements as its parameter:
const vals = await page.$$eval("li", els =>
els.map(e => e.innerText)
);
console.log(vals); // => [ 'a', 'b', 'c' ]
This is probably cleanest if you're not doing anything else with the handles in Node.
Similarly, you can totally bypass Puppeteer and do the entire selection and manipulation in browser context:
const vals = await page.evaluate(() =>
[...document.querySelectorAll("li")].map(e => e.innerText)
);
console.log(vals); // => [ 'a', 'b', 'c' ]
(Note that getting the inner text throughout is just a placeholder for whatever browser code of arbitrary complexity you might have.)
I wrote a little utility to solve this problem:
const jsHandleToJSON = (jsHandle) => {
  if (jsHandle.length > 0) {
    let json = []
    for (let i = 0; i < jsHandle.length; i++) {
      json.push(jsHandleToJSON(jsHandle[i]))
    }
    return json
  } else {
    let json = {}
    const keys = Object.keys(jsHandle)
    for (let i = 0; i < keys.length; i++) {
      if (typeof jsHandle[keys[i]] !== 'object') {
        json[keys[i]] = jsHandle[keys[i]]
      } else if (['elements', 'element'].includes(keys[i])) {
        json[keys[i]] = jsHandleToJSON(jsHandle[keys[i]])
      } else {
        console.log(`skipping field ${keys[i]}`)
      }
    }
    return json
  }
}
It creates a new object with all the primitive fields of the jsHandle (recursing into arrays), also recurses into the extra jsHandle properties 'elements' and 'element', and skips all other object-valued fields.
You could add more properties to that list if you need them, but adding all of them will result in an infinite loop, because the handles' internal objects reference each other in cycles.
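If you want to follow every field without hitting that infinite loop, one alternative (a generic sketch, not part of the utility above; toPlainData is a hypothetical name) is to track the objects on the current recursion path in a WeakSet and substitute a marker when a cycle comes back around. On real Puppeteer handles this can still produce very large output, and the internal field names it exposes are implementation details that may change between versions.

```javascript
// Copies an object graph into plain data: primitives are kept, functions
// are skipped, and an object already on the current recursion path is
// replaced with "[Circular]" instead of being followed again.
function toPlainData(value, seen = new WeakSet()) {
  if (value === null || typeof value !== "object") return value;
  if (seen.has(value)) return "[Circular]";
  seen.add(value);
  const out = Array.isArray(value) ? [] : {};
  for (const [key, v] of Object.entries(value)) {
    if (typeof v === "function") continue;
    out[key] = toPlainData(v, seen);
  }
  // Remove on the way out so an object shared by two branches (not a
  // cycle) is still copied in both places.
  seen.delete(value);
  return out;
}

const node = { name: "li", index: 0 };
node.self = node;
console.log(JSON.stringify(toPlainData(node)));
// {"name":"li","index":0,"self":"[Circular]"}
```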
To see the browser's console output (for example, console.log calls inside evaluate) in your Node terminal, add the following line before the evaluate call:
page.on('console', message => console.log(`${message.type()}: ${message.text()}`))
So I'm trying to crawl a site using Puppeteer. The data I'm looking for is spread across multiple tables, but I only need the data from one of them. I was able to select that table using a fairly verbose selector, .querySelectorAll('table.myclass ~ table.myclass'). My issue now is that my code grabs the first item of each table (starting from the correct table, which is the 2nd one), but I can't find a way to grab all the data from only the 2nd table.
const puppeteer = require('puppeteer');
const myUrl = "https://coolurl.com";
(async () => {
  const browser = await puppeteer.launch({
    headless: true
  });
  const page = (await browser.pages())[0];
  await page.setViewport({
    width: 1920,
    height: 926
  });
  await page.goto(myUrl);
  let gameData = await page.evaluate(() => {
    let games = [];
    let gamesElms = document.querySelectorAll('table.myclass ~ table.myclass');
    gamesElms.forEach((gameelement) => {
      let gameJson = {};
      try {
        gameJson.name = gameelement.querySelector('.myclass2').textContent;
      } catch (exception) {
        console.warn(exception);
      }
      games.push(gameJson);
    });
    return games;
  })
  console.log(gameData);
  browser.close();
})();
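You can use either of the following to select the second table. Both return a single element rather than a NodeList, so you'd iterate over that table's rows instead of calling forEach on the result:
let gamesElms = document.querySelectorAll('table.myclass')[1];
let gamesElms = document.querySelector('table.myclass:nth-child(2)');
Be aware that :nth-child(2) counts all siblings, not just tables, so it matches the second table only when both tables share a parent and the first one is that parent's first child.
Additionally, you can use the example below to push all of the data from the table into an array: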
let games = Array.from(document.querySelectorAll('table.myclass:nth-child(2) tr'), e => {
return Array.from(e.querySelectorAll('th, td'), e => e.textContent);
});
// console.log(games[rowNum][cellNum]); <-- textContent
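Back in Node, the array of arrays returned by that snippet can be turned into one object per row. A minimal sketch, assuming the table's first row holds the column headers (the question doesn't say); rowsToObjects is a hypothetical helper name and the sample data is made up:

```javascript
// Converts [[header, ...], [cell, ...], ...] into [{ header: cell, ... }, ...].
// Assumes row 0 holds the headers; missing cells become empty strings.
function rowsToObjects(rows) {
  const [headers = [], ...body] = rows;
  return body.map(cells =>
    Object.fromEntries(
      headers.map((header, i) => [header.trim(), (cells[i] ?? "").trim()])
    )
  );
}

// Example input in the shape games[rowNum][cellNum] produces.
const games = [
  ["Name", "Players"],
  ["Chess", " 2 "],
  ["Solitaire"],
];
console.log(JSON.stringify(rowsToObjects(games)));
// [{"Name":"Chess","Players":"2"},{"Name":"Solitaire","Players":""}]
```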