Background:
Using NodeJS/CucumberJS/Puppeteer to build end-to-end regression test for an emberJS solution.
Problem:
Selecting (page.click) and getting textContent of one of the elements when there are several dynamic elements with the same selector? (In my case, I have 4 elements with the same selector = [data-test-foo4="true"])
I know, that with:
const text = await page.evaluate( () => document.querySelector('[data-test-foo4="true"]').textContent );
I can get the text of the first element, but how do I select the other elements with the same selector? I've tried:
var text = await page.evaluate( () => document.querySelectorAll('[data-test-foo4="true"]').textContent )[1];
console.log('text = ' + text);
but it gives me 'text = undefined'
Also, the following:
await page.click('[data-test-foo4="true"]');
selects the first element with that selector, but how can I select the next one with that selector?
You can use Array.from() to create an array containing all of the textContent values of each element matching your selector:
const text = await page.evaluate(() => Array.from(document.querySelectorAll('[data-test-foo4="true"]'), element => element.textContent));
console.log(text[0]);
console.log(text[1]);
console.log(text[2]);
If you need to click more than one element containing a given selector, you can create an ElementHandle array using page.$$() and click each one using elementHandle.click():
const example = await page.$$('[data-test-foo4="true"]');
await example[0].click();
await example[1].click();
await example[2].click();
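The two snippets above can be wrapped into small helpers that take the index as a parameter. This is only a sketch: getNthText and clickNth are hypothetical names, and page is assumed to be a Puppeteer Page. Both throw a descriptive error when the index is out of range, rather than failing later on undefined.

```javascript
// Hypothetical helpers; `page` is assumed to be a Puppeteer Page.

// Return the textContent of the n-th (0-based) element matching `selector`.
async function getNthText(page, selector, n) {
  const texts = await page.$$eval(selector, els => els.map(el => el.textContent));
  if (n < 0 || n >= texts.length) {
    throw new RangeError(`index ${n} out of range: ${texts.length} match(es) for ${selector}`);
  }
  return texts[n];
}

// Click the n-th (0-based) element matching `selector` with a Puppeteer click.
async function clickNth(page, selector, n) {
  const handles = await page.$$(selector);
  if (n < 0 || n >= handles.length) {
    throw new RangeError(`index ${n} out of range: ${handles.length} match(es) for ${selector}`);
  }
  await handles[n].click();
}

// Usage (inside an async function):
// const second = await getNthText(page, '[data-test-foo4="true"]', 1);
// await clickNth(page, '[data-test-foo4="true"]', 1);
```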
https://github.com/puppeteer/puppeteer/blob/v5.5.0/docs/api.md#frameselector-1
const pageFrame = page.mainFrame();
const elems = await pageFrame.$$(selector);
Not mentioned yet is the awesome page.$$eval which is basically a wrapper for this common pattern:
page.evaluate(() => callback([...document.querySelectorAll(selector)]))
For example,
const puppeteer = require("puppeteer"); // ^19.1.0
const html = `<!DOCTYPE html>
<html>
<body>
<ul>
<li data-test-foo4="true">red</li>
<li data-test-foo4="false">blue</li>
<li data-test-foo4="true">purple</li>
</ul>
</body>
</html>`;
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
const sel = '[data-test-foo4="true"]';
const text = await page.$$eval(sel, els => els.map(e => e.textContent));
console.log(text); // => [ 'red', 'purple' ]
console.log(text[0]); // => 'red'
console.log(text[1]); // => 'purple'
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
If you want to pass additional data from Node for $$eval to use in the browser context, you can add additional arguments:
const text = await page.$$eval(
'[data-test-foo4="true"]',
(els, data) => els.map(e => e.textContent + data),
"X" // 'data' passed to the callback
);
console.log(text); // => [ 'redX', 'purpleX' ]
You can use page.$$eval to issue a native DOM click on each element or on a specific element:
// click all
await page.$$eval(sel, els => els.forEach(el => el.click()));
// click one (hardcoded)
await page.$$eval(sel, els => els[1].click());
// click one (passing `n` from Node)
await page.$$eval(sel, (els, n) => els[n].click(), n);
or use page.$$ to return the elements back to Node to issue trusted Puppeteer clicks:
const els = await page.$$('[data-test-foo4="true"]');
for (const el of els) {
await el.click();
}
// or click the n-th:
await els[n].click();
Pertinent to OP's question, you can always access the n-th item of these arrays with the usual syntax els[n] as shown above, but often, it's best to select based on the :nth-child pseudoselector. This depends on how the elements are arranged in the DOM, though, so it's not as general of a solution as array access.
Related
When taking screenshots using Puppeteer, I need to replace the innerHTML of dynamic elements that have the .menu__link class with a stub.
I use BackstopJs puppet/onReady.js
When I try this, only the first element on the page is replaced:
module.exports = async (page) => {
const myLocalValue = "Test";
const tweets = await page.$$('.menu__link');
for (const tweet of tweets) {
await page.$eval('.menu__link', (el, value) => el.innerHTML = value, myLocalValue)
}
};
And this code does not work at all:
module.exports = async (page) => {
const myLocalValue = "Test";
const tweets = await page.$$('.menu__link');
for (const tweet of tweets) {
await page.$eval(tweet, (el, value) => el.innerHTML = value, myLocalValue)
}
};
Please tell me how to replace innerHTML on the entire page for all .menu__link using puppeteer?
You can use $$eval
await page.$$eval('.menu__link', (links, value) => links.forEach(el => el.innerHTML = value), myLocalValue);
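Put together as an onReady script, the $$eval approach might look like the sketch below. replaceInnerHTML and onReady are hypothetical names; the selector and the stub text come from the question. It assumes, as the question's code does, that BackstopJS calls the exported function with the Puppeteer page as its first argument.

```javascript
// Hypothetical helper: set innerHTML of every element matching `selector`.
// Resolves to the number of elements that were changed.
async function replaceInnerHTML(page, selector, value) {
  return page.$$eval(
    selector,
    (els, html) => {
      els.forEach(el => { el.innerHTML = html; });
      return els.length;
    },
    value // passed from Node into the browser callback as `html`
  );
}

// Body of the onReady script from the question, using the helper.
const onReady = async (page) => {
  const myLocalValue = "Test";
  await replaceInnerHTML(page, ".menu__link", myLocalValue);
};

// In puppet/onReady.js this would be exported with:
// module.exports = onReady;
```

Unlike the loop in the question, the selector is evaluated once and every match is updated inside a single browser-side callback.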
I am trying to scrape a one-page website. There are multiple selection combinations that result in different search redirects. I wrote a for loop in page.evaluate's callback function to click through the different selections and click the search button for each one. However, I got this error: Converting circular structure to JSON Are you passing a nested JSHandle?
Please help!
My current version of code looks like this:
const res = await page.evaluate(async (i, courseCountArr, page) => {
for (let j = 1; j < courseCountArr[i]; j++) {
await document.querySelectorAll('.btn-group > button, .bootstrap-select > button')['1'].click() // click on school drop down
await document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a')[`${j}`].click() // click on each school option
await document.querySelectorAll('.btn-group > button, .bootstrap-select > button')['2'].click() // click on subject drop down
const subjectLen = document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a').length // length of the subject drop down
for (let k = 1; k < subjectLen; k++) {
await document.querySelectorAll('div.bs-container > div.dropdown-menu > ul > li > a')[`${k}`].click() // click on each subject option
document.getElementById('buttonSearch').click() //click on search button
page.waitForSelector('.strong, .section-body')
return document.querySelectorAll('.strong, .section-body').length
}
}
}, i, courseCountArr, page);
Why the error happens
While you haven't shown enough code to reproduce the problem (is courseCountArr an array of ElementHandles? Passing page to evaluate won't work either, that's a Node object), here's a minimal reproduction that shows the likely pattern:
const puppeteer = require("puppeteer");
let browser;
(async () => {
const html = `<ul><li>a</li><li>b</li><li>c</li></ul>`;
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
// ...
const nestedHandle = await page.$$("li"); // $$ selects all matches
await page.evaluate(els => {}, nestedHandle); // throws
// ...
})()
.catch(err => console.error(err))
.finally(() => browser?.close())
;
The output is
TypeError: Converting circular structure to JSON
--> starting at object with constructor 'BrowserContext'
| property '_browser' -> object with constructor 'Browser'
--- property '_defaultContext' closes the circle Are you passing a nested JSHandle?
at JSON.stringify (<anonymous>)
Why is this happening? All code inside of the callback to page.evaluate (and family: evaluateHandle, $eval, $$eval) is executed inside the browser console programmatically by Puppeteer. The browser console is a distinct environment from Node, where Puppeteer and the ElementHandles live. To bridge the inter-process gap, the callback to evaluate, parameters and return value are serialized and deserialized.
The consequence of this is that you can't access any Node state like you're attempting with page.waitForSelector('.strong, .section-body') inside the browser. page is in a totally different process from the browser. (As an aside, document.querySelectorAll is purely synchronous, so there's no point in awaiting it.)
Puppeteer ElementHandles are complex structures used to hook into the page's DOM; they can't be serialized and passed to the page as you're trying to do. Puppeteer has to perform the translation under the hood. Any ElementHandle that is passed to evaluate (or that has .evaluate() called on it) is followed to the DOM node it represents in the browser, and that DOM node is what your evaluate callback is invoked with. Puppeteer can't do this with nested ElementHandles, as of the time of writing.
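The first part of the error text comes from JSON serialization. The sketch below does not use Puppeteer at all: it builds two plain objects that reference each other, loosely mimicking the Browser/BrowserContext cycle from the stack trace (the object shapes are made up for illustration), and shows that JSON.stringify rejects them with a TypeError whose message starts the same way.

```javascript
// Made-up stand-ins for Node-side objects that point at each other.
const fakeBrowser = { name: "browser" };
const fakeContext = { name: "context", _browser: fakeBrowser };
fakeBrowser._defaultContext = fakeContext; // closes the circle

// Returns the error message if `value` cannot be serialized, else null.
function serializationError(value) {
  try {
    JSON.stringify(value);
    return null;
  } catch (err) {
    return err.message;
  }
}

// Plain data serializes without complaint.
console.log(serializationError({ plain: [1, 2, 3] })); // => null

// An array wrapping a circular object fails, much like a nested handle would.
const message = serializationError([fakeContext]);
console.log(message.split("\n")[0]); // first line of the TypeError message
```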
Possible fixes
In the above code, if you change .$$ to .$, you'll retrieve only the first <li>. This singular, non-nested ElementHandle can be converted to an element:
// ...
const handle = await page.$("li");
const val = await page.evaluate(el => el.innerText, handle);
console.log(val); // => a
// ...
Or:
const handle = await page.$("li");
const val = await handle.evaluate(el => el.innerText);
console.log(val); // => a
Making this work on your example is a matter of either swapping the loop and the evaluate call so that you access courseCountArr[i] in Puppeteer land, unpacking the nested ElementHandles into separate parameters to evaluate, or moving most of your in-browser calls that click on things back to Puppeteer (depending on your use case and goals with the code).
You could apply the evaluate call to each ElementHandle:
const nestedHandles = await page.$$("li");
for (const handle of nestedHandles) {
const val = await handle.evaluate(el => el.innerText);
console.log(val); // a b c
}
To get an array of results, you could do:
const nestedHandles = await page.$$("li");
const vals = await Promise.all(
nestedHandles.map(el => el.evaluate(el => el.innerText))
);
console.log(vals); // [ 'a', 'b', 'c' ]
You can also unpack the ElementHandles into arguments for evaluate and use the (...els) parameter list in the callback:
const nestedHandles = await page.$$("li");
const vals = await page.evaluate((...els) =>
els.map(e => e.innerText),
...nestedHandles
);
console.log(vals); // => [ 'a', 'b', 'c' ]
If you have other arguments in addition to the handles you can do:
const nestedHandle = await page.$$("li");
const vals = await page.evaluate((foo, bar, ...els) =>
els.map(e => e.innerText + foo + bar)
, 1, 2, ...nestedHandle);
console.log(vals); // => [ 'a12', 'b12', 'c12' ]
or:
const nestedHandle = await page.$$("li");
const vals = await page.evaluate(({foo, bar}, ...els) =>
els.map(e => e.innerText + foo + bar)
, {foo: 1, bar: 2}, ...nestedHandle);
console.log(vals); // => [ 'a12', 'b12', 'c12' ]
Another option may be to use $$eval, which selects multiple handles, then runs a callback in browser context with the array of selected elements as its parameter:
const vals = await page.$$eval("li", els =>
els.map(e => e.innerText)
);
console.log(vals); // => [ 'a', 'b', 'c' ]
This is probably cleanest if you're not doing anything else with the handles in Node.
Similarly, you can totally bypass Puppeteer and do the entire selection and manipulation in browser context:
const vals = await page.evaluate(() =>
[...document.querySelectorAll("li")].map(e => e.innerText)
);
console.log(vals); // => [ 'a', 'b', 'c' ]
(note that getting the inner text throughout is just a placeholder for whatever browser code of arbitrary complexity you might have)
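Applied to the question's scenario, one way to keep Node-side calls such as waitForSelector out of the browser callback is to click and wait from Node and do only the counting in the page. The sketch below is an outline under assumptions: clickNthThenCount is a hypothetical helper, page is assumed to be a Puppeteer Page, and the selectors are whatever the target page actually uses.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
// Clicks the n-th match of `clickSelector` from Node, waits for
// `resultSelector` to appear, then counts its matches in the browser.
async function clickNthThenCount(page, clickSelector, n, resultSelector) {
  const handles = await page.$$(clickSelector);
  if (!handles[n]) {
    throw new RangeError(`no element at index ${n} for ${clickSelector}`);
  }
  await handles[n].click();                   // Node side: Puppeteer click
  await page.waitForSelector(resultSelector); // Node side: wait for results
  // Browser side: only serializable data (a number) crosses back to Node.
  return page.$$eval(resultSelector, els => els.length);
}

// Usage (inside an async function), with selectors from the question:
// const count = await clickNthThenCount(
//   page, "div.bs-container > div.dropdown-menu > ul > li > a", 1,
//   ".strong, .section-body"
// );
```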
I wrote a little utility to solve this problem
const jsHandleToJSON = (jsHandle) => {
if (jsHandle.length > 0) {
let json = []
for (let i = 0; i < jsHandle.length; i++) {
json.push(jsHandleToJSON(jsHandle[i]))
}
return json
} else {
let json = {}
const keys = Object.keys(jsHandle)
for (let i = 0; i < keys.length; i++) {
if (typeof jsHandle[keys[i]] !== 'object') {
json[keys[i]] = jsHandle[keys[i]]
} else if (['elements', 'element'].includes(keys[i])) {
json[keys[i]] = jsHandleToJSON(jsHandle[keys[i]])
} else {
console.log(`skipping field ${keys[i]}`)
}
}
return json
}
}
It creates a new object with all the primitive fields of the jsHandle (recursively) and also descends into the 'elements' and 'element' properties, skipping every other object-valued field.
You could add more properties in there if you need them (but adding all of them will result in an infinite loop).
To see console output from the page in Node, add the following line before the evaluate call:
page.on('console', message => console.log(`${message.type()}: ${message.text()}`))
I'm trying to test amending text in an editable input which contains the title of the current record, and I want to be able to test editing such text, replacing it with something else.
I know I can use await page.type('#inputID', 'blah'); to insert "blah" into the textbox (which in my case, having existing text, only appends "blah"), however, I cannot find any page methods that allow deleting or replacing existing text.
You can use page.evaluate to manipulate DOM as you see fit:
await page.evaluate( () => document.getElementById("inputID").value = "")
However, sometimes just manipulating a given field is not enough (the target page could be an SPA with event listeners), so emulating real keypresses is preferable. The examples below are from the informative issue on Puppeteer's GitHub concerning this task.
Here we press Backspace as many times as there are characters in that field:
const inputValue = await page.$eval('#inputID', el => el.value);
// focus on the input field
await page.click('#inputID');
for (let i = 0; i < inputValue.length; i++) {
await page.keyboard.press('Backspace');
}
Another interesting solution is to click the target field 3 times so that the browser would select all the text in it and then you could just type what you want:
const input = await page.$('#inputID');
await input.click({ clickCount: 3 })
await input.type("Blah");
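A middle ground between setting the value directly and emulating keypresses is to set the value in the page and then dispatch the events a listener would normally receive. This is a sketch under assumptions: setValueWithEvents is a hypothetical helper, page is assumed to be a Puppeteer Page, and whether a given framework reacts to synthetic input and change events is not guaranteed, so real keypresses remain the safer choice.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
// Replaces the field's value, then fires bubbling `input` and `change`
// events so that listeners attached to the field (or its ancestors) can run.
async function setValueWithEvents(page, selector, value) {
  await page.$eval(
    selector,
    (el, newValue) => {
      el.value = newValue;
      el.dispatchEvent(new Event("input", { bubbles: true }));
      el.dispatchEvent(new Event("change", { bubbles: true }));
    },
    value // passed from Node into the browser callback as `newValue`
  );
}

// Usage (inside an async function):
// await setValueWithEvents(page, "#inputID", "Blah");
```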
You can use the page.keyboard methods to change input values, or you can use page.evaluate().
Replace All Text:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.down('Control');
await page.keyboard.press('A');
await page.keyboard.up('Control');
await page.keyboard.press('Backspace');
await page.keyboard.type('foo');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value = 'foo';
});
Append Text:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.press('End');
await page.keyboard.type(' bar qux');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value += ' bar qux';
});
Backspace Last Character:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.press('End');
await page.keyboard.press('Backspace');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value = example.value.slice(0, -1);
});
Delete First Character:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.press('Home');
await page.keyboard.press('Delete');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value = example.value.slice(1);
});
If you are not interested in simulating any key events, you could also use puppeteer's page.$eval method as a concise means to remove the textarea's value...
await page.$eval('#inputID', el => el.value = '');
await page.type('#inputID', 'blah');
...or even completely replace the value in one step, without simulating the subsequent typing:
await page.$eval('#inputID', el => el.value = 'blah');
This works perfectly as a "clear only" method:
const input = await page.$('#inputID');
await input.click({ clickCount: 3 })
await page.keyboard.press('Backspace')
The answers above have ESLint issues.
The following solution passes ESLint verification:
await page.evaluate(
(selector) => { (document.querySelector(selector).value = ''); },
inputBoxSelector,
);
Use the Keyboard API which simulates keystrokes:
await page.focus(css); // CSS selector of the input element
await page.keyboard.down('Shift');
await page.keyboard.press('Home');
await page.keyboard.up('Shift'); // Release the pressed 'Shift' key
await page.keyboard.press('Backspace');
This keystroke sequence is cross-platform, as opposed to Ctrl + A (which does not select all characters in an input field on Mac).
The cleanest way for me is:
Setup
const clearInput = async (page, { selector }) => {
const input = await page.$(selector)
await input.click({ clickCount: 3 })
await page.keyboard.press('Backspace')
}
Usage
const page = await context.newPage()
await clearInput(page, { selector: 'input[name="session[username_or_email]"]' })
await clearInput(page, { selector: 'input[name="session[password]"]' })
Well, the reason you want to delete existing text is generally that you want to replace it.
You can use page.evaluate
let title = getTitle()
let selector = getSelector()
await page.evaluate(
({selector, title}) => {
let el = document.querySelector(selector)
if ('value' in el) el.value = title
else el.innerText = title
},
{selector, title}
)
someField.type("");
Pass the empty string before typing your content.
This worked for me.
So I'm trying to crawl a site using Puppeteer. All the data I'm looking to grab is spread across multiple tables, and I only want the data from a single one. I was able to target that table using a very verbose .querySelector('table.myclass ~ table.myclass'). My issue now is that my code grabs the first item of each table (starting from the correct table, which is the 2nd table), and I can't find a way to get it to grab all the data in only the 2nd table.
const puppeteer = require('puppeteer');
const myUrl = "https://coolurl.com";
(async () => {
const browser = await puppeteer.launch({
headless: true
});
const page = (await browser.pages())[0];
await page.setViewport({
width: 1920,
height: 926
});
await page.goto(myUrl);
let gameData = await page.evaluate(() => {
let games = [];
let gamesElms = document.querySelectorAll('table.myclass ~ table.myclass');
gamesElms.forEach((gameelement) => {
let gameJson = {};
try {
gameJson.name = gameelement.querySelector('.myclass2').textContent;
} catch (exception) {
console.warn(exception);
}
games.push(gameJson);
});
return games;
})
console.log(gameData);
browser.close();
})();
You can use either of the following methods to select the second table:
let gamesElms = document.querySelectorAll('table.myclass')[1];
let gamesElms = document.querySelector('table.myclass:nth-child(2)');
Additionally, you can use the example below to push all of the data from the table to an array:
let games = Array.from(document.querySelectorAll('table.myclass:nth-child(2) tr'), e => {
return Array.from(e.querySelectorAll('th, td'), e => e.textContent);
});
// console.log(games[rowNum][cellNum]); <-- textContent
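If the first row of the table holds the column headers, the two-dimensional array can be turned into one object per row back in Node. The helper below is a hypothetical addition rather than part of the approach above, the sample data is invented, and the header texts are assumed to be unique.

```javascript
// Hypothetical helper: `rows` is an array of arrays of cell text,
// with the header cells in rows[0].
function rowsToObjects(rows) {
  const [header = [], ...body] = rows;
  const keys = header.map(cell => cell.trim());
  return body.map(cells =>
    // Missing cells become empty strings; surrounding whitespace is trimmed.
    Object.fromEntries(keys.map((key, i) => [key, (cells[i] ?? "").trim()]))
  );
}

// Invented sample data in the shape returned by the evaluate call.
const sampleRows = [
  ["Name", "Year"],
  ["Pong ", "1972"],
  ["Tetris", "1984"],
];
console.log(rowsToObjects(sampleRows));
// => [ { Name: 'Pong', Year: '1972' }, { Name: 'Tetris', Year: '1984' } ]
```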
Does anybody know how to get the innerHTML or text of an element? Or even better; how to click an element with a specific innerHTML? This is how it would work with normal JavaScript:
var found = false
$(selector).each(function() {
if (found) return;
else if ($(this).text().replace(/[^0-9]/g, '') === '5') {
$(this).trigger('click');
found = true
}
});
Thanks in advance for any help!
This is how I get innerHTML:
page.$eval(selector, (element) => {
return element.innerHTML
})
Returning innerHTML of an Element
You can use the following methods to return the innerHTML of an element:
page.$eval()
const inner_html = await page.$eval('#example', element => element.innerHTML);
page.evaluate()
const inner_html = await page.evaluate(() => document.querySelector('#example').innerHTML);
page.$() / elementHandle.getProperty() / jsHandle.jsonValue()
const element = await page.$('#example');
const element_property = await element.getProperty('innerHTML');
const inner_html = await element_property.jsonValue();
Clicking an Element with Specific innerHTML
You can use the following methods to click on an element based on the innerHTML that is contained within the element:
page.$$eval()
await page.$$eval('.example', elements => {
const element = elements.find(element => element.innerHTML === '<h1>Hello, world!</h1>');
element.click();
});
page.evaluate()
await page.evaluate(() => {
const elements = [...document.querySelectorAll('.example')];
const element = elements.find(element => element.innerHTML === '<h1>Hello, world!</h1>');
element.click();
});
page.evaluateHandle() / elementHandle.click()
const element = await page.evaluateHandle(() => {
const elements = [...document.querySelectorAll('.example')];
const element = elements.find(element => element.innerHTML === '<h1>Hello, world!</h1>');
return element;
});
await element.click();
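All three variants throw if no element matches, because find returns undefined. A defensive variation, sketched here with a hypothetical helper name and assuming page is a Puppeteer Page, reports whether a click happened instead.

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
// Clicks the first element matching `selector` whose innerHTML equals `html`.
// Resolves to true if an element was clicked, false if nothing matched.
async function clickByInnerHTML(page, selector, html) {
  return page.$$eval(
    selector,
    (els, wanted) => {
      const match = els.find(el => el.innerHTML === wanted);
      if (!match) return false;
      match.click(); // native DOM click inside the page
      return true;
    },
    html // passed from Node into the browser callback as `wanted`
  );
}

// Usage (inside an async function):
// const clicked = await clickByInnerHTML(page, ".example", "<h1>Hello, world!</h1>");
// if (!clicked) console.warn("no matching element");
```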
This should work with puppeteer:)
const page = await browser.newPage();
const title = await page.evaluate(el => el.innerHTML, await page.$('h1'));
You can leverage page.$$(selector) to get all your target elements and then use page.evaluate() to get the content (innerHTML), then apply your criteria. It should look something like:
const targetEls = await page.$$('yourFancySelector');
for(let target of targetEls){
const iHtml = await page.evaluate(el => el.innerHTML, target);
if (iHtml.replace(/[^0-9]/g, '') === '5') {
await target.click();
break;
}
}
I can never get .innerHTML to work reliably. I always do the following:
let els = await page.$$('selector');
for (let el of els) {
let content = await (await el.getProperty('textContent')).jsonValue();
}
Then you have your text in the 'content' variable.
With regard to this part of your question...
"Or even better; how to click an element with a specific innerHTML."
There are some particulars around innerHTML, innerText, and textContent that might give you grief. You can work around them using a sufficiently loose XPath query with Puppeteer v1.1.1.
Something like this:
const el = await page.$x('//*[text()[contains(., "search-text-here")]]');
await el[0].click({
button: 'left',
clickCount: 1,
delay: 50
});
Just keep in mind that you will get an array of ElementHandles back from that query. So... the particular item you are looking for might not be at [0] if your text isn't unique.
Options passed to .click() aren't necessary if all you need is a single left-click.
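If the search text can contain quote characters, it cannot simply be pasted between double quotes in the query, because XPath 1.0 string literals have no escape syntax. The sketch below builds the same query shape as above for arbitrary text; xpathLiteral and xpathContainsText are hypothetical helper names.

```javascript
// Hypothetical helper: turn arbitrary text into an XPath 1.0 string expression.
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  // Both quote types present: join double-quoted pieces with '"' via concat().
  const parts = text.split('"').map(part => `"${part}"`);
  return `concat(${parts.join(`, '"', `)})`;
}

// Build the "element containing this text" query used above.
function xpathContainsText(text) {
  return `//*[text()[contains(., ${xpathLiteral(text)})]]`;
}

console.log(xpathContainsText("search-text-here"));
// => //*[text()[contains(., "search-text-here")]]

// Usage with the query from above (inside an async function):
// const el = await page.$x(xpathContainsText('Save "draft"'));
```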
You can simply write it as below (no await is needed inside the callback).
const center = await page.$eval('h2.font-34.uppercase > strong', e => e.innerHTML);
<div id="innerHTML">Hello</div>
var myInnerHtml = document.getElementById("innerHTML").innerHTML;
console.log(myInnerHtml);