Target URL: http://www.supremenewyork.com/shop/jackets/uaxjeqvro/fm9kozqa6
Target Element: #s
Problem: Can't select a value from the drop down. I've tried multiple things, only related question I could find on Stack Overflow was this,[How to select an option from dropdown select but none of these answers describe how to select the option via the text of the element rather than the value of the option.
This should work, tested with version 1.7.0 on
https://try-puppeteer.appspot.com/
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('http://www.supremenewyork.com/shop/jackets/uaxjeqvro/fm9kozqa6');
let $elemHandler = await page.$('#s');
let properties = await $elemHandler.getProperties();
for (const property of properties.values()) {
const element = property.asElement();
if (element){
let hText = await element.getProperty("text");
let text = await hText.jsonValue();
if(text==="Large"){
let hValue = await element.getProperty("value");
let value = await hValue.jsonValue();
await page.select("#s",value); // or use 58730
console.log(`Selected ${text} which is value ${value}.`);
}
}
}
await browser.close();
Related
I am trying to make a simple webscraper using Node and Puppeteer to get the titles of posts on reddit, but am having issues accessing a global variable, SUBREDDIT_NAME from within only one function, extractItems(). It works fine with every other function, but for that one I have to make a local variable with the same value for it to work.
Am I completely misunderstanding variable scope in Javascript?
I have tried everything I can think of, and the only thing that works is to create a local variable inside of extractedItems() with the value of "news", otherwise I get nothing.
const fs = require('fs');
const puppeteer = require('puppeteer');
const SUBREDDIT = (subreddit_name) => `https://reddit.com/r/${subreddit_name}/`;
const SUBREDDIT_NAME= "news";
function extractItems() {
const extractedElements = document.querySelectorAll(`a[href*='r/${SUBREDDIT_NAME}/comments/'] h3`);
const items = [];
for (let element of extractedElements) {
items.push(element.innerText);
}
return items;
}
async function scrapeInfiniteScrollItems(
page,
extractItems,
itemTargetCount,
scrollDelay = 1000,
) {
let items = [];
try {
let previousHeight;5
while (items.length < itemTargetCount) {
items = await page.evaluate(extractItems);
previousHeight = await page.evaluate('document.body.scrollHeight');
await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
await page.waitForFunction(`document.body.scrollHeight > ${previousHeight}`);
await page.waitFor(scrollDelay);
}
} catch(e) { }
return items;
}
(async () => {
// Set up browser and page.
const browser = await puppeteer.launch({
headless: false,
args: ['--no-sandbox', '--disable-setuid-sandbox'],
});
const page = await browser.newPage();
page.setViewport({ width: 1280, height: 926 });
// Navigate to the demo page.
await page.goto(SUBREDDIT(SUBREDDIT_NAME));
// Scroll and extract items from the page.
const items = await scrapeInfiniteScrollItems(page, extractItems, 100);
// Save extracted items to a file.
fs.writeFileSync('./items.txt', items.join('\n') + '\n');
// Close the browser.
await browser.close();
})();
I expect a text file with the 100 first found titles, but it only works when I hardcode the subreddit into the extractItems() function.
The problem is that the extractItems function is converted to a string (without processing the template literal) and executed in the pages context where there is no SUBREDDIT_NAME variable.
You can fix that by doing something like this:
function extractItems(name) {
const extractedElements = document.querySelectorAll(`a[href*='r/${name}/comments/'] h3`);
const items = [];
for (let element of extractedElements) {
items.push(element.innerText);
}
return items;
}
page.evaluate(`(${extractItems})(${SUBREDDIT_NAME})`)
I'm trying to test amending text in an editable input which contains the title of the current record - and I want to able to test editing such text, replacing it with something else.
I know I can use await page.type('#inputID', 'blah'); to insert "blah" into the textbox (which in my case, having existing text, only appends "blah"), however, I cannot find any page methods1 that allow deleting or replacing existing text.
You can use page.evaluate to manipulate DOM as you see fit:
await page.evaluate( () => document.getElementById("inputID").value = "")
However sometimes just manipulating a given field might not be enough (a target page could be an SPA with event listeners), so emulating real keypresses is preferable. The examples below are from the informative issue in puppeteer's Github concerning this task.
Here we press Backspace as many times as there are characters in that field:
const inputValue = await page.$eval('#inputID', el => el.value);
// focus on the input field
await page.click('#inputID');
for (let i = 0; i < inputValue.length; i++) {
await page.keyboard.press('Backspace');
}
Another interesting solution is to click the target field 3 times so that the browser would select all the text in it and then you could just type what you want:
const input = await page.$('#inputID');
await input.click({ clickCount: 3 })
await input.type("Blah");
You can use the page.keyboard methods to change input values, or you can use page.evaluate().
Replace All Text:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.down('Control');
await page.keyboard.press('A');
await page.keyboard.up('Control');
await page.keyboard.press('Backspace');
await page.keyboard.type('foo');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value = 'foo';
});
Append Text:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.press('End');
await page.keyboard.type(' bar qux');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value += ' bar qux';
});
Backspace Last Character:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.press('End');
await page.keyboard.press('Backspace');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value = example.value.slice(0, -1);
});
Delete First Character:
// Using page.keyboard:
await page.focus('#example');
await page.keyboard.press('Home');
await page.keyboard.press('Delete');
// Using page.evaluate:
await page.evaluate(() => {
const example = document.getElementById('example');
example.value = example.value.slice(1);
});
If you are not interested in simulating any key events, you could also use puppeteer's page.$eval method as a concise means to remove the textarea's value...
await page.$eval('#inputID', el => el.value = '');
await page.type('#inputID', 'blah');
...or even completely replace the value in one step, without simulating the subsequent typing:
await page.$eval('#inputID', el => el.value = 'blah');
This works perfect for "clear only" method:
const input = await page.$('#inputID');
await input.click({ clickCount: 3 })
await page.keyboard.press('Backspace')
above answers has an ESLint issues.
the following solution passing ESLint varification:
await page.evaluate(
(selector) => { (document.querySelector(selector).value = ''); },
inputBoxSelector,
);
Use the Keyboard API which simulates keystrokes:
await page.focus(css); // CSS selector of the input element
await page.keyboard.down('Shift');
await page.keyboard.press('Home');
await page.keyboard.up('Shift'); // Release the pressed 'Shift' key
await page.keyboard.press('Backspace');
This keystroke is cross-platform as opposed to using ctrl + A(does not work in Mac to select all characters in a input field)
The most clean way for me is:
Setup
const clearInput = async (page, { selector }) => {
const input = await page.$(selector)
await input.click({ clickCount: 3 })
await page.keyboard.press('Backspace')
}
Usage
const page = await context.newPage()
await clearInput(page, { selector: 'input[name="session[username_or_email]"]' })
await clearInput(page, { selector: 'input[name="session[password]"]' })
Well, the reason you want to delete existing text generally may be want to replace it.
You can use page.evalute
let title = getTitle()
let selector = getSelector()
await page.evaluate(
({selector, title}) => {
let el = document.querySelector(selector)
if ('value' in el) el.value = title
else el.innerText = title
},
{selector, title}
)
someField.type("");
Pass the empty string before typing your content.
This worked for me.
So I'm trying to crawl a site using Puppeteer. All the data I'm looking to grab is in multiple tables. Specifically, I'm trying to grab the data from a single table. I was able to grab the specific table using a very verbose .querySelector(table.myclass ~ table.myclass), so now my issue is, my code is grabbing the first item of each table (starting from the correct table, which is the 2nd table), but I can't find a way to get it to just grab all the data in only the 2nd table.
const puppeteer = require('puppeteer');
const myUrl = "https://coolurl.com";
(async () => {
const browser = await puppeteer.launch({
headless: true
});
const page = (await browser.pages())[0];
await page.setViewport({
width: 1920,
height: 926
});
await page.goto(myUrl);
let gameData = await page.evaluate(() => {
let games = [];
let gamesElms = document.querySelectorAll('table.myclass ~ table.myclass');
gamesElms.forEach((gameelement) => {
let gameJson = {};
try {
gameJson.name = gameelement.querySelector('.myclass2').textContent;
} catch (exception) {
console.warn(exception);
}
games.push(gameJson);
});
return games;
})
console.log(gameData);
browser.close();
})();
You can use either of the following methods to select the second table:
let gamesElms = document.querySelectorAll('table.myclass')[1];
let gamesElms = document.querySelector('table.myclass:nth-child(2)');
Additionally, you can use the example below to push all of the data from the table to an array:
let games = Array.from(document.querySelectorAll('table.myclass:nth-child(2) tr'), e => {
return Array.from(e.querySelectorAll('th, td'), e => e.textContent);
});
// console.log(games[rowNum][cellNum]); <-- textContent
Background:
Using NodeJS/CucumberJS/Puppeteer to build end-to-end regression test for an emberJS solution.
Problem:
Selecting (page.click) and getting textContent of one of the elements when there are several dynamic elements with the same selector? (In my case, I have 4 elements with the same selector = [data-test-foo4="true"])
I know, that with:
const text = await page.evaluate( () => document.querySelector('[data-test-foo4="true"]').textContent );
I can get the text of the first element, but how do I select the other elements with the same selector? I've tried:
var text = await page.evaluate( () => document.querySelectorAll('[data-test-foo4="true"]').textContent )[1];
console.log('text = ' + text);
but it gives me 'text = undefined'
Also, the following:
await page.click('[data-test-foo4="true"]');
selects the first elements with that selector, but how can I select the next one with that selector?
You can use Array.from() to create an array containing all of the textContent values of each element matching your selector:
const text = await page.evaluate(() => Array.from(document.querySelectorAll('[data-test-foo4="true"]'), element => element.textContent));
console.log(text[0]);
console.log(text[1]);
console.log(text[2]);
If you need to click more than one element containing a given selector, you can create an ElementHandle array using page.$$() and click each one using elementHandle.click():
const example = await page.$$('[data-test-foo4="true"]');
await example[0].click();
await example[1].click();
await example[2].click();
https://github.com/puppeteer/puppeteer/blob/v5.5.0/docs/api.md#frameselector-1
const pageFrame = page.mainFrame();
const elems = await pageFrame.$$(selector);
Not mentioned yet is the awesome page.$$eval which is basically a wrapper for this common pattern:
page.evaluate(() => callback([...document.querySelectorAll(selector)]))
For example,
const puppeteer = require("puppeteer"); // ^19.1.0
const html = `<!DOCTYPE html>
<html>
<body>
<ul>
<li data-test-foo4="true">red</li>
<li data-test-foo4="false">blue</li>
<li data-test-foo4="true">purple</li>
</ul>
</body>
</html>`;
let browser;
(async () => {
browser = await puppeteer.launch();
const [page] = await browser.pages();
await page.setContent(html);
const sel = '[data-test-foo4="true"]';
const text = await page.$$eval(sel, els => els.map(e => e.textContent));
console.log(text); // => [ 'red', 'purple' ]
console.log(text[0]); // => 'red'
console.log(text[1]); // => 'purple'
})()
.catch(err => console.error(err))
.finally(() => browser?.close());
If you want to pass additional data from Node for $$eval to use in the browser context, you can add additional arguments:
const text = await page.$$eval(
'[data-test-foo4="true"]',
(els, data) => els.map(e => e.textContent + data),
"X" // 'data' passed to the callback
);
console.log(text); // => [ 'redX', 'purpleX' ]
You can use page.$$eval to issue a native DOM click on each element or on a specific element:
// click all
await page.$$eval(sel, els => els.forEach(el => el.click()));
// click one (hardcoded)
await page.$$eval(sel, els => els[1].click());
// click one (passing `n` from Node)
await page.$$eval(sel, (els, n) => els[n].click(), n);
or use page.$$ to return the elements back to Node to issue trusted Puppeteer clicks:
const els = await page.$$('[data-test-foo4="true"]');
for (const el of els) {
await el.click();
}
// or click the n-th:
await els[n].click();
Pertinent to OP's question, you can always access the n-th item of these arrays with the usual syntax els[n] as shown above, but often, it's best to select based on the :nth-child pseudoselector. This depends on how the elements are arranged in the DOM, though, so it's not as general of a solution as array access.
I'm trying to access elements on a website with Puppeteer. Following is my minimal working example. The output of the page.evaluate is null when I try to get the span element. When I change it to the p element it correctly outputs an emtpy object. Is this because the span element has display: none ? What do I have to do to get the span element?
(async() => {
const browser = await puppeteer.launch({
headless: true,
ignoreHTTPSErrors: true,
timeout: 1000
});
const page = await browser.newPage();
await page.goto('https://example.com', {waitUntil: 'networkidle2'});
const elem = await page.evaluate(() => {
const elem = document.querySelector('span');
return elem;
})
console.log("ELEM", elem)
browser.close();
})();
An additional span element was being added by an extension to the Page DOM, which resulted in the incorrect selector specified for document.querySelector().
The correct selector, based on the information given, is 'p':
const elem = document.querySelector('p');