How to access React Event Handlers with Puppeteer - javascript

I'm not entirely sure I understand what I'm asking for, and I'm hoping someone can explain. I'm attempting to scrape a website using Puppeteer on Node.js. I've gotten as far as selecting the element I need and accessing its properties; however, I cannot access the property that holds the information I want. That information is within the green box below, but I cannot get past __reactEventHandlers$kq2rgk91p6, as that just returns undefined.
I used the following selector, which works and accesses all other properties, just not the one I want.
const checked = await page.evaluate(() => document.querySelector(stockSelector));

If I understand correctly (without the URL and a minimal reproducible example it is hard to guess), this is the issue: according to the docs, the various evaluate functions can transfer only serializable data (roughly, the data JSON can handle, with some additions). Your code returns a DOM element, which is not serializable (it has methods and circular references). Try to retrieve the data in the browser context and return only serializable data. For example:
const data = await page.evaluate(
  selector => document.querySelector(selector)
    .__reactEventHandlers$kq2rgk91p6.children[1].props.record.Stock,
  selector,
);
If the array in the .Stock property is serializable, you will get the data.
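To get a feel for what "serializable" means here, the following Node-only sketch uses a JSON round trip as a rough stand-in for the transfer step. It is an approximation, not Puppeteer's actual mechanism, and survivesJsonRoundTrip, stock and elementLike are hypothetical names made up for the illustration.

```javascript
// Rough approximation: treat "serializable" as "survives a JSON round trip".
// Puppeteer's real transfer rules differ in details; this only shows the idea.
function survivesJsonRoundTrip(value) {
  try {
    const text = JSON.stringify(value);
    // JSON.stringify returns undefined for functions, symbols and undefined.
    if (text === undefined) return false;
    JSON.parse(text);
    return true;
  } catch (err) {
    // Circular structures make JSON.stringify throw a TypeError.
    return false;
  }
}

// Plain data, which is what the Stock array in the question is assumed to be.
const stock = [{ size: 'M', quantity: 3 }];

// Hypothetical stand-in for a DOM element: it has a method and a cycle.
const elementLike = { tagName: 'DIV', click() {} };
elementLike.parentNode = { firstChild: elementLike };

console.log(survivesJsonRoundTrip(stock));
console.log(survivesJsonRoundTrip(elementLike));
```

The plain array passes the check while the element-like object does not, which mirrors why returning the element itself from page.evaluate() is not useful.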

I use this function to extract React props; it copes with the random characters at the end of the React event handler key. If you are not sure which childIndex to use, navigate to the element with the React Developer Tools Chrome extension.
const extractProps = async (elementHandle, childIndex) => {
  // getProperties() resolves to a Map of property name -> JSHandle
  const elementHandlerProperties = await elementHandle.getProperties()
  for (const [key, reactEventHandler] of elementHandlerProperties) {
    // The key ends in a random suffix, so match on the prefix only
    if (key.startsWith("__reactEventHandler")) {
      const children = await reactEventHandler.getProperty("children")
      const child = await children.getProperty(childIndex.toString())
      return child.getProperty("props")
    }
  }
  return null
}
Usage:
const selector = ".some-class"
const elementHandle = await page.$(selector);
let reactProps = await extractProps(elementHandle, 1)
let prop1 = await reactProps.getProperty("prop1")
console.log(await prop1.jsonValue())
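If the props you need are plain data, the same prefix lookup can also run entirely inside the page, so that only the final value crosses back to Node. This is a minimal sketch, assuming the element carries a key beginning with __reactEventHandlers as in the question (other React versions may use a different prefix, hence the parameter). readReactChildProps and fakeElement are hypothetical names.

```javascript
// Hypothetical helper: find the React-generated key by prefix and return the
// props of one child. It uses no outer variables, so it could be handed to
// page.$eval(selector, readReactChildProps, childIndex).
function readReactChildProps(element, childIndex, prefix = '__reactEventHandlers') {
  const key = Object.keys(element).find((name) => name.startsWith(prefix));
  if (!key) return null;
  const children = element[key].children;
  // children can be a single node rather than an array, so normalize it.
  const child = [].concat(children)[childIndex];
  return child && child.props ? child.props : null;
}

// Plain object standing in for a DOM element, so the sketch runs in Node.
const fakeElement = {
  __reactEventHandlers$abc123: {
    children: [{ props: {} }, { props: { record: { Stock: [1, 2, 3] } } }],
  },
};

console.log(readReactChildProps(fakeElement, 1).record.Stock);
```

When used through page.$eval, the returned props still have to be serializable; if they contain functions or nested React elements, it is safer to return only the specific field you need.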

Related

Firestore DocumentSnapshot.data() returns undefined, but in the console it definitely has data and it works for other documents

In this Firebase Function I'm getting two DocumentSnapshots, the first works fine, I can get the data (emailNonce) from the db, but the second DocumentSnapshot somehow has no data, the object is there, I can see it in the logs, but calling .data() on it returns undefined:
const addRentalFct = async (data, context) => {
  // this works:
  const secretsRef = db.collection('user-secrets').doc('Yv3gZU8TeJTixl0njm7kUXXpvhc2');
  const secretsSnap = await secretsRef.get();
  const dbNonce = secretsSnap.data().emailNonce;
  functions.logger.log('got the dbNonce: ', dbNonce);
  // this doesn't work, but it's the same logic as above:
  const boxesSecretsRef = db.collection('box-secrets').doc('CB8lNQ8ZUnv4FDT6ZXGW');
  const boxSecretsSnap = await boxesSecretsRef.get();
  functions.logger.log('got the boxSecretsSnap: ', boxSecretsSnap);
  functions.logger.log('got the boxSecretsSnap.data(): ', boxSecretsSnap.data());
  const boxPassword = boxSecretsSnap.data().password;
  functions.logger.log('the box secret is: ', boxPassword);
  ...
}
The DB:
box-secrets collection
user-secrets:
(the secrets are from my dev environment)
The problem was that I copied the ID for the new document from an already existing document in the console like this:
A space was automatically added in front. When I created the new doc, the space was not visible, but I could create another doc with the same ID without the leading space. Here you can see that it's not that obvious; it looks like there are two docs with the exact same ID:
With both documents present, the Firebase function didn't find either of them. I had to delete both and re-add the document without the space; then it worked.
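A defensive check along these lines can make this kind of mismatch easier to spot. This is a sketch under assumptions: the snapshot is expected to expose an exists boolean and a data() method, as the Admin SDK's DocumentSnapshot does, and describeDocId and readField are hypothetical helper names.

```javascript
// Hypothetical helper: flag IDs that carry invisible leading/trailing whitespace.
function describeDocId(id) {
  const trimmed = id.trim();
  return { id, trimmed, hasStrayWhitespace: trimmed !== id };
}

// Hypothetical helper: read one field, failing loudly when the document is
// missing instead of letting snap.data() return undefined further down.
function readField(snap, field) {
  if (!snap.exists) {
    throw new Error('Document not found; check the ID for stray whitespace');
  }
  return snap.data()[field];
}

// Fake snapshots so the sketch runs without Firestore.
const found = { exists: true, data: () => ({ password: 'dev-only' }) };
const missing = { exists: false, data: () => undefined };

console.log(describeDocId(' CB8lNQ8ZUnv4FDT6ZXGW'));
console.log(readField(found, 'password'));
```

Checking exists before calling data() turns a confusing "cannot read property of undefined" into an explicit error that points at the document ID.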

How to scrape followers of instagram account with node.js, cheerio and InstAuto/Puppeteer

I'm trying to make a program that creates a list of the accounts a certain user follows and vice versa. After the Instagram Graph API shut down it became a hard task. I got to the point where I have the correct div selected, but the JavaScript command just somehow doesn't work. The exact same command entered in the browser console gives a nice array, but here I get undefined, no matter which method I use: cheerio and jQuery, or vanilla JS with document.querySelectorAll. Can you help me out?
Code:
//scrape followers
await page.goto('https://www.instagram.com/fabiawdizlu/followers/');
await waitFor(5000);
const html2 = await page.content();
await waitFor(5000);
const $2 = cheerio.load(html2);
const followersList2 = $2('._aacl._aaco._aacw._adda._aacx._aad7._aade').eq(0).text();
console.log(followersList2);
const follow3 = page.evaluate(() => {
  var f3 = document.querySelectorAll('_aacl _aaco _aacw _adda _aacx _aad7 _aade')[0];
  return f3;
}).then((f3) => {
  // console.log(f3.eq(1).text());
  // console.log(f3.eq(2).text());
  // console.log(f3.eq(3).text());
  // console.log(f3.eq(4).text());
  console.log(f3)
  // for (let i = 0; i < 10; i++) {
  //   console.log(f3[i].innerText);
  // }
})
The above is one of many methods I tried. A for loop doesn't work, jQuery/cheerio's eq(i) doesn't work (it displays the user at a particular index, but doesn't give me the array I want), and page.evaluate doesn't work. Maybe I'm doing something wrong; it's my second Node project.
Thanks for your time, cheers,
Maciej
With cheerio you can't execute JavaScript. I think you should use Playwright, which will execute JavaScript and load the data dynamically.
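Separately from the choice of tool, the selector string passed to document.querySelectorAll in the question has no leading dots, so each token is read as a tag name rather than a class. Below is a small sketch of turning a copied class attribute into a compound class selector. classesToSelector is a hypothetical helper, and the class names are the ones from the question.

```javascript
// Hypothetical helper: "a b c" (a class attribute) -> ".a.b.c" (a CSS selector
// matching elements that carry all of those classes).
function classesToSelector(classAttribute) {
  return classAttribute
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((name) => `.${name}`)
    .join('');
}

console.log(classesToSelector('_aacl _aaco _aacw _adda _aacx _aad7 _aade'));
```

The resulting string can be passed into page.evaluate as an argument, and the page function should return strings such as innerText rather than the elements themselves.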

page.evaluate() with document.querySelectorAll() returns undefined or empty array

I am trying to scrape the web responses from this site https://chat.kuki.ai/ using Puppeteer. I have tried using page.$eval and page.$$eval. I've also tried this, https://www.javaer101.com/en/article/17934751.html
and,
Puppeteer page.evaluate querySelectorAll return empty objects
and,
https://github.com/puppeteer/puppeteer/issues/489.
Each time, I get either an undefined object or an empty array.
My current code is:
const botResponses = await page.evaluate((sel) => {
  let elements = Array.from(document.querySelectorAll(sel));
  let responses = elements.map(element => {
    return element.innerText;
  })
  return responses;
}, ".pb-chat-bubble pb-chat-bubble__bot");
The code returns an empty list. The selector in the code is a valid selector and you can check on the website to confirm. Any help is appreciated!
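Have you tried using page.$$eval? Its callback receives an array of all matching elements, so map over it:
const botResponses = await page.$$eval(".pb-chat-bubble__bot", els => els.map(el => el.innerText))
It does the same thing you're doing above with less code. Note that the second class in your selector is missing its leading dot, so pb-chat-bubble__bot is parsed as a tag name; the selector here matches on the class instead.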
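The callback given to page.$$eval receives an array of every matching element, not a single element, which is why a mapping step is needed. The sketch below isolates that step in a plain function so it can be run in Node against stand-in objects. collectInnerText and the sample bubbles are hypothetical.

```javascript
// Hypothetical helper: the shape of a page.$$eval callback. It takes the whole
// list of matched elements and returns an array of plain strings.
function collectInnerText(elements) {
  return Array.from(elements, (element) => element.innerText);
}

// Plain objects standing in for DOM elements.
const fakeBubbles = [{ innerText: 'Hi there!' }, { innerText: 'How are you?' }];

console.log(collectInnerText(fakeBubbles));

// Reading innerText off the array itself, as a single-element callback would:
console.log(fakeBubbles.innerText); // undefined
```

Because the function uses no outer variables, it could be passed straight to Puppeteer, for example as page.$$eval(selector, collectInnerText).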

Can't access innerText property using Puppeteer - .$$eval and .$$ is not yielding results - JavaScript

I am working on a web scraper that searches Google for certain things and then pulls text from the result page, and I am having an issue getting Puppeteer to return the text I need. What I want to return is an array of strings.
Let's say I have a couple nested divs within a div, and each has text like so:
<div class='mainDiv'>
  <div>Mary Doe </div>
  <div> James Dean </div>
</div>
In the DOM, I can do the following to get the result I need:
document.querySelectorAll('.mainDiv')[0].innerText.split('\n')
This yields: ["Mary Doe", "James Dean"].
I understand that Puppeteer doesn't return NodeLists, and instead it uses JSHandles, but I still can't figure out how to get any information using the prescribed methods. See below for what I have tried in Puppeteer and the corresponding console output:
In every scenario, I do await page.waitFor('selector') to start.
Scenario 1 (using .$$eval()):
const genreElements = await page.$$eval('div.mainDiv', el => el);
console.log(genreElements) // []
Scenario 2 (using evaluate):
function extractItems() {
  const extractedElements = document.querySelectorAll('div.mainDiv')[0].innerText.split('\n')
  return extractedElements
}
let items = await page.evaluate(extractItems)
console.log(items) // UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'innerText' of undefined
Scenario 3 (using evaluateHandle):
const selectorHandle = await page.evaluateHandle(() => document.querySelectorAll('div.mainDiv'))
const resultHandle = await page.evaluate(x => x[0], selectorHandle)
console.log(resultHandle) // undefined
Any help or guidance on how I am implementing or how to achieve what I am looking to do is much appreciated. Thank you!
Use page.$$eval() or page.evaluate():
You can use page.$$eval() or page.evaluate() to run Array.from(document.querySelectorAll()) within the page context and map() the innerText of each element to the result array:
const names_1 = await page.$$eval('.mainDiv > div', divs => divs.map(div => div.innerText));
const names_2 = await page.evaluate(() => Array.from(document.querySelectorAll('.mainDiv > div'), div => div.innerText));
Note: Keep in mind that if you use Puppeteer to automate searches on Google, you may be temporarily blocked and end up with an "Unusual traffic from your computer network" notice, requiring you to solve a reCAPTCHA. This may break your web scraper, so proceed with caution.
Try it like this:
let names = await page.evaluate(() => [...document.querySelectorAll('.mainDiv div')].map(div => div.innerText))
That way you can test the function body in the Chrome console first.
Using page.$eval:
const names = await page.$eval('.mainDiv', (element) => {
  return element.innerText
});
Here the element is retrieved by selector and directly passed to the function to be evaluated.
Using page.evaluate:
const namesElem = await page.$('.mainDiv');
const names = await page.evaluate(namesElem => namesElem.innerText, namesElem);
This is basically the first method split up into two steps. The interesting part is that ElementHandles can be passed as arguments in page.evaluate() and can be evaluated like JSHandles.
Note that for simplicity and clarification I used the methods for retrieving single elements. But page.$$() and page.$$eval() work the same way while selecting multiple elements and returning an array instead.
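When a single innerText string is returned, as in the page.$eval variant, it still has to be split into the array of names the question asks for. Here is a minimal sketch of that post-processing step. splitLines is a hypothetical helper, and the sample string only imitates the markup from the question.

```javascript
// Hypothetical helper: split an innerText string into trimmed, non-empty lines.
function splitLines(innerText) {
  return innerText
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Imitates the text of the two nested divs from the question.
const sample = 'Mary Doe \n James Dean ';

console.log(splitLines(sample)); // [ 'Mary Doe', 'James Dean' ]
```

Doing the split on the Node side keeps the page function trivial; it could equally be done inside the callback before returning.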

Is it possible to modify an element in the DOM with Puppeteer before creating a screenshot?

I ran into an issue where I have a fairly simple Node process that captures a screenshot. Is it possible to change the innerText of an HTML element using Puppeteer, just before the screen capture is acquired?
I have had success with using Puppeteer to type text in authentication fields and with logging into a site, but I was wondering if there is a similar method that would let me change the text in a specific element (using id or class name).
Example of the screen capture code I'm using:
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://google.com')
  await page.screenshot({path: 'google.png'})
  await browser.close()
})()
In this example, I would be interested in knowing if I can change the text content of an element such as the div with the ID 'lga' ... adding a text string for example.
Is that possible with Puppeteer?
Otherwise, it works great. I just need to insert some text into the page I'm performing a screenshot of. I'm using the command-line only on a Ubuntu 16.04 machine, and Node version 9, Puppeteer version 1.0.0.
You can do that before taking the screenshot:
await page.evaluate(() => {
  let dom = document.querySelector('#id');
  dom.innerHTML = "change to something"
});
page.$eval()
You can use page.$eval() to change the innerText of an element before taking a screenshot:
await page.$eval('#example', element => element.innerText = 'Hello, world!');
await page.screenshot({
  path: 'google.png',
});
In addition to the excellent answer above, it is important to note that you can't access variables as you normally would expect in the evaluate function. In other words, this won't work:
const selector = '#id'
await page.evaluate(() => {
  let dom = document.querySelector(selector)
  dom.innerHTML = "change to something"
});
You can solve this problem by passing variables to the evaluate function. For example:
const selector = '#id'
await page.evaluate((s) => {
  let dom = document.querySelector(s)
  dom.innerHTML = "change to something"
}, selector);
In the above example I used s as the parameter, but passed in the value stored in selector. Those could be the same variable name, but I wanted to illustrate that the outer variable is not directly used.
If you need to pass in multiple values, you can pass them as additional arguments to page.evaluate() or bundle them in an array:
const selector = '#id'
const newInnerHTML = "change to something"
await page.evaluate(([selector, newInnerHTML]) => {
  let dom = document.querySelector(selector)
  dom.innerHTML = newInnerHTML
}, [selector, newInnerHTML]);
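The reason outer variables are not visible is that the function is turned into source text and evaluated in the page, so its closure does not travel with it. The following Node-only sketch imitates that with toString() and new Function. It is an analogy rather than Puppeteer's actual implementation, and runInFreshScope and demo are hypothetical names.

```javascript
// Hypothetical stand-in for "send this function to the page": rebuild the
// function from its source text, which drops every variable it closed over.
function runInFreshScope(fn, ...args) {
  const source = fn.toString();
  const rebuilt = new Function(`return (${source})`)();
  return rebuilt(...args);
}

function demo() {
  const selector = '#id';

  // Relying on the closure: the rebuilt function cannot see `selector`.
  let closureResult;
  try {
    closureResult = runInFreshScope(() => selector);
  } catch (err) {
    closureResult = err.name;
  }

  // Passing the value as an argument works, just like with page.evaluate().
  const argResult = runInFreshScope((s) => s, selector);

  return { closureResult, argResult };
}

console.log(demo());
```

The first call fails with a ReferenceError because the rebuilt function has lost its closure, while the second receives the value explicitly and returns it.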
