I'm using the following in Puppeteer to try to return the number in the inner text of the element below. I've tried many different ways but keep getting an empty object returned. What am I doing wrong?
if (await page.$('.s-pagination-item.s-pagination-next.s-pagination-button.s-pagination-separator') !== null) {
var lastPageNumber = await page.evaluate(() => document.querySelector('s-pagination-item.s-pagination-disabled'), a => a.innerText);
} else {
var lastPageNumber = 1;
}
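page.evaluate() runs a single function in the page, and a DOM element cannot be serialized back to Node, which is why you end up with an empty object. To get the innerText value, read it inside that function (this assumes the class selector needs a leading dot, like the one in your if check):
var lastPageNumber = await page.evaluate(
  () => document.querySelector('.s-pagination-item.s-pagination-disabled').innerText
);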
I'm using puppeteer to scrape some data off of a website, but all of my selections for a certain element are always undefined.
const tempFunction = await page.evaluate(() => {
let a = document.querySelectorAll(".flex.flex-wrap.w-100.flex-grow-0.flex-shrink-0.ph2.pr0-xl.pl4-xl.mt0-xl.mt3")
let container = document.querySelector(".flex.flex-wrap.w-100.flex-grow-0.flex-shrink-0.ph2.pr0-xl.pl4-xl.mt0-xl.mt3")
let b = container.getElementsByClassName("mb1 ph1 pa0-xl bb b--near-white w-33")
return b
})
For some reason this code always returns undefined, but similar code works fine.
const checkData = await page.evaluate(() =>{
let tempArray = []
let element = document.querySelectorAll('.weather-block')
tempArray.push(element[0].innerText)
return tempArray
})
Even when trying to use specific selectors or IDs, I only get undefined. I'm not sure where to go from here.
Hi, I have retrieved data (the hawkers collection) from Firebase using getDocs().
After that I put each hawker's data as an object in an array called allStall, as shown in the screenshot of the console log below.
Question 1 - How do I access each individual object in my allStall array? I tried to use .map() to access each of them, but I am getting nothing.
Do note that I already have data inside my allStall array, see screenshot above.
[Update] map doesn't work in the code below because the field is stallname, not stallName. In addition, the function needs async + await if it is called from another function.
Question 2 - Why is there [[Prototype]]: Array(0) in my allStall array?
export /*Soln add async*/function getAllStall(){
var allStall = [];
try
{
/*Soln add await */getDocs(collection(db, "hawkers")).then((querySnapshot) =>
{
querySnapshot.forEach((doc) =>
{
var stall = doc.data();
var name = stall.stallname;
var category = stall.category;
var description = stall.description;
var stallData = {
stallName:name,
stallCategory:category,
stallDescription:description
};
allStall.push(stallData);
});});
console.log(allStall);
//Unable to access individual object in Array of objects
allStall.map(stall =>{console.log(stall.stallName);});}
catch (e) {console.error("Error get all document: ", e);}
return allStall;
}
In my main js file, i did the following:
useEffect(/*Soln add await*/() =>
{
getAllStall();
/*Soln:replace the statement above with the code below
const allStall = await getAllStall();
allStall.map((stall)=>console.log(stall.stallname));
*/
}
);
You are getting nothing because allStall is still empty when you read it: you are not waiting for the promise to be fulfilled.
Try this:
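export const getAllStall = () => getDocs(collection(db, "hawkers"))
  .then((querySnapshot) =>
    // QuerySnapshot has no map(); iterate over its docs array instead
    querySnapshot.docs.map((doc) =>
    {
      // The Firestore field is "stallname" (all lowercase)
      const {stallname, category, description} = doc.data();
      return {
        stallName: stallname,
        stallCategory: category,
        stallDescription: description
      };
    })
  );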
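Then change the useEffect like this:
useEffect(() =>
{
  // The effect callback itself should not be async, so await inside an inner function
  const loadStalls = async () =>
  {
    const allStats = await getAllStall();
    console.log(allStats);
    allStats.forEach((stall) => console.log(stall));
  };
  loadStalls();
}
, []); // empty dependency array: run once after the first render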
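The timing problem described in this answer can be reproduced without Firebase. The following is only a sketch: fakeGetDocs and the sample stall names are made up for illustration and stand in for getDocs(collection(db, "hawkers")).

```javascript
// Hypothetical stand-in for getDocs(): resolves a little later with two fake documents.
function fakeGetDocs() {
  return new Promise((resolve) =>
    setTimeout(
      () => resolve([{ stallname: "Noodle House" }, { stallname: "Satay Corner" }]),
      10
    )
  );
}

// Mirrors the original bug: the function returns before the promise settles.
function getAllStallBroken() {
  const allStall = [];
  fakeGetDocs().then((docs) => docs.forEach((doc) => allStall.push(doc)));
  return allStall; // still empty at this point
}

// Waits for the data before building and returning the array.
async function getAllStallFixed() {
  const docs = await fakeGetDocs();
  return docs.map((doc) => ({ stallName: doc.stallname }));
}

console.log(getAllStallBroken().length); // 0 at the time of the call

getAllStallFixed().then((stalls) => {
  console.log(stalls.map((stall) => stall.stallName));
});
```

The broken version does fill its array eventually, which is probably why the console screenshot shows data when the array is expanded later, even though .map() found nothing at the moment it ran.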
A very big thanks to R4ncid, you have been an inspiration!
And thank you all who commented below!
I managed to get it done with async and await. Latest update: I figured out what was wrong with my previous code too. I commented the solution in my question, which is adding async to the function and await to getDocs.
Also, map doesn't work in the code above because the field is stallname, not stallName. In addition, the function needs async + await if it is called from another function.
Helper function
export async function getAllStall(){
const querySnapshot = await getDocs(collection(db, "hawkers"));
var allStall = [];
querySnapshot.forEach(doc =>
{
var stall = doc.data();
var name = stall.stallname;
var category = stall.category;
var description = stall.description;
var stallData = {
stallName:name,
stallCategory:category,
stallDescription:description
};
allStall.push(stall);
}
);
return allStall;
}
Main JS file
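useEffect(() =>
{
  // The effect callback itself should not be async, so await inside an inner function
  const loadStalls = async () =>
  {
    const allStall = await getAllStall();
    allStall.map((stall)=>console.log(stall.stallname));
  };
  loadStalls();
}
, []); // empty dependency array: run once after the first render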
Hurray
When I execute js in CEFSharp using EvaluateScriptAsync(), I can return primitive types like string or array. For example, the following works:
var result = await Browser.EvaluateScriptAsync("Array.from(document.getElementsByTagName('input')).map(element => element.value)");
if (result.Success && result.Result != null)
{
dynamic values = result.Result;
foreach (dynamic value in values)
{
MessageBox.Show($"Value is: {value}");
}
}
But once I try to get a DOM element, either one or a list of, I get null:
var result = await Browser.EvaluateScriptAsync("Array.from(document.getElementsByTagName('input'))");
// `result.Success` is `true`, `result.Result` is `null`
I thought that CEFSharp only knows how to marshal primitive types, but object literals also work:
var result = await Browser.EvaluateScriptAsync("({ a: 1, b: 'hello' })");
if (result.Success && result.Result != null)
{
dynamic obj = result.Result;
MessageBox.Show($"{{ a: {obj.a}, b: {obj.b} }}");
}
So it turns out that the only thing CEFSharp doesn't know how to marshal is DOM objects.
Why? Is there a solution or workaround out there?
Firstly it's important to understand that Javascript is executed in the render process. The result of EvaluateScriptAsync is effectively a DTO, we create an object that represents the result of executing the script.
It's not currently possible to return a HTMLElement or any object that has a cyclic reference.
If we look at HTMLElement as a specific example, it will have a parentElement/parentNode, and the parent has children which include the node itself. You would also end up walking the whole DOM tree.
CEF has very limited type support for its CefV8Value type, so it's hard to do anything too fancy. See this.
We could potentially add an extension method that wraps the user script in an IIFE and does some instanceof HTMLElement style type checking to return a trimmed down representation of the HTML element. See this for an example of how I'm fudging support for returning a Promise.
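To make the cyclic reference problem and the "trimmed down representation" idea concrete, here is a rough JavaScript sketch. It is not CefSharp's implementation: toPlainElement and the fake node objects are hypothetical, and plain objects stand in for real DOM elements so the snippet runs outside a browser.

```javascript
// Hypothetical helper: copy only plain, non-cyclic fields from an element-like object.
function toPlainElement(el) {
  return {
    tagName: el.tagName,
    id: el.id,
    value: el.value === undefined ? null : el.value,
    childCount: el.children ? el.children.length : 0,
  };
}

// Fake nodes standing in for DOM elements, linked in a parent/child cycle.
const parent = { tagName: "FORM", id: "login", children: [] };
const input = { tagName: "INPUT", id: "user", value: "alice", children: [], parentNode: parent };
parent.children.push(input);

let cyclicError = null;
try {
  JSON.stringify(input); // walks input -> parentNode -> children -> input again
} catch (err) {
  cyclicError = err;
}
console.log(cyclicError instanceof TypeError); // true

// The trimmed copy has no back references, so it serializes without trouble.
console.log(JSON.stringify(toPlainElement(input)));
```

In a real page, the same kind of mapping could be done inside the script string passed to EvaluateScriptAsync, much like the question's working example that maps each input to its value.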
As an alternative to using JavaScript you can now use CefSharp.Dom which is an asynchronous library for accessing the DOM.
It's freely available on
// Add using CefSharp.Dom to access CreateDevToolsContextAsync and related extension methods.
await using var devToolsContext = await chromiumWebBrowser.CreateDevToolsContextAsync();
// Get element by Id
// https://developer.mozilla.org/en-US/docs/Web/API/Document/querySelector
var element = await devToolsContext.QuerySelectorAsync<HtmlElement>("#myElementId");
//Strongly typed element types (this is only a subset of the types mapped)
var htmlDivElement = await devToolsContext.QuerySelectorAsync<HtmlDivElement>("#myDivElementId");
var htmlSpanElement = await devToolsContext.QuerySelectorAsync<HtmlSpanElement>("#mySpanElementId");
var htmlSelectElement = await devToolsContext.QuerySelectorAsync<HtmlSelectElement>("#mySelectElementId");
var htmlInputElement = await devToolsContext.QuerySelectorAsync<HtmlInputElement>("#myInputElementId");
var htmlFormElement = await devToolsContext.QuerySelectorAsync<HtmlFormElement>("#myFormElementId");
var htmlAnchorElement = await devToolsContext.QuerySelectorAsync<HtmlAnchorElement>("#myAnchorElementId");
var htmlImageElement = await devToolsContext.QuerySelectorAsync<HtmlImageElement>("#myImageElementId");
var htmlTextAreaElement = await devToolsContext.QuerySelectorAsync<HtmlTextAreaElement>("#myTextAreaElementId");
var htmlButtonElement = await devToolsContext.QuerySelectorAsync<HtmlButtonElement>("#myButtonElementId");
var htmlParagraphElement = await devToolsContext.QuerySelectorAsync<HtmlParagraphElement>("#myParagraphElementId");
var htmlTableElement = await devToolsContext.QuerySelectorAsync<HtmlTableElement>("#myTableElementId");
// Get a custom attribute value
var customAttribute = await element.GetAttributeAsync<string>("data-customAttribute");
//Set innerText property for the element
await element.SetInnerTextAsync("Welcome!");
//Get innerText property for the element
var innerText = await element.GetInnerTextAsync();
//Get all child elements
var childElements = await element.QuerySelectorAllAsync("div");
//Change CSS style background colour
await element.EvaluateFunctionAsync("e => e.style.backgroundColor = 'yellow'");
//Type text in an input field
await element.TypeAsync("Welcome to my Website!");
//Click The element
await element.ClickAsync();
// Simple way of chaining method calls together when you don't need a handle to the HtmlElement
var htmlButtonElementInnerText = await devToolsContext.QuerySelectorAsync<HtmlButtonElement>("#myButtonElementId")
.AndThen(x => x.GetInnerTextAsync());
//Event Handler
//Expose a function to javascript, functions persist across navigations
//So only need to do this once
await devToolsContext.ExposeFunctionAsync("jsAlertButtonClick", () =>
{
_ = devToolsContext.EvaluateExpressionAsync("window.alert('Hello! You invoked window.alert()');");
});
var jsAlertButton = await devToolsContext.QuerySelectorAsync<HtmlButtonElement>("#jsAlertButton");
//Wire up the click event listener to call our exposed function
_ = jsAlertButton.AddEventListenerAsync("click", "jsAlertButtonClick");
//Get a collection of HtmlElements
var divElements = await devToolsContext.QuerySelectorAllAsync<HtmlDivElement>("div");
foreach (var div in divElements)
{
// Get a reference to the CSSStyleDeclaration
var style = await div.GetStyleAsync();
//Set the border to 1px solid red
await style.SetPropertyAsync("border", "1px solid red", important: true);
await div.SetAttributeAsync("data-customAttribute", "123");
await div.SetInnerTextAsync("Updated Div innerText");
}
//Using standard array
var tableRows = await htmlTableElement.GetRowsAsync().ToArrayAsync();
foreach (var row in tableRows)
{
var cells = await row.GetCellsAsync().ToArrayAsync();
foreach (var cell in cells)
{
var newDiv = await devToolsContext.CreateHtmlElementAsync<HtmlDivElement>("div");
await newDiv.SetInnerTextAsync("New Div Added!");
await cell.AppendChildAsync(newDiv);
}
}
//Get a reference to the HtmlCollection and use async enumerable
//Requires Net Core 3.1 or higher
var tableRowsHtmlCollection = await htmlTableElement.GetRowsAsync();
await foreach (var row in tableRowsHtmlCollection)
{
var cells = await row.GetCellsAsync();
await foreach (var cell in cells)
{
var newDiv = await devToolsContext.CreateHtmlElementAsync<HtmlDivElement>("div");
await newDiv.SetInnerTextAsync("New Div Added!");
await cell.AppendChildAsync(newDiv);
}
}
I'm using phantom 6.0.3 to scrape a web page. Here is the initial setup:
(async function () {
const instance = await phantom.create(['--ignore-ssl-errors=yes', '--load-images=no', '--web-security=false'], {logLevel: 'error'});
const page = await instance.createPage();
await page.on('onResourceRequested', function (requestData) {
console.info('Requesting', requestData.url);
});
const url = // Some url
const status = await page.open(url);
const content = await page.evaluate(function () {
return document.querySelector('ul > li');
});
const contentLength = content.length // 5
//Code Block 2 goes here
})();
So far everything works fine. It was able to successfully determine that the length of the content is 5 (there are 5 li items). So what I want to do now is get the innerText of each of those li elements... and this is where I get my issue.
I've tried using a for loop to retrieve the innerText of each li element, but it always returns null. Here's what I've tried:
//Code Block 2:
for (let i = 0; i < contentLength; i++) {
const info = await page.evaluate(function () {
const element = document.querySelector('ul > li');
return element[i].innerText;
});
console.log(info); // this returns null 5 times
}
I don't know what's going on. I can give a specific index to return, such as return element[3].innerText, and this will give me the correct innerText, but I can't get this working in a loop.
PhantomJS evaluates the function in a different context, so it's not aware of the variable i from the enclosing loop.
You should pass i to the evaluate function in order to forward it to the browser process:
for (let i = 0; i < contentLength; i++) {
  const info = await page.evaluate(function (index) { // notice index argument
    // querySelectorAll returns every matching li; querySelector would return only the first
    const elements = document.querySelectorAll('ul > li');
    return elements[index].innerText;
  }, i); // notice second argument is i
  console.log(info);
}
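The scoping issue can be simulated in plain Node. This is only a sketch: runInFreshScope is a made-up helper that rebuilds a function from its source text, which roughly imitates what happens when only the code (and not its surrounding variables) is shipped to the page. PhantomJS reports the failure differently (the question saw null rather than an exception).

```javascript
// Hypothetical helper: rebuild a function from its source text and call it.
// Only the text travels, so variables captured from the enclosing scope are lost.
function runInFreshScope(fn, ...args) {
  const rebuilt = new Function("return (" + fn.toString() + ");")();
  return rebuilt(...args);
}

function collect() {
  const results = [];
  for (let i = 0; i < 3; i++) {
    let withoutArg;
    try {
      // Relies on the outer i, which does not exist in the rebuilt function's scope.
      withoutArg = runInFreshScope(function () { return i * 10; });
    } catch (err) {
      withoutArg = err.name;
    }
    // Receives the value explicitly, so it works in any scope.
    const withArg = runInFreshScope(function (index) { return index * 10; }, i);
    results.push({ withoutArg, withArg });
  }
  return results;
}

console.log(collect());
```

Each entry shows the captured-variable version failing with a ReferenceError while the version that takes index as an argument returns 0, 10, and 20.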
Does anybody know how to get the innerHTML or text of an element? Or even better; how to click an element with a specific innerHTML? This is how it would work with normal JavaScript:
var found = false
$(selector).each(function() {
if (found) return;
else if ($(this).text().replace(/[^0-9]/g, '') === '5') {
$(this).trigger('click');
found = true
}
});
Thanks in advance for any help!
This is how I get innerHTML:
const innerHtml = await page.$eval(selector, (element) => {
  return element.innerHTML
})
Returning innerHTML of an Element
You can use the following methods to return the innerHTML of an element:
page.$eval()
const inner_html = await page.$eval('#example', element => element.innerHTML);
page.evaluate()
const inner_html = await page.evaluate(() => document.querySelector('#example').innerHTML);
page.$() / elementHandle.getProperty() / jsHandle.jsonValue()
const element = await page.$('#example');
const element_property = await element.getProperty('innerHTML');
const inner_html = await element_property.jsonValue();
Clicking an Element with Specific innerHTML
You can use the following methods to click on an element based on the innerHTML that is contained within the element:
page.$$eval()
await page.$$eval('.example', elements => {
const element = elements.find(element => element.innerHTML === '<h1>Hello, world!</h1>');
element.click();
});
page.evaluate()
await page.evaluate(() => {
const elements = [...document.querySelectorAll('.example')];
const element = elements.find(element => element.innerHTML === '<h1>Hello, world!</h1>');
element.click();
});
page.evaluateHandle() / elementHandle.click()
const element = await page.evaluateHandle(() => {
const elements = [...document.querySelectorAll('.example')];
const element = elements.find(element => element.innerHTML === '<h1>Hello, world!</h1>');
return element;
});
await element.click();
This should work with puppeteer:)
const page = await browser.newPage();
const title = await page.evaluate(el => el.innerHTML, await page.$('h1'));
You can leverage page.$$(selector) to get all your target elements and then use page.evaluate() to get the content (innerHTML), then apply your criteria. It should look something like:
const targetEls = await page.$$('yourFancySelector');
for(let target of targetEls){
const iHtml = await page.evaluate(el => el.innerHTML, target);
if (iHtml.replace(/[^0-9]/g, '') === '5') {
await target.click();
break;
}
}
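The matching criterion used in the loop above can be pulled out into a plain function and checked outside the browser. This is just a sketch; indexOfDigits and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: index of the first HTML string whose digits equal `wanted`, or -1.
function indexOfDigits(htmlStrings, wanted) {
  return htmlStrings.findIndex((html) => html.replace(/[^0-9]/g, "") === wanted);
}

const samples = ["<span>Page 4</span>", "<span>Page 5</span>"];
console.log(indexOfDigits(samples, "5")); // 1

// Digits inside tag names count too: "<h1>5</h1>" reduces to "151", not "5".
console.log(indexOfDigits(["<h1>5</h1>"], "5")); // -1
```

The second call hints at why reading textContent or innerText instead of innerHTML may be the safer choice when the markup itself can contain digits.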
I can never get .innerHTML to work reliably. I always do the following:
let els = await page.$$('selector');
for (let el of els) {
let content = await (await el.getProperty('textContent')).jsonValue();
}
Then you have your text in the 'content' variable.
With regard to this part of your question...
"Or even better; how to click an element with a specific innerHTML."
There are some particulars around innerHTML, innerText, and textContent that might give you grief. You can work around them with a sufficiently loose XPath query in Puppeteer v1.1.1.
Something like this:
const el = await page.$x('//*[text()[contains(., "search-text-here")]]');
await el[0].click({
button: 'left',
clickCount: 1,
delay: 50
});
Just keep in mind that you will get an array of ElementHandles back from that query. So... the particular item you are looking for might not be at [0] if your text isn't unique.
Options passed to .click() aren't necessary if all you need is a single left-click.
You can simply write it as below (no await is needed inside the callback):
const center = await page.$eval('h2.font-34.uppercase > strong', e => e.innerHTML);
<div id="innerHTML">Hello</div>
var myInnerHtml = document.getElementById("innerHTML").innerHTML;
console.log(myInnerHtml);