Return array with CSS query in Puppeteer - javascript

I am trying to get an array of strings from a website with puppeteer.
I have the CSS Selector td[class*="myClass"] which selects all the elements I want, and I then want to get the .innerText of every one of them.
So with that selector I know I can select 20 elements (I have tested this with <style> td[class*="myClass"] {background: red}</style>).
I am trying to get an array of their .innerText with:
console.log(await page.$eval('td[class*="myClass"]', element => element.innerText));
however this returns only the first element.
Does anyone know how I can select all 20 elements and not only the first one?
Thank you!

await page.$$eval('td[class*="good_to_col"]', element => element.innerText);
An array doesn't have an innerText property. If you want to return an array of 20 innerTexts, you need to map it:
await page.$$eval('td[class*="good_to_col"]', elements => elements.map(
element => element.innerText)
);
It's mentioned in the docs.

For convenience/semantic reasons, Puppeteer provides you with two different eval functions.
page.$eval() runs document.querySelector and, thus, only passes the first element it finds to your pageFunction.
page.$$eval() internally runs document.querySelectorAll and, thus, passes multiple elements to your pageFunction and returns an array.
A word of caution:
Some people may wrongly assume that the second argument passed to $$eval is an iterator function that is invoked once for each result of the CSS selector. However, a bit counter-intuitively, $$eval passes an array as the argument to your function, so any mapping needs to be done on this array. So, referring to the OP, instead of page.$$eval('td[class*="good_to_col"]', element => element.innerText) use page.$$eval('td[class*="good_to_col"]', elements => elements.map(e => e.innerText)) and it should work.
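The difference can be seen without a browser. The sketch below is a plain JavaScript stand-in, not Puppeteer itself: fakeEval and fakeEvalAll are hypothetical helpers that only mimic how $eval hands one element to the callback while $$eval hands over the whole array, and the cells objects are made-up substitutes for the matched td elements.

```javascript
// Stand-ins for three matched <td> elements (made-up data for illustration).
const cells = [
  { innerText: "one" },
  { innerText: "two" },
  { innerText: "three" },
];

// Hypothetical helper mimicking $eval: the callback gets the first match only.
const fakeEval = (matches, pageFunction) => pageFunction(matches[0]);

// Hypothetical helper mimicking $$eval: the callback gets the whole array, once.
const fakeEvalAll = (matches, pageFunction) => pageFunction(Array.from(matches));

const first = fakeEval(cells, (element) => element.innerText);
const wrong = fakeEvalAll(cells, (element) => element.innerText);
const texts = fakeEvalAll(cells, (elements) => elements.map((e) => e.innerText));

console.log(first); // "one"
console.log(wrong); // undefined, because an array has no innerText
console.log(texts); // [ 'one', 'two', 'three' ]
```

The callback passed to the array-style helper runs exactly once, which is why the mapping has to happen inside it.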

Related

addEventListener function not working in javascript

I am trying to check, with a console.log, whether a function that is meant to be triggered by a click actually runs, but the console message never comes up.
<script src="e-com.js" async></script>
This is how I linked the JS file in the head.
Remove
this is the link I want the event on
let removeItem=document.getElementById("remove")
for (i=0; i<removeItem.length; i++){
let remove = removeItem.addEventListener("click", function(){
console.log("Clicked");
})
}
This is the js function
The issue seems to be that you are trying to loop over the result of getElementById(), which doesn't return an iterable.
To fix it, you just need to remove the loop. The code below should work correctly^.
const removeItem = document.getElementById("remove");
removeItem.addEventListener("click", () => {
console.log("Clicked!");
});
Remove
According to MDN Web Docs:
The Document method getElementById() returns an Element object representing the element whose id property matches the specified string.
As it states, getElementById() returns an Element, which is just an object, in short. This is why you cannot iterate over it.
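This also explains why the original loop did nothing rather than throwing an error. A rough sketch, using a plain object as a stand-in for the Element that getElementById() returns (the object is made up for illustration and is not a real DOM node):

```javascript
// Plain-object stand-in for a single link Element, which has no length property.
const removeItem = { id: "remove" };

let iterations = 0;
// removeItem.length is undefined, and 0 < undefined is false,
// so the loop body never runs and no listener would ever be attached.
for (let i = 0; i < removeItem.length; i++) {
  iterations++;
}

console.log(removeItem.length); // undefined
console.log(iterations); // 0
```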
If you want to listen to multiple elements with a for loop, you need to change a few things.
First, you can't use an id, as an id must be unique and can only be used once in an HTML document. Therefore, to get multiple elements, you need to use the class attribute. See an example of the class attribute in use below.
<div class="division">Division!</div>
The class attribute can be used by any HTML element.
So, to get all of the elements with a given class and iterate over them, you need to use either the getElementsByClassName() method or the querySelectorAll() method.
The main differences are that getElementsByClassName() takes a class name and returns a live HTMLCollection, while querySelectorAll() takes a CSS selector (such as ".remove") and returns a static NodeList.
If you were to use querySelectorAll(), your code would look like this^.
const removeItem = document.querySelectorAll(".remove");
Array.from(removeItem).forEach((ele) => {
ele.addEventListener("click", () => {
console.log("Clicked!");
});
});
Remove
Remove
Remove
Both of these solutions should work correctly, but they depend on what you need to accomplish.
^ The code has been modified slightly.
.getElementById() doesn't return an array since you're not allowed to have more than one element with a single id.
Therefore, the for loop failed.
Try
let removeItem=document.getElementById("remove")
removeItem.addEventListener("click", function(){
console.log("Clicked");
})

How to find descendant elements of a selected element or ElementHandle?

In other browser automation frameworks there tends to be a "find" method that lets the user find all descendants of a given element that match a selector, for example:
https://www.w3schools.com/jquery/jquery_traversing_descendants.asp
$(document).ready(function(){
$("div").find("span");
});
The above method returns all elements that match span descending from the given div.
If I have an ElementHandle, is there a way I could find all descendants that match a given selector using Puppeteer?
Yes, you can use the elementHandle.$ function. Quote from the docs:
The method runs element.querySelector within the page. If no element matches the selector, the return value resolves to null.
Code sample:
const elementHandle = await page.$('div');
const elementInsideHandle = await elementHandle.$('span');
If you want to query multiple elements, there is also the $$ function to run element.querySelectorAll inside the page.

Puppeteer - using querySelectorAll() to access elements in a dynamic HTML environment

Searching the documentation for querySelectorAll() I got this:
A NodeList object, representing all elements in the document that
matches the specified CSS selector(s). The NodeList is a static
collection, meaning that changes in the DOM has NO effect in the
collection. Throws a SYNTAX_ERR exception if the selector(s) is
invalid.
What if you delete some elements, and then new elements appear (with the same class name as the old ones) due to dynamic HTML?
But now you want to access the new ones.
Will I be able to rerun querySelectorAll(), or will the old elements still be in the array?
You will of course be able to rerun querySelectorAll(), and each time it will return the elements that currently match the query. The important thing is that you have to rerun it to get the new elements.
An example:
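(async () => {
  // ... usual create browser and page stuff
  // Re-run the query once per second; each call returns the <p> elements present at that moment.
  // Note: an array is always truthy, so this loop polls until the process is stopped.
  while (true) {
    const items = await page.$$eval('p', pp => pp.map(p => p.textContent));
    console.log(items);
    // page.waitFor() has been removed from recent Puppeteer versions; a plain timeout works everywhere.
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
})();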
page.$$eval runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.
Events can only be attached to existing elements, and the handlers go away with the elements, so you have to attach events to new elements after you create them. A good way to avoid the reassignment is to use jQuery's .on(). Either that, or write a function that you simply run again.

Why is [0] needed for getElementsByClassName to work when there's only one class to select?

I tried using let modal = document.getElementsByClassName('modal') to select an element with the class modal. It only worked after using node selection to select the first result: let modal = document.getElementsByClassName('modal')[0]. I know the method Document.getElementsByClassName() returns child elements which have all of the given class names, but there's only one element in my HTML with that class. I confirmed this in my browser's dev tools by using var x = document.getElementsByClassName('modal').length and logging the value of x to the console (it returned 1 as expected).
Could someone explain why node selection is needed in this case?
Edit: My question is different from the one marked as a duplicate. In that question, they are asking about the difference between methods that return a single element and those that return an array-like collection of elements. I'm already aware that getElementsByClassName returns an array-like collection of elements, whereas the other methods return one element. My question is why you need to specify the index in a case where all elements of a class are returned but there's only one element with that class (so one item, the correct item, is returned).
document.getElementsByClassName will return a list of elements with the given class name. Even if there is only one element with that class name, it will be wrapped in a collection (an HTMLCollection), which is why you have to use [0].
It is needed because getElementsByClassName returns an HTMLCollection and not a single element.
To get the item without using [0], use a query selector instead; this will give you the first matching item instead of a collection of items.
let modal = document.querySelector('.modal')
console.log(modal)
document.getElementsByClassName
will return an array-like collection (an HTMLCollection) of the elements that have this class, even when only one element matches.
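To illustrate why the index is needed even for a single match, here is a sketch that uses an array-like plain object in place of the HTMLCollection. Both modalElement and collection are made-up stand-ins, not real DOM objects.

```javascript
// Stand-in for the one element that has the class "modal".
const modalElement = { className: "modal", style: { display: "none" } };

// Stand-in for the collection: array-like, with a length and indexed entries.
const collection = { 0: modalElement, length: 1 };

console.log(collection.length); // 1
console.log(collection.style); // undefined, the collection itself is not the element
console.log(collection[0].style.display); // "none"
```

A collection with one entry is still a collection, so element properties such as style only exist on collection[0].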

What does this code using [].filter.call do?

I’m learning javascript and trying to write code that sorts a list, removing elements if they meet certain criteria.
I found this snippet that seems promising but don't have a clue how it works so I can adapt it to my needs:
list = document.getElementById("raffles-list").children; // Get a list of all open raffles on page
list = [].filter.call(list, function(j) {
if (j.getAttribute("style") === "") {
return true;
} else {
return false;
}
});
Can you guys help me learn by explaining what this code block does?
It's getting all the children of the "raffles-list" element, then returning a filtered list of those that contain an empty "style" attribute.
The first line is pretty self-evident - it just retrieves the children from the element with id "raffles-list".
The second line is a little more complicated; it's taking advantage of two things: that [], which is an empty array, is really just an object with various methods/properties on it, and that the logic on the right hand side of the equals sign needs to be evaluated before "list" gets the new value.
Uses a blank array in order to call the "filter" method
Tells the filter to use list as the array to filter, and uses function(j) to do the filtering, where j is the item in the list being tested
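If the item has a style attribute that is empty, i.e. style="" with no declarations, it returns true. An item with no style attribute at all makes getAttribute return null, so it is filtered out.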
Edit:
As per OP comment, [].filter is the filter method that every array inherits from Array.prototype; like everything else, it is an object with various properties. In this case filter is a method - see here. Normally you call it on an array and pass an anonymous function that does the testing; here the author has used .call in order to specify an arbitrary object to run the filter on. That is necessary because list is an HTMLCollection, which is array-like but has no filter method of its own.
Array-like objects are JavaScript objects that resemble arrays but differ from them; for example, they don't inherit the methods on Array.prototype. If you want to use array methods on them (for example, as in the question, to filter the children of an element), you can do it this way:
Array.prototype.functionName.call(arrayLikeObject, [arg1, [arg2 ...]]);
In the question, the array-like object is an HTML element collection, and the filter keeps the items whose style attribute is empty.
list is assigned a collection of elements that are children of the raffles-list element
list is then reassigned by filtering its elements as follows
the filter method is borrowed from an empty array and invoked through call. The formal parameters for call are this (which is the list) and optionally further arguments (in this case a callback function)
The callback function receives a formal parameter j and is called for each element
If the element's value for the style attribute is empty the element is retained in the array. Otherwise it is discarded.
At the end, list should contain all elements whose style attribute has an empty value
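Because filter only needs a length and indexed entries on its this value, the pattern can be tried outside the browser. The sketch below assumes a made-up array-like object, fakeChildren, whose entries imitate elements with a getAttribute method; makeChild is a hypothetical helper, not a DOM API.

```javascript
// Hypothetical helper: builds a stand-in element whose getAttribute returns
// the stored value, or null when the attribute is absent (as the DOM does).
const makeChild = (name, style) => ({
  name,
  getAttribute: (attr) => (attr === "style" && style !== undefined ? style : null),
});

// Array-like stand-in for element.children: indexed entries plus length, no filter method.
const fakeChildren = {
  0: makeChild("a", ""),
  1: makeChild("b", "display: none;"),
  2: makeChild("c"), // no style attribute at all
  length: 3,
};

// Borrow Array.prototype.filter and run it with fakeChildren as `this`.
const kept = [].filter.call(fakeChildren, (j) => j.getAttribute("style") === "");

// Equivalent spelling: convert to a real array first, then filter.
const keptToo = Array.from(fakeChildren).filter((j) => j.getAttribute("style") === "");

console.log(typeof fakeChildren.filter); // "undefined"
console.log(kept.map((j) => j.name)); // [ 'a' ]
console.log(keptToo.map((j) => j.name)); // [ 'a' ]
```

Either spelling returns a real array, so further array methods can be chained on the result.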
