Puppeteer - using querySelectorAll() to access elements in a dynamic HTML environment

Searching the documentation for querySelectorAll(), I got this:
A NodeList object, representing all elements in the document that
matches the specified CSS selector(s). The NodeList is a static
collection, meaning that changes in the DOM has NO effect in the
collection. Throws a SYNTAX_ERR exception if the selector(s) is
invalid.
What if you delete some elements, and then new elements appear (with the same class name as the old ones) because the HTML is dynamic?
Now you want to access the new ones.
Will I be able to rerun querySelectorAll()? Or will the old elements still be in the array?

You can of course rerun querySelectorAll(), and each time it will return the elements that currently match the query. The important thing is that you have to rerun it to get the new elements.
An example:
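(async () => {
  // ... usual create browser and page stuff
  let items = [];
  // Each iteration runs a fresh query, so elements added since the last pass are included.
  // An array is always truthy, so this loop keeps polling until the script is stopped.
  while (items = await page.$$eval('p', pp => pp.map(p => p.textContent))) {
    console.log(items);
    // page.waitFor() is deprecated in later Puppeteer releases; a plain timeout does the same job.
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
})();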
page.$$eval runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.
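If it helps, the re-query loop above can be pulled into a small helper that stops after a fixed number of rounds. This is only a sketch: collectSnapshots, rounds and delayMs are made-up names, and query stands in for whatever call you use to read the page (for example a page.$$eval call).

```javascript
// Hypothetical helper: re-runs `query` a fixed number of times and keeps every result.
// `query` is assumed to be an async function, e.g.
//   () => page.$$eval('p', ps => ps.map(p => p.textContent))
async function collectSnapshots(query, rounds, delayMs) {
  const snapshots = [];
  for (let i = 0; i < rounds; i++) {
    // Every call is a fresh query, so added or removed elements show up here.
    snapshots.push(await query());
    if (i < rounds - 1) {
      // Wait between rounds to give the page time to change.
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  return snapshots;
}
```

Comparing consecutive snapshots then shows which elements appeared or disappeared between rounds.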

Event listeners can only be attached to elements that already exist, so you have to attach them after the elements are created. A good way to avoid reattaching them is to use jQuery's .on() with a selector argument, which handles the event on a parent element instead. Either that, or write a function that attaches the listeners and run it again whenever new elements are added.
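For reference, the same idea works without jQuery: attach one listener to a parent that never gets replaced and check where the event came from. A minimal sketch, where delegate and the '.item' selector are made-up names for illustration:

```javascript
// Hypothetical helper: wraps `handler` so it only runs when the event
// started inside an element matching `selector`.
function delegate(selector, handler) {
  return function (event) {
    const target = event.target;
    // Element.closest() returns the target itself or its nearest matching ancestor.
    const match = target && typeof target.closest === 'function'
      ? target.closest(selector)
      : null;
    if (match) {
      handler(match, event);
    }
  };
}

// In a browser (not run here), attach it once to a stable parent:
// document.addEventListener('click', delegate('.item', el => console.log('Clicked', el)));
```

Because the listener lives on the parent, elements added later are covered without any reattachment.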

Related

addEventListener function not working in javascript

I am trying to check, with a console.log, whether a function that is meant to be triggered by a click actually runs, but the console message never comes up.
<script src="e-com.js" async></script>
This is how I linked the JS file in the head.
Remove
This is the link I want the event on.
let removeItem=document.getElementById("remove")
for (i=0; i<removeItem.length; i++){
let remove = removeItem.addEventListener("click", function(){
console.log("Clicked");
})
}
This is the js function

The issue seems to be that you are trying to loop over the result of getElementById(), which doesn't return an iterable.
To fix it, you just need to remove the loop. The code below should work correctly^.
const removeItem = document.getElementById("remove");
removeItem.addEventListener("click", () => {
console.log("Clicked!");
});
Remove
According to MDN Web Docs:
The Document method getElementById() returns an Element object representing the element whose id property matches the specified string.
As it states, getElementById() returns an Element, which is just an object, in short. This is why you cannot iterate over it.
If you wanted to listen to multiple objects with a for loop, you need to change a few things.
First, you can't use an id, as the id attribute is unique, and can only be used once in an HTML document. Therefore, to get multiple elements, you need to use the class attribute. See an example of the class attribute in use below.
<div class="division">Division!</div>
The class attribute can be used by any HTML element.
So, to get all of the elements with that class and iterate over them, you need to use either the getElementsByClassName() method, or the querySelectorAll() method.
The main difference between the two is that getElementsByClassName() returns a live HTMLCollection, which updates as the DOM changes, while querySelectorAll() returns a static NodeList, a snapshot taken when the call is made.
If you were to use querySelectorAll(), your code would look like this^.
// The leading dot makes this a class selector; "remove" alone would look for <remove> tags.
const removeItem = document.querySelectorAll(".remove");
Array.from(removeItem).forEach((ele) => {
  ele.addEventListener("click", () => {
    console.log("Clicked!");
  });
});
Remove
Remove
Remove
Both of these solutions should work correctly, but they depend on what you need to accomplish.
^ The code has been modified slightly.

.getElementById() doesn't return an array since you're not allowed to have more than one element with a single id.
Therefore, the for loop failed.
Try
let removeItem=document.getElementById("remove")
removeItem.addEventListener("click", function(){
console.log("Clicked");
})

Clicking an element inside element handle

How can an element inside of existing handle be clicked?
Considering that there's a reference to foo handle:
const fooHandle = await page.$('.foo');
Currently foo selector is repeated:
page.click('.foo .bar');
I'd like to select .bar based on fooHandle reference instead of repeating .foo selector. In other places obtaining a handle with nested elements involves more complex checks that cannot be done with simple selector.
I'm using Playwright but I assume the solution is the same as for Puppeteer due to API similarity.

In Playwright, $ and $$ methods are available on ElementHandle objects too:
elementHandle.$(selector)
selector <string> - A selector to query for.
returns: <Promise<null|ElementHandle>>
The method finds an element matching the
specified selector in the ElementHandle's subtree. [...]
If no elements match the selector, returns null.
So your code can be organized like this:
const fooHandle = await page.$('.foo');
const barHandle = await fooHandle.$('.bar');
await barHandle.click();
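One thing to watch: $ resolves to null when nothing matches, so barHandle.click() would throw a fairly unhelpful TypeError. A small wrapper can make that failure clearer. This is a sketch, and clickInside is a made-up name; it only assumes the handle has $(selector) and click(), which both Playwright and Puppeteer element handles provide.

```javascript
// Hypothetical helper: clicks the first descendant of `parentHandle` matching `selector`.
async function clickInside(parentHandle, selector) {
  if (!parentHandle) {
    throw new Error('Parent handle is null');
  }
  // Searches only within the parent handle's subtree.
  const child = await parentHandle.$(selector);
  if (!child) {
    throw new Error(`No element matching "${selector}" inside the handle`);
  }
  await child.click();
  return child;
}

// Usage (not run here):
// await clickInside(await page.$('.foo'), '.bar');
```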

Return array with CSS query in Puppeteer

I am trying to get an array of strings from a website with puppeteer.
I have the CSS Selector td[class*="myClass"] which selects all the elements I want, and I then want to get the .innerText of every one of them.
So with that selector I know I can select 20 elements (I have tested this with <style> td[class*="myClass"] {background: red}</style>).
I am trying to get an array of their .innerText with:
console.log(await page.$eval('td[class*="myClass"]', element => element.innerText));
however this returns only the first element.
Does anyone know how I can select all the 20 elements and not only the first one?
Thank you!

await page.$$eval('td[class*="good_to_col"]', element => element.innerText);
An array doesn't have innerText property. If you want to return an array of 20 innerTexts, you need to map it:
await page.$$eval('td[class*="good_to_col"]', elements => elements.map(
element => element.innerText)
);
It's mentioned in the docs.

For convenience/semantic reasons, Puppeteer provides you with two different eval functions.
page.$eval() runs document.querySelector and, thus, only passes the first element it finds to your pageFunction.
page.$$eval() internally runs document.querySelectorAll and, thus, passes multiple elements to your pageFunction and returns an array.
A word of caution:
Some people wrongly assume that the second argument passed to $$eval is an iterator function that is invoked once for each element matched by the CSS selector. However, a bit counter-intuitively, $$eval passes a single array as the argument to your function, so any mapping needs to be done on this array. So, referring to the OP, instead of page.$$eval('td[class*="good_to_col"]', element => element.innerText) use page.$$eval('td[class*="good_to_col"]', elements => elements.map(e => e.innerText)) and it will work.
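To make the array-in, array-out shape concrete, the page function can be written as a named function and tried against plain objects first. This is just a sketch; extractInnerTexts is a made-up name, and passing it to $$eval should work as long as it does not rely on variables from the surrounding scope, since the function is serialized and run inside the page.

```javascript
// Hypothetical page function: receives the whole array of matched elements
// (which is what $$eval hands over) and returns an array of strings.
function extractInnerTexts(elements) {
  return elements.map(element => element.innerText);
}

// With Puppeteer (not run here):
// const texts = await page.$$eval('td[class*="myClass"]', extractInnerTexts);
```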

How to find descendant elements of a selected element or ElementHandle?

In other browser automation frameworks there tends to be a "find" method that allows the user to find all descendants of a given element that match the selector, for example:
https://www.w3schools.com/jquery/jquery_traversing_descendants.asp
$(document).ready(function(){
$("div").find("span");
});
The above method returns all elements that match span descending from the given div.
If I have an ElementHandle, is there a way I could find all descendants that match a given selector using Puppeteer?

Yes, you can use the elementHandle.$ function. Quote from the docs:
The method runs element.querySelector within the page. If no element matches the selector, the return value resolves to null.
Code sample:
const elementHandle = await page.$('div');
const elementInsideHandle = await elementHandle.$('span');
If you want to query multiple elements, there is also the $$ function to run element.querySelectorAll inside the page.
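As a rough illustration of the $$ variant, here is a sketch that collects the text of every matching descendant. textsInside is a made-up name; it assumes the handles expose $$(selector) and evaluate(fn), as Puppeteer element handles do.

```javascript
// Hypothetical helper: returns the textContent of every descendant of
// `parentHandle` that matches `selector`.
async function textsInside(parentHandle, selector) {
  // $$ resolves to an array of element handles (empty if nothing matches).
  const handles = await parentHandle.$$(selector);
  // evaluate() runs the callback in the page with the element as its argument.
  return Promise.all(handles.map(handle => handle.evaluate(el => el.textContent)));
}

// Usage (not run here):
// const spans = await textsInside(await page.$('div'), 'span');
```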

When in JS you need to identify element by array order?

Ok so I finally have a code example to show this!
if ($('#Snowsports-row')[0].classList.contains("hidden") == false) {
$('#snowsports-only').removeClass("hidden")
}
The code works ONLY as written above, i.e., if the [0] were moved to the second line and removed from the first line, or if it were present/absent in both lines, it would fail.
I understand the output difference...
$('#Snowsports-row')
=> [<div>...]
$('#Snowsports-row')[0]
=> <div>...
...but I'm not understanding under what circumstances you're OK to get an array of element(s) and in which you need to tease out the exact element.
THANKS FOR ALL ANSWERS! Very clearly helped me to figure out that the problem may have been confusing JS/jQuery methods. Final version:
if ($('#Snowsports-row').hasClass("hidden") == false) {
$('#snowsports-only').removeClass("hidden")
}

The .classList property is not universally supported (it is missing in MSIE 9.0, for example), so it's not fully portable, although where it exists it's fast.
Since every ID in a document is supposed to be unique, and since calling removeClass for a class that isn't present is harmless, just replace your entire call with:
$('#Snowsports-row').removeClass('hidden')
Or better yet, if that class means what I think it does, use .hide() and let jQuery do its job for you, potentially animating the transition in the process.
Alternatively, if you actually wanted to stick with using DOM and classList, you should use the .remove() method that classList already supports:
document.getElementById('Snowsports-row').classList.remove('hidden')
although there's a minor disadvantage in that this code will throw a TypeError if that element isn't found (since .getElementById will return null) whereas jQuery silently ignores calls made on empty selections. Note that getElementById takes the bare id, without the leading '#'.
As for the meta-question - you use [n] if you want to access the single DOM element at position n within the jQuery object, as you've done when you use .classList.
You use .eq(n) to obtain a jQuery object representing that DOM element, e.g. if you want to apply jQuery methods to that (single) element.
If there's only a single element, or you want the jQuery method to apply to every matching element, just call the method directly on the selector, as I've done above.

First off, by using jQuery for what it's good at, you can replace this:
if ($('#Snowsports-row')[0].classList.contains("hidden") == false) {
$('#snowsports-only').removeClass("hidden")
}
with this:
$('#Snowsports-row').removeClass("hidden");
Your first block of code does the following:
With $('#Snowsports-row'), make a jQuery object that contains all DOM elements that match the selector '#Snowsports-row'.
Then reach into the jQuery object with [0] and get the first DOM object in that jQuery object.
Then, use a property/method on that DOM element to determine if a class exists on that DOM element with your .classList.contains("hidden") reference.
Then, if you find that class, remove it.
A jQuery object contains inside it an array of DOM elements. If you call a method on the jQuery object itself like:
$('.tableRows').html("hello");
Then, you are asking jQuery to operate on ALL the DOM elements inside the jQuery object. You must use jQuery methods, not DOM methods.
If, on the other hand, you want to use a method such as .classList.contains(), that is only a method on an actual DOM element. That isn't a jQuery method. So, you have to reach inside of the jQuery object to get a specific DOM element out of it. That's what the [0] does. It reaches into the jQuery object and gets the first DOM element from its internal data structure. Once you have that DOM element, you can then use any DOM element methods on that DOM object.
FYI, if you ever want to get just the first DOM element from a jQuery object, but want the result to be a jQuery object, not just a DOM element, instead of [0], you can use .eq(0) like this:
$('#Snowsports-row').eq(0).removeClass("hidden");
Now, in this specific case, this is never necessary because $('#Snowsports-row') cannot ever contain more than one DOM element because internally jQuery will only return the first matching DOM element when you are searching for a ID value (since there's never supposed to be more than one matching element with the same ID).
Just keep in mind that a DOM element and a jQuery object are completely different types of objects with different methods on them. What makes it slightly confusing is that a jQuery object contains an internal list of DOM elements. But, if the object you are operating on is a jQuery object, then you can only call jQuery methods on it. If you reach into the jQuery object and pull out a DOM element, then you can only call DOM methods on it.
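If you are ever unsure which of the two you are holding, a small normalizing helper can make the distinction explicit. This is only a sketch; toDomElement is a made-up name, and it relies on the fact that jQuery objects carry a jquery property holding the library's version string.

```javascript
// Hypothetical helper: accepts either a jQuery object or a plain DOM element
// and returns a plain DOM element (or undefined for an empty jQuery object).
function toDomElement(obj) {
  if (obj && typeof obj.jquery === 'string') {
    // Same as the [0] lookup discussed above: the first wrapped DOM element.
    return obj[0];
  }
  // Anything else is assumed to already be a DOM element (or null/undefined).
  return obj;
}

// Usage (not run here):
// const el = toDomElement($('#Snowsports-row'));
// if (el && !el.classList.contains('hidden')) { /* DOM methods are safe here */ }
```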

First of all, ids must be unique, so if you have more than one #Snowsports-row element you can run into problems.
In your question, you are mixing jQuery code with plain JavaScript code.
This:
if ($('#Snowsports-row')[0].classList.contains("hidden")) {
...
}
means that you get the first instance of #Snowsports-row (remember that it is better if there is only one element with this id), but you then pull the DOM object (plain JavaScript) out of the jQuery selection. You can do the same check in jQuery like this:
$('#Snowsports-row').hasClass("hidden")
See more:
https://api.jquery.com/hasclass/
https://developer.mozilla.org/es/docs/Web/API/Element/classList

Sure, because you are operating over a list. That said, you're mixing up jQuery and plain JavaScript code. If you would like to use the same style in both lines, you can basically drop jQuery altogether and write something like this:
var el = document.getElementById('Snowsports-row');
if (el.classList.contains('hidden')){
el.classList.remove('hidden');
}

In the first line you're selecting one specific DOM element, whereas in the second line you are selecting ALL elements in the DOM that fit that selector and removing the "hidden" class from all of them. Basically, checking whether an element has a class with classList can only be done on a single element (that's why you need the index, to pick a given element), but jQuery lets you remove the class from every element inside a list (hence your second line).

Use jQuery's .eq() function. So:
var el = $('#Snowsports-row').eq(0);
if (el.hasClass("hidden")) {
  el.removeClass("hidden");
}
There's also no harm in calling removeClass on an element that might not have that class... so:
$('#Snowsports-row').eq(0).removeClass('hidden');
