I have a puppeteer script that inputs some text into a field, submits the query, and processes the results.
Currently, the script only processes 1 search term at a time, but I need it to be able to process an array of items consecutively.
I figured I would just put the code in a loop (see code below), however, it just types in all the items from the array at once into the field and doesn't execute the code block for each search term:
for (const search of searchTerms) {
await Promise.all([
page.type('input[name="q"]', 'in:spam ' + search + String.fromCharCode(13)),
page.waitForNavigation({
waitUntil: 'networkidle2'
})
]);
const count = await page.evaluate((sel) => {
return document.querySelectorAll(sel)[1].querySelectorAll('tr').length;
}, 'table[id^=":"]');
if (count > 0) {
const more = await page.$x('//span[contains(#class, "asa") and contains(#class, "bjy")]');
await more[1].click();
await page.waitFor(1250);
const markRead = await page.$x('//div[text()="Mark all as read"]');
await markRead[0].click();
const selectAll = await page.$x('//span[#role="checkbox"]');
await selectAll[1].click();
const move = await page.$x('//div[#act="8"]');
await move[0].click();
await page.waitFor(5000);
}
}
I tried using a recursion function from Nodejs Synchronous For each loop
I also tried using a function generator with yields, as well as promises and even tried the eachSeries function from the async package from this post Nodejs Puppeteer Wait to finish all code from loop
Nothing I tried was successful. Any help would be appreciated, thanks!
There is no way to visit two websites at same time with same tab. You can try it on your browser to make sure.
Jokes aside, if you want to search multiple items, you have to create a page or tab for that.
for (const search of searchTerms) {
const newTab = await browser.newPage()
// other modified code here
}
... wait that will still search one by one. But if you use a map with concurrency limit, it will work well.
We can use p-all for this.
const pAll = require('p-all');
const actions = []
for (const search of searchTerms) {
actions.push(async()=>{
const newTab = await browser.newPage()
// other modified code here
})
}
pAll(actions, {concurrency: 2}) // <-- set how many to search at once
So we are looping thru each term, and adding a new promise on the action list. Adding functions won't take much time. And then we can run the promise chain.
You will still need to modify the code above to have what you desire.
Peace!
Related
I have two tables with users, where each id for one user is same in both tables (don't ask why I have two user tables).
At some point, I need to filter users from table 1, and if certain condition is true, I store a promise (deleting request) for each user into (let's call it) tableOnePromises. I do the same for table 2.
In order to empty table 2, I MUST first empty table one due to some requirements.
this is what I did:
let tableOnePromises = [];
let tableTwoPromises = [];
tableOne.forEach(item => {
if(item.deactivated) {
const tableOneDeleted = supabase
.from("table-one")
.delete()
.match({id: item.id});
tableOnePromises.push(tableOneDeleted);
const tableTwoDeleted = supabase
.from("table-two")
.delete()
.match({id: item.id});
tableOnePromises.push(tableTwoDeleted);
}
});
await Promise.all(tableOnePromises).then(() => {
return Promise.all(tableTwoPromises)
}).catch(err => console.log(err));
Assuming the code using await is inside an async function (or at the top level of a module), the syntax is correct, but it's probably not what I'd use (in general, avoid mixing async/await with explicit callbacks via .then and .catch), and separately it's probably not working quite as you expect (this is borne out by your saying that your code was failing to delete from table-two).
For any particular id value, your code starts deleting from table-one and then immediately starts deleting from table-two without waiting for the deletion in table-one to complete:
// STARTS the deletion but doesn't wait for it to finish
const tableOneDeleted = supabase
.from("table-one")
.delete()
.match({id: item.id});
// ...
// Starts deleting from `table-two`, even though the item may still be in `table-one`
const tableTwoDeleted = supabase
.from("table-two")
.delete()
.match({id: item.id});
Remember that a promise is just a way of observing an asynchronous process; by the time you have the promise, the process it's observing is already underway.¹ So even though you don't wait for the table-two promises until later, you start the table-two deletions immediately.
...I MUST first empty table one due to some requirements...
If by "empty" you mean just that you have to ensure you've done the delete for a particular id on table-one before doing it on table-two, you need to wait for the table-one deletion to be completed before starting the table-two deletion. I'd put that in a function:
async function deleteItem(id) {
await supabase
.from("table-one")
.delete()
.match({id});
await supabase
.from("table-two")
.delete()
.match({id});
}
Then the code becomes:
const promises = [];
for (const {deactivated, id} of tableOne) {
if (deactivated) {
promises.push(deleteItem(id));
}
}
await Promise.all(promises); // With the `try`/`catch` if desired
...or if it's okay to make two passes through the array:
await Promise.all( // With the `try`/`catch` if desired
tableOne.filter(({deactivated}) => deactivated)
.map(({id}) => deleteItem(id))
);
¹ "...by the time you have the promise, the process it's observing is already underway." That's the normal case. There is unfortunately a popular document DB library that doesn't start its work on something until/unless you call then on the promise for it. But that's an exception, and an anti-pattern.
I'm new to puppeteer and I'm trying to click on a selector from a dropdown menu the MR element here
I've tried using await page.click('.mat-option ng-star-inserted mat-active');
and also
await page.select('#mat-option-0');
here is my code, would anyone be able to help me fix this issue and understand how to resolve it in the future? I'm not to sure what methods to be using with each elelement, I think it's every time I introduce a class with spaces in the name could that be the issue?
and does anyone have any best practices for when codings things like this?
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.game.co.uk/en/-2640058?cm_sp=NintendoFormatHub-_-Accessories-_-espot-_-PikaCase');
await console.log('Users navigated to site :)');
await page.waitFor(2300);
await page.click('.cookiePolicy_inner--actions');
await page.waitFor(1000);
await page.click('.addToBasket');
await page.waitFor(1300);
await page.click('.secure-checkout');
await page.waitFor(2350);
await page.click('.cta-large');
await page.waitFor(1200);
await page.goto('https://checkout.game.co.uk/contact');
await page.waitFor(500);
await page.click('.mat-form-field-infix');
await page.waitForSelector('.ng-tns-c17-1 ng-trigger ng-trigger-transformPanel mat-select-panel mat-primary');
await page.click('.mat-option ng-star-inserted mat-active');
})();
There are a couple of issues with the script, let's see them:
you are using waitFor() with a number of miliseconds, this is brittle because you never know if perhaps some action will take longer, and if it does not, you will waste time; you can substitute these waits with waitForSelector(); in fact, if you use VSCode (and perhaps other IDEs), it will notify you that this method is deprecated, don't ignore these warnings:
when I use DevTools, no element is returned for .mat-option ng-star-inserted mat-active selector, but I can find the desired element with #mat-option-0 selector, or I can use the longer version, but have to use a dot (.) before each class and delete spaces between them like so .mat-option.ng-star-inserted.mat-active, you can see a CSS reference here, the point is that with spaces, it looks for descendants, which is not what you want
These two changes should give you what you need, this is a result when running on my side, you can see that Mr. has been selected:
I got there with this script:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch({ headless: false });
const page = await browser.newPage();
await page.goto('https://www.game.co.uk/en/-2640058?cm_sp=NintendoFormatHub-_-Accessories-_-espot-_-PikaCase');
await console.log('Users navigated to site :)');
await page.waitForSelector('.cookiePolicy_inner--actions');
await page.click('.cookiePolicy_inner--actions');
await page.waitForSelector('.addToBasket');
await page.click('.addToBasket');
await page.waitForSelector('.secure-checkout');
await page.click('.secure-checkout');
await page.waitForSelector('.cta-large');
await page.click('.cta-large');
await page.goto('https://checkout.game.co.uk/contact');
await page.waitForSelector('.mat-form-field-infix');
await page.click('.mat-form-field-infix');
await page.waitForSelector('#mat-option-0');
await page.click('#mat-option-0');
})();
However, this is still not ideal because:
you handle the cookie bar with clicks, try to find a way without clicking; perhaps injecting a cookie that disables the cookie bar (if possible)
the code is one big piece that is perhaps ok for now and this example but might become unmaintainable if you keep adding lines to it; try to reuse code in functions and methods
I'm practicing with Telegram bots and puppeteer so I've decided to create a bot to order pizza from a specific website.
Once the bot has taken the order he needs to place the data he took to inputs on the page, here how it looks like:
These two fields are spans and when puppeteer clicks on the enabled one (left) he gets an input to complete. Then when the first input is done puppeteer has to do the exact same procedure with the second field: click on <span> tag, place data in input, etc.
But the thing is that there is a small-time gap between the completion of the first field and activation of the second one. My bot doesn't recognize this gap and clicks on the second field's span instantly (and of course it doesn't work).
Here's a code fragment:
await page.waitForXPath('//*[#id="select2-chosen-2"]', {visible: true})
const [secondSpan] = await page.$x('//*[#id="select2-chosen-2"]')
await secondSpan.click()
When I type node bot with this fragment I get no errors or warnings. But as I said it takes some time for the second field to activate. I've found a function to make puppeeter stop the execution of my code for a certain time period: page.waitForTimeout().
Here the example of usage in puppeteer's documentation:
const puppeteer = require('puppeteer');
(async () => {
const browser = await puppeteer.launch();
const page = await browser.newPage();
page.waitForTimeout(1000).then(() => console.log('Waited a second!'));
await browser.close();
})();
https://pptr.dev/#?product=Puppeteer&version=v10.1.0&show=api-pagewaitfortimeoutmilliseconds
Here's my case:
await page.waitForXPath('//*[#id="select2-chosen-2"]', {visible: true})
const [secondSpan] = await page.$x('//*[#id="select2-chosen-2"]')
page.waitForTimeout(1500)
await secondSpan.click()
This code also doesn't show any error, but it also doesn't click on the field. When I add await to page.waitForTimeout() I get this error:
Error: Node is either not visible or not an HTMLElement
How can I fix it?
So all I needed was to put this code:
await page.click('#s2id_home-number-modal')
Or using XPath:
const [secondSpan] = await page.$x('//*[#id="select2-chosen-2"]')
await secondSpan.click()
into .then() method, that is called after page.setTimeout(500).
All in all, it looks like this (by the way, I've changed some selectors, but it's not a big deal):
await page.waitForTimeout(500).then(async () => {
await page.click('#s2id_home-number-modal')
})
I'm new to pupetteer and I'm trying to understand how it's actually working through some examples:
So basically what I'm trying to do in this example is to extract number of views of a Youtube video. I've written a js line on the Chrome console that let me extract this information:
document.querySelector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer').innerText
Which worked well. However when I did the same with my pupetteer code he doesn't recognize the element I queried.
const puppeteer = require('puppeteer')
const getData = async () => {
const browser = await puppeteer.launch()
const page = await browser.newPage()
await page.goto('https://www.youtube.com/watch?v=T5GSLc-i5Xo')
await page.waitFor(1000)
const result = await page.evaluate(() => {
let views = document.querySelector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer').innerText
return {views}
})
browser.close()
return result
}
getData().then(value => {
console.log(value)
})
I finally did it using ytInitialData object. However I'd like to understand the reason why my first code didn't work.
Thanks
It seems that wait for 1000 is not enough.
Try your solution with https://try-puppeteer.appspot.com/ and you will see.
However if you try the following solution, you will get the correct result
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto('https://www.youtube.com/watch?v=T5GSLc-i5Xo');
await page.waitForSelector('span.view-count');
const views = await page.evaluate(() => document.querySelector('span.view-count').textContent);
console.log('Number of views: ' + views);
await browser.close();
Do not use hand made timeout to wait a page to load, unless you are testing whether the page can only in that amount of time. Differently from selenium where sometimes you do not have a choice other than using a timeout, with puppeteer you should always find some await function you can use instead of guessing a "good" timeout. As answered by Milan Hlinák, look into the page HTML code and figure out some HTML tag you can wait on, instead of using a timeout. Usually, wait for the HTML element(s) you test require in order to work properly. On you case, the span.view-count, as already answered by Milan Hlinák:
await page.waitForSelector('span.view-count');
Background
I am writing some asynchronous code in express. In one of my end points there I need to retrieve some data from firebase for 2 seperate things.
one posts some data
the other retrieves some data to be used in a calculation and another post.
These 2 steps are not dependent on one another but obviously the end result that should be returned is (just a success message to verify that everything was posted correctly).
Example code
await postData(request);
const data = await retrieveUnrelatedData(request);
const result = calculation(data);
await postCalculatedData(result);
In the code above postData will be holding up the other steps in the process even though the other steps (retrieveUnrelatedData & postCalculatedData) do not require the awaited result of postData.
Question
Is there a more efficient way to get the retrieveUnrelatedData to fire before the full postData promise is returned?
Yes, of course! The thing you need to know is that async/await are using Promises as their underlying technology. Bearing that in mind, here's how you do it:
const myWorkload = request => Promise.all([
postData(request),
calculateData(request)
])
const calculateData = async request => {
const data = await retrieveUnrelatedData(request);
const result = calculation(data);
return await postCalculatedData(result);
}
// Not asked for, but if you had a parent handler calling these it would look like:
const mainHandler = async (req, res) => {
const [postStatus, calculatedData] = await myWorkload(req)
// respond back with whatever?
}