I'm trying to have my Puppeteer script iterate through selectors.
The reason is that, depending on what I query through my script, I can get slightly different elements on the page.
Essentially, I have a page.evaluate call that does the scraping like this:
while (currentPage <= pagesToScrape) {
let newProducts = await page.evaluate(({identified}) => {
let results = [];
let items = document.querySelectorAll(
identified
);
console.log(items)
items.forEach((item) => {
var prod, price;
if (identified == selectors[0]) {
prod = item.querySelector("div>div>div>div>div>a>h3").innerText;
price = item.querySelector("div>div>div>div>div>div>span>span")
.innerText;
} else {
prod = item.querySelector("div>a>h4").innerText;
price = item.querySelector("div>div>div>div>span>span").innerText;
}
results.push({
Product: prod !== "" ? prod : "",
Price: price !== "" ? price : "",
});
});
console.log("results");
console.log(results.length);
return results;
});
product_GSH = product_GSH.concat(newProducts);
if (currentPage < pagesToScrape) {
console.log(identified)
await Promise.all([
await page.click(buttonSelector),
await page.waitForSelector(identified),
]);
}
Now before the script starts, I need to ensure I have the correct selector.
const selectors = ['div[class = "sh-dlr__list-result"',"div[class = 'sh-dgr__content'"]
//works
const chooseSelector = await page.waitForFunction((selectors) => {
for (const selector of selectors) {
if (document.querySelector(selector) !== null) {
return selector;
}
}
return false;
}, {}, selectors);
const identified = await chooseSelector.jsonValue();
console.log(identified)
The issue I'm having is that, from within the page.evaluate, I can run the identifier easily and find the correct one to use, but I need it again at the end of the loop to scrape the next page. When I try to re-assign the variable to the correct identifier inside the page.evaluate, the new value isn't picked up.
When I run this, the code runs, but I cannot change the selector used by the page.waitForSelector call inside the Promise.all at the bottom (so it works with some pages, but when it's the wrong page I can't switch the selector being chosen). This is the full code, FYI.
product_GSH = product_GSH.concat(newProducts);
if (currentPage < pagesToScrape) {
await Promise.all([
await page.click(buttonSelector),
page.waitForNavigation()
]);
}
currentPage++;
}
browser.close();
return res.send(product_GSH);
} catch (e) {
return res.send(e);
}
});
});
I'm thinking one way to solve this issue is to look at the promise.all
function and replace it with something slightly different.
Thanks for helping with this issue!
Last question, if you can help: how do I make sure that when I choose, say, 5 pages and there are only 3 pages of results, it sends the 3 pages? What I'm finding is that if I ask for more pages than exist, it doesn't send any response.
Ideally, I'm trying to have this code be able to iterate through different selectors. I've tried a bunch of different methods, and CORS errors and more aside, very lost. It would be good to get some sort of definite error from puppeteer as well!
Appreciate the help :)
You have to combine page.waitForNavigation with page.click(buttonSelector). Also, Promise.all has to receive the pending promises themselves, not values that have already been awaited, which is what your code passes:
if (currentPage < pagesToScrape) {
await Promise.all([
page.click(buttonSelector),
page.waitForNavigation()
]);
}
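To illustrate the difference, here is a minimal sketch in plain Node.js with no Puppeteer involved. The helper step is hypothetical and only stands in for calls such as page.click or page.waitForNavigation; the names sequential and concurrent are likewise made up for the example.

```javascript
// Hypothetical helper standing in for page.click() / page.waitForNavigation().
// It records when it starts and resolves after `ms` milliseconds.
const step = (name, ms, log) => {
  log.push(`${name} started`);
  return new Promise((resolve) =>
    setTimeout(() => {
      log.push(`${name} finished`);
      resolve(name);
    }, ms)
  );
};

// Awaiting inside the array: the first step finishes before the second starts,
// and Promise.all only ever sees two already-resolved values.
async function sequential(log) {
  return Promise.all([await step('click', 20, log), await step('navigation', 20, log)]);
}

// Passing the pending promises: both steps start before either one finishes.
async function concurrent(log) {
  return Promise.all([step('click', 20, log), step('navigation', 20, log)]);
}
```

With the awaited version, the wait for navigation only begins after the click has fully resolved, so the navigation it is supposed to observe may already be over by then.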
You can simplify selectors, for example
div[class = "sh-dlr__list-result"
can be
div.sh-dlr__list-result
The selector
const buttonSelector = "a[id='pnnext']>span[style='display:block;margin-left:53px']";
is wrong: you should never rely on an inline style in a selector, because it can easily be changed dynamically. Instead, you can define it like this:
const buttonSelector = "a#pnnext";
After we make these changes we'll get the proper results, for example it will output:
product_GSH.length 100
product_GSH [...]
UPDATE
If you want to handle results with fewer than pagesToScrape pages, then you have to check that buttonSelector matches something before you click it, like this:
if (currentPage < pagesToScrape && await page.$(buttonSelector) !== null) {
await Promise.all([
page.click(buttonSelector),
page.waitForNavigation()
]);
}
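Put together, the paging loop could look roughly like the sketch below. It is not the original script: scrapeAllPages and the scrapeCurrentPage callback are hypothetical names, and page is only assumed to offer $, click and waitForNavigation the way a Puppeteer Page does.

```javascript
// Sketch of the paging loop. `page` is expected to behave like a Puppeteer Page
// (it needs $, click and waitForNavigation); `scrapeCurrentPage` is a
// hypothetical callback that returns the rows for the page currently loaded.
async function scrapeAllPages(page, { pagesToScrape, buttonSelector, scrapeCurrentPage }) {
  let results = [];
  for (let currentPage = 1; currentPage <= pagesToScrape; currentPage++) {
    results = results.concat(await scrapeCurrentPage(page));

    if (currentPage === pagesToScrape) break; // requested number of pages reached
    const nextButton = await page.$(buttonSelector);
    if (nextButton === null) break; // no "next" link: this was the last page

    // Pass both pending promises so the wait starts before the click resolves.
    await Promise.all([page.waitForNavigation(), page.click(buttonSelector)]);
  }
  return results;
}
```

Because the loop ends as soon as the next button is missing, whatever has been collected so far is returned and can still be sent in the response.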
Promise.all looks like the place to solve this. I'm not the best with promise functions though
I need to access an iframe in playwright that has a name that is automatically generated.
The iframe's name is always prefixed by "__privateStripeFrame" and then a randomly generated number
How can I access the frame with page.frame({name: })?
From the docs it seems like I can't use a regular expression!
The frameSelector doesn't need to be specified by the name.
Try an xpath with contains - this works on the W3 sample page:
await page.goto('https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_iframe');
await page.frame("//iframe[contains(@title,'W3s')]");
If you want a more general approach - you also have page.frames().
That will return an array of the frames and you can iterate through and find the one you need.
This works for me:
let myFrames = page.frames();
console.log("#frames: " + myFrames.length)
myFrames.map((f) => console.log(f.name()));
(W3S is not the best demo site as there are lots of nested frames - but this outputs the top level frames that have names)
The output:
iframeResult
__tcfapiLocator
__uspapiLocator
__tcfapiLocator
__uspapiLocator
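For the Stripe case, the same iteration can be narrowed to a name prefix. The following is a minimal sketch; findFrameByNamePrefix is a hypothetical helper, and it only assumes that each frame exposes name(), as the frames returned by page.frames() do.

```javascript
// Return the first frame whose name starts with `prefix`, or null if none does.
// `frames` is the array returned by page.frames(); each entry exposes name().
function findFrameByNamePrefix(frames, prefix) {
  return frames.find((frame) => frame.name().startsWith(prefix)) ?? null;
}

// Hypothetical usage inside a Playwright script:
//   const stripeFrame = findFrameByNamePrefix(page.frames(), '__privateStripeFrame');
```

If the iframe is attached late, the lookup may need to be retried until a frame is found.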
We had the issue of multiple Stripe Elements iframes loading asynchronously and very slowly, so we wound up with this workaround to retry iterating all frames and querying for the card input fields for each, until found or timed out.
Not elegant, but it worked for us.
async function findStripeElementsIframeAsync(page: Page, timeout: number) {
const startTime = Date.now();
let stripeFrame = null;
while (!stripeFrame && Date.now() - startTime < timeout) {
const stripeIframes = await page.locator('iframe[name^=__privateStripeFrame]');
const stripeIframeCount = await stripeIframes.count();
for (let i = 0; i < stripeIframeCount; i++) {
const stripeIFrameElement = await stripeIframes.nth(i).elementHandle();
if (!stripeIFrameElement)
throw 'No Stripe iframe element handle.';
const cf = await stripeIFrameElement.contentFrame();
if (!cf)
throw 'No Stripe iframe content frame.';
// Does this iframe have a CVC input? If so, it's our guy.
// 1 ms timeout did not work, because the selector requires some time to find the element.
try {
await cf.waitForSelector('input[name=cvc]', { timeout: 200 });
stripeFrame = cf;
console.log('Found Stripe iframe with CVC input');
return stripeFrame;
} catch {
// Expected for iframes without this input.
}
}
// Give some time for iframes to load before retrying.
await new Promise(resolve => setTimeout(resolve, 200));
}
return null;
}
I'm new to the "async/await" aspect of JS and I'm trying to learn how it works.
The error I'm getting is on line 10 of the following code. I have created a Firestore database and am trying to listen for and get a certain document from the collection 'rooms'. I am trying to get the data from the doc 'joiner' and use that data to update the innerHTML of other elements.
// References and Variables
const db = firebase.firestore();
const roomRef = await db.collection('rooms');
const remoteNameDOM = document.getElementById('remoteName');
const chatNameDOM = document.getElementById('title');
let remoteUser;
// Snapshot Listener
roomRef.onSnapshot(snapshot => {
snapshot.docChanges().forEach(async change => {
if (roomId != null){
if (role == "creator"){
const usersInfo = await roomRef.doc(roomId).collection('userInfo');
usersInfo.doc('joiner').get().then(async (doc) => {
remoteUser = await doc.data().joinerName;
remoteNameDOM.innerHTML = `${remoteUser} (Other)`;
chatNameDOM.innerHTML = `Chatting with ${remoteUser}`;
})
}
}
})
})
})
However, I am getting the error:
Uncaught (in promise) TypeError: Cannot read property 'joinerName' of undefined
Similarly if I change the lines 10-12 to:
remoteUser = await doc.data();
remoteNameDOM.innerHTML = `${remoteUser.joinerName} (Other)`;
chatNameDOM.innerHTML = `Chatting with ${remoteUser.joinerName}`;
I get the same error.
My current understanding is that await will wait for the line/function to finish before moving forward, and so remoteUser shouldn't be null before trying to call it. I will mention that sometimes the code works fine, and the DOM elements are updated and there are no console errors.
My questions: Am I thinking about async/await calls incorrectly? Is this not how I should be getting documents from Firestore? And most importantly, why does it seem to work only sometimes?
Edit: Here are screenshots of the Firestore database as requested by @Dharmaraj. I appreciate the advice.
You are mixing the use of async/await and then(), which is not recommended. I propose below a solution based on Promise.all(), which helps to understand the different arrays that are involved in the code. You can adapt it with async/await and a for-of loop as @Dharmaraj proposed.
roomRef.onSnapshot((snapshot) => {
// snapshot.docChanges() Returns an array of the documents changes since the last snapshot.
// you may check the type of the change. I guess you maybe don’t want to treat deletions
const promises = [];
snapshot.docChanges().forEach(docChange => {
// No need to use a roomId, you get the doc via docChange.doc
// see https://firebase.google.com/docs/reference/js/firebase.firestore.DocumentChange
if (role == "creator") { // It is not clear from where you get the value of role...
const joinerRef = docChange.doc.ref.collection('userInfo').doc('joiner'); // docChange.doc is a snapshot; its ref property is the DocumentReference
promises.push(joinerRef.get());
}
});
Promise.all(promises)
.then(docSnapshotArray => {
// docSnapshotArray is an Array of all the docSnapshots
// corresponding to all the joiner docs corresponding to all
// the rooms that changed when the listener was triggered
docSnapshotArray.forEach(docSnapshot => {
remoteUser = docSnapshot.data().joinerName;
remoteNameDOM.innerHTML = `${remoteUser} (Other)`;
chatNameDOM.innerHTML = `Chatting with ${remoteUser}`;
})
});
});
However, what is not clear to me is how you differentiate the different elements of the "first" snapshot (i.e. roomRef.onSnapshot((snapshot) => {...}))). If several rooms change, the snapshot.docChanges() Array will contain several changes and, at the end, you will overwrite the remoteNameDOM and chatNameDOM elements in the last loop.
Or you know upfront that this "first" snapshot will ALWAYS contain a single doc (because of the architecture of your app) and then you could simplify the code by just treating the first and unique element as follows:
roomRef.onSnapshot((snapshot) => {
const roomDoc = snapshot.docChanges()[0];
// ...
});
There are a few mistakes in this:
db.collection() does not return a promise and hence await is not necessary there
forEach ignores promises so you can't actually use await inside of forEach. for-of is preferred in that case.
Please try the following code:
const db = firebase.firestore();
const roomRef = db.collection('rooms');
const remoteNameDOM = document.getElementById('remoteName');
const chatNameDOM = document.getElementById('title');
let remoteUser;
// Snapshot Listener
roomRef.onSnapshot(async (snapshot) => {
for (const change of snapshot.docChanges()) {
if (roomId != null){
if (role == "creator"){
// Build the reference to the 'joiner' doc once, then await its snapshot
const joinerRef = roomRef.doc(roomId).collection('userInfo').doc('joiner');
const doc = await joinerRef.get();
remoteUser = doc.data().joinerName;
remoteNameDOM.innerHTML = `${remoteUser} (Other)`;
chatNameDOM.innerHTML = `Chatting with ${remoteUser}`;
}
}
}
})
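As for why it only works sometimes: the error message says that doc.data() returned undefined, which is what Firestore returns when the document does not exist. One possible explanation, which is an assumption here, is that the listener sometimes fires before the 'joiner' document has been written. A minimal sketch of a guard follows; readJoinerName is a hypothetical helper, and joinerRef only needs a get() method like a Firestore DocumentReference.

```javascript
// Read joinerName from a document reference, tolerating a missing document.
// `joinerRef` is expected to expose get() like a Firestore DocumentReference.
async function readJoinerName(joinerRef) {
  const doc = await joinerRef.get();
  // data() is undefined when the document does not exist, so use optional chaining.
  return doc.data()?.joinerName ?? null;
}

// Hypothetical usage inside the listener:
//   const name = await readJoinerName(roomRef.doc(roomId).collection('userInfo').doc('joiner'));
//   if (name !== null) { remoteNameDOM.innerHTML = `${name} (Other)`; }
```

With this guard the DOM is only updated once the joiner's name is actually available.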
First of all, hello.
I'm relatively new to web development, Vue.js and JavaScript. I'm trying to implement a system that enables users to upload and vote for pictures and videos. In general the whole system worked, but because I got all of my information from the server, the objects used to show the files and their stats weren't reactive. I tried to change the way I set the properties of an object from "file['votes'] = await data.data().votes" to "file.$set('votes', await data.data().votes)". However, now I'm getting the error "TypeError: Cannot read property 'call' of undefined". I have no idea why this happens or what this error even means. After searching a lot on the internet, I couldn't find anybody with the same problem. Something must be inherently wrong with my approach.
If anybody can give me an explanation for what is happening or can give me a better way to handle my problem, I'd be very grateful.
Thanks in advance to anybody willing to try. Here is the code section I changed:
async viewVideo() {
this.videoURLS = []
this.videoFiles = []
this.videoTitels = []
var storageRef = firebase.storage().ref();
var videourl = []
console.log("try")
var listRef = storageRef.child('User-Videos/');
var firstPage = await listRef.list({
maxResults: 100
});
videourl = firstPage
console.log(videourl)
if (firstPage.nextPageToken) {
var secondPage = await listRef.list({
maxResults: 100,
pageToken: firstPage.nextPageToken,
});
videourl = firstPage + secondPage
}
console.log(this.videoURLS)
if (this.videoURLS.length == 0) {
await videourl.items.map(async refImage => {
var ii = refImage.getDownloadURL()
this.videoURLS.push(ii)
})
try {
await this.videoURLS.forEach(async file => {
var fale2 = undefined
await file.then(url => {
fale2 = url.substring(url.indexOf("%") + 3)
fale2 = fale2.substring(0, fale2.indexOf("?"))
})
await db.collection("Files").doc(fale2).get().then(async data => {
file.$set('titel', await data.data().titel)
file.$set('date', await data.data().date)
if (file.$set('voted', await data.data().voted)) {
file.$set('voted', [])
}
file.$set('votes', await data.data().votes)
if (file.$set('votes', await data.data().votes)) {
file.$set('votes', 0)
}
await this.videoFiles.push(file)
this.uploadDate = data.data().date
console.log(this.videoFiles)
this.videoFiles.sort(function(a, b) {
return a.date - b.date;
})
})
})
} catch (error) {
console.log(error)
}
}
},
<script src="https://cdnjs.cloudflare.com/ajax/libs/vue/2.5.17/vue.js"></script>
Firstly, file.$set('votes', await data.data().votes) is the wrong syntax to use. It should be this.$set(file, 'votes', data.data().votes). I am guessing that data.data() returns an object with votes as a property.
Your use of await is not necessary here. await db.collection("Files").doc(fale2).get().then(async data => {....
You are already using a promise in the form of the .then block here. Async-await and the then/catch blocks are basically doing the same thing. It's one or the other.
Please check this fantastic post that covers how to deal with asynchronous code in javascript. Learning about the asynchronous nature of javascript is highly essential right now.
There's a fair bit to pick on, and for now my focus is on removing things from your code that are either redundant or may not make it work. I am not focusing on the logic. With more information, I may make necessary edits for the logic.
I will leave comments in the code, where I feel they are necessary
async viewVideo() {
this.videoURLS = []
this.videoFiles = []
this.videoTitels = []
var storageRef = firebase.storage().ref();
var videourl = '' // videourl should be initialised as a string
console.log("try")
var listRef = storageRef.child('User-Videos/');
var firstPage = await listRef.list({ // keep the await: list() returns a promise that resolves to the ListResult, and nextPageToken and items are only available on the resolved value
maxResults: 100
});
videourl = firstPage
console.log(videourl)
if (firstPage.nextPageToken) {
var secondPage = await listRef.list({ // same as above
maxResults: 100,
pageToken: firstPage.nextPageToken,
});
videourl = firstPage + secondPage // videourl is a string here
}
console.log(this.videoURLS)
if (this.videoURLS.length == 0) {
videourl.items.map(async refImage => { //videourl is acting as an object here (something seems off here) - please explain what is happening here
// again await is not needed here as the map function does not return a promise
var ii = refImage.getDownloadURL()
this.videoURLS.push(ii)
})
try {
this.videoURLS.forEach(file => { // await here is not necessary as the forEach method does not return a promise
// The 'async' keyword is not necessary here. It is required to use the await keyword and due to the database call here, ordinarily it wouldn't be out of place, but you deal with that bit of asynchronous code using a `.then` block. It's `async-await` or `.then` and never both.
var fale2 = undefined
file.then(url => { // await is not necessary here as you use `.then`
// Also, does `file` return a promise? That's the only thing I can infer from `file.then`. It looks odd.
fale2 = url.substring(url.indexOf("%") + 3)
fale2 = fale2.substring(0, fale2.indexOf("?"))
})
db.collection("Files").doc(fale2).get().then(data => { // await and async not necessary due to the same reasons outlined above
this.$set(file, 'titel', data.data().titel) // correct syntax according to vue's documentation - https://vuejs.org/v2/guide/reactivity.html#Change-Detection-Caveats
this.$set(file, 'date', data.data().date)
if (this.$set(file, 'voted', data.data().voted)) { // I don't know what's going on here, I will just correct the syntax. I am not focused on the logic at this point
this.$set(file, 'voted', [])
}
this.$set(file, 'votes', data.data().votes)
if (this.$set(file, 'votes', data.data().votes)) {
this.$set(file, 'votes', 0)
}
this.videoFiles.push(file) // await not necessary here as the push method does not return a promise and also is not asynchronous
this.uploadDate = data.data().date
console.log(this.videoFiles)
this.videoFiles.sort(function(a, b) {
return a.date - b.date;
})
})
})
} catch (error) {
console.log(error)
}
}
},
Like I said at the beginning, this first attempt isn't designed to make the logic work. There's a lot going on there that I don't understand. I have focused on removing redundant code and correcting syntax errors. I may be able to look at the logic if more detail is provided.
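As a possible direction for the logic, here is a minimal sketch that resolves every download URL first and only then builds plain objects, so that nothing is ever set on a pending promise. It is not the original component code: buildVideoFiles and the loadMeta callback are hypothetical, and items is assumed to be an array whose entries expose getDownloadURL(), like the items of a Storage ListResult.

```javascript
// Build plain, fully populated objects before handing them to Vue.
// `items`: entries exposing getDownloadURL() (assumed shape of ListResult.items).
// `loadMeta`: hypothetical callback returning the stored data for a file name,
// or undefined when there is none.
async function buildVideoFiles(items, loadMeta) {
  const files = await Promise.all(
    items.map(async (item) => {
      const url = await item.getDownloadURL();
      // Same extraction as in the question: skip the first "%XX" escape,
      // then keep everything up to the "?".
      let name = url.substring(url.indexOf('%') + 3);
      name = name.substring(0, name.indexOf('?'));
      const meta = (await loadMeta(name)) || {};
      return {
        url,
        titel: meta.titel,
        date: meta.date,
        voted: meta.voted || [], // default when nobody has voted yet
        votes: meta.votes || 0,
      };
    })
  );
  return files.sort((a, b) => a.date - b.date);
}

// Hypothetical usage inside the component method:
//   this.videoFiles = await buildVideoFiles(videourl.items,
//     (name) => db.collection('Files').doc(name).get().then((d) => d.data()));
```

In Vue 2, assigning an array of complete plain objects to a data property should make the properties they already have reactive, so this.$set would only be needed for properties added later.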
I am trying to scrape data from a bricklet in the UI (i.e. an HTML dataTable) using a TestCafe client function, but I haven't been successful. I have a few questions about my code and would like someone to point me in the right direction.
I first put my client function in the test file (test.js), which houses all my other test cases, and called the function from one of my tests, just like the example here: https://devexpress.github.io/testcafe/documentation/test-api/obtaining-data-from-the-client/examples-of-using-client-functions.html (see the section "complex DOM queries"). But TestCafe gets stuck: the browser closes, but the execution hangs.
Here is my client function. It is in my file that houses all my tests - test.js
fixture`Getting Started`
.page`${config.baseUrl}`;
const getTableRowValues = ClientFunction(() => {
console.log("inside client function");
const elements = document.querySelector('#bricklet_summary_studtable > tbody').querySelectorAll('tr td');
const array = [];
console.log(elements.length);
for (let i = 0; i <= elements.length; i++) {
console.log("inside for");
const customerName = elements[i].textContent;
array.push(customerName);
}
return array;
});
Here is my test case:
test('My 4th test - Check the bricklet data matches the expected data', async t => {
await t.navigateTo('https://myurl.com/app/home/students');
await page_studTest.click_studentlink();
await t
.expect(await page_studTest.exists_ListPageHeader()).ok('do something async', { allowUnawaitedPromise: true })//check the compare button does not exists
await t .navigateTo('https://myurl.com/app/home/students/application/stud/id/details/test.html')
await t
.expect(await page_studTest.getText_studHeader(t)).eql('student123',
"the header text does not match");
let arr = await getTableRowValues();
await console.log(arr);
});
I thought this would get the values from the UI into an array, which I would then compare to another array of test values that I will hard-code later.
At first, I tried client functions in my page class (page object model: https://devexpress.github.io/testcafe/documentation/recipes/use-page-model.html). I put the client function in the constructor, called it from an async function in the same page class, and called that async function from my test.js. All my tests are structured this way, but this only prints the following in the console:
Valuesfunction __$$clientFunction$$() {
const testRun = builder._getTestRun();
const callsite = (0, _getCallsite.getCallsiteForMethod)(builder.callsiteNames.execution);
const args = [];
// OPTIMIZATION: don't leak `arguments` object.
for (let i = 0; i < arguments.length; i++) args.push(arguments[i]);
return builder._executeCommand(args, testRun, callsite);
}
which is not useful to debug the problem.
There are no examples on the TestCafe site showing how or where to put the client function when you use the page-object model. Could someone please share some insight? I am interested in knowing the best way to structure my tests.
I didn't find any problems in your code that could make TestCafe hang, nor any syntax errors or incorrect calls to TestCafe methods. Just note that the await keyword is not needed before console.log, though this should not lead to any issues.
Probably the use of a custom promise with the { allowUnawaitedPromise: true } option can lead to problems, however it's difficult to determine it without the full project.
I recommend you prepare a simple project with a sample test file to demonstrate the issue and create a separate bug report in the TestCafe repository using the following form
So, finally I tried to return a promise from my client function and then it worked.
const getTableRowValues = ClientFunction(() => {
const array = [];
return new Promise(resolve => {
var elements = document.querySelectorAll('#bricklet_summary_studtable > tbody > tr > *');
elements.forEach(function (element, i) {
let text = element.textContent;
array[i] = text.trim();
});
resolve(array);
});
});
I resolve a one-dimensional array because the assertion wasn't working in the test when I compared a 2D result from the client function to a 2D expected value. However, this serves the purpose for now.
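One detail in the first version of the client function is worth noting: the loop condition i <= elements.length reads one index past the end, where elements[i] is undefined, so reading .textContent throws. That may be part of why the original call never returned a usable value, though this is only a guess. Below is a minimal sketch of the extraction as plain functions; collectText and toRows are hypothetical names, and toRows is meant for the test side, to regroup the flat result into rows.

```javascript
// Collect trimmed text from an array-like of cells (for example a NodeList).
// Array.from iterates exactly `length` entries, so there is no off-by-one.
function collectText(cells) {
  return Array.from(cells, (cell) => cell.textContent.trim());
}

// Hypothetical helper: regroup a flat list of cell values into rows of
// `columns` cells each, for comparison with a 2D expected value.
function toRows(flat, columns) {
  const rows = [];
  for (let i = 0; i < flat.length; i += columns) {
    rows.push(flat.slice(i, i + columns));
  }
  return rows;
}
```

Inside a ClientFunction, code like collectText would have to be inlined in the function body or passed in through the dependencies option, since the function runs in the browser.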
I am working on an app that requires calls to the Foursquare Places API, which has a 2-calls-per-second quota. The app pulls a list of places, and then has to separately fetch the pictures for each place. I have attempted to do this within a forEach function and within a for-in loop. I have tried everything I can think of, and everything I could find through research, to make this work (from using setTimeout in various situations to creating promises with timeouts and incorporating them in many different ways), but I have been unable to find a solution for my particular async/await fetch situation.
To be clear - the application is operational, and my "else" statement is kicking in, but the else statement is kicking in because I am exceeding the per-second quota - so, the code is there, and working, I just want to be able to run the photos instead of the generic icons. I can get the photos to work if I wait long enough, as if the server forgets for a second. But my total daily quotas are well over anything I could ever reach in dev environment, so this has to be what is getting me in trouble!
If anyone can help, I would appreciate it greatly!
const renderVenues = (venues) => {
for(let i=0; i < $venueDivs.length; i++){
const $venue = $venueDivs[i];
const venue = venues[i];
let newUrl = `https://api.foursquare.com/v2/venues/${venue.id}/photos?client_id=${clientId}&client_secret=${clientSecret}&v=20190202`;
const getPics = async () =>{
try{
const picResp = await fetch(newUrl);
if(picResp.ok){
const picJson = await picResp.json();
const photo = picJson.response.photos.items[0];
const venueImgSrc = `${photo.prefix}300x300${photo.suffix}`;
let venueContent = `<h2>${venue.name}</h2><h4 style='padding-top:15px'>${venue.categories[0].name}</h4>
<img class="venueimage" src="${venueImgSrc}"/>
<h3 style='padding-top:5px'>Address:</h3>
<p>${venue.location.address}</p>
<p>${venue.location.city}, ${venue.location.state}</p>
<p>${venue.location.country}</p>`;
$venue.append(venueContent);
} else{
const venueIcon = venue.categories[0].icon;
const venueImgSrc = `${venueIcon.prefix}bg_64${venueIcon.suffix}`;
let venueContent = `<h2>${venue.name}</h2><h4 style='padding-top:15px'>${venue.categories[0].name}</h4>
<img class="venueimage" src="${venueImgSrc}"/>
<h3 style='padding-top:5px'>Address:</h3>
<p>${venue.location.address}</p>
<p>${venue.location.city}, ${venue.location.state}</p>
<p>${venue.location.country}</p>`;
$venue.append(venueContent);
}
}
catch(error){
console.log(error)
alert(error)
}
}
getPics();
}
$destination.append(`<h2>${venues[0].location.city}, ${venues[0].location.state}</h2>`);
}
//and then below, I execute the promise(s) that this is included with.
getVenues().then(venues =>
renderVenues(venues)
)
On each iteration, you can await a Promise that resolves after 0.6 seconds:
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
const renderVenues = async (venues) => {
for(let i=0; i < $venueDivs.length; i++){
// ...
await getPics();
// no need for a trailing delay after all requests are complete:
if (i !== $venueDivs.length - 1) {
await delay(600);
}
}
$destination.append(...)
};
If you find yourself doing a bunch of throttling like this in your application, the module https://github.com/SGrondin/bottleneck provides a nice interface for expressing them.
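The same idea can be pulled out into a small reusable helper. This is only a sketch of the delay approach above, not the bottleneck library; runThrottled is a hypothetical name, and tasks is assumed to be an array of functions that each return a promise.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run async tasks one after another, pausing `gapMs` between consecutive tasks.
// `tasks` is an array of functions that each return a promise.
async function runThrottled(tasks, gapMs) {
  const results = [];
  for (let i = 0; i < tasks.length; i++) {
    results.push(await tasks[i]());
    if (i !== tasks.length - 1) await delay(gapMs); // no trailing pause
  }
  return results;
}

// Hypothetical usage with the photo requests:
//   await runThrottled(photoUrls.map((url) => () => fetch(url)), 600);
```

Passing functions rather than promises matters here: a request only starts when its turn comes, so at most one call is in flight per gap.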