I need to access an iframe in playwright that has a name that is automatically generated.
The iframe's name is always prefixed by "__privateStripeFrame" and then a randomly generated number
How can I access the frame with page.frame({name: })?
From the docs it seems like I can't use a regular expression!
The frame selector doesn't need to be specified by the name.
Try an xpath with contains - this works on the W3 sample page:
await page.goto('https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_iframe');
await page.frame("//iframe[contains(@title,'W3s')]");
If you want a more general approach - you also have page.frames().
That will return an array of the frames and you can iterate through and find the one you need.
This works for me:
let myFrames = page.frames();
console.log("#frames: " + myFrames.length);
myFrames.forEach((f) => console.log(f.name()));
(W3S is not the best demo site as there are lots of nested frames - but this outputs the top level frames that have names)
The output:
iframeResult
__tcfapiLocator
__uspapiLocator
__tcfapiLocator
__uspapiLocator
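For the original Stripe case, the same approach narrows down to a prefix match on the frame name. A minimal sketch (the prefix comes from the question; the variable name is mine):
// Find the first frame whose auto-generated name starts with the Stripe prefix.
const stripeFrame = page.frames().find((f) => f.name().startsWith('__privateStripeFrame'));
if (stripeFrame) {
  // interact with the Stripe frame here
}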
We had the issue of multiple Stripe Elements iframes loading asynchronously and very slowly, so we ended up with this workaround: retry iterating over all frames, querying each for the card input fields, until one is found or we time out.
Not elegant, but it worked for us.
import { Page } from 'playwright';

async function findStripeElementsIframeAsync(page: Page, timeout: number) {
  const startTime = Date.now();
  let stripeFrame = null;
  while (!stripeFrame && Date.now() - startTime < timeout) {
    const stripeIframes = page.locator('iframe[name^=__privateStripeFrame]');
    const stripeIframeCount = await stripeIframes.count();
    for (let i = 0; i < stripeIframeCount; i++) {
      const stripeIFrameElement = await stripeIframes.nth(i).elementHandle();
      if (!stripeIFrameElement)
        throw new Error('No Stripe iframe element handle.');
      const cf = await stripeIFrameElement.contentFrame();
      if (!cf)
        throw new Error('No Stripe iframe content frame.');
      // Does this iframe have a CVC input? If so, it's our guy.
      // A 1 ms timeout did not work, because the selector requires some time to find the element.
      try {
        await cf.waitForSelector('input[name=cvc]', { timeout: 200 });
        stripeFrame = cf;
        console.log('Found Stripe iframe with CVC input');
        return stripeFrame;
      } catch {
        // Expected for iframes without this input.
      }
    }
    // Give some time for iframes to load before retrying.
    await new Promise(resolve => setTimeout(resolve, 200));
  }
  return null;
}
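For completeness, a hypothetical call site (the 10-second timeout and the fill value are assumptions, not from the original post):
// Wait up to 10 s for the Stripe iframe that hosts the CVC input, then fill it.
const frame = await findStripeElementsIframeAsync(page, 10000);
if (frame) {
  await frame.fill('input[name=cvc]', '123');
} else {
  console.log('Stripe iframe not found within the timeout.');
}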
I am using DiscordJS and their API has a character limit and will reject message if limit is exceeded.
Through fetchData() I am successfully building assignedPrint, which is an array of messages that I would like to send over the API.
So I already have the array ready to go, but I am also using an auto-update feature (courtesy of WOK) where, every set amount of time, the array is flushed, refilled with fresh data, and sent over again to edit the original message through the message.edit() method.
It works just fine, but I am foreseeing that my array might get bigger over time, and sending a single message may break things because of the API's max character limit.
const getText = () => {
  return assignedPrint
    + greengo + 'Updating in ' + counter + 's...' + greenstop;
};
const updateCounter = async (message) => {
  message.edit(getText());
  counter -= seconds;
  if (counter <= 0) {
    counter = startingCounter;
    // emptying the array before fetching again for the message edit
    assignedPrint = [];
    await fetchData();
  }
  setTimeout(() => {
    updateCounter(message);
  }, 1000 * seconds);
};
module.exports = async (bot) => {
  await fetchData();
  const message = await bot.channels.cache.get(tgt).send(getText());
  updateCounter(message);
};
As you can see, the hurdle is that getText() includes everything.
I tried sending one element at a time using for (const e of assignedPrint) and it worked, but how can I edit every message upon refreshing the data, knowing that new data may be added or some already-sent data could be removed?
The easiest approach of course is to do it in one single message, but again, it may hit the quota and cause a crash.
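For illustration, a minimal sketch of the usual chunking approach, assuming Discord's 2000-character limit and the assignedPrint array from above (the helper name and joining logic are mine):
// Split an array of lines into strings that each stay under the 2000-character limit.
const MAX_LEN = 2000;
const chunkMessages = (lines) => {
  const chunks = [];
  let current = '';
  for (const line of lines) {
    if (current.length + line.length + 1 > MAX_LEN) {
      chunks.push(current);
      current = '';
    }
    current += line + '\n';
  }
  if (current) chunks.push(current);
  return chunks;
};
// e.g. const chunks = chunkMessages(assignedPrint);
Each chunk could then be sent (or edited) as its own message, keeping references to the sent messages so they can be edited on the next refresh.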
Thanks.
The project aims to study a new social media:
https://booyah.live/
My needs are:
1 - Collect data from profiles that follow a specific profile.
2 - My account use this data to follow the collected profiles.
3 - Among other possible options, also unfollow the profiles I follow.
The problem found in the current script:
The profile data is, in theory, being collected and the script runs to the end without errors, but for a reason I can't pin down, instead of following all the collected profiles it only follows the base profile.
For example:
I want to follow all 250 profiles that follow the ID 123456
I activate the booyahGetAccounts(123456); script
In theory the end result would be my account following 250 profiles
But the end result is that I follow only the 123456 profile, so the count of people I'm following is 1
Complete Project Script:
const csrf = 'MY_CSRF_TOKEN';
async function booyahGetAccounts(uid, type = 'followers', follow = 1) {
  if (typeof uid !== 'undefined' && !isNaN(uid)) {
    const loggedInUserID = window.localStorage?.loggedUID;
    if (uid === 0) uid = loggedInUserID;
    const unfollow = follow === -1;
    if (unfollow) follow = 1;
    if (loggedInUserID) {
      if (csrf) {
        async function getUserData(uid) {
          const response = await fetch(`https://booyah.live/api/v3/users/${uid}`),
            data = await response.json();
          return data.user;
        }
        const loggedInUserData = await getUserData(loggedInUserID),
          targetUserData = await getUserData(uid),
          followUser = uid => fetch(`https://booyah.live/api/v3/users/${loggedInUserID}/followings`, { method: (unfollow ? 'DELETE' : 'POST'), headers: { 'X-CSRF-Token': csrf }, body: JSON.stringify({ followee_uid: uid, source: 43 }) }),
          logSep = (data = '', usePad = 0) => typeof data === 'string' && usePad ? console.log((data ? data + ' ' : '').padEnd(50, '━')) : console.log('━'.repeat(50), data, '━'.repeat(50));
        async function getList(uid, type, follow) {
          const isLoggedInUser = uid === loggedInUserID;
          if (isLoggedInUser && follow && !unfollow && type === 'followings') {
            follow = 0;
            console.warn('You already follow your followings. `follow` mode switched to `false`. Followings will be retrieved instead of followed.');
          }
          const userData = await getUserData(uid),
            totalCount = userData[type.slice(0, -1) + '_count'] || 0,
            totalCountStrLength = totalCount.toString().length;
          if (totalCount) {
            let userIDsLength = 0;
            const userIDs = [],
              nickname = userData.nickname,
              nicknameStr = `${nickname ? ` of ${nickname}'s ${type}` : ''}`,
              alreadyFollowedStr = uid => `User ID ${uid} already followed by ${loggedInUserData.nickname} (Account #${loggedInUserID})`;
            async function followerFetch(cursor = 0) {
              const fetched = [];
              await fetch(`https://booyah.live/api/v3/users/${uid}/${type}?cursor=${cursor}&count=100`).then(res => res.json()).then(data => {
                const list = data[type.slice(0, -1) + '_list'];
                if (list?.length) fetched.push(...list.map(e => e.uid));
                if (fetched.length) {
                  userIDs.push(...fetched);
                  userIDsLength += fetched.length;
                  if (follow) followUser(uid);
                  console.log(`${userIDsLength.toString().padStart(totalCountStrLength)} (${(userIDsLength / totalCount * 100).toFixed(4)}%)${nicknameStr} ${follow ? 'followed' : 'retrieved'}`);
                  if (fetched.length === 100) {
                    followerFetch(data.cursor);
                  } else {
                    console.log(`END REACHED. ${userIDsLength} accounts ${follow ? 'followed' : 'retrieved'}.`);
                    if (!follow) logSep(targetList);
                  }
                }
              });
            }
            await followerFetch();
            return userIDs;
          } else {
            console.log(`This account has no ${type}.`);
          }
        }
        logSep(`${follow ? 'Following' : 'Retrieving'} ${targetUserData.nickname}'s ${type}`, 1);
        const targetList = await getList(uid, type, follow);
      } else {
        console.error('Missing CSRF token. Retrieve your CSRF token from the Network tab in your inspector by clicking into the Network tab item named "bug-report-claims" and then scrolling down in the associated details window to where you see "x-csrf-token". Copy its value and store it into a variable named "csrf" which this function will reference when you execute it.');
      }
    } else {
      console.error('You do not appear to be logged in. Please log in and try again.');
    }
  } else {
    console.error('UID not passed. Pass the UID of the profile you are targeting to this function.');
  }
}
This question is a continuation of the answer at the following link:
Collect the full list of buttons to follow without having to scroll the page (DevTools Google Chrome)
Since I can't offer more bounty on that question, I created this one to offer a new bounty to anyone who can fix the bug and make the script work.
Account to use for tests on the Booyah website:
Access via Google:
User: teststackoverflowbooyah@gmail.com
Password: quartodemilha
I have to admit that your code is really hard to read; it took me less time to rewrite everything from scratch.
Given that we need a piece of code that can be cut/pasted into the JavaScript console of a web browser and that must store some data (i.e. the expiration of followings and the permanent followings), a few considerations are in order.
We can consider the expiration of followings as volatile data: if lost, it can simply be reset to 1 day after the moment we lost it. window.localStorage is a perfect candidate for storing this kind of data. If we change web browsers, the only drawback is that we lose the expirations and have to tolerate resetting them to 1 day after the switch.
To store the list of permanent followings, however, we need a store that survives a browser change. The best idea that came to my mind is to create an alternative account that follows the users we never want to stop following. In my code I used uid 3186068 (a random user); once you have created your own alternative account, just replace the first line of the code block with its uid.
Another thing we need to take care of is error handling: the API could always return errors. The approach I chose is to write myFetch, which, in case of errors, retries the same call up to three times; if the error persists, we are probably facing a temporary booyah.live outage and just need to retry a bit later.
To provide a comfortable interface, the code block gathers the uid from window.location: to follow the followers of a user, just cut/paste the code block into a tab opened on their profile. For example, I ran the code from a tab open on https://booyah.live/studio/123456?source=44.
Last, to unfollow users, the clean function is called 5 minutes after the code is pasted (so as not to conflict with the calls that follow followers) and then runs again one hour after it finishes its job. It accesses localStorage atomically, so you can have many instances running simultaneously in different tabs of the same browser without worrying about it. The only thing you need to take care of is that when window.location changes, all the JavaScript events in the tab are reset; so I suggest keeping a tab open on the home page, pasting the code block into it, and forgetting about that tab; it will be the tab responsible for unfollowing users. Then open other tabs to do what you need: when you hit a user whose followers you want to follow, paste the block, wait until the job finishes, and continue using the tab normally.
// The account we use to store followings
const followingsUID = 3186068;
// Gather the loggedUID from window.localStorage
const { loggedUID } = window.localStorage;
// Gather the CSRF-Token from the cookies
const csrf = document.cookie.split("; ").reduce((ret, _) => (_.startsWith("session_key=") ? _.substr(12) : ret), null);
// APIs could have errors, let's do some retries
async function myFetch(url, options, attempt = 0) {
  try {
    const res = await fetch("https://booyah.live/api/v3/" + url, options);
    const ret = await res.json();
    return ret;
  } catch(e) {
    // After too many consecutive errors, let's abort: we need to retry later
    if(attempt === 3) throw e;
    return myFetch(url, options, attempt + 1);
  }
}
function expire(uid, add = true) {
  const { followingsExpire } = window.localStorage;
  let expires = {};
  try {
    // Get and parse followingsExpire from localStorage
    expires = JSON.parse(followingsExpire);
  } catch(e) {
    // In case of error (ex. new browsers) simply init to empty
    window.localStorage.followingsExpire = "{}";
  }
  if(! uid) return expires;
  // Set expire after 1 day
  if(add) expires[uid] = new Date().getTime() + 3600 * 24 * 1000;
  else delete expires[uid];
  window.localStorage.followingsExpire = JSON.stringify(expires);
}
async function clean() {
  try {
    const expires = expire();
    const now = new Date().getTime();
    for(const uid in expires) {
      if(expires[uid] < now) {
        await followUser(parseInt(uid), false);
        expire(uid, false);
      }
    }
  } catch(e) {}
  // Repeat clean in an hour
  window.setTimeout(clean, 3600 * 1000);
}
async function fetchFollow(uid, type = "followers", from = 0) {
  const { cursor, follower_list, following_list } = await myFetch(`users/${uid}/${type}?cursor=${from}&count=50`);
  const got = (type === "followers" ? follower_list : following_list).map(_ => _.uid);
  const others = cursor ? await fetchFollow(uid, type, cursor) : [];
  return [...got, ...others];
}
async function followUser(uid, follow = true) {
  console.log(`${follow ? "F" : "Unf"}ollowing ${uid}...`);
  return myFetch(`users/${loggedUID}/followings`, {
    method: follow ? "POST" : "DELETE",
    headers: { "X-CSRF-Token": csrf },
    body: JSON.stringify({ followee_uid: uid, source: 43 })
  });
}
async function doAll() {
  if(! loggedUID) throw new Error("Can't get 'loggedUID' from localStorage: try to login again");
  if(! csrf) throw new Error("Can't get session token from cookies: try to login again");
  console.log("Fetching current followings...");
  const currentFollowings = await fetchFollow(loggedUID, "followings");
  console.log("Fetching permanent followings...");
  const permanentFollowings = await fetchFollow(followingsUID, "followings");
  console.log("Syncing permanent followings...");
  for(const uid of permanentFollowings) {
    expire(uid, false);
    if(currentFollowings.indexOf(uid) === -1) {
      await followUser(uid);
      currentFollowings.push(uid);
    }
  }
  // Sync followingsExpire in localStorage
  for(const uid of currentFollowings) if(permanentFollowings.indexOf(uid) === -1) expire(uid);
  // Call first clean task in 5 minutes
  window.setTimeout(clean, 300 * 1000);
  // Gather uid from window.location
  const match = /\/studio\/(\d+)/.exec(window.location.pathname);
  if(match) {
    console.log("Fetching this user followers...");
    const followings = await fetchFollow(parseInt(match[1]));
    for(const uid of followings) {
      if(currentFollowings.indexOf(uid) === -1) {
        await followUser(uid);
        expire(uid);
      }
    }
  }
  return "Done";
}
await doAll();
The problem: I strongly suspect a booyah.live API bug
To test my code I run it from https://booyah.live/studio/123456?source=44.
If I run it multiple times, I keep getting the following output:
Fetching current followings...
Fetching permanent followings...
Syncing permanent followings...
Following 1801775...
Following 143823...
Following 137017...
Fetching this user followers...
Following 16884042...
Following 16166724...
There is a bug somewhere! The expected output for subsequent executions in the same tab would be:
Fetching current followings...
Fetching permanent followings...
Syncing permanent followings...
Fetching this user followers...
After seeking the bug in my code without success, I checked the booyah.live APIs: if I navigate to the following URLs (the uids are the ones the code keeps following on subsequent executions)
https://booyah.live/studio/1801775
https://booyah.live/studio/143823
https://booyah.live/studio/137017
https://booyah.live/studio/16884042
https://booyah.live/studio/16166724
I can clearly see that I follow them, but if I navigate to https://booyah.live/following (the list of users I follow) I can't find them, not even if I scroll the page all the way to the end.
Since I do exactly the same calls the website does, I strongly suspect the bug is in booyah.live APIs, exactly in the way they handle the cursor parameter.
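To double-check this on the API side, here is a quick probe you could paste after running the block above (it reuses the fetchFollow helper; the uid is one of the repeat offenders from my log):
// If the API were consistent, a uid we just followed should appear in our followings.
const check = await fetchFollow(loggedUID, "followings");
console.log("contains 1801775:", check.includes(1801775));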
I suggest you open a support ticket with the booyah.live support team. You could use the test account you provided us; I have already provided you the details to do that. ;)
I'm trying to have my puppeteer script iterate through selectors.
The reason being - depending on what I'm querying through my script, I can get slightly different elements on the page.
Essentially I have a page.evaluate method that does the scraping like this
while (currentPage <= pagesToScrape) {
  let newProducts = await page.evaluate(({ identified }) => {
    let results = [];
    let items = document.querySelectorAll(identified);
    console.log(items);
    items.forEach((item) => {
      var prod, price;
      if (identified == selectors[0]) {
        prod = item.querySelector("div>div>div>div>div>a>h3").innerText;
        price = item.querySelector("div>div>div>div>div>div>span>span").innerText;
      } else {
        prod = item.querySelector("div>a>h4").innerText;
        price = item.querySelector("div>div>div>div>span>span").innerText;
      }
      results.push({
        Product: prod !== "" ? prod : "",
        Price: price !== "" ? price : "",
      });
    });
    console.log("results");
    console.log(results.length);
    return results;
  });
  product_GSH = product_GSH.concat(newProducts);
  if (currentPage < pagesToScrape) {
    console.log(identified);
    await Promise.all([
      await page.click(buttonSelector),
      await page.waitForSelector(identified),
    ]);
  }
Now before the script starts, I need to ensure I have the correct selector.
const selectors = ['div[class = "sh-dlr__list-result"',"div[class = 'sh-dgr__content'"]
//works
const chooseSelector = await page.waitForFunction((selectors) => {
  for (const selector of selectors) {
    if (document.querySelector(selector) !== null) {
      return selector;
    }
  }
  return false;
}, {}, selectors);
const identified = await chooseSelector.jsonValue();
console.log(identified);
The issue I'm having is that, from within page.evaluate, I can easily run the check and find the correct identifier to use. But I need it available again at the end of the query to scrape the next page. When I try to re-assign the variable to the correct identifier inside page.evaluate, the change doesn't carry over.
When I run this, the code executes, but I cannot change the selector inside the Promise.all at the bottom with page.waitForSelector (so it works with some pages, but when it's the wrong page I can't switch the selector being used). This is the full code, FYI:
    product_GSH = product_GSH.concat(newProducts);
    if (currentPage < pagesToScrape) {
      await Promise.all([
        await page.click(buttonSelector),
        page.waitForNavigation()
      ]);
    }
    currentPage++;
  }
  browser.close();
  return res.send(product_GSH);
  } catch (e) {
    return res.send(e);
  }
  });
});
I'm thinking one way to solve this issue is to look at the Promise.all function and replace it with something slightly different.
Thanks for helping with this issue!
Last question, if you can help: how do I make sure that when I choose, say, 5 pages and there are only 3 pages of results, it still sends the 3 pages? What I'm finding is that if I ask for more pages than exist, it doesn't send any response.
Ideally, I'm trying to have this code iterate through different selectors. I've tried a bunch of different methods and, CORS errors and more aside, I am very lost. It would be good to get some sort of definite error from puppeteer as well!
Appreciate the help :)
You have to use page.waitForNavigation along with the page.click(buttonSelector) promise. Also, to use Promise.all, you have to pass it actual promises, not already-awaited values like you're doing:
if (currentPage < pagesToScrape) {
  await Promise.all([
    page.click(buttonSelector),
    page.waitForNavigation()
  ]);
}
You can simplify selectors, for example
div[class = "sh-dlr__list-result"
can be
div.sh-dlr__list-result
The selector
const buttonSelector = "a[id='pnnext']>span[style='display:block;margin-left:53px']";
is wrong; you should never rely on style attributes to build a selector, as they can easily change dynamically. Instead you can define it like this:
const buttonSelector = "a#pnnext";
After we make these changes we'll get the proper results; for example, it will output:
product_GSH.length 100
product_GSH [...]
UPDATE
If you want to handle results with less than pagesToScrape pages, then you have to look for buttonSelector before you perform a click on it like this:
if (currentPage < pagesToScrape && await page.$(buttonSelector) !== null) {
  await Promise.all([
    page.click(buttonSelector),
    page.waitForNavigation()
  ]);
}
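If the page variant can also change between paginations, one option (a sketch reusing the detection logic from the question, not part of the fix above) is to re-run the selector detection after each navigation:
// Hypothetical helper: returns whichever selector variant is present on the current page.
async function detectSelector(page, selectors) {
  const handle = await page.waitForFunction(
    (sels) => sels.find((s) => document.querySelector(s) !== null) || false,
    {},
    selectors
  );
  return handle.jsonValue();
}
// after page.waitForNavigation() resolves:
// identified = await detectSelector(page, selectors);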
Promise.all looks like the place to solve this. I'm not the best with promise functions though
I am looking to:
Open a known URL (www.source.com/1 below)
scrape all URLs on that page (e.g. www.urllookingfor.com/1 to .../10) and log to console
scrape a new URL (e.g. www.source.com/2) from that page
load the next page and repeat the process X number of times
Imagine a list of 50 URLs divided across 5 pages, where you need to click the next button to move on a page.
The first two steps work fine, but I think the issue is that nextLink isn't updated before the loop runs again. Essentially what happens is that step four gets repeated with the original URL and not the 'new' URL. The steps above are within an if block.
I've tried using setTimeout and async...await, as I think the issue is that it doesn't have time to load the 'new' URL before the next function completes, but this did not work.
If I add console.log(URL) within the if block, it prints the original URL. But when I add the console.log outside the if block, it prints the updated URL, which makes me think nextLink isn't updated until after the if block.
I've also tried repeating the functions over and over (essentially a repeated if statement), but this also does not seem to update nextLink before the next function runs, which goes against the above.
let nextLink = 'www.source.com/1';
// this pulls the source page and scrapes the required URLs
const getDatafromPage = () => {
  request(nextLink, (error, response, html) => {
    if ((!error) && (response.statusCode == 200)) {
      let $ = cheerio.load(html);
      $('.class1').each((i, el) => {
        let link = $(el).find('.class2').attr('href');
        console.log(`${link}`);
      });
    }
  });
};
// this gets the next URL
const getNextLink = () => {
  request(nextLink, (error, response, html) => {
    if ((!error) && (response.statusCode == 200)) {
      let $ = cheerio.load(html);
      nextLink = $('.class3').attr('href');
    }
  });
};
for (let i = 0; i <= 4; i++) {
  getDatafromPage();
  getNextLink();
}
console.log(nextLink);
Expected results (all 50 URLs from the pages, ending by logging the last source URL):
www.urllookingfor.com/1
...
www.urllookingfor.com/50
www.source.com/5
Actual results (repeats the first page, but then logs the next page at the end):
www.urllookingfor.com/1
...
www.urllookingfor.com/10
www.urllookingfor.com/1
...
www.urllookingfor.com/10
www.source.com/2
Here's more or less what it might look like when I do it:
const doPage = async ($) => {
  // do stuff here
}

;(async function(){
  let response = await request(url)
  let $ = cheerio.load(response)
  await doPage($)
  let a
  // keep following next links
  while(a = $('[rel=next]')[0]){
    url = new URL($(a).attr('href'), url).href
    response = await request(url)
    $ = cheerio.load(response)
    await doPage($)
  }
})()
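Note that request here is assumed to be a promise-returning variant; the callback-style request used in the question won't work with await directly. For example:
// Assumption: a promisified HTTP client, e.g.
const request = require('request-promise');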
I am working on an app that makes calls to the Foursquare Places API, which has a 2-calls-per-second quota. The app pulls a list of places and then has to separately fetch the pictures for each place. I have attempted to do this within a forEach function and within a for-in loop. I have tried everything I can think of, and found in research, to make this work (from using setTimeout in various situations to creating promises with timeouts included and incorporating them in many different ways), but I have been unable to find a solution for my particular async/await fetch situation.
To be clear: the application is operational, and my "else" statement is kicking in, but it is kicking in because I am exceeding the per-second quota. So the code is there and working; I just want to be able to show the photos instead of the generic icons. I can get the photos to work if I wait long enough, as if the server forgets for a second. But my total daily quota is well over anything I could ever reach in a dev environment, so the per-second limit has to be what is getting me in trouble!
If anyone can help, I would appreciate it greatly!
const renderVenues = (venues) => {
  for (let i = 0; i < $venueDivs.length; i++) {
    const $venue = $venueDivs[i];
    const venue = venues[i];
    let newUrl = `https://api.foursquare.com/v2/venues/${venue.id}/photos?client_id=${clientId}&client_secret=${clientSecret}&v=20190202`;
    const getPics = async () => {
      try {
        const picResp = await fetch(newUrl);
        if (picResp.ok) {
          const picJson = await picResp.json();
          const photo = picJson.response.photos.items[0];
          const venueImgSrc = `${photo.prefix}300x300${photo.suffix}`;
          let venueContent = `<h2>${venue.name}</h2><h4 style='padding-top:15px'>${venue.categories[0].name}</h4>
            <img class="venueimage" src="${venueImgSrc}"/>
            <h3 style='padding-top:5px'>Address:</h3>
            <p>${venue.location.address}</p>
            <p>${venue.location.city}, ${venue.location.state}</p>
            <p>${venue.location.country}</p>`;
          $venue.append(venueContent);
        } else {
          const venueIcon = venue.categories[0].icon;
          const venueImgSrc = `${venueIcon.prefix}bg_64${venueIcon.suffix}`;
          let venueContent = `<h2>${venue.name}</h2><h4 style='padding-top:15px'>${venue.categories[0].name}</h4>
            <img class="venueimage" src="${venueImgSrc}"/>
            <h3 style='padding-top:5px'>Address:</h3>
            <p>${venue.location.address}</p>
            <p>${venue.location.city}, ${venue.location.state}</p>
            <p>${venue.location.country}</p>`;
          $venue.append(venueContent);
        }
      } catch (error) {
        console.log(error);
        alert(error);
      }
    };
    getPics();
  }
  $destination.append(`<h2>${venues[0].location.city}, ${venues[0].location.state}</h2>`);
};
// and then below, I execute the promise(s) that this is included with.
getVenues().then(venues =>
  renderVenues(venues)
);
On each iteration, you can await a Promise that resolves after 0.6 seconds:
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

const renderVenues = async (venues) => {
  for (let i = 0; i < $venueDivs.length; i++) {
    // ...
    await getPics();
    // no need for a trailing delay after all requests are complete:
    if (i !== $venueDivs.length - 1) {
      await delay(600);
    }
  }
  $destination.append(...);
};
If you find yourself doing a bunch of throttling like this in your application, the module https://github.com/SGrondin/bottleneck provides a nice interface for expressing it.
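A minimal sketch of what that could look like here, assuming bottleneck's documented API (minTime spaces out calls; 600 ms keeps us under 2 calls per second):
const Bottleneck = require('bottleneck');
// Allow at most one API call every 600 ms.
const limiter = new Bottleneck({ minTime: 600 });
// Wrap fetch so every call is automatically queued and throttled.
const throttledFetch = limiter.wrap(fetch);
// Then inside getPics: const picResp = await throttledFetch(newUrl);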