How to stay within 2 GET requests per second with Axios (Shopify API) - javascript

I have about 650 products, and each product has a lot of additional information stored in metafields. I need all the metafield info in an array so I can filter through certain bits of it and display them to the user.
In order to get all the metafield data, you need to make one API call per product using the product id, like so: /admin/products/#productid#/metafields.json
So what I have done is fetch all the product ids and then run a for...in loop, making one call at a time. The problem is that I run into a 429 error because I end up making more than 2 requests per second. Is there any way to get around this, like with some sort of queuing system?
let products = []
let requestOne = `/admin/products.json?page=1&limit=250`
let requestTwo = `/admin/products.json?page=2&limit=250`
let requestThree = `/admin/products.json?page=3&limit=250`
// allProducts will hold an array with all products
let allProducts
let allMetaFields = []
let merge

$(document).ready(function () {
  axios
    .all([
      axios.get(requestOne),
      axios.get(requestTwo),
      axios.get(requestThree),
    ])
    .then(
      axios.spread((firstResponse, secondResponse, thirdResponse) => {
        products.push(
          firstResponse.data.products,
          secondResponse.data.products,
          thirdResponse.data.products
        )
      })
    )
    .then(() => {
      // flatten all 3 responses into one array
      allProducts = [].concat.apply([], products)
    })
    .then(function () {
      for (const element in allProducts) {
        axios
          .get(`/admin/products/${allProducts[element].id}/metafields.json`)
          .then(function (response) {
            let metafieldsResponse = response.data.metafields
            allMetaFields.push(metafieldsResponse)
          })
      }
    })
    .then(function () {
      console.log("allProducts: " + allProducts)
      console.log("allMetaFields: " + allMetaFields)
    })
    .catch((error) => console.log(error))
})

When you hit a 429 error, check the Retry-After header and wait the number of seconds it specifies.
You can also use the X-Shopify-Shop-Api-Call-Limit header on each response to see how many requests are left before you exceed the bucket size limit.
See more details here: REST Admin API rate limits
By the way, you're using page-based pagination, which is deprecated and will become unavailable soon.
Use cursor-based pagination instead.
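To make that concrete, here is a minimal sketch of such a queue. It assumes the same global axios setup as in the question, spaces calls roughly 600 ms apart to stay under 2 requests/second, and retries once on a 429 by honoring Retry-After; the delay value and helper names are illustrative, not part of the Shopify API:
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Fetch metafields one product at a time, pausing ~600 ms between calls
// so we stay under Shopify's 2 requests/second bucket.
async function fetchAllMetafields(allProducts) {
  const allMetaFields = []
  for (const product of allProducts) {
    try {
      const res = await axios.get(`/admin/products/${product.id}/metafields.json`)
      allMetaFields.push(res.data.metafields)
    } catch (error) {
      if (error.response && error.response.status === 429) {
        // Honor Retry-After (seconds), then retry this product once.
        const wait = Number(error.response.headers["retry-after"] || 2)
        await delay(wait * 1000)
        const retry = await axios.get(`/admin/products/${product.id}/metafields.json`)
        allMetaFields.push(retry.data.metafields)
      } else {
        throw error
      }
    }
    await delay(600) // keep below 2 requests/second
  }
  return allMetaFields
}
At ~600 ms per call, 650 products take around six or seven minutes, but the calls stay inside the rate limit.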

Related

Is it possible to loop through an API, print separate results, and then combine them into a single variable?

I'm trying to read the sentiment of multiple Reddit posts. I've got it working using 6 API calls, but I think it can be refactored down to 2.
The wall I'm hitting: is it possible to loop through multiple APIs (one loop for each Reddit post we're scraping), print the results, and then add them into a single variable?
The last part is where I’m stuck. After looping through the API, I get separate outputs for each loop and I don’t know how to add them into a single variable…
Here’s a simple version of what the code looks like:
import React, { useState, useEffect } from 'react';

function App() {
  const [testRedditComments, setTestRedditComments] = useState([]);
  const URLs = [
    'https://www.reddit.com/r/SEO/comments/tepprk/is_ahrefs_worth_it/',
    'https://www.reddit.com/r/juststart/comments/jvs0d1/is_ahrefs_worth_it_with_these_features/',
  ];

  useEffect(() => {
    URLs.forEach((URL) => {
      fetch(URL + '.json').then((res) => {
        res.json().then((data) => {
          if (data != null) setTestRedditComments(data[1].data.children);
        });
      });
    });
  }, []);

  // This below finds the reddit comments and puts them into an array
  const testCommentsArr = testRedditComments.map(
    (comments) => comments.data.body
  );

  // This below takes the reddit comments and turns them into a string.
  const testCommentsArrToString = testCommentsArr.join(' ');
  console.log(testCommentsArrToString);
I've tried multiple approaches to adding the strings together, but I've sunk a bunch of time into it. Does anyone know how this works, or is there a simpler way to accomplish this?
Thanks for your time and if you need any clarification let me know.
-Josh
const URLs = [
  "https://www.reddit.com/r/SEO/comments/tepprk/is_ahrefs_worth_it/",
  "https://www.reddit.com/r/juststart/comments/jvs0d1/is_ahrefs_worth_it_with_these_features/",
];

Promise.all(
  URLs.map(async (url) => {
    const resp = await fetch(url + ".json");
    return resp.json();
  })
).then((res) => console.log(res));
I have used Promise.all to get the responses, and I attached a React sandbox doing the same.
Based on your requirements, you can use the state value directly, or you can prepare your API response before setting it to state.
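If the goal is to end up with every post's comments in a single state value, one hedged approach (assuming the same data[1].data.children shape from the question) is to wait for all responses and set state once, instead of overwriting it per request:
useEffect(() => {
  Promise.all(
    URLs.map((URL) => fetch(URL + '.json').then((res) => res.json()))
  ).then((results) => {
    // Flatten the comment lists from every post into a single array,
    // then set state once instead of overwriting it per request.
    const allComments = results.flatMap((data) => data[1].data.children);
    setTestRedditComments(allComments);
  });
}, []);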

Paginating data causing unusual behavior?

I am displaying "global posts" on one of my tabs. Currently, there are only 11 posts in the database.
In the app, some of the posts are being duplicated, and I have no idea why these specific posts are being duplicated, as it seems to me like it is happening at random.
Here is the code for how I paginate the data.
When the component mounts, I query firestore and pull 5 posts using getCollection().
async componentDidMount() {
  this.unsubscribe = Firebase.firestore()
    .collection('globalPosts')
    .orderBy("date_created", "desc")
    .limit(5)
    .onSnapshot(this.getCollection);
}
I get the posts successfully in getCollection() and set an index, lastItemIndex, so I know where to query for the next posts.
getCollection = (querySnapshot) => {
  const globalPostsArray = [];
  querySnapshot.forEach((res) => {
    const {
      // ...fields
    } = res.data();
    globalPostsArray.push({
      // ...fields
    });
  });
  this.setState({
    globalPostsArray,
    isLoading: false,
    lastItemIndex: globalPostsArray.length - 1
  });
}
This gets the first 5 items, no problem, ordered by date_created, descending.
If the user scrolls down the FlatList, I have logic to handle fetching more data:
<FlatList
  data={this.state.globalPostsArray}
  renderItem={renderItem}
  keyExtractor={item => item.key}
  contentContainerStyle={{ paddingBottom: 50 }}
  showsHorizontalScrollIndicator={false}
  showsVerticalScrollIndicator={false}
  onRefresh={this._refresh}
  refreshing={this.state.isLoading}
  onEndReachedThreshold={0.5} // <---------------------- Threshold
  onEndReached={() => {this.getMore()}} // <------------ Get more data
/>
Finally, once it is time to retrieve more data, I call this.getMore()
Here is the code to get the next 5 posts:
getMore = async() => {
  const newPostsArray = [] // <-------- new array for the next 5 posts
  Firebase.firestore()
    .collection('globalPosts')
    .orderBy("date_created", "desc")
    .startAfter(this.state.globalPostsArray[this.state.lastItemIndex].date_created) // <--- note startAfter
    .limit(5)
    .onSnapshot(querySnapshot => {
      querySnapshot.forEach((res) => {
        const {
          // ... same fields as getCollection()
        } = res.data();
        newPostsArray.push({
          // ... same fields as getCollection()
        });
      });
      this.setState({
        globalPostsArray: this.state.globalPostsArray.concat(newPostsArray), // <--- add to state array
        lastItemIndex: this.state.globalPostsArray.length - 1 // <---- update index
      });
      console.log(this.state.lastItemIndex) // <------- I print out last item index
    })
}
Some notes:
The code works fine in terms of fetching the data
The code works fine in terms of pagination, and only fetches 5 posts at a time
There is no discernible pattern I am seeing in which posts are being duplicated
I am ordering by date_created, descending when querying firestore in both getCollection() and getMore()
I console log "last item index" in my getMore(), and of course the index is higher than the number of posts
I keep getting the following warning/error, with different keys (post IDs in firestore), which shows me the duplication is happening at random and is not specific to one user. This warning/error doesn't break the application, but it tells me this weird behavior is happening:
Encountered two children with the same key, ZJu3FbhzOkXDM5mn6O6T. Keys should be unique so that components maintain their identity across updates. Non-unique keys may cause children to be duplicated and/or omitted — the behavior is unsupported and could change in a future version.
Can someone point me in the right direction, why my pagination is having such unusual behavior?
My issue was with lastItemIndex. Saving it in state was causing problems. I solved the problem by removing lastItemIndex from state, and making it a local variable in getMore():
getMore = async() => {
  const newPostsArray = []
  const lastItemIndex = this.state.globalPostsArray.length - 1 // <---- added it here
  await Firebase.firestore()
    .collection('globalPosts')
    .orderBy("date_created", "desc")
    .startAfter(this.state.globalPostsArray[lastItemIndex].date_created)
    .limit(5)
    .onSnapshot(querySnapshot => {
      querySnapshot.forEach((res) => {
        const {
          // ...fields
        } = res.data();
        newPostsArray.push({
          key: res.id,
          // ...fields
        });
      });
      this.setState({
        globalPostsArray: this.state.globalPostsArray.concat(newPostsArray)
      });
    })
}
You can only guarantee uniqueness on a document ID, not by the contents of a document's fields.
For example, as a workaround, you could concatenate two IDs into a single string and use it as the document ID. You would then use a transaction to ensure that a document does not already exist with that new composite ID before writing it.
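A minimal sketch of that workaround, assuming the namespaced (v8-style) Firestore SDK used above and two hypothetical source IDs, id1 and id2:
const db = Firebase.firestore();
const compositeId = `${id1}_${id2}`; // hypothetical source IDs
const docRef = db.collection('globalPosts').doc(compositeId);

db.runTransaction((transaction) =>
  transaction.get(docRef).then((snapshot) => {
    if (snapshot.exists) {
      // Abort: a document with this composite ID already exists.
      return Promise.reject(new Error('Document already exists'));
    }
    // Safe to create it inside the transaction.
    transaction.set(docRef, { /* fields to write */ });
  })
);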
In order to paginate through a list of these unique items you need to use .startAfter() and .limit(). You can then create a lastVisible variable that uses the last document in a batch to start a cursor for the next.
var lastVisible = documentSnapshots.docs[documentSnapshots.docs.length-1];
The example below can be found here if you want to read more about it:
var first = db.collection("cities")
  .orderBy("population")
  .limit(25);

return first.get().then(function (documentSnapshots) {
  // Get the last visible document
  var lastVisible = documentSnapshots.docs[documentSnapshots.docs.length - 1];
  console.log("last", lastVisible);

  // Construct a new query starting at this document,
  // to get the next 25 cities.
  var next = db.collection("cities")
    .orderBy("population")
    .startAfter(lastVisible) // <--- HERE
    .limit(25);
});

Javascript: append to an array not working

I am trying to append numbers that I get from an API call (a promise) into an array. When I test the array's length, it always returns 1, as if each API call resets the array and puts in a new number.
here's the code:
The API call
wiki()
  .page("COVID-19_pandemic_in_Algeria")
  .then((page) => page.fullInfo())
  .then((info) => {
    data.confirmed.value = info.general.confirmedCases;
    data.recovered.value = info.general.recoveryCases;
    data.deaths.value = info.general.deaths;
  });

const data = {
  confirmed: { value: 0 },
  deaths: { value: 0 },
  recovered: { value: 0 },
};
Now I want to put the deaths count into an array, so that I have a list of numbers over the next days to keep track of.
function countStats() {
  const counter = [];
  var deathCounter = data.deaths.value;
  counter.push(deathCounter);
  console.log(counter.length);
  return counter;
}
countStats();
Every time the functions run (wiki() and countStats()), the counter array's length is always 1. Why is that?
Unless ...
the data source provides multi-day data, or
you are going to run an extremely long javascript session (which is impractical and unsafe),
... javascript can't, on its own, meet the objective of processing/displaying data arising from multiple days.
Let's assume that the data source provides data that is correct for the current day.
You will need a permanent data store in which scraped data can be accumulated and retrieved on demand. Exactly what you choose for your permanent data store depends on the environment in which you propose to run your javascript (essentially client-side browser or server-side Node), and that choice is beyond the scope of this question.
Your master function might be something like this ...
function fetchCurrentDataAndRenderAll() {
  return fetchCurrentData()
    .then(writeToFile)
    .then(readAllFromFile)
    .then(data => {
      // Here, you have the multi-day data that you want.
      return renderData(data); // let's assume the data is to be rendered, say as a graph.
    })
    .catch(error => {
      // something went wrong
      console.log(error);
      throw error;
    });
}
... and the supporting functions might be something like this:
function fetchCurrentData() {
  return wiki() // as given in the question ...
    .page("COVID-19_pandemic_in_Algeria")
    .then(page => page.fullInfo())
    .then(info => ({
      'timeStamp': Date.now(), // you will most likely need to timestamp the data
      'confirmed': info.general.confirmedCases,
      'recovered': info.general.recoveryCases,
      'deaths': info.general.deaths
    }));
}

function writeToFile(scrapedData) {
  // you need to write this ...
  // return Promise.
}

function readAllFromFile() {
  // you need to write this ...
  // return Promise.
}

function renderData(data) {
  // you need to write this ...
  // optionally: return Promise (necessary if rendering is asynchronous).
}
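For a server-side Node environment, a minimal sketch of the two persistence stubs, assuming a local stats.json file holding an array of timestamped records (the file name and shape are illustrative, not prescribed by the answer):
const fs = require('fs').promises;
const FILE = 'stats.json'; // illustrative path

// Append today's scraped record to the JSON file.
function writeToFile(scrapedData) {
  return readAllFromFile()
    .then(records => records.concat(scrapedData))
    .then(records => fs.writeFile(FILE, JSON.stringify(records)))
    .then(() => scrapedData);
}

// Read back every record accumulated so far ([] if the file doesn't exist yet).
function readAllFromFile() {
  return fs.readFile(FILE, 'utf8')
    .then(JSON.parse)
    .catch(() => []);
}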
You can use Promise.all(). I take it that you'll not be requesting the same page 10 times but requesting a different page in each call e.g. const Pages = ['COVID-19_pandemic_in_Algeria','page2','page3','page4','page5','page6','page7','page8','page9','page10']. Then you could make the 10 calls as follows:
//const wiki = ......
const Pages = ['COVID-19_pandemic_in_Algeria','page2','page3','page4','page5','page6','page7','page8','page9','page10'];
let counter = [];

Promise.all(
  // page.fullInfo() is itself async, so resolve it inside the map
  Pages.map(Page => wiki().page(Page).then(page => page.fullInfo()))
)
  .then(results => {
    for (const info of results) {
      counter.push(info.general.deaths);
    }
    console.log(counter.length); // 10
    console.log(counter); // [10 death counts, one for each page]
  })
  .catch(err => console.log(err.message));

How can I update my dictionary with nested HTTP request?

I'm gonna try to explain this as clearly as I can, but it's very confusing to me so bear with me.
For this project, I'm using Node.js with the modules Axios and Cheerio.
I am trying to fetch HTML data from a webshop (similar to Amazon/eBay), and store the product information in a dictionary. I managed to store most things (title, price, image), but the product description is on a different page. To do a request to this page, I'm using the URL I got from the first request, so they are nested.
This first part is done with the following request:
let request = axios.get(url)
  .then(res => {
    // This gets the HTML for every product
    getProducts(res.data);
    console.log("Got products in HTML");
  })
  .then(res => {
    // This parses the product HTML into a dictionary of product items
    parseProducts(productsHTML);
    console.log("Generated dictionary with all the products");
  })
  .then(res => {
    // This loops through the products to fetch and add the description
    updateProducts(products);
  })
  .catch(e => {
    console.log(e);
  })
I'll also provide the way I'm creating product objects, as it might clarify the function where I think the problem occurs.
function parseProducts(html) {
  for (item in productsHTML) {
    // Store the data from the first request
    const $ = cheerio.load(productsHTML[item]);
    let product = {};
    let mpUrl = $("a").attr("href");
    product["title"] = $("a").attr("title");
    product["mpUrl"] = mpUrl;
    product["imgUrl"] = $("img").attr("src");
    let priceText = $("span.subtext").text().split("\xa0")[1].replace(",", ".");
    product["price"] = parseFloat(priceText);
    products.push(product);
  }
}
The problem resides in the updateProducts function. If I console.log the dictionary afterwards, the description is not added. I think this is because the console will log before the description gets added. This is the update function:
function updateProducts(prodDict) {
  for (i in prodDict) {
    let request2 = axios.get(prodDict[i]["mpUrl"])
      .then(res => {
        const $ = cheerio.load(res.data);
        description = $("div.description p").text();
        prodDict[i]["descr"] = description;
        // If I console.log the product here, the description is included
      })
  }
  // If I console.log the product here, the description is NOT included
}
I don't know what to try anymore; I guess it could be solved with something like async/await or by putting timeouts in the code. Can someone please help me update the products properly and add the product descriptions? Thank you SO much in advance.
To refactor this with async/await one would do:
async function fetchAndUpdateProducts() {
  try {
    const response = await axios.get(url);
    getProducts(response.data);
    console.log("Got products in HTML");
    parseProducts(productsHTML);
    console.log("Generated dictionary with all the products");
    await updateProducts(products);
  } catch(e) {
    console.log(e);
  }
}
fetchAndUpdateProducts().then(() => console.log('Done'));
and
async function updateProducts(prodDict) {
  for (const i in prodDict) {
    const response = await axios.get(prodDict[i]["mpUrl"]);
    const $ = cheerio.load(response.data);
    const description = $("div.description p").text();
    prodDict[i]["descr"] = description;
  }
}
This way, fetchAndUpdateProducts will not conclude until the promise returned by updateProducts has resolved.
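If the sequential awaits turn out to be too slow, a hedged alternative (assuming the target site tolerates parallel requests, and that prodDict is an array as built by parseProducts) is to fire all the description requests at once and wait for them together:
async function updateProducts(prodDict) {
  await Promise.all(
    prodDict.map(async (product) => {
      const res = await axios.get(product["mpUrl"]);
      const $ = cheerio.load(res.data);
      // Each request fills in its own product's description.
      product["descr"] = $("div.description p").text();
    })
  );
}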

How many requests can Node-Express fire off at once?

I have a script that is pulling 25,000 records from AWS Athena, which is basically a PrestoDB relational SQL database. Let's say I'm generating a request for each one of these records, which means I have to make 25,000 requests to Athena; then, when the data comes back, I have to make 25,000 requests to my Redis cluster.
What would be the ideal amount of requests to make at one time from node to Athena?
The reason I ask is that I tried to do this by creating an array of 25,000 promises and then calling Promise.all(promiseArray) on it, but the app just hung forever.
So I decided instead to fire off 1 at a time and use recursion to splice the first index out and then pass the remaining records to the calling function after the promise has been resolved.
The problem with this is that it takes forever. I took about an hour break and came back and there were 23,000 records remaining.
I tried to google how many requests Node and Athena can handle at once, but I came up with nothing. I'm hoping someone might know something about this and be able to share it with me.
Thank you.
Here is my code just for reference:
As a sidenote, what I would like to do differently is instead of sending one request at a time I could send 4, 5, 6, 7 or 8 at a time depending on how fast it would execute.
Also, how would a Node cluster effect the performance of something like this?
exports.storeDomainTrends = () => {
  return new Promise((resolve, reject) => {
    athenaClient.execute(`SELECT DISTINCT the_column from "the_db"."the_table"`,
      (err, data) => {
        var getAndStoreDomainData = (records) => {
          if (records.length) {
            return new Promise((resolve, reject) => {
              // take the first record off the front of the list
              var record = records.splice(0, 1)[0]
              athenaClient.execute(`
                SELECT
                  field,
                  field,
                  field,
                  SUM(field) as field
                FROM "the_db"."the_table"
                WHERE the_field IN ('Month') AND the_field = '` + record.domain_name + `'
                GROUP BY the_field, the_field, the_field
              `, (err, domainTrend) => {
                if (err) {
                  console.log(err)
                  reject(err)
                }
                redisClient.set(('Some String' + domainTrend[0].domain_name), JSON.stringify(domainTrend))
                resolve(domainTrend)
              })
            })
            .then(res => {
              // recurse on the remaining records, one at a time
              getAndStoreDomainData(records)
            })
          }
        }
        getAndStoreDomainData(data)
      })
  })
}
Using the lib your code could look something like this:
const Fail = function(reason){this.reason=reason;};
const isFail = x=>(x&&x.constructor)===Fail;

const distinctDomains = () =>
  new Promise(
    (resolve,reject)=>
      athenaClient.execute(
        `SELECT DISTINCT domain_name from "endpoint_dm"."bd_mb3_global_endpoints"`,
        (err,data)=>
          (err)
          ? reject(err)
          : resolve(data)
      )
  );

const domainDetails = domain_name =>
  new Promise(
    (resolve,reject)=>
      athenaClient.execute(
        `SELECT
          timeframe_end_date,
          agg_type,
          domain_name,
          SUM(endpoint_count) as endpoint_count
        FROM "endpoint_dm"."bd_mb3_global_endpoints"
        WHERE agg_type IN ('Month') AND domain_name = '${domain_name}'
        GROUP BY timeframe_end_date, agg_type, domain_name`,
        (err, domainTrend) =>
          (err)
          ? reject(err)
          : resolve(domainTrend)
      )
  );

const redisSet = keyValue =>
  new Promise(
    (resolve,reject)=>
      redisClient.set(
        keyValue,
        (err,res)=>
          (err)
          ? reject(err)
          : resolve(res)
      )
  );

const process = batchSize => limitFn => resolveValue => domains =>
  Promise.all(
    domains.slice(0,batchSize)
      .map(//map domains to promises
        domain=>
          //maximum 5 active connections
          limitFn(domainName=>domainDetails(domainName))(domain.domain_name)
          .then(
            domainTrend=>
              //the redis client documentation makes no sense whatsoever
              //https://redis.io/commands/set
              //no mention of a callback
              //https://github.com/NodeRedis/node_redis
              //mentions a callback; since we need the return value,
              //and it's best to do it async, we use callback-to-promise
              redisSet([
                `Endpoint Profiles - Checkin Trend by Domain - Monthly - ${domainTrend[0].domain_name}`,
                JSON.stringify(domainTrend)
              ])
          )
          .then(
            redisReply=>{
              //here is where things get unpredictable: set is documented as
              // a synchronous function returning "OK" or a function that
              // takes a callback, but there is no mention of what that callback
              // receives as response. You should try with one or two records to
              // finish this by reverse engineering, because the documentation
              // fails 100% here and can not be relied upon.
              console.log("bad documentation of redis client... reply is:",redisReply);
              return (redisReply==="OK")
                ? domain
                : Promise.reject(`Redis reply not OK:${redisReply}`)
            }
          )
          .catch(//catch failed, save error and domain of failed item
            e=>
              new Fail([e,domain])
          )
      )
  ).then(
    results=>{
      console.log(`got ${batchSize} results`);
      const left = domains.slice(batchSize);
      if(left.length===0){//nothing left
        return resolveValue.concat(results);
      }
      //recursively call process until done
      return process(batchSize)(limitFn)(resolveValue.concat(results))(left)
    }
  );

const max5 = lib.throttle(5);//max 5 active connections to athena

distinctDomains()//you may want to limit the results to 50 for testing
  //you may want to limit batch size to 10 for testing
  .then(process(1000)(max5)([]))//we have 25000 domains here
  .then(
    results=>{//have 25000 results
      const successes = results.filter(x=>!isFail(x));
      //array of failed items; a failed item has a .reason property
      // that is an array of 2 items: [the error, domain]
      const failed = results.filter(isFail);
    }
  )
You should figure out what the redis client actually does; I tried to work it out from the documentation but might as well have asked my goldfish. Once you've reverse engineered the client's behavior, it is best to try a small batch size to see if there are any errors. You have to import lib to use it; you can find it here.
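The lib itself is not shown in this excerpt. Purely as an assumption about the referenced library, here is a sketch of what a throttle(max) matching the limitFn(fn)(arg) call shape used above could look like:
// throttle(max) returns a wrapper: fn => arg => Promise,
// allowing at most `max` wrapped calls to be in flight at once.
const throttle = max => {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= max || queue.length === 0) return;
    active++;
    const { fn, arg, resolve, reject } = queue.shift();
    Promise.resolve(fn(arg))
      .then(resolve, reject)
      .finally(() => { active--; next(); });
  };
  return fn => arg =>
    new Promise((resolve, reject) => {
      queue.push({ fn, arg, resolve, reject });
      next();
    });
};
Called as throttle(5), this matches the const max5 = lib.throttle(5) usage above: queued calls start only as earlier ones settle.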
I was able to take what Kevin B said and find a much quicker way to query the data. What I did was change the query so that I could get the trend for all domains from Athena in one go. I ordered it by domain_name and then consumed the result as a Node stream, so that I could separate each domain out into its own JSON as the data came in.
Anyways this is what I ended up with.
exports.storeDomainTrends = () => {
  return new Promise((resolve, reject) => {
    var streamObj = athenaClient.execute(`
      SELECT field,
        field,
        field,
        SUM(field) AS field
      FROM "db"."table"
      WHERE field IN ('Month')
      GROUP BY field, field, field
      ORDER BY field desc`).toStream();

    var data = [];
    streamObj.on('data', (record) => {
      if (!data.length || record.field === data[0].field) {
        data.push(record)
      } else if (data[0].field !== record.field) {
        redisClient.set(('Key'), JSON.stringify(data))
        data = [record]
      }
    })
    streamObj.on('end', resolve);
    streamObj.on('error', reject);
  })
}
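One caveat with this version: when the stream ends, whatever has accumulated in data for the final domain is never written to Redis. A small hedged fix (keeping the same illustrative 'Key' placeholder) is to flush before resolving:
streamObj.on('end', () => {
  // Flush the last domain's accumulated records before resolving.
  if (data.length) redisClient.set(('Key'), JSON.stringify(data));
  resolve();
});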
