I want to run one hundred HTTP requests in configurable chunks, with a configurable timeout between chunk requests. The requests are based on the data provided in a some.csv file.
It doesn't work because I am getting a TypeError, but when I remove the () after f, it doesn't work either.
I would be very grateful for a little help. Probably the biggest problem is that I don't really understand exactly how promises work, but I tried multiple solutions and wasn't able to achieve what I want.
The timeout feature will probably give me even more headaches, so I would appreciate any tips on that too.
Can you please help me understand why it doesn't work?
Here is the snippet:
const rp = require('request-promise');
const fs = require('fs');
const { chunk } = require('lodash');
const BATCH_SIZE = 2;
const QUERY_PARAMS = ['clientId', 'time', 'changeTime', 'newValue'];
async function update(id, time, query) {
const options = {
method: 'POST',
uri: `https://requesturl/${id}?query=${query}`,
body: {
"prop": {
"time": time
}
},
headers: {
"Content-Type": "application/json"
},
json: true
}
return async () => { return await rp(options) };
}
async function batchRequestRunner(data) {
const promises = [];
for (row of data) {
row = row.split(',');
promises.push(update(row[0], row[1], QUERY_PARAMS.join(',')));
}
const batches = chunk(promises, BATCH_SIZE);
for (let batch of batches) {
try {
Promise.all(
batch.map(async f => { return await f();})
).then((resp) => console.log(resp));
} catch (e) {
console.log(e);
}
}
}
async function main() {
const input = fs.readFileSync('./input.test.csv').toString().split("\n");
const requestData = input.slice(1);
await batchRequestRunner(requestData);
}
main();
Clarification for the first comment:
I have a csv file which looks like below:
clientId,startTime
123,13:40:00
321,13:50:00
the file size is ~100k rows
the file contains information on how to update the time for a particular clientId in the database. I don't have access to the database, but I have access to an API which allows updating entries in it.
I cannot make 100k calls at once because: my network is limited (I work remotely because of coronavirus), it consumes a lot of memory, and the API can also be limited and may crash if I make all the requests at once.
What I want to achieve:
Load csv into memory, convert it to an Array
Handle API requests in chunks: for example, take the first two rows from the array, make API calls based on those two rows, wait 1000ms, take another two rows, and continue processing until the end of the array (csv file), roughly as in the sketch below
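A minimal sketch of that flow (assuming a promise-returning makeCall(row) helper, which is hypothetical here):
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
async function processInChunks(rows, chunkSize, pauseMs) {
    for (let i = 0; i < rows.length; i += chunkSize) {
        // fire one chunk and wait until every request in it settles
        await Promise.all(rows.slice(i, i + chunkSize).map((row) => makeCall(row)));
        // then pause before starting the next chunk
        await delay(pauseMs);
    }
}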
Well, it seems like this is a fairly classic case: you want to process an array of values with some asynchronous operation and, to avoid consuming too many resources or overwhelming the target server, you want no more than N requests in-flight at the same time. This is a common problem for which there are pre-built solutions. My go-to solution is a small piece of code called mapConcurrent(). It's analogous to array.map(), but it takes a promise-returning asynchronous callback and the maximum number of items that should ever be in-flight at the same time. It then returns a promise that resolves to an array of results.
Here's mapConcurrent():
// takes an array of items and a function that returns a promise
// returns a promise that resolves to an array of results
function mapConcurrent(items, maxConcurrent, fn) {
let index = 0;
let inFlightCntr = 0;
let doneCntr = 0;
let results = new Array(items.length);
let stop = false;
return new Promise(function(resolve, reject) {
function runNext() {
let i = index;
++inFlightCntr;
fn(items[index], index++).then(function(val) {
++doneCntr;
--inFlightCntr;
results[i] = val;
run();
}, function(err) {
// set flag so we don't launch any more requests
stop = true;
reject(err);
});
}
function run() {
// launch as many as we're allowed to
while (!stop && inFlightCntr < maxConcurrent && index < items.length) {
runNext();
}
// if all are done, then resolve parent promise with results
if (doneCntr === items.length) {
resolve(results);
}
}
run();
});
}
Your code can then be structured to use it like this (note that update() now returns the promise from rp(options) directly, rather than asynchronously returning a function; calling f() on a promise was the source of your TypeError):
function update(id, time, query) {
const options = {
method: 'POST',
uri: `https://requesturl/${id}?query=${query}`,
body: {
"prop": {
"time": time
}
},
headers: {
"Content-Type": "application/json"
},
json: true
}
return rp(options);
}
function processRow(row) {
let rowData = row.split(",");
return update(rowData[0], rowData[1], QUERY_PARAMS.join(','));
}
function main() {
const input = fs.readFileSync('./input.test.csv').toString().split("\n");
const requestData = input.slice(1);
// process this entire array with up to 5 requests "in-flight" at the same time
mapConcurrent(requestData, 5, processRow).then(results => {
console.log(results);
}).catch(err => {
console.log(err);
});
}
You can obviously adjust the number of concurrent requests to whatever number you want; I set it to 5 in this example.
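If you also want the fixed pause between requests that the question asks for, one option (a sketch layered on top of mapConcurrent, not part of it) is to keep each concurrency slot occupied a little longer by delaying inside the callback:
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
function processRowThrottled(row) {
    // do the same work as processRow, then hold the slot for 1000ms,
    // so each of the 5 slots issues at most one request per second
    return processRow(row).then(async (result) => {
        await delay(1000);
        return result;
    });
}
Passing processRowThrottled to mapConcurrent() then gives you both the concurrency cap and a rough rate limit.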
Related
There is a function:
export function getImage(requestParameters: TRequestParameters): TRequest<TResponse<ImageBitmap | HTMLImageElement>> {
const request = helper.getArrayBuffer(requestParameters);
return {
response: (async () => {
const response = await request.response;
const image = await arrayBufferToCanvasImageSource(response.data);
return {
data: image,
cacheControl: response.cacheControl,
expires: response.expires
};
})(),
cancel: request.cancel
};
}
The function itself is synchronous, but it returns an object consisting of two fields: response, a Promise which resolves with an object (3 fields: data, cacheControl and expires, but that's not interesting for us), and cancel, a method that cancels the request.
The function works as expected and everything about it is just fine. However, I need to implement an additional constraint: the number of parallel (simultaneous) requests to the network at any given point in time must not exceed n.
Thus, if n === 0, no request should be made. If n === 1, only one image can be loaded at a time (that is, all images are loaded sequentially). For n === m with m > 1, no more than m images can be loaded simultaneously.
My solution
Based on the fact that the getImage function is synchronous, the line
const request = helper.getArrayBuffer(requestParameters);
is executed immediately when getImage is called. That's not what we want, though; we need to postpone the execution of the request itself. Therefore, we will replace the request variable with a requestMaker function, which we will call only when we need it:
export function getImage(requestParameters: TRequestParameters): TRequest<TResponse<ImageBitmap | HTMLImageElement>> {
if (webpSupported.supported) {
if (!requestParameters.headers) requestParameters.headers = {};
requestParameters.headers['Accept'] = 'image/webp,*/*';
}
function requestMaker() {
const request = helper.getArrayBuffer(requestParameters);
return request;
}
return {
response: (async () => {
const response = await requestMaker().response;
const image = await arrayBufferToCanvasImageSource(response.data);
return {
data: image,
cacheControl: response.cacheControl,
expires: response.expires
};
})(),
cancel() {
//
}
};
}
(Let's omit cancel for now for the sake of simplicity.)
Now the execution of this requestMaker function, which makes the request itself, needs to be postponed until some point.
Suppose now we are trying to solve the problem only for n === 1.
Let's create an array in which we will store all requests that are currently running:
const ongoingImageRequests = [];
Now, inside requestMaker, we will save requests to this variable as soon as they occur, and delete them as soon as we receive a response:
const ongoingImageRequests = [];
export function getImage(requestParameters: TRequestParameters): TRequest<TResponse<ImageBitmap | HTMLImageElement>> {
if (webpSupported.supported) {
if (!requestParameters.headers) requestParameters.headers = {};
requestParameters.headers['Accept'] = 'image/webp,*/*';
}
function requestMaker() {
const request = helper.getArrayBuffer(requestParameters);
ongoingImageRequests.push(request);
request.response.finally(() => ongoingImageRequests.splice(ongoingImageRequests.indexOf(request), 1));
return request;
}
return {
response: (async () => {
const response = await requestMaker().response;
const image = await arrayBufferToCanvasImageSource(response.data);
return {
data: image,
cacheControl: response.cacheControl,
expires: response.expires
};
})(),
cancel() {
//
}
};
}
It's only left now to add a restriction regarding the launch of requestMaker: before starting it, we need to wait until all the requests from the array are finished:
const ongoingImageRequests = [];
export function getImage(requestParameters: TRequestParameters): TRequest<TResponse<ImageBitmap | HTMLImageElement>> {
if (webpSupported.supported) {
if (!requestParameters.headers) requestParameters.headers = {};
requestParameters.headers['Accept'] = 'image/webp,*/*';
}
function requestMaker() {
const request = helper.getArrayBuffer(requestParameters);
ongoingImageRequests.push(request);
request.response.finally(() => ongoingImageRequests.splice(ongoingImageRequests.indexOf(request), 1));
return request;
}
return {
response: (async () => {
await Promise.allSettled(ongoingImageRequests.map(ongoingImageRequest => ongoingImageRequest.response));
const response = await requestMaker().response;
const image = await arrayBufferToCanvasImageSource(response.data);
return {
data: image,
cacheControl: response.cacheControl,
expires: response.expires
};
})(),
cancel() {
//
}
};
}
I understand it this way: when getImage starts executing (it is called from somewhere outside), it immediately returns an object in which response is a Promise that will resolve no earlier than the moment when all the other requests in the queue have completed.
But, as it turns out, this solution for some reason does not work. The question is: why? And how can I make it work, at least for n === 1?
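(One plausible reason, offered as a hedged guess rather than a confirmed diagnosis: Promise.allSettled only sees a snapshot of ongoingImageRequests, taken synchronously when the async IIFE starts, while requestMaker() pushes into the array only after that await resolves. Two getImage calls made in the same tick therefore both observe an empty array and both proceed in parallel:
// hypothetical call site: both snapshots are taken before either push happens
const a = getImage(paramsA); // sees ongoingImageRequests === []
const b = getImage(paramsB); // still sees [], so it does not wait for a
A queue that chains each new request onto the promise of the previous one, instead of re-reading the array, would avoid relying on such a snapshot.)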
I know the title is quite generic, but I am inserting 1 million records into an AWS DynamoDB table and currently it takes ~30 minutes to load.
I have the 1 million records in memory; I just need to improve the speed of inserting the items. AWS only allows sending batches of 25 records, and all my code is synchronous.
My objects usually hold a very small amount of data (e.g. 3-5 properties with numeric ids).
I read the 1 million entries from a CSV and basically store it in data array
Then I do this:
await DatabaseHandler.batchWriteItems('myTable', data); // data length is 1 Million
Which calls my insert function
const documentClient = new DynamoDB.DocumentClient();
export class DatabaseHandler {
static batchWriteItems = async (tableName: string, data: {}[]) => {
// AWS only allows batches of max 25 items
while (data.length) {
const batch = data.splice(0, 25);
const putRequests = batch.map((elem) => {
return {
PutRequest: {
Item: elem
}
};
});
const params = {
RequestItems: {
[tableName]: putRequests,
},
};
await documentClient.batchWrite(params).promise();
}
}
}
I believe I am making 40,000 HTTP requests, each creating 25 records in the database.
Is there any way to improve this? Even some ideas would be great
Your code is "blocking", in the sense that you wait for the previous batch to execute before executing the next one. That is not the nature of JavaScript, and you're not taking advantage of promises. Instead, you can send all your requests at once and let JavaScript's asynchronism do the work for you, which will be significantly faster:
// in your class method:
const proms = []; // <-- create a promise array
while (data.length) {
const batch = data.splice(0, 25);
const putRequests = batch.map((elem) => {
return {
PutRequest: {
Item: elem
}
};
});
const params = {
RequestItems: {
[tableName]: putRequests,
},
};
proms.push(documentClient.batchWrite(params).promise()); // <-- add the promise to our array
}
await Promise.all(proms); // <-- wait for everything to be resolved asynchronously, then be done
This will speed up your requests monumentally, as long as AWS lets you send that many concurrent requests.
I'm not sure how exactly you implemented the code, but to prove that it works, here's a dummy implementation (expect to wait about a minute):
const request = (_, t = 5) => new Promise(res => setTimeout(res, t)); // implement a dummy request API
// with your approach
async function a(data) {
while(data.length) {
const batch = data.splice(0, 25);
await request(batch);
}
}
// non-blocking
async function b(data) {
const proms = [];
while(data.length) {
const batch = data.splice(0, 25);
proms.push(request(batch));
}
await Promise.all(proms);
}
(async function time(a, b) {
const makeData = () => Array(10000).fill(); // fresh dummy data per run (10,000 instead of a million or you'll be staring at this demo for a while)
console.time("original");
await a(makeData()); // each run gets its own copy, since a() and b() empty their array via splice
console.timeEnd("original");
console.time("optimized");
await b(makeData());
console.timeEnd("optimized");
})(a, b);
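If AWS throttles you at that level of concurrency, a middle ground (a sketch, not part of the original answer) is to cap how many batchWrite calls are in flight at once by processing the batches in groups:
// hypothetical helper: run the batch writes in groups of `limit` concurrent calls
async function batchWriteCapped(tableName, data, limit = 40) {
    const batches = [];
    while (data.length) {
        batches.push(data.splice(0, 25));
    }
    for (let i = 0; i < batches.length; i += limit) {
        const group = batches.slice(i, i + limit);
        // wait for this group to finish before starting the next one
        await Promise.all(group.map((batch) => documentClient.batchWrite({
            RequestItems: { [tableName]: batch.map((elem) => ({ PutRequest: { Item: elem } })) },
        }).promise()));
    }
}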
The title isn't so clear, but to elaborate: I need to make an HTTP request to an API endpoint, and so far I'm using a function that looks something like this:
function getPostsFromAPI(argOne, argTwo) {
const apiUrl = `https://www.exampleapi.com/v1/userposts`
apiGet(`${apiUrl}?name=${argOne}&something=${argTwo}`).then(userPosts => {
// do stuff with userPosts
return userPosts
}).catch(handleError)
}
However, the API response can include the following:
{
//...
"has_more": true,
"next_offset": 10
}
In which case, I'd need to send the API call a second time, this time with the &offset=10 argument.
The promise would need to continue making API calls until has_more: true is no longer present in the response. My initial thought was to just re-run getPostsFromAPI() from inside itself based on an if statement, but I can't figure out how to make that work cleanly inside a promise. Ultimately, the promise should keep making requests until the API says it has run out of data to give (I'll implement my own limit).
What would be the best way to achieve this?
The algorithm to achieve this is much more obvious if you use async/await: you can just create an empty array and gradually append to it in a loop until the server indicates there are no more results.
async function getPostsFromAPI(argOne, argTwo) {
const apiUrl = `https://www.exampleapi.com/v1/userposts`
let results = [];
let offset = 0;
while (true) {
let response = await apiGet(`${apiUrl}?name=${argOne}&something=${argTwo}&offset=${offset}`);
results = results.concat(response.records);
if (response.has_more) {
offset = response.next_offset;
} else {
return results;
}
}
}
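A call site might then look like this (hypothetical arguments, assuming an async context):
const posts = await getPostsFromAPI("someUser", "someTag");
console.log(posts.length);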
If you can't use async/await and have to stick to promises, you can use recursion to have a method invoke itself each time a response indicates there are more records:
function getPostsFromAPI(argOne, argTwo) {
return new Promise((resolve, reject) => {
const apiUrl = `https://www.exampleapi.com/v1/userposts`;
let results = [];
const getNextPage = (offset = 0) => {
apiGet(`${apiUrl}?name=${argOne}&something=${argTwo}&offset=${offset}`).then((response) => {
results = results.concat(response.records);
if (response.has_more) {
getNextPage(response.next_offset);
} else {
resolve(results);
}
}).catch(reject);
}
getNextPage(0);
});
}
Note that, as a matter of general good practice, you should never construct a query string through concatenation or template strings. Use URLSearchParams to ensure your query string is properly encoded; you can do so indirectly by creating a new URL:
const url = new URL(`https://www.exampleapi.com/v1/userposts`)
url.searchParams.append("argOne", argOne);
url.searchParams.append("argTwo", argTwo);
url.searchParams.append("offset", offset);
url.toString()
This is a great use case for an async generator.
It would look something like the following:
async function* getPostsFromAPI(arg1, arg2) {
const apiUrl = `https://www.exampleapi.com/v1/userposts`
let response = { next_offset: 0 };
do {
response = await apiGet(`${apiUrl}?name=${arg1}&something=${arg2}&offset=${response.next_offset}`)
// you can't yield from inside a forEach callback, so use for...of
for (const item of response.items) {
yield item
}
} while (response.has_more)
}
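You could then consume it with for await...of (from inside an async function), fetching pages lazily as you iterate:
// sketch: posts arrive one by one; the next page is only requested when needed
for await (const post of getPostsFromAPI('someUser', 'someArg')) {
    console.log(post);
}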
I am trying to run parallel requests in batches to an API using a bunch of keywords in an array, following an article by Denis Fatkhudinov.
The problem I am having is that for each keyword, I need to run the request again with a different page argument, as many times as the number in the pages variable.
I keep getting Cannot read property 'then' of undefined for the return of the chainNext function.
The parallel requests in batches on their own, without the for loop, work great; I am struggling to incorporate the for loop into the process.
// Parallel requests in batches
async function runBatches() {
// The keywords to request with
const keywords = ['many keyword strings here...'];
// Set max concurrent requests
const concurrent = 5;
// Clone keywords array
const keywordsClone = keywords.slice()
// Array for future resolved promises for each batch
const promises = new Array(concurrent).fill(Promise.resolve());
// Async for loop
const asyncForEach = async (pages, callback) => {
for (let page = 1; page <= pages; page++) {
await callback(page);
}
};
// Number of pages to loop for
const pages = 2;
// Recursively run batches
const chainNext = (pro) => {
// Runs itself as long as there are entries left on the array
if (keywordsClone.length) {
// Store the first entry and conveniently also remove it from the array
const keyword = keywordsClone.shift();
// Run 'the promise to be' request
return pro.then(async () => {
// ---> Here was my problem, I am declaring the constant before running the for loop
const promiseOperation = await asyncForEach(pages, async (page) => {
await request(keyword, page)
});
// ---> The recursive invocation should also be inside the for loop
return chainNext(promiseOperation);
});
}
return pro;
}
return await Promise.all(promises.map(chainNext));
}
// HTTP request
async function request(keyword, page) {
try {
// request API
const res = await apiservice(keyword, page);
// Send data to an outer async function to process the data
await append(res.data);
} catch (error) {
throw new Error(error)
}
}
runBatches()
The problem is simply that pro ends up undefined on the recursive call, because nothing initializes it there: asyncForEach doesn't return a value, so await asyncForEach(...) evaluates to undefined.
You basically execute this code:
const promiseOperation = await asyncForEach(pages, async (page) => { ... });
// promiseOperation is undefined here because asyncForEach returns nothing
return chainNext(promiseOperation); // so the next call fails at pro.then
I'm not completely sure about your idea behind that, but this is your problem in a more condensed version.
I got it working by moving the actual request (promiseOperation) inside the for loop and returning the recursive call there too:
// Recursively run batches
const chainNext = async (pro) => {
if (keywordsClone.length) {
const keyword = keywordsClone.shift()
return pro.then(async () => {
await asyncForEach(pages, (page) => {
const promiseOperation = request(keyword, page)
return chainNext(promiseOperation)
})
})
}
return pro
}
Credit for the parallel requests in batches goes to https://itnext.io/node-js-handling-asynchronous-operations-in-parallel-69679dfae3fc
Below I have a Node.js function that makes a series of requests to different urls, and then for each url I use the Cheerio web scraping library to loop through elements on the DOM and build a sub-array. At the end of each request (after the sub-array is full) I'd like to push the contents of that array to a larger array which is outside of the request scope.
The approach I'm trying doesn't seem to be working. It looks like I don't have access to allPlayers from inside the .then block.
function readPlayers(teamUrls){
const allPlayers = [];
teamUrls.forEach((teamUrl, i) => {
const options = {
gzip: true,
uri: teamUrl,
Connection: 'keep-alive',
transform: function (body) {
return cheerio.load(body);
}
};
request(options)
.then(($) => {
const team = [];
$('tbody').children('tr').each(function(j, element){
const playerName = $(element).children('td').eq(1).children('span').eq(1).find('a').text().trim();
const player = { 'playerName': playerName };
team.push(player);
});
allPlayers.push(team);
}).catch(err => console.log("error: " + err));
});
}
So I'm wondering about the best way to rewrite this code so that the requests work and populate the outer array (allPlayers) with the results.
I've looked into pushing the entire request directly into the outer array, to no avail.
In this example I'm using request-promise to make the request.
I've looked into using Promise.map, which I think is suited to this situation. Then I would return the entire request (I think), but I don't exactly understand what I'm doing in that case, or whether it will work.
Could anyone explain the scoping in this case and why I can't do it the way I'm trying?
Many thanks
You have to remember that once you are in asynchronous code, you cannot go back to synchronous execution.
This is one of the ways you can do it. It will fetch all the players in parallel:
async function readPlayers(teamUrls) {
const playerPromises = teamUrls.map((teamUrl, i) => {
const options = {
gzip: true,
uri: teamUrl,
Connection: 'keep-alive',
transform: function(body) {
return cheerio.load(body);
}
};
return request(options)
});
const players = await Promise.all(playerPromises);
return players.reduce((allPlayers, $) =>{
const team = [];
$('tbody').children('tr').each(function(j, element) {
const playerName = $(element).children('td').eq(1).children('span').eq(1).find('a').text().trim();
const player = { playerName: playerName };
team.push(player);
});
allPlayers.push(team);
return allPlayers;
},[])
}
And you can call it using await readPlayers(array) or readPlayers(array).then(allTeamPlayers => {...})
Note: In the current code it will be a 2D array, [[{p1:p1}..], [{p2:p2}..]] etc
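If you'd rather have a single flat list of players instead of one sub-array per team, you could flatten the result (assuming Node 11+ for Array.prototype.flat):
const allPlayers = (await readPlayers(teamUrls)).flat(); // [{...p1}, {...p2}, ...]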
If you use a forEach, you have no way to await the promises created inside its callbacks. You could swap it for a for loop, collect your promises in an array, and then await the completion of all of them:
async function readPlayers(teamUrls) {
const allPlayers = [];
const allPromises = [];
for (var i = 0; i < teamUrls.length; i++) {
var teamUrl = teamUrls[i];
const options = {
gzip: true,
uri: teamUrl,
Connection: "keep-alive",
transform: function(body) {
return cheerio.load(body);
}
};
allPromises.push(
request(options)
.then($ => {
const team = [];
$("tbody")
.children("tr")
.each(function(j, element) {
const playerName = $(element)
.children("td")
.eq(1)
.children("span")
.eq(1)
.find("a")
.text()
.trim();
const player = { playerName: playerName };
team.push(player);
});
allPlayers.push(team);
})
.catch(err => console.log("error: " + err))
);
}
// wait until all the promises resolve
await Promise.all(allPromises);
console.log(allPlayers);
return allPlayers;
}
Then you can get all the players by awaiting your function (from within an async context):
var allPlayers = await readPlayers(teamUrls);
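Note that await is only valid inside an async function (or at the top level of an ES module), so in a plain CommonJS script you would wrap the call, e.g.:
(async () => {
    const allPlayers = await readPlayers(teamUrls);
    console.log(allPlayers);
})();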