MongoDB Aggregation+Cursor coupled with JS Generators - javascript

It is the first time I am using generators in JavaScript (Node.js), and I just wanted someone to validate whether the implementation is correct. If not, please suggest a better way.
So, there are 2 questions here:
Is the generator-based implementation correct, and beyond that, is it optimized? (I am getting the results; I just want to understand it from the blocking/non-blocking perspective.)
I am making 10 API calls using node-fetch and collecting the promises in an array. If I deploy this on AWS Lambda, is it okay to skip Promise.all(array) and simply return, irrespective of whether the API calls succeed or not? I do not care about the response anyway; I just need to trigger the API.
Requirements:
Host a Node.js function that talks to MongoDB using Mongoose driver on AWS Lambda.
Fetch 10000 documents from MongoDB.
The documents have the following schema. The _id key holds a String value of length 163:
{
_id: "cSKhwtczH4QV7zM-43wKH:APA91bF678GW3-EEe8YGt3l1kbSpGJ286IIY2VjImfCL036rPugMkudEUPbtcQsC"
}
I am interested in the values of the _ids, and I can process only 1000 _ids at a time.
Those are put into an array and an API call is made. Hence, for 10000 _ids, I need to make 10 API calls with 10 of those arrays.
Implementation I did using Node.js Generators:
async function* generatorFunction() {
  await connectMongo();
  const cursor = Model.aggregate([
    // ... some pipeline here
  ]).cursor();
  const GROUP_LIMIT = 1000;
  let newGroup = [];
  for await (const doc of cursor) {
    if (newGroup.length === GROUP_LIMIT) {
      yield newGroup;
      newGroup = [];
    }
    newGroup.push(doc._id);
  }
  // Flush the last, possibly partial group (skipped if the cursor was empty)
  if (newGroup.length > 0) {
    yield newGroup;
  }
  await disconnectMongo();
}
const gen = generatorFunction();
(async () => {
  const promises = [];
  for await (const newGroup of gen) {
    // POST request using node-fetch
    promises.push(fetch('API', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ newGroup }),
    }));
  }
  // Do I need to do this?
  // Or can this be skipped?
  // I do not care about the response anyway...
  // I just need to trigger the API and forget...
  await Promise.all(promises);
  return {
    success: true,
  };
})();
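One caveat on the second question: AWS Lambda can freeze the execution environment as soon as the handler returns, so fetch promises that were never awaited are not guaranteed to complete. A minimal sketch that could replace the await Promise.all(promises) line inside the handler above, using Promise.allSettled so individual failures do not reject the whole batch (an illustration, not part of the original post):

// Waits for every request to settle; individual failures do not throw.
const results = await Promise.allSettled(promises);
const failed = results.filter((r) => r.status === 'rejected').length;
console.log(`${promises.length - failed} requests succeeded, ${failed} failed`);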

Related

Resolve promises together in parallel

I am trying to resolve an array of promises together, but I am not sure how to do it. Let me share the pseudocode for it:
const rp = require('request-promise');

async function sendNotification(user, notificationInfo) {
  const options = {
    method: 'POST',
    url: 'http://xx.xx.xx:3000/notification/send',
    headers: { 'Content-Type': 'application/json' },
    body: { notificationInfo, user },
    json: true,
  };
  console.log('sent');
  return rp(options);
}
I have wrapped the request in the sendNotification method, which returns the promise from the rp (request-promise) module.
Next, I push the result of calling sendNotification into an array of promises, something like this:
const notificationWorker = [];
for (const key3 in notificationObject) {
  if (notificationObject[key3].users.length > 0) {
    // problem: the notification goes out as soon as sendNotification is called here
    notificationWorker.push(sendNotification(notificationObject[key3].users, notificationObject[key3].payload));
  }
}
// task 1 - send all notifications
const result = await Promise.all(notificationWorker); // resolving all notification promises together
// task 2 - update values in db, after sending all notifications
const result2 = await Promise.all(updateWorker); // update some values in db
In the above code, my problem is that the notifications go out as soon as I push them into the notificationWorker array. I want all the notifications to go out together, when I run await Promise.all(notificationWorker).
I am not sure how to achieve what I am trying to do.
I understood the question partially, but I feel this comes down to the difference between Node.js working concurrently and trying to achieve true parallelism. Node.js just switches between tasks; it does not actually run them in parallel. Child processes might help you in that case.
So, for example, if you go through this snippet:
function done(i) {
  // The executor runs synchronously, so console.log(i) fires
  // the moment the Promise is constructed.
  return new Promise((resolve) => {
    console.log(i);
    resolve(`resolved ${i}th promise`);
  });
}

let promises = [];
for (let i = 0; i < 100000; i++) {
  promises.push(done(i));
}
So the console logging starts even before you call Promise.all, right? That was your question. In fact, Promise.all alone will not be enough for what you want; you would have to spawn child processes to achieve parallelism to some extent.
The point I am trying to make is that you are portraying the question as "first build an array of promises, then start them all at once when Promise.all is called", but in my opinion Promise.all only waits on promises that are already running concurrently; it does not give you what you want to achieve.
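If the goal is only to defer the start of each request until a single point in the code (rather than true parallelism), one common pattern is to store thunks, i.e. functions that return promises, and invoke them all at once. A minimal sketch of that idea, reusing the names from the question (not from the original answer):

// Store functions that *create* the promise, instead of the promises themselves.
const tasks = [];
for (const key of Object.keys(notificationObject)) {
  if (notificationObject[key].users.length > 0) {
    tasks.push(() => sendNotification(notificationObject[key].users, notificationObject[key].payload));
  }
}
// Nothing has been sent yet; all the requests start here, together:
const results = await Promise.all(tasks.map((task) => task()));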
For actual parallelism, you could look at something like this: https://htayyar.medium.com/multi-threading-in-javascript-with-paralleljs-10e1f7a1cf32 or "How to create threads in nodejs".
Most of these cases come up when we need to do a CPU-intensive task, but here we could use a map-reduce style approach: split your array of users into parts, then loop over each part and send its notifications.
All of the solutions I am presenting aim at some kind of parallelism, but I don't think sending notifications to a huge array of users could ever be done easily (with limited resources, instance configuration, etc.) at the same instant.

How to get the HTML from a website using NodeJS?

I know this is a pretty basic question, but I can't get anything working.
I have a list of URLs and I need to get the HTML from them using NodeJS.
I have tried using Axios, but the response returned is always undefined.
I am hitting the endpoint /process-logs with a post request and the body consists of logFiles (which is an array).
router.post("/process-logs", function (req, res, next) {
fileStrings = req.body.logFiles;
for (var i = 0; i < fileStrings.length; i++) {
axios(fileStrings[i]).then(function (response) {
console.log(response.body);
});
}
res.send("done");
});
A sample fileString is of the form https://amazon-artifacts.s3.ap-south-1.amazonaws.com/q-120/log1.txt.
How can I parallelize this process to do the same task for multiple files at a time?
I can think of two approaches here.
The first one is to use ES6 promises (Promise.all) and async/await, chunking the fileStrings array into n chunks. This is a basic approach, and you have to handle a lot of edge cases yourself.
This is the general idea of the flow I am thinking of:
async function handleChunk(chunk) {
  const toBeFulfilled = [];
  for (const file of chunk) {
    toBeFulfilled.push(axios.get(file)); // replace axios.get with your per-file logic
  }
  return Promise.all(toBeFulfilled);
}
async function main() {
  try {
    const fileStrings = req.body.logFiles; // assuming req is in scope (e.g. inside the route handler)
    const limit = 10; // chunk size
    for (let i = 0; i < fileStrings.length; i += limit) {
      const chunk = fileStrings.slice(i, i + limit);
      const results = await handleChunk(chunk);
      console.log(results);
    }
  } catch (e) {
    console.log(e);
  }
}
main().then(() => { console.log('done'); }).catch((e) => { console.log(e); });
One of the drawbacks is that we process the chunks sequentially (chunk by chunk, which is still better than file by file). One enhancement could be to chunk fileStrings ahead of time and process the chunks concurrently (it really depends on what you are trying to achieve and what limitations you have).
The second approach is to use the Async library, which has many control flows and collection helpers that let you configure the concurrency, etc. (I really recommend this approach.)
You should have a look at Async's queue control flow to run the same task for multiple files concurrently.
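For illustration, a minimal sketch of what that queue could look like with the async library; the worker body and the concurrency of 5 are assumptions, not from the original answer:

const async = require('async');
const axios = require('axios');

async function processLogs(fileStrings) {
  // Each task is one URL; at most 5 downloads run at the same time.
  const q = async.queue(async (url) => {
    const response = await axios.get(url);
    console.log(url, response.data.length);
  }, 5);

  q.push(fileStrings); // enqueue every URL; the queue starts draining immediately
  await q.drain();     // resolves once all tasks have finished (async v3)
}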

Setting delay/timeout for axios requests in map() function

I am using node and axios (with TS, but that's not too important) to query an API. I have a suite of scripts that make calls to different endpoints and log the data (sometimes filtering it.) These scripts are used for debugging purposes. I am trying to make these scripts "better" by adding a delay between requests so that I don't "blow up" the API, especially when I have a large array I'm trying to pass. So basically I want it to make a GET request and pause for a certain amount of time before making the next request.
I have played with setTimeout(), but everywhere I have inserted it, the delay ends up happening after the requests have already executed. I understand why I am getting this result; I just had to try everything I could to at least improve my understanding of how things work.
I have thought about setting up a queue or using interceptors, but I think I might be straying far from a simpler solution with those ideas.
Additionally, I have another "base script" that I wrote on the fly (sort of the birth point for this batch of scripts) that I constructed with a for loop instead of map() and Promise.all. I have played with setting the delay in that script as well, but I didn't get anywhere helpful.
var axios = require('axios');
var fs = require('fs');

const Ids = [arrayOfIds];

try {
  // Promise.all takes an array of promises
  Promise.all(Ids.map(id => {
    // Return each request as its own individual promise
    return axios.get(URL + 'endPoint/' + id, config);
  }))
    .then((vals) => {
      // vals is the array of data from the resolved Promise.all
      fs.appendFileSync(`${__dirname}/*responseOutput.txt`,
        vals.map((v) => {
          return `${JSON.stringify(v.data)} \n \r`;
        }).toString());
    })
    .catch((e) => console.log(e));
} catch (err) {
  console.log(err);
}
No errors with the above code; just can't figure out how to put the delay in correctly.
You could try Promise.map from Bluebird. It has an option for setting the concurrency:
var axios = require('axios');
var fs = require('fs');
var Promise = require('bluebird');

const Ids = [arrayOfIds];
const concurrency = 3; // at most 3 HTTP requests will run concurrently

Promise.map(Ids, id => {
  console.log('starting request', id);
  return axios.get(URL + 'endPoint/' + id, config);
}, { concurrency })
  .then(vals => {
    console.log({ vals });
  })
  .catch(err => {
    // note: a surrounding try/catch would not catch asynchronous rejections
    console.log(err);
  });
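Bluebird's concurrency option throttles how many requests are in flight at once, but it does not insert a fixed pause between requests. If an explicit delay per request is what's wanted, a small sleep helper with a sequential loop is one way to do it. A sketch, not from the original answer; the 500 ms value is arbitrary:

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function fetchWithDelay(ids) {
  const results = [];
  for (const id of ids) {
    results.push(await axios.get(URL + 'endPoint/' + id, config));
    await sleep(500); // pause 500 ms before the next request
  }
  return results;
}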

Is it possible to group independent async functions under a single await?

Background
I am writing some asynchronous code in Express. In one of my endpoints I need to retrieve some data from Firebase for 2 separate things:
one posts some data
the other retrieves some data to be used in a calculation and another post.
These 2 steps are not dependent on one another, but the end result that should be returned obviously is (just a success message to verify that everything was posted correctly).
Example code
await postData(request);
const data = await retrieveUnrelatedData(request);
const result = calculation(data);
await postCalculatedData(result);
In the code above postData will be holding up the other steps in the process even though the other steps (retrieveUnrelatedData & postCalculatedData) do not require the awaited result of postData.
Question
Is there a more efficient way to get the retrieveUnrelatedData to fire before the full postData promise is returned?
Yes, of course! The thing you need to know is that async/await are using Promises as their underlying technology. Bearing that in mind, here's how you do it:
const myWorkload = request => Promise.all([
  postData(request),
  calculateData(request)
]);

const calculateData = async request => {
  const data = await retrieveUnrelatedData(request);
  const result = calculation(data);
  return postCalculatedData(result);
};

// Not asked for, but if you had a parent handler calling these it would look like:
const mainHandler = async (req, res) => {
  const [postStatus, calculatedData] = await myWorkload(req);
  // respond back with whatever?
};
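One caveat worth adding (not in the original answer): Promise.all is fail-fast, so if either branch rejects, the await throws and the other branch's result is discarded. Wrapping the await lets the handler still respond with an error status; a sketch, assuming an Express-style res object:

const mainHandler = async (req, res) => {
  try {
    const [postStatus, calculatedData] = await myWorkload(req);
    res.json({ success: true });
  } catch (err) {
    // either postData or the retrieve/calculate/post chain failed
    res.status(500).json({ success: false, error: err.message });
  }
};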

Would a mysql js transaction pack up all instructions and then commit them in one go?

I am a little bit confused when using mysql js. I would like to figure out the actual mechanism behind the code.
As far as I know, if I want to do a lot of MySQL insertions that can run in parallel, it is more efficient to pack all the insertion instructions together and commit them to MySQL in one go.
My question is: I am using promise-mysql.js, and I would like to use a transaction to wrap all the instructions, like this:
conn.beginTransaction();
einvoiceList.map(e => {
  conn.query("do some insertion");
});
conn.commit();
Am I correct to use this method ?
I have an example using nodejs and mysql2 lib to handle multiple queries to the DB in parallel inside a transaction: https://github.com/Talento90/organization-api/blob/master/organizations-api/src/organizations/manager.js#L30
Basically the idea is to open a transaction and then do all the db queries and at the end commit or rollback the changes.
Tip: try to use async/await; it makes the job much easier :)
// A method on a class that holds the pooled database handle (see the linked example)
async myAsyncMethod(root) {
  const promises = [];
  let conn = null;
  try {
    conn = await this.database.getConnection();
    await conn.query('START TRANSACTION');
    // Execute multiple queries and save their promises inside an array
    // to wait for the results later
    for (...) {
      promises.push(conn.query(/* one insertion */));
    }
    // Wait until all queries are done!
    const results = await Promise.all(promises);
    await conn.query('COMMIT');
    return results;
  } catch (error) {
    if (conn != null) {
      await conn.query('ROLLBACK');
    }
    return 0;
  } finally {
    // Release the connection on both the success and the error path
    if (conn != null) {
      await conn.release();
    }
  }
}
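On the efficiency point in the question: with the mysql/mysql2 drivers, many rows can also be packed into a single INSERT statement using the nested-array form of query values, which is usually cheaper than issuing one INSERT per row. A minimal sketch, assuming a hypothetical einvoices table with id and amount columns:

const mysql = require('mysql2/promise');

async function insertEinvoices(conn, einvoiceList) {
  // One statement, many rows: VALUES ? expands to a row list from a nested array.
  const rows = einvoiceList.map(e => [e.id, e.amount]);
  await conn.beginTransaction();
  try {
    await conn.query('INSERT INTO einvoices (id, amount) VALUES ?', [rows]);
    await conn.commit();
  } catch (err) {
    await conn.rollback();
    throw err;
  }
}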
