Using data from asynchronous functions [duplicate] - javascript

This question already has answers here:
How do I return the response from an asynchronous call?
(41 answers)
Closed 5 years ago.
I have a question regarding asynchronous functions and how to send something after a function has returned its result. This is what I am trying to accomplish:
Within the handling of a GET request in Node I read the contents of a folder, returning the files in that folder. Next I want to loop over the stats of each file in that folder, loading only the files created within a certain period and lastly send the data in those files as a response to the request. It looks something like this:
array = []
fs.readdir(path, function(err, items) {
    items.forEach(function(item) {
        fs.stat(path, function(err, stats) {
            if (/* period check */) {
                array.push(data)
            }
        })
    })
})
res.send(array)
This approach ends up sending an empty array. I've looked into Promises, which seem like the solution here, but I can't get them to work in this scenario. Using fs.statSync instead of fs.stat does work, but it greatly reduces performance, and it feels like this should be doable with Promises; I just don't know how.
Is there anyone with a solution for this?
EDIT: Regarding the question marked as a duplicate, I tried to solve my problem with the answer there first but didn't succeed. My problem has some nested functions and loops and is more complex than the examples given there.

Use this if you prefer a Promise-based approach:
var path = require('path')

fs.readdir(myPath, function(err, items) {
    var array = [];
    Promise.all(items.map(function(item) {
        return new Promise(function(resolve, reject) {
            fs.stat(path.resolve(myPath, item), function(err, stats) {
                if (err) {
                    return reject(err)
                }
                if (/* period check */) {
                    array.push(data)
                }
                resolve()
            })
        })
    })).then(function() {
        res.send(array)
    }).catch(function(error) {
        // error handling
        res.sendStatus(500)
    })
})

Here is what I would suggest.
// promisify is built into util on Node 8+; on older Node versions you
// might need the util.promisify npm package instead.
const promisify = require('util').promisify;
const fs = require('fs');
const path = require('path');

// promisify transforms a callback-based API into a promise-based one.
const readdir = promisify(fs.readdir);
const stat = promisify(fs.stat);

const dataProm = readdir(myPath)
    .then((items) => {
        // Map each item to a promise on its stat.
        // readdir only returns file names, so join them with the directory.
        const statProms = items.map(item => stat(path.join(myPath, item)));
        // Wait for all these promises to resolve.
        return Promise.all(statProms);
    })
    // Remove undesirable file stats based on the result
    // of the period check.
    .then(stats => stats.filter(stat => periodCheck(stat)));

// dataProm will resolve with your data. You might as well return it
// as is. But if you need to use `res.send`, you can do:
dataProm.then((data) => {
    res.send(data);
}, (err) => {
    // If you leave the promise chain, you need to handle
    // errors now or you are silently swallowing them.
    res.sendStatus(500);
});
Here is a link to the util.promisify package I am referring to. If you are using Node v8+, you do not need it. If you do need it, do not forget to replace require('util').promisify with require('util.promisify').
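On Node 10 or newer, the built-in fs.promises API removes the need for util.promisify entirely. As a hedged sketch of the same flow with async/await (periodCheck, myPath and the Express res stand in for your own code and are not part of the original answer):

// Sketch only: assumes Node 10+ (fs.promises) and your own periodCheck().
const fs = require('fs').promises;
const path = require('path');

async function collectStats(dirPath) {
    const items = await fs.readdir(dirPath);
    // Stat every entry in parallel, then keep only those matching the period check.
    const stats = await Promise.all(
        items.map(item => fs.stat(path.join(dirPath, item)))
    );
    return stats.filter(stat => periodCheck(stat));
}

// Inside the route handler:
// collectStats(myPath).then(data => res.send(data)).catch(() => res.sendStatus(500));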

Related

Trying to delete a file using Node.js. Should I use the asynchronous fs.unlink(path, callback) or the synchronous fs.unlinkSync(path)?

I have a simple function which deletes a product entry from the database, and now I'm trying to delete the image file of that product as well. I checked the Node.js file system docs and found two functions which deal with that - fs.unlink(path, callback) and fs.unlinkSync(path). I understand that the first one is asynchronous and the second one is synchronous, but I'm still not quite sure which one I should use and why.
module.exports.deleteProduct = async (req, res, next) => {
    let productId = req.body.productId
    try {
        let product = await Product.destroy({
            where: {
                id: productId
            }
        })
        res.status(200).json({
            product: product
        })
    } catch (e) {
        console.log(e)
        res.status(500)
    }
}
Some code and an idea for you:
As others have already said, async is better than sync so you don't block the event loop, although, as indicated in another answer, it probably won't matter unless your API volume is extremely high.
You can use the fs promises API via
const fs = require('fs').promises; // CommonJS, or:
import { promises as fs } from 'fs'; // ES modules
to use the async (non-blocking) API with a one-liner await.
Special note: you may not want your API request to fail if unlinking the image file fails, as you did in fact delete the product from the database.
// make sure you are using the promise API from fs
const fs = require('fs').promises;

module.exports.deleteProduct = async (req, res, next) => {
    let productId = req.body.productId
    try {
        let product = await Product.destroy({
            where: {
                id: productId
            }
        })
        try {
            await fs.unlink('the/path/to/the/product/image');
        } catch {
            // you may want to handle a failure to delete separately
        }
        res.status(200).json({ product: product })
    } catch (e) {
        console.log(e)
        res.sendStatus(500) // actually send the error response
    }
}
If your server OS is Linux or some other UNIX derivative with a local file system, both .unlinkSync() and .unlink() run quickly: the OS-level unlinking operation is designed to complete quickly and predictably. So, if you use the blocking .unlinkSync() version you won't do much harm, especially if your unlinking is infrequent.
That being said, if you can use the asynchronous version it's a good practice to do so.
It looks like you can; you can call res.status()... from within a callback or after an await.
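For illustration, a minimal sketch of the callback form inside the same handler (the image path is the same placeholder used above; product comes from the surrounding code, and this is not part of the original answers):

// Sketch only: responding from inside the fs.unlink callback.
fs.unlink('the/path/to/the/product/image', (err) => {
    if (err) {
        // decide whether a failed unlink should fail the whole request
        console.log(err);
    }
    res.status(200).json({ product: product });
});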
Don't block the event loop in Node.js.
Synchronous methods block the event loop unnecessarily, which affects your application's performance. Always use async methods wherever possible.
If you want to use it with an await operation (pseudo-synchronous style), you can do something like the following, by wrapping it in a Promise:
const fs = require("fs");

function unlinkPromise(file) {
    return new Promise((resolve, reject) => {
        // fs.unlink's callback only receives a possible error
        fs.unlink(file, (err) => {
            if (err) {
                return reject(err);
            }
            resolve();
        });
    });
}

async function data() {
    console.log(await unlinkPromise("file")); // logs undefined on success
}

Why is promise-limit not working in Node.js?

I have a large number of images that I want to download. I'm using the request-promise package to download the images from their URL(s). Since there were a large number of images, the server was getting overloaded and the downloads would hang or get corrupted, so I decided to use the promise-limit library to set a concurrency limit. Here is my code:
const fs = require('fs');
const request = require('request-promise');
const promiseLimit = require('promise-limit');

var limit = promiseLimit(30);

async function download(data) {
    console.log(data);
    return Promise.all(data.map((obj) => {
        return limit(() => downloadRequest(obj))
    })).then(() => {
        console.log("All images downloaded.")
    })
}

function downloadRequest(obj) {
    var img = obj.dest;
    var url = obj.url;
    var req = request(url);
    req.pipe(fs.createWriteStream(img));
}
I replicated the sample exactly as given on the GitHub page, but the method returns without the Promise.all() ever being fulfilled. I do not understand what I am doing wrong; I thought Promise.all() would wait until all the promises resolve.
In the backend, this call is used as
...
.then((data) => {
    return downloader.download(data);
})
.then(() => {
    var filename = "Sample.pdf";
    // Pages.json is written early in the chain, contains URL and corresponding image paths
    return generator.generate('./Pages.json', filename);
});
Does this mean Node.js is already trying to generate the file out of Pages.json? How can I make this part of the code wait until the downloads have finished?
Your function downloadRequest() does not return a promise. It must return a promise that is tied to the asynchronous operation it contains, such that the promise is resolved when that asynchronous operation completes and rejected when it errors. Only then can the limit() wrapper properly do its job.
Since you're using a stream and piping it in downloadRequest(), you will have to manually construct a promise and then monitor the various events in the stream to know when it's done or has an error so you can resolve or reject that promise.
Here's an idea how to make downloadRequest() properly return a promise:
function downloadRequest(obj) {
    return new Promise((resolve, reject) => {
        const img = obj.dest;
        const url = obj.url;
        const req = request(url);
        req.on('error', reject);
        const ws = fs.createWriteStream(img);
        ws.on('error', reject);
        req.pipe(ws).on('finish', resolve);
    });
}
And, it is now recommended to use the pipeline() function instead of .pipe() because it does more complete cleanup in error conditions and there is also a promise version of that built-in:
const { pipeline } = require('stream/promises');

function downloadRequest(obj) {
    return pipeline(request(obj.url), fs.createWriteStream(obj.dest));
}
P.S. In case you didn't know, the request() library has been deprecated and it is recommended that you not use it in new projects any more. There is a list of alternative libraries to choose from here, all of which also have built-in promise support. I've looked at the various choices, tried several and decided I'm using got() in my work.
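For example, a hedged sketch of the same download helper written with got (assuming got v11, which still supports require(), plus the promise-based pipeline shown above, which needs Node 15+):

// Sketch only: got.stream() returns a readable stream of the response body.
const got = require('got');
const fs = require('fs');
const { pipeline } = require('stream/promises');

function downloadRequest(obj) {
    return pipeline(got.stream(obj.url), fs.createWriteStream(obj.dest));
}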

Setting delay/timeout for axios requests in map() function

I am using node and axios (with TS, but that's not too important) to query an API. I have a suite of scripts that make calls to different endpoints and log the data (sometimes filtering it.) These scripts are used for debugging purposes. I am trying to make these scripts "better" by adding a delay between requests so that I don't "blow up" the API, especially when I have a large array I'm trying to pass. So basically I want it to make a GET request and pause for a certain amount of time before making the next request.
I have played with setTimeout(), but everywhere I have inserted it, the delay ends up happening after the requests have already executed. I understand why I am getting this result; I just had to try everything I could to at least increase my understanding of how things are working.
I have thought about setting up a queue or using interceptors, but I think I might be "straying far" from a simpler solution with those ideas.
Additionally, I have another "base script" that I wrote on the fly (sort of the birth point for this batch of scripts) that I constructed with a for loop instead of the map() function and Promise.all. I have played with trying to set the delay in that script as well, but I didn't get anywhere helpful.
var axios = require('axios');
var fs = require('fs');

const Ids = [arrayOfIds];

try {
    // Promise.all takes an array of promises
    Promise.all(Ids.map(id => {
        // Return each request as its individual promise
        return axios
            .get(URL + 'endPoint/' + id, config)
    }))
        .then((vals) => {
            // vals is the array of data from the resolved Promise.all
            fs.appendFileSync(`${__dirname}/*responseOutput.txt`,
                vals.map((v) => {
                    return `${JSON.stringify(v.data)} \n \r`
                }).toString())
        }).catch((e) => console.log(e))
} catch (err) {
    console.log(err);
}
No errors with the above code; just can't figure out how to put the delay in correctly.
You could try Promise.map from Bluebird.
It has an option for setting concurrency:
var axios = require('axios');
var fs = require('fs');
var Promise = require('bluebird');

const Ids = [arrayOfIds];
let concurrency = 3; // only a maximum of 3 HTTP requests will run concurrently

try {
    Promise.map(Ids, id => {
        console.log(`starting request`, id);
        return axios.get(URL + 'endPoint/' + id, config)
    }, { concurrency })
        .then(vals => {
            console.log({ vals });
        });
} catch (err) {
    console.log(err);
}
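If what you actually want is a fixed pause between requests rather than a concurrency cap, a minimal sketch using a for...of loop and a promisified setTimeout could look like this (the sleep helper, the 1000 ms value and the output path are illustrative; URL, config and Ids are as in your code):

// Sketch only: sequential requests with a fixed delay between them.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function downloadAll() {
    const vals = [];
    for (const id of Ids) {
        const res = await axios.get(URL + 'endPoint/' + id, config);
        vals.push(`${JSON.stringify(res.data)} \n \r`);
        await sleep(1000); // pause for one second before the next request
    }
    fs.appendFileSync(`${__dirname}/responseOutput.txt`, vals.toString());
}

downloadAll().catch((e) => console.log(e));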

Best way to calling API inside for loop using Promises

I have 500 million objects, each of which has n contacts, like below:
var groupsArray = [
{'G1': ['C1','C2','C3'....]},
{'G2': ['D1','D2','D3'....]}
...
{'G2000': ['D2001','D2002','D2003'....]}
...
]
I have two implementations in Node.js: one based on regular promises and another using Bluebird, as shown below.
Regular promises
...
var groupsArray = [
    {'G1': ['C1','C2','C3']},
    {'G2': ['D1','D2','D3']}
]

function ajax(url) {
    return new Promise(function(resolve, reject) {
        request.get(url, {json: true}, function(error, data) {
            if (error) {
                reject(error);
            } else {
                resolve(data);
            }
        });
    });
}

_.each(groupsArray, function(groupData) {
    _.each(groupData, function(contactlists, groupIndex) {
        // console.log(groupIndex)
        _.each(contactlists, function(contactData) {
            ajax('http://localhost:3001/api/getcontactdata/' + groupIndex + '/' + contactData).then(function(result) {
                console.log(result.body);
                // Code depending on result
            }).catch(function() {
                // An error occurred
            });
        })
    })
})
...
In the Bluebird version, I have used the concurrency option to control the queue of promises:
...
_.each(groupsArray, function(groupData) {
    _.each(groupData, function(contactlists, groupIndex) {
        var contacts = [];
        // console.log(groupIndex)
        _.each(contactlists, function(contactData) {
            contacts.push({
                contact_name: 'Contact ' + contactData
            });
        })
        groups.push({
            task_name: 'Group ' + groupIndex,
            contacts: contacts
        });
    })
})

Promise.each(groups, group =>
    Promise.map(group.contacts,
        contact => new Promise((resolve, reject) => {
            /* setTimeout(() =>
                resolve(group.task_name + ' ' + contact.contact_name), 1000); */
            request.get('http://localhost:3001/api/getcontactdata/' + group.task_name + '/' + contact.contact_name, {json: true}, function(error, data) {
                if (error) {
                    reject(error);
                } else {
                    resolve(data);
                }
            });
        }).then(log => console.log(log.body)),
        {
            concurrency: 50
        }
    ).then(() => console.log())
).then(() => {
    console.log('All Done!!');
});
...
I want to know how to deal with 100 million API calls inside a loop using promises. Please advise on the best way to call the API asynchronously and handle the responses later.
My answer using regular Node.js promises (this can probably easily be adapted to Bluebird or another library).
You could fire off all Promises at once using Promise.all:
var groupsArray = [
    {'G1': ['C1','C2','C3']},
    {'G2': ['D1','D2','D3']}
];

function ajax(url) {
    return new Promise(function(resolve, reject) {
        request.get(url, {json: true}, function(error, data) {
            if (error) {
                reject(error);
            } else {
                resolve(data);
            }
        });
    });
}

Promise.all(groupsArray.map(group => ajax("your-url-here")))
    .then(results => {
        // Code that depends on all results.
    })
    .catch(err => {
        // Handle the error.
    });
Using Promise.all attempts to run all your requests in parallel. This probably won't work well when you have 500 million requests to make, all attempted at the same time!
A more effective way to do it is to use the JavaScript reduce function to sequence your requests one after the other:
// ... Setup as before ...

const results = [];

groupsArray.reduce((prevPromise, group) => {
        return prevPromise.then(() => {
            return ajax("your-url-here")
                .then(result => {
                    // Process a single result if necessary.
                    results.push(result); // Collect your results.
                });
        });
    },
    Promise.resolve() // Seed promise.
)
    .then(() => {
        // Code that depends on all results.
    })
    .catch(err => {
        // Handle the error.
    });
This example chains together the promises so that the next one only starts once the previous one completes.
Unfortunately the sequencing approach will be very slow because it has to wait until each request has completed before starting a new one. Whilst each request is in progress (it takes time to make an API request) your CPU is sitting idle whereas it could be working on another request!
A more efficient, but complicated approach to this problem is to use a combination of the above approaches. You should batch your requests so that the requests in each batch (of say 10) are executed in parallel and then the batches are sequenced one after the other.
It's tricky to implement this yourself - although it's a great learning exercise - using a combination of Promise.all and the reduce function (a rough sketch follows below), but I'd suggest using the library async-await-parallel. There are a bunch of such libraries, but I use this one; it works well and easily does the job you want.
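For the curious, here is a rough sketch of that manual batching approach (the chunk helper is hypothetical; ajax and groupsArray are as defined above):

// Sketch only: run each batch in parallel, and the batches one after the other.
function chunk(array, size) {
    const batches = [];
    for (let i = 0; i < array.length; i += size) {
        batches.push(array.slice(i, i + size));
    }
    return batches;
}

function runInBatches(groups, batchSize) {
    const results = [];
    return chunk(groups, batchSize).reduce((prevPromise, batch) => {
        return prevPromise.then(() => {
            return Promise.all(batch.map(group => ajax("your-url-here")))
                .then(batchResults => results.push(...batchResults));
        });
    }, Promise.resolve()).then(() => results);
}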
You can install the library like this:
npm install --save async-await-parallel
Here's how you would use it:
const parallel = require("async-await-parallel");

// ... Setup as before ...

const batchSize = 10;

parallel(groupsArray.map(group => {
    // We need to return a 'thunk' function, so that the jobs can be
    // started when they are needed, rather than all at once.
    return () => {
        return ajax("your-url-here");
    };
}), batchSize)
    .then(() => {
        // Code that depends on all results.
    })
    .catch(err => {
        // Handle the error.
    });
This is better, but it's still a clunky way to make such a large number of requests! Maybe you need to up the ante and consider investing time in proper asynchronous job management.
I've been using Kue lately for managing a cluster of worker processes. Using Kue with the Node.js cluster library allows you to get proper parallelism happening on a multi-core PC, and you can then easily extend it to multiple cloud-based VMs if you need even more grunt.
See my answer here for some Kue example code.
In my opinion you have two problems coupled in one question - I'd decouple them.
#1 Loading of a large dataset
Operating on such a large dataset (500M records) will surely cause memory limit issues sooner or later - Node.js runs in a single thread whose heap is limited to roughly 1.5 GB by default - after that your process will crash.
To avoid that, you could read your data as a stream from a CSV - I'll use scramjet, as it'll help us with the second problem, but JSONStream or papaparse would do pretty well too:
$ npm install --save scramjet
Then let's read the data - I'd assume from a CSV:
const {StringStream} = require("scramjet");

const stream = require("fs")
    .createReadStream(pathToFile)
    .pipe(new StringStream('utf-8'))
    .csvParse();
Now we have a stream of objects that will return the data line by line, but only as we read it. That solves problem #1; now to "augment" the stream:
#2 Asynchronous augmentation of the stream data
No worries - that's just what you want to do: for every line of data, fetch some additional info (i.e. augment it) from some API, which by default is asynchronous.
That's where scramjet kicks in, with just a couple of additional lines:
stream
    .flatMap(groupData => Object.entries(groupData))
    .flatMap(([groupIndex, contactList]) => contactList.map(contactData => [contactData, groupIndex]))
    // now you have a simple stream of entries for your call
    .map(([contactData, groupIndex]) => ajax('http://localhost:3001/api/getcontactdata/' + groupIndex + '/' + contactData))
    // and here you can print or do anything you like with your data stream
    .each(console.log)
After this you'd need to accumulate the data or output it to a stream - there are a number of options - for example: .toJSONArray().pipe(fileStream).
Using scramjet you are able to split the process into multiple steps without much impact on performance. Using setOptions({maxParallel: 32}) you can control concurrency and, best of all, this will run with a minimal memory footprint - much, much faster than if you were to load the whole dataset into memory.
Let me know if this is helpful - your question is quite complex, so if you run into any problems I'll be happy to help. :)

nodejs: How to wait for several asynchronous tasks to finish

I have a file where I'm writing things:
var stream = fs.createWriteStream("my_file.txt");
stream.once('open', function(fd) {
    names.forEach(function(name) {
        doSomething(name);
    });
    stream.end();
});
This is working ok and I'm able to write to the file.
The problem is that the doSomething() function has some parts that are asynchronous. An example can be given with the dnsLookup function. Somewhere in my doSomething() I have:
dns.lookup(domain, (err, addresses, family) => {
    if (err) {
        stream.write("Error:", err);
    } else {
        stream.write(addresses);
    }
});
Now, my problem is that, since the DNS check is asynchronous, the code keeps executing and closes the stream. When the DNS response finally comes, there is nowhere left to write to.
I already tried to use the async module but it didn't work. Probably I did something wrong.
Any idea?
Now that NodeJS is mostly up to speed with ES2015 features (and I notice you're using at least one arrow function), you can use the native promises in JavaScript (previously you could use a library):
var stream = fs.createWriteStream("my_file.txt");
stream.once('open', function(fd) {
    Promise.all(names.map(name => doSomething(name)))
        .then(() => {
            // success handling
            stream.end();
        })
        .catch(() => {
            // error handling
            stream.end();
        });
});
(The line Promise.all(names.map(name => doSomething(name))) can be simply Promise.all(names.map(doSomething)) if you know doSomething ignores extra arguments and only uses the first.)
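To see why that caveat matters: Array#map passes (value, index, array) to its callback, so a callback with optional extra parameters can misbehave. The classic illustration (not part of the original answer):

['1', '2', '3'].map(parseInt);             // [1, NaN, NaN] - the element's index becomes the radix
['1', '2', '3'].map(n => parseInt(n, 10)); // [1, 2, 3]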
Promise.all (spec | MDN) accepts an iterable and returns a promise that is settled when all of the promises in the iterable are settled (non-promise values are treated as resolved promises using the value as the resolution).
Where doSomething becomes:
function doSomething(name) {
    return new Promise((resolve, reject) => {
        dns.lookup(domain, (err, addresses, family) => {
            if (err) {
                stream.write("Error: " + err);
                reject(/*...reason...*/);
            } else {
                stream.write(addresses);
                resolve(/*...possibly include addresses*/);
            }
        });
    });
}
There are various libs that will "promise-ify" Node-style callbacks for you so using promises is less clunky than the mix above; in that case, you could use the promise from a promise-ified dns.lookup directly rather than creating your own extra one.
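For instance, a minimal sketch using util.promisify (Node 8+): dns.lookup has a custom promisified form that resolves with an { address, family } object. The stream and domain variables are assumed to come from the surrounding code, as in the original:

// Sketch only: letting a promisified dns.lookup replace the hand-rolled promise.
const { promisify } = require('util');
const dns = require('dns');
const lookup = promisify(dns.lookup); // resolves to { address, family }

function doSomething(name) {
    return lookup(domain)
        .then(({ address }) => {
            stream.write(address);
        })
        .catch(err => {
            stream.write("Error: " + err.message);
            throw err; // keep the rejection so Promise.all still sees the failure
        });
}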
