I have built a very stable robot app in Node.js that basically sends requests to an API continuously. To make sure nothing can go wrong, I handle every possible error and I have set timeouts for promises that could take too long to resolve...
Now, I would like to improve the app by removing my safety nets and monitoring async operations to find any kind of "async leaking", e.g. promises pending forever or any strange outcome I'm not aware of (that's the point of my question).
Are there any tools meant to monitor the Node.js async flow? For instance, getting the total number of pending promises in the process at a given time? Or getting a warning if any promise has been pending for more than a given time, and tracking that promise?
If that may guide answers, here are the modules I use:
// Bluebird (promises)
var Promise = require("bluebird");
// Mongoose with promises
var mongoose = require('mongoose');
mongoose.Promise = require('bluebird');
// Rate limiter with promises
var Bottleneck = require("bottleneck");
// Promisified requests
var request = require('request-promise');
Sorry for not being able to formulate my question precisely: I have no clue as to what I can expect/wish for exactly...
EDIT: So far, my research has led me to:
Bluebird's resource management tools, but I can't figure out a way to make them useful
The amazing npm monitor and the shipped-in monitor-dashboard, but for some reason I can't yet make it work for my needs...
Since I'm still developing the app and have a life besides the app, I don't have much time to look into it, but I'm definitely going to address this question seriously at some point !
Sounds like you might need some kind of job management.
I've been trying kue lately to manage asynchronous jobs and it's pretty good.
It has an API for starting and running jobs. Each job can report its progress. It comes with a built-in job dashboard that shows you the running jobs and their progress. It has an extensive set of events so that you can monitor the status of each job.
You need to install Redis to use this, but that's not difficult.
It doesn't support promises, but you can see in my code example below that it's easy to start a job and then wrap it in a promise so that you can await job completion:
const kue = require("kue");
const queue = kue.createQueue();
queue.process("my-job", 1, (job, done) => {
// Run your job here and compute its result.
const result = ... some result ...
if (an error occurs) {
done(err); // Job failure.
return;
}
done(null, result); // Job success.
});
function runMyJob () {
return new Promise((resolve, reject) => {
const job = queue.create("my-job").save();
job.on('complete', result => {
resolve(result); // Job has completed successfully, resolve the promise.
});
job.on("failed", err => {
reject(err); // Job has failed, reject the promise.
});
});
}
runMyJob()
.then(() => {
console.log("Job completed.")
})
.catch(err => {
console.error("Job failed.");
});
It's fairly easy to make this work in parallel over multiple CPUs using the Node.js cluster module.
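For example, here's a rough sketch of the clustered-processing pattern from the Kue docs (numWorkers is just a placeholder value):
const cluster = require("cluster");
const kue = require("kue");
const queue = kue.createQueue();

// Placeholder: one worker process per CPU core.
const numWorkers = require("os").cpus().length;

if (cluster.isMaster) {
    // The master only forks workers; each worker pulls jobs off the queue.
    for (let i = 0; i < numWorkers; i++) {
        cluster.fork();
    }
} else {
    queue.process("my-job", 1, (job, done) => {
        // Same job handler as above, now running in parallel across workers.
        done(null, "some result");
    });
}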
Read more on the kue web page.
I have TDD knowledge, and I've been trying to start a project in JavaScript applying the same principles.
I am building an API that, once hit, fires a request to an external service in order to gather some data, and once retrieved, parses it and returns the response.
So far, I've been unlucky with my saga. I've searched a lot, and the most similar issue on SO I've found is this one, but I've had no success applying the same solution.
My code on the implementation side is as follows:
//... other handlers
weather(request, response) {
//... some setup and other calls
this.externalService.get(externalURL)
.then(serviceResponse => {
this.externalResponseParser.parse(serviceResponse)
});
//... some more code
}
And on the test side:
let requester;
let mockParser;
let handler;
let promiseResolve;
let promiseReject;
beforeEach(function () {
requester = new externalRequesterService();
mockParser = sinon.createStubInstance(...);
handler = new someHandler({
externalService: requester,
externalResponseParser: mockParser
});
});
it('returns data', function () {
sinon.stub(requester, 'get').callsFake((url) => {
return new Promise((resolve, reject) => {
// so I can be able to handle when the promise gets resolved
promiseResolve = resolve;
promiseReject = reject;
});
});
handler.weather(request, response);
// assertions of what happens before externalService.get gets called - all green
promiseResolve(expectedServiceResponse);
assert.ok(mockParser.parse.calledOnce, 'Expected externalResponseParser.parse to have been called once');
});
The assertion on the last line of the test fails, even though I am calling what I am supposed to.
At some point I added some logging and was able to see that the code in the then block seems to get executed after the assertion in the test, which might be the source of the problem.
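To illustrate the ordering (just plain promises, nothing to do with sinon):
// The .then callback is queued as a microtask, so it only runs after the
// currently executing synchronous code - including the assertion - finishes.
Promise.resolve('data').then(() => console.log('then callback runs'));
console.log('assertion runs');
// Output:
//   assertion runs
//   then callback runs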
I've tried to find out if there is some sort of eventually that could be used, so my assertion after resolving the promise would be something like:
assert.eventually(mockParser.parse.calledOnce, 'Expected externalResponseParser.parse to eventually be called once');
but no luck.
Does anyone have some clear explanation of what is missing? Many thanks in advance
P.S.- As per request, please find a stripped down version of my code here. Just run npm install, followed by npm test in order to get the same output.
Thank you all for the time spent.
I ended up finding this very good article on Medium which allowed me to solve my issue. It has a very nice explanation of moving from callbacks to promises, which was exactly the scenario I had on my hands.
I have updated the github repository I created on how to fix it.
If you want the TL;DR, here are just the highlights of the changes. Implementation side:
async weather(request, response) {
//...
let serviceResponse = await this.requesterService.get(weatherURL);
//...
};
And on the test side:
it('returns weather on success', async function () {
sinon.stub(requester, 'get').callsFake((_) => {
return new Promise((resolve, _) => {
resolve(expectedServiceResponse);
});
});
await handler.weather(request, response);
//...
assert.ok(mockWeatherParser.parseWeather.calledOnce, 'Expected WeatherParser.parseWeather to have been called once'); // no longer fails here
//...
});
Keep in mind that in this example it is still synchronous. However, I evolved my API step by step, and after migrating to this synchronous version using Promises, it was much easier to migrate to an async version, both in terms of testing and implementation.
If you have the same issue and need help or have questions, let me know. I'll be happy to help.
I'm doing some reading up on JS Promises to up-skill.
Here's my quandry:
Say you want to console.log('we done, bruh!') AFTER your data's come back.
so with a Promise, you might say:
let iWantToLogOut = function() {
let data = fetch('https://jsonplaceholder.typicode.com/users')
return new Promise((resolve) => {
resolve(data)
})
}
And then resolve that promise like:
iWantToLogOut().then((dataBack) => dataBack.json())
.then((json) => {
console.log('We done, bruh! Look: ', json)
})
So that's great. You get your API data back and then we log our msg out.
But isn't it just way easier to go:
let data = fetch('https://jsonplaceholder.typicode.com/users');
data ? console.log('we done, bruh!') : null;
I'm probably over-simplifying/missing something, but I just want to make sure I really understand Promises before I move on to Async/Await.
But isn't it just way easier to go:
let data = fetch('https://jsonplaceholder.typicode.com/users');
data ? console.log('we done, bruh!') : null;
It would be, but it doesn't work. What fetch returns is a promise, not the result of the operation. You can't return the result of an asynchronous process. More: How do I return the response from an asynchronous call?
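For instance, logging what fetch actually hands back makes this visible:
let data = fetch('https://jsonplaceholder.typicode.com/users');
console.log(data); // a pending Promise, not the users data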
In the upcoming ES2017 spec, though, we have syntactic sugar around promise consumption which will let you write this:
let data = await fetch('https://jsonplaceholder.typicode.com/users');
// --------^^^^^
console.log('we done, bruh!');
Note we don't even need the conditional, because await converts a promise rejection into an exception.
That code would need to be in an async function, e.g.:
(async function() {
let data = await fetch(/*...*/);
// use data here
})();
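If you also want to handle a rejection (e.g. a network failure), a rough sketch is to wrap the await in try/catch:
(async function() {
    try {
        let data = await fetch('https://jsonplaceholder.typicode.com/users');
        console.log('we done, bruh!');
    } catch (err) {
        // A rejected promise surfaces here as a thrown exception.
        console.error('we NOT done, bruh!', err);
    }
})();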
The JavaScript engines in some browsers already support async/await, but to use it in the wild, you'll want to transpile with Babel or similar.
Note: You've shown
so with a Promise, you might say:
let iWantToLogOut = function() {
let data = fetch('https://jsonplaceholder.typicode.com/users')
return new Promise((resolve) => {
resolve(data)
})
}
There are a couple of problems with that code:
It never settles the promise you created if the fetch fails.
It calls something data which is not data, it's a promise of data (that's mostly style, but it's misleading).
It exhibits the promise creation anti-pattern. You already have a promise (from fetch), no need to create another.
iWantToLogOut should be simply:
let iWantToLogOut = function() {
return fetch('https://jsonplaceholder.typicode.com/users');
};
That returns a promise that will be resolved with the data, or of course rejected. Which you'd then consume with promise methods or await (within an async function).
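For example (just a usage sketch):
iWantToLogOut()
    .then(response => response.json()) // fetch resolves to a Response object
    .then(users => {
        console.log('We done, bruh! Look: ', users);
    })
    .catch(err => {
        console.error('Request failed: ', err);
    });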
It is not a matter of which is easier.
Usually network calls should be handled asynchronously (I don't want to encourage the anti-pattern of synchronous AJAX calls). At that point you have a few options to handle it:
Callbacks
Promises
Observables
In your code above, even though it is written synchronously, fetch returns immediately with a promise that will be resolved with the data only when the server has responded. Only then can you check the data for its content. Further, because every promise can be fulfilled or rejected, in your then you can have a handler for each instead of using the ternary.
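Roughly like this (a sketch using separate fulfillment and rejection handlers):
fetch('https://jsonplaceholder.typicode.com/users')
    .then(response => response.json())
    .then(
        users => console.log('we done, bruh!', users),    // fulfilled
        err => console.error('something went wrong', err) // rejected
    );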
From the latest spec:
Synchronous XMLHttpRequest outside of workers is in the process of being removed from the web platform as it has detrimental effects to the end user’s experience. (This is a long process that takes many years.) Developers must not pass false for the async argument when current global object is a Window object. User agents are strongly encouraged to warn about such usage in developer tools and may experiment with throwing an InvalidAccessError exception when it occurs.
I have 500 million objects, each of which has n contacts, as shown below
var groupsArray = [
{'G1': ['C1','C2','C3'....]},
{'G2': ['D1','D2','D3'....]}
...
{'G2000': ['D2001','D2002','D2003'....]}
...
]
I have two implementations in Node.js, one based on regular promises and another one using Bluebird, as shown below
Regular promises
...
var groupsArray = [
{'G1': ['C1','C2','C3']},
{'G2': ['D1','D2','D3']}
]
function ajax(url) {
return new Promise(function(resolve, reject) {
request.get(url,{json: true}, function(error, data) {
if (error) {
reject(error);
} else {
resolve(data);
}
});
});
}
_.each(groupsArray,function(groupData){
_.each(groupData,function(contactlists,groupIndex){
// console.log(groupIndex)
_.each(contactlists,function(contactData){
ajax('http://localhost:3001/api/getcontactdata/'+groupIndex+'/'+contactData).then(function(result) {
console.log(result.body);
// Code depending on result
}).catch(function() {
// An error occurred
});
})
})
})
...
In the Bluebird way, I have used the concurrency option to check how to control the queue of promises
...
_.each(groupsArray,function(groupData){
_.each(groupData,function(contactlists,groupIndex){
var contacts = [];
// console.log(groupIndex)
_.each(contactlists,function(contactData){
contacts.push({
contact_name: 'Contact ' + contactData
});
})
groups.push({
task_name: 'Group ' + groupIndex,
contacts: contacts
});
})
})
Promise.each(groups, group =>
Promise.map(group.contacts,
contact => new Promise((resolve, reject) => {
/*setTimeout(() =>
resolve(group.task_name + ' ' + contact.contact_name), 1000);*/
request.get('http://localhost:3001/api/getcontactdata/'+group.task_name+'/'+contact.contact_name,{json: true}, function(error, data) {
if (error) {
reject(error);
} else {
resolve(data);
}
});
}).then(log => console.log(log.body)),
{
concurrency: 50
}).then(() => console.log())).then(() => {
console.log('All Done!!');
});
...
I want to know how to deal with 100 million API calls inside a loop using promises. Please advise the best way to call the API asynchronously and handle the responses later.
My answer using regular Node.js promises (this can probably easily be adapted to Bluebird or another library).
You could fire off all Promises at once using Promise.all:
var groupsArray = [
{'G1': ['C1','C2','C3']},
{'G2': ['D1','D2','D3']}
];
function ajax(url) {
return new Promise(function(resolve, reject) {
request.get(url,{json: true}, function(error, data) {
if (error) {
reject(error);
} else {
resolve(data);
}
});
});
}
Promise.all(groupsArray.map(group => ajax("your-url-here")))
.then(results => {
// Code that depends on all results.
})
.catch(err => {
// Handle the error.
});
Using Promise.all attempts to run all your requests in parallel. This probably won't work well when you have 500 million requests to make, all attempted at the same time!
A more effective way to do it is to use the JavaScript reduce function to sequence your requests one after the other:
// ... Setup as before ...
const results = [];
groupsArray.reduce((prevPromise, group) => {
return prevPromise.then(() => {
return ajax("your-url-here")
.then(result => {
// Process a single result if necessary.
results.push(result); // Collect your results.
});
});
},
Promise.resolve() // Seed promise.
)
.then(() => {
// Code that depends on all results.
})
.catch(err => {
// Handle the error.
});
This example chains together the promises so that the next one only starts once the previous one completes.
Unfortunately the sequencing approach will be very slow because it has to wait until each request has completed before starting a new one. Whilst each request is in progress (it takes time to make an API request) your CPU is sitting idle whereas it could be working on another request!
A more efficient, but complicated approach to this problem is to use a combination of the above approaches. You should batch your requests so that the requests in each batch (of say 10) are executed in parallel and then the batches are sequenced one after the other.
It's tricky to implement this yourself - although it's a great learning exercise - using a combination of Promise.all and the reduce function, but I'd suggest using the library async-await-parallel. There's a bunch of such libraries, but I use this one, it works well and it easily does the job you want.
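To make the batching idea concrete, here's a rough sketch of what such a helper does under the hood (an illustration only, not the library's actual implementation; it reuses the ajax helper and groupsArray from above):
// Run `batchSize` requests in parallel, wait for the whole batch to finish,
// then move on to the next batch.
function processInBatches(items, batchSize, worker) {
    const results = [];
    let chain = Promise.resolve();
    for (let i = 0; i < items.length; i += batchSize) {
        const batch = items.slice(i, i + batchSize);
        chain = chain
            .then(() => Promise.all(batch.map(worker)))
            .then(batchResults => { results.push(...batchResults); });
    }
    return chain.then(() => results);
}

processInBatches(groupsArray, 10, group => ajax("your-url-here"))
    .then(results => {
        // Code that depends on all results.
    })
    .catch(err => {
        // Handle the error.
    });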
You can install the library like this:
npm install --save async-await-parallel
Here's how you would use it:
const parallel = require("async-await-parallel");
// ... Setup as before ...
const batchSize = 10;
parallel(groupsArray.map(group => {
return () => { // We need to return a 'thunk' function, so that the jobs can be started when they are needed, rather than all at once.
return ajax("your-url-here");
};
}), batchSize)
.then(() => {
// Code that depends on all results.
})
.catch(err => {
// Handle the error.
});
This is better, but it's still a clunky way to make such a large number of requests! Maybe you need to up the ante and consider investing time in proper asynchronous job management.
I've been using Kue lately for managing a cluster of worker processes. Using Kue with the Node.js cluster library allows you to get proper parallelism happening on a multi-core PC, and you can then easily extend it to multiple cloud-based VMs if you need even more grunt.
See my answer here for some Kue example code.
In my opinion you have two problems coupled in one question - I'd decouple them.
#1 Loading of a large dataset
Operating on such a large dataset (500M records) will surely cause memory limit issues sooner or later - Node.js runs in a single thread and is by default limited to roughly 1.5 GB of memory - after that your process will crash.
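(For reference, the heap ceiling can be raised with a V8 flag - app.js below is a placeholder for your entry script - but streaming, as described next, avoids the problem altogether.)
$ node --max-old-space-size=4096 app.js   # allow roughly 4 GB of heap for this run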
In order to avoid that you could be reading your data as a stream from a CSV - I'll use scramjet as it'll help us with the second problem, but JSONStream or papaparse would do pretty well too:
$ npm install --save scramjet
Then let's read the data - I'd assume from a CSV:
const {StringStream} = require("scramjet");
const stream = require("fs")
.createReadStream(pathToFile)
.pipe(new StringStream('utf-8'))
.csvParse()
Now we have a stream of objects that will return the data line by line, but only if we read it. Solved problem #1, now to "augment" the stream:
#2 Stream data asynchronous augmentation
No worries - that's just what you need to do: for every line of data you want to fetch some additional info (i.e. augment it) from some API, which by default is asynchronous.
That's where scramjet kicks in with just a couple of additional lines:
stream
.flatMap(groupData => Object.entries(groupData))
.flatMap(([groupIndex, contactList]) => contactList.map(contactData => ([contactData, groupIndex])))
// now you have a simple stream of entries for your call
.map(([contactData, groupIndex]) => ajax('http://localhost:3001/api/getcontactdata/'+groupIndex+'/'+contactData))
// and here you can print or do anything you like with your data stream
.each(console.log)
After this you'd need to accumulate the data or output it to a stream - there are a number of options - for example: .toJSONArray().pipe(fileStream).
Using scramjet you are able to split the process into multiple steps without much impact on performance. Using setOptions({maxParallel: 32}) you can control concurrency, and best of all, this will run with a minimal memory footprint - much, much faster than if you were to load the whole dataset into memory.
Let me know if this is helpful - your question is quite complex, so if you run into any problems, let me know - I'll be happy to help. :)
Background:
I am writing a Meteor app which initially populates a db with data retrieved via SOAP requests made to an API endpoint on a server somewhere.
The initial request is a search via a search term I select. I get back a list of id's to records that match my search.
Then I make another API request, this time for each of those records, and I store the result in my own db (only selected values, not all the data).
If I have a list of search terms then the above is performed for each of these.
To avoid 'callback hell', and because I thought it was a good opportunity to learn something new, I opted to use Promises ordered sequentially, which goes something like this: forEach searchTerm -> getTheResults.then.forEachResult -> fetchRecord
For short sequences of 100 or so it worked fine, but when it got up to 1000 it would hang. After speaking to uncle Google, I came across some threads about native Promises not being optimized in some way and third-party libraries being faster. So I decided to try Bluebird, but before doing this I wanted to test the assertion that it might make things faster.
Code
The code below creates a sequence of 'sleep promises'. The idea was to swap out the Bluebird promise implementation and observe the time the test takes to run over a sequence 5000 promises long.
var Promise = require("bluebird"); // comment out for native implementation
function sleep(time) {
return new Promise(function (resolve, reject) {
setTimeout(function() {
// process.stdout.write(".");
resolve(true);
}, time);
});
}
// the function is badly named, please ignore
function nativePromiseSequence(n) {
// don't know how this first line works - it seems to create an array of n undefined entries so that reduce will iterate n times
const arr = Array.apply(null, Array(n)).map(function () {});
return arr.reduce(function(sequence, e, i, a) {
return sequence.then(function() {
return sleep(1);
});
}, Promise.resolve());
}
The Test
I am using 'chai-as-promised' for testing the promises.
it('Run 5000 promises in a chain', function() {
this.timeout(7000);
return nativePromiseSequence(5000).should.eventually.equal(true);
});
The Result
Over a promise chain of 5000, the test with the Bluebird implementation completed about a second slower than with native promises.
Have I made a mistake here or missed something?
The following code works (the user object is written to the console); however, the process doesn't exit. I believe one of the promises must not be resolved?
var Promise = require("bluebird");
var mongodb = require('mongodb');
Promise.promisifyAll(mongodb);
mongodb.MongoClient.connectAsync("mongodb://localhost/test")
.then(function(db){
var users = db.collection('users');
return users.findOneAsync({userName: "someuser"});
})
.then(function (result) {
console.log(result);
})
.catch(function(e){
//handle error
});
What is wrong with this code?
MongoDB creates a persistent connection which you're supposed to use for the whole lifecycle of your application.
When you're done with it - close it. That is - call db.close()
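For example, a minimal sketch of the code above that closes the connection at the end so the process can exit:
var Promise = require("bluebird");
var mongodb = require('mongodb');
Promise.promisifyAll(mongodb);

var db; // keep a reference so the connection can be closed at the end

mongodb.MongoClient.connectAsync("mongodb://localhost/test")
    .then(function(openedDb){
        db = openedDb;
        return db.collection('users').findOneAsync({userName: "someuser"});
    })
    .then(function (result) {
        console.log(result);
    })
    .catch(function(e){
        //handle error
    })
    .finally(function(){
        if (db) db.close(); // releases the connection and lets the process exit
    });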
If you want to write saner code, use Promise.using and a disposer to build a connectAsync wrapper that does the resource management for you.
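A rough sketch of that approach, assuming the same promisified driver as above:
var Promise = require("bluebird");
var mongodb = require('mongodb');
Promise.promisifyAll(mongodb);

// The disposer pairs the acquired connection with the code that releases it.
function getConnection(url) {
    return mongodb.MongoClient.connectAsync(url)
        .disposer(function(db) {
            db.close();
        });
}

// Promise.using guarantees the disposer runs when the block below settles,
// whether it resolves or rejects.
Promise.using(getConnection("mongodb://localhost/test"), function(db) {
    return db.collection('users').findOneAsync({userName: "someuser"});
})
.then(function (result) {
    console.log(result);
})
.catch(function(e){
    //handle error
});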