Use of Promises for sequential processing - javascript

I'm relatively new to Node.js and JavaScript - please excuse me if the question below is dumb.
To me, promises for async processing make sense, but I'm not 100% sure about the use of promises when it comes to serial/sequential processing. Let's look at an example (pseudocode):
Objective: Read the file, process what was read, and send a notification using an HTTP POST call.
bendUniverseWithoutPromise: function() {
    var data = fs.readFileSync(..); // read the file (blocking)
    var result = processData(data); // process what was read
    this.postNotification(result);  // send the notification
}
In the above function, processData() cannot run until we've read the file, and we cannot send the notification until we've finished processing.
Let's look at a slightly different version (assuming each of the above method calls returns a promise, or we wrap them in promises):
bendUniverseWithPromise: function() {
    return new Promise(function() {
        fs.readFileAsync(fileName)
            .then(processData(data))
            .then(postNotification(result))
    })
}
Now, my questions are:
Seeing that we require serial/sequential processing in this instance, how is the promise version better than the non-promise version? What is it doing better than the first example? Maybe it is a bad example, but then what would be a good example to demonstrate the differences?
Besides the syntax, the promise version adds a little (only a little) in terms of readability of the code, and can get quite complicated with nested promises, context (this!), etc.
I do understand that, technically, the first method will NOT return until all processing is done, while the second will return immediately; the processing, although still sequential (in the context of the method), will carry on in the background.
Is there a general rule regarding the use of promises? Are there any patterns and anti-patterns?
Thank you in advance.

I will try to answer all four of your points by taking your example further.
Let's say the first operation (the file read) is a slow, I/O-bound operation that takes 900 ms. The processing and notification are CPU-bound and I/O-bound respectively, taking 50 ms each (see: What do the terms "CPU bound" and "I/O bound" mean?).
Now, both versions will take the same 1000 ms to complete, but the second example utilizes available resources better, as it is asynchronous. Herein lies the advantage of the promise-based version: the first version will make the server completely unresponsive for an entire second, while the second will only make it unresponsive during the 50 ms CPU-bound processing step.
This hopefully becomes even more lucid when we consider 10 of these requests coming in at the same time. The first example goes through them one at a time, serving request #1 after 1 s, #2 after 2 s, and so on, finishing after 10 s; its average throughput is 1 req/s. The second version would start a file read for request #1, then immediately go on to request #2, spinning up another file read, and so on for all requests. All requests would then finish their reads in around 1 s, assuming 100 ms overhead and little or no saturation of disk read bandwidth. The processing would then queue up, taking 500 ms in total for all requests. Lastly, we could do the notification posting in parallel, it again being I/O-bound. In this idealized example, all requests would be finished in around 1.5 seconds, at over 6 req/s - a 6x throughput increase. This is exclusively due to the better resource utilization provided by asynchronicity.
The rule is therefore: always use async/promises when performing I/O-bound work.
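To make this concrete, here is a minimal sketch contrasting the two styles with Node's fs module (the file name and timings are made up):

const fs = require('fs');

// Synchronous: the whole process stalls until the read completes.
// Nothing else (no other request) can run during these ~900 ms.
const data = fs.readFileSync('big-file.txt');

// Asynchronous: the read is handed off to the OS and the event loop
// stays free to serve other requests in the meantime.
fs.readFile('big-file.txt', function (err, contents) {
    // runs later, once the read has finished
});
console.log('reached immediately, before the file has been read');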
Sidenote:
Your second example is not correct, as there is no data or result variable defined in that scope; the correct version would pass only the functions to then():
bendUniverseWithPromise: function (fileName) {
    return fs.readFileAsync(fileName)
        .then(processData)
        .then(postNotification)
}

First, let's correct the asynchronous code:
bendUniverseWithPromise: function (fileName) {
    return fs.readFileAsync(fileName)
        .then(processData)
        .then(postNotification);
}
Now, your original version was (almost, had it been complete) an anti-pattern - the explicit construction anti-pattern.
As for why you'd want to use the promised version: well, it's asynchronous. It allows other operations to take place while the asynchronous (mostly I/O) operations are awaited.
Note: fs.readFileAsync does not exist in the core fs module; it has to be created by "promisifying" fs.
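For reference, two common ways to get a promise-returning file read (a sketch):

// Option 1: Bluebird's promisifyAll adds *Async variants of the callback API.
var Promise = require('bluebird');
var fs = Promise.promisifyAll(require('fs'));
// fs.readFileAsync(fileName) now returns a promise.

// Option 2: Node's built-in promise-based API (Node 10+, no extra dependency).
var fsp = require('fs').promises;
// fsp.readFile(fileName) returns a promise.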


Is there a limit to how many promises can or should run concurrently?

Surprisingly, Google had trouble returning a result for this question.
I'm wondering how many promises can or should be run in parallel before queuing them and waiting for the next one to finish. I guess it might depend on the user's internet connection, but I figured it was worth asking.
If it's based on the user's ISP/connection type, is there a way to test for the ideal number of promises to send before starting a queue?
Also, I'm talking strictly about the client side - so single-threaded JS.
Example code:
function uploadToServer(requestData){
    return new Promise((...)); // the upload, wrapped in a promise
}

function sendRequests(requestArray){
    var count = 0;
    for (var requestData of requestArray){
        if (count < idealAmount){
            uploadToServer(requestData).then(function(){ count--; });
            count++;
        } else {
            // Logic to wait before attempting to fire event
        }
    }
}
Promises themselves have no particular coded limits. They are just a notification system and you could have millions of them just fine (as long as you had enough memory to hold those Javascript objects).
Now, if a promise represents an underlying asynchronous operation (which they usually do), there could very well be some limits to how many of that specific type of asynchronous operation can be in flight at the same time. For example, at some point you might run into limits of how many requests a single host would accept from you at the same time. Or, you might run into local resources issues with zillions of connections somewhere.
For things like node.js disk I/O operations, the underlying disk I/O sub-system already has a queuing system so that only a small number of operations are actually running at once and the rest are queued.
So, to answer a question about how many concurrent operations you can have, it can only be analyzed and answered in the context of a specific type of asynchronous request and sometimes even a specific type of receiving host.
If you know you're processing a large or potentially large array of requests and you'll be sending a network request for every item in the array, then it is common to code a limit yourself to avoid overwhelming either local resources or the target host resources. This is usually not done with a queue, but rather code that just launches N requests and then as one finishes, it launches the next one and so on. Both the Bluebird and Async libraries have methods for managing this for you. In Bluebird, it's the concurrency option for Promise.map(). I've also hand-coded loops that manage the number of concurrent connections several times myself and here are links to some of that code:
Promise.all consumes all my RAM
Javascript - how to control how many promises access network in parallel
Make several requests to an API that can only handle 20 request a minute
Loop through an api get request with variable URL
Choose proper async method for batch processing for max requests/sec
Nodejs: Async request with a list of URL
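To give a flavor of the hand-coded approach described above, here is a minimal sketch of such a loop (all names are illustrative, not from any particular library):

function runWithLimit(items, limit, worker) {
    // Runs worker(item) for every item, keeping at most `limit`
    // operations in flight at once; resolves with an array of results.
    return new Promise(function (resolve, reject) {
        var index = 0;   // next item to start
        var active = 0;  // operations currently in flight
        var results = [];
        if (items.length === 0) return resolve(results);

        function launchNext() {
            if (index >= items.length) {
                if (active === 0) resolve(results); // everything finished
                return;
            }
            var i = index++;
            active++;
            worker(items[i]).then(function (value) {
                results[i] = value;
                active--;
                launchNext(); // a slot freed up - start the next item
            }, reject);
        }

        // Prime the pump with up to `limit` operations.
        for (var k = 0; k < limit && k < items.length; k++) {
            launchNext();
        }
    });
}

// Usage (hypothetical): runWithLimit(requestArray, 4, uploadToServer)
//     .then(function (results) { /* all uploads finished */ });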
As #jfried00 mentioned, there can't be any limit on the number of promises running, as there's no such thing as running a promise. Once you run an async function, or run code like new Promise(res => something(res)), the method is run.
What you can do is limit the number of promise chains being resolved:
// ten promises ago:
let oldPromise = doSomethingAsync();
// and now:
oldPromise.then(doSomethingNewAsync); // pass the function, don't call it
But actually coding this on your own is going to dye your hair grey rather quickly, as my example has shown - error handling, finding the empty slots and keeping the flow in the right order will be hard.
That said, it is possible, and my framework, Scramjet, which I'll shamelessly plug here, does what you need:
DataStream.from(requestArray)
    .setOptions({maxParallel: 4})
    .unorder(requestData => uploadToServer(requestData))
    .run()
Scramjet will keep 4 promises resolving but won't try to keep order (there are other methods for that) and you can use any function - if it doesn't return a promise, it will work the same as if it did. Here's some more text on unordered transforms in scramjet. You can also peek at the source code if you'd rather do that yourself...

Performance impact of multiple callback functions after sending the response

Can anyone help me understand how Node.js behaves, and what the performance impact is, in the scenario below.
a. Making a request to the REST API endpoint "/api/XXX". In this request, I return the response after triggering an asynchronous function, like below:
function update(req, res) {
    executeUpdate(req.body); // asynchronous function
    res.send(200);
}
b. Here, I send the response back without waiting for the function to complete, and this function executes four MongoDB updates on different collections.
Questions:
1. As I read, Node.js works on a single thread, so how is this asynchronous function executing?
2. If there are multiple requests for the same endpoint, what will the performance impact on Node.js be?
3. How exactly does Node.js handle the asynchronous function of each request? Since Node.js runs on a single thread, is there any possibility of a memory issue?
In short, it depends on what you are doing in your function.
Synchronous functions in Node are executed on the main thread; they will not be preempted, and they run until the end of the function or until a return statement is encountered.
Async functions, on the other hand, are removed from the main thread and are only executed when their async tasks have completed on a separate worker thread.
There are, I think, two different parts in the answer to your question.
1. Actual performance - which includes CPU & memory performance. It also obviously includes speed.
2. Understanding, as the previous poster said, sync and async.
In dealing with #1 - actual performance - the only real way to test it is to create or use a testing environment for your code. In a rudimentary way, depending on the system you are using, you can view some of the information in top (Linux), or Glances will give you a basic idea of performance, but in order to know exactly what is going on you will need to apply one of the various testing environments or write your own tests.
Approaching #2 - It is not only sync and async processes you have to understand, but also the ramifications of both. This includes the use of callbacks and promises.
It really all depends on the current process you are attempting to code. For instance, many Node programmers seem to prefer using promises when they make calls to MongoDB, especially when one requires more than one call based upon the return of the cursor.
There is really no written-in-stone formula for when to use sync or async processes. Avoiding callback hell is something all Node programmers try to do. Catching errors, etc., is something you always need to be careful about. As I said, some programmers will always opt for promises or async when dealing with returns of data. The famous Async library, coupled with Bluebird, is the choice of many for certain scenarios.
All that being said, and remembering that your question is general (and therefore so is my answer): in order to properly know the implications for your performance - in memory, CPU and speed, as well as in returning information or passing it to the browser - it is a good idea to understand, as best as you can, sync, async, callbacks, promises and error catching. You will discover certain situations are great for sync (and much faster), while others do require async and/or promises.
Hope this helps somewhat.

Node.js asynchronous use cases

Recently, I have been developing a web application and I realize that I am not making use of Node's asynchronous nature at all; hence I am ending up with a lot of nested callbacks.
For example, if the user wants to get a file from the server through a particular API, I will have code similar to this:
db.query(<select list of permitted file_names>, function(err, file_names) {
    async.each(file_names, function(name, next) {
        // open each file to put into array
        next();
    });
});
This code needs to query the database to get a list of file names before looping asynchronously and putting each file's content into an array. Finally it returns the finished array to the client. With the nested callbacks and the async library, this code behaves like synchronous code:
names = db.querySync(<select list of permitted file_names>);
for (name in names) {
    // open each file to put into array
}
I am better off writing synchronous code like this since it is much neater. My use case might be a little strange, but most of my APIs behave in a similar manner, and that makes me wonder: why do I even need asynchronous functions?
Can someone please enlighten me as to whether there are any differences between these two pieces of code in terms of performance? How do I make use of the non-blocking property to enhance performance in this use case?
If you're writing callback functions, you're by definition using async calls. The callback function fires only when the operation is complete or has errored out. You don't need a fancy library to use these; this is the backbone of how Node's event-loop-driven subsystem operates.
Node strongly advises against using "Sync" calls. The Node core only includes a handful as a convenience, they're there as last-resort tools. Many libraries don't even support them so you absolutely must get used to writing async code. In the browser environment, for example, you simply cannot use blocking calls without jamming up the JavaScript runtime and stalling the page.
I prefer using Promises, like Bluebird implements, to keep code orderly. There are other ways, like the async library, which can help manage otherwise complicated nesting patterns.
Some of the perks include things like Promise.all, which runs a series of promises to completion and then triggers a next step, and Bluebird's Promise.map, which iterates over a list, running async code for each element, then advancing when the list is complete.
If you're disciplined about organizing your code it's not too bad. Node does require a lot more attention being paid to the order of operations than in a traditional sync-by-default language like Ruby, Python or Java, but you can get used to it. Once you start working with async code rather than fighting it you can often do a ton of work quickly, efficiently, and with a minimum of fuss, in many cases more effectively than in other languages where you must juggle threads plus locking and/or deal with IPC.
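As a sketch of how the question's query-then-read-each-file pattern might look with promises (queryPermittedFileNames is a hypothetical promise-returning wrapper around the database call):

var fsp = require('fs').promises;

queryPermittedFileNames()
    .then(function (fileNames) {
        // Start all reads concurrently; Promise.all resolves once
        // every read has finished (or rejects on the first error).
        return Promise.all(fileNames.map(function (name) {
            return fsp.readFile(name, 'utf8');
        }));
    })
    .then(function (contents) {
        // contents is an array, in the same order as fileNames
    })
    .catch(function (err) {
        // one place to handle both query and read errors
    });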
Yes, there is a difference between the two pieces of code in terms of performance.
In synchronous code:
names = db.querySync(<select list of permitted file_names>);
You are calling the database here to get the list of names. Assume this takes 10 sec. During this time Node.js, as it is single-threaded, goes into a blocking state. After 10 sec, it executes the rest of the code. Assume the for loop takes 5 sec and some other code takes 5 sec.
for (name in names) {
    // open each file to put into array
}
// some code
Therefore it takes a total time of 20 sec.
Whereas in the asynchronous code:
db.query(<select list of permitted file_names>, function(err, file_names) {
Node.js asks the database for the list of names and hands it a callback. Assume the query takes 10 sec. It immediately goes on to the next step (some code) instead of entering a blocking state. Assume that some code takes 5 sec.
    async.each(file_names, function(name, next) {
        // open each file to put into array
    });
});
// some code
After the 5 sec, it checks whether there are any I/O operations to be performed. Once the callback is returned, it executes function(name, next) {..} for 5 sec. So the total time here is 15 sec.
In this manner the performance is improved.
If you want the asynchronous code to be clear and neat, make use of closures and promises.
For example, the above asynchronous code can be written as:
var fun = function(err, file_names) {
    async.each(file_names, function(name, next) {
        // open each file to put into array
    });
};
db.query(<select list of permitted file_names>, fun);
The benefit is simple: By using asynchronous code, the current thread (remember, Node.js is single-threaded) is able to handle other requests while the current request is waiting on something (like a database query) to return.
If you use synchronous code instead, the current thread will block while it waits, and it won't be able to handle other requests in the meantime. In other words, you lose concurrency.
To keep your asynchronous code clean, look into promises (to avoid deeply nested callbacks) and ES2017 async/await (to avoid callbacks entirely and write asynchronous code that looks just like synchronous code).
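For illustration, a sketch of the same query-then-read flow with async/await (again, queryPermittedFileNames is hypothetical):

var fsp = require('fs').promises;

async function collectFiles() {
    var fileNames = await queryPermittedFileNames();
    // Reads run one after another here, but each await yields the
    // thread, so other requests can be served in the meantime.
    var contents = [];
    for (var name of fileNames) {
        contents.push(await fsp.readFile(name, 'utf8'));
    }
    return contents; // an async function returns a promise
}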

angularJS $q -- stopping execution to wait for all promises

I've got this problem that I couldn't find a solution for by googling.
I've got a library that I'm using (and do not want to edit unless it's really necessary) that allows the user to select an item, then calls my custom callback function to modify the item, and then continues working with it.
I need to perform some asynchronous tasks on the item, which may take some time. This creates a race condition, as my async tasks may not have finished by the time the callback function returns and the library continues its work on the item.
library.onItemSelectionCallback = function (item) {
    myService.modifyItem(item).then(
        function (modifiedItemProperty) {
            item.newProperty = modifiedItemProperty;
        });
    myService.anotherModifyItem(item).then(
        function (modifiedItemProperty) {
            item.existingProperty = modifiedItemProperty;
        });
}
How do I wait for both of my async tasks to finish, before allowing this callback to finish?
The only thing I could think of is looping with a while and a sleep every hundred or so milliseconds until both of the promises have been resolved, but that doesn't seem like a very good solution.
I understand that this makes async requests quite synchronous and might possibly be detrimental for UX, but do not really see another way out.
EDIT: At the risk of removing the generic nature of the question and thus making it too localized, I will say that I'm trying to use the angular-file-upload module; specifically, I'm trying to mount a custom imageService that resizes the picture before it is uploaded. I'm mounting it on the onBeforeUploadItem callback. The idea is that creating the resized image may take a while, and that is why I need to return a promise from my imageService that is resolved before the upload.
If modifyItem and anotherModifyItem work independently (that is, one does not rely on the other), you can just pipe them both into $q.all, e.g.
library.onItemSelectionCallback = function(item) {
    var promises = {
        newProperty: myService.modifyItem(item),
        existingProperty: myService.anotherModifyItem(item)
    };
    return $q.all(promises).then(function(values) {
        return angular.extend(item, values);
    });
}
This will return a promise that resolves with item.
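Note that this only helps if the library waits on the value returned from the callback. A sketch of what the waiting side could look like, assuming the library (hypothetically) supports promise-returning callbacks:

// Inside the library (hypothetical): wait for the callback's
// promise before continuing to work with the item.
var maybePromise = library.onItemSelectionCallback(item);
$q.when(maybePromise).then(function () {
    continueWorkingWithItem(item); // both properties are now set
});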
For the first part of my question: yes, I guess the only way to really wait for those two promises to be resolved would be something with a while and sleep, making them synchronous. That would probably work and not even be that bad (except for the site pausing until the requests are fulfilled), but it would make me feel very, very bad about myself as a person and about how my actions affect this world.
It is not possible to correctly mix callbacks and promises without hacks, afaik.
For the second part of my question: as per the comments of #georgeawg, I figured that an AngularJS module built around HTML5 APIs and callbacks, instead of the $http service and promises, is not how a good AngularJS module should be implemented, so I moved to a different module, ng-file-upload, which, even though one could argue it is less stylish, does the job very well and in an Angular way (ng-file-upload provides a simple $upload service that returns a promise; if you want to modify files before upload, the suggested way is to simply $watch and catch the moment the user drag-drops or selects a file).

Forever loop while waiting for asynchronous task?

I'm wondering if there's a way to cause JavaScript to wait for some variable-length code execution to finish before continuing, using events and loops. Before answering with timeouts or callbacks, or flagging this as a duplicate, hear me out.
I want to expose a large API to a web worker. I want this API to feel 'native' in the sense that you can access each member using a getter which gets the information from the other thread. My initial idea was to compile the API and rebuild the entire object on the worker. While this works (and was a really fun project), it's slow at startup and cannot show changes made to the API without it being sent to the worker again after modification. Observers would solve part of this, and web workers' transferable objects would solve all of it, but they aren't widely adopted yet.
Since worker round-trip calls happen in a matter of milliseconds, I think stalling the thread for a few milliseconds may be an alright solution. Of course I would think about terminating in cases where calls take too long, but I'm trying to create a proof of concept first.
Let's say I want to expose the api object to the worker. I would define a getter for self.api which would fetch the first layer of properties. Each property would then be another getter and the process would continue until the final object is found.
worker.js
self.addEventListener('message', function(event) {
    self.dataReceived = true;
    self.data = event.data; // would actually build new getters here
});

Object.defineProperty(self, 'api', {
    get: function() {
        self.dataReceived = false;
        self.postMessage('request api first-layer properties');
        while (!self.dataReceived); // busy-wait for the response
        return self.data; // whatever properties were received from host
    }
});
For experimentation, we'll do a simple round-trip with no data processing:
index.html (only JS part)
var worker = new Worker("worker.js");
worker.onmessage = function() {
    worker.postMessage(null); // echo a reply straight back to the worker
};
If onmessage could interrupt the loop, the script should theoretically work. The worker could then access objects like window.document.body.style on the fly.
My question really boils down to: is there a way to guarantee that an event will interrupt an executing code block?
From my understanding of events in JavaScript, I thought they did interrupt the current thread. Does it not because it's executing a blank statement over and over? What if I generated code to be executed and kept doing that until the data returned?
is there a way to guarantee that an event will interrupt an executing code block
As #slebetman suggests in the comments: no, not in JavaScript running in a browser's web worker (with one possible exception that I can think of; see suggestion 3 below).
My suggestions, in decreasing order of preference:
Give up the desire to feel "native" (or maybe "local" might be a better term). Something like the infinite while loop that you suggest also seems to be very much fighting against the cooperative multitasking environment offered by JavaScript, even when thinking about a single web worker.
Communication between workers in Javascript is asynchronous. Perhaps it can fail, take longer than just a few milliseconds. I'm not sure what your use case is, but my feeling is that when the project grows, you might want to use those milliseconds for something else.
You could change your defined property to return a promise, and then the caller would do a .then on the response to retrieve the value, just like any other asynchronous API (a sketch follows this list).
Angular Protractor/Webdriver has an API that uses a control flow to simulate a synchronous environment using promises, by always passing promises about. Taking the code from https://stackoverflow.com/a/22697369/1319998
browser.get(url);
var title = browser.getTitle();
expect(title).toEqual('My Title');
By my understanding, each line above adds a promise to the control flow to execute asynchronously. title isn't actually the title, but a promise that resolves to the title, for example. While it looks like synchronous code, the getting and testing all happen asynchronously later.
You could implement something similar in the web worker. However, I do wonder whether it will be worth the effort. There would be a lot of code to do this, and I can't help feeling that the main consequence would be that it would end up harder to write code using this, and not easier, as there would be a lot of hidden behaviour.
The only thing that I know of that can be made synchronous in JavaScript is XMLHttpRequest, when setting the async parameter to false (https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest#Parameters). I wonder if you could come up with some way of making requests to a server that maintains a connection with the main thread and passes data along that way. I have to say, my instinct is that this is quite an awful idea, and would be much slower than just requesting data from the main thread.
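For completeness, here is a sketch of the promise-returning property from suggestion 2 (names and message formats are made up):

worker.js
var pendingResolve = null; // supports one outstanding request at a time

self.addEventListener('message', function (event) {
    if (pendingResolve) {
        pendingResolve(event.data); // settle the outstanding request
        pendingResolve = null;
    }
});

Object.defineProperty(self, 'api', {
    get: function () {
        return new Promise(function (resolve) {
            pendingResolve = resolve;
            self.postMessage('request api first-layer properties');
        });
    }
});

// Usage in the worker: no busy-wait, the event loop stays free.
// self.api.then(function (props) { /* use props */ });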
As far as I know, there is nothing native in JS to do this, but it is relatively easy to do something similar. I made one for myself some time ago: https://github.com/xpy/whener/blob/master/whener.js .
You use it like when(condition, callback), where condition is a function that returns true when your condition is met, and callback is the function that you want to execute at that time.
