Node.js asynchronous use cases - javascript

Recently I have been developing a web application, and I realized that I am not making use of asynchronous execution at all. Hence I am ending up with a lot of nested callbacks.
For example, if the user wants to get a file from the server through a particular API, I will have code similar to this:
db.query(<select list of permitted file_names>, function(err, file_names) {
  async.each(file_names, function(name, next) {
    // open each file and push its contents into an array
    next();
  });
});
This code needs to query the database for a list of file names before looping over them asynchronously and putting each file's contents into an array. Finally it returns the finished array to the client.
With the nested callbacks and the async library, this code behaves like the synchronous version:
names = db.querySync(<select list of permitted file_names>);
for (name of names) {
  // open each file and push its contents into an array
}
I am better off writing synchronous code like this, since it is much neater. My use case might be a little strange, but most of my APIs behave in a similar manner, and that makes me wonder why I even need asynchronous functions.
Can someone please enlighten me as to whether there are any differences between these two snippets in terms of performance? How do I make use of the non-blocking property to enhance performance in this use case?

If you're writing callback functions you're, by definition, using async calls. The callback function fires only when the operation is complete or has errored out. You don't need a fancy library to use these; this is the backbone of how Node's event-loop-driven subsystem operates.
Node strongly advises against using "Sync" calls. The Node core only includes a handful as a convenience; they're there as last-resort tools. Many libraries don't even support them, so you absolutely must get used to writing async code. In the browser environment, for example, you simply cannot use blocking calls without jamming up the JavaScript runtime and stalling the page.
I prefer using Promises, like Bluebird implements, to keep code orderly. There are other ways, like the async library, which can help manage otherwise complicated nesting patterns.
Some of the perks include the Promise.all method, which runs a series of promises to completion and then triggers a next step, and Bluebird's Promise.map, which iterates over a list, running async code for each element, then advancing when the whole list is complete.
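For example, a sketch using Bluebird (getUser, getOrders and fileNames here are hypothetical):
const Promise = require('bluebird');
const fs = Promise.promisifyAll(require('fs')); // adds fs.readFileAsync, etc.

// Promise.all: run several promises to completion, then take the next step.
Promise.all([getUser(), getOrders()])
  .then(function(results) {
    // results[0] and results[1] are both available here
  });

// Bluebird's Promise.map: run async work for each element of a list,
// advancing once the whole list has completed.
Promise.map(fileNames, function(name) {
  return fs.readFileAsync(name, 'utf8');
}).then(function(contents) {
  // contents holds every file's body, in input order
});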
If you're disciplined about organizing your code it's not too bad. Node does require a lot more attention being paid to the order of operations than in a traditional sync-by-default language like Ruby, Python or Java, but you can get used to it. Once you start working with async code rather than fighting it you can often do a ton of work quickly, efficiently, and with a minimum of fuss, in many cases more effectively than in other languages where you must juggle threads plus locking and/or deal with IPC.

Yes, there is a difference between the two snippets in terms of performance.
In the synchronous code:
names = db.querySync(<select list of permitted file_names>);
you are calling the database to get the list of names. Assume this takes 10 seconds. For that entire time, Node.js, being single-threaded, goes into a blocking state. After 10 seconds it executes the rest of the code. Assume the for loop takes 5 seconds and "some code" takes another 5 seconds.
for (name of names) {
  // open each file and push its contents into an array
}
// some code
Therefore it takes a total of 20 seconds.
Whereas in the asynchronous code:
db.query(<select list of permitted file_names>, function(err, file_names) {
  async.each(file_names, function(name, next) {
    // open each file and push its contents into an array
    next();
  });
});
// some code
Node.js asks the database for the list of names and registers a callback. Assume the query again takes 10 seconds. Node.js immediately moves on to the next step ("some code") instead of blocking. Assume "some code" takes 5 seconds. After those 5 seconds, the event loop checks for completed I/O operations; once the callback returns, the function(name, next) {..} loop executes for its 5 seconds.
So the total time here is 15 seconds.
In this manner the performance is improved.
If you want the asynchronous code to be clear and neat, make use of closures and promises.
For example, the asynchronous code above can be written as:
var fun = function(err, file_names) {
  async.each(file_names, function(name, next) {
    // open each file and push its contents into an array
    next();
  });
};
db.query(<select list of permitted file_names>, fun);
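A promise-based sketch of the same flow (assuming db exposes a Node-style callback API that we promisify; fs.promises is Node's built-in promise API):
const util = require('util');
const fs = require('fs').promises;
const queryAsync = util.promisify(db.query.bind(db)); // db is assumed

queryAsync(/* select list of permitted file_names */)
  .then(function(file_names) {
    return Promise.all(file_names.map(function(name) {
      return fs.readFile(name, 'utf8');
    }));
  })
  .then(function(contents) {
    // contents is the finished array to return to the client
  })
  .catch(function(err) {
    // one place to handle errors from the query and all the reads
  });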

The benefit is simple: By using asynchronous code, the current thread (remember, Node.js is single-threaded) is able to handle other requests while the current request is waiting on something (like a database query) to return.
If you use synchronous code instead, the current thread will block while it waits, and it won't be able to handle other requests in the meantime. In other words, you lose concurrency.
To keep your asynchronous code clean, look into promises (to avoid deeply nested callbacks) and ES2017 async/await (to avoid callbacks entirely and write asynchronous code that looks just like synchronous code).
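For example, the original query-then-read flow might look like this with async/await (a sketch; queryAsync stands for a promisified db.query, and files are read via fs.promises):
const fs = require('fs').promises;

async function getPermittedFileContents() {
  const fileNames = await queryAsync(/* select list of permitted file_names */);
  // The reads run concurrently; we continue once all of them finish.
  const contents = await Promise.all(
    fileNames.map(name => fs.readFile(name, 'utf8'))
  );
  return contents; // the thread stays free for other requests while awaiting
}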

Related

Why are almost all Puppeteer calls asynchronous?

I've studied sync and async in JavaScript. I'm going to make a crawling program using Puppeteer.
There are many code examples of crawling in Puppeteer.
But, I have one question: Why do they use async in basic Puppeteer example scripts?
Can't I use sync programming in Puppeteer? Is there an issue that I don't know about that makes async necessary?
It doesn't seem useful if I don't use multiple threads (multi-crawling).
For starters, I recommend reading How the single threaded non blocking IO model works in Node.js. This thread motivates the callback and promise-based models Node provides for achieving concurrency.
Whenever the Node process needs to access an out-of-process resource such as the file system or a network socket (as Puppeteer does to communicate with the browser it's connected to), there are two options:
Block the whole process and wait for the response, as fs.readFileSync does.
Use a promise or a callback to be notified of the response and go about other things, as fs.readFile (either via callback or fs.promises) and Puppeteer do.
The first option is a poor choice, with the only advantage being easier syntax to write. Blocking the thread to wait for a resource is like ordering a pizza, then doing nothing until the pizza arrives. You might as well read a book or water your plants while you wait.
Historically, callbacks were the only way to write concurrent code in Node. Eventually, promises and .then() chains arrived, which were better, but still posed readability burdens. With the advent of async/await, it's no longer difficult to write asynchronous code that reads like synchronous code. Synchronous APIs like fs's *Sync functions that alias an asynchronous API are historical artifacts. It's normal that Puppeteer doesn't offer page.waitForSelectorSync, page.$evalSync, etc.
Now, it's understandable to think that Puppeteer's asynchronous API is pointless in a simple, straight-line script since your Node process doesn't have anything else to do while awaiting responses, but having to type await for each call is the least evil of the available design options for the API.
Simply not awaiting promises isn't an option even when a script is a single sequence of straight-line code. Without await, ordering of operations/results becomes nondeterministic as each promise runs concurrently, independent of the others. This interleaving would be unintended in sequential code, but is a useful tool in cases when concurrency is desired.
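A minimal illustration of that nondeterminism, using plain promises rather than Puppeteer:
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

async function ordered() {
  await delay(30); console.log('first');   // always logs first
  await delay(10); console.log('second');  // always logs second
}

function unordered() {
  delay(30).then(() => console.log('first'));  // logs *after* 'second'
  delay(10).then(() => console.log('second')); // logs first
}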
For the authors of an asynchronous API where almost all calls access an external resource, as is the case with Puppeteer, the options are:
Write and maintain two versions of the API, a synchronous and an asynchronous version. No libraries that I know of do this -- it's a major pain with little benefit and plenty of room for misuse.
Write and maintain a synchronous API only to cater to the simple use case at the expense of making the library virtually unusable for anyone that cares about concurrency. Clearly, this is horrible design, like forcing everyone who orders a pizza (in the above real-world example) to do nothing until it arrives.
Write and maintain one asynchronous API, and make clients who don't care about concurrency in a particular program have to write await in front of all the calls. That's what Puppeteer does.
Incidentally, the fact that the browser is in a separate process tends to cause all manner of confusion in Puppeteer beginners. For example, the fact that data is serialized and deserialized (converted to a string) on every call to page.evaluate (and family) means that you can't pass complex structures like DOM nodes across the inter-process gap. You can't access variables you've defined in Node from the body of an evaluate callback without passing them as arguments to the evaluate call, and these variables need to be able to respond correctly to JSON.stringify() (that is, be serializable).
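For instance (a sketch, inside an async function with a Puppeteer page in scope; the selector is illustrative):
// A Node-side variable must be passed as an evaluate argument; it is
// serialized across the process boundary, so it must survive JSON.stringify.
const selector = '.book';
const titles = await page.evaluate((sel) => {
  // This body runs in the *browser* process: `document` exists here,
  // but Node variables like `selector` do not unless passed in.
  return Array.from(document.querySelectorAll(sel), el => el.textContent);
}, selector);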
Just 13 hours before this post, someone asked node.js puppeteer "document is not defined" -- they were trying to access the browser process' document object inside of Node.
If you're on Windows, try running a simple Puppeteer Node script that doesn't close the browser, then look at your task manager. On Linux, you can run ps -a. You'll see that there's a Chromium browser and a Node process. The two processes communicate over a socket, which has much higher latency than intra-process communication and involves the operating system's network stack. Every Puppeteer call provides an opportunity for concurrency that'd be lost if Puppeteer's API was synchronous.
Understanding the inter-process gap is critical to success in Puppeteer because it motivates why the API calls are asynchronous, and helps clarify which code is executing in which process.
async is very important for data fetching/crawling. Imagine this case: you have one .book-container element, but the book data inside it arrives later, once the UI's API fetch completes.
const scraperObject = {
  url: 'http://book-store.com',
  scraper(browser) {
    let page = browser.newPage();
    page.goto(this.url);
    page.waitForSelector('.book-container');
    page.waitForSelector('.book');
    //TODO: save book data after this
  }
}
With this code snippet, it will run like this:
page.goto(this.url) goes to the page at the given URL.
page.waitForSelector('.book-container') - no await here, so it tries to find the .book-container element immediately (and of course it won't be there, because the page is most likely still loading, perhaps slowed by the network).
page.waitForSelector('.book') - similarly, it tries to get the book data immediately (even though .book-container isn't even in the HTML yet).
To solve this problem, we should use async/await to WAIT for the elements to be ready in the HTML.
const scraperObject = {
  url: 'http://book-store.com',
  async scraper(browser) {
    let page = await browser.newPage();
    await page.goto(this.url);
    await page.waitForSelector('.book-container');
    await page.waitForSelector('.book');
    //TODO: save book data after this
  }
}
Explained again, with async/await:
page.goto(this.url) goes to the page at the given URL and waits until the page has loaded.
page.waitForSelector('.book-container') waits until the .book-container element appears in the HTML.
page.waitForSelector('.book') waits until the .book element appears in the HTML (from which we can infer that the API's data has arrived).
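Note that scraper itself now returns a promise, so the caller has to await it as well. A usage sketch:
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  await scraperObject.scraper(browser); // waits for the whole scrape
  await browser.close();
})();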

Multiple fs.writeFile on Node.js

Straight to the point: I am running an HTTP server in Node.js managing a hotel's check-in/out info, where I write all the JSON data from memory to the same file using fs.writeFile.
The data usually don't exceed 145 kB, but since I need to write them every time I get an update from my database, I end up with data loss/bad JSON format when calls to fs.writeFile happen one immediately after another.
Currently I have solved this problem using fs.writeFileSync, but I would like to hear a more sophisticated solution, not the easy/bad way out of a sync function.
Using fs.promises results in the same error, since again I have to make multiple calls to fs.promises.
According to Node's documentation, calling fs.writeFile multiple times on the same file without waiting is not safe; it suggests using a file stream, but that is not currently an option for me.
To summarize, I need to wait for fs.writeFile to finish normally before attempting any repeated write action, and using the callback is not useful since I don't know a priori when a write action will need to happen.
Thank you very much in advance
I assume you mean you are overwriting or truncating the file while the last write request is still being written. If I were you, I would use the promises API and heed the warning from the documentation:
It is unsafe to use fsPromises.writeFile() multiple times on the same file without waiting for the promise to be settled.
You can await the result in a traditional loop, or very carefully use .then() to "synchronize" your callbacks, but if you're not doing anything else in your event loop except reading from your database and writing to this file, you might as well just use writeFileSync to keep things simple/safe. The asynchronous APIs (callback and Promises) are intended to allow your program to do other things in the meantime; if this is not necessary and the async APIs add troublesome complexity for your code, just use the synchronous APIs. That's true for any node API or library function, not just fs.writeFile.
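For illustration, here is a minimal sketch of the "carefully use .then()" idea: one shared promise that every new write chains onto, so writes to the file never overlap (fs.promises is Node's built-in promise API; the function names are illustrative):
const fs = require('fs').promises;

let lastWrite = Promise.resolve();

function writeToDisk(path, data) {
  // Chain onto the previous write; a failed write is logged but must not
  // block all later writes, hence the catch before the next then.
  lastWrite = lastWrite
    .catch(err => console.error('previous write failed:', err))
    .then(() => fs.writeFile(path, JSON.stringify(data)));
  return lastWrite;
}
Callers can keep invoking writeToDisk without awaiting it; the chain guarantees each write starts only after the previous one settles.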
There are also libraries that will perform atomic filesystem operations for you and abstract away the implementation details, but I think these are probably overkill for you unless you describe your use case in more detail. For example, why you're dumping a database to disk as JSON as fast/frequently as you can, rather than keeping things in memory or using event-based incremental updates (e.g. a real, local database with atomicity and consistency guarantees).
Thank you for your response!
Since my app is mainly an HTTP server, yes, I do other things besides simple input/output, although it does not handle a great number of requests. I will review the promises solution again, but the first time I had no luck.
To explain more, I have:
function updateRoom(data) {
  // ...update things in memory...
  writeToDisk();
}

function writeToDisk() {
  fs.writeFile(....)
}
Making writeToDisk an async function and using await inside it still does not solve the problem, since updateRoom calls writeToDisk without waiting for it to end.
The .then approach cannot be implemented, since my updateRoom is called constantly and dynamically.
If you happen to know a thing or two about async/await, you are more than welcome to explain a bit more. Thanks again nevertheless!

Is there a limit to how many promises can or should run concurrently?

Surprisingly, Google had trouble returning a result for this question.
I'm wondering how many promises can or should be run in parallel before queuing them and waiting for the next one to finish. I guess it might depend on the user's internet connection, but I figured it was worth asking.
If it's based on the user's ISP/connection type, is there a way to test for the ideal number of promises to send before starting a queue?
Also, I'm talking strictly about the client side, so single-threaded JS.
Example code:
function uploadToServer(requestData) {
  return new Promise((...));
}

function sendRequests(requestArray) {
  var count = 0;
  for (var requestData of requestArray) {
    if (count < idealAmount) {
      count++;
      uploadToServer(requestData).then(() => count--);
    } else {
      // Logic to wait before attempting to fire event
    }
  }
}
Promises themselves have no particular coded limits. They are just a notification system and you could have millions of them just fine (as long as you had enough memory to hold those Javascript objects).
Now, if a promise represents an underlying asynchronous operation (which they usually do), there could very well be some limits to how many of that specific type of asynchronous operation can be in flight at the same time. For example, at some point you might run into limits on how many requests a single host will accept from you at the same time. Or, you might run into local resource issues with zillions of connections somewhere.
For things like node.js disk I/O operations, the underlying disk I/O sub-system already has a queuing system so that only a small number of operations are actually running at once and the rest are queued.
So, to answer a question about how many concurrent operations you can have, it can only be analyzed and answered in the context of a specific type of asynchronous request and sometimes even a specific type of receiving host.
If you know you're processing a large or potentially large array of requests and you'll be sending a network request for every item in the array, then it is common to code a limit yourself to avoid overwhelming either local resources or the target host's resources. This is usually not done with a queue, but rather with code that just launches N requests and then, as one finishes, launches the next one, and so on. Both the Bluebird and Async libraries have methods for managing this for you. In Bluebird, it's the concurrency option for Promise.map(). I've also hand-coded loops that manage the number of concurrent connections several times myself (a minimal sketch of that pattern follows the links below), and here are links to some of that code:
Promise.all consumes all my RAM
Javascript - how to control how many promises access network in parallel
Make several requests to an API that can only handle 20 request a minute
Loop through an api get request with variable URL
Choose proper async method for batch processing for max requests/sec
Nodejs: Async request with a list of URL
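To illustrate the launch-N-then-refill pattern those links implement, here is a minimal hand-rolled sketch (names are illustrative; fn should return a promise):
function mapWithConcurrency(items, limit, fn) {
  return new Promise((resolve, reject) => {
    const results = new Array(items.length);
    let next = 0;
    let active = 0;

    function launch() {
      if (next >= items.length && active === 0) {
        return resolve(results); // everything finished (or empty input)
      }
      while (active < limit && next < items.length) {
        const i = next++;
        active++;
        Promise.resolve(fn(items[i])).then(value => {
          results[i] = value;
          active--;
          launch(); // a slot freed up: start the next item or finish
        }, reject);
      }
    }

    launch();
  });
}

// e.g. mapWithConcurrency(requestArray, 10, uploadToServer).then(...)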
As @jfriend00 mentioned, there can't be a limit on the number of promises running, as there's no such thing as "running" a promise. Once you call an async function or run code like new Promise(res => something(res)), the operation has already started.
What you can do is limit the number of promise chains being resolved:
// ten promises ago:
let oldPromise = doSomethingAsync();
// and now:
oldPromise.then(() => doSomethingNewAsync());
But actually coding this on your own is gonna dye your hair grey rather quickly as my example has shown - error handling, finding the empty slots and keeping the flow in the right order will be hard.
That said it is possible and my framework, Scramjet, which I'll shamelessly plug here does what you need:
DataStream.from(requestArray)
.setOptions({maxParallel: 4})
.unorder(requestData => uploadToServer(requestData))
.run()
Scramjet will keep 4 promises resolving but won't try to keep order (there are other methods for that) and you can use any function - if it doesn't return a promise, it will work the same as if it did. Here's some more text on unordered transforms in scramjet. You can also peek at the source code if you'd rather do that yourself...

Performance impact of multiple callback functions after sending the response

Can anyone help me understand how Node.js behaves, and the performance impact, in the scenario below?
a. A request is made to the REST API endpoint /api/XXX. In this request, I return the response after triggering an asynchronous function, like below:
function update(req, res) {
  executeUpdate(req.body); // asynchronous function
  res.send(200);
}
b. Here, I send the response back without waiting for the function to complete, and this function executes four MongoDB updates on different collections.
Questions:
1. As I read, Node.js works on a single thread; how is this asynchronous function executed?
2. If there are multiple requests to the same endpoint, what is the performance impact on Node.js?
3. How exactly does Node.js handle the asynchronous function of each request? Since Node.js runs on a single thread, is there any possibility of a memory issue?
In short, it depends on what you are doing in your function.
Synchronous functions in Node are executed on the main thread; thus they will not be preempted, and they execute until the end of the function or until a return statement is encountered.
Async operations, on the other hand, are taken off the main thread; their callbacks execute only once the async task completes (for some operations, the work happens on a separate worker thread).
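Applied to the handler in the question, the main care to take with fire-and-forget work is attaching error handling, so a failed update doesn't become an unhandled rejection (a sketch, assuming Express and a promise-returning executeUpdate):
function update(req, res) {
  // Respond immediately; the four MongoDB updates continue in the background.
  res.sendStatus(200);

  Promise.resolve(executeUpdate(req.body))
    .catch(err => console.error('background update failed:', err));
}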
There are, I think, two different parts to the answer to your question:
1. Actual performance - which includes CPU and memory behavior, and obviously speed.
2. Understanding, as the previous poster said, sync and async.
Dealing with #1, actual performance: the only real way to test it is to create or use a testing environment for your code. In a rudimentary way, based upon the system you are using, top (Linux) or Glances will give you a basic idea of performance, but in order to know exactly what is going on you will need to apply one of the various testing frameworks or write your own tests.
Approaching #2: it is not only sync and async processes you have to understand, but also the ramifications of both. This includes the use of callbacks and promises.
It really all depends on the process you are attempting to code. For instance, many Node programmers seem to prefer promises when they make calls to MongoDB, especially when they require more than one call based upon the return of the cursor.
There is no written-in-stone formula for when to use sync or async processes. Avoiding callback hell is something all Node programmers try to do, and catching errors is something you always need to be careful about. As I said, some programmers will always opt for promises or async when dealing with returns of data. The famous async library coupled with Bluebird is the choice of many for certain scenarios.
All that being said - and remember, your question is general, and therefore so is my answer - in order to properly know the implications for performance, in memory, CPU and speed, as well as in returning information or passing it to the browser, it is a good idea to understand sync, async, callbacks, promises and error catching as best you can. You will discover certain situations are great for sync (and much faster), while others really do require async and/or promises.
Hope this helps somewhat.

Use of Promises for sequential processing

I'm relatively new to Node.js and JavaScript - please excuse me if the question below is dumb.
To me, promises for async processing make sense, but I'm not 100% sure about the use of promises when it comes to serial/sequential processing. Let's look at an example (pseudo code):
Objective: Read a file, process what was read from the file, and send a notification using an HTTP POST call.
bendUniverseWithoutPromise: function() {
  var data = fs.readFileSync(..); // read the file
  var result = processData(data);
  this.postNotification(result);
}
In the above function, processData() cannot run until we've read the file, and we cannot send the notification until we've finished processing.
Let's look at a slightly different version (assuming each of the above method calls returns a promise, or that we wrap them in a promise):
bendUniverseWithPromise: function() {
  return new Promise(function() {
    fs.readFileAsync(fileName)
      .then(processData(data))
      .then(postNotification(result))
  })
}
Now, my questions are:
1. Seeing that we require serial/sequential processing in this instance, how is the promise version better than the non-promise version? What does it do better than the first example? Maybe it is a bad example - in that case, what would be a good example to demonstrate the differences?
2. Besides the syntax, the promise version adds only a little in terms of readability, and it can get quite complicated with nested promises, context (this!), etc.
3. I do understand that technically the first method will NOT return until all processing is done, while the second will return immediately and the processing, although still sequential (in the context of the method), will carry on in the background.
4. Is there a general rule regarding the use of promises? Are there any patterns and anti-patterns?
Thank you in advance.
I will try to answer all four of your points by taking your example further.
Let's say the first operation (the file read) is a slow, I/O-bound operation that takes 900 ms. The processing and notification are CPU-bound and I/O-bound respectively, taking 50 ms each. (See What do the terms "CPU bound" and "I/O bound" mean?)
Now, both versions will take the same 1000 ms to complete, but the second example utilizes available resources better, as it is asynchronous. Herein lies the advantage of the promise based version. The first version will make the server completely unresponsive for an entire second, while the second version will only make the server unresponsive during the 50 ms CPU bound processing step.
This hopefully becomes even more lucid when we consider 10 of these requests coming in at the same time. The first example goes through them one at a time, serving request #1 after 1s, #2 after 2s, and so on, finishing after 10s. Its average performance is 1 req/s. The second version would start a file read for request #1, then immediately go on to request #2, spinning up another file read, and so on for all requests. All requests would then finish their reads in around 1s, assuming 100 ms overhead and little or no saturation of disk read bandwidth. The processing would then queue up, taking 500 ms in total for all requests. Lastly we could do the notification posting in parallel, due to it again being I/O-bound. All requests would then in this idealized example be finished in around 1.5 seconds at over 6 req/s, a 6x performance increase. This is exclusively due to the better resourcefulness provided by asynchronicity.
The rule is therefore: always use async/promises when performing I/O-bound work.
Sidenote:
Your second example is not correct, as there is no data or result variable defined in that scope; the correct version passes only the functions to then():
bendUniverseWithPromise: function (fileName) {
  return fs.readFileAsync(fileName)
    .then(processData)
    .then(postNotification)
}
First, let's correct the asynchronous code:
bendUniverseWithPromise: function() {
  return fs.readFileAsync(fileName)
    .then(processData)
    .then(postNotification);
}
Now, the code above was (almost - had it been complete) an anti-pattern: the explicit-construction anti-pattern.
As for why you'd want to use the promise version - well, it's asynchronous. It allows other operations to take place while asynchronous (mostly I/O) operations are waited on.
Note: fs.readFileAsync does not exist in Node core; fs.readFile has to be "promisified" first.
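For example, with Bluebird's promisifyAll (which is where the *Async naming convention comes from), or with Node's built-ins:
// Bluebird: adds an *Async version of every fs function.
const Promise = require('bluebird');
const fs = Promise.promisifyAll(require('fs')); // fs.readFileAsync now exists

// Node core alternatives:
const util = require('util');
const readFileAsync = util.promisify(require('fs').readFile);
// or, in modern Node: const { readFile } = require('fs').promises;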
