Performance impact of multiple callback functions after sending a response - javascript

Can anyone help me understand how NodeJS behaves, and what the performance impact is, in the scenario below?
a. A request is made to the REST API endpoint "/api/XXX". In this handler, I return the response after triggering an asynchronous function, like below.
function update(req, res) {
  executeUpdate(req.body); // asynchronous function, deliberately not awaited
  res.sendStatus(200);     // res.send(200) is deprecated in Express 4+
}
b. Here I send the response back without waiting for the function to complete, and that function performs four MongoDB updates on different collections.
Questions:
1. As I read, NodeJS works on a single thread, so how is this asynchronous function executed?
2. If there are multiple requests to the same endpoint, what is the performance impact on NodeJS?
3. How exactly does NodeJS handle the asynchronous function of each request? Since NodeJS runs on a single thread, is there any possibility of a memory issue?

In short, it depends on what you are doing in your function.
Synchronous code in Node executes on the main thread: it is never preempted and runs until the function returns.
Asynchronous operations, on the other hand, are handed off the main thread: file system calls (and a few other operations) run on libuv's worker thread pool, while network I/O uses non-blocking sockets. Either way, the callback is executed back on the main thread once the operation completes.
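As a concrete sketch of the question's scenario (the collection names and the connected db handle are assumptions, not from the post): the response is sent immediately, the four updates finish later on the event loop, and the .catch matters because nothing awaits the promise.

// assumes `db` is an already-connected MongoDB database handle
async function executeUpdate(body) {
  await db.collection('a').updateOne({ _id: body.id }, { $set: body.a });
  await db.collection('b').updateOne({ _id: body.id }, { $set: body.b });
  await db.collection('c').updateOne({ _id: body.id }, { $set: body.c });
  await db.collection('d').updateOne({ _id: body.id }, { $set: body.d });
}

function update(req, res) {
  executeUpdate(req.body).catch(err => console.error('update failed:', err));
  res.sendStatus(200); // sent before the updates complete
}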

There are, I think, two different parts to the answer to your question.
Actual performance - which includes CPU and memory usage, and obviously speed as well.
Understanding, as the previous poster said, sync vs. async.
For #1, actual performance, the only real way to know is to test your code in a testing environment. In a rudimentary way, depending on the system you are using, you can view some of this information in top (Linux), or Glances will give you a basic idea, but to know exactly what is going on you will need to apply one of the various testing environments or write your own tests.
For #2, it is not only sync and async processes you have to understand, but also the ramifications of both. This includes the use of callbacks and promises.
It really all depends on what you are attempting to code. For instance, many Node programmers seem to prefer promises when making calls to MongoDB, especially when more than one call depends on what the cursor returns.
There is really no written-in-stone formula for when to use sync or async. Avoiding callback hell is something all Node programmers try to do. Catching errors, etc., is something you always need to be careful about. As I said, some programmers will always opt for promises or async when dealing with returned data. The well-known Async library coupled with Bluebird is the choice of many for certain scenarios.
All that being said, and remember your question is general and therefore so is my answer: to properly know the implications for performance (memory, CPU, and speed) as well as for returning data or passing it to the browser, it is a good idea to understand sync, async, callbacks, promises, and error catching as best you can. You will discover certain situations are great for sync (and much faster), while others do require async and/or promises.
Hope this helps somewhat.

Related

Why are almost all Puppeteer calls asynchronous?

I've studied sync and async in JavaScript. I'm going to make a crawling program using Puppeteer.
There are many code examples of crawling in Puppeteer.
But, I have one question: Why do they use async in basic Puppeteer example scripts?
Can't I use sync programming in Puppeteer? Is there an issue that I don't know about that makes async necessary?
It doesn't seem useful if I don't use multiple threads (multi-crawling).
For starters, I recommend reading How the single threaded non blocking IO model works in Node.js. This thread motivates the callback and promise-based models Node provides for achieving concurrency.
Whenever the Node process needs to access an out-of-process resource such as the file system or a network socket (as Puppeteer does to communicate with the browser it's connected to), there are two options:
Block the whole process and wait for the response, as fs.readFileSync does.
Use a promise or a callback to be notified of the response and go about other things, as fs.readFile (either via callback or fs.promises) and Puppeteer do.
The first option is a poor choice, with the only advantage being easier syntax to write. Blocking the thread to wait for a resource is like ordering a pizza, then doing nothing until the pizza arrives. You might as well read a book or water your plants while you wait.
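Concretely, the two options with the fs calls named above (the file path is just a placeholder):

const fs = require('fs');

// Option 1: block the whole process until the read finishes.
const dataSync = fs.readFileSync('./config.json', 'utf8');

// Option 2: hand the work off and get notified later; the process stays
// free to do other things (serve requests, start more reads) meanwhile.
fs.promises.readFile('./config.json', 'utf8')
  .then(data => console.log('loaded', data.length, 'characters'));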
Historically, callbacks were the only way to write concurrent code in Node. Eventually, promises and .then chains arrived, which were better, but still posed readability burdens. With the advent of async/await, it's no longer difficult to write asynchronous code that reads like synchronous code. Synchronous APIs like fs's Sync-suffixed functions that alias an asynchronous API are historical artifacts. It's normal that Puppeteer doesn't offer page.waitForSelectorSync, page.$evalSync, etc.
Now, it's understandable to think that Puppeteer's asynchronous API is pointless in a simple, straight-line script since your Node process doesn't have anything else to do while awaiting responses, but having to type await for each call is the least evil of the available design options for the API.
Simply not awaiting promises isn't an option even when a script is a single sequence of straight-line code. Without await, ordering of operations/results becomes nondeterministic as each promise runs concurrently, independent of the others. This interleaving would be unintended in sequential code, but is a useful tool in cases when concurrency is desired.
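A small sketch of that point, with setTimeout standing in for variable-latency I/O (all names here are illustrative):

const step = (ms, label) =>
  new Promise(resolve => setTimeout(() => { console.log(label); resolve(); }, ms));

async function withoutAwait() {
  step(50, 'first');  // both promises start immediately and run
  step(10, 'second'); // concurrently: 'second' logs before 'first'
}

async function withAwait() {
  await step(50, 'first');  // 'first' always completes before
  await step(10, 'second'); // 'second' even starts
}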
For the authors of an asynchronous API where almost all calls access an external resource, as is the case with Puppeteer, the options are:
Write and maintain two versions of the API, a synchronous and an asynchronous version. No libraries that I know of do this -- it's a major pain with little benefit and plenty of room for misuse.
Write and maintain a synchronous API only to cater to the simple use case at the expense of making the library virtually unusable for anyone that cares about concurrency. Clearly, this is horrible design, like forcing everyone who orders a pizza (in the above real-world example) to do nothing until it arrives.
Write and maintain one asynchronous API, and make clients who don't care about concurrency in a particular program have to write await in front of all the calls. That's what Puppeteer does.
Incidentally, the fact that the browser is in a separate process tends to cause all manner of confusion in Puppeteer beginners. For example, the fact that data is serialized and deserialized (converted to a string) on every call to page.evaluate (and family) means that you can't pass complex structures like DOM nodes across the inter-process gap. You can't access variables you've defined in Node from the body of an evaluate callback without passing them as arguments to the evaluate call, and these variables need to be able to respond correctly to JSON.stringify() (that is, be serializable).
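For instance, a sketch of that constraint (assuming this runs inside an async function with an open Puppeteer page): the selector must cross the gap as a serializable argument, and only serializable values can come back.

const text = await page.evaluate(sel => {
  // This function body runs in the *browser* process; `sel` arrived
  // via serialization from Node.
  const el = document.querySelector(sel);
  return el ? el.textContent : null; // a string survives the trip back
}, 'h1'); // a DOM node could not be passed or returned directly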
Just 13 hours before this post, someone asked node.js puppeteer "document is not defined" -- they were trying to access the browser process' document object inside of Node.
If you're on Windows, try running a simple Puppeteer Node script that doesn't close the browser, then look at your task manager. On Linux, you can run ps -a. You'll see that there's a Chromium browser and a Node process. The two processes communicate over a socket, which has much higher latency than intra-process communication and involves the operating system's network stack. Every Puppeteer call provides an opportunity for concurrency that'd be lost if Puppeteer's API was synchronous.
Understanding the inter-process gap is critical to success in Puppeteer because it motivates why the API calls are asynchronous, and helps clarify which code is executing in which process.
async is very important for data fetching/crawling. Imagine this case: you have a .book-container element, but the book data inside it arrives later, once the UI's API fetch completes.
const scraperObject = {
  url: 'http://book-store.com',
  scraper(browser) {
    let page = browser.newPage(); // no await: this is a promise, not a page
    page.goto(this.url);
    page.waitForSelector('.book-container');
    page.waitForSelector('.book');
    //TODO: save book data after this
  }
};
With this code snippet, it runs like this:
page.goto(this.url) - starts navigating to the URL
page.waitForSelector('.book-container') - no await here, so the script does not wait for the .book-container element before moving on (and of course the page is probably still loading)
page.waitForSelector('.book') - similarly, the script moves on immediately, even though .book-container is not in the HTML yet
To solve this problem, we should use async/await to WAIT for the elements to be ready in the HTML.
const scraperObject = {
  url: 'http://book-store.com',
  async scraper(browser) {
    let page = await browser.newPage();
    await page.goto(this.url);
    await page.waitForSelector('.book-container');
    await page.waitForSelector('.book');
    //TODO: save book data after this
  }
};
Explaining it again with async/await:
page.goto(this.url) - goes to the URL and waits until the page has loaded
page.waitForSelector('.book-container') - waits until the .book-container element appears in the HTML
page.waitForSelector('.book') - waits until the .book element appears in the HTML (from which we can infer the API's data has arrived)

Multiple fs.writeFile on Node.js

Straight to the point: I am running an HTTP server in Node.js managing a hotel's check-in/out info, where I write all the JSON data from memory to the same file using fs.writeFile.
The data usually doesn't exceed 145 kB, but since I need to write it every time I get an update from my database, I get data loss/bad JSON format when calls to fs.writeFile happen immediately one after another.
Currently I have solved this problem using fs.writeFileSync, however I would like to hear of a more sophisticated solution rather than the easy/bad sync workaround.
Using fs.promises results in the same error, since again I have to make multiple calls to fs.promises.writeFile.
According to Node's documentation, calling fs.writeFile multiple times on the same file without waiting is not safe, and it suggests using a file stream; however this is not currently an option.
To summarize, I need to wait for fs.writeFile to finish normally before attempting any repeated write action, and using the callback is not useful since I don't know a priori when a write action will need to happen.
Thank you very much in advance
I assume you mean you are overwriting or truncating the file while the last write request is still being written. If I were you, I would use the promises API and heed the warning from the documentation:
It is unsafe to use fsPromises.writeFile() multiple times on the same file without waiting for the promise to be settled.
You can await the result in a traditional loop, or very carefully use .then() to "synchronize" your callbacks, but if you're not doing anything else in your event loop except reading from your database and writing to this file, you might as well just use writeFileSync to keep things simple/safe. The asynchronous APIs (callback and Promises) are intended to allow your program to do other things in the meantime; if this is not necessary and the async APIs add troublesome complexity for your code, just use the synchronous APIs. That's true for any node API or library function, not just fs.writeFile.
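A sketch of the "carefully use .then() to synchronize" idea from above (the file name is a placeholder): keep a single chain of writes so that each write waits for the previous one to settle before starting.

const fs = require('fs').promises;

let lastWrite = Promise.resolve();

function writeToDisk(data) {
  // Chain onto the previous write; the .catch keeps one failed write
  // from permanently breaking the chain for every later write.
  lastWrite = lastWrite
    .catch(() => {})
    .then(() => fs.writeFile('./hotel-data.json', JSON.stringify(data)));
  return lastWrite;
}

Callers can fire writeToDisk(data) and move on; the chaining guarantees no two writes ever overlap.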
There are also libraries that will perform atomic filesystem operations for you and abstract away the implementation details, but I think these are probably overkill for you unless you describe your use case in more detail. For example, why you're dumping a database to disk as JSON as fast/frequently as you can, rather than keeping things in memory or using event-based incremental updates (e.g. a real, local database with atomicity and consistency guarantees).
Thank you for your response!
Since my app is mainly an HTTP server, yes, I do other things besides simple input/output, although not with a great number of requests. I will review the promises solution again, but the first time I had no luck.
To explain more, I have:
function updateRoom(data) {
  // ...update things in memory...
  writeToDisk();
}
function writeToDisk() {
  fs.writeFile(/* ... */);
}
Making writeToDisk an async function and using await inside it still does not solve the problem, since updateRoom calls writeToDisk without waiting for it to finish.
The .then approach cannot be implemented since my updateRoom is being called constantly and dynamically.
If you happen to know a thing or two about async/await, you are more than welcome to explain a bit more. Thanks again nevertheless!

Is there a benefit to writing a function as asynchronous when it is not needed?

It is written that:
Use async code but avoid blocking calls
Asynchronous programming is a recommended best practice, especially when blocking I/O operations are involved.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-best-practices#use-async-code-but-avoid-blocking-calls
I have functions which do not contain "await" lines. Is there a benefit to writing these functions as async?
I have functions which do not contain "await" lines. Is there a benefit to writing these functions as async?
If the logic of your function is not complicated, then you can use a sync function and there is no problem.
As the document says, if you do something like uploading or downloading files, then an async function is the better way. You can put the I/O operations in an async method and call them in the async Azure Function. This improves efficiency, because non-I/O work can run while the I/O operations are in flight.
If you don't perform operations that take time, such as I/O, then there is no need for async. The async machinery has its own overhead, so using async where it isn't needed is a waste.
The documentation you quoted contains some hints on why it's a good idea.
Use async code but avoid blocking calls
Asynchronous programming is a recommended best practice, especially when blocking I/O operations are involved.
In C#, always avoid referencing the Result property or calling the Wait method on a Task instance. This approach can lead to thread exhaustion.
Tip
If you plan to use the HTTP or WebHook bindings, plan to avoid port exhaustion that can be caused by improper instantiation of HttpClient. For more information, see How to manage connections in Azure Functions.
When you have blocking calls, they occupy resources like threads/ports that could be shared by various functions.
E.g., say:
your implementation depends on a 3rd-party service that can process 1000s of requests in parallel
your worker process count is 4
Then:
If all 4 workers trigger some external job and wait until it completes, you'll be able to run only 4 jobs in parallel. If your users want the ability to run 1000 jobs in parallel, you'll have to scale to (and pay for) 250 instances of your app.
If instead each function did the following (see the sketch after this list):
trigger each job asynchronously and note the job-id
check the job status in a loop for every job-id
then a single app instance would be able to handle 1000s of requests in parallel.
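A sketch of that second approach in Node terms (triggerJob and checkJobStatus are hypothetical wrappers around the third-party service's API):

async function runJob(payload) {
  const jobId = await triggerJob(payload); // returns quickly with an id
  while (await checkJobStatus(jobId) === 'running') {
    // Poll without tying up a worker; the process is free in between.
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  return jobId;
}

// A single process can then keep thousands of jobs in flight at once:
// await Promise.all(payloads.map(runJob));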
I have functions which do not contain "await" lines. Is there a benefit to write these functions as async?
It's still good to be async, unless you're sure you've considered all future requirements.

Where does the Javascript asynchronicity REALLY come from?

I mean, this may seem like a silly question, but I don't think it really is. In programming languages where we can use threads, the concept is very simple. But JavaScript is single-threaded and handles events through callbacks. What really makes something "branch out" of the execution process (I know this is the wrong technical term, but I hope you get my point)? Is it the timers, like setTimeout and setInterval?
I mean, if I were to write a database in JS (I know JS wouldn't fit, but bear with me for the sake of the example) and I needed to check that some hardware operation had completed before calling a callback function to handle its completion, I cannot for the life of me think of a way to actually make this pseudo-code async:
function op() {
  while (someHardwareOpIsNotCompleted()) doWhatever();
  callback("done");
}
How can I make this function run asynchronously without using some form of timer that refers to and "counts" on a different thread? In my mind the only way around it is to pass the function reference to setTimeout, as in setTimeout(op). From there I can implement my callbacks and listeners, or the syntax-sugar Promise, or even the sugar-on-sugar async.
Hopefully I was able to get my point across, and I won't get replies like "Use promises!" or something of the sort. If the real mechanism has nothing to do with timers but only simple callbacks, can someone give me an example of how to achieve non-blocking execution of some piece of code without timers?
Thanks
A function is asynchronous for one of two reasons.
It needs to happen later (e.g. when a timeout has finished or when a click happens)
It takes a long time and shouldn't tie up the JS engine while it runs (e.g. accessing a database / file / network resource)
The asynchronous parts of the latter functions are not written in JavaScript. They are provided by the host environment and are usually written in the same language as that environment (e.g. C++). The host environment exposes an API that gives JavaScript access to them.
For example, the source code for Chrome's implementation of XMLHttpRequest is written in C++.
If you need to poll something, then testing it on a timer is the usual way.
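For instance, a sketch of the question's op function rewritten to poll on a timer (someHardwareOpIsNotCompleted and callback are taken as given from the question):

function op(callback) {
  (function poll() {
    if (someHardwareOpIsNotCompleted()) {
      setTimeout(poll, 10); // re-check in 10 ms; other code runs in between
    } else {
      callback("done");
    }
  })();
}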

async/await understanding in node.js

I have read tons of articles and docs, but this topic is still not clear enough to me. A quote from one answer, https://stackoverflow.com/a/46004461/630169:
As long as the code contained inside the async/await is non-blocking, it won't block - for example, db calls, network calls, filesystem calls.
But if the code contained inside async/await is blocking, then it will block the entire Node.js process - for example, infinite loops, CPU-intensive tasks like image processing, etc.
However, Understanding the node.js event loop says:
Of course, on the backend, there are threads and processes for DB access and process execution.
In C# it is enough to mark a function async and call it with await, and .NET puts it on another thread. It confuses me that things are organized differently in Node.js, and an async/await function can still block the main thread.
So the question is: how do I write (organize) an arbitrary async/await function in Node.js so as to be sure it runs asynchronously, in a separate thread or process? Is there a good code example? Some npm module? It would also be good if it were not much trickier than the C# variant. Thanks!
For example, here is a function I would like to make non-blocking, i.e. turn a synchronous DB call into an asynchronous (non-blocking) one:
var Database = require('better-sqlite3');
var db = new Database('./my_db.sqlite');
async function DBRequest() {
  // .all() actually runs the query and returns the rows; note that
  // better-sqlite3 executes synchronously, so the async keyword alone
  // does not make this non-blocking.
  var rows = db.prepare("SELECT * FROM table").all();
  return rows;
};
Note: better-sqlite3 is a synchronous module.
Well, here's some example code. I decided it was a worthwhile exercise to provide it.
You can write long-running blocking code in such a way that it yields execution time to other functions:
var array = new Array(100);
function processNext() {
  if (array.length === 0) return;
  var item = array.shift();      // gets the first item from the array and removes it
  processItem(item);             // stand-in for ~0.5 seconds of blocking work
  setTimeout(processNext, 500);  // wait 0.5 seconds and then process the next one
  // during this waiting time, other code will run on your server
}
processNext();
Admittedly I'm a novice and this may be a very bad idea for reasons I don't know about.
You're really at the mercy of the library you're using here - if the code is synchronous and not I/O-bound, then there's really not much you can do within your Node process to make it asynchronous.
Your only real alternative is to move that code into its own process, which consequently makes it I/O-bound from your app's perspective; that way your app can wait on it without blocking its own thread.
My interest was not idle - people asked similar questions before me, both on Stack Overflow and elsewhere (here):
But what's with longish, CPU-bound tasks?
How do you avoid blocking the event loop when the task at hand is not I/O-bound and lasts more than a few fractions of a millisecond? You simply can not, because there's no way... well, there was not, before threads_a_gogo.
and to solve them, people have already created a bunch of modules, some of which, such as threads, work both in the browser and in Node.js:
threads
Threads à gogo
nPool
threadpool-js
threadpool
etc.
It would be good if someone could attach async/await to them or provide a good example - that would be nice.
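For what it's worth, here is a minimal sketch using Node's built-in worker_threads module (available since Node 10.5, stable from Node 12); fib is just a stand-in for any CPU-bound task:

const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // Wrap the worker in a promise so callers can simply await it.
  const runInWorker = n =>
    new Promise((resolve, reject) => {
      const worker = new Worker(__filename, { workerData: n });
      worker.on('message', resolve);
      worker.on('error', reject);
    });

  (async () => {
    const result = await runInWorker(40); // the main thread stays responsive
    console.log('fib(40) =', result);
  })();
} else {
  const fib = n => (n < 2 ? n : fib(n - 1) + fib(n - 2));
  parentPort.postMessage(fib(workerData));
}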
By the way, here are benchmarks someone ran on Threads à gogo - the results with threads are 40x faster than with Cluster. So the single-threaded idea of Node.js does not always work out well.
