I'm writing a Node backend and I'm a little confused about how I should deal with async functions. I've read about process.nextTick(), but how often should I use it? Most of my code is based on callbacks, like database calls, which are asynchronous by themselves. But I also have a few functions of my own that should be async.
So which one is a good example of an async function?
function validateUser1(user, callback) {
  process.nextTick(function() {
    //validate user, some regex and stuff
    callback(err, user);
  });
}
function validateUser2(user, callback) {
  //validate user, some regex and stuff
  process.nextTick(callback, err, user);
}
function validateUser3(user, callback) {
  process.nextTick(function() {
    //validate user, some regex and stuff
    process.nextTick(callback, err, user);
  });
}
I don't know whether I should wrap everything in process.nextTick, wrap just the callback, or both.
And overall, the idea with node.js is to write lots of small functions rather than bigger ones, and call them asynchronously to not block other events, right?
If you have just CPU code (no I/O) you should try and go as far along as you can. Avoid async and tiny functions which fragment your code unnecessarily.
Take the opportunity and write clean, readable, linear code whenever possible. Only revert to async when absolutely necessary, such as stream I/O (file or network).
Consider this. Even if you have 1000+ lines of JS code, it will still be executed blazingly fast. You really do not need to fragment it (unless proven to be too cumbersome, such as very deep loops, but you have to measure it first)!
If you don't test the linear code first and actually SEE that you need to fragment it, you'll end up with premature optimization, which is a bad thing for maintainability.
I'd really go straight away with this:
function validateUser1(user, callback) {
  //validate user, some regex and stuff
  callback(err, user);
}
And if possible, remove the function altogether (but this is a matter of how you write the rest of the code).
Also, don't use nextTick() if you don't really need it. I've implemented a cloud server with many TCP/IP sockets, database connections, logging, file reading and a lot of I/O, but NOT ONCE did I use nextTick() and it runs really smooth.
process.nextTick() will execute your callback before continuing with the event loop. This will block your thread and can stop incoming connections from being handled if the callback you passed to process.nextTick() is something CPU expensive like encrypting, calculating PI etc.
From what I understand you try to make your functions asynchronous by passing them to process.nextTick(). That is not how it works.
When you pass something to process.nextTick() it will execute before the eventloop is executed the next time. This will not make your function non-blocking, as the function you execute is still running in the main thread. Only I/O Operations can be non-blocking.
Therefore it is irrelevant if you wrap your CPU-intensive functions with process.nextTick() or just execute them right away.
If you want to read more background information, here is the resource: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/#process-nexttick
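To see the ordering described above in practice, here is a minimal sketch. The `order` array and the labels are only there for illustration. The loop inside the nextTick callback is a stand-in for CPU work; it still runs on the main thread and nothing else can run until it finishes.

```javascript
// Records the order in which each piece of code runs.
const order = [];

// Scheduled for the "check" phase of the event loop.
setImmediate(() => order.push('setImmediate'));

process.nextTick(() => {
  // Runs before the event loop continues, still on the main thread.
  // A heavier loop here would delay every timer and I/O callback.
  let sum = 0;
  for (let i = 0; i < 1e6; i++) sum += i;
  order.push('nextTick');
});

order.push('sync');

// Prints [ 'sync', 'nextTick', 'setImmediate' ]
setTimeout(() => console.log(order), 20);
```

The nextTick callback is deferred until the current synchronous code finishes, but it is not moved off the main thread, which is the point made above.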
I'm still confused by the answers provided.
I watched a short course on Lynda.com about Node.js (Advanced NodeJS).
The instructor provides the following example of using process.nextTick():
function hideString(str, done) {
  process.nextTick(() => {
    done(str.replace(/[a-zA-Z]/g, 'X'));
  });
}
hideString("Hello World", (hidden) => {
  console.log(hidden);
});
console.log('end');
If you do not use process.nextTick() here, the hidden string is printed before 'end', so the call is not async. With it, 'end' is printed first.
I understood this to mean that you need process.nextTick to write async code.
Then it is not clear how async code is written in frontend JS, where there is no process.nextTick().
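For comparison, here is a small sketch of the same idea without process.nextTick. It uses queueMicrotask, which is available in browsers as well as in Node. The names hideSync, hideDeferred and log are made up for this illustration and are not from the course.

```javascript
const log = [];

// Synchronous variant: the callback runs before hideSync returns.
function hideSync(str, done) {
  done(str.replace(/[a-zA-Z]/g, 'X'));
}

// Deferred variant: the callback runs after the current synchronous code has finished.
function hideDeferred(str, done) {
  queueMicrotask(() => done(str.replace(/[a-zA-Z]/g, 'X')));
}

hideSync('Hello World', (hidden) => log.push('sync: ' + hidden));
log.push('after hideSync');

hideDeferred('Hello World', (hidden) => log.push('deferred: ' + hidden));
log.push('after hideDeferred');

// Prints:
// [ 'sync: XXXXX XXXXX', 'after hideSync', 'after hideDeferred', 'deferred: XXXXX XXXXX' ]
setTimeout(() => console.log(log), 0);
```

Either way the regex itself still runs on the main thread. Deferring only changes when the callback is called, not where the work happens.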
I am currently trying to test some Node.js code.
Right now I am trying to figure out why my console.log() output appears in a different order from the order of the calls in my code. This concerns Node.js functions that call into the operating system.
My script does this in order:
//First
fs.readdir(){ //read from a directory all the available files
console.log some stuff #a
}
//Second
fs.readFile(){ //read entire text file
console.log some stuff #b
}
//Third
//create readline interface using fs.createReadStream(filedir) then
interface{ //read from text file line by line
console.log some stuff #c
}
Function a uses the path: __dirname + '/text_file/'
Functions b and c use the path: __dirname, '/text_file/text1.txt'
Output:
a
c
b
Can someone explain why this order happens? I am not an expert in operating systems or in what happens in the background with Node.js. Will this ordering mess up my app if I were to rely on it?
If you start multiple asynchronous operations like this one after the other, the completion order is indeterminate. These operations run independently of your Javascript. They are what is called "non-blocking" which means they return immediately from the function call and your Javascript keeps executing. Some time later, they finish, insert an event in the Javascript event queue and then the callback you passed to them gets called. The "asynchronous, non-blocking" aspect of these functions means that they don't "wait" for the operation to complete before returning.
The order of completion of multiple asynchronous operations that are all in flight at the same time will just depend upon which one finishes its work before the others.
If you want them to run in sequence, then you have to specifically code them to run sequentially by starting the second operation in the completion callback of the first and starting the third one in the completion callback of the second.
For example:
fs.readdir(somePath, function(err, files) {
  if (err) {
    console.log(err);
  } else {
    fs.readFile(someFilename, function(err, data) {
      if (err) {
        console.log(err);
      } else {
        // done with both asynchronous operations here
        console.log(data);
      }
    });
  }
});
Some other relevant references on the topic:
How to let asynchronous readFile method follow order in node.js
How to synchronize a sequence of promises?
How do I set a specific order of execution when asynchronous calls are involved?
How does Asynchronous Javascript Execution happen? and when not to use return statement?
Single thread synchronous and asynchronous confusion
Why do callbacks functions allow us to do things asynchronously in Javascript?
readdir and readFile are asynchronous, that is, they do processing outside the main flow of code. That is why you get to give them a callback, which is executed once the operation finishes, which might take a long time (in a computer's notion of time).
You cannot rely on asynchronous functions being completed before the next synchronous line of code executes at all. That means, your code will eventually end up looking like this:
fs.readdir(someDir, function(err, files) {
  if (err) return;
  // do stuff with the list of files in the directory
  fs.readFile(someFile, function(err, contents) {
    if (err) return;
    // do stuff with the file's contents
  });
});
Why asynchronous, you might ask? Well, maybe you can go about doing other things while the (very slow) disk spins around trying to find your data. Stopping the whole program while the disk reads would be a very notable waste of time, so Node makes such operations that depend on disk or network asynchronous, which frees you to continue with any other processing until they are done.
Of course, that means you have to write code that depends on the output of such asynchronous operations inside the callback.
On another note:
Hopefully you can see that for big applications, this style of coding is very cumbersome, since the code drifts ever further to the right and error tracking is tricky. That is why Promises were invented, and I would recommend you learn them, but only after you understand plain callback-based asynchronicity.
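As a rough idea of what the Promise-based version can look like, here is a sketch using the promise API that ships with Node's fs module. The helper names listThenRead and demo, and the temporary directory with text1.txt, are assumptions made so the example is self-contained.

```javascript
const fs = require('fs').promises;
const os = require('os');
const path = require('path');

// Reads the directory first, then the file. The second step only starts
// after the first has completed, without nesting callbacks.
async function listThenRead(dir, fileName) {
  const files = await fs.readdir(dir);
  const contents = await fs.readFile(path.join(dir, fileName), 'utf8');
  return { files, contents };
}

// Sets up a throwaway directory so the sketch can run anywhere.
async function demo() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'demo-'));
  await fs.writeFile(path.join(dir, 'text1.txt'), 'hello');
  try {
    return await listThenRead(dir, 'text1.txt');
  } finally {
    await fs.rm(dir, { recursive: true, force: true });
  }
}

demo()
  .then((result) => console.log(result)) // { files: [ 'text1.txt' ], contents: 'hello' }
  .catch((err) => console.error(err));
```

Errors from either step end up in the single catch at the bottom, which is the part that gets tricky with nested callbacks.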
All these functions are asynchronous and their I/O is performed concurrently. The fs.readFile callback executes only after the whole file has been read, whereas the fs.createReadStream callback executes as soon as the first chunk of the stream is readable, and that's faster than reading the whole file.
Unlike code in many traditional languages, Node.js code does not wait for one asynchronous operation to finish before starting the next. Each asynchronous operation runs independently of the others, so the callbacks won't necessarily run in the order the calls were written. This makes execution faster but more difficult to control.
To handle asynchronous functions there are libraries like async that let you control what to do when one or more functions finish, and help you limit or eliminate callback hell.
https://www.sohamkamani.com/blog/2016/03/14/wrapping-your-head-around-async-programming/
Can anyone help me understand how Node.js works and what the performance impact is in the scenario below?
a. I make a request to the REST API endpoint "/api/XXX". In this request, I return the response right after triggering an asynchronous function, like below.
function update(req, res) {
  executeUpdate(req.body); //Asynchronous function
  res.send(200);
}
b. Here I send the response back without waiting for the function to complete. The function executes four MongoDB updates on different collections.
Questions:
As I have read, Node.js works on a single thread, so how is this asynchronous function executed?
If there are multiple requests for the same endpoint, what will the performance impact on Node.js be?
How exactly does Node.js handle the asynchronous function of each request? Since Node.js runs on a single thread, is there any possibility of a memory issue?
In short, it depends on what you are doing in your function.
The synchronous functions in Node are executed on the main thread; they are not preempted and run until the end of the function or until a return statement is encountered.
The async functions, on the other hand, hand their work off (for example to the operating system or to a separate worker thread), and their callbacks are executed on the main thread only once the async task has completed.
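Here is a small simulation of the handler from the question, just to show the order of events. The executeUpdate below is a hypothetical stand-in that uses a timer instead of MongoDB, and res is a fake response object; neither is taken from the original code.

```javascript
const events = [];

// Hypothetical stand-in for the four MongoDB updates: resolves after a short delay.
function executeUpdate(body) {
  return new Promise((resolve) => {
    setTimeout(() => {
      events.push('update finished');
      resolve();
    }, 10);
  });
}

function update(req, res) {
  // Fire and forget: nobody awaits the result, so attach a catch,
  // otherwise a failed update becomes an unhandled rejection.
  executeUpdate(req.body).catch((err) => console.error('update failed', err));
  res.send(200);
}

// Minimal fake response object, for illustration only.
const res = { send: (status) => events.push('response sent ' + status) };

update({ body: {} }, res);

// Prints [ 'response sent 200', 'update finished' ]
setTimeout(() => console.log(events), 50);
```

The response goes out first and the update completes later. While the update is pending, the main thread is free to handle other requests.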
There are, I think, two different parts in the answer to your question.
Actual Performance - which includes CPU & memory performance. It also obviously includes speed.
Understanding, as the previous poster said, sync and async.
In dealing with #1, actual performance, the only real way to test it is to create or use a testing environment for your code. As a rudimentary check, depending on the system you are using, top (Linux) or Glances will give you a basic idea of performance, but to know exactly what is going on you will need to use one of the various testing tools or write your own tests.
Approaching #2 - It is not only sync and async processes you have to understand, but also the ramifications of both. This includes the use of callbacks and promises.
It really all depends on the current process you are attempting to code. For instance, many Node programmers seem to prefer using promises when they make calls to MongoDB, especially when one requires more than one call based upon the return of the cursor.
There is really no written-in-stone formula for when you use sync or async processes. Avoiding callback hell is something all Node programmers try to do. Catching errors etc. is something you always need to be careful about. As I said some programmers will always opt for Promises or Async when dealing with returns of data. The famous Async library coupled with Bluebird are the choice of many for certain scenarios.
All that being said, and remember your question is general and therefore so is my answer, in order to properly know the implications on your performance, in memory, cpu and speed as well as in return of information or passing to the browser, it is a good idea to understand as best as you can sync, async, callbacks, promises and error catching. You will discover certain situations are great for sync (and much faster), while others do require async and/or promises.
Hope this helps somewhat.
The impression I get from people is... All JavaScript functions are synchronous unless used with process.nextTick. When's the best time to use it?
I want to make sure that I don't over use it in places where I don't need it. At this point, I'm thinking to use it right before something like a database call, however, at the same time, as I understand, those calls are asynchronous by default because of the whole "async IO" thing.
Are they to be used only when doing some intensive work within the JavaScript boundaries? Like parsing XML etc?
Btw, there's already a question like this but it seems dead so I raised another one.
I'm thinking to use it right before something like a database call, however, at the same time, as I understand, those calls are asynchronous by default because of the whole "async IO" thing.
Yes. The database driver itself should be natively asynchronous already, so you don't need to use process.nextTick yourself here to "make it asynchronous". The most time-consuming part is the IO and the computations inside the database, so waiting an extra tick just slows things down actually.
Are they to be used only when doing some intensive work within the JavaScript boundaries? Like parsing XML etc?
Yes, exactly. You can use it to prevent large synchronous functions from blocking your application. If you want to parse an XML file, instead of gnawing through it for 3 seconds during which no new connections can be opened, no requests received, and no responses be sent, you would stream the file and parse only small chunks of it every time before using nextTick and allowing other work to be done concurrently.
However, notice that the parser should use nextTick internally and offer an asynchronous API, instead of the caller using nextTick before invoking the parser.
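Below is a sketch of the chunking idea, with one deliberate change: it uses setImmediate rather than nextTick. In current Node versions nextTick callbacks run before the event loop continues, so rescheduling with nextTick would not let pending I/O through, whereas setImmediate does. Summing an array stands in for parsing, and sumInChunks is a made-up name.

```javascript
// Sums a large array in slices, yielding to the event loop between slices.
function sumInChunks(numbers, chunkSize, done) {
  let index = 0;
  let total = 0;

  function step() {
    const end = Math.min(index + chunkSize, numbers.length);
    for (; index < end; index++) total += numbers[index];

    if (index < numbers.length) {
      setImmediate(step); // pending I/O callbacks can run before the next slice
    } else {
      done(null, total);
    }
  }

  setImmediate(step); // the callback is always asynchronous, even for an empty array
}

const numbers = Array.from({ length: 10000 }, (_, i) => i + 1);
sumInChunks(numbers, 1000, (err, total) => console.log(total)); // 50005000
```

Note that the chunking lives inside sumInChunks and the caller only sees an asynchronous API, in line with the point above about where the deferral belongs.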
This answer makes no claims of being complete, but here are my thoughts:
I can imagine two use cases. The first one is, to make sure something is really async. This comes in handy when using EventEmitter. Imagine you want to be able to use all methods of your emitter like this:
const EventEmitter = require('events');
class MyEmitter extends EventEmitter {
  aMethod() {
    console.log('some sync stuff');
    this.emit('aMethodResponse');
    return this;
  }
}
var myEmitter = new MyEmitter();
myEmitter.aMethod()
  .once('aMethodResponse', () => console.log('got response'));
This will simply not work as the event is fired before the listener is established. process.nextTick() makes sure that this won't happen.
aMethod() {
  console.log('some sync stuff');
  process.nextTick(() => this.emit('aMethodResponse'));
  return this;
}
Edit: removed second suggestion because it was simply wrong
I'm learning node.js and I've got most of the fundamentals down about asynchronous non-blocking I/O. My question is: what's the point of creating a function with callbacks when the function itself isn't asynchronous? Even if the function you are creating has a call to an asynchronous function, I can't find a reason why you'd use a callback. I see this a lot in the node.js code I'm looking at.
For example, a function that sends an HTTP request and returns the parsed output of the request:
function withCallback(url, callback) {
  request(url, function(err, response, html) {
    if (err)
      return callback(err, null); // return, so the callback is not called twice
    callback(null, JSON.parse(html));
  });
}
function withoutCallback(url) {
  request(url, function(err, response, html) {
    if (err)
      throw err;
    return JSON.parse(html);
  });
}
The first function with a callback returns the result through a callback while the second function just returns it normally.
Was going to write as a comment, but went a bit too long.
You are asking a couple of questions. To address the very correct point that the commenters make: the second example just won't work and, as @Hawkings states more clearly, the result can't be captured (by your code). It won't work because the return in the second example belongs to the anonymous function you are creating (the actual callback being passed to request), which is invoked deep within the request function and returns its result there. Also, in your example, control would have already returned to the caller of withoutCallback well before that return JSON.parse() line gets called, and as written, foo = withoutCallback(...) would result in foo being undefined.
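To make that concrete, here is a sketch that can be run without the real request library. The request function below is a hypothetical stand-in that answers after a short timer, and the URL is a placeholder.

```javascript
// Hypothetical stand-in for the request library: calls back later with a JSON body.
function request(url, callback) {
  setTimeout(() => callback(null, {}, '{"ok":true}'), 5);
}

function withoutCallback(url) {
  request(url, function (err, response, html) {
    if (err) throw err;
    return JSON.parse(html); // returned to whoever invoked this callback, not to our caller
  });
  // no return statement here, so the caller gets undefined
}

function withCallback(url, callback) {
  request(url, function (err, response, html) {
    if (err) return callback(err, null);
    callback(null, JSON.parse(html));
  });
}

const foo = withoutCallback('http://example.invalid/');
console.log(foo); // undefined

withCallback('http://example.invalid/', (err, data) => {
  console.log(data); // { ok: true }
});
```

The first console.log runs before the fake request has even answered, which is why there is nothing for it to print.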
If you look at the code for a library that uses callbacks you will see how these are invoked and it may make more sense why this isn't going to work. (Although I would suggest looking at a simpler library than request. If you are fairly new to node, I think you will find the request library to be a bit confusing.)
However, in the case of what you state your question is (which is not illustrated in your examples): "My question is what's the point of creating a function with callbacks when the function itself isn't asynchronous[?]"
There is not much point in that particular circumstance unless a) you want to future proof it in case it may become asynchronous because of added functionality or b) you want to have a common interface in which other implementations would be asynchronous. To use a browser example just because it comes readily to mind, if you were implementing a generic basic data storage solution, one implementation of which would use LocalStorage (synchronous) but others which might use IndexedDB, or a remote call (both asynchronous) - you would still want to write the LocalStorage implementation using callbacks so you could easily switch among the implementations.
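A sketch of the common-interface idea, under some assumptions: an in-memory Map stands in for LocalStorage, a timer stands in for a remote call, and the names createMemoryStore, createSlowStore and roundTrip are invented for this example.

```javascript
// Synchronous backend (stand-in for LocalStorage) behind a callback interface.
// process.nextTick keeps the callback consistently asynchronous.
function createMemoryStore() {
  const data = new Map();
  return {
    set(key, value, callback) {
      data.set(key, value);
      process.nextTick(callback, null);
    },
    get(key, callback) {
      process.nextTick(callback, null, data.get(key));
    },
  };
}

// Asynchronous backend (stand-in for IndexedDB or a remote call).
function createSlowStore() {
  const data = new Map();
  return {
    set(key, value, callback) {
      setTimeout(() => { data.set(key, value); callback(null); }, 5);
    },
    get(key, callback) {
      setTimeout(() => callback(null, data.get(key)), 5);
    },
  };
}

// Calling code is identical for both backends.
function roundTrip(store, done) {
  store.set('user', 'alice', (err) => {
    if (err) return done(err);
    store.get('user', done);
  });
}

roundTrip(createMemoryStore(), (err, value) => console.log('memory:', value)); // memory: alice
roundTrip(createSlowStore(), (err, value) => console.log('slow:', value));     // slow: alice
```

Because roundTrip only relies on the callback signature, swapping one store for the other needs no change in the calling code.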
If you don't like the callback style, consider learning to work with, and use libraries that make use of, other techniques or language features for handling asynchronicity, including Promises, Generators or in applicable cases, EventEmitters. I am personally a big fan of Promises. Having said that, I wouldn't suggest any of those until you get your head around the hows and whys of callbacks.
// synchronous Javascript
var result = db.get('select * from table1');
console.log('I am synchronous');
// asynchronous Javascript
db.get('select * from table1', function(result){
  // do something with the result
});
console.log('I am asynchronous');
I know in synchronous code, console.log() executes after result is fetched from db, whereas in asynchronous code console.log() executes before the db.get() fetches the result.
Now my question is, how does the execution happen behind the scenes for asynchronous code and why is it non-blocking?
I have searched the Ecmascript 5 standard to understand how asynchronous code works but could not find the word asynchronous in the entire standard.
And from nodebeginner.org I also found out that we should not use a return statement as it blocks the event loop. But nodejs api and third party modules contain return statements everywhere. So when should a return statement be used and when shouldn't it?
Can somebody throw some light on this?
First of all, passing a function as a parameter is telling the function that you're calling that you would like it to call this function some time in the future. When exactly in the future it will get called depends upon the nature of what the function is doing.
If the function is doing some networking and the function is configured to be non-blocking or asynchronous, then the function will execute, the networking operation will be started and the function you called will return right away and the rest of your inline javascript code after that function will execute. If you return a value from that function, it will return right away, long before the function you passed as a parameter has been called (the networking operation has not yet completed).
Meanwhile, the networking operation is going in the background. It's sending the request, listening for the response, then gathering the response. When the networking request has completed and the response has been collected, THEN and only then does the original function you called call the function you passed as a parameter. This may be only a few milliseconds later or it may be as long as minutes later - depending upon how long the networking operation took to complete.
What's important to understand is that in your example, the db.get() function call has long since completed and the code sequentially after it has also executed. What has not completed is the internal anonymous function that you passed as a parameter to that function. That's being held in a javascript function closure until later when the networking function finishes.
It's my opinion that one thing that confuses a lot of people is that the anonymous function is declared inside of your call to db.get and appears to be part of that and appears that when db.get() is done, this would be done too, but that is not the case. Perhaps that would look less like that if it was represented this way:
function getCompletionFunction(result) {
  // do something with the result of db.get
}
// asynchronous Javascript
db.get('select * from table1', getCompletionFunction);
Then, maybe it would be more obvious that the db.get will return immediately and the getCompletionFunction will get called some time in the future. I'm not suggesting you write it this way, but just showing this form as a means of illustrating what is really happening.
Here's a sequence worth understanding:
console.log("a");
db.get('select * from table1', function(result){
  console.log("b");
});
console.log("c");
What you would see in the debugger console is this:
a
c
b
"a" happens first. Then, db.get() starts its operation and then immediately returns. Thus, "c" happens next. Then, when the db.get() operation actually completes some time in the future, "b" happens.
For some reading on how async handling works in a browser, see How does JavaScript handle AJAX responses in the background?
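If you want to try the a/c/b sequence without a real database, here is a sketch. The db object is a hypothetical stand-in whose get() uses a timer in place of the network round trip.

```javascript
const output = [];

// Hypothetical db object: the timer stands in for the network operation.
const db = {
  get(query, callback) {
    setTimeout(() => callback([{ id: 1 }]), 10);
    // get() returns here, long before the callback is called
  },
};

output.push('a');
db.get('select * from table1', function (result) {
  output.push('b');
});
output.push('c');

// Prints "a c b"
setTimeout(() => console.log(output.join(' ')), 50);
```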
jfriend00's answer explains asynchrony as it applies to most users quite well, but in your comment you seemed to want more details on the implementation:
[…] Can any body write some pseudo code, explaining the implementation part of the Ecmascript specification to achieve this kind of functionality? for better understanding the JS internals.
As you probably know, a function can stow away its argument into a global variable. Let's say we have a list of numbers and a function to add a number:
var numbers = [];
function addNumber(number) {
  numbers.push(number);
}
If I add a few numbers, as long as I'm referring to the same numbers variable as before, I can access the numbers I added previously.
JavaScript implementations likely do something similar, except rather than stowing numbers away, they stow functions (specifically, callback functions) away.
The Event Loop
At the core of many applications is what's known as an event loop. It essentially looks like this:
loop forever:
  get events, blocking if none exist
  process events
Let's say you want to execute a database query like in your question:
db.get("select * from table", /* ... */);
In order to perform that database query, it will likely need to perform a network operation. Since network operations can take a significant amount of time, during which the processor is waiting, it makes sense that, rather than waiting when we could be doing other work, we just have it tell us when it's done so we can do other things in the meantime.
For simplicity's sake, I'll pretend that sending will never block/stall synchronously.
The functionality of get might look like this:
generate unique identifier for request
send off request (again, for simplicity, assuming this doesn't block)
stow away (identifier, callback) pair in a global dictionary/hash table variable
That's all get would do; it doesn't do any of the receiving bit, and it itself isn't responsible for calling your callback. That happens in the process events bit. The process events bit might look (partially) like this:
is the event a database response? if so:
  parse the database response
  look up the identifier in the response in the hash table to retrieve the callback
  call the callback with the received response
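The two pieces of pseudocode above can be sketched in plain JavaScript as a toy model. This is not how any real engine or database driver is implemented: the "database" here answers instantly by pushing an event onto an array, and names like pending, eventQueue and processEvents are invented for the illustration.

```javascript
const pending = new Map(); // identifier -> callback (the "global hash table")
const eventQueue = [];     // events waiting to be processed
let nextId = 1;

function get(query, callback) {
  const id = nextId++; // generate unique identifier for the request
  // "Send off the request": the fake database replies by queueing an event.
  eventQueue.push({ type: 'db-response', id, rows: ['row for ' + query] });
  pending.set(id, callback); // stow away the (identifier, callback) pair
  // get() returns without calling the callback
}

function processEvents() {
  while (eventQueue.length > 0) {
    const event = eventQueue.shift();
    if (event.type === 'db-response') {
      const callback = pending.get(event.id); // look up the callback by identifier
      pending.delete(event.id);
      callback(event.rows); // call it with the received response
    }
  }
}

const seen = [];
get('select * from table', (rows) => seen.push(rows[0]));
seen.push('get returned');
processEvents(); // one turn of the "loop forever"

console.log(seen); // [ 'get returned', 'row for select * from table' ]
```

The callback only runs when processEvents gets to the matching event, which is after get has returned.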
Real Life
In real life, it's a little more complex, but the overall concept is not too different. If you want to send data, for example, you might have to wait until there's enough space in the operating system's outgoing network buffers before you can add your bit of data. When reading data, you might get it in multiple chunks. The process events bit probably isn't one big function, but itself just calling a bunch of callbacks (which then dispatch to more callbacks, and so on…)
While the implementation details between real life and our example are slightly different, the concept is the same: you kick off ‘doing something’, and a callback will be called through some mechanism or another when the work is done.