I'm learning node.js and I've got most of the fundamentals of asynchronous non-blocking I/O down. My question is: what's the point of creating a function with callbacks when the function itself isn't asynchronous? Even if the function you are creating calls an asynchronous function, I can't find a reason why you'd use a callback. I see this a lot in the node.js code I'm looking at.
For example, a function that sends an HTTP request and returns the parsed output of the request:
function withCallback(url, callback) {
request(url, function(err, response, html) {
if (err)
return callback(err, null); // return so the callback isn't also called with a parse result
callback(null, JSON.parse(html));
});
}
function withoutCallback(url) {
request(url, function(err, response, html) {
if (err)
throw err;
return JSON.parse(html);
});
}
The first function with a callback returns the result through a callback while the second function just returns it normally.
I was going to write this as a comment, but it went a bit too long.
You are asking a couple of questions. To address the very correct point the commenters make: the second example just won't work, and, as @Hawkings states more clearly, the result can't be captured (by your code). It won't work because the return in the second example belongs to the anonymous function you are creating (the actual callback being passed to request), which is invoked, and returns its result, deep within the request function. Also, in your example, control would already have returned to the caller of withoutCallback well before that return JSON.parse() line gets called, and as written, foo = withoutCallback(...) would result in foo being undefined.
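To see the difference concretely, here is a minimal sketch. Since request is a third-party module, it uses a hypothetical fakeRequest helper (an assumption, not part of the original) that calls back on a later timer tick the way request does:

```javascript
// fakeRequest is a hypothetical stand-in for `request`: it calls back
// asynchronously (on a later timer tick) with a JSON body.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, { statusCode: 200 }, '{"url":"' + url + '"}');
  }, 10);
}

function withoutCallback(url) {
  fakeRequest(url, function (err, response, html) {
    if (err) throw err;
    // This value goes back to fakeRequest's timer, not to our caller.
    return JSON.parse(html);
  });
  // Nothing is returned here, so the caller always gets undefined.
}

function withCallback(url, callback) {
  fakeRequest(url, function (err, response, html) {
    if (err) return callback(err, null);
    callback(null, JSON.parse(html));
  });
}

const foo = withoutCallback("http://example.com");
console.log(foo); // undefined

withCallback("http://example.com", function (err, data) {
  console.log(data.url); // http://example.com
});
```

The only way for the caller to receive the parsed body is for the caller to hand in a function that gets invoked once the body actually exists.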
If you look at the code for a library that uses callbacks, you will see how these are invoked, and it may make more sense why this isn't going to work. (Although I would suggest looking at a simpler library than request: if you are fairly new to node, I think you will find the request library a bit confusing.)
However, as for the question as you actually state it (which is not illustrated in your examples): "My question is what's the point of creating a function with callbacks when the function itself isn't asynchronous[?]"
There is not much point in that particular circumstance unless a) you want to future-proof it in case it becomes asynchronous because of added functionality, or b) you want a common interface for which other implementations would be asynchronous. To use a browser example, just because it comes readily to mind: if you were implementing a generic basic data storage solution, one implementation of which would use LocalStorage (synchronous) but others of which might use IndexedDB or a remote call (both asynchronous), you would still want to write the LocalStorage implementation using callbacks so you could easily switch among the implementations.
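A minimal Node.js sketch of that idea, assuming a hypothetical get/set interface with error-first callbacks. Node has no LocalStorage, so a Map stands in for the synchronous backend, and a timer delay stands in for an asynchronous one such as IndexedDB or a remote call:

```javascript
// Synchronous backing store (a Map), still exposed through callbacks.
function createSyncStore() {
  const data = new Map();
  return {
    get(key, callback) {
      // Deferring keeps callback timing consistent with async backends.
      process.nextTick(callback, null, data.get(key));
    },
    set(key, value, callback) {
      data.set(key, value);
      process.nextTick(callback, null);
    },
  };
}

// "Remote" store simulated with a timer delay (hypothetical stand-in).
function createDelayedStore(delayMs) {
  const data = new Map();
  return {
    get(key, callback) {
      setTimeout(() => callback(null, data.get(key)), delayMs);
    },
    set(key, value, callback) {
      setTimeout(() => {
        data.set(key, value);
        callback(null);
      }, delayMs);
    },
  };
}

// Code written against the interface works with either implementation.
function saveAndLoad(store, key, value, callback) {
  store.set(key, value, (err) => {
    if (err) return callback(err);
    store.get(key, callback);
  });
}

saveAndLoad(createSyncStore(), "a", 1, (err, v) => console.log(v)); // 1
saveAndLoad(createDelayedStore(5), "b", 2, (err, v) => console.log(v)); // 2
```

Deferring the synchronous store's callbacks is a design choice, not a requirement: it means callers never have to handle a callback that sometimes runs before the call returns and sometimes after.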
If you don't like the callback style, consider learning to work with, and use libraries that make use of, other techniques or language features for handling asynchronicity, including Promises, Generators or in applicable cases, EventEmitters. I am personally a big fan of Promises. Having said that, I wouldn't suggest any of those until you get your head around the hows and whys of callbacks.
Related
If you know that the Promise has already been resolved why can't you just call get() on it and receive the value? As opposed to using then(..) with a callback function.
So instead of doing:
promise.then(function(value) {
// do something with value
});
I want to be able to do the much simpler:
var value = promise.get();
Java offers this for its CompletableFuture, and I see no reason why JavaScript couldn't offer the same.
Java's get method "Waits if necessary for this future to complete", i.e. it blocks the current thread. We absolutely never want to do that in JavaScript, which has only one "thread".
It would have been possible to integrate methods in the API to determine synchronously whether and with what results the promise completed, but it's a good thing they didn't. Having only one single method, then, to get results when they are available, makes things a lot easier, safer and more consistent. There's no benefit in writing your own if-pending-then-this-else-that logic, it only opens up possibilities for mistakes. Asynchrony is hard.
Of course not, because the task runs asynchronously, so you can't get the result immediately.
But you can use async/await to write asynchronous code that reads sequentially.
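A minimal sketch of that, assuming a hypothetical fetchValue() that resolves after a short timer. Inside an async function, await gives you the value much like the get() the question asks for, but it suspends only that function rather than blocking the thread:

```javascript
// Hypothetical async source: resolves with 42 after ~10 ms.
function fetchValue() {
  return new Promise((resolve) => setTimeout(() => resolve(42), 10));
}

async function main() {
  // `await` pauses only this function; the event loop keeps running meanwhile.
  const value = await fetchValue();
  return value * 2;
}

main().then((result) => console.log("result:", result)); // result: 84
console.log("main() returned immediately with a pending promise");
```

Note that main() itself still returns a promise, so the asynchrony doesn't disappear; it just moves to the caller.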
I'm writing a node backend and I'm a little confused about how I should deal with async functions. I've read about process.nextTick(), but how often should I use it? Most of my code is based on callbacks, like database calls, which are asynchronous by themselves. But I also have a few functions of my own that should be async.
So which one is a good example of async function?
function validateUser1(user, callback) {
process.nextTick(function() {
//validate user, some regex and stuff
callback(err, user);
});
}
function validateUser2(user, callback) {
//validate user, some regex and stuff
process.nextTick(callback, err, user);
}
function validateUser3(user, callback) {
process.nextTick(function() {
//validate user, some regex and stuff
process.nextTick(callback, err, user);
});
}
I don't know whether I should wrap everything in process.nextTick, or wrap just the callback? Or both?
And overall, the idea with node.js is to write lots of small functions rather than bigger ones, and call them asynchronously to not block other events, right?
If you have purely CPU-bound code (no I/O), just run it straight through. Avoid async and tiny functions that fragment your code unnecessarily.
Take the opportunity and write clean, readable, linear code whenever possible. Only revert to async when absolutely necessary, such as stream I/O (file or network).
Consider this. Even if you have 1000+ lines of JS code, it will still be executed blazingly fast. You really do not need to fragment it (unless proven to be too cumbersome, such as very deep loops, but you have to measure it first)!
If you don't test the linear code first and actually SEE that you need to fragment it, you'll end up with premature optimization, which is a bad thing for maintainability.
I'd really go straight away with this:
function validateUser1(user, callback) {
var err = null; // set this if validation fails
//validate user, some regex and stuff
callback(err, user);
}
And if possible, remove the function altogether (but this is a matter of how you write the rest of the code).
Also, don't use nextTick() if you don't really need it. I've implemented a cloud server with many TCP/IP sockets, database connections, logging, file reading and a lot of I/O, but NOT ONCE did I use nextTick() and it runs really smooth.
process.nextTick() will execute your callback before continuing with the event loop. This will block your thread and can stop incoming connections from being handled if the callback you passed to process.nextTick() is something CPU expensive like encrypting, calculating PI etc.
From what I understand you try to make your functions asynchronous by passing them to process.nextTick(). That is not how it works.
When you pass something to process.nextTick(), it will execute before the event loop continues. This will not make your function non-blocking, as the function you execute still runs on the main thread. Only I/O operations can be non-blocking.
Therefore it is irrelevant if you wrap your CPU-intensive functions with process.nextTick() or just execute them right away.
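A small sketch demonstrating this: a 1 ms timer cannot fire until a CPU-bound busy loop (an arbitrary ~200 ms spin, chosen for illustration) scheduled with process.nextTick() has finished, because that loop still occupies the only JavaScript thread:

```javascript
const order = [];
const start = Date.now();

// If the thread were free, this would fire after about 1 ms.
setTimeout(() => {
  order.push("timer fired after " + (Date.now() - start) + " ms");
}, 1);

process.nextTick(() => {
  // CPU-bound busy loop: spins for ~200 ms on the main thread.
  const until = Date.now() + 200;
  while (Date.now() < until) {}
  order.push("nextTick work done");
});

order.push("sync code done");

// Print the recorded order once everything has run.
setTimeout(() => console.log(order.join("\n")), 300);
```

The timer reports a delay of at least 200 ms: deferring the work moved it, but did not stop it from blocking.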
If you want to read more background information, here is the resource: https://nodejs.org/en/docs/guides/event-loop-timers-and-nexttick/#process-nexttick
I'm still confused by the answers provided.
I watched a short course on Lynda.com about NodeJS (Advanced NodeJS).
The guy provides the following example of using process.nextTick()
function hideString(str, done) {
process.nextTick(()=> {
done(str.replace(/[a-zA-Z]/g, 'X'))
})
}
hideString("Hello World", (hidden) => {
console.log( hidden );
});
console.log('end')
With process.nextTick(), console.log('end') is printed first; without it, the hidden string is printed first and 'end' last, so the call is not async.
I understood this to mean that to write async code, you need process.nextTick().
Then it is not clear how async code is written in frontend JS, where there is no process.nextTick().
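The ordering can be checked with a small sketch that records output instead of printing it. The queueMicrotask variant assumes an environment that provides it (modern browsers and Node.js 11+); it is one browser-available way to defer a callback, alongside setTimeout or Promise.resolve().then(...). None of these make the string replacement itself non-blocking; they only change when the callback runs:

```javascript
const output = [];

function hideStringSync(str, done) {
  done(str.replace(/[a-zA-Z]/g, "X")); // callback runs immediately
}

function hideStringDeferred(str, done) {
  // Node.js only: runs after the current synchronous code finishes.
  process.nextTick(() => done(str.replace(/[a-zA-Z]/g, "X")));
}

function hideStringMicrotask(str, done) {
  // Also available in browsers: runs after the current synchronous code.
  queueMicrotask(() => done(str.replace(/[a-zA-Z]/g, "X")));
}

hideStringSync("Hi", (h) => output.push("sync: " + h));
hideStringDeferred("Hi", (h) => output.push("nextTick: " + h));
hideStringMicrotask("Hi", (h) => output.push("microtask: " + h));
output.push("end");

setTimeout(() => console.log(output), 0);
```

Only the synchronous version pushes its result before "end"; both deferred versions push theirs afterwards.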
I'd like to be able to take a function that doesn't take a callback and determine if it will execute asynchronously.
In particular, I'm working with Node.js on the Intel Edison using mraa, and it has natively implemented C++ functions like i2c.readReg(address) that don't accept a callback.
How can I determine if a function is blocking the processor for other system processes?
How can I determine if other JS can run in the interim?
Or am I not even approaching this the right way?
You can't really determine asynchronicity programmatically. It should be clear from the API presented because if it's asynchronous, then there pretty much have to be signs of that in the way you use it.
If a function is asynchronous, then that means that it does not directly return the result from the function call because the function returns before the result is ready. As such, the documentation for the function has to tell you how to obtain the result and if it's asynchronous there has to be another mechanism such as:
a callback function you can pass in
a returned promise
some sort of event listener on the object
some other notification mechanism
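Beyond the shape of the API itself, you can also tell by:
examining the code of the function
the function naming convention (such as the suffix "Sync" that node.js uses for synchronous versions, e.g. fs.readFileSync)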
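Node's own fs module shows all three shapes side by side. This sketch writes a throwaway file into the OS temp directory (the file name is arbitrary) and reads it back each way:

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

const file = path.join(os.tmpdir(), "async-shapes-demo.txt");
fs.writeFileSync(file, "hello");

// 1. "Sync" suffix, result returned directly: synchronous.
const direct = fs.readFileSync(file, "utf8");

// 2. Callback parameter: asynchronous, the result arrives later.
fs.readFile(file, "utf8", (err, data) => {
  if (err) throw err;
  console.log("callback:", data);
});

// 3. Returned promise: asynchronous, the result arrives later.
const pending = fs.promises.readFile(file, "utf8");
console.log(typeof direct, pending instanceof Promise); // string true
pending.then((data) => console.log("promise:", data));
```

A function like i2c.readReg(address) that returns the value directly matches shape 1.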
If the function directly returns the result of the function call, then it is synchronous and other Javascript code will not run during that call.
If a function is not already asynchronous, the only way to turn it into an async operation is to run it in a different thread or process and marshal the value back to the main thread (calling some sort of callback in the main thread when the value is ready).
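In Node.js (12+), one way to do that is the built-in worker_threads module. This is a minimal sketch: the blocking sum loop is a hypothetical stand-in for any synchronous call, and it assumes that call can safely run inside a worker (native addons may need extra support for that):

```javascript
const { Worker } = require("worker_threads");

// Code run in the worker thread: a blocking loop, but off the main thread.
const workerSource = `
  const { parentPort, workerData } = require("worker_threads");
  let sum = 0;
  for (let i = 0; i < workerData; i++) sum += i;
  parentPort.postMessage(sum);
`;

// Wraps the blocking work in an error-first callback API.
function sumBelowAsync(n, callback) {
  const worker = new Worker(workerSource, { eval: true, workerData: n });
  worker.once("message", (sum) => callback(null, sum));
  worker.once("error", (err) => callback(err));
}

sumBelowAsync(1e6, (err, sum) => {
  if (err) throw err;
  console.log("sum:", sum); // sum: 499999500000
});
console.log("main thread is free while the worker runs");
```

The value comes back to the main thread as a message, and the callback runs there, which is the marshalling step described above.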
You can analyze JS transformed into an abstract syntax tree with a tool like acorn. You could check whether function arguments get executed; however, it would be difficult to tell whether an argument was being executed as a callback or for some other purpose. You could also check whether blocking functions were being called.
I'm not sure this would get you all the way there, but it would be a handy tool to have.
This is not exactly a solution, just a hint on cases which might be indeed a solution.
Many frameworks and libraries define functions that can work either synchronously or asynchronously, depending on whether the last function argument (the callback) is given.
For example:
function dummy_sync_or_async(data, callback)
{
if (callback)
{
// call async code
//..
callback(result);
}
else
{
// call sync code
// ..
return result;
}
}
Then one can check the number of arguments received (for example, when acting as a proxy to these functions) and compare it against the function signature (i.e. function.length).
Then one can decide whether the function is called in sync or async mode.
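A minimal sketch of that check, using a hypothetical dual-mode function double() and a hypothetical helper describeCall(). It relies on fn.length, which counts only parameters declared before any default value or rest parameter, so it only works for plainly declared signatures:

```javascript
// Hypothetical dual-mode function: async if a callback is passed, sync otherwise.
function double(data, callback) {
  const result = data * 2;
  if (typeof callback === "function") {
    process.nextTick(callback, result); // async mode
    return;
  }
  return result; // sync mode
}

// Compares the supplied arguments against the declared arity (fn.length).
function describeCall(fn, ...args) {
  const callbackGiven =
    args.length === fn.length && typeof args[args.length - 1] === "function";
  return callbackGiven ? "async" : "sync";
}

console.log(describeCall(double, 21)); // sync
console.log(describeCall(double, 21, () => {})); // async
```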
I'm quite new to Node.js and I'm trying to wrap my head around its error-first callback style.
I was curious as to how a function would know if the first parameter was for handling errors and when it's for handling requests and what not since those variable names are arbitrary.
Here are some examples of code with and without the err first:
fs.readFile('/foo.txt', function(err, data) {
if (err) {
console.log('Ahh! An Error!');
return;
}
console.log(data);
});
and
http.createServer(function(req, res) { ... });
I believe the Express framework does this by counting the number of arguments, having err as the optional first arg but in these cases, there are only two arguments.
Having the first argument to a callback be the error is just a convention that node.js uses in its runtime library for all asynchronous function callbacks, and it has since become something of a standard in the node.js world for asynchronous callbacks, even in add-on modules.
You only "know" it is a convention being used because of the documentation. There is no run-time check that this is the convention being used.
FYI, this is true of all callbacks in Javascript. You only know what arguments are what by the documentation for the callback.
Note that callbacks in Javascript are used for different types of things. Some uses need to communicate an error status and some do not. For example, an http request handler is a callback, but there is no error to communicate to the callback. Either there's an incoming http request and the callback is called or there is not and it is not called. There is no error status to communicate to that type of callback so there is no error argument passed to it.
But, pretty much all callbacks that signal the completion of an asynchronous operation can end with an error and therefore must pass the err argument to the callback so the callback knows what happened and whether the async operation was successful or not.
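A minimal sketch of a completion callback following this convention, using a hypothetical parsePort() function. Node's util.promisify depends on exactly this convention: it rejects when the first callback argument is truthy and otherwise resolves with the second:

```javascript
const util = require("util");

// Hypothetical async operation following the error-first convention:
// callback(err) on failure, callback(null, result) on success.
function parsePort(text, callback) {
  setImmediate(() => {
    const port = Number(text);
    if (!Number.isInteger(port) || port < 0 || port > 65535) {
      return callback(new Error("invalid port: " + text));
    }
    callback(null, port);
  });
}

parsePort("8080", (err, port) => {
  if (err) return console.error(err.message);
  console.log("port", port); // port 8080
});

const parsePortAsync = util.promisify(parsePort);
parsePortAsync("not-a-port").catch((err) => console.log(err.message)); // invalid port: not-a-port
```

Nothing checks at run time that the first argument is an error; promisify simply assumes the convention, which is why it only works on functions documented to follow it.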
There are two very different uses for callbacks. Only the callbacks that need to communicate a possible error status will include the err argument that you asked about.
For example, event handler callbacks (callbacks that are called when some event occurs - like a click in the browser, an incoming http request, an incoming webSocket packet, a timer event) generally don't have an error status associated with them so they don't use the err argument convention.
This makes it a little easier for me: think of the function that accepts the callback as having a last line something like this tacked on to it:
if (callback && typeof(callback) === "function") {
callback(param1, param2);
}
In practice it will be more general than that, but you can work your way toward a more general approach (using an arguments array, etc.) once you know that's there.
I got that here in the comments when I was looking for the same thing. http://www.impressivewebs.com/callback-functions-javascript/
// synchronous Javascript
var result = db.get('select * from table1');
console.log('I am synchronous');
// asynchronous Javascript
db.get('select * from table1', function(result){
// do something with the result
});
console.log('I am asynchronous')
I know in synchronous code, console.log() executes after result is fetched from db, whereas in asynchronous code console.log() executes before the db.get() fetches the result.
Now my question is, how does the execution happen behind the scenes for asynchronous code and why is it non-blocking?
I have searched the ECMAScript 5 standard to understand how asynchronous code works, but could not find the word asynchronous anywhere in it.
And from nodebeginner.org I also found out that we should not use a return statement as it blocks the event loop. But nodejs api and third party modules contain return statements everywhere. So when should a return statement be used and when shouldn't it?
Can somebody throw some light on this?
First of all, passing a function as a parameter is telling the function that you're calling that you would like it to call this function some time in the future. When exactly in the future it will get called depends upon the nature of what the function is doing.
If the function is doing some networking and is configured to be non-blocking or asynchronous, then the function will execute, the networking operation will be started, the function you called will return right away, and the rest of your inline javascript code after that call will execute. If you return a value from that function, it will return right away, long before the function you passed as a parameter has been called (the networking operation has not yet completed).
Meanwhile, the networking operation is going in the background. It's sending the request, listening for the response, then gathering the response. When the networking request has completed and the response has been collected, THEN and only then does the original function you called call the function you passed as a parameter. This may be only a few milliseconds later or it may be as long as minutes later - depending upon how long the networking operation took to complete.
What's important to understand is that in your example, the db.get() function call has long since completed and the code sequentially after it has also executed. What has not completed is the internal anonymous function that you passed as a parameter to that function. That's being held in a javascript function closure until later when the networking function finishes.
It's my opinion that one thing that confuses a lot of people is that the anonymous function is declared inside of your call to db.get and appears to be part of that and appears that when db.get() is done, this would be done too, but that is not the case. Perhaps that would look less like that if it was represented this way:
function getCompletionFunction(result) {
// do something with the result of db.get
}
// asynchronous Javascript
db.get('select * from table1', getCompletionFunction);
Then, maybe it would be more obvious that the db.get will return immediately and the getCompletionFunction will get called some time in the future. I'm not suggesting you write it this way, but just showing this form as a means of illustrating what is really happening.
Here's a sequence worth understanding:
console.log("a");
db.get('select * from table1', function(result){
console.log("b");
});
console.log("c");
What you would see in the debugger console is this:
a
c
b
"a" happens first. Then, db.get() starts its operation and then immediately returns. Thus, "c" happens next. Then, when the db.get() operation actually completes some time in the future, "b" happens.
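The same sequence can be reproduced with a hypothetical stand-in for db.get (an assumption, since the question's db object isn't shown) that returns immediately and calls its callback after a simulated delay:

```javascript
// Hypothetical db.get: returns immediately, calls back after ~20 ms.
const db = {
  get(query, callback) {
    setTimeout(() => callback([{ query }]), 20);
  },
};

const seen = [];
seen.push("a");
db.get("select * from table1", function (result) {
  seen.push("b");
  console.log(seen.join(" ")); // a c b
});
seen.push("c");
```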
For some reading on how async handling works in a browser, see How does JavaScript handle AJAX responses in the background?
jfriend00's answer explains asynchrony as it applies to most users quite well, but in your comment you seemed to want more details on the implementation:
[…] Can any body write some pseudo code, explaining the implementation part of the Ecmascript specification to achieve this kind of functionality? for better understanding the JS internals.
As you probably know, a function can stow away its argument into a global variable. Let's say we have a list of numbers and a function to add a number:
var numbers = [];
function addNumber(number) {
numbers.push(number);
}
If I add a few numbers, as long as I'm referring to the same numbers variable as before, I can access the numbers I added previously.
JavaScript implementations likely do something similar, except rather than stowing numbers away, they stow functions (specifically, callback functions) away.
The Event Loop
At the core of many applications is what's known as an event loop. It essentially looks like this:
loop forever:
get events, blocking if none exist
process events
Let's say you want to execute a database query like in your question:
db.get("select * from table", /* ... */);
In order to perform that database query, it will likely need to perform a network operation. Since network operations can take a significant amount of time, during which the processor would sit idle, it makes sense to have the operation tell us when it's done, rather than waiting for it, so we can do other things in the meantime.
For simplicity's sake, I'll pretend that sending will never block/stall synchronously.
The functionality of get might look like this:
generate unique identifier for request
send off request (again, for simplicity, assuming this doesn't block)
stow away (identifier, callback) pair in a global dictionary/hash table variable
That's all get would do; it doesn't do any of the receiving bit, and it itself isn't responsible for calling your callback. That happens in the process events bit. The process events bit might look (partially) like this:
is the event a database response? if so:
parse the database response
look up the identifier in the response in the hash table to retrieve the callback
call the callback with the received response
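The pseudocode above can be turned into a toy model in JavaScript itself. Everything here (get, processEvent, runEventLoop, simulateDatabase) is hypothetical and purely illustrative, not how any real engine is written; and unlike a real event loop, this one stops when the queue is empty instead of blocking:

```javascript
const pending = new Map(); // the "global dictionary": request id -> callback
const eventQueue = [];     // events delivered by the "outside world"
let nextId = 1;

// Stand-in for the network/database: answers by queueing a response event.
function simulateDatabase(id, query) {
  eventQueue.push({ type: "db-response", id, rows: ["row for " + query] });
}

// get(): generate an id, send the request, stow away (id, callback), return.
function get(query, callback) {
  const id = nextId++;
  simulateDatabase(id, query); // "send off request"
  pending.set(id, callback);   // stow away the callback
}

// The "process events" step.
function processEvent(event) {
  if (event.type === "db-response") {
    const callback = pending.get(event.id); // look up by identifier
    pending.delete(event.id);
    callback(event.rows);                   // call the callback
  }
}

// The loop itself.
function runEventLoop() {
  while (eventQueue.length > 0) {
    processEvent(eventQueue.shift());
  }
}

const log = [];
get("select * from table", (rows) => log.push("callback got " + rows[0]));
log.push("get() returned");
runEventLoop();
console.log(log.join("\n"));
```

"get() returned" is logged before the callback runs, even though the response event was queued during the get() call: the callback only fires when the loop gets around to processing that event.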
Real Life
In real life, it's a little more complex, but the overall concept is not too different. If you want to send data, for example, you might have to wait until there's enough space in the operating system's outgoing network buffers before you can add your bit of data. When reading data, you might get it in multiple chunks. The process events bit probably isn't one big function; it is itself mostly just calling a bunch of callbacks (which then dispatch to more callbacks, and so on…)
While the implementation details between real life and our example are slightly different, the concept is the same: you kick off ‘doing something’, and a callback will be called through some mechanism or another when the work is done.