Impact of sync function inside async functions - javascript

Let's imagine an asynchronous function that loads a file first and does something asynchronously with it afterwards. The function can't continue without the file, so my assumption is that loading this file could be done synchronously (*):
const asyncFnWithSyncCode = (filePath, next) => {
  // Load file
  const file = fs.readFileSync(filePath)
  // Continue to process file with async functions
  // ...
  next(null, processedFile)
}
asyncFnWithSyncCode could be called several times for different files:
async.parallel([
  (done) => { asyncFnWithSyncCode('a.json', done) },
  (done) => { asyncFnWithSyncCode('b.json', done) },
  (done) => { asyncFnWithSyncCode('c.json', done) }
], next)
My question is: How does this impact the performance? Will the sync function cause the other readFileSyncs to be delayed? Will it have an impact at all?
Best-practices, resources and opinions are welcome. Thanks!
(*) I know that I could simply use the async readFile-version, but I would really like to know how it works in this special construction.

Will the sync function cause the other readFileSyncs to be delayed?
Yes. NodeJS runs all of your JavaScript code on a single thread, using an event loop (job queue), which is one of the reasons that using asynchronous system calls is strongly encouraged over synchronous ones.
readFile schedules the read operation and then lets other things happen on the JavaScript thread while the I/O layer is waiting for the data to come in; Node's I/O layer queues a task for the JavaScript thread when data is available, which is what ultimately makes your readFile callback get called.
In contrast, readFileSync holds up that one single JavaScript thread, waiting for the file data to become available. Since there's only one thread, that holds up everything else your code might otherwise be doing, including other readFileSync calls.
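To see the effect, here is a minimal sketch (not from the question) where a synchronous busy-wait stands in for readFileSync: a timer scheduled for 10 ms cannot fire until the blocking call releases the single JavaScript thread.

```javascript
// Hypothetical demo: blockFor() stands in for readFileSync.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* hold the single JS thread */ }
}

setTimeout(() => {
  // Scheduled for 10 ms, but fires only after blockFor() returns.
  console.log('timer fired after ~100 ms, not 10 ms');
}, 10);

blockFor(100); // everything else, including other callbacks, waits here
```

The same starvation happens to other readFileSync calls queued behind the first one: they simply run one after another on that thread.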
Your code doesn't need to use readFileSync (you almost never do); just use readFile's callback:
const asyncFnWithSyncCode = (filePath, next) => {
  // Load file
  fs.readFile(filePath, function(err, file) {
    if (err) {
      // ...handle error...
      // ...continue if appropriate:
      next(err, null);
    } else {
      // ...use `file`...
      // Continue to process file with async functions
      // ...
      next(null, processedFile);
    }
  });
}

Related

What happens when some functions take a long time? Are they asynchronous?

Let's say I have code like:
app.get('/url', (req, res) => {
  if (req.some_magic == 1) {
    do_1();
  }
});

function do_1() {
  let requests = get_requests();
  setTimeout(function () {
    request({
      "uri": "url",
      "method": "POST",
      "json": rq
    }, (err, res, body) => {
      do_1();
    });
  }, 1000);
}
Basically, for some requests that come to /url, I have to send a bunch of requests to some service. How can I make this asynchronous so that other people's requests coming to /url won't have to wait for do_1 to finish? Or does Node already work like that? If so, do you have any quick explanations or tutorials I could look into to understand how this works? I come from LEMP, so it's super different. Thanks a lot.
Pretty much any function that involves getting data from outside of Node (such as a network request or file read) will use a function that is asynchronous. The documentation for the function should tell you (or at least imply it by saying that the function returns a Promise or accepts a callback function as an argument).
The example you give shows the request module accepting a callback function.
The main exceptions are functions which are explicitly defined as being sync (such as fs.writeFileSync).
If you need to free up the main event loop explicitly, then you can use a worker thread. It's very rare that you will need to do this, and the main need comes when you are performing CPU intensive calculations in JS (which aren't farmed out to a library that is already asynchronous).

How do I know when asynchronous JavaScript execution complete in Rhino

I have JavaScript code which calls some asynchronous API, and it works great. But I also need to call another API to report when script execution has completed. The issue is that Context.evaluateString(...) returns immediately, while the script code continues to execute because of its asynchronous nature. JS example:
f1(function (err, res) {
  f2(function (err, res) {
    f3(function (err, res) {
      handleResult(err, res);
      // ideally I need to know when handleResult(...) has completed execution
      // but Rhino's Context.evaluateString(...) returns immediately
      // after f1() is called, while the script continues execution
    });
  });
});
Yes, I could add some method to the script, call it from the script when all operations are done, and handle it on the Java side, but that would force me to call it every time. This is just a workaround.
I need a more generic way that doesn't impose any rules on the script code.
Also, what if a customer forgets to call, say, sendResult() from the script? The app on the other side will wait for the result forever. So I need a bulletproof solution.
In iOS, using JavaScriptCore, I just reacted when the top-level object added to the script engine was destroyed, but in Java this trick doesn't work because, unlike Objective-C/Swift, Java is not reference-counted but garbage-collected, and you never know when an object will be deallocated.
I have no experience using Rhino, so take this answer with a grain of salt. However this answer might steer you in the right direction.
The documentation states:
evaluateString
...
Returns:
the result of evaluating the string
So I would create a Future that is returned by the JavaScript. Resolve the future after handleResult is executed. Then on the Java side, simply cast the result into the correct object, then wait for the value to be resolved.
// create an empty task
const future = new java.util.concurrent.FutureTask(function () {});

f1(function (err, res) {
  f2(function (err, res) {
    f3(function (err, res) {
      handleResult(err, res);
      // run the empty task, doing nothing more than resolving the future
      future.run();
    });
  });
});

// return future to evaluateString
future;
You can find more info about Java objects in JavaScript here.

When working with NodeJS FS mkdir, what is the importance of including callbacks?

I'm playing with the NodeJS REPL console and following this tutorial.
http://www.tutorialspoint.com/nodejs/nodejs_file_system.htm
I'm focusing on the File System(FS) module. Let's look at the mkdir function used for creating directories.
According to TutorialsPoint, this is how you create a directory with FS
var fs = require("fs");
console.log("Going to create directory /tmp/test");
fs.mkdir('/tmp/test', function(err) {
  if (err) {
    return console.error(err);
  }
  console.log("Directory created successfully!");
});
They specifically say you need this syntax
fs.mkdir(path[, mode], callback)
Well I just tried using less code without the callback and it worked.
var fs = require('fs');
fs.mkdir('new-directory');
And the directory was created. The syntax should just be
fs.mkdir(path);
I have to ask, what is the purpose of the callback and do you really need it? For removing a directory I could understand why you would need it, in case the directory didn't exist. But I can't see what could possibly go wrong with the mkdir command. Seems like a lot of unnecessary code.
As of node v10.0, the callback to fs.mkdir() is required. You must pass it, even if you just pass a dummy function that does nothing.
The point of the callback is to let you know if and when the call succeeded and if it didn't succeed, what the specific error was.
Remember, this type of function is asynchronous. It completes some unknown time in the future so the only way to know when it is done or if it completed successfully is by passing a callback function and when the callback is called, you can check the error and see that it has completed.
As it turns out, there are certainly things that can go wrong with mkdir() such as a bad path, a permissions error, etc... so errors can certainly happen. And, if you want to immediately use that new directory, you have to wait until the callback is called before using it.
In response to one of your other comments, the fs.mkdir() function is always asynchronous whether you pass the callback or not.
Here's an example:
var path = '/tmp/test';
fs.mkdir(path, function (err) {
  if (err) {
    console.log('failed to create directory', err);
  } else {
    fs.writeFile(path + "/mytemp", myData, function(err) {
      if (err) {
        console.log('error writing file', err);
      } else {
        console.log('writing file succeeded');
      }
    });
  }
});
Note: Modern versions of nodejs, include fs.promises.mkdir() which returns a promise that resolves/rejects instead of using plain callbacks. This allows you to use await with try/catch or .then() and .catch() instead of the plain callback to know when it's done and promises make it typically easier to sequence in with other asynchronous operations and to centralize error handling.
Because mkdir is async.
Example:
If you do:
fs.mkdir('test');
fs.statSync('test').isDirectory(); // might return false because it might not be created yet
But if you do:
fs.mkdir('test', function() {
  fs.statSync('test').isDirectory(); // will be created at this point
});
You can still use mkdirSync if you need a sync version.
Many things could go wrong when using mkdir, and you should probably handle exceptions and errors and return them to the user, when possible.
e.g. mkdir /foo/bar could go wrong, as you might need root (sudo) permissions in order to create a top-level folder.
However, the general idea behind callbacks is that the method you're using is asynchronous, and given the way Javascript works you might want to be notified and continue your program execution once the directory has been created.
Update: bear in mind that if you need — let's say — to save a file in the directory, you'll need to use that callback:
fs.mkdir('/tmp/test', function (err) {
  if (err) {
    return console.log('failed to write directory', err);
  }
  // now, write a file in the directory
});
// at this point, the directory has not been created yet
I also recommend you having a look at promises, which are now being used more often than callbacks.
Because it's an async call, further execution of the program may depend on the outcome of the operation (the directory being created successfully). The point at which the callback executes is the first time this can be checked.
This operation is really fast, so it may seem to happen instantly; but because it's async, the line following fs.mkdir(path); will be executed without waiting for any feedback from fs.mkdir(path); — and thus without any guarantee that the directory creation has already finished, or whether it failed.

Stringing together an unknown number of callbacks to be executed one after another

I have a phonegap application and I want it to upload multiple files to a server which is an async operation. However, because of bandwidth concerns I want to upload the files sequentially and alert the user to the progress.
However, I'm a bit stuck. Since the api is non-blocking and I am (somewhat) attempting to block, I'm not sure exactly how to do this.
I need to do something like this:
files[0].upload().done = function() {
  files[1].upload().done = function() {
    files[2].upload().done = function() {
      files[3].....files[n]
    }
  }
}
How can I do this? At this time, I don't care about failed uploads.
In JavaScript you can define a function that returns a function.
function get_callback(index){
  return function(){
    // TODO: check if files[index] exists
    files[index].upload().done = get_callback(index+1);
  }
}

files[0].upload().done = get_callback(1);
You could use a Promises/Futures library, such as FuturesJS and one of its components, Sequence.
The sequence module allows one to chain asynchronous functions through callbacks. You first need to create the sequence, then append as many callbacks as you need. Every callback to the sequence object receives at least two arguments, next and err. You need to call next when the asynchronous function ends.
Something like this should work:
var Sequence = require('sequence').Sequence,
    sequence = Sequence.create();

sequence
  .then(function (next, err) {
    if (err) {...}
    files[1].upload(next);
  })
  .then(function (next, err) {
    if (err) {...}
    files[2].upload(next);
  })
  .then(...)
  .then(function (next, err) {
    console.log('all files uploaded');
  });
Your upload function must receive a callback in order for the code above to work. If it doesn't, just change it in the following way:
var upload = function (callback) {
  // the rest of your code
  // at the very end
  callback();
}
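On modern runtimes, the same sequencing falls out of a plain loop with async/await. A sketch, assuming each file exposes a promise-returning upload() (unlike the .done style above) and a hypothetical onProgress hook for alerting the user:

```javascript
// Sketch: sequential uploads with async/await. Assumes each item in
// `files` has a promise-returning upload(); onProgress is a made-up
// hook for reporting progress, as the question asks.
async function uploadSequentially(files, onProgress) {
  for (let i = 0; i < files.length; i++) {
    await files[i].upload();          // next upload starts only after this one
    onProgress(i + 1, files.length);  // e.g. "2 of 5 uploaded"
  }
}
```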

How to end on first async parallel task completion in Node?

I have a list of tasks that I want to run in parallel using https://github.com/caolan/async.
I want the program to proceed (probably through a callback) after the first of these parallel tasks is complete, not all of them. So I don't think the naive
async.parallel([task1, task2], callback)
works for me.
Alternatively I could spawn two tasks and cancel the incomplete one, but I can't figure out how to do that using async either.
Thanks!
-Charlie
Parallel Race
You can get async to initiate the final callback by returning an error that evaluates as true but isn't actually an error.
I've put together an example that uses -1 as an error code. In the final callback I check the error value and if it's not -1 then it's an actual error. If the error value is -1 then we'll have a valid value in results. At that point, we just need to remove extra elements from results of the other async functions that have not completed yet.
In the below example I've used the request module to pull html pages and the underscore module to filter the results in the final callback.
var async = require('async');
var request = require('request');
var _ = require('underscore');

exports.parallel = function(req, res) {
  async.parallel([
    /* Grab Google.jp */
    function(callback) {
      request("http://google.jp", function(err, response, body) {
        if (err) { console.log(err); callback(true); return; }
        callback(-1, "google.jp");
      });
    },
    /* Grab Google.com */
    function(callback) {
      request("http://google.com", function(err, response, body) {
        if (err) { console.log(err); callback(true); return; }
        callback(-1, "google.com");
      });
    }
  ],
  /* callback handler */
  function(err, results) {
    /* Actual error */
    if (err && err !== -1) {
      console.log(err);
      return;
    }
    /* First data */
    if (err === -1) {
      /*
       * async#parallel returns a list, one element per parallel function.
       * Functions that haven't finished yet are in the list as undefined.
       * Use underscore to easily filter out the one result.
       */
      var one = _.filter(results, function(x) {
        return (x === undefined ? false : true);
      })[0];
      console.log(results);
      console.log(one);
      res.send(one);
    }
  });
};
Remaining Function Results
When you setup async#parallel to work like this you won't have access to the results of the other asynchronous functions. If you're only interested in the first one to respond then this isn't a problem. However, you will not be able to cancel the other requests. That's most likely not a problem, but it might be a consideration.
The async.parallel documentation says:
If any of the functions pass an error to its callback, the main callback is immediately called with the value of the error.
So you could return an error object from all of your parallel functors, and the first one to finish would jump you to the completion callback. Perhaps even your own special error class, so you can tell the difference between an actual error and a "hey I won" error.
Having said that, you would still have your parallel functions running, potentially waiting for callbacks to complete or whatever. Perhaps you could use async.parallelLimit to make sure you're not firing off too many tasks in parallel?
Having said all that, it's possible you are better served by trying another method from the async library for this task - firing off parallel tasks then having these tasks race each other may not be the best idea.
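For comparison, outside the async library, "first one wins" is exactly what Promise.race does. A sketch, with timers standing in for the two HTTP requests above:

```javascript
// Sketch: Promise.race resolves with whichever promise settles first.
// The losing task keeps running; race() just stops waiting for it.
function delayed(value, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

Promise.race([
  delayed('google.jp', 120),  // stand-in for the slower request
  delayed('google.com', 40),  // stand-in for the faster request
]).then((winner) => {
  console.log('first to finish:', winner);
});
```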
