Let's say I have code like:
app.get('/url', (req, res) => {
if (req.some_magic == 1) {
do_1();
}
});
function do_1() {
  let requests = get_requests();
  setTimeout(function() {
    request({
      "uri": "url",
      "method": "POST",
      "json": rq
    }, (err, res, body) => {
      do_1();
    });
  }, 1000);
}
Basically, for some requests that come to /url, I have to send a bunch of requests to some service. How can I make this asynchronous so that requests from other people coming to /url don't have to wait for do_1 to finish? Or does Node already work like that? If so, do you have any quick explanations or tutorials I could look into to understand how this works? I come from LEMP, so it's very different. Thanks a lot.
Pretty much any function that involves getting data from outside of Node (such as a network request or file read) will be asynchronous. The documentation for the function should tell you (or at least imply it by saying that the function returns a Promise or accepts a callback function as an argument).
The example you give shows the request module accepting a callback function.
The main exceptions are functions which are explicitly defined as being sync (such as fs.writeFileSync).
If you need to free up the main event loop explicitly, then you can use a worker thread. It's very rare that you will need to do this, and the main need comes when you are performing CPU intensive calculations in JS (which aren't farmed out to a library that is already asynchronous).
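As a rough sketch of the worker-thread case: the snippet below moves a CPU-bound loop into a worker using Node's built-in worker_threads module, so the main event loop stays free to serve other requests. The function name sumInWorker and the summing loop are made up for illustration; only Worker, workerData and parentPort come from Node itself.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical CPU-heavy job: sum the integers 1..limit inside a worker thread.
function sumInWorker(limit) {
  // Source code executed in the worker. It reads its input from workerData
  // and posts the result back to the main thread.
  const workerSource = `
    const { parentPort, workerData } = require('worker_threads');
    let total = 0;
    for (let i = 1; i <= workerData; i++) total += i;
    parentPort.postMessage(total);
  `;
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: limit });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}
```

Calling sumInWorker(1e9).then(console.log) returns immediately; the total arrives later through the promise while the main thread keeps handling requests.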
Related
I have some JavaScript code which calls an asynchronous API, and it works great. But I also need to call another API to report when script execution has completed. The issue is that Context.evaluateString(...) returns immediately, while the script code continues to execute because of its asynchronous nature. JS example:
f1(function (err, res) {
f2(function (err, res) {
f3(function (err, res) {
handleResult(err, res);
// ideally I need to know when handleResult(...) has completed execution
// but Rhino's Context.evaluateString(...) returns immediately
// after f1() is called, but script continues execution
});
});
});
Yes, I could add a method that the script calls when all operations are done, and handle it on the Java side, but that would force every script to call it. This is just a workaround.
I need a more generic way that doesn't impose any rules on the script code.
Also, what if a customer forgets to call, say, sendResult() from the script? The app on the other side will wait for the result forever. So I need a bulletproof solution.
On iOS, using JavaScriptCore, I simply reacted when a top-level object I had added to the script engine was destroyed, but in Java this trick doesn't work: unlike Objective-C/Swift, Java is not reference-counted but uses a GC, so you never know when an object will be deallocated.
I have no experience using Rhino, so take this answer with a grain of salt. However this answer might steer you in the right direction.
The documentation states:
evaluateString
...
Returns:
the result of evaluating the string
So I would create a Future that is returned by the JavaScript. Resolve the future after handleResult is executed. Then on the Java side, simply cast the result into the correct object, then wait for the value to be resolved.
// create an empty task
const future = new java.util.concurrent.FutureTask(function () {});
f1(function (err, res) {
f2(function (err, res) {
f3(function (err, res) {
handleResult(err, res);
// run the empty task, doing nothing more than resolving the future
future.run();
});
});
});
// return future to evaluateString
future;
You can find more info about Java objects in JavaScript here.
Admittedly I'm a novice with node, but it seems like this should be working fine. I am using multiparty to parse a form, which returns an array. I am then using a for each to step through the array. However - the for each is not waiting for the inner code to execute. I am a little confused as to why it is not, though.
var return_GROBID = function(req, res, next) {
var form = new multiparty.Form();
var response_array = [];
form.parse(req, function(err, fields, files) {
files.PDFs.forEach(function (element, index, array) {
fs.readFile(element.path, function (err, data) {
var newPath = __dirname + "/../public/PDFs/" + element.originalFilename;
fs.writeFile(newPath, data, function (err) {
if(err) {
res.send(err);
}
GROBIDrequest.GROBID2js(newPath, function(response) {
response_array.push(response);
if (response_array.length == array.length) {
res.locals.body = response_array;
next();
}
});
});
});
});
});
}
If someone can give me some insight on the proper way to do this that would be great.
EDIT: The mystery continues. I ran this code on another machine and IT WORKED. What is going on? Why would one machine be inconsistent with another?
I'd guess the PDFs.forEach is you just calling the built-in forEach function, correct?
In Javascript many things are asynchronous - meaning that given:
linea();
lineb();
lineb may be executed before linea has finished whatever operation it started (because in asynchronous programming, we don't wait around until a network request comes back, for example).
This is different from other programming languages: most languages will "block" until linea is complete, even if linea could take time (like making a network request). (This is called synchronous programming).
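A minimal sketch of that ordering, assuming a 10 ms timer as a stand-in for a network request (the order array is just a hypothetical log added for illustration):

```javascript
// Records what happened, in order.
const order = [];

// linea starts an asynchronous operation and returns right away.
function linea() {
  order.push('linea started');
  setTimeout(() => order.push('linea finished'), 10);
}

// lineb is ordinary synchronous code.
function lineb() {
  order.push('lineb ran');
}

linea();
lineb();

// Once the timer has fired, print the full sequence:
// linea started, lineb ran, linea finished
setTimeout(() => console.log(order.join(', ')), 50);
```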
With that preamble done, back to your original question:
So forEach is a synchronous function. If you rewrote your code like the following, it would work (but not be useful):
PDFs.forEach(function (element, index, array) {
  console.log(element.path);
});
(console.log is a rare synchronous method in Javascript).
But in your forEach loop you have fs.readFile. Notice that last parameter, a function? Node will call that function back when the operation is complete (a callback).
Your code will currently, and as observed, hit that fs.readFile, say, "ok, next thing", and move on to the next item in the loop.
One way to fix this, with the fewest changes to the code, is to use the async library.
async.forEachOf(PDFs, function(value, key, everythingAllDoneCallback) {
  // newPath is computed from value, as in the original code
  GROBIDrequest.GROBID2js(newPath, function(response) {
    response_array.push(response);
    if (response_array.length === PDFs.length) {
      ...
    }
    everythingAllDoneCallback(null);
  });
});
With this code, each item does its asynchronous work and then triggers the callback to tell async that this item is finished.
Callbacks like this are a very common Node pattern and should be well covered by beginner material on Node. But they are one of the most... unexpected concepts in Node development.
One resource I found on this is NodeJS For Beginners: Callbacks (one of a set of lessons). This, and playing around with blocking (synchronous) and non-blocking (asynchronous) functions, and hopefully this SO answer, may provide some enlightenment :)
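For comparison, the same "wait until every file is done" idea can be sketched without the async library, using promises that ship with Node (fs.promises and Promise.all). This is an alternative technique, not what the answer above uses; processFile is a hypothetical stand-in for the readFile, writeFile and GROBID2js chain in the question.

```javascript
const fs = require('fs');

// Hypothetical stand-in for the per-file work in the question.
// Here it just reads the file and reports its length.
async function processFile(filePath) {
  const data = await fs.promises.readFile(filePath, 'utf8');
  return data.length;
}

// Start the work for every file at once. The returned promise resolves only
// when all of them have finished, with results in the same order as the input.
function processAll(filePaths) {
  return Promise.all(filePaths.map((filePath) => processFile(filePath)));
}
```

In an Express-style handler this could be used as processAll(paths).then(function (results) { res.locals.body = results; next(); }).catch(next), so next() runs exactly once, after the last file.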
I have this code:
var resources = myFunc();
myFunc2(resources);
The problem is that JavaScript calls myFunc() asynchronously, and then calls myFunc2() before I have the results of myFunc().
Is there a way to block the first call? Or a way to make this work?
The reason why this code doesn't work represents the beauty and pitfalls of async javascript. It doesn't work because it is not supposed to.
When the first line of code is executed, you have basically told node to go do something and let you know when it is done. It then moves on to execute the next line of code - which is why you don't have the response yet when you get here. For more on this, I would study the event-loop in greater detail. It's a bit abstract, but it might help you wrap your head around control flow in node.
This is where callbacks come in. A callback is basically a function you pass to another function that will execute when that second function is complete. The usual signature for a callback is (err, response). This enables you to check for errors and handle them accordingly.
//define first
var first = function ( callback ) {
//This function would do something, then
// when it is done, you callback
// if no error, hand in null
callback(err, res);
};
//Then this is how we call it
first ( function (err, res) {
if ( err ) { return handleError(err); }
//Otherwise do your thing
second(res)
});
As you might imagine, this can get complicated really quickly. It is not uncommon to end up with many nested callbacks which make your code hard to read and debug.
Extra:
If you find yourself in this situation, I would check out the async library. Here is a great tutorial on how to use it.
myFunc(), if asynchronous, needs to accept a callback or return a promise. Typically, you would see something like:
myFunc(function myFuncCallback (resources) {
myFunc2(resources);
});
Without knowing more about your environment and modules, I can't give you specific code. However, most asynchronous functions in Node.js allow you to specify a callback that will be called once the function is complete.
Assuming that myFunc calls some async function, you could do something like this:
function myFunc(callback) {
// do stuff
callSomeAsyncFunction(callback);
}
myFunc(myFunc2);
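If myFunc returns a promise instead of taking a callback, the same flow might look like the sketch below. The bodies of myFunc and myFunc2 are invented for illustration (a timer stands in for the real asynchronous work).

```javascript
// Hypothetical asynchronous myFunc: resolves with its resources after a short delay.
function myFunc() {
  return new Promise((resolve) => {
    setTimeout(() => resolve(['resource-1', 'resource-2']), 10);
  });
}

// Hypothetical consumer of the resources.
function myFunc2(resources) {
  return resources.length;
}

async function main() {
  // await pauses main (not the whole process) until myFunc's promise settles.
  const resources = await myFunc();
  return myFunc2(resources);
}

main().then((count) => console.log(count)); // prints 2
```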
Today is my first foray into nodejs and I am particularly stumped trying to understand the way the following piece of logic flows. The logic is as follows:
request({ uri: db.createDbQuery('identifier:abcd1234') },
function(err, response, body) {
response.should.have.status(200);
var search = JSON.parse(body);
search.response.numFound.should.equal(1);
done();
});
});
At a higher level I do understand is that an http request is being made and the function is being called at some juncture that is taking the response and doing something to it. What I am trying to understand is the proper order of the calls and how does the binding of variables take place in the above given logic. How does the compiler know how to bind the return values from the request to the anonymous function? Basically, I want to gain an understanding on how things work under the hood for this snippet.
Thanks
Your question isn't specific to Node.js; this is basically a feature of JavaScript.
Basically, you are calling request(), which is defined like function request(obj, callback).
Internally, the HTTP request is made, and once it's completed, it calls callback, which is actually a function pointer.
function request(obj, callback){
//http request logic...
var err = request_logic_internal_function();
var response = ...
var body = ...
callback(err, response, body)
}
Your code can actually be restructured as :
var options = { uri: db.createDbQuery('identifier:abcd1234') };
var request_callback = function(err, response, body) {
response.should.have.status(200);
var search = JSON.parse(body);
search.response.numFound.should.equal(1);
done();
};
request(options, request_callback);
What you're basically doing is sending in a function pointer as a variable.
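To make the binding concrete, here is a toy version you can run. fakeRequest is a made-up stand-in, not the real request module; it shows that the values reach the callback purely by argument position, whatever the parameters are called.

```javascript
// Toy stand-in for a request-like function (not the real request module).
function fakeRequest(options, callback) {
  // Pretend the I/O completes later, then call the callback with three
  // positional arguments: error, response, body.
  setImmediate(() => {
    callback(null, { statusCode: 200, uri: options.uri }, '{"numFound":1}');
  });
}

// The parameter names are arbitrary; values are matched by position.
fakeRequest({ uri: '/search' }, function (problem, reply, text) {
  console.log(problem, reply.statusCode, JSON.parse(text).numFound); // null 200 1
});
```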
I don't know what library(ies) you're using, and it looks like you may have anonymized them by assigning methods into your code's global scope like request, done, and db.
What I can say is this:
That indentation is horrible and initially misled me on what it was doing, please gg=G (vim syntax) your code so it's properly indented.
request takes two arguments, a configuration object and a callback.
db.createDbQuery must be a blocking method or the anonymous object you're creating won't have the proper value.
request uses that configuration value, makes a non-blocking I/O request of some kind, and later will call the callback function you provide. That means that the code immediately after that request call will execute before the callback you provide will execute.
Some time later the request data will come back, Node.js's event loop will provide the data to the library's registered event handler (which may or may not be your callback directly -- it could do something to it and then call your event handler afterwards, you don't know or really care).
Then the function does some checks that will throw errors if they fail, and finally calls a done function in its scope (defined somewhere else) that will execute and continue the logical stream of execution.
I have a function in my nodejs application called get_source_at. It takes a uri as an argument and its purpose is to return the source code from that uri. My problem is that I don't know how to make the function synchronously call request, rather than giving it that callback function. I want control flow to halt for the few seconds it takes to load the uri. How can I achieve this?
function get_source_at(uri){
var source;
request({ uri:uri}, function (error, response, body) {
console.log(body);
});
return source;
}
Also, I've read about 'events' and how node is 'evented' and I should respect that in writing my code. I'm happy to do that, but I have to have a way to make sure I have the source code from a uri before continuing the control flow of my application - so if that's not by making the function synchronous, how can it be done?
You can with deasync:
function get_source_at(uri){
var source;
request({ uri:uri}, function (error, response, body) {
source = body;
console.log(body);
});
while(source === undefined) {
require('deasync').runLoopOnce();
}
return source;
}
You should avoid synchronous requests. If you want something like synchronous control flow, you can use async.
async.waterfall([
function(callback){
var data = get_source_at(uri);
callback(null, data);
},
function(data,callback){
process(data, callback);
},
], function (err,result) {
console.log(result)
});
process is guaranteed to run only after get_source_at returns.
This is a better way of using deasync.
var request = require("request")
var deasync = require("deasync")
var getHtml = deasync(function (url, cb) {
var userAgent = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36"}
request({
url: url,
headers: userAgent
},
function (err, resp, body) {
if (err) { return cb(err, null) }
cb(null, body)
})
})
var title = /<title>(.*?)<\/title>/
var myTitle = getHtml("http://www.yahoo.com").match(title)[1]
console.log(myTitle)
Please refer to the documentation of deasync; you will find that you can use deasync(function (n params, cb) {}) to wrap a function whose cb comes back with (err, data). So functions like fs.readFile() can easily be wrapped with deasync. But for functions like request, which don't come back with cb(err, data), you can make your own function (named or anonymous) with a custom cb(err, data) callback format, just as I have done in the code above. This way you can force almost any async function to behave synchronously by waiting for the callback cb(err, data) to come back on a different JavaScript layer (as the documentation says). Also make sure that every way out of the function you are wrapping with deasync ends in a cb(err, data) call; otherwise your program will block.
Hope it helps someone out there!
Update:
Don't do synchronous requests this way. Use async/await to write promise-based code that looks synchronous. You can use the request-promise-native npm module to avoid wrapping the request module in promises yourself.
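A minimal sketch of the async/await style, assuming plain http:// URLs and using Node's built-in http module rather than request-promise-native (a swapped-in technique, so no extra package is needed). getSourceAt and printSource are hypothetical names.

```javascript
const http = require('http');

// Promise wrapper around http.get; resolves with the response body as a string.
function getSourceAt(uri) {
  return new Promise((resolve, reject) => {
    http.get(uri, (response) => {
      let body = '';
      response.setEncoding('utf8');
      response.on('data', (chunk) => { body += chunk; });
      response.on('end', () => resolve(body));
    }).on('error', reject);
  });
}

// Hypothetical caller: reads top to bottom, but never blocks the event loop.
async function printSource(uri) {
  const source = await getSourceAt(uri);
  console.log(source.length);
  return source;
}
```

Everything that needs the source goes after the await line; code outside printSource keeps running while the request is in flight.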
Having a simple blocking function is a great boon for interactive development! The sync function (defined below) can synchronize any promise, cutting down dramatically on the amount of syntax needed to play with an API and learn it. For example, here's how to use it with the puppeteer library for headless Chrome:
var browser = sync(puppeteer.connect({ browserWSEndpoint: "ws://some-endpoint"}));
var pages = sync(browser.pages())
pages.length
1
var page = pages[0]
sync(page.goto('https://duckduckgo.com', {waitUntil: 'networkidle2'}))
sync(page.pdf({path: 'webpage.pdf', format: 'A4'}))
The best part is, each one of these lines can be tweaked until it does what you want, without having to re-run or re-type all of the previous lines each time you want to test it. This works because you have direct access to the browser and pages variables from the top-level.
Here's how it works:
const deasync = require("deasync");
const sync = deasync((promise, callback) => promise.then((result) => callback(null, result), callback));
It uses the deasync package mentioned in other answers. deasync wraps the anonymous function so that callback is appended as the last argument, and the wrapped call blocks until callback has been called. callback receives the error condition as its first argument (if any) and the result as its second (if any).
I have to have a way to make sure I have the source code from a uri before continuing the control flow of my application - so if that's not by making the function synchronous, how can it be done?
Given this entry point to your application:
function app(body) {
// Doing lots of rad stuff
}
You kick it off by fetching the body:
request({ uri: uri }, function (error, response, body) {
  if (error) return console.error(error);
  // Start application
  app(body);
});
This is something you will have to get used to when programming for node.js (and javascript in general). There are control flow modules like async (which I, too, recommend) but you have to get used to continuation passing style, as it's called.
OK, first of all, to keep that code asynchronous you can simply place the relevant code inside the callback of the request function. It will then run after the request has finished, without stopping the processor from handling other tasks in your application. If you need this in multiple places, I would advise you to check out Synchronous request in Node.js, which outlines various methods to streamline this and discusses various control flow libraries.