I have a function in my nodejs application called get_source_at. It takes a uri as an argument and its purpose is to return the source code from that uri. My problem is that I don't know how to make the function synchronously call request, rather than giving it that callback function. I want control flow to halt for the few seconds it takes to load the uri. How can I achieve this?
function get_source_at(uri) {
    var source;
    request({ uri: uri }, function (error, response, body) {
        console.log(body);
    });
    return source;
}
Also, I've read about 'events' and how node is 'evented' and I should respect that in writing my code. I'm happy to do that, but I have to have a way to make sure I have the source code from a uri before continuing the control flow of my application - so if that's not by making the function synchronous, how can it be done?
You can with deasync:
function get_source_at(uri) {
    var source;
    request({ uri: uri }, function (error, response, body) {
        source = body;
        console.log(body);
    });
    while (source === undefined) {
        require('deasync').runLoopOnce();
    }
    return source;
}
You should avoid synchronous requests. If you want something like synchronous control flow, you can use async.
async.waterfall([
    function (callback) {
        var data = get_source_at(uri);
        callback(null, data);
    },
    function (data, callback) {
        process(data, callback);
    }
], function (err, result) {
    console.log(result);
});
process is guaranteed to run only after get_source_at has returned.
This is a better way of using deasync:
var request = require("request");
var deasync = require("deasync");

var getHtml = deasync(function (url, cb) {
    var userAgent = { "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.111 Safari/537.36" };
    request({
        url: url,
        headers: userAgent
    },
    function (err, resp, body) {
        if (err) { return cb(err, null); } // return here so cb is not called twice
        cb(null, body);
    });
});
var title = /<title>(.*?)<\/title>/
var myTitle = getHtml("http://www.yahoo.com").match(title)[1]
console.log(myTitle)
Please refer to the documentation of deasync: you can use deasync(function (params, cb) {}) to wrap any function whose callback comes back as cb(err, data). So functions like fs.readFile() can easily be wrapped with deasync directly. But for functions like request, which don't call back with cb(err, data), you can write your own function (named or anonymous) with a custom cb(err, data) callback, just as I have done in the code above. This way you can force almost any async function to behave like a sync one, by waiting for the cb(err, data) callback to come back on a different JavaScript layer (as the documentation says).

Also make sure that every way out of the function you wrap with deasync ends in a cb(err, data) call; otherwise your program will block.
Hope, it helps someone out there!
Update:
Don't use this way of doing synchronous requests. Use async/await for writing promise-based, synchronous-looking code. You can use the request-promise-native npm module to avoid wrapping the request module with promises yourself.
Having a simple blocking function is a great boon for interactive development! The sync function (defined below) can synchronize any promise, cutting down dramatically on the amount of syntax needed to play with an API and learn it. For example, here's how to use it with the puppeteer library for headless Chrome:
var browser = sync(puppeteer.connect({ browserWSEndpoint: "ws://some-endpoint"}));
var pages = sync(browser.pages())
pages.length
1
var page = pages[0]
sync(page.goto('https://duckduckgo.com', {waitUntil: 'networkidle2'}))
sync(page.pdf({path: 'webpage.pdf', format: 'A4'}))
The best part is, each one of these lines can be tweaked until it does what you want, without having to re-run or re-type all of the previous lines each time you want to test it. This works because you have direct access to the browser and pages variables from the top-level.
Here's how it works:
const deasync = require("deasync");
const sync = deasync((promise, callback) => {
    promise.then(result => callback(null, result), error => callback(error));
});
It uses the deasync package mentioned in other answers. deasync creates a partial application to the anonymous function, which adds callback as the last argument, and blocks until callback has been called. callback receives the error condition as its first argument (if any), and the result as its second (if any).
I have to have a way to make sure I have the source code from a uri before continuing the control flow of my application - so if that's not by making the function synchronous, how can it be done?
Given this entry point to your application:
function app(body) {
    // Doing lots of rad stuff
}
You kick it off by fetching the body:
request({ uri: uri }, function (error, response, body) {
    if (error) return console.error(error);
    // Start application
    app(body);
});
This is something you will have to get used to when programming for Node.js (and JavaScript in general). There are control flow modules like async (which I, too, recommend), but you have to get used to continuation-passing style, as it's called.
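A toy sketch of continuation-passing style (the URL is made up, and setTimeout stands in for real network latency):

```javascript
// Instead of returning the source, the function passes it to a continuation
function getSourceAt(uri, callback) {
  // setTimeout stands in for a real network request
  setTimeout(function () {
    callback(null, '<html>source of ' + uri + '</html>');
  }, 10);
}

// Everything that depends on the source goes inside the callback
getSourceAt('http://example.com', function (err, body) {
  if (err) return console.error(err);
  console.log(body);
});
```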
Ok, first of all: to keep that code asynchronous, you can simply place the relevant code inside the callback of the request function, meaning it will run after the request finishes but won't stop the process from handling other tasks in your application. If you need this in multiple places, I would advise you to check out Synchronous request in Node.js, which outlines various methods to streamline this and discusses various control-flow libraries.
Related
Let's say I have code like:
app.get('/url', (req, res) => {
    if (req.some_magic == 1) {
        do_1();
    }
});

function do_1() {
    let requests = get_requests();
    setTimeout(function () {
        request({
            "uri": "url",
            "method": "POST",
            "json": rq
        }, (err, res, body) => {
            do_1();
        });
    }, 1000);
}
Basically, for some requests that come to /url, I have to send a bunch of requests to some service. How can I make this asynchronous, so that other people's requests to /url don't have to wait for do_1 to finish? Or does Node already work like that? If yes, do you have any quick explanations or tutorials I could look into to understand how this works? I come from LEMP, so it's super different. Thanks a lot.
Pretty much any function that involves getting data from outside of Node (such as a network request or file read) will use a function that is asynchronous. The documentation for the function should tell you (or at least imply it saying that the function returns a Promise or accepts a callback function as an argument).
The example you give shows the request module accepting a callback function.
The main exceptions are functions which are explicitly defined as being sync (such as fs.writeFileSync).
If you need to free up the main event loop explicitly, then you can use a worker thread. It's very rare that you will need to do this, and the main need comes when you are performing CPU intensive calculations in JS (which aren't farmed out to a library that is already asynchronous).
Admittedly I'm a novice with node, but it seems like this should be working fine. I am using multiparty to parse a form, which returns an array. I am then using a for each to step through the array. However - the for each is not waiting for the inner code to execute. I am a little confused as to why it is not, though.
var return_GROBID = function (req, res, next) {
    var form = new multiparty.Form();
    var response_array = [];
    form.parse(req, function (err, fields, files) {
        files.PDFs.forEach(function (element, index, array) {
            fs.readFile(element.path, function (err, data) {
                var newPath = __dirname + "/../public/PDFs/" + element.originalFilename;
                fs.writeFile(newPath, data, function (err) {
                    if (err) {
                        res.send(err);
                    }
                    GROBIDrequest.GROBID2js(newPath, function (response) {
                        response_array.push(response);
                        if (response_array.length == array.length) {
                            res.locals.body = response_array;
                            next();
                        }
                    });
                });
            });
        });
    });
}
If someone can give me some insight on the proper way to do this that would be great.
EDIT: The mystery continues. I ran this code on another machine and IT WORKED. What is going on? Why would one machine be inconsistent with another?
I'd guess the PDFs.forEach is you just calling the built-in forEach function, correct?
In Javascript many things are asynchronous - meaning that given:
linea();
lineb();
lineb may be executed before linea has finished whatever operation it started (because in asynchronous programming, we don't wait around until a network request comes back, for example).
This is different from other programming languages: most languages will "block" until linea is complete, even if linea could take time (like making a network request). (This is called synchronous programming).
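A tiny sketch of this ordering, with setTimeout standing in for any async operation:

```javascript
const order = [];

function linea(done) {
  // async: the callback runs later, after the current code finishes
  setTimeout(function () {
    order.push('linea finished');
    done();
  }, 10);
}

function lineb() {
  order.push('lineb ran');
}

linea(function () {
  // lineb ran first, even though linea was called first
  console.log(order);
});
lineb();
```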
With that preamble done, back to your original question:
So forEach is a synchronous function. If you rewrote your code like the following, it would work (but not be useful):
PDFs.forEach(function (element, index, array) {
    console.log(element.path);
});
(console.log is a rare synchronous method in Javascript).
But in your forEach loop you have fs.readFile. Notice that last parameter, a function? Node will call that function back when the operation is complete (a callback).
Your code will currently, and as observed, hit that fs.readFile, say, "ok, next thing", and move on to the next item in the loop.
One way to fix this, with the least changing the code, is to use the async library.
async.forEachOf(PDFs, function (value, key, everythingAllDoneCallback) {
    GROBIDrequest.GROBID2js(newPath, function (response) {
        response_array.push(response);
        if (response_array.length === array.length) {
            ...
        }
        everythingAllDoneCallback(null);
    });
});
With this code you are going through all your asynchronous work, then triggering the callback when it's safe to move on to the next item in the list.
Node and callbacks like this are a very common Node pattern, it should be well covered by beginner material on Node. But it is one of the most... unexpected concepts in Node development.
One resource I found on this was (one from a set of lessons) about NodeJS For Beginners: Callbacks. This, and playing around with blocking (synchronous) and non-blocking (asynchronous) functions, and hopefully this SO answer, may provide some enlightenment :)
I'm new to Node.js, so I'm still wrapping my head around asynchronous functions and callbacks. My struggle now is how to return a response after reading data from a file in an asynchronous operation.
My understanding is that sending a response works like this (and this works for me):
app.get('/search', function (req, res) {
    res.send("request received");
});
However, now I want to read a file, perform some operations on the data, and then return the results in a response. If the operations I wanted to perform on the data were simple, I could do something like this -- perform them inline, and maintain access to the res object because it's still within scope.
app.get('/search', function (req, res) {
    fs.readFile("data.txt", function (err, data) {
        var result = process(data.toString());
        res.send(result);
    });
});
However, the file operations I need to perform are long and complicated enough that I've separated them out into their own function in a separate file. As a result, my code looks more like this:
app.get('/search', function (req, res) {
    searcher.do_search(res.query);
    // ??? Now what ???
});
I need to call res.send in order to send the result. However, I can't call it directly in the function above, because do_search completes asynchronously. And I can't call it in the callback to do_search because the res object isn't in scope there.
Can somebody help me understand the proper way to handle this in Node.js?
To access a variable in a different function, when there isn't a shared scope, pass it as an argument.
You could just pass res and then access both query and send on the one variable within the function.
For the purposes of separation of concerns, you might be better off passing a callback instead.
Then do_search only needs to know about performing a query and then running a function. That makes it more generic (and thus reusable).
searcher.do_search(res.query, function (data) {
    res.send(...);
});

function do_search(query, callback) {
    callback(...);
}
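A runnable sketch of that shape, with setTimeout standing in for the real file work (the names and query are hypothetical):

```javascript
// do_search knows nothing about req/res; it just hands results to a callback
function do_search(query, callback) {
  // stand-in for reading a file and processing the data
  setTimeout(function () {
    callback('results for "' + query + '"');
  }, 10);
}

// The route handler supplies a callback that closes over res,
// so it can call res.send when the data is ready
do_search('nodejs', function (data) {
  console.log(data); // in the real handler: res.send(data)
});
```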
The existing answers are perfectly valid, you can also use async/await keywords since ES2017. Using your own function:
app.get('/search', async (req, res, next) => {
    try {
        const answer = await searcher.do_search(req.query);
        res.send(answer);
    }
    catch (error) {
        return next(error);
    }
});
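For await to work there, do_search must return a Promise. A minimal sketch (the search itself is faked with setTimeout):

```javascript
// Promise-returning version of do_search; setTimeout fakes the async work
function do_search(query) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('results for ' + query);
    }, 10);
  });
}

// An async function can now await it and read the result linearly
async function handler(query) {
  const answer = await do_search(query);
  return answer;
}

handler('nodejs').then(answer => console.log(answer));
```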
Disclaimer: I know that synchronous stuff is to be avoided, and that promises and callbacks are preferable, but I'm stuck writing a system that needs a small amount of backwards compatibility, and need to write this as a temporary stop gap.
Writing an express.js app, I have a function that takes the request object from a .get or .post etc function, and confirms whether the session key is valid (after checking this with a remote API server). The main version of the function is like this:
module.exports.isValidSession = function (req, cb) {
    // REST API calls and callbacks, error handling etc., all works fine.
};
I need a version of the above to be compatible with an older system (that will soon be phased out). I'm quite new to Node, although not to JS, so I'm wondering if there's a good convention on how to do this?
Additional Info
The main problem I hit is returning a synchronous result from some kind of 'watcher'. See below for one approach I considered, which doesn't work, but which I figure someone might know how to fix. (It's essentially polling the value of a variable until the async function sets it to indicate it's done.)
function isValidSession(req) {
    var running = 1, ret = -1;
    var looper = setInterval(function () {
        if (!running) {
            clearInterval(looper);
            return ret; // This is the problem bit. Returns from the interval callback, not from isValidSession.
        }
    }, 500);
    request(requestOpts, function (error, response, body) {
        if (error) {
            ret = new Error('Some error went on');
            running = -1;
        }
        if (response.statusCode === 200) {
            ret = true;
            running = -1;
        }
    });
}
Might well not be possible, or more likely not viable, but it'd be really useful if I could include this for backwards compatibility for a week or so. It's a dev project at the moment, so if it's not good practice, it'll do as long as it doesn't compromise everything. I do realise, though, that it basically goes against the whole point of Node (although Node itself includes several functions that are available in both sync and async versions).
All help gratefully received.
Let's pretend there is a strong reason why the author desires this.
There is a rather popular (so, you are not alone) module called synchronize.js
Use case:
Add callback to your isValidSession
function isValidSession(req, cb) {
    request(requestOpts, function (error, response, body) {
        if (error) {
            return cb(new Error('Some error went on'));
        }
        if (response.statusCode === 200) {
            cb(null, true);
        }
    });
}
Use it with sync module:
var ret = sync.await(isValidSession(req, sync.defer()))
P.S. Might require testing, refer to the documentation.
Today is my first foray into nodejs and I am particularly stumped trying to understand the way the following piece of logic flows. The logic is as follows:
request({ uri: db.createDbQuery('identifier:abcd1234') },
    function (err, response, body) {
        response.should.have.status(200);
        var search = JSON.parse(body);
        search.response.numFound.should.equal(1);
        done();
    });
At a higher level I do understand that an HTTP request is being made and that the function is called at some point with the response, to do something with it. What I am trying to understand is the proper order of the calls and how the binding of variables takes place in the logic above. How does the runtime know to bind the return values from the request to the anonymous function's parameters? Basically, I want to gain an understanding of how things work under the hood for this snippet.
Thanks
Your question isn't specific to Node.js; this is basically a feature of JavaScript.

Basically you are calling request(), which is defined like function request(obj, callback).

Internally, the HTTP request is made, and once it completes, it calls callback, which is actually a function reference.
function request(obj, callback) {
    // http request logic...
    var err = request_logic_internal_function();
    var response = ...
    var body = ...
    callback(err, response, body);
}
Your code can actually be restructured as :
var options = { uri: db.createDbQuery('identifier:abcd1234') };

var request_callback = function (err, response, body) {
    response.should.have.status(200);
    var search = JSON.parse(body);
    search.response.numFound.should.equal(1);
    done();
};

request(options, request_callback);
What you're basically doing is sending in a function pointer as a variable.
I don't know what library(ies) you're using, and it looks like you may have anonymized them by assigning methods into your code's global scope like request, done, and db.
What I can say is this:
That indentation is horrible and initially misled me on what it was doing, please gg=G (vim syntax) your code so it's properly indented.
request takes two arguments, a configuration object and a callback.
db.createDbQuery must be a blocking method or the anonymous object you're creating won't have the proper value.
request uses that configuration value, makes a non-blocking I/O request of some kind, and later will call the callback function you provide. That means that the code immediately after that request call will execute before the callback you provide will execute.
Some time later the request data will come back, Node.js's event loop will provide the data to the library's registered event handler (which may or may not be your callback directly -- it could do something to it and then call your event handler afterwards, you don't know or really care).
Then the function does some checks that will throw errors if they fail, and finally calls a done function in its scope (defined somewhere else) that will execute and continue the logical stream of execution.