Nested queries in Node JS / MongoDB

My userlist collection in mongo is set up like this:
email: email@email.com, uniqueUrl: ABC, referredUrl: ...
I have the following code, where I query all of the users in my database and, for each of those users, find out how many other users' referredUrl values equal the current user's unique url:
exports.users = function(db) {
    return function(req, res) {
        db.collection('userlist').find().toArray(function(err, items) {
            for (var i = 0; i < items.length; i++) {
                var user = items[i];
                console.log(user.email);
                db.collection('userlist').find({ referredUrl: user.uniqueUrl }).count(function(err, count) {
                    console.log(count);
                });
            }
        });
    };
};
Right now I'm first logging the user's email, then the count associated with that user, so the console should look like this:
bob@bob.com
1
chris@chris.com
3
grant@grant.com
2
Instead, it looks like this:
bob@bob.com
chris@chris.com
grant@grant.com
1
3
2
What's going on? Why is the nested query only returning after the first query completes?

Welcome to asynchronous programming and callbacks.
What you are expecting is for everything to work in a linear order, but that is not how node works. The whole subject is a little too broad to cover here, but it is worth some reading up on.
Luckily, the methods invoked by the driver all key off process.nextTick, which gives you something to look up and search on. But there is a simple way to remedy the code, due to the natural way that things are queued.
db.collection('userlist').find().toArray(function(err, items) {
    var processing = function(user) {
        db.collection('userlist').find({ referredUrl: user.uniqueUrl })
            .count(function(err, count) {
                console.log(user.email);
                console.log(count);
            });
    };
    for (var i = 0; i < items.length; i++) {
        var user = items[i];
        processing(user);
    }
});
Now of course that is a really oversimplified way of explaining this, but understand that you are passing parameters through to your repeated .find() and then doing all of the output there.
As said, fortunately some of the work is done for you in the API functions, and the event stack is maintained in the order you added the calls. But the main point is that the two output calls are now made together, rather than occurring within different sets of events.
For a detailed explanation of event loops and callbacks, I'm sure there are much better ones out there than I could write here.

Callbacks are asynchronous in node.js, so your count callback (function(err, count) { console.log(count); }) is not executed immediately after console.log(user.email);. The output is therefore normal; there is nothing wrong with it. What is wrong is the coding style: you shouldn't expect consecutively issued callbacks to behave like consecutive function calls in a single-threaded language such as python. To get the desired result, you should do all of the work in a single callback. But before doing that, I recommend you learn how callbacks work in nodejs; it will significantly help your coding in nodejs.
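For illustration, here is a minimal sketch of the "do all the work in a single callback" idea, processing the users one at a time so both the email/count pairing and the overall order are preserved (it assumes the same db handle and userlist collection as in the question):

db.collection('userlist').find().toArray(function(err, items) {
    (function next(i) {
        if (i >= items.length) return;
        var user = items[i];
        db.collection('userlist')
            .find({ referredUrl: user.uniqueUrl })
            .count(function(err, count) {
                // Email and count are logged from the same callback,
                // and the next user is only processed after this one.
                console.log(user.email);
                console.log(count);
                next(i + 1);
            });
    })(0);
});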

Related

Emscripten sandwiched by asynchronous Javascript Code

I'm trying to use Emscripten to write software that runs in the browser but also on other architectures (e.g. Android, standalone PC app).
The software structure is something like this:
main_program_loop() {
    if (gui.button_clicked()) {
        run_async(some_complex_action, gui.text_field.to_string())
    }
    if (some_complex_action_has_finished()) {
        make_use_of(get_result_from_complex_action());
    }
}

some_complex_action(string_argument)
{
    some_object = read_local(string_argument);
    interm_res = simple_computation(some_object);
    other_object = expensive_computation(interm_res);
    send_remote(some_object.member_var, other_object);
    return other_object.member_var;
}
Let's call main_program_loop the GUI or frontend, some_complex_action the intermediate layer, and read_local, send_remote and expensive_computation the backend or lower layer.
Now the frontend and backend would be architecture specific (e.g. for Javascript read_local could use IndexedDB and send_remote could use fetch), but the intermediate layer should make up more than 50% of the code. That's why I do not want to write it twice in two different languages, and instead want to write it once in C and transpile it to Javascript (for Android I would use JNI).
Problems come in because in Javascript the functions on the lowest layer (fetch etc.) run asynchronously (they return a promise or require a callback).
One approach I tried was to use promises and send IDs through the intermediate layer:
var promises = {};
var last_id = 0;

function handle_click() {
    var id = Module.ccall('some_complex_action', 'number', ['string'], [text_field.value]);
    promises[id].then((result) => make_use_of(result));
}

recv_remote: function(str) {
    promises[last_id] = fetch(get_url(str)).then((response) => response.arrayBuffer());
    last_id += 1;
    return last_id - 1;
}
It works for the simple case of
some_complex_action(char *str)
{
    return recv_remote(str);
}
But for real cases it seems to get really complicated, maybe impossible. (I tried an approach where I'd give every function a state, and every time a backend function finished, the function would be called again and advance its state, but the code started getting complicated as hell.) To compare, if I were to call some_complex_action from C or Java, I'd just call it in a thread separate from the GUI thread, and inside that thread everything would happen synchronously.
I wish I could just call some_complex_action from an async function and put await inside recv_remote, but of course I can put await only directly inside the async function itself, not in some function called further down the line. So that idea did not work out either.
Ideally, I would somehow stop execution of the intermediate Emscripten-transpiled code until the backend function has completed, then return from the backend function with the result and continue executing the transpiled code.
Has anyone used Emterpreter and can imagine that it could help me get to my goal?
Any ideas what I could do?

Calculating when multiple writes to a file will cause inaccuracies?

In my node server I have a variable:
var clicks = 0;
Each time a user clicks in the webapp, a websocket event sends a message. On the server:
clicks++;
if (clicks % 10 == 0) {
    saveClicks();
}

function saveClicks() {
    var placementData = JSON.stringify({'clicks': clicks});
    fs.writeFile(__dirname + '/clicks.json', placementData, function(err) {
    });
}
At what rate do I have to start worrying about overwrites? How would I calculate this math?
(I'm looking at creating a MongoDB json object for each click but I'm curious what a native solution can offer).
From the node.js doc for fs.writeFile():
Note that it is unsafe to use fs.writeFile() multiple times on the
same file without waiting for the callback. For this scenario,
fs.createWriteStream() is strongly recommended.
This isn't a math problem where you figure out when it might cause trouble - it's just bad code that gives you the chance of a conflict in circumstances that cannot be predicted. The node.js doc clearly states that this can cause a conflict.
To make sure you don't have a conflict, write the code in a different way so a conflict cannot happen.
If you want to make sure that all writes happen in the proper order of incoming requests, so the last request to arrive is always the one that ends up in the file, then you may need to queue your data as it arrives (so order is preserved), open the file for exclusive access so no other request can write while a prior request is still writing, and handle contention errors appropriately.
This is an issue that databases mostly handle for you automatically, so it may be one reason to use a database.
Assuming you aren't using clustering (and thus do not have multiple processes trying to write to this file) and you just want to make sure the last value sent is the one written to the file by this process, you could do something like this:
var saveClicks = (function() {
    var isWriting = false;
    var lastData;
    return function() {
        // always save most recent data here
        lastData = JSON.stringify({'clicks': clicks});
        if (!isWriting) {
            writeData(lastData);
        }
        function writeData(data) {
            isWriting = true;
            lastData = null;
            fs.writeFile(__dirname + '/clicks.json', data, function(err) {
                isWriting = false;
                if (err) {
                    // decide what to do if an error occurs
                }
                // if more data arrived while we were writing this, then write it now
                if (lastData) {
                    writeData(lastData);
                }
            });
        }
    };
})();
@jfriend00 is definitely right about createWriteStream and already made a point about the database, and everything's pretty much been said, but I would like to emphasize the point about databases, because the file-saving approach basically seems weird to me.
So, use databases.
Not only would this save you from the headache of tracking such things, but it would significantly speed things up (remember that in node the numerous file read/write operations are all interleaved on a single thread, so if one of them takes ages, overall performance can suffer).
Redis is a perfect solution for storing key-value data, so you can store data like clicks-per-user in a Redis database, which you'll have to get running alongside anyway once you get enough traffic :)
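For the counter itself, a minimal sketch might use Redis's atomic increment so there is no read-modify-write race (the io/socket objects, the 'click' event name, the 'clicks' hash key and socket.userId are illustrative assumptions, not part of the question):

var redis = require('redis');
var client = redis.createClient();

io.on('connection', function(socket) {
    socket.on('click', function() {
        // hincrby atomically increments one field per user in a 'clicks' hash
        client.hincrby('clicks', socket.userId, 1, function(err, total) {
            if (err) console.error(err);
        });
    });
});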
If you're not convinced yet, take a look at this simple benchmark:
Redis:
var async = require('async');
var redis = require("redis"),
    client = redis.createClient();

console.time("To Redis");
async.mapLimit(new Array(100000).fill(0), 1, (el, cb) => client.set("./test", 777, cb), () => {
    console.timeEnd("To Redis");
});
To Redis: 5410.383ms
fs:
var async = require('async');
var fs = require('fs');

console.time("To file");
async.mapLimit(new Array(100000).fill(0), 1, (el, cb) => fs.writeFile("./test", 777, cb), () => {
    console.timeEnd("To file");
});
To file: 20344.749ms
And, by the way, you can significantly increase the number of clicks after which progress is stored (it's 10 now) by simply adding this "click-saver" to socket.on('disconnect', ....
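A minimal sketch of that idea (assuming a socket.io-style socket, a 'click' event name, and the clicks counter and saveClicks function from the question): raise the modulo threshold, but flush the latest count whenever a client disconnects so less data is at risk:

io.on('connection', function(socket) {
    socket.on('click', function() {
        clicks++;
        if (clicks % 100 == 0) { // a larger threshold than the original 10
            saveClicks();
        }
    });
    socket.on('disconnect', function() {
        // flush the most recent count when a client leaves
        saveClicks();
    });
});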

Javascript techniques to embed nested callback functions

I'm building an app with Nodejs. I'm fairly fluent with front-end javascript, where asynchronous events rarely get too complex and don't go that deep. But now that I'm using Node, which is all event driven and makes a lot of calls to different servers and databases that all rely on each other, things become rather cluttered.
It seems to be commonplace to have a next() function passed as a parameter that gets called once the first event has finished. This works great, however I'm struggling to keep the code readable when I need next functions after next functions.
Let me explain through an example.
Let's say I have a route defined like so:
app.use('/fetchData', function(req, res) {
});
So before we can return the data I need to make a few async calls.
First, a call to the database to retrieve the login details.
Then, using the login details, another call to an external server to log in and retrieve the raw information.
Third, back to the database to do some checks.
And then finally return the data to the user.
How would you do that? I'm trying like this, but I can't get it right, nor does it look readable:
app.use('/fetchData', function(req, res) {
    // First I create a user object to pass information around to each function
    var user = {...};
    var third = database.doSomeChecks;
    var second = server.externalCall(user, third);
    // first
    database.getLoginDetails(user, second);
});
Obviously second actually runs the function and sets second to the returned value, but I can't seem to pass the right information through to second.
One option I thought of would be to pass through an array of callbacks, and to always call the last function in the array and then remove it.
app.use('/fetchData', function(req, res) {
    // First I create a user object to pass information around to each function,
    // including the req and res objects to finally return information
    var user = {...};
    var third = database.doSomeChecks;
    var second = server.externalCall;
    // first
    database.getLoginDetails(user, [third, second]);
});
What are your techniques? Is the array idea pointed out above the best solution?
I'd recommend using promises. As a personal preference I like bluebird: it's easy to adopt, it has very nice performance, and it has some cool features to play with.
With promises it's easier to read the control flow (at least to me). A lot of people complain about callback hell, and promises are one of the possible solutions.
You can do something like this:
from:
var user = {...};
var third = database.doSomeChecks;
var second = server.externalCall(user, third);
to:
var user = {...};
checkDB(query).then(
    function(data) {
        // data checks
        return value;
    }).then(
    function(value) {
        // value returned from the previous promise
        return server.externalCall(value);
    });
You can take a look at this answer to see how to deal with nested promises, which are far easier than callbacks.
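Applied to the route in the question, the chain might look something like the sketch below. It assumes getLoginDetails, externalCall and doSomeChecks return promises (e.g. after bluebird's Promise.promisifyAll); that is an assumption, not the asker's actual API:

app.use('/fetchData', function(req, res) {
    var user = { /* ... */ }; // the same elided user object as in the question
    database.getLoginDetails(user)                 // 1) login details from the db
        .then(function(details) {
            return server.externalCall(details);   // 2) log in, fetch raw data
        })
        .then(function(rawData) {
            return database.doSomeChecks(rawData); // 3) checks back in the db
        })
        .then(function(checked) {
            res.json(checked);                     // 4) finally return the data
        })
        .catch(function(err) {
            res.status(500).send(err.message);     // any step's failure lands here
        });
});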
I hope that helps.

How to pass data around from an asynchronous function in NodeJS, using unconventional methods?

I've been playing with jsdom, which is a module for node.js. The following code at the bottom is from their documentation page. My problem is how to return something from asynchronous functions.
I know that this is a question that is asked a lot, probably by me as well. I also know that callbacks are a good friend when it comes to these kinds of problems. My goal here is to find a workaround which might act like a cookie or a session variable in PHP, in order to transfer that little bit of data to the outer scope, outside the asynchronous function. Then it should be accessible once the data is set from the outer scope.
The first thing I want to know is:
Is there already a way to store data somewhere, like a cookie or session, that exists in the outer scope and is accessible once I've done what I had to do?
If I were to write the data to a file at point B in the code and read it at point C, wouldn't I have to write some sort of timeout function that waits a few seconds before reading the file? My experience working with asynchronous functions in nodejs has sometimes shown that I had to wait a few seconds for the writing process to finish before trying to read the result. Would this be the case here too? If yes, wouldn't the same happen even if the data were saved in memory?
If I were to write a C++ plugin for this purpose that acted as a separate data bay, where we could save(data) at point B to memory and retrieve(data) at point C from memory, would this work?
Honestly, I do not like writing temporary files to work around asynchronous functions. I have been looking for a simple yet effective way to pass data around, but I need some guidance from experienced programmers like you to steer past the unnecessary approaches to this problem.
If you could toss around some ideas for me, stating what might work and what might not work, I'd appreciate it.
Here's the example code:
// Print all of the news items on hackernews
var jsdom = require("jsdom");

// var result;
// A) Outer Scope: Since the done function is async, storing the result here
//    and echoing it at point C is pointless.

jsdom.env({
    html: "http://news.ycombinator.com/",
    scripts: ["http://code.jquery.com/jquery.js"],
    done: function (errors, window) {
        var $ = window.$;
        console.log("HN Links");
        $("td.title:not(:last) a").each(function() {
            console.log(" -", $(this).text());
        });
        // B) let's say I want to return something I've scavenged here.
        // result = $("a");
    }
});

// C)
// console.log(result)
You need to clear your head of your synchronous experience, which tells you that code lower in the file happens later in time. It does not necessarily work that way in node, ever. Here's the deal: in node, you place orders like at a restaurant, and you don't do it like this:
1. Order a salad
2. Wait 11 minutes
3. Eat the salad
You do it like this
1. Order a salad
2. Wait for the server to serve you the salad
3. Eat the salad
The first example is a race condition and a terrible bug in your program that will cause either the salad to wait around for no reason, or you to try to eat a salad that isn't there yet.
Don't think "I want to return something here"; think "this data is ready". So you can have:
function eatSalad() {...}
placeOrder("salad", eatSalad);
Here eatSalad is the callback for the placeOrder routine, which does the necessary I/O to get the salad. Notice how even though eatSalad is earlier in the file, it happens later chronologically. You don't return things; you invoke callbacks with data you have prepared.
Here's your snippet made asynchronous.
// Print all of the news items on hackernews
var jsdom = require("jsdom");

// var result;
// A) Outer Scope: Since the done function is async, storing the result here
//    and echoing it at point C is pointless.

jsdom.env({
    html: "http://news.ycombinator.com/",
    scripts: ["http://code.jquery.com/jquery.js"],
    done: function (errors, window) {
        var $ = window.$;
        console.log("HN Links");
        $("td.title:not(:last) a").each(function() {
            console.log(" -", $(this).text());
        });
        // B) let's say I want to return something I've scavenged here.
        // result = $("a");
        resultIsReady($("a"));
    }
});

function resultIsReady(element) {
    console.log(element);
}
EDIT TO ADD: To answer your question from the comments, node code will generally be built up not from functions that return things, but from functions that invoke callback functions with their "return value". The return keyword is only used to actually return a value in in-memory code that does not do any I/O. So finding the mean of an in-memory array can just return it, but finding the mean from a database result set must invoke a callback function. The basic paradigm is to build up your programs from functions like this (pseudocode DB library):
function getUser(email, callback) {
    db.users.where({email: email}, function (error, user) {
        if (error) {
            // This return statement is just for early termination
            // of the function. The value returned is discarded.
            return callback(error);
        }
        callback(null, user);
    });
}
So that's how you do things. Typically, functions like this make a very limited number of IO calls (1 or 2 are common; beyond that you start falling into nesting hell and need to refactor).
I personally write lots of functions like that and then use the async library to describe the higher-order sequence of things that needs to happen. There are lots of other popular flow control libraries as well, and some people like the promises pattern. However, at the moment some of the core node community members seem to be advocating callbacks as the one true way, and promises seem to be falling out of favor.
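As a hedged sketch of what that higher-order sequencing can look like with the async library, building on getUser above (getLatestPosts is a hypothetical companion function following the same error-first callback convention):

var async = require('async');

async.waterfall([
    function (callback) {
        getUser("bob@bob.com", callback);   // 1) look up the user
    },
    function (user, callback) {
        getLatestPosts(user, callback);     // 2) then fetch that user's posts
    }
], function (error, posts) {
    if (error) {
        return console.error(error);        // any step's error short-circuits here
    }
    console.log(posts);
});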
Avoid using synchronous code anywhere it could block execution: database operations, file IO, network data retrieval, long calculations, etc.
In your example, use a callback when you finish your computation to continue with the execution. You could also take a look at the async library (https://npmjs.org/package/async), which is the de facto standard for hairier call sequences:
function sendData(result) {
    res.json(result);
}

var jsdom = require("jsdom");

// var result;
// A) Outer Scope: Since the done function is async, storing the result here
//    and echoing it at point C is pointless.

jsdom.env({
    html: "http://news.ycombinator.com/",
    scripts: ["http://code.jquery.com/jquery.js"],
    done: function (errors, window) {
        var $ = window.$;
        console.log("HN Links");
        $("td.title:not(:last) a").each(function () {
            console.log(" -", $(this).text());
        });
        // B) let's say I want to return something I've scavenged here.
        var result = $("a");
        sendData(result);
    }
});

NodeJS and asynchronous hell

I just came to this awful situation where I have an array of strings, each representing a possibly existing file (e.g. var files = ['file1', 'file2', 'file3']). I need to loop through these file names and check whether each exists in the current directory, and if one does, stop looping and forget the rest of the remaining files. So basically I want to find the first existing file of those, and fall back to a hard-coded message if nothing was found.
This is what I currently have:
var found = false;
files.forEach(function(file) {
    if (found) return false;
    fs.readFileSync(path + file, function(err, data) {
        if (err) return;
        found = true;
        continueWithStuff();
    });
});
if (found === false) {
    // Handle this scenario.
}
This is bad. It's blocking (readFileSync), thus it's slow.
I can't just supply callback methods to fs.readFile; it's not that simple, because I need to take the first found item... and the callbacks may be called in any random order. I think one way would be to have a callback that increments a counter and keeps a list of found/not-found information, and when the counter reaches files.length, it checks through that information and decides what to do next.
This is painful. I do see the performance greatness in evented IO, but this is unacceptable. What choices do I have?
Don't use sync stuff in a normal server environment -- things are single threaded, and this will completely lock things up while it waits for the results of this IO-bound loop. CLI utility = probably fine; server = only okay on startup.
A common library for asynchronous flow control is
https://github.com/caolan/async
async.filter(['file1', 'file2', 'file3'], path.exists, function(results) {
    // results now equals an array of the existing files
});
And if you want to, say, avoid the extra calls to path.exists, then you could pretty easily write a function 'first' that performs the operations until some test succeeds. Similar to https://github.com/caolan/async#until - but you're interested in the output.
The async library is absolutely what you are looking for. It provides pretty much all the types of iteration you'd want, in a nice asynchronous way. You don't have to write your own 'first' function, though; async already provides a 'some' function that does exactly that.
https://github.com/caolan/async#some
async.some(files, path.exists, function(result) {
    if (result) {
        continueWithStuff();
    }
    else {
        // Handle this scenario
    }
});
If you or someone reading this in the future doesn't want to use async, you can also write your own basic version of 'some':
function some(arr, func, cb) {
    var count = arr.length - 1;
    (function loop() {
        if (count == -1) {
            return cb(false);
        }
        func(arr[count--], function(result) {
            if (result) cb(true);
            else loop();
        });
    })();
}

some(files, path.exists, function(found) {
    if (found) {
        continueWithStuff();
    }
    else {
        // Handle this scenario
    }
});
You can do this without third-party libraries by using a recursive function. Pass it the array of filenames and an index, initially set to zero. The function should check for the existence of the file name at that index in the array, and in its callback it should either do the other stuff (if the file exists) or increment the index and call itself (if the file doesn't exist).
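For illustration, a minimal sketch of that recursion (using fs.access to test for existence, since path.exists from the older answers has long since been removed from node; continueWithStuff and the file list are the asker's, the fallback message is a stand-in):

var fs = require('fs');

// Checks files[index]; calls onFound with the first existing name (in order),
// otherwise advances the index and recurses; calls onNone if none exist.
function firstExisting(files, index, onFound, onNone) {
    if (index >= files.length) {
        return onNone();
    }
    fs.access(files[index], function(err) {
        if (err) {
            return firstExisting(files, index + 1, onFound, onNone);
        }
        onFound(files[index]);
    });
}

firstExisting(['file1', 'file2', 'file3'], 0, continueWithStuff, function() {
    console.log('No file found'); // the hard-coded fallback message
});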
Use async.waterfall for controlling async calls in node.js: include the async library and use its waterfall call:
var async = require('async');
async.waterfall([
    function(callback) {
        callback(null, taskFirst(rootRequest, rootRequestFrom, rootRequestTo, callback, res));
    },
    function(arg1, callback) {
        if (arg1 !== undefined) {
            callback(null, taskSecond(arg1, rootRequest, rootRequestFrom, rootRequestTo, callback, res));
        }
    }
]);
(Edit: removed sync suggestion because it's not a good idea, and we wouldn't want anyone to copy/paste it and use it in production code, would we?)
If you insist on using async stuff, I think a simpler way to implement this than what you described is to do the following:
var path = require('path'), fileCounter = 0;

function existCB(fileExists) {
    if (fileExists) {
        global.fileExists = fileCounter;
        continueWithStuff();
        return;
    }
    fileCounter++;
    if (fileCounter >= files.length) {
        // none of the files exist, handle stuff
        return;
    }
    path.exists(files[fileCounter], existCB);
}

path.exists(files[0], existCB);
