NodeJS: Asynchronous map for blocking jobs


Let's take an example where I have a huge array whose elements are stringified JSON. I want to iterate over this array and convert each string to an object using JSON.parse (which blocks the event loop).
var arr = ["{...}", "{...}", ... ] //input array
Here is the first approach (may keep the event loop blocked for some time):
var newArr = arr.map(function(val){
  try{
    var obj = JSON.parse(val);
    return obj;
  }
  catch(err){return {};}
});
The second approach uses the async.map method (will this be more efficient than the first approach?):
var newArr = [];
async.map(arr,
  function(val, done){
    try{
      var obj = JSON.parse(val);
      done(null, obj);
    }
    catch(err){done(null, {});}
  },
  function(err, results){
    if(!err)
      newArr = results;
  }
);
If the second approach is the same or almost the same, then what is an efficient way of doing this in Node.js?
I came across child processes, will this be a good approach for this problem?

I don't think async.map guarantees a non-blocking handling of a sync function. Though it wraps your function with an asyncify function, I can't find anything in that code that actually makes it non-blocking. It's one of the problems I've encountered with async in the past (but maybe it's improved now)
You could definitely handroll your own solution with child processes, but it might be easier to use something like https://github.com/audreyt/node-webworker-threads
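As a rough sketch of moving the parsing off the main thread, here is a version that uses Node's built-in worker_threads module instead of the library linked above (that swap is my choice, not something from this answer). parseInWorker is a hypothetical helper name, and the worker source is inlined as a string only to keep the example in one file:

```javascript
const { Worker } = require('worker_threads');

// Code that runs inside the worker thread. It parses every string it
// receives and falls back to {} on invalid JSON, as in the question.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const parsed = workerData.map((s) => {
    try { return JSON.parse(s); } catch (err) { return {}; }
  });
  parentPort.postMessage(parsed);
`;

// Hypothetical helper: resolves with the parsed array once the worker is done.
function parseInWorker(strings) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: strings });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

parseInWorker(['{"a":1}', 'not json']).then((result) => {
  console.log(result.length); // 2
});
```

Keep in mind that the input and the results are copied between threads, so whether this is a net win for a given payload is something to measure rather than assume.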

Use async.map, but invoke the callback through setImmediate, e.g. setImmediate(done, null, obj), instead of calling done directly.
I find the async functions quite convenient but not very efficient; if the mapped computation is very fast, calling done via setImmediate only once every 10 times and calling it directly otherwise will run visibly faster. (The setImmediate breaks up the call stack and yields to the event loop, but the setImmediate overhead is non-negligible.)
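The same idea can be sketched without the async library: parse the array in chunks and yield to the event loop with setImmediate between chunks. parseInChunks and chunkSize are hypothetical names, and the default of 1000 is an arbitrary starting point, not a measured value:

```javascript
// Hypothetical helper: parses `strings` in chunks of `chunkSize`,
// yielding to the event loop between chunks.
function parseInChunks(strings, chunkSize = 1000) {
  return new Promise((resolve) => {
    const results = new Array(strings.length);
    let index = 0;

    function processChunk() {
      const end = Math.min(index + chunkSize, strings.length);
      for (; index < end; index++) {
        try {
          results[index] = JSON.parse(strings[index]);
        } catch (err) {
          results[index] = {}; // same fallback as in the question
        }
      }
      if (index < strings.length) {
        setImmediate(processChunk); // let pending I/O callbacks run first
      } else {
        resolve(results);
      }
    }

    processChunk();
  });
}

parseInChunks(['{"a":1}', 'oops', '[1,2]'], 2).then((parsed) => {
  console.log(parsed.length); // 3
});
```

Each chunk still blocks while it runs, so this only bounds how long the event loop is held at a time; it does not make the total parsing work any cheaper.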

Related

Cycling through a list with async call inside

I have an array of IDs. I need to iterate over them, make an async call for each ID to retrieve a value from the DB, and then sum all the values gathered. I did something like this:
let quantity = 0;
for (const id of [1,2,3,4]) {
  const subQuantity = await getSubQuantityById(id);
  quantity += subQuantity;
}
Is there a more elegant and concise way to write this for loop in JavaScript?

It is totally fine because your case includes an async operation. Using forEach instead is not possible here at all.
Your for loop is perfectly clean. If you want to make it shorter you could even do:
let totalQuantity = 0;
for (const id of arrayOfIds) {
  totalQuantity += await getSubQuantityById(id);
}
Your version as-is may even be clearer than using += await as above.
Naming could be improved as suggested.
I find the following one liner suggested in comments more cryptic/dirty:
(await Promise.all([1,2,3,4].map(id => getSubQuantityById(id)))).reduce((p, c) => p + c, 0)
Edit: Props to @vitaly-t, who points out that using Promise.all the way this one-liner does results in uncontrolled concurrency and can lead to trouble in the context of a database.
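A middle ground between the sequential loop and an unbounded Promise.all is to run the calls in small batches. The following is only a sketch: sumInBatches and batchSize are hypothetical names, and getOne stands in for getSubQuantityById:

```javascript
// Hypothetical helper: sums the results of getOne(id) for all ids,
// with at most `batchSize` calls in flight at any time.
async function sumInBatches(ids, getOne, batchSize = 2) {
  let total = 0;
  for (let start = 0; start < ids.length; start += batchSize) {
    const batch = ids.slice(start, start + batchSize);
    // Calls within a batch run concurrently; batches run one after another.
    const values = await Promise.all(batch.map((id) => getOne(id)));
    total += values.reduce((sum, value) => sum + value, 0);
  }
  return total;
}

// Stand-in for a DB call, only for trying the helper out.
const fakeGet = (id) => new Promise((resolve) => setTimeout(() => resolve(id * 10), 5));

sumInBatches([1, 2, 3, 4], fakeGet).then(console.log); // 100
```

A batch waits for its slowest call before the next batch starts, so this is simpler but less efficient than a true sliding-window limiter.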

I can't follow @vitaly-t's argument that concurrent database queries will cause "problems" - at least not when we are talking about simple queries and there is a "moderate" number of these queries.
Here is my version of doing the summation. Obviously, the console.log in the last .then() needs to be replaced by the actual action that needs to happen with the calculated result.
// a placeholder function for testing:
function getSubQuantityById(i){
  return fetch("https://jsonplaceholder.typicode.com/users/"+i).then(r=>r.json()).then(u=>+u.address.geo.lat);
}
Promise.all([1,2,3,4].map(id => getSubQuantityById(id)))
  .then(d=>d.reduce((p, c) => p + c,0))
  .then(console.log)

Is there a more elegant and concise way to write this for loop in JavaScript?
Certainly, by processing your input as an iterable. The solution below uses iter-ops library:
import {pipeAsync, map, wait, reduce} from 'iter-ops';
const i = pipeAsync(
  [1, 2, 3, 4], // your list of id-s
  map(getSubQuantityById), // remap ids into async requests
  wait(), // resolve requests
  reduce((a, c) => a + c) // calculate the sum
); //=> AsyncIterableExt<number>
Testing the iterable:
(async function () {
  console.log(await i.first); //=> the sum
})();
It is elegant, because you can inject more processing logic right into the iteration pipeline, and the code will remain very easy to read. Also, it is lazy-executing, initiates only when iterated.
Perhaps even more importantly, such a solution lets you control concurrency, to avoid producing too many requests against the database. And you can fine-tune concurrency, by replacing wait with waitRace.
P.S. I'm the author of iter-ops.

how to 'mark' an Object for garbage collection in NodeJS

I have a recursive function:
const fn = async (val) => {
  let obj = await make_some_api_call();
  // example response {a: 'apple', b : 'bbb', c: 'ccc', d: 'ddd'};
  if ('a' in obj) {
    const var1 = obj.a;
    obj = null;
    return fn(var1);
  }
}
I want the object obj to be gc'ed after each run.
I have an object property value assigned to a local variable (var1); will setting obj=null force it to be gc'ed in the next cycle?
Meaning, after each run, will the object obj get gc'ed?
If not, how can I achieve this?

Since some commenters have missed the fact that this is potentially recursive code, I want to point out that this answer is written in that recursive context. If it wasn't recursive, then setting obj = null would not be necessary because the variable obj would immediately be eligible for garbage collection anyway as soon as the function returned.
will setting obj=null force it to be gc'ed in next cycle?
Assuming that no other code such as code inside of await make_some_api_call(); has any persistent references to obj, then setting obj = null will clear your one reference to that variable and will make it "eligible" for garbage collection at the point in time where nodejs next runs a garbage collection cycle.
That may or may not be "after each run". You don't really describe what "after each run" means as it pertains to your code, but in any case, when the garbage collector runs is not precisely defined.
Nodejs will run a GC cycle when it thinks it needs to and when it appears to have time to. Garbage collection can sometimes be a bit lazy. If you're busy doing a lot of things, then nodejs will attempt to not get in the way of what your code is doing just to run the GC. It will try to wait until the event loop isn't really doing anything. There are exceptions to this, but to avoid impacting run-time performance, it looks for opportunities to run GC when the event loop is idle.
In your recursive code, you do have an await so assuming that takes some amount of time to resolve its promise, then that next await could be an opportunity for nodejs to run the GC cycle to clean up the obj from the prior recursive call.
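To make "eligible, but not necessarily collected yet" concrete, here is a small sketch that watches an object through a WeakRef, which does not keep its target alive. The object shape and the one-second delay are assumptions for illustration only:

```javascript
// An object shaped like the example API response.
let obj = { a: 'apple', b: 'bbb' };
const weak = new WeakRef(obj); // observes obj without keeping it alive
const var1 = obj.a;            // copying the string does not retain obj

console.log(weak.deref() === obj); // true: a strong reference still exists

obj = null; // the only strong reference is gone; the object is now eligible for GC

setTimeout(() => {
  // Whether the collector has run by now is up to the engine,
  // so either message may be printed.
  const state = weak.deref() === undefined ? 'collected' : 'not collected yet';
  console.log(state, '- var1 is still', var1);
}, 1000);
```

On a mostly idle process with little memory pressure, it is quite plausible to see "not collected yet", which matches the point above that the timing of the collector is not precisely defined.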
I should also point out that code like this can also be written with some sort of loop instead of using recursion and that sometimes simplifies things. For one, it prevents stack-build up. Since there is no complicated local function context or lots of function arguments, this could easily be turned into a while(more) kind of loop with the await and some sort of condition test inside the loop that either sets the more flag or uses break or return to stop the looping when done. If this may recurse many times, just avoiding the stack build-up (which also includes a promise for each async function called recursively) could be beneficial.
Here's some similar pseudo-code that avoids the recursion and automatically reuses the obj variable (freeing the reference to the prior object so it is available for GC):
const fn = async (val) => {
  // value for the first one comes from the function argument,
  // subsequent iterations get the value from the prior call
  let var1 = val;
  let obj;
  while (true) {
    obj = await make_some_api_call(var1);
    if (!('a' in obj)) {
      // all done
      break;
    }
    // run again using obj.a
    var1 = obj.a;
  }
}

JavaScript for loop not executed although valid array provided

I'm having trouble finding the reason why a JavaScript for loop is not executing. I wrote the 2 simple functions below; both methods should try to "simulate" synchronous code.
The problem is that for some reason the for loop in the addNodes() method is never executed. However, if I run this separately i.e. for example line by line
var result = [];
var addressBookNodes = await generateAddressBooksNodes();
addNodes(result, addressBookNodes);
it works, which tells me that the code itself is fine and that the problem most likely has something to do with the asynchronous nature of the generateAddressBooksNodes method. If I simply run the command:
var addressBookNodes = await generateAddressBooksNodes();
in the browser, I get an array of objects, exactly what I was expecting. Basically, generateAddressBooksNodes returns a promise, and when that promise is resolved I can see the correct array returned. What I do not understand is why the for loop is not executed if the nodes object has at least one element, as shown in the picture below.
function addNodes(result, nodes){
  console.log("3");
  console.log(nodes);
  for (var num in nodes) {
    console.log("4");
    let singleNode = nodes[num];
    console.log(singleNode);
    console.log("5");
    result.push(singleNode);
  }
}
async function getAddressBookAndContactNodes() {
  var result = [];
  console.log("1");
  var addressBookNodesPromise = generateAddressBooksNodes();
  addressBookNodesPromise.then( (arrayOfAddressBookNodes) => {
    console.log("2");
    addNodes(result, arrayOfAddressBookNodes);
  })
  return result;
}
Update 26 August 2020 :
After poking around the "arrayOfAddressBookNodes" object I noticed something really strange. I added an additional print statement and printed the length of the "arrayOfAddressBookNodes" array. The length of the array is 0 when run in the function. I do not understand how the length can be 0 if the object is printed shortly before the for loop and, as shown in the picture below, the length there is 1. What the hell is going on here?
I found another article, "JavaScript Array.length returning 0", that basically explains this. One of the comments suggested using a Map instead of an Array. I decided to use a Set and still get the same error, i.e. the size of the set is 0 although the Set contains an object. Below is the code and the picture of that execution.
async function getAddressBookAndContactNodes() {
  var result = new Set();
  console.log("1");
  var addressBookNodes = await generateAddressBooksNodes();
  console.log("2");
  console.log(addressBookNodes);
  console.log("3");
  console.log("addressBookNodes size : " + addressBookNodes.size);
  addressBookNodes.forEach(function(value) {
    result.add(value);
  });
  console.log("6");
  console.log(result);
  console.log("7");
  return result;
}
example using set
All this is really confusing to someone with a C++ background; it makes my head explode.
Update 2: 26 August 2020.
OK, I solved my problem. The problem was that the promises are not working within the for loop; everything is explained here.
I need to use the regular "for (index = 0; index < contactsArray.length; ++index)" instead of forEach. After that it all worked. Somehow this leaves the impression that the tools of the language are broken in so many ways.

If generateAddressBooksNodes returns a promise, you can use await to wait for the result:
async function getAddressBookAndContactNodes() {
  var result = [];
  console.log("1");
  // Note the await here. Also, unless you use the array again later in your code,
  // you can save space by not assigning it to a variable and simply returning
  // the result of generateAddressBooksNodes().
  var arrayOfAddressBookNodes = await generateAddressBooksNodes();
  addNodes(result, arrayOfAddressBookNodes);
  return result;
}
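The forEach behaviour mentioned in the question's second update can be reproduced in isolation. In this sketch, fetchNode is a hypothetical stand-in for any promise-returning call; forEach does not wait for the promises its callback returns, while a for...of loop with await does:

```javascript
// Hypothetical stand-in for an asynchronous lookup.
const fetchNode = (id) =>
  new Promise((resolve) => setTimeout(() => resolve({ id }), 10));

async function collectWithForEach(ids) {
  const result = [];
  // forEach ignores the promise returned by the async callback,
  // so this function returns before any push has happened.
  ids.forEach(async (id) => {
    result.push(await fetchNode(id));
  });
  return result.length; // length at the moment of returning
}

async function collectWithForOf(ids) {
  const result = [];
  for (const id of ids) {
    result.push(await fetchNode(id)); // the loop pauses until this resolves
  }
  return result.length;
}

collectWithForEach([1, 2, 3]).then((n) => console.log('forEach:', n)); // forEach: 0
collectWithForOf([1, 2, 3]).then((n) => console.log('for...of:', n));  // for...of: 3
```

The same timing explains the empty result in the original code: return result ran before the .then callback had a chance to call addNodes.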

javascript: Only return if not false

Scenario: I'm searching for a specific object in a deep object. I'm using a recursive function that goes through the children and asks them if I'm searching for them or if I'm searching for their children or grandchildren and so on. When found, the found obj will be returned, else false. Basically this:
obj.find = function (match_id) {
  if (this.id == match_id) return this;
  for (var i = 0; i < this.length; i++) {
    var result = this[i].find(match_id);
    if (result !== false) return result;
  };
  return false;
}
I'm wondering, is there something simpler than this?:
var result = this[i].find(match_id);
if (result) return result;
It annoys me to store the result in a variable (on each level!); I just want to check that it's not false and return the result. I also considered the following, but dislike it even more for obvious reasons.
if (this[i].find(match_id)) return this[i].find(match_id);
Btw I'm also wondering, is this approach even "recursive"? it isn't really calling itself that much...
Thank you very much.
[edit]
There is another possibility by using another function check_find (which just returns only true if found) in the if statement. In some really complicated cases (e.g. where you don't just find the object, but also alter it) this might be the best approach. Or am I wrong? D:

Although the solution you have is probably "best" as far as search algorithms go, and I wouldn't necessarily suggest changing it (or I would change it to use a lookup map instead of a search), the question is interesting to me, especially relating to the functional properties of the JavaScript language, and I would like to provide some thoughts.
Method 1
The following should work without having to explicitly declare variables within a function, although they are used as function arguments instead. It's also quite succinct, although a little terse.
var map = Function.prototype.call.bind(Array.prototype.map);
obj.find = function find(match_id) {
  return this.id == match_id ? this : map(this, function(u) {
    return find.call(u, match_id);
  }).filter(function(u) { return u; })[0];
};
How it works:
We test to see if this.id == match_id, if so, return this.
We use map (via Array.prototype.map) to convert this to an array of "found items", which are found using the recursive call to the find method. (Supposedly, one of these recursive calls will return our answer. The ones which don't result in an answer will return undefined.)
We filter the "found items" array so that any undefined results in the array are removed.
We return the first item in the array, and call it quits.
If there is no first item in the array, undefined will be returned.
Method 2
Another attempt to solve this problem could look like this:
var concat = Function.prototype.call.bind(Array.prototype.concat),
    map = Function.prototype.call.bind(Array.prototype.map);
obj.find = function find(match_id) {
  return (function buildObjArray(o) {
    return concat([ o ], map(o, buildObjArray));
  })(this).filter(function(u) { return u.id == match_id })[0];
};
How it works:
buildObjArray builds a single, big, 1-dimensional array containing obj and all of obj's children.
Then we filter based on the criteria that an object in the array must have an id of match_id.
We return the first match.
Both Method 1 and Method 2, while interesting, have the performance disadvantage that they will continue to search even after they've found a matching id. They don't realize they have what they need until the end of the search, and this is not very efficient.
Method 3
It is certainly possible to improve the efficiency, and now I think this one really gets close to what you were interested in.
var forEach = Function.prototype.call.bind(Array.prototype.forEach);
obj.find = function(match_id) {
  try {
    // Inside a plain function call `this` is not the current node,
    // so the node is passed in explicitly and thrown when it matches.
    (function find(node) {
      if (node.id == match_id) throw node;
      forEach(node, find);
    })(this);
  } catch(found) {
    return found;
  }
};
How it works:
We wrap the whole find function in a try/catch block so that once an item is found, we can throw and stop execution.
We create an internal find function (IIFE) inside the try, which we reference to make recursive calls, and start it on this.
If node.id == match_id, we throw node, stopping our search algorithm.
If it doesn't match, we recursively call find on each child.
If it did match, the throw is caught by our catch block, and the found object is returned.
Since this algorithm is able to stop execution once the object is found, it would be close in performance to yours, although it still has the overhead of the try/catch block (which on old browsers can be expensive) and forEach is slower than a typical for loop. Still these are very small performance losses.
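For comparison, an early exit is also possible without exceptions by walking the tree lazily with a generator. This is just a sketch under the same assumption as the question (each node is array-like and has an id); walk and findById are hypothetical names:

```javascript
// Lazily yields a node and then all of its descendants, depth first.
function* walk(node) {
  yield node;
  for (let i = 0; i < node.length; i++) {
    yield* walk(node[i]);
  }
}

// Returns the first node whose id matches, or false as in the question.
function findById(root, matchId) {
  for (const node of walk(root)) {
    // Returning here abandons the generator, so the rest of the tree is never visited.
    if (node.id == matchId) return node;
  }
  return false;
}

// Tiny tree for trying it out: arrays that also carry an id.
const leaf = Object.assign([], { id: 3 });
const middle = Object.assign([leaf], { id: 2 });
const root = Object.assign([middle], { id: 1 });

console.log(findById(root, 3) === leaf); // true
console.log(findById(root, 99));         // false
```

Generators add their own overhead, so this is about readability rather than speed.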
Method 4
Finally, although this method does not fit the confines of your request, it performs much, much better if it is possible in your application, and it is something to think about. We rely on a map of ids that maps to objects. It would look something like this:
// Declare a map object.
var map = { };
// ...
// Whenever you add a child to an object...
obj[0] = new MyObject();
// .. also store it in the map.
map[obj[0].id] = obj[0];
// ...
// Whenever you want to find the object with a specific id, refer to the map:
console.log(map[match_id]); // <- This is the "found" object.
This way, no find method is needed at all!
The performance gains in your application by using this method will be HUGE. Please seriously consider it, if at all possible.
However, be careful to remove the object from the map whenever you will no longer be referencing that object.
delete map[obj.id];
This is necessary to prevent memory leaks.
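The bookkeeping for Method 4 can be kept in one place. Below is a minimal sketch using an ES6 Map; registry, addChild and removeChild are hypothetical names, and nodes are assumed to be arrays carrying an id property:

```javascript
// id -> node lookup table
const registry = new Map();

// Adds `child` under `parent` and records it in the registry.
function addChild(parent, child) {
  parent.push(child);
  registry.set(child.id, child);
}

// Removes `child` from `parent` and drops it from the registry.
// Note: descendants of `child` are not unregistered by this sketch.
function removeChild(parent, child) {
  const index = parent.indexOf(child);
  if (index !== -1) parent.splice(index, 1);
  registry.delete(child.id);
}

const root = Object.assign([], { id: 'root' });
const child = Object.assign([], { id: 'child-1' });

addChild(root, child);
console.log(registry.get('child-1') === child); // true

removeChild(root, child);
console.log(registry.has('child-1')); // false
```

Routing every add and remove through helpers like these makes it harder to forget the cleanup step that prevents the leak described above.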

No, there is no other clear way. Storing the result in a variable isn't that much trouble; actually, this is what variables are used for.
Yes, that approach is recursive:
you have the base case: if (this.id==match_id) return this
you have the recursive step, in which obj.find(match_id) calls itself: var result = this[i].find(match_id);

I don't see any reason why storing the variable would be bad. It's not a copy, but a reference, so it's efficient. Plus, the temporary variable is the only way that I can see right now (I may be wrong, though).
With that in mind, I don't think that a method check_find would make very much sense (it's most probably basically the same implementation), so if you really need this check_find method, I'd implement it as
return this.find(match_id) !== false;
Whether the method is recursive is hard to say.
Basically, I'd say yes, as the implementations of 'find' are all the same for every object, so it's pretty much the same as
function find(obj, match_id) {
  if (obj.id == match_id) return obj;
  for (var i = 0; i < obj.length; ++i) {
    var result = find(obj[i], match_id);
    if (result !== false) return result;
  }
  return false; // nothing found in this subtree
}
which is definitely recursive (the function calls itself).
However, if you'd do
onesingleobjectinmydeepobject.find = function(x) { return this; }
I'm not quite sure, if you still would call this recursive.

ES5: Returning parent function once all items have called back with a forEach

I'm looking at using ES5's new Array.forEach(). I believe the callbacks for each item should execute in parallel. I'd like a function to return once all values have processed.
// I'd like to make parent_function return a new array with the changed items.
function parent_function(array_of_things) {
  new_array = []
  array_of_things.forEach( function(thing) {
    new_array.push('New '+thing)
    if ( new_array.length === array_of_things.length ) {
      // OK we've processed all the items. How do I make parent_function return the completed new_array?
      return new_array
    }
  })
}
I suspect that this may be the wrong approach: because the parent_function may finish before every callback has been fired. In this case, firing a custom event (that's handled by whatever needs the data produced after we've processed all the items) may be the best approach. However I didn't find any mention of that in the MDC docs.
But I'm fairly new to JS and would love some peer review to confirm!
Edit: Looks like forEach() is synchronous after all. I was logging the resulting array afterwards and seeing inconsistent results - sometimes two items, sometimes four, etc, which made me think it was async. If you're reading this and having similar behaviour, I suggest you read this item about issues in Chrome's array logging.

The forEach() call is synchronous, you can just return the new Array when the forEach() is finished:
function parent_function(array_of_things) {
  var new_array = [];
  array_of_things.forEach(function(thing) {
    new_array.push('New '+thing);
  });
  return new_array;
}

Nope, all these iterator functions are synchronous. That means the callbacks are executed one after the other, each with the next item, and in the end forEach() returns (and returns undefined).
If you want a new_array with the results of each callback invocation, have a look at map().
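As a sketch of that suggestion, parent_function from the question shrinks to a single map() call, which returns the new array directly:

```javascript
// map() runs the callback synchronously for every item and
// collects the return values into a new array.
function parent_function(array_of_things) {
  return array_of_things.map(function (thing) {
    return 'New ' + thing;
  });
}

console.log(parent_function(['car', 'bike'])); // [ 'New car', 'New bike' ]
```

The input array is left untouched; map() always produces a fresh array of the same length.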
