I have a NodeJS application which I've built around dependency injection. The app can run any combination of its functions (modules) at the same time, and any modules that request data from the same async resource will instead share a single request. So if module A and module B both need data from https://example.com/path/to/resource and I run both, then that request gets made once and the result is handed off to both modules. Great.
But what if module C only wants part of the data from the same resource? If I run modules A-C, C can just await the full data and filter the result, but that's wasteful when I only want to run module C. Conversely, if I have module C request path + ?filter=foo, that's efficient when only module C is run, but wasteful if A-C are run together, as two requests would be made where one returns a superset of the other's data.
I sort of see a way to create an optimizing technique around this, but I'm somewhat new to dependency injection and I'm afraid I'll end up creating an anti-pattern or something convoluted and hard to maintain. What's the best approach to handling this?
EDIT
To clarify, in an ideal solution, that example would only have three possible flows and request one of two possible URLs depending on the set of modules being run:
Set = {A}, {B}, or {A, B}. We request /path/to/resource and pass the result as is to the modules.
Set = {C}. We request /path/to/resource?filter=foo and pass the result as is to the module.
Set = {C, A}, {C, B}, or {C, A, B}. We request /path/to/resource. To modules A/B, we pass the result as is. But to module C, we first process the result for foo before passing it along.
This way, C never needs to be aware of what the requested URL was, no data is ever wasted, and no unnecessary CPU cycles need to be burned.
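A rough sketch of what I imagine (createDataProvider, fetchJson, and filterForFoo are hypothetical placeholders, and the filter logic is only illustrative):

// Decide the URL once, based on the set of modules being run, and share one request.
function createDataProvider(moduleNames, fetchJson, filterForFoo) {
    const needsFullData = moduleNames.some((name) => name !== 'C');
    const url = needsFullData
        ? 'https://example.com/path/to/resource'
        : 'https://example.com/path/to/resource?filter=foo';
    const shared = fetchJson(url); // single shared promise for all modules

    return {
        // modules A and B receive the response as is
        full: () => shared,
        // module C gets the filtered view; filter locally only when the full set was fetched
        forC: () => shared.then((data) => (needsFullData ? filterForFoo(data) : data)),
    };
}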
I would use two Promise.all() calls:
1. one for all the requests whose results are not needed
2. one for all the requests whose results are needed
Assume modules is an array of the modules you want to execute, where each entry is an object of the shape:
{
    requiredResult: boolean;
    url: string;
}
function runCombination(modules = []) {
    const needResultPromises = [];
    const backgroundPromises = [];

    modules.forEach((eachModule) => {
        if (eachModule.requiredResult) {
            needResultPromises.push(RunModule(eachModule.url));
        } else {
            backgroundPromises.push(RunModule(eachModule.url));
        }
    });

    // fire-and-forget: these run in the background, nobody awaits them
    Promise.all(backgroundPromises);

    // only the promises whose results are needed are awaited by the caller
    return Promise.all(needResultPromises);
}
So in the function above, one Promise.all() runs in the background, while the Promise.all() that is returned is the one the caller waits on for results.
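For example, a hypothetical call (assuming RunModule(url) makes the request and returns a promise):

// Hypothetical module objects: the first needs its result, the second runs in the background.
const modules = [
    { requiredResult: true, url: 'https://example.com/path/to/resource' },
    { requiredResult: false, url: 'https://example.com/path/to/resource?filter=foo' },
];

runCombination(modules).then((results) => {
    // results only contains the resolved values of the requiredResult modules
    console.log(results);
});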
Hope this helps you.
Suppose I have a module implemented like this:
const dbLib = require('db-lib');
const dbConfig = require('db-config');
const dbInstance = dbLib.createInstance({
    host: dbConfig.host,
    database: dbConfig.database,
    user: dbConfig.user,
    password: dbConfig.password,
});
module.exports = dbInstance;
Here an instance of a database connection pool is created and exported. Now suppose db-instance.js is required several times throughout the app. Node.js should execute its code only once and always hand back the same single instance of the database pool. Is it safe to rely on this behavior of Node.js's require? I want to use it so that I don't need to implement dependency injection.
Every single file that you require in Node.js is a singleton.
Also, require is a synchronous operation and it works deterministically: it is predictable and will always behave the same way.
This behaviour is not specific to require. Node.js has only one thread in its event loop, and it works like this (slightly simplified):
Look if there is any task that it can do
Take the task
Run the task synchronously from the beginning to the end
If there are any asynchronous calls, it just pushes them to a "do later" queue, but it never starts them before the synchronous part of the task is done (unless a worker is spawned, but you don't need the details of that here)
Repeat the whole process
For example, imagine this code is the file infinite.js:
setTimeout(() => {
    while (true) {
        console.log('only this');
    }
}, 1000);

setTimeout(() => {
    while (true) {
        console.log('never this');
    }
}, 2000);

// illustrative condition: a synchronous loop that blocks for roughly 5000 ms
while (someConditionThatTakes5000Miliseconds) {
    console.log('requiring');
}
When you require this file, it first registers the first setTimeout in the "do later" queue as "to be resolved after 1000 ms", and the second as "to be resolved after 2000 ms" (note that this is not the same as "run after 1000 ms").
Then it runs the while loop for 5000 ms (assuming a condition like that exists) and nothing else happens in your code.
After 5000 ms the require is completed, the synchronous part is finished, and the event loop looks for a new task to do. The first one it sees is the setTimeout with the 1000 ms delay (once again: the 1000 ms only marks it as "can be taken by the event loop"; you don't know when it will actually run).
That callback contains a never-ending while loop, so you will see "only this" in the console. The second setTimeout will never be taken from the event loop: it is marked as "can be taken" after 2000 ms, but by then the event loop is already stuck in the never-ending while loop.
With this knowledge, you can use require (and other Node.js aspects) very confidently.
Conclusion: require is synchronous and deterministic. Once it finishes requiring a file (the output is an object with the methods and properties you export, or an empty object if you don't export anything), a reference to that object is saved in Node.js's module cache. When you require the file from somewhere else, Node.js first looks into that cache, and if it finds the entry there it just reuses the reference to the object and therefore never executes the module twice.
POC:
Create file infinite.js
const time = Date.now();

setTimeout(() => {
    let i = 0;
    console.log('Now I get stuck');
    while (true) {
        i++;
        if (i % 100000000 === 0) {
            console.log(i);
        }
    }
    console.log('Never reach this');
}, 1000);

setTimeout(() => {
    while (true) {
        console.log('never this');
    }
}, 2000);

console.log('Prepare for 5sec wait');
while (new Date() < new Date(time + 5 * 1000)) {
    // blocked
}
console.log('Done, lets allow other');
Then create server.js in the same folder with:
console.log('start program');
require('./infinite');
console.log('done with requiring');
Run it with node server.
This will be the output (with the numbers never ending):
start program
Prepare for 5sec wait
Done, lets allow other
done with requiring
Now I get stuck
100000000
200000000
300000000
400000000
500000000
600000000
700000000
800000000
900000000
The documentation of Node.js about modules explains:
Modules are cached after the first time they are loaded. This means (among other things) that every call to require('foo') will get exactly the same object returned, if it would resolve to the same file.
Multiple calls to require('foo') may not cause the module code to be executed multiple times.
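A quick way to see that caching in action (using the db-instance.js module from the question):

// Both calls resolve to the same file, so the module body executes once
// and both variables reference the exact same pool instance.
const a = require('./db-instance');
const b = require('./db-instance');
console.log(a === b); // true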
It is also worth mentioning the situations in which require produces singletons and those in which this goal cannot be reached (and why):
Modules are cached based on their resolved filename. Since modules may resolve to a different filename based on the location of the calling module (loading from node_modules folders), it is not a guarantee that require('foo') will always return the exact same object, if it would resolve to different files.
Additionally, on case-insensitive file systems or operating systems, different resolved filenames can point to the same file, but the cache will still treat them as different modules and will reload the file multiple times. For example, require('./foo') and require('./FOO') return two different objects, irrespective of whether or not ./foo and ./FOO are the same file.
To summarize, if your module name is unique inside the project then you'll always get a singleton. Otherwise, when there are two modules with the same name, require-ing that name in different places may produce different objects. To ensure they produce the same object (the desired singleton), you have to refer to the module in a way that resolves to the same file in both places.
You can use require.resolve() to find out the exact file that is resolved by a require statement.
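For example (the path shown is only illustrative):

// Prints the absolute filename that require would load, without executing the module.
console.log(require.resolve('./db-instance'));
// e.g. /home/me/project/db-instance.js

// Two require() calls yield the same instance only if both resolve to this same filename.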
At the moment, I'm trying to request a very large JSON object from an API (particularly this one) which, depending on various factors, can be upwards of a few MB. The problem, however, is that NodeJS takes forever to do anything and then just runs out of memory: the first line of my response callback never executes.
I could request each item individually, but that is a tremendous amount of requests. To quote a dev behind the new API:
Until now, if you wanted to get all the market orders for Tranquility you had to request every type per region individually. That would generally be 50+ regions multiplied by upwards of 13,000 types. Even if it was just 13,000 types and 50 regions, that is 650,000 requests required to get all the market information. And if you wanted to get all the data in the 5-minute cache window, it would require almost 2,200 requests per second.
Obviously, that is not a great idea.
I'm trying to get the array items into redis for use later, then follow the next url and repeat until the last page is reached. Is there any way to do this?
EDIT:
Here's the problem code. Visiting the URL works fine in-browser.
// ...
REGIONS.forEach((region) => {
    LOG.info(' * Grabbing data for `' + region.name + '#' + region.id + '`');
    var href = url + region.id + '/orders/all/', next = href;
    var page = 1;
    while (!!next) {
        https.get(next, (res) => {
            LOG.info(' * * Page ' + page++ + ' responded with ' + res.statusCode);
// ...
The first LOG.info line executes, while the second does not.
It appears that you are doing a while(!!next) loop which is the cause of your problem. If you show more of the server code, we could advise more precisely and even suggest a better way to code it.
Javascript runs your code single threaded. That means one thread of execution runs to completion before any other events can be run.
So, if you do:
while(!!next) {
    https.get(..., (res) => {
        // hoping this will run
    });
}
Then, your callback to http.get() will never get called. Your while loop just keeps running forever. As long as it is running, the callback from the https.get() can never get called. That request has likely long since completed and there's an event sitting in the internal JS event queue to call the callback, but until your while() loop finishes, that event can't get called. So you have a deadlock: the while() loop is waiting for something else to run to change its condition, but nothing else can run until the while() loop is done.
There are several other ways to do serial async iteration; in general, you can't use .forEach() or while() for it.
Here are several schemes for async looping (a rough sketch follows the links below):
Node.js: How do you handle callbacks in a loop?
While loop with jQuery async AJAX calls
How to synchronize a sequence of promises?
How to use after and each in conjunction to create a synchronous loop in underscore js
Or, the async library which you mentioned also has functions for doing async looping.
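As a rough sketch of the idea (getPage is a hypothetical helper that fetches one page and reports the next URL, or null when there are no more pages):

// Serial async iteration: only request the next page after the previous one has finished.
function fetchAllPages(firstUrl, getPage, done) {
    const results = [];

    function step(url) {
        if (!url) return done(null, results); // no more pages
        getPage(url, (err, body, nextUrl) => {
            if (err) return done(err);
            results.push(body);
            step(nextUrl); // recurse only after this page has completed
        });
    }

    step(firstUrl);
}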
First of all, a few MB of JSON payload is not exactly huge, so the route handler code might warrant some close scrutiny.
However, to actually deal with huge amounts of JSON, you can consume your request as a stream. JSONStream (along with many other similar libraries) allows you to do this in a memory-efficient way. You can specify the paths you need to process using JSONPath (an XPath analog for JSON) and then subscribe to the stream for matching data sets.
The following example from the JSONStream README illustrates this succinctly:
var request = require('request')
, JSONStream = require('JSONStream')
, es = require('event-stream')
request({url: 'http://isaacs.couchone.com/registry/_all_docs'})
.pipe(JSONStream.parse('rows.*'))
.pipe(es.mapSync(function (data) {
console.error(data)
return data
}))
Use the stream functionality of the request module to process large amounts of incoming data. As data comes through the stream, parse it to a chunk of data that can be worked with, push that data through the pipe, and pull in the next chunk of data.
You might create a transform stream to manipulate a chunk of data that has been parsed and a write stream to store the chunk of data.
For example:
var stream = request({ url: your_url })
    .pipe(parseStream)
    .pipe(transformStream)
    .pipe(writeStream);

stream.on('finish', () => {
    setImmediate(() => process.exit(0));
});
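A rough sketch of what those pieces might look like (JSONStream handles the parsing; the JSONPath 'items.*', the field names, and the transform/write bodies are hypothetical placeholders):

const JSONStream = require('JSONStream');
const { Transform, Writable } = require('stream');

// parseStream: emit one object per matching array item
const parseStream = JSONStream.parse('items.*');

// transformStream: reshape each parsed chunk (object mode)
const transformStream = new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
        callback(null, { id: chunk.id, price: chunk.price }); // hypothetical fields
    },
});

// writeStream: persist each transformed chunk (e.g. push it into Redis)
const writeStream = new Writable({
    objectMode: true,
    write(chunk, encoding, callback) {
        // store the chunk somewhere, then signal completion
        callback();
    },
});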
For info on creating streams, try https://bl.ocks.org/joyrexus/10026630
I have 3 modules in a project: A,B,C; all of them are using Rethinkdb, which requires an async r.connect call upon initialization.
I'm trying to make a call from module A to B from the command line; however, despite starting r.connect on require(), B can't serve the call, because the RethinkDB connection hasn't been established by the time module A makes it.
In what ways might this code be refactored, such that I can actually make sure all initializations are complete before calling B?
I've tried to use closures to pass state around modules; however, due to r.connect only being available as async function, this would take the form of:
r.connect(config.rethinkdb, function(err, connection) {
    rconn = connection;
    // all module requires
    require("./moduleB")(rconn);
    require("./moduleC")(rconn);
    ...lotsacode...
});
Which feels very wrong. Any better suggestions?
You can use a promise and pass the connection around. Something like this:
r.connect(config.rethinkdb)
    .then(function(connection) {
        //do some stuff here if you want
        initWholeApp(connection)
    })
and inside initWholeApp(connection) you can put your application code.
You can even simplify it to
r.connect(config.rethinkdb)
.then(initWholeApp)
where initWholeApp is a function that accepts the established connection as its argument.
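A minimal sketch of what initWholeApp could look like (module paths taken from the question; the factory-style exports are an assumption):

function initWholeApp(connection) {
    // hand the ready connection to the modules that need it
    const moduleB = require('./moduleB')(connection);
    const moduleC = require('./moduleC')(connection);

    // ...the rest of the application code, now guaranteed a live connection
}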
More than that, you can even run each query on its own connection (just ensure you close the connection once you are done with that query), or use a RethinkDB connection pool with a driver that supports it, such as https://github.com/neumino/rethinkdbdash, or roll your own.
I'm hoping someone who uses the Q module in Node.js can advise me.
So, my question is:
Is it typically necessary to create more than one promise if you are trying to perform multiple functions in sequence?
For example, I'm trying to create an application that:
1) Read data from a file (then)
2) Opens a database connection (then)
3) Executes a query via the database connection (then)
4) Store JSON dataset to a variable (then)
5) Close Database connection. (then)
6) Perform other code based on the JSON dataset I've stored in a variable.
In raw Node.js, each method of an object expects a callback, and in order to do these tasks in the proper order (and not simultaneously -which wouldn't work), I have to chain these callbacks together using an ugly amount of code nesting.
I discovered the Q module, which prevents nesting via the Promise concept.
However, in my first attempt at using Q, I'm trying to make a promise out of everything, and I think I might be over-complicating my code.
I think that maybe you only really have to create one promise object to perform the steps mentioned above, and that it may be unnecessary for me to convert every method to a promise via the Q.denodeify method.
For example, in my code I will be connecting to a db2 database using the ibm_db module. Probably due to misunderstanding, I've converted all the ibm_db methods to promises like this:
var ibmdb = require('ibm_db');
var q = require('q');
var ibmdbOpen = q.denodeify(ibmdb.open);
var ibmdbConn = q.denodeify(ibmdb.conn);
var ibmdbClose = q.denodeify(ibmdb.close);
var ibmdbQuery = q.denodeify(ibmdb.query);
Is this really necessary?
In order to do one thing after another in Q, is it necessary for me to denodeify each method I'll be using in my script?
Or, can I just create one promise at the beginning of the script and use the q.then method to perform all the asynchronous functions in sequence (without blocking)?
Is it typically necessary to create more than one promise if you are trying to perform multiple functions in sequence?
Yes, definitely. If you didn't have promises for all the intermediate steps, you'd have to use callbacks for them - which is just what you were trying to avoid.
I'm trying to make a promise out of everything
That should work fine. Indeed, you should try to promisify on the lowest possible level - the rule is to make a promise for everything that is asynchronous. However, there is no reason to make promises for synchronous functions.
Especially your steps 4 and 5 trouble me. Storing something in a variable is hardly needed when you have a promise for it - one might even consider this an antipattern. And the close action of a database should not go in a then handler - it should go in a finally handler instead.
I'd recommend not to use a linear chain but rather:
readFile(…).then(function(fileContents) {
    return ibmdbOpen(…).then(function(conn) {
        return ibmdbQuery(conn, …).finally(function() {
            return ibmdbClose(conn);
        });
    }).then(function(queriedDataset) {
        …
    });
});
I'm using NodeJS to walk over a list of files and generate an MD5 hash for each one. Here's how I would normally do this synchronously:
// Assume files is already populated with an array of file objects
for (file in files) {
    var currentFile = files[file];
    currentFile.md5 = md5(currentFile.path);
}
The problem here is that the MD5 function is asynchronous and actually has a callback function that runs once the MD5 hash has been generated for the file. Thus, all of my currentFile.md5 properties are just going to be set to undefined.
Once I have gotten all of the MD5 hashes for all of the files I'll need to move onto another function to deal with that information.
How gnarly is the code going to get in order for me to do this asynchronously? What's the cleanest way to accomplish what I want to do? Are there common different approaches that I should be aware of?
To call an async function multiple times in sequence, you can make a function and call it recursively, like this.
I have assumed your md5 function has a callback with two params err and result.
var keys = Object.keys(files); // taking all keys in an array.

function fn() {
    var currentFile = files[keys.shift()];
    md5(currentFile.path, function (err, result) {
        // Use result, store somewhere
        // check if more files
        if (keys.length) {
            fn();
        } else {
            // done
        }
    });
}

fn(); // start the chain
One great approach is to use async (search on npm).
If you want to roll your own (a sketch follows this list):
Count the files and put that count in a variable.
Every time fs opens a file and calls your intermediate callback, compute and store the MD5.
Also, decrement that counter.
When counter === 0, call a "final" callback, passing back all the MD5s.
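A rough sketch of that counter pattern (assuming files is an array of objects with a path property and md5(path, callback) is Node-style, as in the question):

function hashAll(files, finalCallback) {
    let remaining = files.length; // the counter
    const results = {};

    files.forEach((file) => {
        md5(file.path, (err, hash) => {
            if (err) return finalCallback(err);
            results[file.path] = hash;   // store this file's MD5
            remaining -= 1;              // one less file outstanding
            if (remaining === 0) {
                finalCallback(null, results); // all hashes collected
            }
        });
    });
}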
To answer your question (theoretically): in the Javascript world there are (at the moment) two different ways to deal with asynchronous code.
Using callbacks. This is the most basic way, the one people starting with Javascript learn first. However, there are plenty of libraries, such as async and step, that help deal with callbacks in a less painful way. For your particular problem, assuming that md5 is (somewhat unusually) asynchronous, you can use https://github.com/caolan/async#parallel to achieve it (a sketch follows below).
Another way is to use promises; there are plenty of promise-compliant libraries such as q and when. Basically, with a promise you have a nicer way to organize your code flow (IMO). For the problem above you can use when.all to gather the results of md5. However, you first need to turn md5 into a promise-compliant function.
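For instance, with the async library's parallel (again assuming md5(path, callback) is Node-style and files is an array):

const async = require('async');

// one task per file; async.parallel runs them and collects results in order
const tasks = files.map((file) => (callback) => md5(file.path, callback));

async.parallel(tasks, (err, hashes) => {
    if (err) return console.error(err);
    files.forEach((file, i) => { file.md5 = hashes[i]; }); // hashes[i] matches files[i]
});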
To avoid "callback hell" you should introduce the world of promises to your Node toolset. I suggest q https://npmjs.org/package/q
Here is a post on SO that can help and give you an idea of the syntax for using q.js promises to work with multiple asynchronous operations.
You would essentially run all your async functions with deferred promises; the chained .then() method fires when all promises are resolved, and the function passed into then() can process your MD5'd data.
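A rough sketch with Q (Q.nfcall wraps a Node-style function into a promise; md5 and files as in the question):

const Q = require('q');

// one promise per file, then a single then() once every hash is ready
const promises = files.map((file) => Q.nfcall(md5, file.path));

Q.all(promises)
    .then((hashes) => {
        files.forEach((file, i) => { file.md5 = hashes[i]; });
        // continue working with the hashed files here
    })
    .catch((err) => console.error(err));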
I hope this helps.