DB connection caching code in serverless function executes only once - javascript

I have a Lambda function that connects to a MongoDB instance, and I am adding connection-reuse logic so that when Lambda reuses a container, the previous connection is reused instead of a new one being created.
In my database.ts file, previously there was a single export const db = mongo.createConnection(..) that was being used everywhere in the code, which created a new connection on every function invocation.
Now, as per online tutorials, I stored the actual connection variable in a global variable in the file and converted the export to a function which checks whether the above-mentioned connection object is not null, and if so returns it without creating a new connection. If it is indeed null, then it creates a connection and assigns the result of createConnection to the variable and also returns it.
Of course, I needed to convert all the usages of db to db(), since it was a function now. Logs added in the file indicated that the connection was indeed being reused for all subsequent calls to db after the first one.
Then, I converted db into an IIFE and back to a normal export. I imagined that instead of having to go and replace all usages to a function call, it'd be better to call it in the file instead and then export it.
However, what's happening now is strange. The first invocation logs that a new connection is being created (=> Creating a new database connection.). On subsequent invocations, though, none of the log statements are printed. Shouldn't (=> Reusing an existing database connection) be printed to the console? I suppose the connection is still being reused, since the first log does not appear, but why does this happen?
I know it has something to do with how modules are required or cached and how AWS Lambda handles JS modules. Any further insight will be helpful.
let connection: mongoose.Connection | null = null;
let isPreviousConnectionAvailable = false;
export const db = (() => {
  if (!isPreviousConnectionAvailable) {
    log.debug('=> Creating a new database connection.');
    connection = mongoose.createConnection(dbConnectionURL, { useNewUrlParser: true });
    isPreviousConnectionAvailable = true;
  } else {
    log.debug('=> Reusing an existing database connection');
  }
  return connection;
})();

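The logs are not going to be printed, and that is the expected behavior. The IIFE runs exactly once, when the module is first imported. Node caches the loaded module, so on subsequent invocations in a warm container you are simply accessing the same exported constant; the function body never runs again, so neither log statement can fire.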
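For contrast, here is a minimal sketch of the function-based version, where the check runs on every call and both log branches can appear. `createConnection` below is a hypothetical stand-in for mongoose.createConnection (it only counts calls), and `getDb` and `logs` are illustrative names that are not in the original code.

```javascript
// Hypothetical stand-in for mongoose.createConnection; it only counts calls.
let created = 0;
function createConnection() {
  created += 1;
  return { id: created };
}

const logs = [];
let connection = null;

// Unlike an IIFE, this body runs every time a handler calls it.
function getDb() {
  if (connection === null) {
    logs.push('=> Creating a new database connection.');
    connection = createConnection();
  } else {
    logs.push('=> Reusing an existing database connection');
  }
  return connection;
}

// Simulate three invocations inside the same warm container.
const first = getDb();
const second = getDb();
const third = getDb();
console.log(logs);
```

With the IIFE export, only the "Creating" branch can ever run, and only once per container, at module load time.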

Understanding weird bug with promises and undefined var

I've just discovered that my API is doing weird things when 2 requests are triggered at almost the same time.
I figured out that the issue was me missing the "var" declaration before my "user" variable below, but I'm really curious about the root issue that caused the bug described below:
I have two API endpoints that call the same function, as follows:
router.get('/refresh_session_token', function (req, res) {
  let user_id = req.body.user_id // The value sent is 8
  findUserWithId(user_id)
    .then(user_data => {
      user = user_data // I forgot 'var' here
    })
    .then(() => {
      console.log(user) // This should always show user data from user_id = 8
    })
})
router.get('/resend_invite', function (req, res) {
  let user_id = req.body.user_id // The value sent is 18
  findUserWithId(user_id)
    .then(user_data => {
      user = user_data // I forgot 'var' here
    })
    .then(() => {
      console.log(user) // This should always show user data from user_id = 18
    })
})
const findUserWithId = (id) => {
  return knex.raw(`SELECT * FROM users WHERE id = ?`, [id]).then((data) => data.rows[0])
}
All this code is in the same file that I export through module.exports = router;
What I discovered is that if I trigger the endpoints /refresh_session_token and /resend_invite at almost the same time each with two different user_id, it happens that sometimes, my console.log returns the same result for both as if I was using the same user_id.
Adding var to user fixed the issue, but I'm very surprised by what is actually happening in the background.
Do you have any idea?
When you don't declare your variable and you aren't running your module in JavaScript's strict mode, then the first assignment to that variable with:
user = user_data
creates an automatic global variable named user. This means that your two routes are then sharing that same variable.
And, since your two routes both have asynchronous operations in them, even with the single-threadedness of things, your two routes can still be in-flight at the same time and both trying to use the same global variable. One route will overwrite the value from the other. This is a disaster in server-based code because usually, the bug won't show until you get into production and it will be really, really hard to find a reproducible case.
The best answer here is to always run your code in strict mode and then the JS interpreter will make this an error and you will never be allowed to run your code this way in the first place. The error will be found very quickly and easily.
Then obviously, always declare variables with let or const. There are very, very few reasons to ever use var any more as let and const give you more control over the scope of your variable.
To run your module in strict mode, insert this:
'use strict';
before any other JavaScript statements.
Or, use something like TypeScript that doesn't let you do sloppy things like not declare your variables.
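To see the difference between the two modes, here is a small sketch (the names `leakedUser`, `anotherUser`, `sloppyAssign` and `strictAssign` are made up for illustration). It uses the Function constructor for the non-strict case, because bodies created that way are non-strict unless they opt in, so the demo does not depend on how the file itself is loaded.

```javascript
// Non-strict body: assigning to an undeclared name silently creates a global.
const sloppyAssign = new Function("leakedUser = { id: 8 };");

// Strict body: the same kind of assignment throws instead.
function strictAssign() {
  'use strict';
  anotherUser = { id: 18 }; // ReferenceError: anotherUser is not defined
}

sloppyAssign();
const leaked = typeof globalThis.leakedUser;

let strictError = null;
try {
  strictAssign();
} catch (err) {
  strictError = err;
}

console.log(leaked, strictError && strictError.name); // object ReferenceError
```

In the non-strict case the shared global is exactly what lets two in-flight requests overwrite each other's value.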

Proper node.js abstractions to prevent race conditions when accessing ethereum blockchain

I am using web3 version 1.0.0-beta.27, where all accesses to the blockchain are asynchronous. Clearly this opens up the possibility of race conditions, e.g.:
var Web3 = require("web3");
// connect to the Ethereum blockchain
var ether_port = 'http://localhost:8545'
var web3 = new Web3(new Web3.providers.HttpProvider(ether_port));
// this is how we set the value; note there is the possibility of race conditions here
var accounts = []
web3.eth.getAccounts().then(function(accts){
  console.log("printing account: ", accts)
  accounts = accts
})
// observe race condition
console.log("assert race condition: ", accounts[0])
The last line above is contrived; it is there to demonstrate that I would like to use accounts after it has been evaluated. That is, eventually I would like to modify/read the blockchain from a front-end express.js web app or even a mobile app, so in the interest of being rigorous, what are the common tools in node.js to ensure race conditions never occur? Do these tools exist? If not, what are some common practices? I am new to node.js as well.
One idea is to not attempt to directly store the data because code trying to access the data has no idea when it's valid due to the uncertain nature of asynchronous results. So, instead you store the promise and any code that wants access to the data, just uses .then()/.catch() on the promise. This will always work, regardless of the async timing. If the data is already there, the .then() handler will be called quickly. If the data is not yet there, then the caller will be in line to be notified when the data arrives.
let accountDataPromise = web3.eth.getAccounts().then(function(accts){
  console.log("printing account: ", accts)
  return accts;
});
// then, elsewhere in the code
accountDataPromise.then(accts => {
  // use accts here
}).catch(err => {
  // error getting accts data
});
FYI, assigning data from a .then() handler to a higher scoped variable that you want to generally use in other code outside the promise chain is nearly always a sign of troublesome code - don't do it. This is because other code outside the promise chain has no idea when that data will or will not be valid.
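A lazy variant of the same idea can be sketched as follows: the request starts on first use, and every caller gets the same promise. `fakeGetAccounts` is a hypothetical stand-in for web3.eth.getAccounts() so the snippet runs without a node to connect to; `getAccounts`, `accountsPromise` and `fetchCount` are illustrative names as well.

```javascript
let fetchCount = 0;

// Hypothetical stand-in for web3.eth.getAccounts(); resolves after a short delay.
function fakeGetAccounts() {
  fetchCount += 1;
  return new Promise(resolve => setTimeout(() => resolve(['0xabc']), 10));
}

let accountsPromise = null;

// Starts the request on first use and hands every caller the same promise.
function getAccounts() {
  if (accountsPromise === null) {
    accountsPromise = fakeGetAccounts();
  }
  return accountsPromise;
}

// Two independent callers; neither needs to know whether the data has arrived yet.
getAccounts().then(accts => console.log('caller 1:', accts[0]));
getAccounts().then(accts => console.log('caller 2:', accts[0]));
```

One design caveat: a rejected promise stays cached too, so real code may want to reset `accountsPromise` when the request fails.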

mdg:validated-method _execute asynchronous issues

I'm running into problems with the validated method package in my app tests. I'm calling my methods through the _execute function in order to be able to pass a userId to simulate a logged-in user while testing. The problem is that my asserts right underneath that _execute are called before the method has a chance of completing. I know my test works though because it only happens sometimes, mostly because mongo isn't always returning results quite as fast.
I looked around and found a todos app that uses the _execute function in its tests. I can't get those tests to fail no matter how many times I rerun them, though.
This is an example of my test code.
describe('clients.add', function() {
  it('should add an empty (draft) client', function() {
    const res = clients_add._execute({ userId: 'CURRENTUSERID' }, { company_id: c1._id });
    assert.instanceOf(res, Mongo.ObjectID, 'method returns the newly created clients ID');
    const db_client = Clients.findOne(res);
    assert.isTrue(db_client.draft, 'client is drafted');
    assert.isDefined(db_client.created, 'there\'s a created date');
  });
});
clients_add does quite a few permission checks and can therefore take a little while before completing. Rerunning this test 20 times will fail about 5 times and pass the other 15.
Shouldn't the _execute function be synchronous? How do I make it? What am I missing?
In server code, if you provide a callback to database modification functions like insert, it returns the created ID instantaneously, and runs the callback only once the database has acknowledged the write. If you don't provide a callback, the insert call is synchronous and throws an error if the operation fails. See more about this in Meteor docs.
It seems that you have provided an error-handling callback to the insert function in your method code. This causes the inconsistent behavior, since the database might not actually have had time to do the write before you call findOne in your test. The callback is also redundant: if an error occurs in the insert, the method has already returned and the error is never shown to the user. It's better to simply omit the error-handling callback altogether:
return Clients.insert(new_client);

Node.js database initialization from multiple modules

I have 3 modules in a project: A,B,C; all of them are using Rethinkdb, which requires an async r.connect call upon initialization.
I'm trying to make a call from module A to B from the command line; however, despite starting r.connect on require(), B can't serve it, because RethinkDB hasn't finished loading by the time module A calls.
In what ways might this code be refactored, such that I can actually make sure all initializations are complete before calling B?
I've tried to use closures to pass state around modules; however, due to r.connect only being available as async function, this would take the form of:
r.connect( config.rethinkdb, function(err, connection) {
  rconn = connection;
  // all module requires
  require("./moduleB")(rconn);
  require("./moduleC")(rconn);
  ...lotsacode...
});
Which feels very wrong. Any better suggestions?
You can use a promise and pass the connection around. Something like this:
r.connect(config.rethinkdb)
  .then(function(connection) {
    // do some stuff here if you want
    initWholeApp(connection)
  })
and inside initWholeApp you can put your application code.
You can even simplify it to
r.connect(config.rethinkdb)
  .then(initWholeApp)
where initWholeApp is a function that accepts the established connection as its argument.
Beyond that, you can even open a connection for each query (just make sure to close it once you are done with that query), or use a RethinkDB connection pool with a driver that supports it, such as https://github.com/neumino/rethinkdbdash, or roll your own.
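As a rough sketch of how the modules could be wired up this way, each module can export a factory that receives the ready connection, so nothing in B or C runs before the connection exists. `fakeConnect` is a hypothetical stand-in for r.connect(config.rethinkdb), and `createModuleB` and `createModuleC` are made-up names for what ./moduleB and ./moduleC would export.

```javascript
// Hypothetical stand-in for r.connect(config.rethinkdb).
function fakeConnect() {
  return Promise.resolve({ name: 'conn-1' });
}

// What ./moduleB might export: a factory that closes over the connection.
function createModuleB(connection) {
  return { describe: () => `B uses ${connection.name}` };
}

// What ./moduleC might export, following the same shape.
function createModuleC(connection) {
  return { describe: () => `C uses ${connection.name}` };
}

// Connect once, then build every module from the same connection.
async function initWholeApp() {
  const connection = await fakeConnect();
  return {
    b: createModuleB(connection),
    c: createModuleC(connection),
  };
}

const appReady = initWholeApp();
appReady.then(app => console.log(app.b.describe(), '/', app.c.describe()));
```

Code that needs B, including a command-line entry point in A, would wait on `appReady` rather than assuming the connection is already there.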

JavaScript leaking memory (Node.js/Restify/MongoDB)

Update 4: By instantiating the restify client (see controllers/messages.js) outside of the function and calling global.gc() after every request, it seems the memory growth rate has been reduced a lot (~500KB per 10secs). Yet the memory usage is still constantly growing.
Update 3: Came across this post: https://journal.paul.querna.org/articles/2011/04/05/openssl-memory-use/
It might be worth noting that I'm using HTTPS with Restify.
Update 2: Updated the code below to the current state. I've tried swapping out Restify with Express. Sadly this didn't make any difference. It seems that the API call at the end of the chain (restify -> mongodb -> external api) causes everything to be retained in memory.
Update 1: I have replaced Mongoose with the standard MongoDB driver. Memory usage seems to grow less fast, yet the leak remains.
I've been working on trying to locate this leak for a couple of days now.
I'm running an API using Restify and Mongoose and for every API call I do at least one MongoDB lookup. I've got about 1-2k users that hit the API multiple times in a day.
What I have tried
I've isolated my code to just using Restify and used ApacheBench to fire a huge amount of requests (100k+). The memory usage stays around 60MB during the test.
I've isolated my code to just using Restify and Mongoose and tested it the same way as above. Memory usage stays around 80MB.
I've tested the full production code locally using ApacheBench. Memory usage stays around 80MB.
I've automatically dumped the heap at intervals. The biggest heap dump I had was 400MB. All I can see is that there are tons of Strings and Arrays, but I cannot clearly see a pattern in it.
So, what could be wrong?
I've done the above tests using just one API user. This means that Mongoose only grabs the same document over and over. The difference with production is that a lot of different users hit the API meaning mongoose gets a lot of different documents.
When I start the nodejs server the memory quickly grows to 100MB-200MB. It eventually stabilizes around 500MB. Could this mean that it leaks memory for every user? Once every user has visited the API it will stabilize?
I've included my code below which outlines the general structure of my API. I would love to know if there's a critical mistake in my code or any other approach to finding out what is causing the high memory usage.
Code
app.js
var restify = require('restify');
var MongoClient = require('mongodb').MongoClient;
// ... setup restify server and mongodb
require('./api/message')(server, db);
api/message.js
module.exports = function(server, db) {
  // Controllers used for retrieving accounts via MongoDB and communication with an external api
  var accountController = require('../controllers/accounts')(db);
  var messageController = require('../controllers/messages')();
  // Restify bind to put
  server.put('/api/message', function(req, res, next) {
    // Token from body
    var token = req.body.token;
    // Get account by token
    accountController.getAccount(token, function(error, account) {
      // Send a message using external API
      messageController.sendMessage(token, account.email, function() {
        res.send(201, {});
        return next();
      });
    });
  });
};
controllers/accounts.js
module.exports = function(db) {
  // Gets account by a token
  function getAccount(token, callback) {
    var ObjectID = require('mongodb').ObjectID;
    var collection = db.collection('accounts');
    collection.findOne({
      token: token
    }, function(error, account) {
      if (error) {
        return callback(error);
      }
      if (account) {
        return callback('', account);
      }
      return callback('Account not found');
    });
  }
};
controllers/messages.js
module.exports = function() {
  function sendMessage(token, email, callback) {
    // Get a token used for external API
    getAccessToken(function() {
      // ... Setup client
      // Do POST
      client.post('/external_api', values, function(err, req, res, obj) {
        return callback();
      });
    });
  }
  return {
    sendMessage: sendMessage
  };
};
Heap snapshot of suspected leak
It might be a bug in getters. I hit it when using virtuals or getters on a mongoose schema: https://github.com/LearnBoost/mongoose/issues/1565
It's actually normal to see only strings and arrays, as most programs are largely built on them. Profilers that sort by total object count are therefore not of much use, since they often give similar results for many different programs.
A better way to use Chrome's memory profiling is to take one snapshot, for example after one user calls the API, and then a second heap snapshot after a second user has called the API.
The profiler can compare two snapshots and show the differences between them (see this tutorial); this will help you understand why the memory grew in an unexpected way.
Objects are retained in memory because there is still a reference to them that prevents the object from being garbage collected.
So another way to use the profiler to find memory leaks is to look for an object that you believe should not be there, inspect its retaining paths, and see whether any of them are unexpected.
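To illustrate what a retaining path can look like, here is a deliberately simple sketch. It is not taken from the code in the question: `accountCache`, `rememberAccount` and `forgetAccount` are made-up names. A module-level Map keyed per user keeps every entry reachable, so memory grows with the number of distinct users until the entries are removed.

```javascript
// Hypothetical module-level cache: every entry stays reachable while it is in the Map.
const accountCache = new Map();

// Stores a per-token entry; the payload array only simulates some attached data.
function rememberAccount(token) {
  accountCache.set(token, { token, payload: new Array(1000).fill(token) });
}

// Dropping the entry removes the Map's reference, so the object can be collected.
function forgetAccount(token) {
  return accountCache.delete(token);
}

// One entry per distinct "user".
for (let i = 0; i < 100; i += 1) {
  rememberAccount(`token-${i}`);
}
const sizeAfterInserts = accountCache.size;

for (let i = 0; i < 100; i += 1) {
  forgetAccount(`token-${i}`);
}
const sizeAfterCleanup = accountCache.size;

console.log(sizeAfterInserts, sizeAfterCleanup); // 100 0
```

In a heap snapshot comparison, entries like these would show up as new objects whose retaining path leads back to the Map.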
Not sure whether this helps, but could you try to remove unnecessary returns?
api/message.js
// Send a message using external API
messageController.sendMessage(token, account.email, function() {
  res.send(201, {});
  next(); // remove 'return'
});
controllers/accounts.js
module.exports = function(db) {
  // Gets account by a token
  function getAccount(token, callback) {
    var ObjectID = require('mongodb').ObjectID;
    var collection = db.collection('accounts');
    collection.findOne({
      token: token
    }, function(error, account) {
      if (error) {
        callback(error); // remove 'return'
      } else if (account) {
        callback('', account); // remove 'return'
      } else {
        callback('Account not found'); // remove 'return'
      }
    });
  }
  return { // I guess you forgot to copy this into the question.
    getAccount: getAccount
  };
};
controllers/messages.js
// Do POST
client.post('/external_api', values, function(err, req, res, obj) {
  callback(); // remove 'return'
});
Your issue is in getAccount, combined with how the GC works.
When you chain lots of functions, the GC only clears one at a time, and the older something is in memory, the less chance it has of being collected. So for your getAccount you need, by my count, at least 6 calls to global.gc() (or automatic runs) before it can be collected. By that time the GC assumes it is something that it probably won't collect, so it doesn't check it anyway.
collection {
  findOne {
    function(error, account) {
      callback('', account) {
        sendMessage(...) {
          getAccessToken() {
            Post
          }
        }
      }
    }
  }
}
As suggested by Gene, remove this chaining.
PS: This is just a representation of how the GC works, and it depends on the implementation, but you get the point.
