Node.js forEach memory leak issue

I've been building a small Node.js app that iterates through an array of names and queries an API for each one. The problem is that the array is very large (400,000+ words), and my application runs out of memory before the forEach completes.
I've tried to diagnose the issue by researching how JS works with the call stack, Web APIs, and the callback queue. What I believe is happening is that the forEach loop blocks the call stack, so the HTTP requests keep clogging up the callback queue without ever getting resolved.
If anyone can provide a solution for unblocking the forEach loop, or an alternative way of structuring this app, I would be very grateful.
Node JS App
const mongoose = require("mongoose");
const fs = require("fs");
const ajax = require("./modules/ajax.js");

// Bring in models
let Dictionary = require("./models/dictionary.js");

//=============================
// MongoDB connection
//=============================
// Opens a connection to the "bookCompanion" database
mongoose.connect("mongodb://localhost/bookCompanion");
let db = mongoose.connection;

// If the connection encounters an error, log it to the console.
db.on("error", (err) => {
    console.error("Database connection failed.");
});

// Once the connection is open, read the word list and query the API
// for every word.
db.once("open", () => {
    console.info("Connected to MongoDB database...");
    fs.readFile("./words-2.json", "utf8", (err, data) => {
        if (err) {
            console.log(err);
        } else {
            data = JSON.parse(data);
            data.forEach((word) => {
                let search = ajax.get(`API url Here?=${word}`);
                search.then((response) => {
                    // save() returns a promise; return it so the log
                    // below only fires once the word is actually stored.
                    return new Dictionary({
                        word: response.word,
                        phonetic: response.phonetic,
                        meaning: response.meaning
                    }).save();
                }).then(() => {
                    console.log("word saved");
                }).catch((err) => {
                    console.log("Word not found");
                });
            });
        }
    });
});

Check
Check whether the API accepts multiple query parameters, so you can batch several words into a single request.
Try to use async Promises: process the array in fixed-size chunks instead of firing all 400,000 requests at once.
Resolve each chunk's promises with Promise.all() and perform the save operations as each chunk completes, as in the sketch below.
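A minimal sketch of that idea, assuming ajax.get() returns a promise (as in the question) and reusing the question's placeholder URL; BATCH_SIZE and processWords are illustrative names, not from the original code:
// Sketch: process the word list in small batches so only BATCH_SIZE
// requests are pending at any one time, bounding memory use.
const BATCH_SIZE = 50; // illustrative; tune to what the API tolerates

async function processWords(words) {
    for (let i = 0; i < words.length; i += BATCH_SIZE) {
        const batch = words.slice(i, i + BATCH_SIZE);
        // Wait for the whole batch before starting the next one.
        await Promise.all(batch.map((word) =>
            ajax.get(`API url Here?=${word}`)
                .then((response) => new Dictionary({
                    word: response.word,
                    phonetic: response.phonetic,
                    meaning: response.meaning
                }).save())
                .then(() => console.log("word saved"))
                .catch(() => console.log("Word not found"))
        ));
    }
}
Called as processWords(data) inside the open handler, this replaces the plain forEach; because every word's promise ends in a catch, a single failed lookup will not reject the whole batch.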

Related

Node / Express race condition on DB operations

I have an Express route that needs to retrieve a row from a DB, check whether a field has a value set and, if not, set it. The row data then gets sent back to the client.
I know that Node runs on a single thread, but I/O operations do run asynchronously, so I think I may have a problem: while the first client is waiting to write to the DB, a second client could come along, read a null value, and perform the write a second time.
I can't have this happen, as the value written is a shared value of which there can only be one.
Am I correct that this could happen, and if so, what is a recommended way to handle it?
Thanks.
let express = require('express');
let router = express.Router();

router.post('/getRoomStateByRoomUrl', async (req, res, next) => {
    const roomUrl = req.body.room_url;
    try {
        // READ FROM DB
        const roomState = await RoomStateModel.getRoomStateByRoomUrl(roomUrl);
        if (!roomState.tokboxSessionID) {
            const newSessionID = await TokboxService.createSession();
            // WRITE TO DB
            await RoomStateModel.setTokboxSessionID(newSessionID);
            roomState.tokboxSessionID = newSessionID;
        }
        res.status(200).json(roomState);
    } catch (error) {
        next(error);
    }
});
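Yes, that interleaving is possible: two requests can both read a null tokboxSessionID before either write lands. One common remedy is to make the check-and-set a single atomic operation in the database. For example, if the backing store were MongoDB with Mongoose (a hypothetical sketch; the RoomState model and room_url field names are illustrative, not from the question), a conditional findOneAndUpdate lets only one request win, inside the async route handler above:
// Make the check-and-set atomic: the filter only matches while the
// field is still unset, so at most one concurrent update succeeds.
const newSessionID = await TokboxService.createSession();
const updated = await RoomState.findOneAndUpdate(
    { room_url: roomUrl, tokboxSessionID: null }, // match only while unset
    { $set: { tokboxSessionID: newSessionID } },
    { new: true }                                 // return the updated row
);
// If another request already set the ID, `updated` is null;
// re-read to pick up the value that won the race.
const roomState = updated || await RoomState.findOne({ room_url: roomUrl });
The cost of this approach is that the losing request creates a Tokbox session that is never used; if that matters, a per-room lock or queue is an alternative.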

How can a data schema be rejected before the MongoDB connection is made by Node.js and Mongoose?

I don't understand a specific piece of asynchronous JavaScript code. I have a very simple few lines of JavaScript run by Node.js in which I query a local MongoDB. In outline, the code does this:
require mongoose
create a promise to connect to the DB:
mongoose.connect("...url to my local mongoDB...")
.then(console.log("Connected to DB..."))
create a schema
create a model from the schema
define an async function to create a new object, save it as a document in MongoDB, and console.log the result returned after the attempt to save the document.
What I don't understand is the order of the console.log("Connected to DB") and the console.log(result from document.save()). When there is no error on saving, the order seems fine: I first get "Connected to DB...", then the returned saved document.
But when there is a data validation error for not respecting some requirement, the "Connected to DB" is printed after the error message.
Given the structure of the code, I don't understand why "Connected to DB..." is printed after the error. I suspect asynchronous code is the reason, but I don't understand why. These very simple few lines of code come from the "Programming with Mosh" course, where the exact same behavior appears on his console.
A little bit more code details:
const mongoose = require("mongoose");

mongoose
    .connect(my_mongo_db_url)
    .then(() => console.log("Connected to DB"))
    .catch(err => console.log("Could not connect to DB"));

const courseSchema = new mongoose.Schema({ ...course schema... });
const Course = mongoose.model("Course", courseSchema);

async function createCourse(){
    const course = new Course({ ...new course values... });
    try {
        const result = await course.save();
        console.log(result);
    } catch (err) {
        console.log(err.message);
    }
}

createCourse();
I copy here the @jonrsharpe comment that answered my question:
"The call to course.save may be executed before the connection is made, but its internal implementation waits for the connection: https://mongoosejs.com/docs/connections.html#buffering"
In other words, validation runs locally and rejects immediately, without waiting for the connection, while a successful save is buffered until the connection opens; that is why the error can print before "Connected to DB" but the saved document can only print after it.

Cloud HTTPS Functions: returning a Promise which is inside a Promise

I'm currently working on an HTTPS Cloud Function using Firebase, which deletes the post my Android user requested.
General idea
The workflow is (the whole code is available at the end of this SO question):
1) Firebase checks the user identity (admin.auth().verifyIdToken);
2) Firestore gets data from the post that must be deleted (deleteDbEntry.get().then());
3) Cloud Storage prepares itself to delete the file found in the gotten data (.file(filePath).delete());
4) Firestore prepares a batch to delete the post (batch.delete(deleteDbEntry);) and to update the likes/unlikes using the gotten data (batch.update(updateUserLikes, ...));
5) the promises of the file deletion and of the batch are executed (return Promise.all([deleteFile, batch_commit])).
Expected behavior
I want to check the user identity and, if that succeeds, fetch the data of the post the user asked to delete. If that also succeeds, I want to execute the Firestore batch plus the Cloud Storage file deletion in the same promise (that's why I use Promise.all([deleteFile, batch_commit]).then()). If the identity check fails, or the data fetch fails, or the batch fails, I want to tell the Android app; if everything succeeds, likewise.
As all of these operations are in a Cloud HTTPS Function, I must return a promise. This promise, I think, should correspond to all of those operations if they are successful, or to an error if at least one of them is not (?).
Actual behavior
For the moment, I just return the promise of the Firebase user identity check.
My problem & My question
I can't go from the actual behavior to the expected behavior because:
I think it's not very clear in my mind whether I should return, from this Cloud HTTPS Function, the promise corresponding to "all these operations are successful, or at least one is not".
As these operations are nested (except the Cloud Storage file deletion and the Firestore post deletion, which are combined in the batch step), I can't return something like Promise.all().
My question
Could you please tell me whether I'm right (point 1) and, if not, what I should do? And if I am right, how can I do it, given point 2?
Whole Firebase Cloud HTTPS Function code
Note: I've removed my input data checks to make the code more readable.
exports.deletePost = functions.https.onCall((data, context) => {
    return admin.auth().verifyIdToken(idToken)
        .then(function(decodedToken) {
            const uid = decodedToken.uid;
            const type_of_post = data.type_of_post;
            const the_post = data.the_post;
            const deleteDbEntry = admin_firestore.collection('list_of_' + type_of_post).doc(the_post);
            const promise = deleteDbEntry.get().then(function(doc) {
                const filePath = type_of_post + '/' + uid + '/' + data.stored_image_name;
                const deleteFile = storage.bucket('android-f.appspot.com').file(filePath).delete();
                const batch = admin.firestore().batch();
                batch.delete(deleteDbEntry);
                if (doc.data().number_of_likes > 0) {
                    const updateUserLikes = admin_firestore.collection("users").doc(uid);
                    batch.update(updateUserLikes, "likes", FieldValue.increment(-doc.data().number_of_likes));
                }
                const batch_commit = batch.commit();
                return Promise.all([deleteFile, batch_commit]).then(function() {
                    return 1;
                }).catch(function(error) {
                    console.log(error);
                    throw new functions.https.HttpsError('unknown', 'Unable to delete the post. (2)');
                });
            }).catch(function(error) {
                console.log(error);
                throw new functions.https.HttpsError('unknown', 'Unable to delete the post. (1)');
            });
            return promise;
        }).catch(function(error) {
            console.log(error);
            throw new functions.https.HttpsError('unknown', 'An error occurred while verifying the token.');
        });
});
You should note that you are actually defining a Callable Cloud Function and not an HTTPS one, since you do:
exports.deletePost = functions.https.onCall((data, context) => {..});
One of the advantages of a Callable Cloud Function over an HTTPS one is that it "automatically deserializes the request body and validates auth tokens".
So you can simply get the user uid with context.auth.uid.
Now, regarding the way of "orchestrating" the different calls, IMHO you should just chain the different Promises returned by the asynchronous Firebase methods (the ones of Firestore and the one of Cloud Storage), as follows:
exports.deletePost = functions.https.onCall((data, context) => {
    //....
    const uid = context.auth.uid;
    let number_of_likes;
    const type_of_post = data.type_of_post;
    const the_post = data.the_post;
    const deleteDbEntry = admin_firestore.collection('list_of_' + type_of_post).doc(the_post);
    return deleteDbEntry.get()
        .then(doc => {
            // Save the value we need later, since doc goes out of scope.
            number_of_likes = doc.data().number_of_likes;
            const filePath = type_of_post + '/' + uid + '/' + data.stored_image_name;
            return storage.bucket('android-f.appspot.com').file(filePath).delete();
        })
        .then(() => {
            const batch = admin.firestore().batch();
            batch.delete(deleteDbEntry);
            if (number_of_likes > 0) {
                const updateUserLikes = admin_firestore.collection("users").doc(uid);
                batch.update(updateUserLikes, "likes", FieldValue.increment(-number_of_likes));
            }
            return batch.commit();
        })
        .catch(function (error) {
            console.log(error);
            throw new functions.https.HttpsError('....', '.....');
        });
});
I don't think using Promise.all() brings any benefit in your case because, as explained here, "if any of the passed-in promises reject, Promise.all asynchronously rejects with the value of the promise that rejected, whether or not the other promises have resolved".
At the time of writing, there is no way to group all of these asynchronous calls to different Firebase services into one atomic operation.
Even if the batched write at the end is atomic, it could happen that the file in Cloud Storage is correctly deleted but that the batched write to Firestore is not executed, for example because there is a problem with the Firestore service.
Also, note that you only need one exception handler at the end of the Promise chain. If you want to differentiate the cause of the exception, so that you can send a different error message to the front-end, you can use the approach presented in this article.
The article shows how to define different custom Error classes (derived from the standard built-in Error object), which are used to check the kind of error in the exception handler, as sketched below.
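A minimal sketch of that pattern (the class name and client-facing messages are illustrative, not from the article):
// One Error subclass per failure mode, so the final handler can tell
// the causes apart.
class FileDeletionError extends Error {
    constructor(message) {
        super(message);
        this.name = 'FileDeletionError';
    }
}

// Wrap the Storage call so its failures carry a distinguishable type.
function deleteStoredFile(filePath) {
    return storage.bucket('android-f.appspot.com').file(filePath).delete()
        .catch(err => { throw new FileDeletionError(err.message); });
}

// The single handler at the end of the chain picks the message to send.
function toHttpsError(error) {
    console.log(error);
    if (error instanceof FileDeletionError) {
        return new functions.https.HttpsError('unknown', 'Could not delete the stored file.');
    }
    return new functions.https.HttpsError('unknown', 'Could not delete the post.');
}
In the chain above you would call deleteStoredFile(filePath) in place of the raw Storage call and end the chain with .catch(error => { throw toHttpsError(error); }).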

Node Mysql queries queue but never execute

I loop over some directories and then over the files in them. I process the files by directory and then try to add the processed results into MySQL.
I call conn.query('INSERT QUERY HERE') and execution seems to continue, but the query never runs on the server. If I tell it to process just one directory and wait till the end, it will run the queries, but I can't have it keep storing all the queries in memory until the end of the script, or Node will fail out due to the memory cap. I have tried everything I can think of to force the queued queries to run, with no luck.
Here is an example of my code
dirs.forEach(function(dir){
    var data = [];
    var connection = mysql.createConnection(conConfig);
    files.forEach(function(file){
        // do some processing on files, push into data array,
        // creating an array of objects
    });
    data.forEach(function(record){
        connection.query('INSERT INTO TABLE SET ?', record);
    });
    connection.end();
});
The code just continues to loop over the directories without ever sending the queries to MySQL. I know it works when I limit the code to a single directory: it will run the queries once that one directory is processed, but not if I let it run over all directories.
I have tried using MySQL pooling as well, with no luck: the pool.on('enqueue', ...) handler fires, but the queries are never sent over to the server.
Edit:
I tried calling the script from a bash for loop, once per directory name, and all records were loaded. I'm dumbfounded as to why a MySQL connection is never established in my original example.
JavaScript calls the MySQL query asynchronously. That means the connection will likely be closed before all insert queries have finished.
What you can do is use the callback that the query function provides:
var qCnt = array.length;
var connection = mysql.createConnection(conConfig);
connection.connect();

array.forEach(function(record){
    connection.query('INSERT INTO TABLE SET ?', record, function(err){
        if (err) throw err;
        qCnt--;
        // Close the connection only after the last insert has finished.
        if (qCnt == 0){
            connection.end();
        }
    });
});
This solution is not ideal, since all the insert queries are fired at once, regardless of your database connection limit and so on. You may want to fire the next insert only after the former is done; that is also possible with a little extra code, as sketched below.
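For instance, a minimal sketch of running the inserts strictly one after another (insertSequentially is a hypothetical helper, reusing the connection and data from the snippets above):
// Fire each insert only after the previous one has completed.
function insertSequentially(connection, records, done) {
    if (records.length === 0) return done();
    connection.query('INSERT INTO TABLE SET ?', records[0], function(err){
        if (err) return done(err);
        // Recurse on the rest of the records once this insert finishes.
        insertSequentially(connection, records.slice(1), done);
    });
}

insertSequentially(connection, data, function(err){
    if (err) throw err;
    connection.end();
});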
It is in fact an async issue. There does not seem to be any way to force queued queries to execute without stopping the currently running process, so I had to use the async module to make my code work.
@luksch: connection.end() will not close the connection until all queued queries are finished. I did use your counting method to trigger the callback, though.
Here is how I did it.
var async = require('async');
var connection = mysql.createConnection(conConfig);
var dirs = fs.readdirSync('./rootdirectory');

async.eachSeries(dirs, function(dir, callback){
    var data = [];
    files.forEach(function(file){
        // do some processing on files, push into data array,
        // creating an array of objects
    });
    var qCount = data.length;
    data.forEach(function(record){
        connection.query('INSERT INTO TABLE SET ?', record, function(err){
            if (err) throw err;
            qCount--;
            // When this directory's last insert finishes, move on
            // to the next directory.
            if (qCount === 0) { callback(); }
        });
    });
}, function(err){
    connection.end();
});
This iterates over the directories one at a time, queues that directory's queries, and waits until they have all run before moving on; once every directory has been processed, the final function closes the connection.

node.js and mongodb-native: wait till a collection is non-empty

I am using node.js with the native MongoDB driver (node-mongodb-native).
My current project uses node.js + now.js + mongo-db.
The system basically sends data from the browser to node.js, which is processed with Haskell and later fed back to the browser again.
Via a form and node.js, the text is inserted in a MongoDB collection called "messages".
A Haskell thread reads the entry and stores the result in the collection "results". This works fine.
But now I need the JavaScript code that waits for the result to appear in the collection "results".
Pseudo code:
wait until the collection "results" is non-empty.
findOne() from the collection "results".
delete the collection "results".
I currently connect to MongoDB like this:
var mongo = require('mongodb'),
    Server = mongo.Server,
    Db = mongo.Db;

var server = new Server('localhost', 27017, {
    auto_reconnect: true
});
var db = new Db('test', server);
My Haskell knowledge is quite good, but my JavaScript skills are not.
So I did extensive searches, but I didn't get far.
Glad you solved it; I was going to write something similar:
setTimeout(function(){
    db.collection('results', function(err, coll){
        if (err) return callback(err);
        coll.findOne({}, function(err, one){
            if (err) return callback(err);
            coll.drop(callback); // or destroy, not really sure <-- this will drop the whole collection
        });
    });
}, 1000);
The solution is to use the async library.
var async = require('async');
var globalCount = -1;

async.whilst(
    // Test: keep looping while the collection is still empty.
    function () {
        return globalCount < 1;
    },
    // db_count (defined elsewhere) counts the documents in "results",
    // updates globalCount, and then invokes the callback.
    function (callback) {
        console.log("inner while loop");
        setTimeout(function () { db_count(callback); }, 1000);
    },
    function (err) {
        console.log(" || whilst loop finished!!");
    }
);
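For completeness, a hypothetical db_count (not part of the original post) that would fit the loop above, using the same driver's err-first callbacks:
// Counts the documents in "results" and stores the count in
// globalCount before signalling async.whilst to re-test its condition.
function db_count(callback) {
    db.collection('results', function (err, coll) {
        if (err) return callback(err);
        coll.count(function (err, count) {
            if (err) return callback(err);
            globalCount = count;
            callback();
        });
    });
}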
