MongoDB executes queries sequentially instead of in parallel - javascript

I have an API endpoint which I am trying to stress test. It reads a very large MongoDB collection (2 million documents). Each query takes roughly 2 seconds; however, the problem I am having is that the database connection isn't being pooled correctly, so each query runs sequentially instead of concurrently.
I am using Mongoose to connect to my database and I am using artillery.io for testing.
Here is my connection code:
const mongoose = require('mongoose');
const Promise = require('bluebird');

const connectionString = process.env.MONGO_DB || 'mongodb://localhost/mydatabase';

mongoose.Promise = Promise;
mongoose.connect(connectionString, {
    server: { poolSize: 10 }
});

const db = mongoose.connection;
db.on('error', console.error.bind(console, 'connection error: '));
db.once('open', function() {
    console.log('Connected to: ' + connectionString);
});

module.exports = db;
It's a pretty bog-standard connection procedure; the most important part is probably the server: { poolSize: 10 } line.
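For reference, the spelling of the pool option depends on the Mongoose version; here is a minimal sketch (an illustration, not the code from the question), assuming Mongoose 5+, where driver options are passed at the top level:

// Sketch for Mongoose 5+: driver options go at the top level.
// Mongoose 6+ / MongoDB driver 4+ rename poolSize to maxPoolSize.
mongoose.connect(connectionString, {
    poolSize: 10,            // maxPoolSize in Mongoose 6+
    useNewUrlParser: true,
    useUnifiedTopology: true
});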
I am using the following script for artillery.io testing:
config:
  target: 'http://localhost:1337'
  phases:
    -
      duration: 10
      arrivalRate: 5
      name: "Warm-up"
scenarios:
  -
    name: "Search by postcodes"
    flow:
      -
        post:
          url: "/api/postcodes/gb_full/search"
          headers:
            Content-Type: 'application/json'
          json:
            postcodes:
              - "ABC 123"
              - "DEF 345"
              - "GHI 678"
This test executes 50 calls to the API over 10 seconds. Here's where the problem shows up: the API appears to execute the queries sequentially. See the test results below:
"latency": {
"min": 1394.1,
"max": 57693,
"median": 30222.7,
"p95": 55396.8,
"p99": 57693
},
And the database logs are as follows:
connection accepted from 127.0.0.1:60770 #1 (1 connection now open)
...
2017-04-10T18:45:55.389+0100 ... 1329ms
2017-04-10T18:45:56.711+0100 ... 1321ms
2017-04-10T18:45:58.016+0100 ... 1304ms
2017-04-10T18:45:59.355+0100 ... 1338ms
2017-04-10T18:46:00.651+0100 ... 1295ms
It appears as though the API is only using one connection, which seems correct; however, it was my understanding that a single connection would automatically put the poolSize to good use and execute these queries concurrently instead of one at a time.
What am I doing wrong here? How can I execute these database queries in parallel?
Edit 1 - Model and Query
To hopefully make things a little clearer, I am using the following model:
const mongoose = require('mongoose');
const db = require('...');

const postcodeSchema = mongoose.Schema({
    postcode: { type: String, required: true },
    ...
    location: {
        type: { type: String, required: true },
        coordinates: [] // Coordinates must be in longitude, latitude order.
    }
});

// Define the index for the location object.
postcodeSchema.index({ location: '2dsphere' });

// Export a function that will allow us to define the collection
// name, so we'll pass in something like GB, IT, DE etc. for different data sets.
module.exports = function(collectionName) {
    return db.model('Postcode', postcodeSchema, collectionName.toLowerCase());
};
Where the db object is the connection module explained at the top of this question.
And I am executing a query using the following:
/**
 * Searches and returns GeoJSON data for a given array of postcodes.
 * @param {Array} postcodes - The postcode array to search.
 * @param {String} collection - The name of the collection to search, i.e. 'GB'.
 */
function search(postcodes, collection) {
    return new Promise((resolve, reject) => {
        let col = new PostcodeCollection(collection.toLowerCase());

        col.find({
            postcode: { $in: postcodes }
        })
        .exec((err, docs) => {
            if (err)
                return reject(err);

            resolve(docs);
        });
    });
}
And here is an example of how the function can be called:
search(['ABC 123', 'DEF 456', 'GHI 789'], 'gb_full')
    .then(postcodes => {
        console.log(postcodes);
    })
    .catch(...);
To reiterate, these queries are executed via the Node.js API, so they should already be asynchronous; however, the queries themselves are being executed one after the other. I therefore believe the problem may be on the MongoDB side, but I have no idea where to start looking. It's almost as if MongoDB blocks any other queries against the collection while one is already running.
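For reference, a minimal sketch of how I would check whether the queries overlap, assuming the search() function above; with a working pool the batch should take roughly as long as the slowest single query, not the sum:

// Dispatch several searches concurrently and time the whole batch.
console.time('batch');
Promise.all([
    search(['ABC 123'], 'gb_full'),
    search(['DEF 456'], 'gb_full'),
    search(['GHI 789'], 'gb_full')
]).then(results => {
    console.timeEnd('batch'); // ~1 query's duration if parallel, ~3x if sequential
});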
I am running an instance of mongod.exe locally on a Windows 10 machine.

If you are using MongoDB 3.0+ with WiredTiger as the storage engine, you have document-level locking. The queries should not execute sequentially; sharding would definitely help with the parallelism, but 2 million docs should not be a problem for most modern computer/server hardware.
You mention the MongoDB log file in the question; you should see more than one connection opened there. Is that the case?

Ok, so I managed to figure out what the issues were.
Firstly, MongoDB has a read lock when a query is issued (see here). That's why it was executing queries sequentially. The only way to improve this further is by sharding the collection.
Also, as Jorge suggested, I added an index on the postcode field and this massively reduced the latency.
postcodeSchema.index({ postcode: 1 }); // Adding { unique: true } is a tiny bit faster.
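For reference, the equivalent index can also be built once from the mongo shell (using the gb_full collection from the question):

// One-off index build from the mongo shell; Mongoose otherwise builds it
// from the schema definition at startup.
db.gb_full.createIndex({ postcode: 1 })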
To put it into perspective, here are the results of the stress test with the new index in place:
"latency": {
"min": 5.2,
"max": 72.2,
"median": 11.1,
"p95": 17,
"p99": null
},
The median latency has dropped from 30 seconds to 11 milliseconds, which is an astonishing improvement.

Related

Mongoose-fuzzy-searching returns empty array unless query is empty

I am working on a wiki-like website component and am trying to implement a fuzzy search. I found a popular Node.js plugin on npmjs for fuzzy searching a cloud MongoDB database handled with Mongoose. I installed and saved mongoose-fuzzy-searching and hooked it up to my model and route. Then I updated all of the models, re-saving each one's value for the field I wanted to index. I can call the function on a model, and there appears to be an index on MongoDB Atlas, but it returns an empty array instead of any results. Two questions:
Am I doing something wrong, or is there something I am missing?
Is there a better node library that's free? (*Free on heroku, which I think eliminates flexsearch as an option)
Here's my code.
Model:
const mongoose = require("mongoose"),
      mongoose_fuzzy_searching = require('mongoose-fuzzy-searching'),
      Schema = mongoose.Schema;

const IssueTemplateSchema = new Schema({
    name: String,
    info: String,
    image: String,
    tags: [{ type: Schema.Types.ObjectId, ref: "Tag" }],
    active: { type: Boolean, default: true },
    instances: [{ type: Schema.Types.ObjectId, ref: "LocalIssue" }],
    issues: { type: Schema.Types.ObjectId, ref: "Issuegraph" },
});

IssueTemplateSchema.plugin(mongoose_fuzzy_searching, { fields: ['name'] });

module.exports = mongoose.model("IssueTemplate", IssueTemplateSchema);
An update to all of the issuetemplate models:
const express = require('express'),
      router = express.Router(),
      Issue = require('../api/issue/issue.template');

router.get("/createIndex", async (req, res) => {
    Issue.find({}, async (err, issues) => {
        if (err) { console.log(err); } else {
            for (const issue of issues) {
                const name = issue.name;
                await Issue.findByIdAndUpdate(issue._id, { name: name }, { strict: false });
            }
        }
    });
    return res.send("done");
});
Route:
router.get("/search", (req, res) => {
console.log(req.query.target);
let searchTerm = "";
if(req.query.target){
searchTerm = decodeURIComponent(req.query.target.replace(/\+/g, ' '));
}
Issue.fuzzySearch(searchTerm, (err, issue)=> {
if(err){
console.log(err);
res.send("Error fuzzy searching: " + err);
} else {
returnResult(issue);
}
});
function returnResult(result) {
console.log(result);
return res.send(result);
}
});
When I ran
npm install --save mongoose-fuzzy-searching
I received an error saying it needed Mongoose 5.10 while I have 5.11, but it seemed to plug in anyway; I can't see why that would matter. When I send a request through Postman, I receive an empty array. If I leave the query blank, then I get everything. I have restarted Node and am using MongoDB Cloud, where I can see an index has been created. Is there perhaps a reset of the cloud database I would need to do (I don't know of such a thing), or is restarting the server enough? My knowledge level: studying to be a freelance web developer; I would appreciate any general tips on best practice, etc.
Mongoose needs a schema definition, which adds overhead, and the find() approach is fine for development but not for production-level search. This process is also outdated. Since you are working on MongoDB Atlas, take a look at Atlas Full-Text Search; it includes all of the searching features you mention, such as autocomplete and fuzzy search.
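A minimal sketch of what an Atlas Search query could look like, assuming a search index (here named 'default') has been created on the name field in the Atlas UI; the index name and pipeline are illustrative, not from the original post:

router.get("/search", async (req, res) => {
    const searchTerm = req.query.target || "";
    if (!searchTerm) return res.send([]);

    // $search must be the first stage of the aggregation pipeline.
    const results = await Issue.aggregate([
        {
            $search: {
                index: 'default',          // assumed Atlas Search index name
                text: {
                    query: searchTerm,
                    path: 'name',
                    fuzzy: { maxEdits: 2 } // tolerate up to two typos
                }
            }
        },
        { $limit: 20 }
    ]);
    res.send(results);
});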
It turned out my update operation was not actually updating the field I wanted, because findByIdAndUpdate() returns a query and I wasn't executing that query (with .exec()). Await was also being used incorrectly: its purpose is not just to pause in an async function like I thought, but to wait for a promise to fulfill, and a query is not a full promise. Before I learned these details, I solved my problem another way, by using .find(), then .markModified("name"), then .save() on each document. Once I did that, it all worked!
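A sketch of that re-save approach, assuming the Issue model from the question; marking the field as modified should make the plugin's pre-save hook regenerate the fuzzy fields:

router.get("/reindex", async (req, res) => {
    const issues = await Issue.find({});
    for (const issue of issues) {
        issue.markModified('name'); // force Mongoose to treat 'name' as changed
        await issue.save();         // pre-save hook rebuilds the fuzzy fields
    }
    res.send("done");
});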

Why does creating a new tedious connection keep my program from finishing?

I'm trying to wrap the tedious MSSQL library API in promises to make it easier to use, but whenever I make a new Promise that creates a new tedious SQL connection, the program never exits, and I'm having trouble figuring out why.
This is a stripped down version of my real code with the bare minimum needed to cause the issue.
const { Connection } = require('tedious');

const connect = () =>
    new Promise((resolve, reject) => {
        const config = {
            userName: '----',
            password: '----',
            domain: '----',
            server: '----',
            options: {
                database: '----',
                port: 1805,
                connectTimeout: 6000,
                readOnlyIntent: true,
                rowCollectionOnRequestCompletion: true,
                encrypt: true
            }
        };

        console.log('Pre new conn');
        // const conn = new Connection(config);
        console.log('Post new conn');

        resolve('Resolved');
    });

connect()
    .then(conn => console.log(`conn: ${conn}`))
    .catch(error => console.log(`Err: ${error}`));
When the connection succeeds I get the following output:
Pre new conn
Post new conn
conn: Resolved
If I uncomment the line const conn = new Connection(config); then I get the exact same output, but the program never exits!
I'm using tedious v2.6.4 and I'm running the program with node v8.11.3.
Node.js keeps track of open network connections, running timers, and other handles that indicate your program may not yet be done with whatever it was trying to do; as long as that count is non-zero, it does not automatically exit. If you want it to exit in that situation, you have three options:
You can close the connections that you are no longer using (see the sketch after this list).
You can call .unref() on those connections to remove them from the count Node.js is keeping. If it's a higher-level thing like a database connection, you may need to call .unref() on the actual socket itself (which the DB interface may or may not expose), or perhaps the database library offers its own .unref() method for this purpose.
You can manually exit your process with process.exit() when you're done with everything you wanted to do.
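For example, a minimal sketch of the first option applied to the code above, assuming tedious v2, where the connection starts connecting as soon as it is constructed:

const { Connection } = require('tedious');

const connect = (config) =>    // config as defined in the question
    new Promise((resolve, reject) => {
        const conn = new Connection(config);
        conn.on('connect', err => err ? reject(err) : resolve(conn));
    });

connect(config)
    .then(conn => {
        // ...run requests here...
        conn.close(); // releases the socket so the event loop can drain and exit
    })
    .catch(error => console.log(`Err: ${error}`));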

Is it safe to use a single Mongoose database from two files/processes?

I've been working on a server and a push notification daemon that will both run simultaneously and interact with the same database. The idea behind this is that if one goes down, the other will still function.
I normally use Swift but for this project I'm writing it in Node, using Mongoose as my database. I've created a helper class that I import in both my server.js file and my notifier.js file.
const Mongoose = require('mongoose');
const Device = require('./device'); // This is a Schema.

var uri = 'mongodb://localhost/devices';

function Database() {
    Mongoose.connect(uri, { useMongoClient: true }, function(err) {
        console.log('connected: ' + err);
    });
}

Database.prototype.findDevice = function(params, callback) {
    Device.findOne(params, function(err, device) {
        // etc...
    });
};

module.exports = Database;
Then separately from both server.js and notifier.js I create objects and query the database:
const Database = require('./db');
const db = new Database();

db.findDevice(params, function(err, device) {
    // Simplified, but I edit and save things back to the database via db.
    device.token = 'blah';
    device.save();
});
Is this safe to do? When working with Swift (and Objective-C) I'm always concerned about making things thread safe. Is this a concern? Should I be worried about race conditions and modifying the same files at the same time?
Also, bonus question: How does Mongoose share a connection between files (or processes?). For example Mongoose.connection.readyState returns the same thing from different files.
The short answer is "safe enough."
The long answer has to do with understanding what sort of consistency guarantees your system needs, how you've configured MongoDB, and whether there's any sharding or replication going on.
For the latter, you'll want to read about atomicity and consistency and perhaps also peek at write concern.
A good way to answer these questions, even when you think you've figured them out, is to test scenarios: hammer a duplicate of your system with fake data and events and see whether what happens is OK or not.
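As a concrete example, one common way to reduce the risk in the snippet above is to replace the read-modify-write with a single atomic update; a minimal sketch, assuming direct access to the Device model:

// $set is applied server-side in one atomic step, so two processes cannot
// interleave between the read and the write as they could with find + save.
Device.findOneAndUpdate(
    params,
    { $set: { token: 'blah' } },
    { new: true },              // return the updated document
    function(err, device) {
        // device is the post-update document, or null if nothing matched
    }
);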

nodejs modules code execution

I need a little assistance understanding Node.js code organization; I come from the C++ world and suspect I have misunderstood some principles.
I need to implement a JS module which connects to MongoDB and exports a few methods for other modules, e.g. insert, update, delete.
When I write something like:
var db = MongoClient.connect(config.connectionString, {native_parser:true},function (err, db) {...});
exports.insert = function(a, b) {
    // db using
    // ...
};
I suppose that "db" local static variable and will be initialized in any case. at the time of call "require('this module') " but seems it's not so, and db is uninitialized at the time of the call of exported functions? another question - I suppose this should be implemented using "futures" (class from c++, didn't find an analogue from js) to guaratee that db object is copmpletely constructed at the moment of the using??
The problem, as I see it, is that you want to use db, but since db is returned asynchronously, it may or may not be available inside the exported function; you would need to convert the connect from async to sync.
Since the MongoDB driver cannot work synchronously, I suggest you use a wrapper: mongoskin.
https://github.com/kissjs/node-mongoskin
var mongo = require('mongoskin');
var db = mongo.db(config.connectionString, {native_parser:true});
Now this should work for you.
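Building on that, a minimal sketch of the module from the question on top of mongoskin (the 'things' collection name is illustrative); mongoskin queues operations until the underlying connection is ready, so no future-like construct is needed:

var mongo = require('mongoskin');
var db = mongo.db(config.connectionString, { native_parser: true });

// Safe to call as soon as the module is required: mongoskin buffers the
// operation until the connection is actually open.
exports.insert = function(doc, callback) {
    db.collection('things').insert(doc, callback);
};

exports.update = function(query, changes, callback) {
    db.collection('things').update(query, { $set: changes }, callback);
};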
I worked with C++ and Java before (some time back, not now) and am now working in Node.js. I think I understood your question. Here are some key points.
Yes, Node.js modules are somewhat like classes in that they encapsulate variables, which you access only through public methods (exposed through exports). I think you are aware that there is no class implementation at all here, but it loosely maps to that behaviour.
The key difference in Node.js is the asynchronous nature of resource instantiation. By this I mean: if there are two statements, stmt1 and stmt2, and stmt1 takes time, Node.js does not wait for it to end (that would be synchronous behaviour); instead it moves on to stmt2. In the pre-Node.js world, we could assume that reaching stmt2 meant stmt1 was complete.
So, what is the workaround? How do you ensure you do something only after the DB connection is obtained? If your code does not make DB calls immediately, you can assume the connection will be established in time. If you want to invoke the DB immediately, write the code in a callback. The Mongoose connection exposes events called 'open' and 'error'; you can use these to ensure the connection is open. It is also best practice to track the error event.
db.on('error', console.error.bind(console, 'connection error'));
db.once('open', function callback() {
    console.log("Connection with database succeeded.");
    // put your code here
});
I am not aware of C++ futures and so cannot comment on that.
Hope this helps!
[Updated] To add example
You could have db.js set up the DB connection and expose the Mongoose object for creating models.
'use strict';

var Mongoose = require('mongoose'),
    Config = require('./config');

Mongoose.connect(Config.database.url);

var db = Mongoose.connection;
db.on('error', console.error.bind(console, 'connection error'));
db.once('open', function callback() {
    console.log("Connection with database succeeded.");
});

exports.Mongoose = Mongoose;
exports.db = db;
You can include db.js in server.js like:
var DB = require('./db.js');
which will do the initialisation.
You then use Mongoose (an object-document mapper for working with MongoDB, and highly recommended) to define models of database objects, as shown below.
// userModel.js
var mongoose = require('mongoose'),
    Schema = mongoose.Schema;

var UserSchema = new Schema({
    uid:    { type: Number, required: false },
    email:  { type: String, lowercase: true, required: true, index: { unique: true } },
    passwd: { type: String, required: false }
});

var user = mongoose.model('user', UserSchema);

module.exports = {
    User: user
};
For more information on mongoose, you can refer http://mongoosejs.com
The DB connection is generally not closed, as I use it in a web environment where it is always on. Connection pooling is maintained and connections are reused optimally. I noticed a thread on SO which adds more details: Why is it recommended not to close a MongoDB connection anywhere in Node.js code?

Error: Still getting error after setting allowLargeResults to true in job configuration?

I have written a BigQuery query in my code and want to fetch a large result. I set the property allowLargeResults: true, but I am still getting the error: "Response too large to return. Consider setting allowLargeResults to true in your job configuration. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors"
Here is my Node.js Code :
var google = require('googleapis');
var bigquery = google.bigquery('v2');

var authClient = new google.auth.JWT(
    'arjun-dev@mybank-bigquery.iam.gserviceaccount.com',
    'keyA.pem',
    null,
    ['https://www.googleapis.com/auth/bigquery']);

var request2 = {
    projectId: 'project-bank',
    jobId: 'job_LHMRhoUfM038QA4jPZHaOESI3Uo',
    startIndex: 0,
    maxResults: 100000,
    timeoutMs: 10000,
    configuration: {
        allowLargeResults: true
    },
    auth: authClient
};

var list1 = bigquery.jobs.getQueryResults(request2, function(err, result) {
    if (err) {
        console.log("### 2 " + err);
    } else {
        console.log(result);
        res.send(result);
    }
});
Please help!
Your result might be too large to return, and you are missing the destination table, which is mandatory when you set allowLargeResults to true: you cannot return a large result directly; instead you need to write it to a destination table (see the job-configuration sketch after the list below). If you later want the data, export the table to GCS and then download it from GCS.
Returning large query results
Normally, queries have a maximum response size of 128 MB compressed. If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.
Queries that return large results take longer to execute, even if the result set is small, and are subject to additional limitations:
You must specify a destination table.
You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
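A minimal sketch of such a job configuration with the old googleapis client, reusing the authClient from the question; the query, dataset, and table names are illustrative only:

bigquery.jobs.insert({
    projectId: 'project-bank',
    resource: {
        configuration: {
            query: {
                query: 'SELECT ...',             // your large query here
                allowLargeResults: true,
                destinationTable: {              // mandatory with allowLargeResults
                    projectId: 'project-bank',
                    datasetId: 'my_dataset',     // assumed dataset
                    tableId: 'large_results'     // assumed table
                }
            }
        }
    },
    auth: authClient
}, function(err, job) {
    if (err) return console.log(err);
    console.log('Started job: ' + job.jobReference.jobId);
});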
