Pre-allocation of records using count - JavaScript

I've read that pre-allocating a record can improve performance, which should be especially beneficial when handling many records of a time series dataset.
updateRefLog = function(_ref, year, month, day){
    var id = _ref + "|" + year + "|" + month;
    db.collection('ref_history').count({ "_id": id }, function(err, count){
        // pre-allocate if needed
        if (count < 1) {
            db.collection('ref_history').insert({
                "_id": id,
                "dates": [
                    {"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},
                    {"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},
                    {"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},
                    {"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0},{"count":0}
                ]
            });
        }
        // update the counter for this day
        var inc = {};
        inc['dates.' + day + '.count'] = 1;
        db.collection('ref_history').update({ "_id": id }, { "$inc": inc }, { upsert: true },
            function(err, res){
                if (err !== null) {
                    // handle error
                }
            }
        );
    });
};
I'm a little concerned that having to go through a callback might slow this down, and that checking the count every time would possibly negate the performance benefit of pre-allocating a record.
Is there a more performant way to handle this?

The general statement of "pre-allocation" is about the potential cost of an "update" operation that causes the document to "grow". If that results in a document size greater than the currently allocated space, then the document would be "moved" to another location on disk to accommodate the new space. This can be costly, and hence the general recommendation to initially write the document befitting its eventual "size".
Honestly the best way to handle such an operation would be to do an "upsert" initially with all the array elements allocated, and then only update the required element in position. This reduces it to "two" potential writes, and you can further reduce that to a single "over the wire" operation using the Bulk API methods:
var id = _ref + "|" + year + "|" + month;

var bulk = db.collection('ref_history').initializeOrderedBulkOp();

// create the pre-allocated document only if it does not exist yet
bulk.find({ "_id": id }).upsert().updateOne({
    "$setOnInsert": {
        "dates": Array.apply(null, Array(32)).map(function(el) { return { "count": 0 } })
    }
});

// increment the counter at the required position
var inc = {};
inc['dates.' + day + '.count'] = 1;
bulk.find({ "_id": id }).updateOne({ "$inc": inc });

bulk.execute(function(err, results) {
    // results would show what was modified or not
});
Or since newer drivers are favouring consistency with one another, the "Bulk" parts have been relegated to regular arrays of WriteOperations instead:
var update={"$inc":inc['dates.'+day+'.count'] = 1;};
db.collection('ref_history').bulkWrite([
{ "updateOne": {
"filter": { "_id": id },
"update": {
"$setOnInsert": {
"dates": Array.apply(null,Array(32)).map(function(el) {
return { "count": 0 }
})
}
},
"upsert": true
}},
{ "updateOne": {
"filter": { "_id": id },
"update": update
}}
],function(err,result) {
// same thing as above really
});
In either case, the $setOnInsert as the sole block will only do anything if an "upsert" actually occurs. The main point is that the only contact with the server will be a single request and response, as opposed to "back and forth" operations waiting on network communication.
This is typically what "Bulk" operations are for. They reduce that network overhead when you might as well send a batch of requests to the server. The result significantly speeds things up, and neither operation is really dependent on the other, with the exception of "ordered", which is the default in the latter case and explicitly set by the legacy .initializeOrderedBulkOp().
Yes there is a "little" overhead in the "upsert", but there is "less" than in testing with .count() and waiting for that result first.
N.B. Not sure about the 32 array entries in your listing. You possibly meant 24, but copy/paste got the better of you. At any rate, there are better ways to do that than hardcoding, as demonstrated.

Related

Node/Apollo/Sequelize obnoxiously slow (>7 seconds)

I'm no expert in these things (I'm used to Laravel), but running one query in Apollo Server is taking ~7.2 seconds, for maybe 300 items total.
The entire resolver is below - as you can see there's essentially no logic aside from running a query. It's just humongously slow.
getMenu: async (parent, { slug }, { models, me }) => {
    const user = await models.User.findByPk(me.id)
    const account = await models.Account.findByPk(user.accountId, {
        include: [{
            model: models.Venue,
            as: 'venues',
            include: getVenueIncludes(models)
        }],
        minifyAliases: true
    })

    return account.venues.find(venue => venue.slug === slug) || null
},
I realise this is rather vague, but does anyone happen to know where I'd look to try and improve this? I understand they're different, but in a Laravel app I can load 10 times that amount (with more nesting) in under a second...
Aha!!
separate: true on your hasMany relationships. Good grief, that cut request times from 7.2 seconds to 500ms.
Amazing.
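For anyone who lands here later, a minimal sketch of where that option goes, reusing the names from the query above (treat them as placeholders rather than a verified schema):
const account = await models.Account.findByPk(user.accountId, {
    include: [{
        model: models.Venue,
        as: 'venues',
        // run this hasMany include as its own query instead of
        // multiplying rows through one giant JOIN
        separate: true,
        include: getVenueIncludes(models)
    }],
    minifyAliases: true
})
Note that separate: true is only valid on hasMany associations; Sequelize fetches them in a second query, which avoids the row explosion that makes deeply nested includes slow.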

Concurrent beforeSave calls allowing duplicates

In an effort to prevent certain objects from being created, I set a conditional in that object type's beforeSave cloud function.
However, when two objects are created simultaneously, the conditional does not work as intended.
Here is my code:
Parse.Cloud.beforeSave("Entry", function(request, response) {
    var theContest = request.object.get("contest");
    theContest.fetch().then(function(contest) {
        if (contest.get("isFilled") == true) {
            response.error('This contest is full.');
        } else {
            response.success();
        }
    });
});
Basically, I don't want an Entry object to be created if a Contest is full. However, if there is 1 spot in the Contest remaining and two entries are saved simultaneously, they both get added.
I know it is an edge-case, but a legitimate concern.
Parse uses MongoDB, a NoSQL database designed to be very scalable, and it therefore provides limited synchronisation features. What you really need here is mutual exclusion, which is unfortunately not supported on a Boolean field. However, Parse provides atomicity for counters and array fields, which you can use to enforce some control.
See http://blog.parse.com/announcements/new-atomic-operations-for-arrays/
and https://parse.com/docs/js/guide#objects-updating-objects
Solved this by using increment and then doing the check in the save callback (instead of fetching the object and checking a Boolean on it).
Looks something like this:
Parse.Cloud.beforeSave("Entry", function(request, response) {
    var theContest = request.object.get("contest");
    theContest.increment("entries");
    theContest.save().then(function(contest) {
        if (contest.get("entries") > contest.get("maxEntries")) {
            response.error('The contest is full.');
        } else {
            response.success();
        }
    });
});
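One caveat worth noting (my addition, not part of the original fix): when the save is rejected, the increment has already been persisted, so "entries" will keep drifting past "maxEntries" as a full contest rejects further entries. A hypothetical variant that compensates by decrementing before rejecting; it narrows the window but is still not true mutual exclusion:
Parse.Cloud.beforeSave("Entry", function(request, response) {
    var theContest = request.object.get("contest");
    theContest.increment("entries");
    theContest.save().then(function(contest) {
        if (contest.get("entries") > contest.get("maxEntries")) {
            // give back the slot we just reserved (hypothetical compensation)
            contest.increment("entries", -1);
            contest.save().then(function() {
                response.error('The contest is full.');
            });
        } else {
            response.success();
        }
    });
});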

Query changing strings to ints in MongoDB takes a long time

I run the following query on my whole data set (approx. 3 million documents) in MongoDB to change user IDs that are strings into ints. This query does not seem to complete:
var cursor = db.play_sessions.find();
while (cursor.hasNext()) {
    var play = cursor.next();
    db.play_sessions.update({ _id: play._id }, { $set: { user_id: new NumberInt(play.user_id) } });
}
I run this query on the same data set and it returns relatively quickly:
db.play_sessions.find().forEach(function(play) {
    if (play.score && play.user_id && play.user_attempt_no && play.game_id && play.level && play.training_session_id) {
        print(play.score, ",", play.user_id, ",", play.user_attempt_no, ",", play.game_id, ",", play.level, ",", parseInt(play.training_session_id).toFixed());
    } else if (play.score && play.user_id && play.user_attempt_no && play.game_id && play.level) {
        print(play.score, ",", play.user_id, ",", play.user_attempt_no, ",", play.game_id, ",", play.level);
    }
});
I understand I am writing to the database in the first query but why does the first query never seem to return, while the second does so relatively quickly? Is there something wrong with the code in the first query?
Three million documents is quite a lot, so the whole operation is going to take a while. But the main thing to consider here is that you are asking to both "send" data to the database and "receive" an acknowledged write response (because that is what happens) three million times. That alone is a lot more waiting between operations than simply iterating a cursor.
Another reason is that you are very likely running MongoDB 2.6 or a later revision. There is a core difference between earlier and later versions in how this code is processed in the shell. At the core of this is the Bulk Operations API, whose methods are actually used by all the shell helpers for all interaction with the database.
In prior versions, in such a "loop" operation, the "write concern" acknowledgement was not awaited for each iteration. The way it is done now (since the helpers actually use the Bulk API), the acknowledgement is returned for every iteration. This slows things down a lot, unless of course you use the Bulk operations directly.
So to "re-cast" your values in modern versions, do this instead:
var bulk = db.play_sessions.initializeUnorderedBulkOp();
var count = 0;

db.play_sessions.find({ "user_id": { "$type": 2 } }).forEach(function(doc) {
    bulk.find({ "_id": doc._id }).updateOne({
        "$set": { "user_id": NumberInt(doc.user_id) }
    });
    count++;

    // flush the batch every 10000 documents
    if ( count % 10000 == 0 ) {
        bulk.execute();
        bulk = db.play_sessions.initializeUnorderedBulkOp();
    }
});

// flush any remaining operations
if ( count % 10000 != 0 )
    bulk.execute();
Bulk operations send all of their "batch" in a single request. In actual fact the underlying driver breaks this up into individual batch requests of 1000 items, but 10000 is a reasonable number without taking up too much memory in most cases.
The other optimization here is that the query selects only the documents whose user_id is presently a "string", using the $type operator to identify them. This could possibly speed things up if some of the data is already converted.
If indeed you have an earlier version of MongoDB and you are running this conversion on a collection that is not on a sharded cluster, then your other option is to use db.eval().
Take care to actually read the content at that link, though. This is not a good idea, you should never use it in production, and it is only a last resort for a one-off conversion. The code is submitted as JavaScript and actually run on the server. As a result, a high level of locking can and will occur while it runs. You have been warned:
db.eval(function() {
    db.play_sessions.find({ "user_id": { "$type": 2 } }).forEach(function(doc) {
        db.play_sessions.update(
            { "_id": doc._id },
            { "$set": { "user_id": NumberInt(doc.user_id) } }
        );
    });
});
Use with caution and prefer the "batch" processing, or even the basic loop run on a machine as close as possible in network terms to the actual database server, preferably on the server itself.
Also, where the version permits and you still deem the eval case necessary, try to use the Bulk operations methods anyway, as that is the greatly optimized approach.

Meteor Leaderboard example: resetting the scores

I've been trying to do Meteor's leaderboard example, and I'm stuck at the second exercise, resetting the scores. So far, the furthest I've got is this:
// On server startup, create some players if the database is empty.
if (Meteor.isServer) {
    Meteor.startup(function () {
        if (Players.find().count() === 0) {
            var names = ["Ada Lovelace",
                         "Grace Hopper",
                         "Marie Curie",
                         "Carl Friedrich Gauss",
                         "Nikola Tesla",
                         "Claude Shannon"];
            for (var i = 0; i < names.length; i++)
                Players.insert({name: names[i], score: Math.floor(Random.fraction()*10)*5});
        }
    });

    Meteor.methods({
        whymanwhy: function() {
            Players.update({}, {score: Math.floor(Random.fraction()*10)*5});
        }
    });
}
And then, to use the whymanwhy method, I have a section like this inside if (Meteor.isClient):
Template.leaderboard.events({
    'click input#resetscore': function() { Meteor.call("whymanwhy"); }
});
The problem with this is that {} is supposed to select all the documents in a MongoDB collection, but instead it creates a new blank scientist with a random score. Why? {} is supposed to select everything. I tried "_id": { $exists: true }, but it's a kludge, I think. Plus it behaved the same as {}.
Is there a more elegant way to do this? The meteor webpage says:
Make a button that resets everyone's score to a random number. (There is already code to do this in the server startup code. Can you factor some of this code out and have it run on both the client and the server?)
Well, to run this on the client first, instead of using a method on the server and having the results pushed back to the client, I would need to explicitly specify the _id of each document in the collection; otherwise I will run into "Error: Not permitted. Untrusted code may only update documents by ID. [403]". But how can I get those? Or should I just make it easy and use collection.allow()? Or is that the only way?
I think you are missing two things:
You need to pass the option {multi: true} to update, or it will only ever change one record.
If you only want to change some fields of a document, you need to use $set. Otherwise update assumes you are providing the complete new document and replaces the original.
So I think the correct function is:
Players.update({},{$set: {score: Math.floor(Random.fraction()*10)*5}}, {multi:true});
The documentation on this is pretty thorough.

Subscribe to a count of an existing collection

I need to keep track of a counter of a collection with a huge number of documents that's constantly being updated. (Think a giant list of logs). What I don't want to do is to have the server send me a list of 250k documents. I just want to see a counter rising.
I found a very similar question here, and I've also looked into .observeChanges() in the docs, but once again it seems that .observe() as well as .observeChanges() actually return the whole set before tracking what's been added, changed or deleted.
In the example from that question, the "added" function will fire once per document returned, just to increment a counter.
This is unacceptable with a large set; I only want to track a change in the count, since as I understand it .count() bypasses fetching the entire set of documents. That example also only counts documents related to a room, which isn't something I want (or was able to reproduce and get working, for that matter).
I've gotta be missing something simple, I've been stumped for hours.
Would really appreciate any feedback.
You could accomplish this with the meteor-streams smart package by Arunoda. It lets you do pub/sub without needing the database, so one thing you could send over is a reactive number, for instance.
Alternatively, and this is slightly more hacky but useful if you've got a number of things to count or something similar, you could have a separate "Statistics" collection (name it whatever) with a document containing that count, as sketched below.
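A minimal sketch of that counter-document idea; the collection name, document _id, and insertLog helper are all made up for illustration:
// server: a single document in "statistics" tracks the log count
Statistics = new Meteor.Collection("statistics");

// every server-side code path that inserts a log also bumps the counter
function insertLog(logEntry) {
    Messages.insert(logEntry);
    // $inc is atomic, so concurrent inserts still count correctly
    Statistics.update({ _id: "logCount" }, { $inc: { count: 1 } }, { upsert: true });
}
Clients then subscribe to just that one document instead of 250k logs.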
There is an example in the documentation about this use case. I've modified it for your particular question:
// server: publish the current size of a collection
Meteor.publish("nbLogs", function () {
    var self = this;
    var count = 0;
    var initializing = true;

    // there are no rooms here, so a fixed _id ("nbLogs") identifies
    // the single counter document
    var handle = Messages.find({}).observeChanges({
        added: function (id) {
            count++;
            if (!initializing)
                self.changed("counts", "nbLogs", {nbLogs: count});
        },
        removed: function (id) {
            count--;
            self.changed("counts", "nbLogs", {nbLogs: count});
        }
        // don't care about moved or changed
    });

    // Observe only returns after the initial added callbacks have
    // run. Now return an initial value and mark the subscription
    // as ready.
    initializing = false;
    self.added("counts", "nbLogs", {nbLogs: count});
    self.ready();

    // Stop observing the cursor when the client unsubscribes.
    // Stopping a subscription automatically takes care of sending
    // the client any removed messages.
    self.onStop(function () {
        handle.stop();
    });
});
// client: declare collection to hold the count object
Counts = new Meteor.Collection("counts");

// client: subscribe to the log count
Meteor.subscribe("nbLogs");

// client: use the new collection, guarding against the document
// not having arrived yet
Deps.autorun(function() {
    var counts = Counts.findOne();
    if (counts)
        console.log("nbLogs: " + counts.nbLogs);
});
There might be some higher level ways to do this in the future.
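One such higher level option that already exists is the tmeasday:publish-counts package, which wraps exactly this pattern. A rough sketch of its usage (check the package README for the current API; note its exported Counts helper is distinct from the manual "counts" collection above):
// server: publish a reactive count without sending the documents
Meteor.publish("nbLogs", function () {
    Counts.publish(this, "nbLogs", Messages.find());
});

// client: subscribe and read the count reactively
Meteor.subscribe("nbLogs");
Deps.autorun(function () {
    console.log("nbLogs: " + Counts.get("nbLogs"));
});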
