I am using MongoDB v3.2 with the native Node.js driver v2.1. When running an aggregation pipeline on large data sets (1M+ documents), I encounter the following error:
'aggregation result exceeds maximum document size (16MB)'
Here is my aggregation pipeline code:
var eventCollection = myMongoConnection.db.collection('events');
var cursor = eventCollection.aggregate([
    {
        $match: {
            event_type_id: {$eq: 89012}
        }
    },
    {
        $group: {
            _id: "$user_id",
            score: {$sum: "$points"}
        }
    },
    {
        $sort: {
            score: -1
        }
    }
], {
    cursor: {
        batchSize: 500
    },
    allowDiskUse: true,
    explain: false
}, function () {
});
Things I've tried:
//Using cursor event listeners. None of the 'on' listeners seem to fire; I always get the 16MB error.
cursor.on("data", function (data) {
    console.log("Some data: ", data);
});
cursor.on("end", function (data) {
    console.log("End of data: ", data);
});
//Using forEach, which I thought would allow >16MB because it's used in conjunction with batchSize and a cursor.
cursor.forEach(function (item) {
});
I've seen in other answers (How could I write aggregation without exceeds maximum document size?) that I need to have the results returned by a cursor, so how do I do that properly? I just can't seem to get it to work. Any suggestions on what the batchSize should be?
I am using the native mongodb package (https://github.com/mongodb/node-mongodb-native) in a Node.js project, not the mongo command line.
OK, I figured it out. It was not working because I was passing a callback function as the last parameter to the aggregate method. Passing null instead allows the stream to work as expected. Changes shown below:
var cursor = eventCollection.aggregate([
    {
        $match: {
            event_type_id: {$eq: 89012}
        }
    },
    {
        $group: {
            _id: "$user_id",
            score: {$sum: "$points"}
        }
    },
    {
        $sort: {
            score: -1
        }
    }
], {
    cursor: {
        batchSize: 500
    },
    allowDiskUse: true,
    explain: false
}, null);
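With the callback gone, aggregate returns the cursor, and the same event listeners from the question now stream results batch by batch instead of the driver trying to return one giant document. A minimal sketch, reusing the field names from the $group stage above:

// Each 'data' event delivers one result document from the pipeline
cursor.on("data", function (doc) {
    console.log("User %s has score %d", doc._id, doc.score);
});
// 'end' fires once every batch has been consumed
cursor.on("end", function () {
    console.log("All results streamed.");
});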
I'm formatting my parameters according to this: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Greengrass.html#createFunctionDefinition-property
But for whatever reason it's giving me a key error for "Execution" as well as "DefaultConfig".
Response:
Request ID:
"3ed83472-39af-493b-9df7-7f82d2f14636"
Function Logs:
UnexpectedParameter: Unexpected key 'Execution' found in params.InitialVersion.Functions[0].FunctionConfiguration.Environment',
code: 'MultipleValidationErrors',
errors:
[ { UnexpectedParameter: Unexpected key 'DefaultConfig' found in params.InitialVersion
at ParamValidator.fail (/var/runtime/node_modules/aws-
And here is the code:
GG.createFunctionDefinition({
    InitialVersion: {
        DefaultConfig: {
            Execution: {
                IsolationMode: "NoContainer"
            }
        },
        Functions: [
            {
                FunctionArn: "arn:aws:lambda:us-west-2:644226108543:function:SahmCumminsTelemetryTest:1",
                FunctionConfiguration: {
                    MemorySize: 524288,
                    Pinned: true,
                    Timeout: 600,
                    Environment: {
                        AccessSysfs: false,
                        Execution: {
                            IsolationMode: "NoContainer",
                            RunAs: {
                                Gid: 0,
                                Uid: 0
                            }
                        }
                    }
                },
                Id: "function_definition"
            }
        ]
    },
    Name: "function_definition"
}, function (err, data) {
    if (err) {
        console.log(err, err.stack);
    } else {
        funcArn = data.LatestVersionArn;
    }
});
I think the issue is that the configuration data specifies two mutually exclusive options: a MemorySize value is set AND IsolationMode is "NoContainer". When the Greengrass container isn't in use, memory size isn't a valid option.
Try removing MemorySize and see if that fixes it.
The provisioning code I've shared on GitHub "scrubs" functions when NoContainer is set to get around this issue. The scrubbing process sets the memory size to NULL so that when it is serialized to JSON the field is omitted:
https://github.com/awslabs/aws-greengrass-provisioner/blob/e2608654b65682ca9b5b03da962cc8cb29ea1cbf/src/main/java/com/awslabs/aws/greengrass/provisioner/implementations/helpers/BasicGreengrassHelper.java#L390
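For illustration, here is the question's call with MemorySize removed, which is essentially what the scrubbing above produces. This is a sketch of the suggestion, assuming an SDK version that accepts the DefaultConfig/Execution keys:

GG.createFunctionDefinition({
    InitialVersion: {
        DefaultConfig: {
            Execution: {
                IsolationMode: "NoContainer"
            }
        },
        Functions: [
            {
                FunctionArn: "arn:aws:lambda:us-west-2:644226108543:function:SahmCumminsTelemetryTest:1",
                FunctionConfiguration: {
                    // MemorySize omitted: it is not valid with IsolationMode "NoContainer"
                    Pinned: true,
                    Timeout: 600,
                    Environment: {
                        AccessSysfs: false,
                        Execution: {
                            IsolationMode: "NoContainer",
                            RunAs: { Gid: 0, Uid: 0 }
                        }
                    }
                },
                Id: "function_definition"
            }
        ]
    },
    Name: "function_definition"
}, function (err, data) {
    if (err) console.log(err, err.stack);
    else funcArn = data.LatestVersionArn;
});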
I am working with Meteor JS, and there will be huge data in the MongoDB database; for now it is around 50,000 messages. I am providing the code that I am currently using. For some reason the application is taking too much time to render or load the data from the database. Also, one more thing: if I do any activity, e.g. just liking a message, the app fetches the messages from the database again.
Template.messages.helpers({
    linkMessages() {
        var ids = _.pluck(Meteor.user().subscription, '_id');
        var messages = Messages.find({
            $or: [
                { feedId: { $exists: false }, link: { $exists: true } },
                { feedId: { $in: ids }, link: { $exists: true } }
            ]
        }, {
            sort: { timestamp: 1 },
            limit: Session.get("linkMessageLimit")
        }).fetch();
        return messages;
    }
});
Calling the publication in the onCreated method:
Template.roomView.onCreated(function() {
    const self = this;
    Deps.autorun(function() {
        Meteor.subscribe('messages', Session.get('currentRoom'), Session.get('textMessageLimit'), {
            onReady() {
                isReady.messages = true;
                if (scroll.needScroll) {
                    scroll.needScroll = false;
                    if (scroll.previousMessage) {
                        Client.scrollToMessageText(scroll.previousMessage);
                    }
                } else {
                    Meteor.setTimeout(function() {
                        Client.scrollChatToBottomMsg();
                    }, 1000);
                }
            }
        });
    });
});
The publication function on the server:
Meteor.publish('messages', function(roomId, limit) {
    check(roomId, String);
    check(limit, Match.Integer);
    let query = { $or: [{ link: { $exists: false } }, { feedId: { $exists: false } }] };
    const thresholdMessage = Messages.findOne(query, { skip: limit, sort: { timestamp: 1 } });
    if (thresholdMessage && thresholdMessage.timestamp) {
        query.timestamp = { $gt: thresholdMessage.timestamp };
    }
    return Messages.find(query, { sort: { timestamp: -1 } });
});
It is not good practice to let minimongo get populated with such huge data. Though Meteor handles it reasonably well, it will still take a noticeable amount of time once you account for network traffic, bandwidth, etc.
Whether you go for infinite scroll or simple pagination, I would suggest pagination. I have already got an answer accepted for this and it works like a charm; that answer has the entire code for pagination.
My pagination solution is server-side, so it performs well: the published collection is limited to the limit provided by the subscription.
NOTE: There is as yet no proper, full-fledged table solution with search, pagination, and all the flexibility you might need, so I suggest building your own, as sketched below.
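As a rough sketch of the server-driven pagination I mean (the publication name and the roomId field are assumptions; adapt them to your schema):

// Server: publish exactly one page; the client controls page size via subscribe args
Meteor.publish('messagesPaged', function (roomId, limit, skipCount) {
    check(roomId, String);
    check(limit, Match.Integer);
    check(skipCount, Match.Integer);
    return Messages.find(
        { roomId: roomId }, // hypothetical field
        { sort: { timestamp: -1 }, skip: skipCount, limit: Math.min(limit, 100) }
    );
});

// Client: re-subscribe reactively as the session values change
Deps.autorun(function () {
    Meteor.subscribe('messagesPaged', Session.get('currentRoom'),
        Session.get('pageLimit'), Session.get('pageSkip'));
});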
MORE INSIGHTS:
https://www.quora.com/Does-Meteor-work-well-with-large-datasets
https://forums.meteor.com/t/very-big-data-collection-in-mongodb-how-to-fetch-in-meteor-js/6571/7
https://projectricochet.com/blog/top-10-meteor-performance-problems
I am new to MongoDB and was wondering if the following is possible.
I have two different collections in the same MongoDB database, called jobs and nodes, and this is what they look like:
function testing() {
    nodes.find(function(err, data) {
        if (err) {
            console.log(err)
        } else {
            console.log('NODES RETURNED: ', data)
            jobs.find(function(err, post) {
                if (err) {
                    console.log(err)
                } else {
                    console.log('JOBS RETURNED: ', post)
                }
            });
        }
    });
}
Which returns the following:
NODES RETURNED: [ { _id: '58a9a4805c1f',
    node_url: 'http://222.22.22.22:2222/nodes',
    cpu: 40 },
  { _id: '58999a9a4805c23',
    node_url: 'http://111.11.1.111:1111/nodes',
    average_cpu: 15 } ]
JOBS RETURNED: [ { _id: '5899999354d59',
    job_url: 'http://222.22.22.22:2222/jobs',
    progress: 0,
    queue: 0 },
  { _id: '5899b7d054da96',
    job_url: 'http://111.11.1.111:1111/jobs',
    progress: 0,
    queue: 0 } ]
So as you can see, the two collections have two documents each, and they can be related via job_url and node_url, e.g. 222.22.22.22:2222. Is it possible for me to join the documents together based on this, so that the final result is something like this:
[ { _id: '58a9a4805c1f',
    node_url: 'http://222.22.22.22:2222/nodes',
    job_url: 'http://222.22.22.22:2222/jobs',
    progress: 0,
    queue: 0,
    cpu: 40 },
  { _id: '58999a9a4805c23',
    node_url: 'http://111.11.1.111:1111/nodes',
    job_url: 'http://111.11.1.111:1111/jobs',
    progress: 0,
    queue: 0,
    average_cpu: 15 } ]
Any help / tips would be really appreciated!
There is no join in Mongo, because Mongo is a NoSQL database; read about it here: https://en.wikipedia.org/wiki/NoSQL
In brief, NoSQL databases are used when fast retrieval and high availability of data are needed, and joins are slow, so they are out of the game here. Instead, when you get into a situation like yours, you should think about remodeling your schema, as explained well here: https://docs.mongodb.com/manual/core/data-modeling-introduction
Of course you can do the join manually yourself, but then you miss the point of Mongo.
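For completeness, here is a minimal sketch of that manual join, reusing the callback-style find calls from the question (the hostKey helper is made up for illustration; if your finds return Mongoose documents rather than plain objects, convert them with .toObject() first):

// Extract the 'host:port' part, e.g. 'http://222.22.22.22:2222/jobs' -> '222.22.22.22:2222'
function hostKey(url) {
    return url.split('/')[2];
}

nodes.find(function (err, nodeDocs) {
    if (err) return console.log(err);
    jobs.find(function (err, jobDocs) {
        if (err) return console.log(err);
        // Index jobs by host so each node can find its partner in O(1)
        var jobsByHost = {};
        jobDocs.forEach(function (job) {
            jobsByHost[hostKey(job.job_url)] = job;
        });
        var merged = nodeDocs.map(function (node) {
            var job = jobsByHost[hostKey(node.node_url)] || {};
            // Node fields win on conflicts, so _id comes from the node document
            return Object.assign({}, job, node);
        });
        console.log('MERGED: ', merged);
    });
});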
I am trying to build an app that has a many-to-many relationship in Meteor. There will be jobs, clients, and users collections. Clients can have multiple jobs, and most importantly, multiple users can work on the same job.
I have the jobs collection set up as follows in my fixtures file:
Jobs.insert({
    jobNum: 'Somejob',
    clientId: 'XXXXXXXX',
    clientName: 'Some Client',
    rate: XX,
    userNames: [
        { userId: user1._id },
        { userId: user2._id }
    ],
    active: true
});
I am publishing according to the readme for publish-composite, but I cannot get the users to publish to the client. Here is the publication code:
Meteor.publishComposite('jobsActive', {
    find: function() {
        // Find all active jobs for any client
        return Jobs.find({ active: true });
    },
    children: [
        {
            find: function (job) {
                // Return the client associated with the job
                return Clients.find({ _id: job.clientId });
            }
        },
        {
            find: function (job) {
                // Return all users associated with the job
                // This is where the problem is
                return Meteor.users.find({ _id: job.userNames.userId });
            }
        }
    ]
});
I can't figure out how to correctly find over an array. I tried a number of things and nothing worked. Is this possible, or do I need to go about it another way? I've thought about referencing jobs in the users collection instead, but there will be far more jobs than users, so it seems to make more sense this way.
BTW, I did subscribe to 'jobsActive' as well. The other two collections are coming over to the client side fine; I just can't get the users collection to publish.
Thanks for any help and ideas.
job.userNames.userId doesn't exist in your collection: job.userNames is an array of objects which have the key userId.
Try something like _.map(job.userNames, function(user) { return user.userId; }).
Your code will be:
Meteor.publishComposite('jobsActive', {
    find: function() {
        return Jobs.find({ active: true });
    },
    children: [
        {
            find: function (job) {
                return Clients.find({ _id: job.clientId });
            }
        },
        {
            find: function (job) {
                return Meteor.users.find({
                    _id: { $in: _.map(job.userNames, function (user) { return user.userId; }) }
                });
            }
        }
    ]
});
I think you don't need publish-composite at all; try this code snippet. It works for me!
Meteor.publish('jobsActive', function () {
    return Jobs.find(
        {
            $or: [
                // { public: { $eq: true } },
                { active: true },
                { 'userNames.userId': this.userId }
            ]
        },
        {
            sort: { createdAt: -1 }
        }
    );
});
I would like to update many documents in a collection at once. The docs seem unclear on this.
I am wondering how to achieve the following:
Order.find({ _id: { $in: ids } }).exec(function(err, items, count) {
    // Following gives error - same with save()
    items.update({ status: 'processed' }, function(err, docs) {
    });
});
I know how to batch update like this:
Model.update({ _id: id }, { $set: { size: 'large' }}, { multi: true }, callback);
But that requires setting my query again.
I've also tried:
Order.collection.update(items...
But that throws a max call stack error.
In Mongoose, Model.find(callback) returns an array of Documents via the callback. You can call save on a Document but not on an array, so loop over the array with for or forEach:
Order
    .find({ _id: { $in: ids } })
    .exec(function (err, items) {
        items.forEach(function (it) {
            it.status = 'processed'; // apply the change before saving
            it.save(function () {
                console.log('you have saved ', it);
            });
        });
    });
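Alternatively, if you don't need Mongoose's save middleware to run, you can reuse the same $in query with the multi update from the question in a single round trip. A sketch (note it bypasses document hooks):

// Single round trip: update every matched order without loading documents
Order.update(
    { _id: { $in: ids } },
    { $set: { status: 'processed' } },
    { multi: true },
    function (err, result) {
        if (err) return console.error(err);
        console.log('updated orders:', result);
    }
);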