Issue Query for Each Element of an Array - javascript
I am currently querying my MongoDB database for an array of URLs in one collection, which returns an array. I then want to use that array to go through another collection and find the matching elements for each element in the previous query's returned array. Is it proper to use forEach on the array and do individual queries?
My code looks like this; the first function, getUrls, works great. The current error I get is:
(node:10754) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): TypeError: Cannot read property 'limit' of undefined
(node:10754) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
async function useUrls () {
  let domains = await getUrls()
  let db = await mongo.connect("mongodb://35.185.206.31:80/lc_data")
  let results = []
  domains.forEach( domain => {
    let query = { "$match":
      { "email_domain": domain }
    }
    let cursor = db.collection('circleback')
      .aggregate([query], (err, data) => {
        if (err)
          throw err;
        console.log("cb", data)
      }).limit(1100)
  })
}
As noted, the code in the question has a few problems, most of which can be addressed by looking at the full sample listing supplied at the end of this response. What you are essentially asking for here is a variation on the "Top-N results" problem, for which there are a couple of practical ways to handle it.
So, ranking somewhat from "worst" to "best":
Aggregation $slice
So rather than "loop" your results of your function, you can alternately supply all the results to a query using $in. That alleviates the need to "loop inputs", but the other thing needed here is the "top-N per output".
There really is not a "stable" mechanism in MongoDB for this as yet, but "if" it is plausible on the size of given collections then you can in fact simply $group on your "distinct" keys matching the provided $in arguments, and then $push all documents into an array and $slice the results:
let results = await db.collection('circleback').aggregate([
{ "$match": { "email_domain": { "$in": domains } } },
{ "$group": {
"_id": "$email_domain",
"docs": { "$push": "$$ROOT" }
}},
{ "$sort": { "_id": 1 } },
{ "$addFields": { "docs": { "$slice": [ "$docs", 0, 1100 ] } } }
]).toArray();
The "wider" issue here is that MongoDB has no way of "limiting" the array content on the initial $push. And this in fact is awaiting a long outstanding issue. SERVER-9377.
So whilst we can do this sort of operation "in theory", it often is not practical at all since the 16MB BSON Limit often restricts that "initial" array size, even if the $slice result would indeed stay under that cap.
Serial Loop Execution with async/await
Your code shows you are running under this environment, so I suggest you actually use it. Simply await on each loop iteration from the source:
let results = [];
for ( let domain of domains ) {
results = results.concat(
await db.collection('circleback').find({ "email_domain": domain })
.limit(1100).toArray()
);
}
Simple functions allow you to do this, such as returning the standard cursor result of .find() as an array via .toArray() and then using .concat() to join with previous arrays of results.
It's simple and effective, but we can probably do a little better.
Concurrent Execution of Async Methods
So instead of using a "loop" and await on each called async function, you can instead execute them all ( or at least "most" ) concurrently instead. This is in fact part of the problem you presently have as presented in the question, because nothing actually "waits" for the loop iteration.
We could use Promise.all() to effectively do this, however if it is actually a "very large" number of promises that would be running concurrently, this would run into the same problem as experienced, where the call stack is exceeded.
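For illustration, a minimal sketch of the Promise.all() form, assuming the same db handle and domains array as in the snippets above; it launches every query at once, which is fine for a modest number of domains:
// Sketch of the Promise.all() approach; `db` and `domains` come from the earlier code.
// Every find() is started at once, so use this only when the list of domains is small.
let results = [].concat.apply([],
  await Promise.all(domains.map(domain =>
    db.collection('circleback').find({ "email_domain": domain })
      .limit(1100).toArray()
  ))
);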
To avoid exceeding those limits, yet still keep the benefits of concurrency, we can use Bluebird promises with Promise.map(). This has a "concurrency" limiter option that allows only a specified number of operations to act simultaneously:
let results = [].concat.apply([],
await Promise.map(domains, domain =>
db.collection('circleback').find({ "email_domain": domain })
.limit(1100).toArray()
,{ concurrency: 10 })
);
In fact you should even be able to use a library such as Bluebird to "plug in" the .map() functionality to anything else that returns a Promise, such as your "source" function returning the list of "domains". Then you could "chain" just as is shown in the later examples.
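As a rough sketch of that chaining idea, and assuming getUrls() resolves to the array of domains as in the question, the source result can be wrapped in a Bluebird promise so that .map() and its concurrency option chain directly; treat this as illustrative rather than a drop-in:
// Hedged sketch: `Promise` here is Bluebird, so the resulting promise exposes
// .map() with a concurrency option; `db` and `getUrls()` are from the earlier code.
let results = [].concat.apply([],
  await Promise.resolve(getUrls()).map(domain =>
    db.collection('circleback').find({ "email_domain": domain })
      .limit(1100).toArray(),
    { concurrency: 10 }
  )
);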
Future MongoDB
Future releases of MongoDB ( from MongoDB 3.6 ) actually have a new "Non-Correlated" form of $lookup that allows a special case here. So going back to the original aggregation example, we can get the "distinct" values for each matching key, and then $lookup with a "pipeline" argument which would then allow a $limit to be applied on results.
let results = await db.collection('circleback').aggregate([
{ "$match": { "email_domain": { "$in": domains } } },
{ "$group": { "_id": "$email_domain" }},
{ "$sort": { "_id": 1 } },
{ "$lookup": {
"from": "circleback",
"let": {
"domain": "$_id"
},
"pipeline": [
{ "$redact": {
"$cond": {
"if": { "$eq": [ "$email_domain", "$$domain" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$limit": 1100 }
],
"as": "docs"
}}
]).toArray();
This would then always stay under the 16MB BSON limit, presuming of course that the argument to $in allowed that to be the case.
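As an aside, on an actual MongoDB 3.6 release the $redact stage inside the "pipeline" can usually be replaced by a $match using $expr, which reads a little more naturally. Here is a sketch of the same aggregation under that assumption; it is equivalent in intent, not an additional technique:
let results = await db.collection('circleback').aggregate([
  { "$match": { "email_domain": { "$in": domains } } },
  { "$group": { "_id": "$email_domain" } },
  { "$sort": { "_id": 1 } },
  { "$lookup": {
    "from": "circleback",
    "let": { "domain": "$_id" },
    "pipeline": [
      // $expr correlates the looked-up documents with the outer "_id" value
      { "$match": { "$expr": { "$eq": [ "$email_domain", "$$domain" ] } } },
      { "$limit": 1100 }
    ],
    "as": "docs"
  }}
]).toArray();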
Example Listing
Here is a full example listing you can run and generally play with; the default data set creation is intentionally quite large. It demonstrates all the techniques described above, as well as some general usage patterns to follow.
const mongodb = require('mongodb'),
Promise = require('bluebird'),
MongoClient = mongodb.MongoClient,
Logger = mongodb.Logger;
const uri = 'mongodb://localhost/bigpara';
function log(data) {
console.log(JSON.stringify(data,undefined,2))
}
(async function() {
let db;
try {
db = await MongoClient.connect(uri,{ promiseLibrary: Promise });
Logger.setLevel('info');
let source = db.collection('source');
let data = db.collection('data');
// Clean collections
await Promise.all(
[source,data].map( coll => coll.remove({}) )
);
// Create some data to work with
await source.insertMany(
Array.apply([],Array(500)).map((e,i) => ({ item: i+1 }))
);
let ops = [];
for (let i=1; i <= 10000; i++) {
ops.push({
item: Math.floor(Math.random() * 500) + 1,
index: i,
amount: Math.floor(Math.random() * (200 - 100 + 1)) + 100
});
if ( i % 1000 === 0 ) {
await data.insertMany(ops,{ ordered: false });
ops = [];
}
}
/* Fetch 5 and 5 example
*
* Note that the async method to supply to $in is a simulation
* of any real source that is returning an array
*
* Not the best since it means ALL documents go into the array
* for the selection. Then you $slice off only what you need.
*/
console.log('\nAggregate $in Example');
await (async function(source,data) {
let results = await data.aggregate([
{ "$match": {
"item": {
"$in": (await source.find().limit(5).toArray()).map(d => d.item)
}
}},
{ "$group": {
"_id": "$item",
"docs": { "$push": "$$ROOT" }
}},
{ "$addFields": {
"docs": { "$slice": [ "$docs", 0, 5 ] }
}},
{ "$sort": { "_id": 1 } }
]).toArray();
log(results);
})(source,data);
/*
* Fetch 10 by 2 example
*
 * Much better usage of concurrent processes and only gets
 * what is needed. But it is actually 1 request per item.
 * The .map() used below is Bluebird's, available because the
 * driver was told to use Bluebird as its promise library.
*/
console.log('\nPromise.map concurrency example');
await (async function(source,data) {
let results = [].concat.apply([],
await source.find().limit(10).toArray().map(d =>
data.find({ item: d.item }).limit(2).toArray()
,{ concurrency: 5 })
);
log(results);
})(source,data);
/*
* Plain loop async/await serial example
*
* Still one request per item, requests are serial
* and therefore take longer to complete than concurrent
*/
console.log('\nasync/await serial loop');
await (async function(source,data) {
let items = (await source.find().limit(10).toArray());
let results = [];
for ( let item of items ) {
results = results.concat(
await data.find({ item: item.item }).limit(2).toArray()
);
}
log(results);
})(source,data);
/*
* Non-Correlated $lookup example
*
* Uses aggregate to get the "distinct" matching results and then does
 * a $lookup operation to retrieve the matching documents to the
* specified $limit
*
* Typically not as efficient as the concurrent example, but does
* actually run completely on the server, and does not require
* additional connections.
*
*/
let version = (await db.db('admin').command({'buildinfo': 1})).version;
if ( version >= "3.5" ) {
console.log('\nNon-Correlated $lookup example $limit')
await (async function(source,data) {
let items = (await source.find().limit(5).toArray()).map(d => d.item);
let results = await data.aggregate([
{ "$match": { "item": { "$in": items } } },
{ "$group": { "_id": "$item" } },
{ "$sort": { "_id": 1 } },
{ "$lookup": {
"from": "data",
"let": {
"itemId": "$_id",
},
"pipeline": [
{ "$redact": {
"$cond": {
"if": { "$eq": [ "$item", "$$itemId" ] },
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$limit": 5 }
],
"as": "docs"
}}
]).toArray();
log(results);
})(source,data);
} else {
console.log('\nSkipped Non-Correlated $lookup demo');
}
} catch(e) {
console.error(e);
} finally {
if (db) db.close();
}
})();
Related
Trying to write a recursive asynchronous search in JavaScript
I am trying to write some code that searches through a bunch of objects in a MongoDB database. I want to pull the objects from the database by ID, then those objects have ID references. The program should be searching for a specific ID through this process, first getting the object from an ID, then IDs from the object.
async function objectFinder(ID1, ID2, depth, previousList = []) {
    let route = []
    if (ID1 == ID2) {
        return [ID2]
    } else {
        previousList.push(ID1)
        let obj1 = await findObjectByID(ID1)
        let connectedID = obj1.connections.concat(obj1.inclusions) // creates array of both references to object and references from object
        let mapPromises = connectedID.map(async (id) => {
            return findID(id) // async function
        })
        let fulfilled = await Promise.allSettled(mapPromises)
        let list = fulfilled.map((object) => {
            return object.value.main, object.value.included
        })
        list = list.filter(id => !previousList.includes(id))
        for (id of list) {
            await objectFinder(id, ID2, depth - 1, previousList).then(result => {
                route = [ID1].concat(result)
                if (route[route.length - 1] == ID2) {
                    return route
                }
            })
        }
    }
    if (route[route.length - 1] == ID2) {
        return route
    }
}
I am not sure how to make it so that my code works like a tree search, with each object and ID being a node.
I didn't look too much into your code as I strongly believe in letting your database do the work for you if possible. In this case Mongo has the $graphLookup aggregation stage, which allows recursive lookups. Here is a quick example of how to use it:
db.collection.aggregate([
  {
    $match: {
      _id: 1,
    }
  },
  {
    "$graphLookup": {
      "from": "collection",
      "startWith": "$inclusions",
      "connectFromField": "inclusions",
      "connectToField": "_id",
      "as": "matches",
    }
  },
  {
    // the rest of the pipeline is just to restore the original structure, you don't need this
    $addFields: {
      matches: {
        "$concatArrays": [
          [
            {
              _id: "$_id",
              inclusions: "$inclusions"
            }
          ],
          "$matches"
        ]
      }
    }
  },
  {
    $unwind: "$matches"
  },
  {
    "$replaceRoot": {
      "newRoot": "$matches"
    }
  }
])
Mongo Playground
If for whatever reason you want to keep this in code, then I would take a look at your for loop:
for (id of list) {
    await objectFinder(id, ID2, depth - 1, previousList).then(result => {
        route = [ID1].concat(result);
        if (route[route.length - 1] == ID2) {
            return route;
        }
    });
}
Just from a quick glance I can tell you're executing this:
route = [ID1].concat(result);
many times at the same level. Additionally, I could not understand your bottom return statements; I feel like there might be an issue there.
MongoDB - Most Recent Item for Each ID in List
I have a list of content IDs and I'm trying to fetch the most recent comment (if one exists) for each of the content IDs in the list. My query looks as follows:
const query = [
  {
    $match: {
      content_id: { $in: myContentIds },
    },
  },
  { $sort: { 'comment_created': -1 } },
]
const results = await collection.find(query).toArray();
My understanding is this will fetch all of the comments related to the contentIds in the myContentIds array and sort them in descending order based on the date. I could then limit my results using { $limit: 1 }, but this would return the most recent comment on any of the content items, rather than the most recent comment for each content item. How can I modify my query to return the most recent comment for each of my content items?
$group by content_id and take the $first (most recent) document; $replaceRoot promotes that recent document to the root (this is optional, you can also work with the document via the recentComment object):
const query = [
  { $match: { content_id: { $in: myContentIds } } },
  { $sort: { comment_created: -1 } },
  { $group: { _id: "$content_id", recentComment: { $first: "$$ROOT" } } },
  { $replaceRoot: { newRoot: "$recentComment" } }
];
const results = await collection.aggregate(query).toArray();
Playground
MongoDB Node.JS driver: create, append, and update documents with arrays in one go
Is this really the only way to create, append, and update documents with arrays in a database, or could this be done in only ONE beautiful operation? I am of course looking to optimise this bit of code into how it is supposed to be implemented with MongoDB operators; it would reduce the operations done to the database... What do you think?
const { MongoClient } = require('mongodb');

async function main() {
    // Note: the connection setup is omitted in the question;
    // `client` is assumed to be a connected MongoClient instance.
    try {
        // Make the appropriate DB calls
        // Create a new document
        await createDocument(client);
        // Append new items to the items array
        await appendNewItemsToArray(client);
        // Update items in the items array
        await updateItemsInArray(client);
    } finally {
        // Close the connection to the MongoDB cluster
        await client.close();
    }
}

main().catch(console.error);

async function createDocument(client) {
    const result = await client.db("NameOfYourDb").collection("NameOfYourCollection").insertOne({
        "_id": "UniqueId1",
        "items": [
            {
                "item_name": "my_item_one",
                "first_seen": 1000,
                "last_seen": 1000,
                "logic": true
            }
        ]
    });
    console.log(`New document created with the following id: ${result.insertedId}`);
}

async function appendNewItemsToArray(client) {
    const result = await client.db("NameOfYourDb").collection("NameOfYourCollection").updateOne(
        { "_id": "UniqueId1" },
        { $push: {
            items: {
                $each: [
                    {
                        "item_name": "my_item_two",
                        "first_seen": 2000,
                        "last_seen": 2000,
                        "logic": true
                    },
                    {
                        "item_name": "my_item_three",
                        "first_seen": 3000,
                        "last_seen": 3000,
                        "logic": true
                    }
                ]
            }
        } }
    );
    console.log(`${result.matchedCount} document(s) matched the query criteria.`);
    console.log(`${result.modifiedCount} document(s) was/were updated.`);
}

async function updateItemsInArray(client) {
    const result = await client.db("NameOfYourDb").collection("NameOfYourCollection").updateOne(
        { "_id": "UniqueId1", "items.item_name": "my_item_one" },
        { $set: { "items.$.logic": false, "items.$.last_seen": 4000 } }
    );
    console.log(`${result.matchedCount} document(s) matched the query criteria.`);
    console.log(`${result.modifiedCount} document(s) was/were updated.`);
}
I found the original piece of code here: MongoDB Node.JS driver: create, append, and update documents with arrays. Of course some modifications are required.
MongoDB select random 5 documents via find query
I need to find 5 random documents from MongoDB by using the find function. I am using the LoopBack 4 framework. I already tried to use sample (it is in the comment):
const userParties: IndividualParty[] = (await find(
  this.logger,
  {
    where: {
      and: [
        { _id: { nin: ids.map(id => id) } },
        { gender: { inq: gender } },
      ],
    },
    //sample: { size: 5 },
    //limit: 5,
  } as Filter<IndividualParty>,
  this.partyRepository,
)) as IndividualParty[];
I'm not familiar with Loopback, but using pure Node and the MongoDB Node driver, here's the shortest example I can come up with:
var run = async function() {
  const conn = await require('mongodb').MongoClient.connect('mongodb://localhost:27017', {useNewUrlParser: true})
  let agg = [
    {'$match': {'_id': {'$gte': 50}}},
    {'$sample': {'size': 5}}
  ]
  let res = await conn.db('test').collection('test').aggregate(agg).toArray()
  console.log(res)
  await conn.close()
}()
In a collection containing _id from 0 to 99, this will randomly output 5 documents having _id larger than 50. Example output:
[ { _id: 60 }, { _id: 77 }, { _id: 84 }, { _id: 96 }, { _id: 63 } ]
You would need to make the above example work with Loopback, but the basic idea is there.
Note: You need aggregation instead of find(). Have a read through the $sample documentation and note especially its behavior:
$sample uses one of two methods to obtain N random documents, depending on the size of the collection, the size of N, and $sample's position in the pipeline.
The position of $sample in the pipeline is important. If you need to select a subset of the collection to do $sample on via a $match stage (as with the example above), then you will need to ensure that the subset to be sampled is within 16 MB (the limit of the MongoDB in-memory sort).
How to count the total number of pages? [duplicate]
I am interested in optimizing a "pagination" solution I'm working on with MongoDB. My problem is straightforward. I usually limit the number of documents returned using the limit() functionality. This forces me to issue a redundant query without the limit() function so I can also capture the total number of documents in the query and pass that to the client, letting them know they'll have to issue additional request(s) to retrieve the rest of the documents. Is there a way to condense this into one query? Get the total number of documents, but at the same time only retrieve a subset using limit()? Is there a different way to think about this problem than I am approaching it?
MongoDB 3.4 has introduced the $facet aggregation stage, which processes multiple aggregation pipelines within a single stage on the same set of input documents. Using $facet and $group you can find documents with $limit and get the total count.
You can use the below aggregation in MongoDB 3.4:
db.collection.aggregate([
  { "$facet": {
    "totalData": [
      { "$match": { }},
      { "$skip": 10 },
      { "$limit": 10 }
    ],
    "totalCount": [
      { "$group": {
        "_id": null,
        "count": { "$sum": 1 }
      }}
    ]
  }}
])
You can even use the $count aggregation stage, which has been introduced in MongoDB 3.6.
You can use the below aggregation in MongoDB 3.6:
db.collection.aggregate([
  { "$facet": {
    "totalData": [
      { "$match": { }},
      { "$skip": 10 },
      { "$limit": 10 }
    ],
    "totalCount": [
      { "$count": "count" }
    ]
  }}
])
No, there is no other way. Two queries - one for count - one with limit. Or you have to use a different database. Apache Solr for instance works like you want. Every query there is limited and returns totalCount.
MongoDB allows you to use cursor.count() even when you pass limit() or skip(). Let's say you have a db.collection with 10 items. You can do:
async function getQuery() {
  let query = await db.collection.find({}).skip(5).limit(5); // returns last 5 items in db
  let countTotal = await query.count() // returns 10 -- will not take `skip` or `limit` into consideration
  let countWithConstraints = await query.count(true) // returns 5 -- will take into consideration `skip` and `limit`
  return { query, countTotal }
}
Here's how to do this with MongoDB 3.4+ (with Mongoose) using $facet. This example returns a $count based on the documents after they have been matched:
const facetedPipeline = [
  { "$match": { "dateCreated": { $gte: new Date('2021-01-01') } } },
  { "$project": { 'exclude.some.field': 0 } },
  { "$facet": {
      "data": [
        { "$skip": 10 },
        { "$limit": 10 }
      ],
      "pagination": [
        { "$count": "total" }
      ]
  } }
];

const results = await Model.aggregate(facetedPipeline);
This pattern is useful for getting pagination information to return from a REST API.
Reference: MongoDB $facet
Times have changed, and I believe you can achieve what the OP is asking by using aggregation with $sort, $group and $project. For my system, I needed to also grab some user info from my users collection. Hopefully this can answer any questions around that as well. Below is an aggregation pipeline. The last three stages (sort, group and project) are what handle getting the total count and then providing pagination capabilities.
db.posts.aggregate([
  { $match: { public: true } },
  { $lookup: {
      from: 'users',
      localField: 'userId',
      foreignField: 'userId',
      as: 'userInfo'
  } },
  { $project: {
      postId: 1,
      title: 1,
      description: 1,
      updated: 1,
      userInfo: {
        $let: {
          vars: {
            firstUser: { $arrayElemAt: ['$userInfo', 0] }
          },
          in: {
            username: '$$firstUser.username'
          }
        }
      }
  } },
  { $sort: { updated: -1 } },
  { $group: {
      _id: null,
      postCount: { $sum: 1 },
      posts: { $push: '$$ROOT' }
  } },
  { $project: {
      _id: 0,
      postCount: 1,
      posts: {
        $slice: [
          '$posts',
          currentPage ? (currentPage - 1) * RESULTS_PER_PAGE : 0,
          RESULTS_PER_PAGE
        ]
      }
  } }
])
There is a way in MongoDB 3.4: $facet. You can do:
db.collection.aggregate([
  {
    $facet: {
      data: [{ $match: {} }],
      total: [{ $count: 'total' }]
    }
  }
])
Then you will be able to run two aggregations at the same time.
By default, the count() method ignores the effects of cursor.skip() and cursor.limit() (MongoDB docs).
As the count method excludes the effects of limit and skip, you can use cursor.count() to get the total count:
const cursor = await database.collection(collectionName).find(query).skip(offset).limit(limit)
return {
  data: await cursor.toArray(),
  count: await cursor.count() // this will give count of all the documents before .skip() and .limit()
};
It all depends on the pagination experience you need as to whether or not you need to do two queries.
Do you need to list every single page, or even a range of pages? Does anyone even go to page 1051, and conceptually what does that actually mean?
There's been lots of UX work on patterns of pagination. "Avoid the pains of pagination" covers various types of pagination and their scenarios, and many don't need a count query to know if there's a next page. For example, if you display 10 items on a page and you limit the query to 13, you'll know if there's another page.
MongoDB has introduced a new method for getting only the count of the documents matching a given query, and it goes as follows:
const result = await db.collection('foo').count({name: 'bar'});
console.log('result:', result) // prints the matching doc count
Recipe for usage in pagination:
const query = {name: 'bar'};
const skip = (pageNo - 1) * pageSize; // assuming pageNo starts from 1
const limit = pageSize;

const [listResult, countResult] = await Promise.all([
  db.collection('foo')
    .find(query)
    .skip(skip)
    .limit(limit)
    .toArray(),
  db.collection('foo').count(query)
])

return {
  totalCount: countResult,
  list: listResult
}
For more details on db.collection.count visit this page.
It is possible to get the total result size without the effect of limit() using count(), as answered here: Limiting results in MongoDB but still getting the full count?
According to the documentation you can even control whether limit/pagination is taken into account when calling count(): https://docs.mongodb.com/manual/reference/method/cursor.count/#cursor.count
Edit: in contrast to what is written elsewhere, the docs clearly state that "The operation does not perform the query but instead counts the results that would be returned by the query". Which, from my understanding, means that only one query is executed.
Example:
> db.createCollection("test")
{ "ok" : 1 }
> db.test.insert([{name: "first"}, {name: "second"}, {name: "third"}, {name: "forth"}, {name: "fifth"}])
BulkWriteResult({
  "writeErrors" : [ ],
  "writeConcernErrors" : [ ],
  "nInserted" : 5,
  "nUpserted" : 0,
  "nMatched" : 0,
  "nModified" : 0,
  "nRemoved" : 0,
  "upserted" : [ ]
})
> db.test.find()
{ "_id" : ObjectId("58ff00918f5e60ff211521c5"), "name" : "first" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c6"), "name" : "second" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c7"), "name" : "third" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c8"), "name" : "forth" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c9"), "name" : "fifth" }
> db.test.count()
5
> var result = db.test.find().limit(3)
> result
{ "_id" : ObjectId("58ff00918f5e60ff211521c5"), "name" : "first" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c6"), "name" : "second" }
{ "_id" : ObjectId("58ff00918f5e60ff211521c7"), "name" : "third" }
> result.count()
5 (total result size of the query without limit)
> result.count(1)
3 (result size with limit(3) taken into account)
Try as below:
cursor.count(false, function(err, total){
  console.log("total", total)
})

core.db.users.find(query, {}, {skip: 0, limit: 1}, function(err, cursor){
  if(err) return callback(err);
  cursor.toArray(function(err, items){
    if(err) return callback(err);
    cursor.count(false, function(err, total){
      if(err) return callback(err);
      console.log("cursor", total)
      callback(null, {items: items, total: total})
    })
  })
})
A word of caution about using the aggregation pipeline for pagination: it is better to use two queries if the API is used frequently by users to fetch data. The two-query approach is at least 50 times faster than getting the data via aggregation on a production server when more users are accessing the system online. Aggregation and $facet are more suited for dashboards, reports, and cron jobs that are called less frequently.
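For reference, a minimal sketch of that two-query pattern; the collection name and filter are placeholders rather than anything from the answer above, and it assumes a Node.js driver recent enough to provide countDocuments(). Both queries are issued concurrently:
// Hedged sketch of the two-query pagination pattern.
// `bookings` and `filter` are placeholder names, not from the original post.
async function getPage(db, filter, page, pageSize) {
  const [items, total] = await Promise.all([
    db.collection('bookings')
      .find(filter)
      .skip((page - 1) * pageSize)
      .limit(pageSize)
      .toArray(),
    db.collection('bookings').countDocuments(filter)
  ]);
  return { items, total };
}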
We can do it using two queries:
const limit = parseInt(req.query.limit || 50, 10);
let page = parseInt(req.query.page || 0, 10);
if (page > 0) { page = page - 1 }

let doc = await req.db.collection('bookings').find().sort({ _id: -1 }).skip(page).limit(limit).toArray();
let count = await req.db.collection('bookings').find().count();

res.json({data: [...doc], count: count});
I took the two queries approach, and the following code has been taken straight out of a project I'm working on, using MongoDB Atlas and a full-text search index:
return new Promise(async (resolve, reject) => {
  try {
    const search = {
      $search: {
        index: 'assets',
        compound: {
          should: [{
            text: {
              query: args.phraseToSearch,
              path: [ 'title', 'note' ]
            }
          }]
        }
      }
    }

    const project = {
      $project: {
        _id: 0,
        id: '$_id',
        userId: 1,
        title: 1,
        note: 1,
        score: { $meta: 'searchScore' }
      }
    }

    const match = { $match: { userId: args.userId } }

    const skip = { $skip: args.skip }

    const limit = { $limit: args.first }

    const group = {
      $group: {
        _id: null,
        count: { $sum: 1 }
      }
    }

    const searchAllAssets = await Models.Assets.schema.aggregate([
      search, project, match, skip, limit
    ])

    const [ totalNumberOfAssets ] = await Models.Assets.schema.aggregate([
      search, project, match, group
    ])

    return await resolve({
      searchAllAssets: searchAllAssets,
      totalNumberOfAssets: totalNumberOfAssets.count
    })
  } catch (exception) {
    return reject(new Error(exception))
  }
})
I had the same problem and came across this question. The correct solution to this problem is posted here.
You can do this in one query. First you run a count, and within that run the limit() function.
In Node.js and Express.js, you will have to use it like this to be able to use the "count" function along with the toArray's "result":
var curFind = db.collection('tasks').find({query});
Then you can run two functions after it like this (one nested in the other):
curFind.count(function (e, count) {
  // Use count here
  curFind.skip(0).limit(10).toArray(function(err, result) {
    // Use result here and count here
  });
});