Comparing collections with mongodb - javascript

I have about 25k documents in a collection (codename 'Parties') which is structured like this:
{
  "_id": ObjectId("..."),
  "name": "Magic Show",
  "funStuff": [
    {
      "date": new Date("2010-01-04T16:00:00+1100"),
      "start": 5,
      "finish": 5.5,
      "symbol": "ABCD"
    }, ...
  ]
}, ...
Each document has a child array called funStuff that contains roughly 11k items.
I need to compare all Parties.funStuff using the date as a match. For example:
1. Get 2 random documents (x, y) from Parties.
2. Map through x.funStuff and attempt to match a date with y.funStuff.
3. If a match occurs, update/upsert a new document ('PartiesCompared') that looks roughly like this:
   { parent: x._id, child: y._id, results: ... }
4. Go back to step 2, moving forward a day.
I've tried a few JS solutions (find -> map -> save results), but it is very slow and would take months (just under a year, lol) to complete. I'm obviously doing it very wrong.
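For illustration, a minimal sketch of those steps with the Node.js driver (not the poster's code; the Map index, the bulkWrite batching, and the shape of results are assumptions, while the collection and field names come from the question):
async function compareParties(db, xId, yId) {
  const parties = db.collection('Parties');
  const x = await parties.findOne({ _id: xId });
  const y = await parties.findOne({ _id: yId });

  // Index y.funStuff by date once, so each item of x.funStuff is an O(1)
  // lookup instead of an O(n) scan over ~11k elements.
  const byDate = new Map(y.funStuff.map(f => [f.date.getTime(), f]));

  const ops = [];
  for (const fx of x.funStuff) {
    const fy = byDate.get(fx.date.getTime());
    if (!fy) continue; // no matching date in y.funStuff
    ops.push({
      updateOne: {
        filter: { parent: x._id, child: y._id, date: fx.date },
        update: { $set: { results: { x: fx, y: fy } } }, // assumed shape
        upsert: true,
      },
    });
  }
  // One round trip for all upserts instead of one write per match.
  if (ops.length) await db.collection('PartiesCompared').bulkWrite(ops);
}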

Related

MongoDB: Efficiency of operation pushing to a nested array or updating it when identifier found, using aggregation pipeline

I have a document that holds lists containing nested objects. The document simplified looks like this:
{
  "username": "user",
  "listOne": [
    {
      "name": "foo",
      "qnty": 5
    },
    {
      "name": "bar",
      "qnty": 3
    }
  ],
  "listTwo": [
    {
      "id": 1,
      "qnty": 13
    },
    {
      "id": 2,
      "qnty": 9
    }
  ]
}
And I need to update the quantity in the lists based on an identifier. For list one it was easy; I was doing something like this:
db.collection.findOneAndUpdate(
  {
    "username": "user",
    "listOne.name": name
  },
  {
    $inc: {
      "listOne.$.qnty": qntyChange
    }
  }
)
Then I would catch the case where the find failed because there was no object in the list with that name (so nothing was updated), and do a new operation with $push. Since this is the rarer case, doing two queries against the collection didn't bother me.
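That fallback, roughly (my illustration of the description above, not the poster's exact code; checking matchedCount avoids relying on findOneAndUpdate's return value, which varies across driver versions):
var res = db.coll1.updateOne(
  { "username": "user", "listOne.name": name },
  { $inc: { "listOne.$.qnty": qntyChange } }
);
if (res.matchedCount === 0) {
  // No element with that name yet: append a fresh one.
  db.coll1.updateOne(
    { "username": "user" },
    { $push: { "listOne": { "name": name, "qnty": qntyChange } } }
  );
}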
But now I also had to add list two to the document. Since the identifiers are not the same, I would have to query the lists individually, meaning four searches in the collection in the worst-case scenario if I kept the same strategy.
So, to avoid this, I wrote an update using an aggregation pipeline. What it does is:
1. Look if there is an object in list one with the queried identifier.
2. If true, map through the entire array and:
   2.1) return the same object if the identifier is different;
   2.2) return the object with the quantity changed when the identifier matches.
3. If false, push a new object with this identifier to the list.
4. Repeat for list two.
This is the pipeline for list one:
db.coll1.updateOne(
  {
    "username": "user"
  },
  [{
    "$set": {
      "listOne": {
        "$cond": {
          "if": {
            "$in": [name, "$listOne.name"]
          },
          "then": {
            "$map": {
              "input": "$listOne",
              "as": "one",
              "in": {
                "$cond": {
                  "if": {
                    "$eq": ["$$one.name", name]
                  },
                  "then": {
                    "$mergeObjects": [
                      "$$one",
                      {
                        "qnty": {
                          "$add": ["$$one.qnty", qntyChange]
                        }
                      }
                    ]
                  },
                  "else": "$$one"
                }
              }
            }
          },
          "else": {
            "$concatArrays": [
              "$listOne",
              [
                {
                  "name": name,
                  "qnty": qntyChange
                }
              ]
            ]
          }
        }
      }
    }
  }]
);
The entire pipeline can be found on this Mongo Playground.
So my question is about how efficient this is. As I am paying for server time, I would like an efficient solution to this problem. Querying the collection four times, or even just twice on every call, seems like a bad idea, as the collection will have thousands of entries. The two lists, on the other hand, are not that big, and should not exceed a thousand elements each. But the way it's written, it looks like it will iterate over each list about two times.
Besides that, what worries me most is: when I use $map to change the list and return the same object in the cases where the identifier does not match, does MongoDB rewrite those elements too? Not only would that increase my time on the server (rewriting the entire list with the same objects), it would also count towards the byte size of my write operation, which MongoDB also charges for.
So if anyone has a better solution to this, I'm all ears.
According to this SO answer,
What you actually do inside of the document (push around an array, add a field) should not have any significant impact on the total cost of the operation
So, in your case, your array operations should not be causing a heavy impact on the total cost.
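If the goal is still to avoid extra round trips, both lists can in principle be handled in one update by building the question's pipeline expression once per list (a sketch of mine, not from the cited answer; the per-list identifier values and quantity changes are assumed):
// Build the same upsert-into-array expression for any list: `field` is the
// array name, `idKey` the identifier key inside its elements.
function upsertIntoList(field, idKey, idValue, qntyChange) {
  return {
    "$cond": {
      "if": { "$in": [idValue, `$${field}.${idKey}`] },
      "then": {
        "$map": {
          "input": `$${field}`,
          "as": "el",
          "in": {
            "$cond": {
              "if": { "$eq": [`$$el.${idKey}`, idValue] },
              "then": { "$mergeObjects": ["$$el", { "qnty": { "$add": ["$$el.qnty", qntyChange] } }] },
              "else": "$$el"
            }
          }
        }
      },
      "else": { "$concatArrays": [`$${field}`, [{ [idKey]: idValue, "qnty": qntyChange }]] }
    }
  };
}

// One query touches both lists.
db.coll1.updateOne(
  { "username": "user" },
  [{
    "$set": {
      "listOne": upsertIntoList("listOne", "name", name, qntyChange),
      "listTwo": upsertIntoList("listTwo", "id", id, idQntyChange)
    }
  }]
);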

Finding like values and appending items to array (javascript)

I have two arrays I'm trying to combine in a very specific way, and I need a little guidance. Array 1 is an array of 30-40 dates; array 2 is a list of objects with a date inside one of the attributes. I'm trying to append each object in array 2 to the index of array 1 where the dates match.
I want to put the arr2 object at the same index as arr1 if the dates match.
const arr = [
  "2022-06-26T07:00:00.000Z",
  "2022-06-27T07:00:00.000Z",
  "2022-06-28T07:00:00.000Z",
  "2022-06-29T07:00:00.000Z",
  "2022-06-30T07:00:00.000Z",
  "2022-07-01T07:00:00.000Z",
  "2022-07-02T07:00:00.000Z",
  "2022-07-03T07:00:00.000Z",
  "2022-07-04T07:00:00.000Z",
  "2022-07-05T07:00:00.000Z",
  "2022-07-06T07:00:00.000Z",
  "2022-07-07T07:00:00.000Z",
  "2022-07-08T07:00:00.000Z",
  "2022-07-09T07:00:00.000Z",
  "2022-07-10T07:00:00.000Z",
  "2022-07-11T07:00:00.000Z",
  "2022-07-12T07:00:00.000Z",
  "2022-07-13T07:00:00.000Z",
  "2022-07-14T07:00:00.000Z",
  "2022-07-15T07:00:00.000Z",
  "2022-07-16T07:00:00.000Z",
  "2022-07-17T07:00:00.000Z",
  "2022-07-18T07:00:00.000Z",
  "2022-07-19T07:00:00.000Z",
  "2022-07-20T07:00:00.000Z",
  "2022-07-21T07:00:00.000Z",
  "2022-07-22T07:00:00.000Z",
  "2022-07-23T07:00:00.000Z",
  "2022-07-24T07:00:00.000Z",
  "2022-07-25T07:00:00.000Z",
  "2022-07-26T07:00:00.000Z",
  "2022-07-27T07:00:00.000Z",
  "2022-07-28T07:00:00.000Z",
  "2022-07-29T07:00:00.000Z",
  "2022-07-30T07:00:00.000Z",
  "2022-07-31T07:00:00.000Z",
  "2022-08-01T07:00:00.000Z",
  "2022-08-02T07:00:00.000Z",
  "2022-08-03T07:00:00.000Z",
  "2022-08-04T07:00:00.000Z",
  "2022-08-05T07:00:00.000Z",
  "2022-08-06T07:00:00.000Z"
]
const arr2 = [
  {
    "gsi1SK": "name ",
    "searchPK": "thing",
    "SK": "uuid",
    "Desc": "place #1205",
    "PK": "thing uuid",
    "searchSK": "7/1/2022",
    "gsi1PK": "thing",
    "Complete": false,
    "Users": [
      "person1",
      "person2"
    ]
  },
  {
    "gsi1SK": "name",
    "searchPK": "thing",
    "SK": "uuid",
    "Desc": "place#124124",
    "PK": "thing uuid",
    "searchSK": "7/4/2022",
    "gsi1PK": "thing",
    "Complete": false,
    "Users": [
      "person2",
      "person45"
    ]
  }
]
console.log([arr, arr2]);
You seem to have a handle on the date conversion part. Here I've defined two short sample arrays to represent arr2 and newArr. Then, a map function to create the output.
const arr2 = [
  {
    "OTHER_FIELDS": "TOP SECRET",
    "searchSK": "7/4/2022"
  },
  {
    "OTHER_FIELDS": "TOP SECRET",
    "searchSK": "7/9/2022"
  }
];
const newArr = [
  ["7/2/2022"],
  ["7/3/2022"],
  ["7/4/2022"],
  ["7/5/2022"],
  ["7/6/2022"],
  ["7/7/2022"],
  ["7/8/2022"],
  ["7/9/2022"],
  ["7/10/2022"]
];
// for each subarray in newArr, return an array containing the existing element plus any elements from arr2 found by the filter function
const output = newArr.map(el => [...el, ...arr2.filter(a2 => a2.searchSK === el[0])]);
console.log(output);
Plan
You've got two obvious options:
A. Look at each of the objects, finding a home for each one in turn
B. Look at each of the dates, collecting all the objects that belong to it
Which method makes more sense for you will depend on other factors you haven't covered in your post. I think the main question is: is it guaranteed that the date list will contain a proper home for every object? If not, do you want to drop the objects without proper homes, or do you want to create a proper home for them?
Performance can also matter, but really only if you expect either list to be very long or if you need to run this process multiple times (such as in a React component in the browser).
Implement
Loop through the list you chose. For each item, scan the other list for the relevant item(s): its home or its children. Take the appropriate action for those items depending on which plan you chose.
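As a concrete sketch of option B (for each date, collect its objects), with dateMatches left as a placeholder, since the conversion between the ISO strings and the "M/D/YYYY" searchSK values depends on your data:
function groupByDate(dates, objects, dateMatches) {
  const remaining = [...objects]; // work on a copy: don't mutate arguments
  return dates.map(d => {
    // Collect this date's objects, then remove them from `remaining` so
    // later dates never re-examine items that have already found a home.
    const matches = remaining.filter(o => dateMatches(d, o.searchSK));
    matches.forEach(m => remaining.splice(remaining.indexOf(m), 1));
    return [d, ...matches];
  });
}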
Another consideration is: don't mutate your arguments. That means you probably need to create copies of the two input arrays before you do the work. If the arrays contain objects rather than scalars, you can't just do array.slice() to create a copy.
For an array of POJOs, you can convert the source to a string and then back again to create a clone.
The array of dates will need special handling, because JSON.parse will not revive dates.
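A small sketch of that: a JSON round-trip clone, plus a reviver so ISO date strings come back as Date objects (the ISO-8601 regex is an assumption about the input format):
const isoDate = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$/;

function cloneWithDates(value) {
  // Stringify then parse: a deep copy for POJOs; the reviver restores Dates.
  return JSON.parse(JSON.stringify(value), (key, v) =>
    typeof v === 'string' && isoDate.test(v) ? new Date(v) : v
  );
}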
Mutating arguments is generally a bad practice, at least in the functional paradigm that underlies many popular front-end frameworks today. Plus, if you create your own copies of the input data, you can gain efficiency by moving items from the source arrays to the output array, which means that subsequent iterations won't have to re-examine items that have already been processed.

How to write a mongoose update to update a whole document, various subdocuments included

I'm trying to write a way to update a whole MongoDB document, subdocuments included, using Mongoose and a supplied update object. I want to supply an object from my client in the shape of the schema, then iterate over it to update each property in the document, including those in nested subdocuments and in arrays.
So if my schema looked like this:
const Person = new Schema({
  name: String,
  age: Number,
  addresses: [
    {
      label: String,
      fullAddress: String,
    },
  ],
  bodyMeasurements: {
    height: Number,
    weight: Number,
    clothingSizes: {
      jacket: Number,
      pants: Number,
    },
  },
})
I would want to supply an object to update an existing document that looked something like this:
{
  "_id": "217a7f84685f49642635dff0",
  "name": "Dan",
  "addresses": [
    {
      "_id": "5f49647f84f02635df217a68",
      "label": "Home 2"
    },
    {
      "label": "Work",
      "fullAddress": "6 Elm Street"
    }
  ],
  "bodyMeasurements": {
    "_id": "2635df217a685f49647f84f0",
    "weight": 90,
    "clothingSizes": {
      "_id": "217a685f4962635df49647f84f0",
      "pants": 32
    }
  }
}
The code would need to iterate through all the keys and values of the object's entries. Where it found a Mongo ID, it would know to update that specific item (like the "Home 2" address label); where it didn't, it would know to add it (like the second "Work" address here), or to replace it if it was a top-level property (like the "name" property with "Dan").
For a one-dimensional Schema, without taking _ids into account, this would work:
for (const [key, value] of Object.entries(req.body)) {
  person[key] = value;
}
But not for any nested subdocuments or documents in arrays. I don't want to have to specify the name of the subdocuments for each case. I am trying to find a way to update any Schema's documents and subdocuments generically.
I imagine some recursion, and the Mongoose .set() method, might be needed to handle deeply nested documents.
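One possible shape for that (a rough, untested sketch under those assumptions, not an established recipe: it builds dot-notation paths for Document#set() and matches array items by _id via DocumentArray#id()):
function applyUpdate(doc, update, basePath = '') {
  for (const [key, value] of Object.entries(update)) {
    if (key === '_id') continue; // _ids select items; never overwrite them
    const path = basePath ? `${basePath}.${key}` : key;

    if (Array.isArray(value)) {
      const arr = doc.get(path); // a Mongoose DocumentArray
      for (const item of value) {
        const existing = item._id ? arr.id(item._id) : null;
        if (existing) {
          // Recurse into the matched subdocument via its array index.
          applyUpdate(doc, item, `${path}.${arr.indexOf(existing)}`);
        } else {
          arr.push(item); // no _id match: add as a new subdocument
        }
      }
    } else if (value !== null && typeof value === 'object') {
      applyUpdate(doc, value, path); // nested object: extend the path
    } else {
      doc.set(path, value); // leaf value: set it on the document
    }
  }
}

// Usage sketch:
// const person = await Person.findById(req.body._id);
// applyUpdate(person, req.body);
// await person.save();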

How to access the data from several JSON objects inside an array position

I'm making a little app in Node.js, and I'm struggling to print some data coming from a JSON which has the following structure:
{
  "courses": [
    {
      "java": [
        { "attendees": 43 },
        { "subject": "Crash course" }
      ]
    },
    {
      "python": {
        "occurrences": [
          { "attendees": 24 },
          { "subject": "another crash course" },
          { "notes": "completed with issues" },
          { "attendees": 30 },
          { "subject": "another crash course" },
          { "notes": "completed with issues" }
        ]
      }
    }
  ]
}
If I want to print the attendees at 'java' I do:
console.log(myJSON.courses[0]['java'][0]['attendees']);
which prints
43
and if I want to print the notes of the 2nd occurrence of the python course I do:
console.log(myJSON.courses[1]['python']['occurrences'][2]['notes']);
which prints:
completed with issues
The cases mentioned before are correct, but what I want to do is print the keys of 'java' ('attendees' and 'subject'). As you can see, 'java' is an array whose positions hold JSON objects. I've tried:
console.log(myJSON.courses[0]['java'][0].keys);
and with
console.log(myJSON.courses[0]['java'].keys);
but they print "undefined" and "[Function: keys]" respectively.
What am I missing here? Could anybody help me, please? :(
myJSON.courses[0]['java'] is an array with indexes, where each index holds an object with keys. The array itself doesn't have the keys you want (the keys of an array are its indexes: 0, 1, etc.).
Instead, you want to access all the keys from the objects in the myJSON.courses[0]['java'] array.
You can do this by using .map and Object.keys(). .map will let you visit and convert every object in your myJSON.courses[0]['java'] array. Object.keys() will give you an array of keys from the given object (in your case each keys array will be of length 1, so you can access index 0 of that array).
const myJSON = {courses:[{java:[{attendees:43},{subject:"Crash course"}]},{python:{occurrences:[{attendees:24},{subject:"another crash course"},{notes:"completed with issues"},{attendees:30},{subject:"another crash course"},{notes:"completed with issues"}]}}]};
const myKeys = myJSON.courses[0]['java'].map(obj => Object.keys(obj)[0]);
console.log(myKeys);
If you have multiple keys in your objects within an array, you can also use .flatMap (take note of browser support):
const myJSON = {courses:[{java:[{attendees:43},{subject:"Crash course"}]},{python:{occurrences:[{attendees:24},{subject:"another crash course"},{notes:"completed with issues"},{attendees:30},{subject:"another crash course"},{notes:"completed with issues"}]}}]};
const myKeys = myJSON.courses[0]['java'].flatMap(Object.keys);
console.log(myKeys);

How to merge a heterogeneous array to a single document in MongoDb?

My website's MongoDB stores a single document for each user. Each user answers a couple of questionnaire forms during their visit. The forms are stored in an array, but since their fields don't overlap, a single flat document would suffice. For analysis, I wish to produce a flat table of all the answers across all the forms.
Consider the following data structure:
{
  "USER_SESSION_ID": 456,
  "forms": [
    {
      "age": 21,
      "gender": "m"
    },
    {
      "job": "Student",
      "years_on_job": "12"
    },
    {
      "Hobby": "Hiking",
      "Twitter": "#my_account"
    }
  ]
},
{
  "USER_SESSION_ID": 678,
  "forms": [
    {
      "age": 46,
      "gender": "f"
    },
    {
      "job": "Bodyguard",
      "years_on_job": "2"
    },
    {
      "Hobby": "Skiing",
      "Twitter": "#bodyguard"
    }
  ]
}
The form-documents all look different and have no conflicting fields, so I would like to merge them, yielding a tabular, flat structure like this:
{ 'USER_SESSION_ID': 456, 'age': 21, 'gender': 'm', 'job': 'Student', ... 'Twitter': '#my_account' }
{ 'USER_SESSION_ID': 678, 'age': 46, 'gender': 'f', 'job': 'Bodyguard', ... 'Twitter': '#bodyguard' }
Using Python, this is a total no-brainer, looking like this:
for session in sessions: # Iterate all docs
for form in session['forms']: # Iterate all children
session.update(form) # Integrate to parent doc
del session['forms'] # Remove nested child
In MongoDB I find this quite hard to achieve. I am trying to use the aggregation pipeline, which I imagine should be suitable for this.
So far I've helped myself by unwinding my data structure, like this:
db.sessions.aggregate(
  {
    '$unwind': '$forms'
  },
  {
    '$project': {
      'USER_SESSION_ID': true,
      'forms': true
    }
  },
  {
    '$group': {
      '_id': '$USER_SESSION_ID',
      'forms': <magic?!>
    }
  }
)
In the unwinding stage, I create a document with the parent's data for each child. This should be roughly equivalent to the double for-loop in my Python code. However, what I feel I'm conceptually missing is a "merge" accumulator for the grouping stage. In Python, this is done with dict.update(); in underscore.js it would be _.extend(destination, *sources).
How do I achieve this within MongoDB?
Try the following, which uses nested forEach() calls on the find() cursor to iterate over the result and get the object keys for the elements within the forms array using Object.keys():
db.sessions.find().forEach(function (doc) {
  doc.forms.forEach(function (e) {
    var keys = Object.keys(e);
    keys.forEach(function (key) { doc[key] = e[key]; });
  });
  delete doc.forms;
  db.sessions.save(doc); // note: save() is deprecated in newer shells; replaceOne() is the modern equivalent
});
I played around with the aggregate pipeline for ages until I gave the mapReduce command a try. This is what I came up with:
db.sessions.mapReduce(
  function () {
    var merged = {};
    this.forms.forEach(function (form) {
      for (var key in form) {
        merged[key] = form[key];
      }
    });
    emit(this.USER_SESSION_ID, merged);
  },
  function () {},
  {
    "out": { "inline": true }
  }
)
The map step combines the elements, since there is no single "merging" operator available as an aggregation pipeline step. The empty reduce function is required. The out option either writes to a different collection or just returns the result (inline, which is what I'm doing here).
It looks a lot like the method chridam showed in his answer, but it actually uses a projection. His version is much closer to the way my Python code works, but for what I'm trying to do a projection is fine, and it doesn't change the original set. Note that the Python code does change its input; not changing the input collection is quite useful!
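For readers on newer servers: MongoDB 3.6 added $mergeObjects, which is exactly the missing "merge" accumulator; combined with $reduce, the whole flattening becomes a single pipeline stage (a sketch of mine, not part of the original answers):
db.sessions.aggregate([
  {
    "$replaceRoot": {
      "newRoot": {
        "$mergeObjects": [
          { "USER_SESSION_ID": "$USER_SESSION_ID" },
          {
            // Fold the forms array into one object, exactly like the
            // Python dict.update() loop in the question.
            "$reduce": {
              "input": "$forms",
              "initialValue": {},
              "in": { "$mergeObjects": ["$$value", "$$this"] }
            }
          }
        ]
      }
    }
  }
])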
