The MongoDB database of my website stores a single document for each user. Each user answers a couple of questionnaire forms during their visit. The forms are stored in an array, but since their fields don't overlap, a flat, single document would suffice. For analysis, I want to produce a flat table of all the answers across all the forms.
Consider the following data structure:
{
  "USER_SESSION_ID": 456,
  "forms": [
    {
      "age": 21,
      "gender": "m"
    },
    {
      "job": "Student",
      "years_on_job": "12"
    },
    {
      "Hobby": "Hiking",
      "Twitter": "#my_account"
    }
  ]
},
{
  "USER_SESSION_ID": 678,
  "forms": [
    {
      "age": 46,
      "gender": "f"
    },
    {
      "job": "Bodyguard",
      "years_on_job": "2"
    },
    {
      "Hobby": "Skiing",
      "Twitter": "#bodyguard"
    }
  ]
}
The form-documents all look different and have no conflicting fields, so I would like to merge them, yielding a tabular, flat structure like this:
{ 'USER_SESSION_ID': 456, 'age': 21, 'gender': 'm', 'job': 'Student', ... 'Twitter': '#my_account' }
{ 'USER_SESSION_ID': 678, 'age': 46, 'gender': 'f', 'job': 'Bodyguard', ... 'Twitter': '#bodyguard' }
Using Python, this is a total no-brainer, looking like this:
for session in sessions:           # iterate all docs
    for form in session['forms']:  # iterate all children
        session.update(form)       # merge form fields into the parent doc
    del session['forms']           # remove the nested child array
In MongoDB I find this quite hard to achieve. I am trying to use the aggregation pipeline, which I imagine should be suitable for this.
So far I have helped myself by unwinding my data structure, like this:
db.sessions.aggregate([
  {
    '$unwind': '$forms'
  },
  {
    '$project': {
      'USER_SESSION_ID': true,
      'forms': true
    }
  },
  {
    '$group': {
      '_id': '$USER_SESSION_ID',
      'forms': <magic?!>
    }
  }
])
In the unwind stage, I create one document per child, each carrying the parent's data. This should be roughly equivalent to the double for loop in my Python code. What I feel I'm conceptually missing is a "merge" accumulator for the grouping stage. In Python this is done with dict.update(); in Underscore.js it would be _.extend(destination, *sources).
How do I achieve this within MongoDB?
Try the following, which uses nested forEach() calls on the find() cursor to iterate over the result and collect the object keys of each element within the forms array using Object.keys():
db.sessions.find().forEach(function (doc) {
  doc.forms.forEach(function (e) {
    var keys = Object.keys(e);
    keys.forEach(function (key) { doc[key] = e[key]; }); // copy form fields up to the parent
  });
  delete doc.forms;      // drop the now-redundant array
  db.sessions.save(doc); // write the flattened document back
});
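As a side note, save() is deprecated in recent shells; if you are on a newer version, the same idea can be written with updateOne (a sketch under that assumption, building the merged object first and writing it back in a single update):

db.sessions.find().forEach(function (doc) {
  var merged = {};
  doc.forms.forEach(function (form) {
    Object.keys(form).forEach(function (key) { merged[key] = form[key]; });
  });
  // One write per session: set the flattened fields, drop the forms array
  db.sessions.updateOne({ _id: doc._id }, { $set: merged, $unset: { forms: "" } });
});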
I played around with the aggregation pipeline for ages until I gave the mapReduce command a try. This is what I came up with:
db.sessions.mapReduce(
  function () {
    var merged = {};
    this.forms.forEach(function (form) {
      for (var key in form) {
        merged[key] = form[key]; // copy each form field into the merged object
      }
    });
    emit(this.USER_SESSION_ID, merged);
  },
  function () {}, // reduce is never called here, since every key is emitted only once
  {
    "out": { "inline": true }
  }
)
The map step combines the elements, since there is no single "$merge" accumulator available as an aggregation pipeline operator (at least not in the version I was using). The empty reduce function is still required. out either writes to a different collection or returns the result directly (inline, which is what I'm doing here).
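(Side note for readers on newer versions: since MongoDB 3.6 a $mergeObjects expression does exist, so a pure-aggregation variant is possible. A minimal sketch, assuming the document structure above; like the mapReduce version, it only projects and leaves the collection untouched:)

db.sessions.aggregate([
  {
    '$replaceRoot': {
      'newRoot': {
        '$reduce': {
          'input': '$forms',
          'initialValue': { 'USER_SESSION_ID': '$USER_SESSION_ID' },
          'in': { '$mergeObjects': ['$$value', '$$this'] } // fold each form into the result
        }
      }
    }
  }
])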
It looks a lot like the method chridam showed in his answer, but it effectively acts as a projection. His version is much closer to the way my Python code works, but for what I'm trying to do a projection is fine and doesn't change the original data. The Python code does modify its input, and not changing the input collection is quite useful!
Related
I have a document that holds lists containing nested objects. Simplified, the document looks like this:
{
  "username": "user",
  "listOne": [
    {
      "name": "foo",
      "qnty": 5
    },
    {
      "name": "bar",
      "qnty": 3
    }
  ],
  "listTwo": [
    {
      "id": 1,
      "qnty": 13
    },
    {
      "id": 2,
      "qnty": 9
    }
  ]
}
And I need to update the quantity in these lists based on an identifier. For list one it was easy. I was doing something like this:
db.collection.findOneAndUpdate(
  {
    "username": "user",
    "listOne.name": name
  },
  {
    $inc: {
      "listOne.$.qnty": qntyChange
    }
  }
)
Then, whenever the find failed because no object in the list had that name and nothing was updated, I would catch that and do a second operation with $push, roughly as sketched below. Since this is the rarer case, doing two queries against the collection didn't bother me.
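For illustration, that two-step fallback might look roughly like this (a sketch; the collection name is assumed, and the exact return shape of findOneAndUpdate differs between driver versions):

const res = await db.collection("users").findOneAndUpdate(
  { "username": "user", "listOne.name": name },
  { $inc: { "listOne.$.qnty": qntyChange } }
);
if (!res.value) {
  // No element with that name matched, so append a new one instead
  await db.collection("users").updateOne(
    { "username": "user" },
    { $push: { "listOne": { "name": name, "qnty": qntyChange } } }
  );
}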
But now I had to add list two to the document as well. And since the identifiers are not the same, I would have to query the lists individually, meaning four searches against the collection in the worst case if I kept the same strategy.
So, to avoid this, I wrote an update using an aggregation pipeline. What it does is:
1) Look if there is an object in list one with the queried identifier.
2) If true, map over the entire array and:
2.1) Return the same object if the identifier is different.
2.2) Return the object with the quantity changed when the identifier matches.
3) If false, push a new object with this identifier onto the list.
4) Repeat for list two.
This is the pipeline for list one:
db.coll1.updateOne(
  {
    "username": "user"
  },
  [{
    "$set": {
      "listOne": {
        "$cond": {
          "if": {
            "$in": [name, "$listOne.name"]
          },
          "then": {
            "$map": {
              "input": "$listOne",
              "as": "one",
              "in": {
                "$cond": {
                  "if": {
                    "$eq": ["$$one.name", name]
                  },
                  "then": {
                    "$mergeObjects": [
                      "$$one",
                      {
                        "qnty": {
                          "$add": ["$$one.qnty", qntyChange]
                        }
                      }
                    ]
                  },
                  "else": "$$one"
                }
              }
            }
          },
          "else": {
            "$concatArrays": [
              "$listOne",
              [{
                "name": name,
                "qnty": qntyChange
              }]
            ]
          }
        }
      }
    }
  }]
);
The entire pipeline can be found on this Mongo Playground.
So my question is about how efficient this is. As I am paying for server time, I would like to use an efficient solution to this problem. Querying the collection four times, or even just twice on every call, seems like a bad idea, as the collection will have thousands of entries. The two lists, on the other hand, are not that big and should not exceed a thousand elements each. But the way it's written, it looks like it will iterate over each list about two times.
And what worries me most: when I use $map to change the list and return the same object in cases where the identifier does not match, does MongoDB rewrite those elements too? Not only would that increase my time on the server rewriting the entire list with the same objects, it would also count towards the byte size of my write operation, which is also charged by MongoDB.
So if anyone has a better solution to this, I'm all ears.
According to this SO answer,
What you actually do inside of the document (push around an array, add a field) should not have any significant impact on the total cost of the operation
So, in your case, your array operations should not be causing a heavy impact on the total cost.
I'm trying to write a way to update a whole MongoDB document, including subdocuments, using Mongoose and a supplied update object. I want to supply an object from my client in the shape of the schema and then iterate over it to update each property in the document, including those in nested subdocuments in arrays.
So if my schema looked like this:
const Person = new Schema({
  name: String,
  age: Number,
  addresses: [
    {
      label: String,
      fullAddress: String,
    },
  ],
  bodyMeasurements: {
    height: Number,
    weight: Number,
    clothingSizes: {
      jacket: Number,
      pants: Number
    },
  },
})
I would want to supply an object to update an existing document that looked something like this:
{
  "_id": "217a7f84685f49642635dff0",
  "name": "Dan",
  "addresses": [
    {
      "_id": "5f49647f84f02635df217a68",
      "label": "Home 2"
    },
    {
      "label": "Work",
      "fullAddress": "6 Elm Street"
    }
  ],
  "bodyMeasurements": {
    "_id": "2635df217a685f49647f84f0",
    "weight": 90,
    "clothingSizes": {
      "_id": "217a685f4962635df49647f84f0",
      "pants": 32
    }
  }
}
The code would need to iterate through all the keys and values of the object entries. Where it found a Mongo ID, it would know to update that specific item (like the "Home 2" address label); where it didn't, it would know to add it (like the second "Work" address here), or to replace it if it was a top-level property (like the "name" property, "Dan").
For a one-dimensional schema, without taking _ids into account, this would work:
for (const [key, value] of Object.entries(req.body)) {
  person[key] = value;
}
But this doesn't work for nested subdocuments or documents in arrays, and I don't want to have to hard-code the name of each subdocument. I am trying to find a way to update any schema's documents and subdocuments generically.
I imagine some recursion and the Mongoose .set() method are needed to handle deeply nested documents.
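Not an authoritative answer, but a rough sketch of that recursive idea might look like this (untested; assumes person is a fetched Mongoose document and update is shaped like the schema):

function applyUpdate(doc, update, prefix = '') {
  for (const [key, value] of Object.entries(update)) {
    if (key === '_id') continue; // _id only identifies an item, it is never rewritten
    const path = prefix ? `${prefix}.${key}` : key;
    if (Array.isArray(value)) {
      const arr = doc.get(path); // a Mongoose DocumentArray
      for (const item of value) {
        const existing = item._id && arr.id(item._id); // match subdocument by _id
        if (existing) existing.set(item); // found: update it in place
        else arr.push(item);              // not found or no _id: add as new
      }
    } else if (value !== null && typeof value === 'object') {
      applyUpdate(doc, value, path); // recurse into the nested subdocument
    } else {
      doc.set(path, value); // plain property: overwrite
    }
  }
}

applyUpdate(person, req.body);
await person.save();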
I'm making a little app in Node.js, and I'm struggling to print some data coming from a JSON with the following structure:
{
  "courses": [
    {
      "java": [
        { "attendees": 43 },
        { "subject": "Crash course" }
      ]
    },
    {
      "python": {
        "occurrences": [
          { "attendees": 24 },
          { "subject": "another crash course" },
          { "notes": "completed with issues" },
          { "attendees": 30 },
          { "subject": "another crash course" },
          { "notes": "completed with issues" }
        ]
      }
    }
  ]
}
If I want to print the attendees of 'java', I do:
console.log(myJSON.courses[0]['java'][0]['attendees']);
which prints
43
and if I want to print the notes of the 2nd occurrence of the python course I do:
console.log(myJSON.courses[1]['python']['occurrences'][2]['notes']);
which prints:
completed with issues
The cases mentioned above work correctly, but what I want now is to print the keys of 'java' ('attendees' and 'subject'). As you can see, 'java' is an array whose two elements are JSON objects with one key each. I've tried:
console.log(myJSON.courses[0]['java'][0].keys);
and
console.log(myJSON.courses[0]['java'].keys);
but they print "undefined" and "[Function: keys]" respectively.
What am I missing here?
Could anybody help me, please? :(
myJSON.courses[0]['java'] is an array with indexes, where each index holds an object with keys. The array itself doesn't have the keys you want (the keys of an array are its indexes: 0, 1, etc.).
Instead, you want to access all the keys from the objects in the myJSON.courses[0]['java'] array.
You can do this by combining .map and Object.keys(). .map lets you transform every object in your myJSON.courses[0]['java'] array, and Object.keys() gives you an array of the keys of a given object (in your case each object holds exactly one key, so you can access index 0 of that array).
const myJSON = {courses:[{java:[{attendees:43},{subject:"Crash course"}]},{python:{occurrences:[{attendees:24},{subject:"another crash course"},{notes:"completed with issues"},{attendees:30},{subject:"another crash course"},{notes:"completed with issues"}]}}]};
const myKeys = myJSON.courses[0]['java'].map(obj => Object.keys(obj)[0]);
console.log(myKeys);
If you have multiple keys in your objects within an array, you can also use .flatMap (take note of browser support):
const myJSON = {courses:[{java:[{attendees:43},{subject:"Crash course"}]},{python:{occurrences:[{attendees:24},{subject:"another crash course"},{notes:"completed with issues"},{attendees:30},{subject:"another crash course"},{notes:"completed with issues"}]}}]};
const myKeys = myJSON.courses[0]['java'].flatMap(Object.keys);
console.log(myKeys);
I have a fairly complex array generated from Google's natural language API. I feed it a paragraph of text and out comes lots of language information about that paragraph.
My end goal is to find "key words" in this paragraph. To achieve this, I want to put all the "entities" into a flat array, count the duplicates, and then treat the words with the highest number of duplicates as "key words". If there aren't any, I'll cherry-pick words from the entities I consider most significant.
I already know the entities that could exist:
var entities = [
  'art',
  'events',
  'goods',
  'organizations',
  'other',
  'people',
  'places',
  'unknown'
];
Here is an example structure of the array I'm working with.
input = [
  {
    language: {
      entities: {
        people: [
          {
            name: "Paul",
            type: "Person",
          },
          {
            name: "Paul",
            type: "Person",
          },
        ],
        goods: [
          {
            name: "car",
            type: "Consumer_good",
          }
        ], // etc.
      }
    }
  }
];
output = ["Paul", "Paul", "car"...];
My question is: what is the best way to convert my initial array into a flat array and then find the duplicates, without using a whole bunch of for loops?
There is no way around loops or array functions if you work with dynamic input data.
You can access all the values using this format:
input[0]["language"]["entities"]["people"][0].name
input = [
  {
    language: {
      entities: {
        people: [
          {
            name: "Paul",
            type: "Person",
          },
          {
            name: "Paul",
            type: "Person",
          },
        ],
        goods: [
          {
            name: "car",
            type: "Consumer_good",
          }
        ], // etc.
      }
    }
  }
];

console.log(input[0]["language"]["entities"]["people"][0].name);
Then you could do something like this:
for (var entry in input[0]["language"]["entities"]) {
  console.log(entry);
}
Or, if I understood you wrong: you can turn the JavaScript object into an array like this (requires jQuery):
var myObj = {
  1: [1, 2, 3],
  2: [4, 5, 6]
};

var array = $.map(myObj, function (value, index) {
  return [value];
});

console.log(array[0][0]);
console.log(array[0]);
console.log(array);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
This will output
1
[1, 2, 3]
[[1,2,3],[4,5,6]]
You could iterate through input[0].language.entities recursively and collect all the .name properties into an array; then you have only one explicit loop :-).
After that, you can iterate over the result to find the duplicates. It's easier if you sort it alphabetically first: if two or more consecutive entries are equal, they are duplicates (see the sketch below).
But it could be a bit dangerous if Google changes the API or delivers bad data because of a malfunction.
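A sketch of that collector, assuming the input shape from the question:

// Recursively collect every .name, then sort so duplicates sit next to each other
function collectNames(node, out = []) {
  if (Array.isArray(node)) {
    node.forEach(item => collectNames(item, out));
  } else if (node && typeof node === 'object') {
    if (typeof node.name === 'string') out.push(node.name); // keep the entity name
    Object.values(node).forEach(v => collectNames(v, out)); // descend into children
  }
  return out;
}

const names = collectNames(input[0].language.entities).sort();
const duplicates = names.filter((n, i) => i > 0 && n === names[i - 1]);
console.log(duplicates); // ["Paul"]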
Isn't input.language.entities already flat enough to work with it?
I ended up doing something like this. It's not pretty but it gets the job done.
var result = [];
var known_entities = ['art', 'events', 'goods', 'organizations', 'other', 'people', 'places', 'unknown'];

for (var i = 0; i < known_entities.length; i++) {
  var entity = known_entities[i];
  if (language.entities[entity]) {   // only entities present in the response
    for (var j in language.entities[entity]) {
      var word = language.entities[entity][j].name;
      result.push(word);             // collect the entity name
    }
  }
}
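If the next step is counting the duplicates, a small tally over result would do it (a follow-up sketch in the same style):

var counts = {};
for (var k = 0; k < result.length; k++) {
  counts[result[k]] = (counts[result[k]] || 0) + 1; // tally each word
}
// e.g. { "Paul": 2, "car": 1 } -- the highest counts are the "key words"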
I'm trying to delete ($pull) an object from an embedded array. (Using the JavaScript/Node.js driver.)
Here is the sample data, where one, two, three are the levels:
{
  name: "All",
  one: [
    {
      name: "People",
      two: [
        {
          three_id: 123,
          three: "Jonny",
        },
        {
          three_id: 456,
          three: "Bobby",
        }
      ]
    },
    {
      name: "Animals",
      two: [
        {
          three_id: 828,
          three: "Cat",
        },
        {
          three_id: 282,
          three: "Dog",
        }
      ]
    }
  ]
}
In this example, I'm trying to get rid of "Bobby".
I can successfully match the document at the "three level" if I want, like this:
db.test.find({"one.two.three_id" : 456});
However, I've no idea how to eliminate that record using update. Here are some attempts, none of which work:
// failed attempts
db.test.update({"one.two.three_id" : 456}, {$pull:{'one.$.two.$.three_id': 456}});
db.test.update({"one.two.three_id" : 456}, {$pull:{'three_id': 456}});
// deletes entire level two "People"
db.test.update({"one.two.three_id" : 456}, {$pull: {one: {two : {$elemMatch: {'three_id': 456}}}}});
I read that you cannot use two positional $ operators and that you have to know the index position for the second one. However, I want to avoid having to use the index of the embedded dictionary I want to delete.
References:
Mongodb on pull
http://docs.mongodb.org/manual/reference/operator/update/pull/
The value of the key in your $pull object needs to be the path of the array that you're targeting. This appears to work:
db.test.update(
{'one.two.three_id': 456},
{$pull: {'one.$.two': {three_id: 456}}}
);
It looks like the $ represents the index of the first matched array level in this case, so it works even though we're matching across multiple nesting levels.
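As an aside, on MongoDB 3.6+ the filtered positional operator $[<identifier>] with arrayFilters expresses the same intent more explicitly, without relying on $ matching the right level (a sketch):

db.test.updateOne(
  { 'one.two.three_id': 456 },
  { $pull: { 'one.$[level].two': { three_id: 456 } } },
  // "level" matches the elements of `one` whose `two` array holds the target id
  { arrayFilters: [{ 'level.two.three_id': 456 }] }
);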