Mongoose findOneAndUpdate: create and then update nested array - javascript

I have a program where I'm requesting weather data from a server, processing the data, and then saving it to an mlab account using mongoose. I'm gathering 10 years of data, but the API that I'm requesting the data from only allows about a year at a time to be requested.
I'm using findOneAndUpdate to create/update the document for each weather station, but I'm having trouble updating the arrays within the data object. (Probably not the best way to describe it...)
For example, here's the model:
const stnDataSchema = new Schema(
{
station: { type: String, default: null },
elevation: { type: String, default: null },
timeZone: { type: String, default: null },
dates: {},
data: {}
},
// Schema takes a single options object as its second argument
{ collection: 'stndata', runSettersOnQuery: true }
)
where the dates object looks like this:
dates: ["2007-01-01",
"2007-01-02",
"2007-01-03",
"2007-01-04",
"2007-01-05",
"2007-01-06",
"2007-01-07",
"2007-01-08",
"2007-01-09"]
and the data object like this:
"data": [
{
"maxT": [
0,
null,
4.4,
0,
-2.7,
etc.....
What I want is that when I run findOneAndUpdate, it finds the document based on the station and then appends new maxT values and dates to the respective arrays. I have it working for the dates array, but am running into trouble with the data array because the elements I'm updating are nested.
I tried this:
const update = {
$set: {'station': station, 'elevation': elevation, 'timeZone': timeZone},
$push: {'dates': datesTest, 'data.0.maxT': testMaxT}};
StnData.findOneAndUpdate( query, update, {upsert: true} ,
  function(err, doc) {
    if (err) {
      console.log("error in updateStation", err)
      throw new Error('error in updateStation')
    }
    else {
      console.log('saved')
    }
  })
but got an output into mlab like this:
"data": {
"0": {
"maxT": [
"a",
"b",
The problem is that I get an object with a "0" key instead of an array of one element. I tried 'data[0].maxT' but nothing happens when I do that.
The first time I run the data for a station, I want to create a new document with a data object in the format of my third code block, and then on subsequent runs, once that document already exists, update the maxT array with new values. Any ideas?

You are getting this output:
"data": {
"0": {
"maxT": [
"a",
"b",
because you are upserting the document. Upserting gets a bit complicated when dealing with arrays of documents.
When updating an array, MongoDB knows that data.0 refers to the first element in the array. However, when inserting, MongoDB can't tell if it's meant to be an array or an object. So it assumes it's an object. So rather than inserting ["val"], it inserts {"0": "val"}.
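The difference can be illustrated with a small simulation in plain JavaScript. This is only a rough sketch of the behaviour described above, not MongoDB code, and the helper name pushAtPath is made up for illustration.

```javascript
// Simulation only: mimics how a dotted path such as 'data.0.maxT' is resolved.
// pushAtPath is a hypothetical helper, not part of MongoDB or Mongoose.
function pushAtPath(doc, path, value) {
  const parts = path.split('.');
  let node = doc;
  for (const part of parts.slice(0, -1)) {
    // A missing intermediate field is created as a plain object,
    // so '0' becomes an object key rather than an array index.
    if (node[part] === undefined) node[part] = {};
    node = node[part];
  }
  const last = parts[parts.length - 1];
  if (node[last] === undefined) node[last] = [];
  node[last].push(value);
  return doc;
}

// Upsert into an empty document: 'data' does not exist yet.
const inserted = pushAtPath({}, 'data.0.maxT', 4.4);
console.log(JSON.stringify(inserted)); // {"data":{"0":{"maxT":[4.4]}}}

// Update of a document where 'data' is already an array.
const updated = pushAtPath({ data: [{ maxT: [0] }] }, 'data.0.maxT', 4.4);
console.log(JSON.stringify(updated)); // {"data":[{"maxT":[0,4.4]}]}
```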
Simplest Solution
Don't use an upsert. Insert a document for each new weather station, then use findOneAndUpdate to push values into the arrays in the documents. As long as you insert the arrays correctly the first time, you will be able to push to them without them turning into objects.
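A minimal sketch of that insert-then-update approach, assuming the StnData model from the question. The two helper names and the sample station values are made up for illustration. The sketch also assumes the new values arrive as arrays, in which case $each appends each element rather than pushing the whole array as a single value.

```javascript
// Sketch only: helper names are hypothetical; StnData is the model from the question.

// Document to insert once per station, with the arrays already in place.
function buildInitialStation(station, elevation, timeZone) {
  return { station, elevation, timeZone, dates: [], data: [{ maxT: [] }] };
}

// Update that appends new values to the existing arrays.
// $each appends every element individually instead of pushing the array as one value.
function buildAppendUpdate(newDates, newMaxT) {
  return {
    $push: {
      dates: { $each: newDates },
      'data.0.maxT': { $each: newMaxT },
    },
  };
}

// Made-up sample values.
const initialDoc = buildInitialStation('STN001', '120', 'UTC');
const appendUpdate = buildAppendUpdate(['2007-01-10'], [1.5]);
console.log(JSON.stringify(initialDoc));
console.log(JSON.stringify(appendUpdate));

// Intended use with the model from the question (not executed here):
//   await StnData.create(initialDoc);                                     // first run for a station
//   await StnData.findOneAndUpdate({ station: 'STN001' }, appendUpdate);  // later runs, no upsert
```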
Alternative Simple Solution if data just Contains one Object
From your question, it looks like you only have one object in data. If that is the case, you could just make the maxT array top-level, instead of being a property of a single document in an array. Then it would act just like dates.
More Complicated MongoDB 3.6 Solution
If you truly cannot do without upserts, MongoDB 3.6 introduced the filtered positional operator $[<identifier>]. You can use this operator to update specific elements in an array which match a query. Unlike the simple positional operator $, the new $[<identifier>] operator can be used to upsert as long as an exact match is used.
You can read more about this operator here: https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/
So your data objects will need a field that can be matched exactly (say, name). An example query would look something like this:
let query = {
_id: 'idOfDocument',
data: [{name: 'subobjectName'}] // Need this for an exact match
}
let update = {$push: {'data.$[el].maxT': testMaxT}}
let options = {upsert: true, arrayFilters: [{'el.name': 'subobjectName'}]}
StnData.findOneAndUpdate(query, update, options, callbackFn)
As you can see this adds much more complexity. It would be much easier to forget about trying to do upserts. Just do one insert then update.
Moreover, mLab currently does not support MongoDB 3.6, so this method won't be viable on mLab until 3.6 is supported.

MongoDB - slow query on old documents (aggregation and sorting)

I have two databases for testing, each containing thousands to hundreds of thousands of documents, with the same schemas and CRUD operations.
Let's call them DB1 and DB2.
I am using Mongoose.
Suddenly DB1 became really slow during:
const eventQueryPipeline = [
{
$match: {
$and: [{ userId: req.body.userId }, { serverId: req.body.serverId }],
},
},
{
$sort: {
sort: -1,
},
},
];
const aggregation = db.collection
.aggregate(eventQueryPipeline)
.allowDiskUse(true);
aggregation.exect((err, result) => {
res.json(result);
});
In DB2 the exact same query takes anywhere from a few milliseconds to a maximum of 10 seconds.
In DB1 the query never takes less than 40 seconds.
I do not understand why. What could I be missing?
I compared the documents and the indexes, and they're the same.
Deleting the collection and saving the documents again brings the speed back to an acceptable level, but why is it happening? Has anyone had the same experience?
Short answer:
You should create the following index:
{ "userId": 1, "serverId": 1, "sort": 1 }
Longer answer
Based on your code (I see that you have .allowDiskUse(true)), it looks like MongoDB is trying to do an in-memory sort on a lot of data. By default, MongoDB has a 100MB memory limit for sort operations, and you can allow it to use temporary files on disk to store data if it hits that limit.
You can read more about it here: https://www.mongodb.com/docs/manual/reference/method/cursor.allowDiskUse/
In order to optimise the performance of your queries, you can use indexes.
A common rule to follow when planning indexes is ESR (Equality, Sort, Range). You can read more about it here: https://www.mongodb.com/docs/v4.2/tutorial/equality-sort-range-rule/
If we follow that rule while creating our compound index, we add the equality matches first, in your case "userId" and "serverId". After that comes the sort field, in your case "sort".
If you needed to additionally filter results by some range (e.g. some value greater than X, or a timestamp later than yesterday), you would add that field after "sort".
That means your index should look like this:
schema.index({ userId: 1, serverId: 1, sort: 1 });
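The ESR ordering can be sketched as a small helper that assembles the index specification. The helper name buildEsrIndex and the "timestamp" range field are assumptions made for illustration; only userId, serverId and sort come from the question.

```javascript
// Hypothetical helper: orders index fields as Equality, then Sort, then Range.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const index = {};
  for (const field of equality) index[field] = 1;
  for (const [field, direction] of Object.entries(sort)) index[field] = direction;
  for (const field of range) index[field] = 1;
  return index;
}

const eventIndex = buildEsrIndex({
  equality: ['userId', 'serverId'],
  sort: { sort: 1 },
});
console.log(JSON.stringify(eventIndex)); // {"userId":1,"serverId":1,"sort":1}

// With an additional range filter on a hypothetical "timestamp" field:
const withRange = buildEsrIndex({
  equality: ['userId', 'serverId'],
  sort: { sort: 1 },
  range: ['timestamp'],
});
console.log(Object.keys(withRange).join(',')); // userId,serverId,sort,timestamp
```

The resulting object could then be passed to schema.index(eventIndex) in place of the literal above.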
Additionally, you can probably remove allowDiskUse and handle err inside the aggregation.exec callback (I'm assuming that aggregation.exect is a typo).

Updating a collection from a different database

I'm using Mongo 4.1 and would like to update a collection named "locations_copy" by adding a new field of type object named "time", with two subfields. "utcTime" will be populated with the value of that document's "time" field. "tz" will be populated with the value of "contactInfo[0].addresses[0].timeZoneID" from the document in the "subjects" collection of the "Subjects" database (a different database from the one holding the first collection) whose "_id" matches the "subjectID" field in locations_copy.
I have tried to accomplish this with the following code:
const get_time_zone_id = function(doc) {return doc.contactInfo[0].addresses[0].timeZoneID}
const get_location_doc = function(subjectID) { return db.getSiblingDB('Subjects').subjects.find({"_id": subjectID, "contactInfo": {"$exists": true}, "$where" : function() {
return (this.contactInfo.length > 0 && this.contactInfo[0].addresses && this.contactInfo[0].addresses.length > 0 && this.contactInfo[0].addresses[0].timeZoneID)
}}, {"contactInfo" : {"$slice": 1}, "contactInfo.addresses": {"$slice": 1},"contactInfo.addresses.timeZoneID" : 1}).map(get_time_zone_id)}
db.locations_copy.aggregate( [
{ $match: {"subjectID": {"$exists": true}}},
{ $addFields: {
time: { utc: "$timeUTC",
tz: { "$arrayElemAt": [get_location_doc(ObjectId("$subjectID")), 0 ] }}
}
}
] ).forEach(function(x){db.locations_copy.save(x)})
Everything works except for one thing: when I try to pass ObjectId("$subjectID") as a parameter to "get_location_doc", it parses "$subjectID" as a literal string rather than passing the value of the underlying field in each document. I have also tried passing simply subjectID (without quotes), in which case it was simply undefined, or "$$subjectID", which led me to a literal string again. I understand this is due to client/server-side parsing at run time.
I have tried to utilize the "$function" operator, but apparently it's only available from version 4.4 (I'm using 4.1).
I should note, that if I replace "$subjectID" with a hard-coded string ID (for example "5ff4c037bc0a716381231277") everything works as you'd expect.
Can anyone please help me accomplish what I intend? Since this script is only meant to be executed once, performance is not much of an issue.
Thank you!
db.getSiblingDB().collection.find() is a client-side operation. It is not something you can use to join collections as part of a query. For that, see https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/. Note that $lookup joins against a collection in the same database, so it cannot reach the Subjects database directly.
The second thing you are doing is retrieving nested fields out of a document. You can do this with $set (an alias for $addFields, available from MongoDB 4.2) and dot notation. See specifically the example at https://docs.mongodb.com/manual/reference/operator/aggregation/set/#adding-fields-to-an-embedded-document.
You will need to construct a single aggregation pipeline that does everything your current mix of aggregation and javascript does using only the operations documented in https://docs.mongodb.com/manual/reference/operator/aggregation/ and the stages documented in https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/.
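As a rough sketch of such a pipeline, written as a plain JavaScript array: it assumes that the subjects collection is available in the same database as locations_copy (for example as a copy), because $lookup cannot read from another database. The temporary field name "subject" is made up, and the field names timeUTC, subjectID and contactInfo are taken from the code in the question.

```javascript
// Sketch only. Assumes `subjects` lives in the same database as `locations_copy`.
const pipeline = [
  { $match: { subjectID: { $exists: true } } },
  {
    $lookup: {
      from: 'subjects',
      localField: 'subjectID',
      foreignField: '_id',
      as: 'subject', // temporary field holding the matched subject(s)
    },
  },
  { $unwind: '$subject' }, // drops locations that have no matching subject
  {
    $addFields: {
      time: {
        utc: '$timeUTC',
        tz: {
          // Take contactInfo[0], then addresses[0], then read timeZoneID.
          $let: {
            vars: { contact: { $arrayElemAt: ['$subject.contactInfo', 0] } },
            in: {
              $let: {
                vars: { address: { $arrayElemAt: ['$$contact.addresses', 0] } },
                in: '$$address.timeZoneID',
              },
            },
          },
        },
      },
    },
  },
  { $project: { subject: 0 } }, // remove the temporary field again
];

console.log(pipeline.map((stage) => Object.keys(stage)[0]).join(' -> '));
```

The results could then be written back the same way as in the question, with db.locations_copy.aggregate(pipeline).forEach(...).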

Create/update objects with mongoose/mongoDB

The internet is full of resources for dealing with arrays, but often objects are a more natural fit for data and seemingly more efficient.
I want to store key-value objects under dynamic field names like this:
project['en-US'] = { 'nav-back': 'Go back', ... }
project['pt-BR'] = { 'nav-back': 'Volte', ... }
Doing this seems like it would be more efficient than keeping an array of all languages and having to filter it to get all language entries for a given language.
My question is: How can I insert a key-value pair into an object with a dynamic name using mongoose? And would the object need to exist or can I create it if it doesn't in one operation?
I tried this:
await Project.update(
{ _id: projectId },
{
$set: {
[`${language}.${key}`]: value,
},
});
But no luck regardless of if I have an empty object there to begin with or not: { ok: 0, n: 0, nModified: 0 }.
Bonus: Should I index these objects and how? (I will want to update single items)
Thanks!
In Mongoose, the schema is everything. It describes the data you are going to read from and store in the database. If you want to add a new key to the schema dynamically, it is going to be hard.
In this particular case I would recommend using the mongodb-native-driver, which is far more permissive about data manipulation. You could read the data in a specific format and dynamically add your field to it.
To summarize, this is how your dynamic change should happen:
1. Use mongodb-native-driver to insert the new key into the database data
2. Modify the Mongoose schema you have in the code (push a new key into it)
3. Use Mongoose to manipulate the data afterward
Do not forget to dynamically update your Mongoose model, or you won't read the new key at the next find.
I solved this using the original code snippet unchanged, but adding { strict: false } to the schema:
const projectSchema = new Schema({ ...schema... }, { strict: false });
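For reference, the dynamic path from the original snippet can be built with a computed property name. The helper below is a hypothetical sketch. The check for dots is an added assumption, based on MongoDB treating a dot in an update path as nesting.

```javascript
// Hypothetical helper: builds the $set update for one translation entry.
function buildTranslationUpdate(language, key, value) {
  // A dot in an update path means nesting, so reject names that contain one.
  if (language.includes('.') || key.includes('.')) {
    throw new Error('language and key must not contain "."');
  }
  return { $set: { [`${language}.${key}`]: value } };
}

const translationUpdate = buildTranslationUpdate('en-US', 'nav-back', 'Go back');
console.log(JSON.stringify(translationUpdate)); // {"$set":{"en-US.nav-back":"Go back"}}

// Intended use with the model from the question (not executed here):
//   await Project.update({ _id: projectId }, translationUpdate);
```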

CouchDB query issues

I will start off by saying while I am not new to CouchDB, I am new to querying the views using JavaScript and the web.
I have looked at multiple other questions on here, including CouchDB - Queries with params, couchDB queries, Couchdb query with AND operator, CouchDB Querying Dates, and Basic CouchDB Queries, just to list a few.
While all have good information in them, I haven't found one that has my particular problem in it.
I have a view set up like so:
function (docu) {
if(docu.status && docu.doc && docu.orgId.toString() && !docu.deleted){
switch(docu.status){
case "BASE":
emit(docu.name, docu);
break;
case "AIR":
emit(docu.eta, docu);
break;
case "CHECK":
emit(docu.checkTime, docu);
break;
}
}
}
with all documents having a status, doc, orgId, deleted, name, eta, and checkTime. (I changed doc to docu because of my custom doc key.)
I am trying to query and emit based on a set of keys, status, doc, orgId, where orgId is an integer.
My jQuery to do this looks like so:
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["status","doc",orgId],
success: function(data) {
console.log(data);
},
error: function(status) {
console.log(status);
}
});
I receive
{"total_rows":59,"offset":59,"rows":[
]}
Sometimes the offset is 0, sometimes it is 59. I feel I must be doing something wrong for this not to be working correctly.
So for my questions:
1. I did not mention this, but I had to set docu.orgId.toString() because I guess it parses the URL as a string. Is there a way to use this number as a numeric value?
2. How do I correctly view multiple documents based on multiple keys, i.e. if(key1 && key2) emit(doc.name, doc)
3. Am I doing something obviously wrong that I lack the knowledge to notice?
Thank you all.
You're so very close. To answer your questions
1. When you're using docu.orgId.toString() in that if-statement you're basically saying: this value must be truthy. If you didn't convert to string, any number, other than 0, would be true. Since you are converting to a string, any value other than an empty string will be true. Also, since you do not use orgId as the first argument in an emit call, at least not in the example above, you cannot query by it at all.
2. I'll get to this.
3. A little.
The thing to remember is emit creates a key-value table (that's really all a view is) that you can use to query. Let's say we have the following documents
{type:'student', dept:'psych', name:'josh'},
{type:'student', dept:'compsci', name:'anish'},
{type:'professor', dept:'compsci', name:'kender'},
{type:'professor', dept:'psych', name:'josh'},
{type:'mascot', name:'owly'}
Now let's say we know that for this one view, we want to query 1) everything but mascots, 2) we want to query by type, dept, and name, all of the available fields in this example. We would write a map function like this:
function(doc) {
if (doc.type === 'mascot') { return; } // don't do anything
// allow for queries by type
emit(doc.type, null); // the use of null is explained below
// allow queries by dept
emit(doc.dept, null);
// allow for queries by name
emit(doc.name, null);
}
Then, we would query like this:
// look for all joshs
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["josh"],
// ...
});
// look for everyone in the psych department
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["psych"],
// ...
});
// look for everyone that's a professor and everyone named josh
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["professor", "josh"],
// ...
});
Notice the last query isn't and in the sense of a logical conjunction, it's in the sense of a union. If you wanted to restrict what was returned to documents that were only professors and also joshs, there are a few options. The most basic would be to concatenate the key when you emit. Like
emit('type-' + doc.type + '_name-' + doc.name, null);
You would then query like this: keys : ["type-professor_name-josh"]
It doesn't feel very proper to rely on strings like this, at least it didn't to me when I first started doing it, but it is a quite common method for querying key-value stores. The characters - and _ have no special meaning in this example, I simply use them as delimiters.
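The difference between the union and the concatenated key can be sketched with a tiny in-memory stand-in for a view. This is a simulation, not CouchDB code. buildView and queryKeys are made-up helpers, the _id values are invented, and emit is passed in as an argument here, whereas in a real map function it is a global. Rows are returned in emit order here rather than sorted by key.

```javascript
// Simulation only: a tiny in-memory stand-in for a CouchDB view.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Each emit call adds one row to the key-value table.
    const emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc, emit);
  }
  return rows;
}

// Mirrors the `keys` parameter: returns rows whose key equals any listed key.
function queryKeys(rows, keys) {
  return rows.filter((row) => keys.includes(row.key));
}

const docs = [
  { _id: '1', type: 'student', dept: 'psych', name: 'josh' },
  { _id: '2', type: 'student', dept: 'compsci', name: 'anish' },
  { _id: '3', type: 'professor', dept: 'compsci', name: 'kender' },
  { _id: '4', type: 'professor', dept: 'psych', name: 'josh' },
];

const separateKeys = buildView(docs, (doc, emit) => {
  emit(doc.type, null);
  emit(doc.name, null);
});
const combinedKey = buildView(docs, (doc, emit) => {
  emit('type-' + doc.type + '_name-' + doc.name, null);
});

// Union: every professor plus everyone named josh (document 4 matches twice).
const unionIds = queryKeys(separateKeys, ['professor', 'josh']).map((r) => r.id);
console.log(unionIds.join(',')); // 1,3,4,4

// Conjunction: only documents that are both a professor and named josh.
const bothIds = queryKeys(combinedKey, ['type-professor_name-josh']).map((r) => r.id);
console.log(bothIds.join(',')); // 4
```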
Another option would be what you mentioned in your comment, to emit an array like
emit([ doc.type, doc.name ], null);
Then you would query like
key: ["professor", "josh"]
This is perfectly fine, but generally, the use case for emitting arrays as keys, is for aggregating returned rows. For example, you could emit([year, month, day]) and if you had a simple reduce function that basically passed the records through:
function(keys, values, rereduce) {
if (rereduce) {
return [].concat.apply([], values);
} else {
return values;
}
}
You could query with the URL parameter group_level set to 1 or 2 and group by just year or by year and month, respectively, on the exact same view using arrays as keys. Compared to SQL or Mongo it's mad complicated and convoluted, but hey, it's there.
The use of null in the view is really for saving resources. When you query a view, each row contains the id of the document that emitted it, which you can use in a second ajax call to fetch the full documents from, for example, _all_docs.
I hope that makes sense. If you need any clarification you can use the comments and I'll try my best.

MongoDB - Query conundrum - Document refs or subdocument

I've run into a bit of an issue with some data that I'm storing in my MongoDB (Note: I'm using mongoose as an ODM). I have two schemas:
mongoose.model('Buyer',{
credit: Number,
})
and
mongoose.model('Item',{
bid: Number,
location: { type: [Number], index: '2d' }
})
Buyer/Item will have a parent/child association, with a one-to-many relationship. I know that I can set up Items to be embedded subdocs to the Buyer document or I can create two separate documents with object id references to each other.
The problem I am facing is that I need to query Items where its bid is lower than the Buyer's credit but also where location is near a certain geo coordinate.
To satisfy the first criterion, it seems I should embed Items as a subdoc so that I can compare the two numbers. But, in order to compare locations with a geoNear query, it seems it would be better to separate the documents; otherwise, I can't perform geoNear on each subdocument.
Is there any way that I can perform both tasks on this data? If so, how should I structure my data? If not, is there a way that I can perform one query and then a second query on the result from the first query?
Thanks for your help!
There is another option (besides embedding and normalizing) for storing hierarchies in mongodb, that is storing them as tree structures. In this case you would store Buyers and Items in separate documents but in the same collection. Each Item document would need a field pointing to its Buyer (parent) document, and each Buyer document's parent field would be set to null. The docs I linked to explain several implementations you could choose from.
If your items are stored in two separate collections, then the best option will be to write your own function and call it using mongoose.connection.db.eval('some code...');. In that case you can execute your advanced logic on the server side.
You can write something like this:
var allNearItems = db.Items.find(
{ location: {
$near: {
$geometry: {
type: "Point" ,
coordinates: [ <longitude> , <latitude> ]
},
$maxDistance: 100
}
}
});
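var res = [];
allNearItems.forEach(function(item){
  // findOne returns a single document, or null when nothing matches
  var buyer = db.Buyers.findOne({ id: item.buyerId });
  // `continue` is not valid inside a forEach callback; return skips this item
  if (!buyer) return;
  if (item.bid < buyer.credit) {
    res.push(item.id);
  }
});
return res;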
After evaluation (place it in a mongoose.connection.db.eval("...") call) you will get the array of item ids.
Use it with caution. If the allNearItems result is too large, or you run the query very often, you may face performance problems. The MongoDB team has actually deprecated direct JS code execution, but it is still available in the current stable release.
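Since direct JS execution is deprecated, the same comparison could also be done in application code after two ordinary queries, which is what the question asks about at the end. The sketch below is only an illustration of that idea. filterAffordable is a made-up helper, the sample data is invented, and it assumes each Item stores a buyerId, as the snippet above does.

```javascript
// Hypothetical helper: keeps the items whose bid is below their buyer's credit.
// `items` would come from the geo query, `buyers` from a second query by id.
function filterAffordable(items, buyers) {
  const creditById = new Map(buyers.map((b) => [String(b._id), b.credit]));
  return items.filter((item) => {
    const credit = creditById.get(String(item.buyerId));
    return credit !== undefined && item.bid < credit;
  });
}

// Made-up sample data standing in for the two query results.
const nearItems = [
  { _id: 'i1', buyerId: 'b1', bid: 50 },
  { _id: 'i2', buyerId: 'b1', bid: 500 },
  { _id: 'i3', buyerId: 'b2', bid: 10 },
  { _id: 'i4', buyerId: 'missing', bid: 1 },
];
const buyers = [
  { _id: 'b1', credit: 100 },
  { _id: 'b2', credit: 5 },
];

const affordable = filterAffordable(nearItems, buyers);
console.log(affordable.map((i) => i._id)); // [ 'i1' ]
```

With Mongoose, the second query could be something like Buyer.find({ _id: { $in: buyerIds } }), where buyerIds is collected from the items returned by the geo query.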
