I have event objects in MongoDB that look like this:
{
"start": ISODate("2010-09-04T16:54:11.216Z"),
"title":"Short descriptive title",
"description":"Full description",
"score":56
}
I need to query on three parameters:
Time window (event start is between two dates)
Score threshold (score is > x)
Full-text search of title and description
What's the right way to approach this efficiently? I think the first two are done with an aggregation but I'm not sure how text search would factor in.
Assuming your start field is of type date (which it should be) and not a string, here are the basic components that you'd want to play with. If start is stored as an ISO 8601 string instead, a string-based comparison would work just as well, because ISO 8601 strings written in the same layout and time zone sort in chronological order.
// create your text index
db.collection.createIndex({
description: "text",
title: "text"
})
// optionally create an index on your other fields
db.collection.createIndex({
start: 1,
score: 1
})
x = 50
lowerDate = ISODate("2010-09-04T16:54:11.216Z") // or just the string part for string fields
upperDate = ISODate("2010-09-04T16:54:11.216Z")
// simple find to retrieve your result set
db.collection.find({
start: {
$gte: lowerDate, // depending on your exact scenario, you may need $gt instead
$lte: upperDate // depending on your exact scenario, you may need $lt instead
},
score: { $gt: x }, // depending on your exact scenario, you may need $gte instead
$text: { // here comes the text search
$search: "descriptive"
}
})
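The bound choices flagged in the comments ($gte vs $gt, $lte vs $lt, $gt vs $gte) can be made explicit with a small helper that assembles the filter object. This is a hypothetical helper: buildEventFilter and its option names are illustrative and not part of any MongoDB API, and the dates are made up. The object it returns is what you would pass to db.collection.find().

```javascript
// Hypothetical helper: builds the find() filter for a time window,
// a score threshold and a $text search. Option names are illustrative.
function buildEventFilter({
  lower, upper, minScore, search,
  inclusiveLower = true, inclusiveUpper = true, inclusiveScore = false,
}) {
  return {
    start: {
      [inclusiveLower ? "$gte" : "$gt"]: lower,
      [inclusiveUpper ? "$lte" : "$lt"]: upper,
    },
    score: { [inclusiveScore ? "$gte" : "$gt"]: minScore },
    // $text only works if the collection has a text index (title/description here).
    $text: { $search: search },
  };
}

// Hypothetical window: the whole of September 2010, score above 50.
const filter = buildEventFilter({
  lower: new Date("2010-09-01T00:00:00Z"),
  upper: new Date("2010-09-30T23:59:59.999Z"),
  minScore: 50,
  search: "descriptive",
});
console.log(JSON.stringify(filter));
// In the shell or driver: db.collection.find(filter)
```

Keeping the operator choice in one place makes it harder to mix inclusive and exclusive bounds by accident when the query is reused elsewhere.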
There is an important topic with respect to performance/indexing that needs to be understood, though, which is very well documented here: Combine full text with other index
This is why I wrote "basic components that you'd want to play with" at the start: depending on the rest of your application, you may want to create different indexes.
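The remark about string-based date comparison holds only for ISO 8601 strings written in one consistent layout and time zone. The plain JavaScript below (the timestamps are hypothetical) checks that string order matches instant order for two UTC strings, and shows how a string with a different UTC offset breaks that guarantee.

```javascript
// Two timestamps in the same ISO 8601 layout, both in UTC ("Z").
const earlier = "2010-09-04T16:54:11.216Z";
const later = "2010-09-05T08:00:00.000Z";

// Comparing the strings lexicographically...
const stringOrder = earlier < later;
// ...gives the same answer as comparing the instants they represent.
const dateOrder = new Date(earlier) < new Date(later);
console.log(stringOrder, dateOrder); // true true

// A different offset breaks it: this is 15:00 UTC, i.e. before `earlier`.
const withOffset = "2010-09-04T20:00:00.000+05:00";
console.log(withOffset > earlier);                     // true as strings
console.log(new Date(withOffset) > new Date(earlier)); // false as instants
```

Storing real dates avoids this class of problem entirely, which is one more reason to prefer the date type.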
Let's say I have a schema with a couple of fields, foo and bar.
I then want to retrieve all documents using a projection. I want to retrieve all foos and bars with aliases and then "create" another field in my result based on some conditional logic about what bar is. If the condition is true, I simply want to prepend a '0' character to this new field; otherwise, I just want to set it to whatever barAlias is.
So, something like
const pipeline = [
{ $match: {} },
{
$project: {
fooAlias: "$foo",
barAlias: "$bar",
newField: (if some condition with barAlias) ? '0' + barAlias : barAlias
}
}
];
const docs = await Collection.aggregate(pipeline);
I know how to use $cond and $concat, but my issue is that I'm trying to base my logic on the alias fields. Is this possible? Thanks!
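A sketch of one way to do this, assuming MongoDB 3.4 or later (for $addFields and $strLenCP): within a single $project, every expression is evaluated against the incoming document, so "$barAlias" there cannot see an alias defined in the same stage. You can either reference "$bar" directly inside $cond, or compute newField in a following $addFields stage, where barAlias already exists. The condition below (bar shorter than 5 characters) is a hypothetical placeholder, and $concat requires barAlias to be a string.

```javascript
// Builds the aggregation pipeline; nothing here talks to the database.
function buildPipeline() {
  return [
    { $match: {} },
    // Stage 1: rename the fields. Aliases defined here become visible
    // only to later stages.
    { $project: { fooAlias: "$foo", barAlias: "$bar" } },
    // Stage 2: barAlias now exists on each document, so $cond can use it.
    {
      $addFields: {
        newField: {
          $cond: [
            { $lt: [{ $strLenCP: "$barAlias" }, 5] }, // hypothetical condition
            { $concat: ["0", "$barAlias"] },          // prepend a '0'
            "$barAlias",                              // otherwise copy as-is
          ],
        },
      },
    },
  ];
}

const pipeline = buildPipeline();
console.log(JSON.stringify(pipeline, null, 2));
// With Mongoose: const docs = await Collection.aggregate(pipeline);
```

Splitting the work into two stages keeps the aliasing and the conditional logic separate; if you prefer a single stage, replace "$barAlias" with "$bar" inside the $cond.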
I am trying to check if a field exists in a subdocument of an array and, if it does, only provide those documents in the callback. But every time I log the callback document, it gives me all values in my array instead of only the ones matching the query.
I am following this tutorial
And the only difference is I am using the findOne function instead of find function but it still gives me back all values. I tried using find and it does the same thing.
I am also using the same collection style as the example in the link above.
Example
In the image above, you can see I have a document with a uid field and a contacts array. What I am trying to do is first select a document based on the inputted uid. Then, after selecting that document, I want to display the values from the contacts array where the contacts.uid field exists. So from the image above, the only values that would be displayed are contacts[0] and contacts[3], because contacts[1] doesn't have a uid field.
Contact.contactModel.findOne({$and: [
{uid: self.uid},
{contacts: {
$elemMatch: {
uid: {
$exists: true,
$ne: undefined,
}
}
}}
]});
Your problems come from a misconception about data modeling in MongoDB, which is not uncommon for developers coming from other DBMSs. Let me illustrate this by comparing how data modeling works with an RDBMS vs. MongoDB (and a lot of the other NoSQL databases as well).
With an RDBMS, you identify your entities and their properties. Next, you identify the relations, normalize the data model and bang your head against the wall for a while to get the UPPER LEFT ABOVE AND BEYOND JOIN™ that will answer the questions arising from use case A. Then you do pretty much the same for use case B.
With MongoDB, you would turn this upside down. Looking at your use cases, you would try to find out what information you need to answer the questions arising from the use case and then model your data so that those questions can get answered in the most efficient way.
Let us stick with your example of a contacts database. A few assumptions to be made here:
Each user can have an arbitrary number of contacts.
Each contact and each user need to be uniquely identified by something other than a name, because names can change and whatnot.
Redundancy is not a bad thing.
Given the first assumption, embedding contacts into a user document is out of the question, since there is a document size limit (16 MB per BSON document). Regarding the second assumption: the uid field is not merely redundant but simply useless, as there already is the _id field uniquely identifying the document in question.
The use cases
Let us look at some use cases. They are simplified for the sake of the example, but they will give you the picture.
Given a user, I want to find a single contact.
Given a user, I want to find all of his contacts.
Given a user, I want to find the details of his contact "John Doe"
Given a contact, I want to edit it.
Given a contact, I want to delete it.
The data models
User
{
"_id": new ObjectId(),
"name": new String(),
"whatever": {}
}
Contact
{
"_id": new ObjectId(),
"contactOf": ObjectId(),
"name": new String(),
"phone": new String()
}
Obviously, contactOf refers to an ObjectId which must exist in the User collection.
The implementations
Given a user, I want to find a single contact.
If I have the user object, I have its _id, and the query for a single contact becomes as easy as
db.contacts.findOne({"contactOf":self._id})
Given a user, I want to find all of his contacts.
Equally easy:
db.contacts.find({"contactOf":self._id})
Given a user, I want to find the details of his contact "John Doe"
db.contacts.find({"contactOf":self._id,"name":"John Doe"})
Now that we have the contact one way or the other, including his/her/undecided/choose-not-to-say _id, we can easily edit or delete it:
Given a contact, I want to edit it.
db.contacts.updateOne({"_id":contact._id},{$set:{"name":"John F Doe"}})
I trust that by now you get an idea of how to delete John from the contacts of our user.
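To see that the referencing model needs nothing beyond plain equality filters, here is a small in-memory stand-in for the contacts collection. The data, the string _id values (standing in for ObjectIds) and the matches helper are all hypothetical; in the shell, the same filter objects go to find(), findOne() and updateOne(), and deletion would be db.contacts.deleteOne({"_id": contact._id}).

```javascript
// Hypothetical sample data following the Contact model: each contact
// points at its owner through contactOf. String ids stand in for ObjectIds.
const contacts = [
  { _id: "c1", contactOf: "u1", name: "John Doe", phone: "555-0100" },
  { _id: "c2", contactOf: "u1", name: "Jane Doe", phone: "555-0101" },
  { _id: "c3", contactOf: "u2", name: "John Doe", phone: "555-0102" },
];

// Minimal equality-only matcher, standing in for MongoDB's query engine.
const matches = (doc, filter) =>
  Object.entries(filter).every(([key, value]) => doc[key] === value);

const find = (filter) => contacts.filter((c) => matches(c, filter));

const userId = "u1";
const allOfUser = find({ contactOf: userId });                    // all contacts of a user
const johnOfUser = find({ contactOf: userId, name: "John Doe" }); // "John Doe" of that user
console.log(allOfUser.length, johnOfUser.map((c) => c._id)); // 2 [ 'c1' ]

// Deleting by _id leaves every other contact untouched.
const afterDelete = contacts.filter((c) => !matches(c, { _id: johnOfUser[0]._id }));
```

Note that the other user's "John Doe" (c3) never shows up, because every query is scoped by contactOf.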
Notes
Indices
With your data model, you would have needed to add additional indices for the uid fields, which serve no purpose, as we found out. Furthermore, _id is indexed by default, so we make good use of this index. An additional index should be created on the contacts collection, however:
db.contacts.createIndex({"contactOf":1,"name":1})
Normalization
Not done here at all. The reasons for this are manifold, but the most important is that while John Doe might only have the mobile number of "Mallory H Ousefriend", his wife Jane Doe might also have the email address "janes_naughty_boy#censored.com", which at least Mallory surely would not want to pop up in John's contact list. So even if two users' contacts refer to the same person, you most likely would not want to reflect that in the data model.
Conclusion
With a little bit of data remodeling, we reduced the number of additional indices we need to one, made the queries much simpler and circumvented the BSON document size limit. As for performance, I guess we are talking about at least one order of magnitude.
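In the tutorial you mentioned, they pass two parameters to the method, one for the filter and one for the projection, but you passed just one; that's the difference. You can change your query to look like this (note that an $elemMatch projection returns only the first matching array element, not all of them):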
Contact.contactModel.findOne(
{uid: self.uid},
{contacts: {
$elemMatch: {
uid: {
$exists: true,
$ne: undefined,
}
}
}}
)
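One caveat worth checking against your data: per the MongoDB documentation, the $elemMatch projection operator returns only the first array element that matches. The plain JavaScript below, using hypothetical data shaped like the question's example (one contact without a uid), contrasts "first match" with "all matches"; only the latter is what the question asks for.

```javascript
// Hypothetical document shaped like the one in the question.
const doc = {
  uid: "owner-1",
  contacts: [
    { uid: "buzz", n: 1 },
    { n: 2 },              // no uid field
    { uid: "dave", n: 3 },
  ],
};

// Mirrors {uid: {$exists: true}}: the field only has to be present.
const hasUid = (c) => Object.prototype.hasOwnProperty.call(c, "uid");

// What an $elemMatch projection gives you: the first matching element only.
const firstOnly = [doc.contacts.find(hasUid)];
// What the question asks for: every element that has a uid.
const allMatching = doc.contacts.filter(hasUid);

console.log(firstOnly.length, allMatching.length); // 1 2
```

To return every matching element from the server, the filtering has to happen in an aggregation stage such as $filter.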
The agg framework makes filtering for existence of a field a little tricky. I believe the OP wants all docs where a field exists in an array of subdocs and then to return ONLY those subdocs where the field exists. The following should do the trick:
var inputtedUID = 0; // any value, but its type must match the stored uid (a number in the output below)
var c = db.foo.aggregate(
[
// This $match finds the docs with our input UID:
{$match: {"uid": inputtedUID }}
// ... and the $addFields/$filter will strip out those entries in contacts where contacts.uid does NOT exist. We wish we could use {cond: {"$$zz.uid": {$exists:true} }} but
// we cannot use $exists here so we need the convoluted $ifNull treatment. Note we
// overwrite the original contacts with the filtered contacts:
,{$addFields: {contacts: {$filter: {
input: "$contacts",
as: "zz",
cond: {$ne: [ {$ifNull:["$$zz.uid",null]}, null]}
}}
}}
,{$limit:1} // just get 1 like findOne()
]);
c.forEach(printjson);
{
"_id" : 0,
"uid" : 0,
"contacts" : [
{
"uid" : "buzz",
"n" : 1
},
{
"uid" : "dave",
"n" : 2
}
]
}
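The $ifNull trick is not quite the same as $exists: $ifNull substitutes its replacement when the value is null or the field is missing, so this $filter also drops contacts whose uid is explicitly null, whereas {uid: {$exists: true}} in a query would keep them. A plain JavaScript model of the two conditions, on hypothetical data, makes the difference visible:

```javascript
// Hypothetical contacts array, including an explicit null uid.
const contacts = [
  { uid: "buzz", n: 1 },
  { n: 2 },             // uid missing
  { uid: null, n: 3 },  // uid present but null
  { uid: "dave", n: 4 },
];

// Mirrors {$ne: [{$ifNull: ["$$zz.uid", null]}, null]}:
// missing and null both collapse to null and are filtered out.
const keptByIfNull = contacts.filter((zz) => (zz.uid ?? null) !== null);

// Mirrors {uid: {$exists: true}}: the field only has to be present.
const keptByExists = contacts.filter((zz) => "uid" in zz);

console.log(keptByIfNull.map((c) => c.n)); // [ 1, 4 ]
console.log(keptByExists.map((c) => c.n)); // [ 1, 3, 4 ]
```

If your data never stores a null uid, the two behave identically and the pipeline above does exactly what the question asks.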
I'm learning DynamoDB, and for that I installed the local server, which comes with a shell at http://localhost:8000/shell.
I created the following table:
var serverUpTimeTableName = 'bingodrive_server_uptime';
var eventUpTimeColumn = 'up_time';
var params = {
TableName: serverUpTimeTableName,
KeySchema: [ // The type of schema. Must start with a HASH type, with an optional second RANGE.
{ // Required HASH type attribute
AttributeName: eventUpTimeColumn,
KeyType: 'HASH',
},
],
AttributeDefinitions: [ // The names and types of all primary and index key attributes only
{
AttributeName: eventUpTimeColumn,
AttributeType: 'N', // (S | N | B) for string, number, binary
},
],
ProvisionedThroughput: { // required provisioned throughput for the table
ReadCapacityUnits: 2,
WriteCapacityUnits: 2,
}
};
dynamodb.createTable(params, callback);
So I created a table with only one hash key, called up_time; that's actually the only attribute in the table.
Now I want to fetch the last 10 inserted up times.
So far I have written the following code:
var serverUpTimeTableName = 'bingodrive_server_uptime';
var eventUpTimeColumn = 'up_time';
var params = {
TableName: serverUpTimeTableName,
KeyConditionExpression: eventUpTimeColumn + ' != :value',
ExpressionAttributeValues: {
':value':0
},
Limit: 10,
ScanIndexForward: false
}
docClient.query(params, function(err, data) {
if (err) ppJson(err); // an error occurred
else ppJson(data); // successful response
});
OK, so a few things to notice:
I don't really need a KeyCondition. I just want the last 10 items, so I used Limit: 10 for the limit and ScanIndexForward: false for reverse order.
!= (NE) is not supported in key condition expressions for hash keys, and it seems that I must use some kind of index in the query. I'm confused about that.
So any information regarding the issue would be greatly appreciated.
Some modern terminology: Hash is now called Partition, Range is now called Sort.
Thank you Amazon.
You need to understand that Query-ing is an action on hash keys. In order to initiate a query, you must supply a hash-key value. Since your table's primary key is only a hash key (and not hash + range), a Query can return at most the single item whose key you supply, so you can't use it to fetch "the last 10". You can only Scan the table in order to find items. A Scan doesn't require any knowledge about the items in the table, but it returns them in no particular order.
Moving on: when you say "last 10 items", you actually do want a condition, because you are ordering by the up_time attribute. You haven't defined any index on it, so the engine can't provide you with those 10 results. If up_time were a range key element, you could get the top 10 ordered elements by querying backwards (ScanIndexForward: false), but again, that is not your schema.
In your current table, what exactly are you trying to do? You currently have only one attribute, which is also the hash key, so 10 items would look like this (no order, no duplicates):
12312
53453
34234
123
534534
3101
11
You could move those values to a range key and add a global hash-key "stub" just to initiate the query you're making, but that goes against DynamoDB's guidelines: you would have a hot partition, and it won't give the best performance. Not sure this bothers you at the moment, but it is worth mentioning.
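A sketch of that stub-key design, in the params style used in the question (AWS SDK for JavaScript v2: dynamodb.createTable and docClient.query). Since the key schema of an existing table can't be changed, this would be a new table. The partition attribute "pk" and its constant value "uptime" are hypothetical, and simulateQuery is only a local stand-in that mimics what the Query returns from a single partition; it does not call DynamoDB.

```javascript
const tableName = "bingodrive_server_uptime";

// Hypothetical schema: "pk" holds the same value on every item (the stub),
// up_time becomes the RANGE (sort) key.
const createParams = {
  TableName: tableName,
  KeySchema: [
    { AttributeName: "pk", KeyType: "HASH" },
    { AttributeName: "up_time", KeyType: "RANGE" },
  ],
  AttributeDefinitions: [
    { AttributeName: "pk", AttributeType: "S" },
    { AttributeName: "up_time", AttributeType: "N" },
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 2, WriteCapacityUnits: 2 },
};

// Equality on the partition key is mandatory; the sort key supplies the
// order. ScanIndexForward: false returns the highest up_time first.
const queryParams = {
  TableName: tableName,
  KeyConditionExpression: "#pk = :pk",
  ExpressionAttributeNames: { "#pk": "pk" },
  ExpressionAttributeValues: { ":pk": "uptime" },
  ScanIndexForward: false,
  Limit: 10,
};

// dynamodb.createTable(createParams, callback);
// docClient.query(queryParams, callback);

// Local model only: filter one partition, sort by up_time, apply Limit.
function simulateQuery(items, params) {
  const pk = params.ExpressionAttributeValues[":pk"];
  const dir = params.ScanIndexForward === false ? -1 : 1;
  return items
    .filter((item) => item.pk === pk)
    .sort((a, b) => dir * (a.up_time - b.up_time))
    .slice(0, params.Limit);
}
```

Every item lands in the same partition, which is exactly the hot-partition trade-off mentioned above.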
I'm using YDN-DB (an abstraction on top of IndexedDB) as a local database. I have an object store called 'conversations', and in that store there's an index called 'participants' on a string containing the IDs of the different users in the conversation. For example:
Example Conversation #1:
id: 1234343434353456,
participants: '171e66ca-207f-4ba9-8197-d1dac32499db,82be80e2-2831-4f7d-a8d7-9223a2d4d511'
Example Conversation #2:
id: 4321343434356543,
participants: 'd7fa26b3-4ecc-4f84-9271-e15843fcc83f,171e66ca-207f-4ba9-8197-d1dac32499db'
To try to perform a partial match on an index, I tried using ydn-db-fulltext as a solution. The full text catalog looks like this:
{
name: 'participants',
lang: 'en',
sources: [
{
storeName: 'conversations',
keyPath: 'participants',
weight: 1
}
]
}
I see that the catalog is generated, but there seems to be a problem doing exact matches. For example, if I query using only part of the key in the participants index, I get back a primary key from the catalog:
db.search('participants', 'd7fa26b3').done(function(results) {
if(results.length == 0) console.debug('No results found...');
console.debug(results); // there is 1 object here!
var primaryKey = results[0].primaryKey; // primaryKey exists!
});
However, when using any value past the '-', the search request returns 0 results:
db.search('participants', 'd7fa26b3-4ecc-4f84-9271-e15843fcc83f').done(function(results) {
if(results.length == 0) console.debug('No results found...');
console.debug(results); // there are 0 objects in the array
var primaryKey = results[0].primaryKey; // throws a TypeError: results[0] is undefined since there are 0 results!
});
This makes sense given the documentation, since '-' and '*' are reserved characters that subtract a term from the result and match a prefix, respectively:
Query format is free text, in which implicit and/or/near logic operator apply for each token. Use double quote for exact match, - to subtract from the result and * for prefix search.
I tried putting double quotes inside the single quotes, using only double quotes, and also escaping all of the '-' characters with a backslash, but none of these seem to work.
So the question is how does one perform a match in an index where the string contains '-' characters?
Have you tried db.search('participants', '"d7fa26b3"')?
BTW, you are using full-text search for something it is not supposed to do. You have to tokenize your string and index the tokens manually.
If you store the participants field of your object as an array, then you can set the multiEntry flag when calling createIndex on the participants field, and that will probably do what you want.
The number of items in the participants property of the object is mutable. When you update an object in the store and it has a different number of items in the participants property, the index is automatically updated as a result (just like any other index). If you add an item to the property and then store the object again (put or cursor.update), the index updates.
It helps to review the basics of how a multi-entry index works. You can do this with vanilla js, without a framework, and certainly without full-text searching.
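A plain JavaScript model of what a multiEntry index stores, assuming participants is saved as an array (here by splitting the comma-separated string). In the browser, the real index would be created in onupgradeneeded with store.createIndex('participants', 'participants', { multiEntry: true }) and queried with store.index('participants').getAll(id); buildMultiEntryIndex below is a hypothetical helper that only mimics that behavior.

```javascript
// Conversations as in the question, with participants turned into arrays.
const conversations = [
  { id: 1234343434353456,
    participants: "171e66ca-207f-4ba9-8197-d1dac32499db,82be80e2-2831-4f7d-a8d7-9223a2d4d511" },
  { id: 4321343434356543,
    participants: "d7fa26b3-4ecc-4f84-9271-e15843fcc83f,171e66ca-207f-4ba9-8197-d1dac32499db" },
].map((c) => ({ ...c, participants: c.participants.split(",") }));

// Model of a multiEntry index: one entry per array element, each pointing
// back at the record's primary key. Duplicates in one array count once.
function buildMultiEntryIndex(records, keyPath) {
  const index = new Map();
  for (const record of records) {
    for (const key of new Set(record[keyPath])) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(record.id);
    }
  }
  return index;
}

const participantsIndex = buildMultiEntryIndex(conversations, "participants");

// Exact-match lookup on a full UUID, hyphens included: no full-text parsing.
const hits = participantsIndex.get("171e66ca-207f-4ba9-8197-d1dac32499db") ?? [];
console.log(hits); // [ 1234343434353456, 4321343434356543 ]
```

Because each UUID is its own index key, the '-' characters never go through a query parser, which is what tripped up the full-text search.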
I want MongoDB to hold an incrementing number for me so that I can fetch that number and then generate a string from it.
x = 1e10
x.toString(36).substring(2,7)
>>'dqpds'
I have a way to increment the number every time I call it from mongo
db.counter.update({ _id: 1 }, { $inc: { seq: 1 } }, {upsert: true},
function(err, val){
//...
})
But I want the number to start at something like 1e10 so that I get a 5-character string, and I would rather not make more than one call to the database.
How do I set a default value for the upsert in MongoDB? Or do you have a more efficient way of generating a unique 5-6 character string?
If you only need a unique id which is not necessarily sequential, you can use the first part of ObjectId.
The ObjectId documentation describes it as follows:
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte timestamp,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
So you can do like this:
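x = ObjectId().str.substring(0,4)
This approach doesn't involve any database I/O, so the performance would be better. Be careful with uniqueness, though: the first 4 hex characters are only the top 2 bytes of the timestamp, which stay the same for about 18 hours (2^16 seconds), so every ObjectId created in that window yields the same string. If you want to be more sure about uniqueness, append hex characters from the end of the ObjectId (the counter), e.g. its last 2 bytes, which makes the string 8 characters long.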
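The byte layout quoted above can be used to check how much of an ObjectId prefix actually changes. The hexadecimal string starts with the 4-byte timestamp (8 hex characters), so its first 4 characters are only the timestamp's high 2 bytes. The plain JavaScript below builds just that timestamp prefix for two hypothetical creation times one hour apart; timestampHex is an illustrative helper, not a MongoDB function.

```javascript
// Hex-encode a Unix timestamp (seconds) the way an ObjectId's first
// 4 bytes appear in its hex string: 8 characters, zero-padded.
const timestampHex = (seconds) => seconds.toString(16).padStart(8, "0");

const t1 = 0x5f000000; // hypothetical creation time
const t2 = t1 + 3600;  // one hour later

const prefix1 = timestampHex(t1).substring(0, 4);
const prefix2 = timestampHex(t2).substring(0, 4);

console.log(timestampHex(t1), timestampHex(t2)); // 5f000000 5f000e10
console.log(prefix1 === prefix2);                // true: the same 4-character "id"

// The high 2 bytes only change every 2^16 seconds, in hours:
console.log((2 ** 16 / 3600).toFixed(1)); // 18.2
```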
There is a way to do this in MongoDB.
You use the findAndModify command and it's described in detail in exactly the context you are looking for here:
http://docs.mongodb.org/manual/tutorial/create-an-auto-incrementing-field/
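On the question's starting-value problem: one option, not taken from the tutorial, is to let the counter start from nothing (a plain $inc with upsert creates it) and add the 1e10 offset in application code when building the string, so no seeding call is needed. idFromSeq is a hypothetical helper; note that the 5-character strings repeat every 36^5 = 60,466,176 increments.

```javascript
// Hypothetical helper: turn a counter value returned by MongoDB into a
// 5-character base-36 string, applying the 1e10 offset client-side
// instead of seeding the counter document.
const OFFSET = 1e10;

function idFromSeq(seq) {
  if (!Number.isSafeInteger(seq) || seq < 0) {
    throw new RangeError("seq must be a non-negative safe integer");
  }
  // OFFSET + seq has 7 base-36 digits while seq stays below ~6.8e10, so
  // keeping the last 5 digits matches the question's substring(2, 7).
  // Those 5 digits repeat every 36 ** 5 increments.
  return (OFFSET + seq).toString(36).slice(-5);
}

console.log(idFromSeq(0)); // dqpds, same as the question's example
console.log(idFromSeq(1)); // dqpdt
```

Consecutive counter values still give visibly consecutive strings; if that matters, the strings would need an extra, reversible scrambling step.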