Searching in embedded comments in mongodb - javascript

I'd like to make a simple "chat" where each post has answers (only one level deep). I decided to embed the answers in the post, so a single document would look like this:
{
_id: ObjectId(...),
posted: date,
author: "name",
content: "content",
comments: [
{ posted: date,
author: "name2",
content: '...'
}, ... ]
}
My question is how should I search in the content this way? I'd first need to look for a match in the "parent" content, then the contents in the comments list. How should I do that?

If you can search for a regex within each content, you could use:
{$or : [
{'content':{$regex:'your search regex'}},
{'comments' : { $elemMatch: { 'content':{$regex:'your search regex'}}}}]}
Please note that on a match in either the parent or a child you will receive the entire mongo document, containing both the parent and the children.
If you want to avoid this (to be sure what you've found), you can first run a regex query on the parent only, and then one on the children only, instead of the single $or query.
For more details on $elemMatch take a look at: docs.mongodb.org/manual/reference/operator/query/elemMatch

As was stated in the comments earlier, the basic query to "find" is just a simple matter of using $or here, which also short-circuits on the first condition that returns true. Only one field of the array elements is being tested here, so there is no need for $elemMatch; just use "dot notation", since matching multiple fields within the same element is not required:
db.messages.find({
"$or": [
{ "content": { "$regex": ".*Makefile.*" } },
{ "comments.content": { "$regex": ".*Makefile.*" } }
]
})
This does actually match the documents that would meet those conditions, and this is what .find() does. However what you seem to be looking for is something a little "funkier" where you want to "discern" between a "parent" result and a "child" result.
That is a little out of the scope for .find() and such manipulation is actually the domain of other operations with MongoDB. Unfortunately as you are looking for "part of a string" to match as your condition, doing a "logical" equivalent of a $regex operation does not exist in something such as the aggregation framework. It would be the best option if it did, but there is no such comparison operator for this, and a logical comparison is what you want. The same would apply to "text" based searches, as there is still a need to discern the parent from the child.
Not the most ideal approach since it does involve JavaScript processing, but the next best option here is mapReduce().
db.messages.mapReduce(
function() {
// Check parent
if ( this.content.match(re) != null )
emit(
{ "_id": this._id, "type": "P", "index": 0 },
{
"posted": this.posted,
"author": this.author,
"content": this.content
}
);
var parent = this._id;
// Check children
this.comments.forEach(function(comment,index) {
if ( comment.content.match(re) != null )
emit(
{ "_id": parent, "type": "C", "index": index },
{
"posted": comment.posted,
"author": comment.author,
"content": comment.content
}
);
});
},
function() {}, // no reduce as all are unique
{
"query": {
"$or": [
{ "content": { "$regex": ".*Makefile.*" } },
{ "comments.content": { "$regex": ".*Makefile.*" } }
]
},
"scope": { "re": /.*Makefile.*/ },
"out": { "inline": 1 }
}
)
The "query" is basically the same as before, since it selects the "documents" you want. Using "scope" here just makes it a little easier to pass in the regular expression as an argument without rewriting the JavaScript code to include that value each time.
The logic there is simple enough: each "de-normalized" element is tested to see whether the regular expression matches that particular element. The results are returned "de-normalized" and discern between whether the matched element was a parent or a child.
You could take that further and not bother to check the children if the parent was a match, just by moving that check into an else. In the same way you could even just return the "first" child match by some means or another if that was your desire.
Anyhow this should set you on the path to whatever your final code looks like. This is the basic approach, and the only way you are going to get this distinction processed on the server; client-side post-processing would follow much the same pattern.
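For the client-side variant, a minimal sketch could look like the following. It assumes the documents have already been fetched with the $or query and have the shape shown in the question; the function name classifyMatches and the sample data are made up for illustration.

```javascript
// Hypothetical helper: splits already-fetched documents into parent ("P")
// and child ("C") matches, mirroring the keys emitted by the mapReduce.
function classifyMatches(docs, re) {
  var results = [];
  docs.forEach(function (doc) {
    // Check parent
    if (doc.content.match(re) != null) {
      results.push({ _id: doc._id, type: "P", index: 0, content: doc.content });
    }
    // Check children (a document may have no comments yet)
    (doc.comments || []).forEach(function (comment, index) {
      if (comment.content.match(re) != null) {
        results.push({ _id: doc._id, type: "C", index: index, content: comment.content });
      }
    });
  });
  return results;
}

// In-memory data standing in for the query results (assumed values)
var sample = [
  { _id: 1, content: "How do I write a Makefile?", comments: [
    { content: "Read the manual" },
    { content: "Your Makefile needs tabs" }
  ] }
];
var found = classifyMatches(sample, /Makefile/);
```

Each entry in found carries the parent _id plus a type and index, so the caller can tell which part of the document matched.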

Related

Deleting an object from a nested array in DynamoDB - AWS JavaScript SDK

I'm building an app where I need to delete items stored in the database. Here's a (shortened) example of user data I have in my DynamoDB table called 'registeredUsers':
{
"userId": "f3a0f858-57b4-4420-81fa-1f0acdec979d",
"aboutMe": "My name is Mary, and I just love jigsaw puzzles! My favourite jigsaw category is Architecture, but I also like ones with plants in them.",
"age": 27,
"email": "mary_smith#gmail.com",
"favourites": {
"imageLibrary": [
{
"id": "71ff8060-fcf2-4523-98e5-f48127d7d88b",
"name": "bird.jpg",
"rating": 5,
"url": "https://s3.eu-west-2.amazonaws.com/jigsaw-image-library/image-library/images/bird.jpg"
},
{
"id": "fea4fd2a-851b-411f-8dc2-1ae0e144188a",
"name": "porsche.jpg",
"rating": 3,
"url": "https://s3.eu-west-2.amazonaws.com/jigsaw-image-library/image-library/images/porsche.jpg"
},
{
"id": "328b913f-b364-47df-929d-925676156e97",
"name": "rose.jpg",
"rating": 0,
"url": "https://s3.eu-west-2.amazonaws.com/jigsaw-image-library/image-library/images/rose.jpg"
}
]
}
}
I want to be able to delete the item 'rose.jpg' in the user.favourites.imageLibrary array. In order to select the correct user, I can provide the userId as the primary key. Then, in order to select the correct image in the array, I can pass the AWS.DocumentClient the 'id' of the item in order to delete it. However, I'm having trouble understanding the AWS API Reference docs. The examples given in the developer guide do not describe how to delete an item by looking at one of its attributes. I know I have to provide an UpdateExpression and an ExpressionAttributeValues object. When I wanted to change a user setting, I found it pretty easy to do:
const params = {
TableName: REGISTERED_USERS_TABLE,
Key: { userId },
UpdateExpression: "set userPreferences.difficulty.showGridOverlay = :d",
ExpressionAttributeValues: {
":d": !showGridOverlay
},
ReturnValues: "UPDATED_NEW"
};
To conclude, I need a suitable Key, UpdateExpression and ExpressionAttributeValues object to access the rose.jpg item in the favourites array.
Unfortunately, the UpdateExpression syntax is not as powerful as you would have liked. It supports entire nested documents inside the item, but not sophisticated expressions to search in them or to modify them. The only ability it gives you inside a list is to access or modify its Nth element. For example:
REMOVE #favorites.#imagelibrary[3]
This will remove the element at index 3 of imagelibrary, which is the fourth element because list indexes start at 0 (note that "#favorites" and "#imagelibrary" will need to be defined in ExpressionAttributeNames). You can also have a condition on #favorites.#imagelibrary[3].#id, for example, in ConditionExpression. But unfortunately, there is no way to specify more complex combinations of conditions and updates, such as "find me the i where #favorites.#imagelibrary[i].#id is equal to something, and then REMOVE this specific element".
Your remaining option is to read the full value of the item (or, with ProjectionExpression, just the #favorites.#imagelibrary array), then in your own code find which of the elements you want to remove (e.g., discover that it is the element at index 2), and then in a separate update, remove that element.
Note that if there's a possibility that some other parallel operation also changes the item, you must use a conditional update (both UpdateExpression and ConditionExpression) for the element removal, to ensure the element that you are removing still has the id you expected. If the condition fails, you need to repeat the whole operation again - read the modified item again, find the element again, and try to remove it again. This is an example of the so-called "optimistic locking" technique which is often used with DynamoDB.
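The read, find, and conditional-remove steps could be sketched roughly as follows. This only builds the parameters; it assumes the list has already been read from the item, and the helper name buildRemoveImageParams, the shortened ids and the placeholder names (#fav, #lib) are made up for illustration.

```javascript
// Hypothetical helper: given the list read from the item, build the
// parameters for a conditional REMOVE of the element with the given id.
// Returns null if no element has that id.
function buildRemoveImageParams(tableName, userId, imageLibrary, imageId) {
  var index = imageLibrary.findIndex(function (img) { return img.id === imageId; });
  if (index === -1) return null;
  var path = "#fav.#lib[" + index + "]";
  return {
    TableName: tableName,
    Key: { userId: userId },
    UpdateExpression: "REMOVE " + path,
    // Fails if the element at this index no longer has the expected id
    ConditionExpression: path + ".#id = :id",
    ExpressionAttributeNames: { "#fav": "favourites", "#lib": "imageLibrary", "#id": "id" },
    ExpressionAttributeValues: { ":id": imageId }
  };
}

// Shortened, made-up ids standing in for the real ones
var library = [
  { id: "a", name: "bird.jpg" },
  { id: "b", name: "porsche.jpg" },
  { id: "c", name: "rose.jpg" }
];
var params = buildRemoveImageParams("registeredUsers", "user-1", library, "c");
```

The resulting params would then be passed to the DocumentClient's update call; if it fails with a ConditionalCheckFailedException, read the item again, rebuild the parameters and retry.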

Index on array keypath doesn't find any values

I want to get familiar with indexedDB to build my Firefox WebExtension.
My sample data is structured like this:
const sampleDataRaw = [
{
"ent_seq" : 1413190,
"att1" : [ {
"sub11" : "content1",
"sub12" : [ "word" ]
}, {
"sub11" : "content2"
} ],
"att2" : [ {
"sub21" : "other content",
"sub22" : [ "term" ]
} ]
}, {
"ent_seq" : 1000010,
"att2" : [ {
"sub21" : "more content"
}, {
"sub22" : "more words"
} ]
}
] // end sampleDataRaw
I got as far as opening/creating my database, adding this sample data and querying it by the ent_seq key using objectStore.get() and objectStore.openCursor().
The problem arises when I want to search the sub11 or sub21 fields using indexes I should have created for these like this:
objectStore.createIndex("sub11Elements", "att1.sub11", { unique: false });
objectStore.createIndex("sub21Elements", "att2.sub21", { unique: false });
When I want to search, say, fields sub11 as here:
var index = objectStore.index("sub11Elements");
index.get("content1").onsuccess = function(event) {
// I should have the first object of my data now, alas the result is undefined instead
};
It certainly does succeed, but the returned value is undefined since the get() didn't actually find anything.
I want to know why it doesn't find the entry and how to make it find it. I figured it might be because the keypath is wrong, but as stated, if I instead search by the key (ent_seq) I can successfully get the result.att1[i].sub11 values.
On mozilla's websites it's stated that keys can be of type string and array (or array within array etc) amongst others, and keypath parts are supposed to be concatenated using dots.
From searching on stackexchange I've so far found that it's not possible to have variable keys inside the keypath, but that shouldn't be the case here anyway.
Therefore, I really don't see what might be causing the search to not find the object inside the database.
1. It looks like the second level of objects are arrays, not properties of the first level of objects. The . accessor accesses sub-properties, not indices of an array.
2. IDBObjectStore.prototype.get always yields success when there is no error; success is not indicative of whether a match was found.
A bit more on point 1. Look at "att1":[{"sub11" : "content1","sub12" : [ "word" ]}.... Pretend this was an actual basic JavaScript object. Would you be able to use att1.sub11? No, because the value of att1 is an array, not an object.
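One possible workaround, which is not part of the answer above: since a key path cannot step into array elements, copy the nested values into a top-level array property before storing, and index that property with multiEntry: true so that each array element becomes its own index key. The property name sub11Values and both helper names are made up for this sketch.

```javascript
// Hypothetical helper: copies every att1[i].sub11 value into a top-level
// array so that a multiEntry index can see them.
function withSub11Values(entry) {
  var values = (entry.att1 || [])
    .map(function (item) { return item.sub11; })
    .filter(function (value) { return value !== undefined; });
  return Object.assign({}, entry, { sub11Values: values });
}

// Meant to be called from onupgradeneeded with the object store.
function createSub11Index(objectStore) {
  objectStore.createIndex("sub11Elements", "sub11Values", {
    unique: false,
    multiEntry: true
  });
}

// Prepare an entry shaped like the sample data before adding it
var prepared = withSub11Values({
  ent_seq: 1413190,
  att1: [ { sub11: "content1", sub12: ["word"] }, { sub11: "content2" } ]
});
```

With entries stored in this prepared form, index.get("content1") on the sub11Elements index should be able to find the first object.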

Find a document that contains a specific value in an array but not if it's the last element

My current approach is:
var v = 'Value';
Collection.find({arrayToLookIn: v}).forEach(function(obj) {
if (obj.arrayToLookIn.indexOf(v) !== obj.arrayToLookIn.length - 1) {
// do stuff
}
});
I was wondering if there's a way to specify such a rule in the find() call and do this without the inner check?
I've looked through https://docs.mongodb.org/manual/tutorial/query-documents/#match-an-array-element but didn't spot what I seek.
First question, please be gentle :)
What you can do now
You want $where, which can use JavaScript evaluation to match the document. So here you ask the evaluating code to test each array element, but not the last one:
Collection.find({
"arrayToLookIn": v,
"$where": function() {
var array = this.arrayToLookIn;
array.pop(); // remove last element
return array.some(function(el) { return el == 'Value' });
}
})
Note that as it is JavaScript sent to the server, the "Value" needs to be specified in that code rather than using a variable. You can optionally construct the JavaScript code as a "string" to join in that variable as a literal and submit that as the argument to $where.
Note that I'm leaving in the basic equality match, as $where cannot match using an index like that can. Its job is therefore to "filter" out the results where the match is on the last element, and not to test every single document to find whether the value is even there at all.
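Building the $where code as a string could look something like this. It is only a sketch: the helper name buildNotLastWhere is made up, and JSON.stringify is used here as one way to turn the variable into a safely quoted literal.

```javascript
// Hypothetical helper: returns the $where function as a string, with the
// field name and search value embedded as JSON literals.
function buildNotLastWhere(field, value) {
  return "function() {" +
    " var array = this[" + JSON.stringify(field) + "].slice(0, -1);" + // drop last element
    " return array.indexOf(" + JSON.stringify(value) + ") !== -1;" +
    " }";
}

var v = 'Value';
var query = {
  arrayToLookIn: v,                              // index-friendly equality match
  $where: buildNotLastWhere('arrayToLookIn', v)  // filters out last-element matches
};
```

The query object can then be passed to Collection.find() in place of the inline function version.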
Better Future Way
For the curious, as of the present MongoDB 3.0 release series there is not a really efficient way to do this with the aggregation framework, so the JavaScript evaluation is the better option.
You would presently need to do something silly like find the last element in a $group after $unwind and then $match out the value after another $unwind. It's not efficient and prone to error where the value exists more than once.
Future releases will have a $slice operator which could be used like this with $redact:
Collection.aggregate([
// Still wise to do this as mentioned earlier
{ "$match": { "arrayToLookIn": v } },
// Then only return if the match was not in the last element
{ "$redact": {
"$cond": {
"if": {
"$setIsSubset": [
[v],
{ "$slice": [
"$arrayToLookIn",
0,
{ "$subtract": [ { "$size": "$arrayToLookIn" }, 1 ] }
]}
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
Here $setIsSubset does the comparison against the array which has had its last entry removed via $slice, by only returning elements from 0 to the $size minus 1.
And that should be more efficient than $where as it uses native coded operations for the comparison, when the next release that has that $slice for aggregation becomes available.
Not to mention $unwind also has an option to include the index position in future releases as well. But it's still not a great option even with that addition.

Elasticsearch multi_match query

I have a search text field in a web GUI for an Elasticsearch index which has two different types of fields that need to be searched on; fulltext (description) and an exact match (id).
Question 1 - How should I add the second exact match query for the id field? When I search for IDs, the exact ID is within the result "set," but it should be the only result.
The description search seems to be working correctly, just not the ID search.
"multi_match": {
"fields": ["id", "description"],
"query": query,
"description": {
"fuzziness": 1,
"operator": "and"
}
}
I think that you are looking for something like this. Try it.
{
"query": {
"bool": {
"must": [ {
"match": {
"description": {
"fuzziness": 1,
"query": "yourfuzzinessdescription"
}
}
},
{
"term" : {
"id" : 1
}
}
]
}
}
}
Dani's query structure is probably what you are looking for, but perhaps you also need an alternative to the fuzziness aspect of the query. Or maybe not: can you please provide an example of a user input for the description field and what you expect it to match?
Looking at the Match Query documentation and Elasticsearch Common Options - fuzziness, that fuzziness is based on Levenshtein distance. So that query allows an edit distance of 1, which tolerates minor misspellings and such. If you keep the and operator from the original query, then all terms in the query must be matched. Given a document with a description like "lucene based search server", you will not be able to retrieve it with a description query like "search server based on Lucene", because "on" does not occur in the document. Using an analyzer with the stop filter and a stemming filter, in combination with a match phrase query with a slop, might work. But again, it depends on what you are trying to do.

Find documents with array that doesn't contains a specific value

I have the following model:
var PersonSchema = new Schema({
name: String,
groups: [
{type: Schema.Types.ObjectId, ref: 'Group'}
],
});
I am looking for a query that retrieves all the Persons that are not part of a certain Group (i.e the persons' group array doesn't contain the id of the specified group).
I was thinking about something like this, but I'm not sure it is correct:
Person.find({groups: {$nin: [group._id]}})
There is nothing wrong with what you are attempting, but it rests on a common misconception: that you need operators like $nin or $in when querying an array.
All you really need to do here is a basic inequality match with $ne:
Person.find({ "groups": { "$ne": group._id } })
The "array" operators are not for "array targets" but for providing a "list" of conditions to test in a convenient form.
Person.find({ "groups": { "$nin": [oneId, twoId,threeId] } })
So just use normal operators for single conditions, and save $in and $nin for where you want to test more than one condition against either a single value or a list. So it's just the other way around.
If you instead want to reject only the documents whose array contains every value in a provided "list", you reverse the logic with the $not operator and the $all operator:
Person.find({ "groups": { "$not": { "$all": [oneId,twoId,threeId] } } })
That matches any document whose array is missing at least one of the listed values. To require that none of the listed values are present, use $nin as shown above.
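To make the difference between these operators concrete, here is a plain JavaScript emulation of how each one treats an array field. It is an illustration only, not MongoDB or Mongoose code; the function names are made up, and plain strings stand in for ObjectId values.

```javascript
// { "$ne": value }: the array must not contain the value
function matchesNe(array, value) {
  return array.indexOf(value) === -1;
}
// { "$nin": list }: the array must contain none of the listed values
function matchesNin(array, list) {
  return list.every(function (v) { return array.indexOf(v) === -1; });
}
// { "$not": { "$all": list } }: the array must be missing at least one
function matchesNotAll(array, list) {
  return !list.every(function (v) { return array.indexOf(v) !== -1; });
}

var groups = ["a", "b"];
var checks = {
  ne: matchesNe(groups, "c"),                // "c" is absent from groups
  nin: matchesNin(groups, ["a", "c"]),       // "a" is present in groups
  notAll: matchesNotAll(groups, ["a", "c"])  // "c" is missing from groups
};
```

Here the $nin-style check rejects the document because one listed value is present, while the $not with $all check still accepts it because not every listed value is present.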
This is a better way to do this in Mongoose v5.11:
Person.find({ occupation: /host/ }).where('groups').nin(['group1', 'group2']);
The code becomes clearer and more readable.
