Let's say I've got CouchDB documents in the following format:
...
{
    player: 'abcde',
    action: 'run'
},
{
    player: 'abcde',
    action: 'jump'
},
{
    player: 'abcde',
    action: 'left'
},
{
    player: 'abcde',
    action: 'right'
},
....
My view looks like this:
function(doc) {
    emit(doc.player, doc.action);
}
How can I count how many times player abcde has an action jump immediately followed by action run? I don't want the total number of jumps and runs. Is it even possible to access previous or next documents from the current one inside a map or reduce function?
Thank you!
No, you can't access other documents in a map/reduce like that. For starters, I don't think reduce order is explicitly defined, so 'previous' and 'next' aren't really meaningful, I'm afraid.
Instead, I'd suggest you collapse the whole history of each player's actions into a single document like:
{
    "player": "abcde",
    "actions": [ "right", "run", "jump" ]
}
You can then count specific sets of ordered actions from the array in your map method alone, and trivially aggregate them as desired in your reduce.
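That pair counting can be sketched in plain JavaScript; `countPairs` is a made-up helper name, and inside an actual map function you would compute the same count and `emit(doc.player, count)`, then aggregate with the built-in `_sum` reduce:

```javascript
// Count how many times `first` is immediately followed by `second`
// in an ordered actions array.
function countPairs(actions, first, second) {
    var count = 0;
    for (var i = 0; i < actions.length - 1; i++) {
        if (actions[i] === first && actions[i + 1] === second) {
            count++;
        }
    }
    return count;
}

var doc = {
    player: "abcde",
    actions: ["jump", "run", "left", "jump", "run", "right"]
};

console.log(countPairs(doc.actions, "jump", "run")); // 2
```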
Related
I'm loading products via an infinite scroll in chunks of 12 at a time.
At times, I may want to sort these by how many followers they have.
Below is how I'm tracking how many followers each product has.
Follows are in a separate collection because of the 16MB document size cap, and the number of follows should be unlimited.
follow schema:
var FollowSchema = new mongoose.Schema({
    user: {
        type: mongoose.Schema.ObjectId,
        ref: 'User'
    },
    product: {
        type: mongoose.Schema.ObjectId,
        ref: 'Product'
    },
    timestamp: {
        type: Date,
        default: Date.now
    }
});
Product that is followed schema:
var ProductSchema = new mongoose.Schema({
    name: {
        type: String,
        unique: true,
        required: true
    },
    followers: {
        type: Number,
        default: 0
    }
});
Whenever a user follows / unfollows a product, I run this function:
ProductSchema.statics.updateFollowers = function (productId, val) {
    return Product
        .findOneAndUpdateAsync({
            _id: productId
        }, {
            $inc: {
                'followers': val
            }
        }, {
            upsert: true,
            'new': true
        })
        .then(function (updatedProduct) {
            return updatedProduct;
        })
        .catch(function (err) {
            console.log('Product follower update err : ', err);
        })
};
My questions about this:
1: Is there a chance that the incremented "followers" value within a product could hit some sort of error, resulting in mismatched / inconsistent data?
2: would it be better to write an aggregate to count followers for each Product, or would that be too expensive / slow?
Eventually, I'll probably rewrite this in a graphDB, as it seems better suited, but for now -- this is an exercise in mastering MongoDB.
1. If you increment after inserting or decrement after removing, there is a chance of ending up with inconsistent data: for example, the insertion might succeed while the increment fails.
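One hedged way to narrow that window is to increment only after the insert succeeds, and to compensate if the increment fails. A sketch in plain JavaScript, where `insertFollow`, `incrementFollowers` and `removeFollow` are hypothetical stand-ins for the real mongoose calls:

```javascript
// Hypothetical async DB operations; replace with real mongoose calls.
function insertFollow(follow) { return Promise.resolve(follow); }
function incrementFollowers(productId, val) { return Promise.resolve(val); }
function removeFollow(follow) { return Promise.resolve(); }

function follow(productId, userId) {
    var doc = { product: productId, user: userId };
    return insertFollow(doc)
        .then(function () {
            // Only bump the counter once the follow document exists.
            return incrementFollowers(productId, 1);
        })
        .catch(function (err) {
            // Compensate: undo the insert so the counter and the
            // follows collection stay consistent, then re-throw.
            return removeFollow(doc).then(function () { throw err; });
        });
}
```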
2. Intuitively, aggregation is much more expensive than find in this case. I ran a benchmark to prove it.
First, randomly generate 1000 users, 1000 products and 10000 followers. Then use this code to benchmark:
import timeit

from pymongo import MongoClient

db = MongoClient('mongodb://127.0.0.1/test', tz_aware=True).get_default_database()


def foo():
    result = list(db.products.find().sort('followers', -1).limit(12).skip(12))


def bar():
    result = list(db.follows.aggregate([
        {'$group': {'_id': '$product', 'followers': {'$sum': 1}}},
        {'$sort': {'followers': -1}},
        {'$skip': 12},
        {'$limit': 12}
    ]))


if __name__ == '__main__':
    t = timeit.timeit('foo()', 'from __main__ import foo', number=100)
    print('time: %f' % t)
    t = timeit.timeit('bar()', 'from __main__ import bar', number=100)
    print('time: %f' % t)
output:
time: 1.230138
time: 3.620147
Creating an index can speed up the find query:
db.products.createIndex({followers: 1})
time: 0.174761
time: 3.604628
And if you need attributes from the product, such as its name, you need another O(n) query.
I guess that when the data scales up, aggregation will be much slower. If needed, I can benchmark on larger-scale data.
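That extra query is just an `_id` lookup followed by a merge in application code. A sketch of the merge step in plain JavaScript, with made-up sample data standing in for the aggregation result and the `find` result:

```javascript
// `counts` is what the aggregation returns, already sorted;
// `products` is the result of db.products.find({_id: {$in: ids}}).
var counts = [
    { _id: "p2", followers: 5 },
    { _id: "p1", followers: 3 }
];
var products = [
    { _id: "p1", name: "Widget" },
    { _id: "p2", name: "Gadget" }
];

// Build an _id -> product map, then attach names to the sorted counts.
var byId = {};
products.forEach(function (p) { byId[p._id] = p; });

var page = counts.map(function (c) {
    return { _id: c._id, name: byId[c._id].name, followers: c.followers };
});

console.log(page[0]); // { _id: 'p2', name: 'Gadget', followers: 5 }
```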
For number 1, if the only operations on that field are incrementing and decrementing, I think you'd be okay. If you start replicating that data or using it in joins for some reason, you'd run the risk of inconsistent data.
For number 2, I'd recommend you run both scenarios in the mongo shell to test them out. You can also review the individual explain plans for both queries to get an idea of which one would perform better. I'm just guessing, but it seems like the update route would perform well.
Also, the amount of expected data makes a difference. It might initially perform well one way, but after a million records the other route might be the way to go. If you have a test environment, that'd be a good thing to check.
1) This relies on the application layer to enforce consistency, and as such there is going to be a chance that you end up with inconsistencies. The questions I would ask are: how important is consistency in this case, and how likely is it that there will be a large inconsistency? My thought is that being off by one follower isn't as important as making your infinite scroll load as fast as possible to improve the user's experience.
2) Probably worth looking at the performance, but if I had to guess, I would say this approach is going to be too slow.
I am currently creating an app where the administrator should be able to tag images. The visitors can search for a tag, and see the images that have that tag.
One image can have more than one tag. This is how my data is currently structured:
Images: {
    PUSH_ID: {
        url: "https://cdn.filestackcontent.com/o0ze07FlQjabT9nuteaE",
        tags: {
            PUSH_ID: {
                tag: "1324"
            },
            PUSH_ID: {
                tag: "4321"
            }
        }
    }
}
When a visitor searches for a tag, I need to be able to query the tag, and find the URL of the images that have the given tag. I was thinking that something along the lines of this would work:
ref.orderByChild('tags/tag').equalTo(tag).once("value"...)
But after some reading I have come to the understanding that you can only query one level deep in Firebase.
If this is the case, I need to restructure my data, but I cannot figure out how it should be structured.
Can anyone help me?
Btw, I have been told that I should use something like Elasticsearch for querying Firebase, but this app is for an event with a limited amount of traffic.
Thanks!
When modeling data in Firebase (and in most NoSQL databases), you need to model the data for how your app wants to use it.
For example:
Images: {
    PUSH_ID_1: {
        url: "https://cdn.filestackcontent.com/o0ze07FlQjabT9nuteaE",
        tags: {
            "tag1324": true,
            "tag4321": true
        }
    },
    PUSH_ID_2: {
        url: "https://cdn.filestackcontent.com/shjkds7e1ydhiu",
        tags: {
            "tag1324": true,
            "tag5678": true
        }
    }
},
TagsToImages: {
    "tag1324": {
        PUSH_ID_1: true,
        PUSH_ID_2: true
    },
    "tag4321": {
        PUSH_ID_1: true
    },
    "tag5678": {
        PUSH_ID_2: true
    }
}
I changed a few things from your model:
I still store the tags for each image, so that you can show them when a user is watching a single image
But now we store the tags in a "tag1324": true format, which prevents an image from being tagged with the same tag multiple times. Whenever you feel the need for an array that you want to do a contains() operation on, consider this approach, which is more akin to a set.
I prefix the tag numbers with a static string, which prevents Firebase from trying to interpret the tag numbers as array indices.
We now also store a map of tags-to-image-ids. In this map you can easily look up all image IDs for a specific tag and then load the images in a loop.
We've essentially duplicated some data (the tags are stored twice) to ensure that we can look the information up in two ways. If you have more ways you want to access the data, you may need to replicate even more. This is normal in NoSQL databases and is part of the reason they scale so well.
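Keeping the duplicated copies in sync is usually done with a single multi-location update. A hedged sketch of building such an update object: the path layout matches the structure above, but the `buildTagUpdate` helper name is made up:

```javascript
// Build a Firebase multi-location update that tags an image in both
// places at once: under the image and under TagsToImages.
function buildTagUpdate(imageId, tag) {
    var update = {};
    update["Images/" + imageId + "/tags/" + tag] = true;
    update["TagsToImages/" + tag + "/" + imageId] = true;
    return update;
}

var update = buildTagUpdate("PUSH_ID_1", "tag1324");
console.log(update);
// { 'Images/PUSH_ID_1/tags/tag1324': true,
//   'TagsToImages/tag1324/PUSH_ID_1': true }
```

With the Firebase SDK you would then pass this object to a single `ref.update(update)` call, so both writes succeed or fail together.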
I highly recommend reading this article on NoSQL data modeling.
And there is a Tags node in this structure, like this, isn't there?
Tags: {
    "tag1324": {
        tagName: "pets"
    },
    "tag4321": {
        tagName: "daffodil"
    },
    "tag5678": {
        tagName: "tasmanian wolf"
    }
}
I'm using $pull to pull a subdocument within an array of a document.
Don't know if it matters, but in my case the subdocuments contain an _id, so they are indexed.
Here are JSONs that describes the schemas:
user: {
    _id: String,
    items: [UserItem]
}

UserItem: {
    _id: String,
    score: Number
}
Now my problem is this: I am using $pull to remove certain UserItem's from a User.
var delta = { $pull: {} };
delta.$pull.items = {};
delta.$pull.items._id = { $in: ["mongoID1", "mongoID2" ...] };

User.findOneAndUpdate(query, delta, { multi: true }, function(err, data) {
    // whatever...
});
What I get in data here is the User object after the change, when what I wish to get is the items that were removed from the array (satellite data).
Can this be done with one call to MongoDB, or do I have to make 2 calls: 1 find and 1 $pull?
Thanks for the help.
You really cannot do this, or at least there is nothing that is going to return the "actual" elements that were "pulled" from the array in any response, even with the newer WriteResponse objects available to the newer Bulk Operations API ( which is kind of the way forward ).
The only way you can really do this is by "knowing" the elements you are "intending" to "pull", and then comparing that to the "original" state of the document before it was modified. The basic MongoDB .findAndModify() method allows this, as do the mongoose wrappers of .findByIdAndUpdate() as well as .findOneAndUpdate().
Basic usage premise:
var removing = [ "MongoId1", "MongoId2" ];
Model.findOneAndUpdate(
    query,
    { "$pull": { "items": { "_id": { "$in": removing } } } },
    { "new": false },
    function(err, doc) {
        var removed = doc.items.filter(function(item) {
            return removing.indexOf(item._id) != -1;
        });

        if ( removed.length > 0 )
            console.log(removed);
    }
);
Or something along those lines. You basically "turn around" the default mongoose .findOneAndUpdate() ( same for the other methods ) behavior and ask for the "original" document before it was modified. If the elements you asked to "pull" were present in the array then you report them, or inspect / return true or whatever.
So the mongoose methods differ from the reference implementation by returning the "new" document by default. Turn this off, and then you can "compare".
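To make the comparison concrete, here is the same filter run against a sample "original" document in plain JavaScript (the sample data is invented for illustration):

```javascript
// The document as findOneAndUpdate returns it with { new: false },
// i.e. in its state before the $pull was applied.
var original = {
    _id: "user1",
    items: [
        { _id: "mongoID1", score: 10 },
        { _id: "mongoID2", score: 7 },
        { _id: "mongoID3", score: 4 }
    ]
};

var removing = ["mongoID1", "mongoID2"];

// Compare each subdocument's _id against the list we asked to pull.
var removed = original.items.filter(function (item) {
    return removing.indexOf(item._id) !== -1;
});

console.log(removed.length); // 2
```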
Further notes: "multi" is not a valid option here. The method modifies "one" document by definition. Also, you state that the array sub-documents contain an _id. This is added by "mongoose" by default. But those _id values in the array are "not indexed" unless you specifically define an index on that field. The only default index is the "primary document" _id.
I have a list of portfolio items that the user can click on, and when the user clicks on one, I want to query for the children of that item. I have the initiatives separate from the rollups, which are separate from the features, and when an item is clicked, I am getting the correct data.
However, when I try to query for its children, I run into problems. Take the following example.
If a rollup was clicked on, I have tried these queries:
Rally.data.ModelFactory.getModel({
    type: 'PortfolioItem/Feature',
    success: function(model) {
        Ext.create('Rally.data.WsapiDataStore', {
            model : model,
            limit : Infinity,
            fetch : true,
            // filters : [{
            //     property : 'Parent',
            //     operator : '=',
            //     value    : rollup
            // }]
        }).load({
            callback : function(store) {
                console.log('got features');
                console.log('first feature', store.getItems()[0]);
            }
        });
    }
});
When I run the query with the filters commented out as shown, I get all of the features. However, when I add in the filters, I get nothing back! I have tried setting the variable rollup to the rollup's ObjectID and name, and still nothing. When I console.log one of the features, I can see:
Parent: Object
    _rallyAPIMajor: "2"
    _rallyAPIMinor: "0"
    _ref: "/portfolioitem/rollup/xxxxxxxx"
    _refObjectName: "xxxxxxxxxxxxxxxxxxxxxxxx"
    _type: "PortfolioItem/Rollup"
and that is it. Furthermore, I know there are portfolio items that meet the requirements I am trying to express in the filters. How can I filter by parent when querying for portfolio items?
Well, I thought I had tried all possible combinations, but it appears that, even though it does not appear as a field when printed, the correct filter is:
filters : [{
    property : 'Parent.ObjectID',
    operator : '=',
    value    : rollup
}]
I am using localStorage to save conversations client-side to save space server-side in my db. In order to do it I use an object like this:
users = {
    '478vh9k52k': {
        name: 'john',
        messages: []
    },
    '42r66s58rs': {
        name: 'jack',
        messages: []
    }
};
I then use users[id].messages.push(msgObj) to push new messages onto the right user ID. Lastly, I use JSON.stringify and save the resulting string.
The problem with this is that the string will slowly grow, eventually hitting the storage limit. The length of the messages array is not much of a problem, because I truncate it; the real issue is old users that are no longer needed.
The question is simple: how can I delete the old users contained in the 'users' object? I was thinking of adding a timestamp as a key inside each user object and then checking a few random users at every save.
Or is there a better way to do this?
Why access them randomly? You could slice up your storage into days instead, with a structure like:
localStorage["chatLogs"] = {
    "13........0" : {
        bob: {},
        alice: {},
        christoffelson: {}
    },
    "13....86400" : {
        bob: {},
        alice: {},
        christoffelson: {}
    }
}
and then run through your object by its keys. If they're older than a day, delete them. Alternatively, you can have a look at using IndexedDB instead of localStorage if you're going to do a lot of querying/filtering.