This question is about the best way to handle data.
Let's assume we have key -> value data like this:
"user#gmail.com": { "name": "John",
"age": 20,
"job": "developer",
"favourite_food": ['taco', 'steak']
//...etc
}
//...etc
There is a lot of user data keyed by email, about a million entries.
Until now I only had to search users by their email.
But today my boss came up to me and said he wants to search users by their names and, of course, keep the ability to search by email. The other day he said he wants my program to support searching by age, and so on.
My first thought was to iterate over the data with, for example, this PHP code:
// $users is the whole email => data map shown above
foreach ($users as $email => $data) {
    foreach ($data as $k => $v) {
        if ($v == 'search value') {
            return $email;
        }
    }
}
But this solution does not scale to a large amount of data.
My second idea was to iterate over the data once and create a separate lookup table for each searchable field, so that it looks like this:
$a = "user#gmail.com": {//all data}
$b = "John" : {//all data including email}
$c = "developer":{//all other data}
// and so on
But my users get older over time, so I would have to update the user's age in these tables every time the data in my main object changes.
So, my question is: what is the best way to implement such logic in any programming language?
Some notes:
It has to be done in a programming language, without touching MySQL or any other DB.
I think using the year of birth of users instead of age might be better in this situation.
If you are using a database, you can use an index.
If not, you can create an index yourself.
A simple index strategy is:
Do not change the original data, but add index dicts whose keys are the indexed values and whose values are lists of emails.
For example, in Python you can add two indices, name and yearofbirth:
name = {"John": ["xx#xx.com", "cc#cc.com", "aa#c.com"],
        "Mike": ["aa#aa.com", ...],
        # ...etc
        }
yearofbirth = {"1981": ["xx#xx.com", "cc#cc.com"],
               # ...etc
               }
This way, you can search by name or yearofbirth to get the emails and then fetch the original data by email.
And it is fast, because each lookup is a dictionary access rather than a scan over all users.
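The index strategy described here could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the sample records and the helper names build_index and search are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical sample data in the shape used by the question (email -> record).
users = {
    "john#gmail.com": {"name": "John", "yearofbirth": 1981, "job": "developer"},
    "mike#gmail.com": {"name": "Mike", "yearofbirth": 1990, "job": "designer"},
    "john2#gmail.com": {"name": "John", "yearofbirth": 1990, "job": "tester"},
}


def build_index(users, field):
    """Map each value of `field` to the list of emails that have it."""
    index = defaultdict(list)
    for email, record in users.items():
        if field in record:
            index[record[field]].append(email)
    return index


def search(users, index, value):
    """Return the full records whose indexed field equals `value`."""
    return [users[email] for email in index.get(value, [])]


name_index = build_index(users, "name")
year_index = build_index(users, "yearofbirth")

johns = search(users, name_index, "John")
born_1990 = search(users, year_index, 1990)
print([u["job"] for u in johns])      # prints ['developer', 'tester']
print([u["job"] for u in born_1990])  # prints ['designer', 'tester']
```

Whenever a record is added, changed or removed, the affected index entries have to be updated together with the main dict. Indexing the year of birth instead of the age avoids having to rewrite the index every year.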
I am trying to check whether a field exists in a sub-document of an array and, if it does, return only those sub-documents in the callback. But every time I log the callback document, it gives me all values in my array instead of only the ones that match the query.
I am following this tutorial
The only difference is that I am using the findOne function instead of find, but it still gives me back all values. I tried using find and it does the same thing.
I am also using the same collection style as the example in the link above.
Example
In the image above you can see I have a document with a uid field and a contacts array. What I am trying to do is first select a document based on the inputted uid. After selecting that document, I want to display the values from the contacts array where the contacts.uid field exists. So from the image above, the only values that would be displayed are contacts[0] and contacts[3], because contacts[1] doesn't have a uid field.
Contact.contactModel.findOne({$and: [
    {uid: self.uid},
    {contacts: {
        $elemMatch: {
            uid: {
                $exists: true,
                $ne: undefined
            }
        }
    }}
]})
Your problems come from a misconception about data modeling in MongoDB, which is not uncommon for developers coming from other DBMSs. Let me illustrate this with the example of how data modeling works with an RDBMS vs MongoDB (and a lot of the other NoSQL databases as well).
With an RDBMS, you identify your entities and their properties. Next, you identify the relations, normalize the data model and bang your head against the wall for a while to get the UPPER LEFT ABOVE AND BEYOND JOIN™ that will answer the questions arising from use case A. Then, you pretty much do the same for use case B.
With MongoDB, you would turn this upside down. Looking at your use cases, you would try to find out what information you need to answer the questions arising from the use case and then model your data so that those questions can get answered in the most efficient way.
Let us stick with your example of a contacts database. A few assumptions to be made here:
Each user can have an arbitrary number of contacts.
Each contact and each user need to be uniquely identified by something other than a name, because names can change and whatnot.
Redundancy is not a bad thing.
With the first assumption, embedding contacts into a user document is out of the question, since there is a document size limit. Regarding our second assumption: the uid field becomes not redundant, but simply useless, as there already is the _id field uniquely identifying the data set in question.
The use cases
Let us look at some use cases. They are simplified for the sake of the example, but they will give you the picture.
Given a user, I want to find a single contact.
Given a user, I want to find all of his contacts.
Given a user, I want to find the details of his contact "John Doe"
Given a contact, I want to edit it.
Given a contact, I want to delete it.
The data models
User
{
"_id": new ObjectId(),
"name": new String(),
"whatever": {}
}
Contact
{
"_id": new ObjectId(),
"contactOf": ObjectId(),
"name": new String(),
"phone": new String()
}
Obviously, contactOf refers to an ObjectId which must exist in the User collection.
The implementations
Given a user, I want to find a single contact.
If I have the user object, I have its _id, and the query for a single contact becomes as easy as
db.contacts.findOne({"contactOf":self._id})
Given a user, I want to find all of his contacts.
Equally easy:
db.contacts.find({"contactOf":self._id})
Given a user, I want to find the details of his contact "John Doe"
db.contacts.find({"contactOf":self._id,"name":"John Doe"})
Now we have the contact one way or the other, including his/her/undecided/choose not to say _id, we can easily edit/delete it:
Given a contact, I want to edit it.
db.contacts.update({"_id":contact._id},{$set:{"name":"John F Doe"}})
I trust that by now you get an idea on how to delete John from the contacts of our user.
Notes
Indices
With your data model, you would have needed to add additional indices for the uid fields - which serves no purpose, as we found out. Furthermore, _id is indexed by default, so we make good use of this index. An additional index should be done on the contact collection, however:
db.contacts.createIndex({"contactOf":1,"name":1})
Normalization
Not done here at all. The reasons for this are manifold, but the most important is that while John Doe might only have the mobile number of "Mallory H Ousefriend", his wife Jane Doe might also have the email address "janes_naughty_boy#censored.com" - which at least Mallory surely would not want to pop up in John's contact list. So even if we could establish the identity of a contact across users, you most likely would not want to reflect that.
Conclusion
With a little bit of data remodeling, we reduced the number of additional indices we need to 1, made the queries much simpler and circumvented the BSON document size limit. As for the performance, I guess we are talking of at least one order of magnitude.
In the tutorial you mentioned above, they pass two parameters to the method, one for the filter and one for the projection, but you passed just one; that's the difference. You can change your query to look like this:
Contact.contactModel.findOne(
{uid: self.uid},
{contacts: {
$elemMatch: {
uid: {
$exists: true,
$ne: undefined,
}
}
}}
)
The aggregation framework makes filtering for the existence of a field a little tricky. I believe the OP wants all docs where a field exists in an array of subdocs and then to return ONLY those subdocs where the field exists. The following should do the trick:
var inputtedUID = 0; // must match the type of the stored uid (a number in the sample output)
var c = db.foo.aggregate([
    // This $match finds the docs with our input UID:
    {$match: {"uid": inputtedUID}}
    // ... and the $addFields/$filter will strip out those entries in contacts
    // where contacts.uid does NOT exist. We wish we could use
    // {cond: {"$$zz.uid": {$exists: true}}} but we cannot use $exists here, so
    // we need the convoluted $ifNull treatment. Note we overwrite the original
    // contacts with the filtered contacts:
    ,{$addFields: {contacts: {$filter: {
        input: "$contacts",
        as: "zz",
        cond: {$ne: [ {$ifNull: ["$$zz.uid", null]}, null ]}
    }}}}
    ,{$limit: 1} // just get 1 like findOne()
]);
show(c);
{
"_id" : 0,
"uid" : 0,
"contacts" : [
{
"uid" : "buzz",
"n" : 1
},
{
"uid" : "dave",
"n" : 2
}
]
}
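To make the effect of the $filter / $ifNull condition easier to follow, here is a rough plain-Python analogue operating on an in-memory document. It is only a sketch: the sample document and the function name keep_contacts_with_uid are assumptions for illustration. It mirrors the pipeline in treating a missing uid and a null uid the same way.

```python
# Hypothetical document shaped like the one in the question.
doc = {
    "_id": 0,
    "uid": 0,
    "contacts": [
        {"uid": "buzz", "n": 1},
        {"n": 7},               # no uid field at all
        {"uid": None, "n": 8},  # uid present but null
        {"uid": "dave", "n": 2},
    ],
}


def keep_contacts_with_uid(doc):
    """Return a copy of doc whose contacts all have a non-null uid."""
    # dict.get returns None for a missing key, which plays the role of
    # $ifNull mapping a missing field to null before the $ne comparison.
    filtered = [c for c in doc["contacts"] if c.get("uid") is not None]
    return {**doc, "contacts": filtered}


result = keep_contacts_with_uid(doc)
print([c["uid"] for c in result["contacts"]])  # prints ['buzz', 'dave']
```

The original document is left untouched; only the returned copy carries the filtered contacts array, just as the pipeline only changes the query result and not the stored document.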
The JSON structure I'm getting is similar to the one in this question:
var data = {
type: "binary",
choices: [
{
choice: "No",
answers: 18
},
{
choice: "Yes",
answers: 11
}
],
tags: {
2851: "road",
8685: "had",
10978: "heard"
}
};
I found how to insert a JSON record using a MySQL query with Node.js in an answer to the same question.
Yes, I can update this record by fetching the whole record, updating it, and replacing the old record with the updated one.
But my question is: "Can I update this JSON record in the database without downloading the full JSON, changing only one value based on one key from the JSON?"
Note: this is just for demo purposes, assuming my field name is jsonData:
"Update tableName set jsonData.choices.answers = 20 where data.choices.choice = 18 and id = 1";
I hope you understand my scenario: updating one value of a JSON record based on one key.
Any conceptual suggestion will be appreciated, and demonstration code would be very helpful.
JSON-decode the data, which converts it to a PHP object (or to an array if you pass true as the second argument), access its elements, and append them to the query:
$a = json_decode($data);
"Update tableName set".$a->col1." =20......."
PHP's json_decode function takes a JSON string and converts it into a
PHP variable. Typically, the JSON data will represent a JavaScript
array or object literal which json_decode will convert into a PHP
array or object.
http://www.dyn-web.com/tutorials/php-js/json/decode.php
I know I'm necromancing a long dead question, but sure, this is very simple with MySQL's JSON functions.
UPDATE tableName
SET jsonData = JSON_SET(jsonData, '$.choices.answers', 20)
WHERE jsonData->'$.choices.choice' = 18
  AND id = 1;
Read my Best Practices for using MySQL as JSON storage.
I've run into a bit of an issue with some data that I'm storing in my MongoDB (Note: I'm using mongoose as an ODM). I have two schemas:
mongoose.model('Buyer',{
credit: Number,
})
and
mongoose.model('Item',{
bid: Number,
location: { type: [Number], index: '2d' }
})
Buyer/Item will have a parent/child association, with a one-to-many relationship. I know that I can set up Items to be embedded subdocs to the Buyer document or I can create two separate documents with object id references to each other.
The problem I am facing is that I need to query Items whose bid is lower than the Buyer's credit and whose location is near a certain geo coordinate.
To satisfy the first criterion, it seems I should embed Items as subdocs so that I can compare the two numbers. But in order to compare locations with a geoNear query, it seems it would be better to separate the documents; otherwise, I can't perform geoNear on each subdocument.
Is there any way that I can perform both tasks on this data? If so, how should I structure my data? If not, is there a way that I can perform one query and then a second query on the result from the first query?
Thanks for your help!
There is another option (besides embedding and normalizing) for storing hierarchies in MongoDB, namely storing them as tree structures. In this case you would store Buyers and Items in separate documents but in the same collection. Each Item document would need a field pointing to its Buyer (parent) document, and each Buyer document's parent field would be set to null. The docs I linked to explain several implementations you could choose from.
If your items are stored in two separate collections, then the best option is to write your own function and call it using mongoose.connection.db.eval('some code...');. That way you can execute your advanced logic on the server side.
You can write something like this:
var allNearItems = db.Items.find(
    { location: {
        $near: {
            $geometry: {
                type: "Point" ,
                coordinates: [ <longitude> , <latitude> ]
            },
            $maxDistance: 100
        }
    }
});
var res = [];
allNearItems.forEach(function(item){
    // buyerId is assumed to hold the _id of the item's Buyer
    var buyer = db.Buyers.findOne({ _id: item.buyerId });
    // "continue" is not allowed inside a forEach callback; return skips this item
    if (!buyer) return;
    if (item.bid < buyer.credit) {
        res.push(item._id);
    }
});
return res;
After evaluation (place it in a mongoose.connection.db.eval("...") call) you will get the array of item ids.
Use it with caution. If the allNearItems result is too large, or you run this query very often, you can run into performance problems. The MongoDB team has actually deprecated direct JS code execution, but it is still available in the current stable release.
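The same two-step idea (first the geo query, then the credit comparison) can also run in application code. The sketch below uses plain Python lists as stand-ins for the two query results; the field names buyerId, bid and credit follow the snippet in this answer, while the sample values and the function name affordable_item_ids are assumptions for illustration. Fetching all relevant buyers once and looking them up in a dict avoids one query per item.

```python
# Hypothetical in-memory stand-ins for the two collections.
buyers = [
    {"_id": "b1", "credit": 100},
    {"_id": "b2", "credit": 10},
]
# Assume this list is what the geo query on Items returned.
near_items = [
    {"_id": "i1", "buyerId": "b1", "bid": 50},
    {"_id": "i2", "buyerId": "b2", "bid": 50},
    {"_id": "i3", "buyerId": "missing", "bid": 1},
]


def affordable_item_ids(near_items, buyers):
    """Return the ids of items whose bid is below their buyer's credit."""
    # Build the lookup once so each item costs one dict access, not one query.
    credit_by_buyer = {b["_id"]: b["credit"] for b in buyers}
    result = []
    for item in near_items:
        credit = credit_by_buyer.get(item["buyerId"])
        if credit is None:
            continue  # items without a known buyer are skipped
        if item["bid"] < credit:
            result.append(item["_id"])
    return result


print(affordable_item_ids(near_items, buyers))  # prints ['i1']
```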
Let me start by explaining the situation:
I have a MySQL table that contains several columns, among them a user id, a race id, a lap time and a lap number, and I want to put this information into an array in PHP, which I will then send to a JavaScript script.
My JavaScript array should end up looking like this:
first row:
[laptime1 user1, laptime2 user1, laptime3 user1,...]
second row:
[laptime1 user2, laptime2 user2, laptime3 user2,...]
Here's my current situation:
I first tried to test this for a single user and ran into lots of problems, because my lap times in MySQL are floats and json_encode turned everything into strings, which did not work for my JavaScript as it started outputting the wrong values.
For example:
The first value was "8" instead of "84,521", then the second value was "4", etc.
Sadly, I found a potential solution with the numeric check option, but cannot use it as my hosting runs a PHP version that doesn't support it.
So I found the following solution, which I fiddled with a bit and that works for a single user (it might look messy to you, I'm really a beginner and punching above my weight, but it works) :
$query = doquery("SELECT racelaptime,userid FROM {{table}} WHERE raceid='1' ORDER BY racelap", "liverace");
while(($result = mysql_fetch_array($query))) {
$data[] = (float)$result['racelaptime'];
}
$script = $script . "\nvar myArray = new Array(";
foreach ($data as $key => $value){
if ($key < (count($data)-1)){
$script = $script . $value . ',';
}
else {
$script = $script . $value . ");\n";
}
}
This outputs an array in JavaScript that looks like this :
myArray=[84.521,83.800,81.900]
Which is great, as this is exactly what my JavaScript requires as input (time in seconds, separated by commas for each lap).
Now I would like to implement the multiple user element but I'm stumped as to how I can work that out...
My MySQL query is still sorted by race lap, but I also need to group the data by user id, as I want all the laps of each user in one row. Also, the user id is unknown to me and can vary (it depends on which user posts the time), so I can't really do an "if userid==1 save this here and then go to the next one".
Should I use a foreach statement in the while loop that stores the data? And how can I tell it to store all the laps by the same user in the first row (and the next user in the second row, etc.) without using tons of SQL queries?
If you can offer a more elegant solution than my current one for passing the PHP array to JavaScript, I would be more than happy to make changes but otherwise a simple solution using the current "setup" would be great too (hope it's all clear enough).
Any help would be very much appreciated, thanks in advance !
For the multiple user element I would use a multidimensional array:
$query = doquery("SELECT racelaptime,userid FROM {{table}} WHERE raceid='1' ORDER BY racelap", "liverace");
// Start with an empty array so array_key_exists() has something to check
$data = array();
// Loop the DB result
while(($result = mysql_fetch_array($query))) {
    // Check if this ID is already in the data array
    if(!array_key_exists($result['userid'], $data)){
        // Create array for current user
        $data[$result['userid']] = array();
    }
    // Add the current race time to the array (do not need to use the float)
    $data[$result['userid']][] = $result['racelaptime'];
}
// Echo json data
echo json_encode($data);
Now, on the JavaScript side, you need to go through each user when handling this array:
$.each(data, function(key, value){
    alert(key + ' has the following values - ' + value);
    // If you want to display the individual lap times, loop over them
    $.each(value, function(k, v){
        // Use parseFloat to turn the string back into a decimal number
        alert(parseFloat(v));
    });
});
Is this what you needed or am I completely out of the scope?
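The grouping step itself is language-independent. As a rough sketch in Python (the sample rows, the user ids and the function name group_laps are made up for illustration), one pass over rows that are already ordered by lap number is enough to collect every user's laps in one list:

```python
import json
from collections import defaultdict

# Hypothetical rows as they might come back from the query, already ordered
# by lap number: (userid, racelaptime). Values arrive as strings here.
rows = [
    ("7", "84.521"), ("9", "90.100"),
    ("7", "83.800"), ("9", "89.250"),
    ("7", "81.900"),
]


def group_laps(rows):
    """Collect the lap times of each user in one list, keeping lap order."""
    laps = defaultdict(list)
    for userid, laptime in rows:
        laps[userid].append(float(laptime))  # cast so the JSON carries numbers
    return dict(laps)


grouped = group_laps(rows)
print(json.dumps(grouped))
# prints {"7": [84.521, 83.8, 81.9], "9": [90.1, 89.25]}
```

Casting to a number before encoding means the client receives numeric values and does not need to parse strings.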
I'm trying to make a MongoDB database for a simple voting system. If I were to draw a schema, then it looks something like this:
User {
name: String
, email: String
}
Vote {
message: String
, voters: [ObjectId(User._id)]
}
I have some questions about this design when there are a lot of voters for one vote:
Sending the whole voters array to the client's side (not to mention caching it in memory) is very expensive, right? Is there a way to get the Vote in a "shallow" way, so when I need vote.voters, it will make another database request to the array of voters?
If a voter has voted already, I want to check that and not count his vote. To do that, is there a query I can run in the array of embedded voters to quickly find this?
When showing votes, I'd want to show the number of votes without fetching the voters array to the client side. Is there some kind of count query I can run to count the voters length?
I would add a bit of redundancy to the schema to avoid some of the potential problems you mention. I assume you want to 1) quickly count the number of votes and 2) make sure a user cannot vote twice.
One way to achieve this is to keep both a list of users and a count of votes, and add a clause to the update query that makes sure that a vote is only added if the user's ID is not in the list of voters:
var query = {_id: xyz, voters: {$ne: user_id}}
var update = {$inc: {votes: 1}, $push: {voters: user_id}}
db.votes.update(query, update, true)
(the last parameter is upsert, a very nice feature of Mongo)
Then, if you want to show the number of votes you can limit the properties of the result to the votes property:
db.votes.find({_id: xyz}, {votes: true})
You can find a complete description of more or less exactly what you want to do here: http://cookbook.mongodb.org/patterns/votes/
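To illustrate the logic of that conditional update, here is a small in-memory analogue in Python. It is only a sketch: the function name cast_vote and the sample document are assumptions for the example. Unlike the single MongoDB update, which checks and modifies the document in one operation, this plain-Python version is not safe under concurrent access.

```python
def cast_vote(vote, user_id):
    """Record a vote once per user; return True if the vote was counted.

    The change is applied only when user_id is not already in the voters
    list, mirroring the {voters: {$ne: user_id}} condition in the query.
    """
    if user_id in vote["voters"]:
        return False
    vote["voters"].append(user_id)  # counterpart of $push
    vote["votes"] += 1              # counterpart of $inc
    return True


# Hypothetical vote document with the redundant counter field.
vote = {"_id": "xyz", "message": "Tabs or spaces?", "votes": 0, "voters": []}

print(cast_vote(vote, "alice"))  # prints True
print(cast_vote(vote, "alice"))  # prints False
print(vote["votes"])             # prints 1
```

Because the counter is stored next to the list, showing the number of votes never requires loading the voters array.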
1) You can specify only to return a subset of fields to the client (docs).
e.g. to find a specific message and only return "AnotherField".
db.Vote.find({ message : "search for this message" }, { AnotherField : 1 } );
2) You could use the $addToSet operator to add a voter into the voters array (docs) which (quote):
Adds value to the array only if its
not in the array already
e.g.
{ $addToSet : { voters: "Bob"} }
3) You could store the count as an extra field in the document and then just return that.
Hope this helps.