I'm relatively new to NoSQL, but I have been enjoying the journey very much! I am however finding the map-reduce way of life a bit tricky! I need some help with a problem!
I have a database with two types of documents, opening transactions and closing transactions. For replication and offline functionality reasons I cannot merge the data into one document. The opening transaction document looks something like :
{
_id: "transaction-open-randomgeneratedstring",
type: "transactions-open",
vehicle: "vehicle-id",
created: "date string"
}
The closing document looks something like:
{
_id: "transaction-close-randomgeneratedstring",
type: "transactions-close",
openid: "transaction-open-randomgeneratedstring",
created: "date string"
}
The randomgeneratedstring of a closing transaction matches the randomgeneratedstring of the corresponding opening transaction.
I need a map-reduce to give me the list of open transactions that do not have a corresponding closing transaction. This will basically give me a list of outstanding transactions.
This is the map-reduce I have thus far, but it is not doing the job.
{
"map": function(doc) {
if(doc.type == "transactions-open") {
emit([doc._id, 0], "OPEN");
}
if(doc.type == "transactions-close"){
emit([doc.openid, 1], "CLOSE");
}
},
"reduce": function(keys, values, rereduce) {
var unique_labels = {};
var open = {};
keys.forEach(function(label) {
if(!unique_labels[label[0]]) {
unique_labels[label[0]] = true;
} else {
open[label[0]] = true;
}
});
return open;
}
}
I am open for changes in the _id naming / structure, but I cannot combine the two documents into one.
Thanks!
EDIT
Based on response from Hod, I changed the reduce to look like:
function(keys, values, rereducer)
{
if(values.length == 1)
return true;
}
This is certainly a step in the right direction, but the unwanted transactions are still in the result set, the value is only null. Is there no way to get those out of the result set?
As described - what you would do with a Join in SQL you do with a reduce in CouchDB. Code something like this - not tested:
{
"map": function(doc) {
if(doc.type == "transactions-open") {
emit([doc._id], 1);
}
if(doc.type == "transactions-close"){
emit([doc.openid], -1);
}
},
"reduce": "_sum"
}
So we emit a 1 for an open transaction under an ID and a -1 for a close under the same ID. Now when you reduce you will get a result for each ID of:
-1 = Closed with no record of an open (error condition).
0 = Opened and Closed
1 = Open and not yet closed.
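To see how the 1 / -1 scheme behaves, here is a minimal in-memory sketch in plain JavaScript. It does not talk to CouchDB: the function name outstandingTransactions and the sample documents are made up for illustration, and the loop only mimics what a grouped _sum would return per key.

```javascript
// Hypothetical sample documents shaped like the ones in the question.
const docs = [
  { _id: "transaction-open-a", type: "transactions-open" },
  { _id: "transaction-open-b", type: "transactions-open" },
  { _id: "transaction-close-a", type: "transactions-close", openid: "transaction-open-a" }
];

// Mimics the map function plus a grouped _sum reduce (illustration only).
function outstandingTransactions(allDocs) {
  const sums = new Map();
  const emit = (key, value) => sums.set(key, (sums.get(key) || 0) + value);

  for (const doc of allDocs) {
    if (doc.type === "transactions-open") emit(doc._id, 1);
    if (doc.type === "transactions-close") emit(doc.openid, -1);
  }

  // Keep only the ids whose sum is 1: opened and not yet closed.
  return [...sums].filter(([, sum]) => sum === 1).map(([id]) => id);
}

console.log(outstandingTransactions(docs)); // [ 'transaction-open-b' ]
```

Note that in this sketch the final filter on a sum of 1 runs outside the view; the view itself would still return the rows whose sum is 0, so that last step is assumed to happen in the code that processes the query results.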
The problem is with the keys parameter in your reduce function. The reduce phase is not called once with all possible keys. It's called per distinct key, and based on the group_level you specify.
Looking at your code, if you haven't specified any group_level, your reduce function is going to get called for every document separately.
Because you're emitting the id of the open transaction doc for both open and close markers, if you grouped at the first level, you'd get open or open/close pairs. You're still only getting a reduction on a limited set of docs at a time.
You could fix this either in your logic calling the query, or by emitting a key that lets you reduce on the entire set at once. (I imagine there are other ways too. These are the ones that come to mind.)
If you use the key approach, you'd need to emit something that looked like ["transaction", doc._id, 0]. Then a first level grouping would give you the whole transaction set like your current code expects.
EDIT (Adding information based on edit of question.)
The reduce function is going to get called with whatever grouping you set up. It's always going to return something, even if it's just no results emitted (i.e. null).
If you don't want to handle that in the logic that's running the queries and processing the results, you need to use an approach that will allow you to group all the transaction documents together, instead of just the documents for a single transaction.
Based on what you've done so far, another approach would be to forgo the reduce phase and just look at the number of results returned by a query that's limited to the unique doc id.
Bit of a lengthy one so those of you who like a challenge (or I'm simply not knowledgeable enough - hopefully it's an easy solution!) read on!
(skip to the actual question part to skip the explanation and what I've tried)
Problem
I have a site that has a dataset that contains an object with multiple objects inside. Each of those objects contains an array, and within that array there are multiple objects. (Yes, this is painful, but it's from an API and I need to use this dataset without changing or modifying it.) I am trying to filter the dataset based on the key-value pairs in the final object. However, I have multiple filters being executed at once.
Example of Path before looping which retrieves the key-value pair needed for one hall.
["Hamilton Hall"]["Hire Options"][2].Commercial
After Looping Path of required key-value pair for all halls, not just one (the hall identifier is stored):
[0]["Hire Options"][2].Commercial
Looping allows me to check each hall for a specific key-value pair (kind of like map or forEach, but for an object).
After getting that out of the way back to the question.
How would I go about filtering which of the looped objects are displayed?
What I have Tried
(userInput is defined elsewhere - this happens on a btn click btw)
let results = Object.keys(halls);
for (key of results) {
let weekend = [halls[ `${key}` ][ 'Hire Options' ][4][ 'Weekend function' ]];
if(userInput == weekend) {
outputAll([halls[ `${key}` ]]);
}
}
That filters it fine. However, I run into an issue here. I want to filter by multiple queries, and naturally adding an AND into the if statement doesn't work. I also don't want to have 10 if statements (I have 10+ filters of various data types I need to sort by).
I have recently heard of ternary operators, but do not know enough about them to know if that is the correct thing to do. If so, how? I also had a brief look at switches, but they don't seem to be what I want (correct me if I am wrong).
Actual Question minus the babble/explanation
Is there a way for me to dynamically modify an if statement's conditions, such as adding or removing conditions? For example, if the filter for 'a' is set to off, remove the AND condition for 'a' from the if statement? This would mean that the results would only be filtered by the active filters.
Any help, comments or 'why haven't you tried this' remark are greatly appreciated!
Thanks!
Just for extra reference, here is the code for retrieving each of the objects from the first object as it loops through them:
(Looping Code)
halls = data[ 'Halls' ];
let results = Object.keys(halls);
for (key of results) {
let arr = [halls[ `${key}` ]];
outputAll(arr);
}
You can use Array.filter on the keys array - you can structure the logic for a match how you like - just make it return true if a match is found and the element needs to be displayed.
let results = Object.keys(halls);
results.filter(key => {
if (userInput == halls[key]['Hire Options'][4]['Weekend function']) {
return true;
}
// Add any other conditions you want to match here, each returning true on a match.
return false;
}).forEach(key => outputAll([halls[key]]));
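One way to make the set of conditions dynamic, sketched here under assumptions, is to keep each filter as a predicate function with an on/off flag and require that every active predicate passes. The halls data, the filter thresholds and the name matchingHalls are all invented for this example; only the shape of the data follows the question.

```javascript
// Hypothetical data shaped like the question's `halls` object.
const halls = {
  "Hamilton Hall": { "Hire Options": [{}, {}, { Commercial: 50 }, {}, { "Weekend function": 200 }] },
  "Smith Hall": { "Hire Options": [{}, {}, { Commercial: 80 }, {}, { "Weekend function": 200 }] }
};

// Each filter is a predicate; `active` says whether the user switched it on.
// The thresholds here are invented for the example.
const filters = [
  { active: true, test: hall => hall["Hire Options"][4]["Weekend function"] === 200 },
  { active: true, test: hall => hall["Hire Options"][2].Commercial <= 60 },
  { active: false, test: () => false } // switched off, so it is ignored
];

function matchingHalls(allHalls, allFilters) {
  const activeTests = allFilters.filter(f => f.active).map(f => f.test);
  // A hall is kept only if every active predicate accepts it (logical AND).
  return Object.keys(allHalls).filter(key => activeTests.every(test => test(allHalls[key])));
}

console.log(matchingHalls(halls, filters)); // [ 'Hamilton Hall' ]
```

Turning a filter off is then just a matter of setting its active flag to false; with no active filters, every() is true for each hall and nothing is filtered out.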
I will start off by saying while I am not new to CouchDB, I am new to querying the views using JavaScript and the web.
I have looked at multiple other questions on here, including CouchDB - Queries with params, couchDB queries, Couchdb query with AND operator, CouchDB Querying Dates, and Basic CouchDB Queries, just to list a few.
While all have good information in them, I haven't found one that has my particular problem in it.
I have a view set up like so:
function (docu) {
if(docu.status && docu.doc && docu.orgId.toString() && !docu.deleted){
switch(docu.status){
case "BASE":
emit(docu.name, docu);
break;
case "AIR":
emit(docu.eta, docu);
break;
case "CHECK":
emit(docu.checkTime, docu);
break;
}
}
}
with all documents having a status, doc, orgId, deleted, name, eta, and checkTime. (I changed doc to docu because of my custom doc key.)
I am trying to query and emit based on a set of keys, status, doc, orgId, where orgId is an integer.
My jQuery to do this looks like so:
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["status","doc",orgId],
success: function(data) {
console.log(data);
},
error: function(status) {
console.log(status);
}
});
I receive
{"total_rows":59,"offset":59,"rows":[
]}
Sometimes the offset is 0, sometimes it is 59. I feel I must be doing something wrong for this not to be working correctly.
So for my questions:
I did not mention this, but I had to set docu.orgId.toString() because I guess it parses the URL as a string, is there a way to use this number as a numeric value?
How do I correctly view multiple documents based on multiple keys, i.e. if(key1 && key2) emit(doc.name, doc)
Am I doing something obviously wrong that I lack the knowledge to notice?
Thank you all.
You're so very close. To answer your questions
1. When you're using docu.orgId.toString() in that if-statement you're basically saying: this value must be truthy. If you didn't convert to string, any number, other than 0, would be true. Since you are converting to a string, any value other than an empty string will be true. Also, since you do not use orgId as the first argument in an emit call, at least not in the example above, you cannot query by it at all.
2. I'll get to this.
3. A little.
The thing to remember is emit creates a key-value table (that's really all a view is) that you can use to query. Let's say we have the following documents
{type:'student', dept:'psych', name:'josh'},
{type:'student', dept:'compsci', name:'anish'},
{type:'professor', dept:'compsci', name:'kender'},
{type:'professor', dept:'psych', name:'josh'},
{type:'mascot', name:'owly'}
Now let's say that, for this one view, we want to 1) query everything but mascots and 2) query by type, dept, and name, which are all of the available fields in this example. We would write a map function like this:
function(doc) {
if (doc.type === 'mascot') { return; } // don't do anything
// allow for queries by type
emit(doc.type, null); // the use of null is explained below
// allow queries by dept
emit(doc.dept, null);
// allow for queries by name
emit(doc.name, null);
}
Then, we would query like this:
// look for all joshs
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["josh"],
// ...
});
// look for everyone in the psych department
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["psych"],
// ...
});
// look for everyone that's a professor and everyone named josh
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["professor", "josh"],
// ...
});
Notice the last query isn't an "and" in the sense of a logical conjunction; it's a union. If you wanted to restrict what was returned to documents that are both professors and named josh, there are a few options. The most basic would be to concatenate the key when you emit, like this:
emit('type-' + doc.type + '_name-' + doc.name, null);
You would then query like this: key : "type-professor_name-josh"
It doesn't feel very proper to rely on strings like this, at least it didn't to me when I first started doing it, but it is a quite common method for querying key-value stores. The characters - and _ have no special meaning in this example, I simply use them as delimiters.
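The difference between the union you get from several keys and the conjunction you get from a concatenated key can be sketched in memory. This is an illustration only: buildIndex and lookup are hypothetical helpers, document positions stand in for ids, and a real view would also sort its rows by key.

```javascript
const people = [
  { type: "student", dept: "psych", name: "josh" },
  { type: "student", dept: "compsci", name: "anish" },
  { type: "professor", dept: "compsci", name: "kender" },
  { type: "professor", dept: "psych", name: "josh" }
];

// Builds the key/value rows a map function would produce (in memory only).
function buildIndex(docs, mapFn) {
  const rows = [];
  docs.forEach((doc, i) => mapFn(doc, (key, value) => rows.push({ id: i, key, value })));
  return rows;
}

// Looks up rows for several keys, like a `keys` query: the result is a union.
const lookup = (rows, keys) => rows.filter(row => keys.includes(row.key)).map(row => row.id);

const separate = buildIndex(people, (doc, emit) => {
  emit(doc.type, null);
  emit(doc.name, null);
});
const combined = buildIndex(people, (doc, emit) => {
  emit("type-" + doc.type + "_name-" + doc.name, null);
});

console.log(lookup(separate, ["professor", "josh"])); // [ 0, 2, 3, 3 ]
console.log(lookup(combined, ["type-professor_name-josh"])); // [ 3 ]
```

The first lookup returns the student named josh as well, and the professor named josh twice (one row per matching emit); the second returns only the document that satisfies both conditions.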
Another option would be what you mentioned in your comment, to emit an array like
emit([ doc.type, doc.name ], null);
Then you would query like
key: ["professor", "josh"]
This is perfectly fine, but generally, the use case for emitting arrays as keys, is for aggregating returned rows. For example, you could emit([year, month, day]) and if you had a simple reduce function that basically passed the records through:
function(keys, values, rereduce) {
if (rereduce) {
return [].concat.apply([], values);
} else {
return values;
}
}
You could query with the url parameter group_level set to 1 or 2 and start querying by year and month or just year on the exact same view using arrays as keys. Compared to SQL or Mongo it's mad complicated and convoluted, but hey, it's there.
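To make the effect of group_level concrete, here is a small sketch that groups rows by a prefix of their array key. It is an assumption-laden stand-in, not CouchDB code: the rows are invented, groupByLevel is a hypothetical helper, and it sums the values instead of concatenating them as the reduce function above does.

```javascript
// Hypothetical rows as a view with [year, month, day] keys might hold them.
const rows = [
  { key: [2014, 1, 5], value: 1 },
  { key: [2014, 1, 9], value: 1 },
  { key: [2014, 2, 1], value: 1 },
  { key: [2015, 1, 1], value: 1 }
];

// Groups rows by the first `level` elements of the key and sums the values.
function groupByLevel(viewRows, level) {
  const groups = new Map();
  for (const { key, value } of viewRows) {
    const prefix = JSON.stringify(key.slice(0, level));
    groups.set(prefix, (groups.get(prefix) || 0) + value);
  }
  return [...groups].map(([prefix, total]) => ({ key: JSON.parse(prefix), value: total }));
}

console.log(groupByLevel(rows, 1)); // one row per year
console.log(groupByLevel(rows, 2)); // one row per year and month
```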
The use of null in the view is really for saving resources. When you query a view, each row contains the id of the document that emitted it, which you can use in a second ajax call to fetch the full documents from, for example, _all_docs.
I hope that makes sense. If you need any clarification you can use the comments and I'll try my best.
I've run into a bit of an issue with some data that I'm storing in my MongoDB (Note: I'm using mongoose as an ODM). I have two schemas:
mongoose.model('Buyer',{
credit: Number,
})
and
mongoose.model('Item',{
bid: Number,
location: { type: [Number], index: '2d' }
})
Buyer/Item will have a parent/child association, with a one-to-many relationship. I know that I can set up Items to be embedded subdocs to the Buyer document or I can create two separate documents with object id references to each other.
The problem I am facing is that I need to query Items whose bid is lower than the Buyer's credit and whose location is near a certain geo coordinate.
To satisfy the first criterion, it seems I should embed Items as a subdoc so that I can compare the two numbers. But, in order to compare locations with a geoNear query, it seems it would be better to separate the documents, otherwise, I can't perform geoNear on each subdocument.
Is there any way that I can perform both tasks on this data? If so, how should I structure my data? If not, is there a way that I can perform one query and then a second query on the result from the first query?
Thanks for your help!
There is another option (besides embedding and normalizing) for storing hierarchies in mongodb, that is storing them as tree structures. In this case you would store Buyers and Items in separate documents but in the same collection. Each Item document would need a field pointing to its Buyer (parent) document, and each Buyer document's parent field would be set to null. The docs I linked to explain several implementations you could choose from.
If your items are stored in two separate collections, then the best option is to write your own function and call it using mongoose.connection.db.eval('some code...');. In that case you can execute your advanced logic on the server side.
You can write something like this:
var allNearItems = db.Items.find(
{ location: {
$near: {
$geometry: {
type: "Point" ,
coordinates: [ <longitude> , <latitude> ]
},
$maxDistance: 100
}
}
});
var res = [];
allNearItems.forEach(function(item){
var buyer = db.Buyers.find({ id: item.buyerId })[0];
if (!buyer) return; // skip this item; continue is not allowed inside a forEach callback
if (item.bid < buyer.credit) {
res.push(item.id);
}
});
return res;
After evaluation (place it in a mongoose.connection.db.eval("...") call) you will get the array of item ids.
Use it with caution. If the allNearItems result is very large or you run the query very often, you can face performance problems. The MongoDB team has actually deprecated direct JS code execution, but it is still available in the current stable release.
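The comparison step itself can be sketched on plain arrays, independent of where it runs. This is only an illustration of the logic: the sample data and the name affordableItemIds are made up, and it assumes the nearby items and their buyers have already been fetched. Indexing the buyers once avoids one lookup query per item.

```javascript
// Hypothetical results: items already returned by the $near query, plus their buyers.
const nearItems = [
  { id: "i1", buyerId: "b1", bid: 50 },
  { id: "i2", buyerId: "b1", bid: 500 },
  { id: "i3", buyerId: "missing", bid: 1 }
];
const buyers = [{ id: "b1", credit: 100 }];

// Index buyers by id once, then keep items whose bid is below the buyer's credit.
function affordableItemIds(items, allBuyers) {
  const creditById = new Map(allBuyers.map(b => [b.id, b.credit]));
  return items
    .filter(item => creditById.has(item.buyerId) && item.bid < creditById.get(item.buyerId))
    .map(item => item.id);
}

console.log(affordableItemIds(nearItems, buyers)); // [ 'i1' ]
```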
I'm trying to test out Firebase to allow users to post comments using push. I want to display the data I retrieve with the following;
fbl.child('sell').limit(20).on("value", function(fbdata) {
// handle data display here
});
The problem is the data is returned in order of oldest to newest - I want it in reversed order. Can Firebase do this?
Since this answer was written, Firebase has added a feature that allows ordering by any child or by value. So there are now four ways to order data: by key, by value, by priority, or by the value of any named child. See this blog post that introduces the new ordering capabilities.
The basic approaches remain the same though:
1. Add a child property with the inverted timestamp and then order on that.
2. Read the children in ascending order and then invert them on the client.
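Both approaches can be illustrated on a plain array, without any Firebase calls. The comments data and the property name inverted are assumptions made for this sketch; sorting the array stands in for the ordering the database would apply.

```javascript
// Hypothetical comments with creation timestamps in milliseconds.
const comments = [
  { text: "first", created: 1000 },
  { text: "second", created: 2000 },
  { text: "third", created: 3000 }
];

// Approach 1: store an inverted timestamp and sort ascending on it.
const withInverted = comments.map(c => ({ ...c, inverted: -c.created }));
const byInverted = [...withInverted].sort((a, b) => a.inverted - b.inverted);

// Approach 2: read in ascending order, then reverse a copy on the client.
const ascending = [...comments].sort((a, b) => a.created - b.created);
const reversed = [...ascending].reverse();

console.log(byInverted.map(c => c.text)); // [ 'third', 'second', 'first' ]
console.log(reversed.map(c => c.text));   // [ 'third', 'second', 'first' ]
```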
Firebase supports retrieving child nodes of a collection in two ways:
by name
by priority
What you're getting now is by name, which happens to be chronological. That's no coincidence btw: when you push an item into a collection, the name is generated to ensure the children are ordered in this way. To quote the Firebase documentation for push:
The unique name generated by push() is prefixed with a client-generated timestamp so that the resulting list will be chronologically-sorted.
The Firebase guide on ordered data has this to say on the topic:
How Data is Ordered
By default, children at a Firebase node are sorted lexicographically by name. Using push() can generate child names that naturally sort chronologically, but many applications require their data to be sorted in other ways. Firebase lets developers specify the ordering of items in a list by specifying a custom priority for each item.
The simplest way to get the behavior you want is to also specify an always-decreasing priority when you add the item:
var ref = new Firebase('https://your.firebaseio.com/sell');
var item = ref.push();
item.setWithPriority(yourObject, 0 - Date.now());
Update
You'll also have to retrieve the children differently:
fbl.child('sell').startAt().limitToLast(20).on('child_added', function(fbdata) {
console.log(fbdata.exportVal());
})
In my test using on('child_added' ensures that the last few children added are returned in reverse chronological order. Using on('value' on the other hand, returns them in the order of their name.
Be sure to read the section "Reading ordered data", which explains the usage of the child_* events to retrieve (ordered) children.
A bin to demonstrate this: http://jsbin.com/nonawe/3/watch?js,console
Since Firebase 2.0.x you can use limitToLast() to achieve that:
fbl.child('sell').orderByValue().limitToLast(20).on("value", function(fbdataSnapshot) {
// fbdataSnapshot is returned in ascending order;
// you will still need to reverse these 20 items
// to get them in descending order
});
Here's a link to the announcement: More querying capabilities in Firebase
To augment Frank's answer, it's also possible to grab the most recent records--even if you haven't bothered to order them using priorities--by simply using endAt().limit(x) like this demo:
var fb = new Firebase(URL);
// listen for all changes and update
fb.endAt().limit(100).on('value', update);
// print the output of our array
function update(snap) {
var list = [];
snap.forEach(function(ss) {
var data = ss.val();
data['.priority'] = ss.getPriority();
data['.name'] = ss.name();
list.unshift(data);
});
// print/process the results...
}
Note that this is quite performant even up to perhaps a thousand records (assuming the payloads are small). For more robust usages, Frank's answer is authoritative and much more scalable.
This brute force can also be optimized to work with bigger data or more records by doing things like monitoring child_added/child_removed/child_moved events in lieu of value, and using a debounce to apply DOM updates in bulk instead of individually.
DOM updates, naturally, are a stinker regardless of the approach, once you get into the hundreds of elements, so the debounce approach (or a React.js solution, which is essentially an uber debounce) is a great tool to have.
There is really no way to do this in the query itself, but since we have the RecyclerView we can reverse the layout instead:
query=mCommentsReference.orderByChild("date_added");
query.keepSynced(true);
// Initialize Views
mRecyclerView = (RecyclerView) view.findViewById(R.id.recyclerView);
mManager = new LinearLayoutManager(getContext());
// mManager.setReverseLayout(false);
mManager.setReverseLayout(true);
mManager.setStackFromEnd(true);
mRecyclerView.setHasFixedSize(true);
mRecyclerView.setLayoutManager(mManager);
I have a date variable (long) and wanted to keep the newest items on top of the list. So what I did was:
Add a new long field 'dateInverse'
Add a new method called 'getDateInverse', which just returns: Long.MAX_VALUE - date;
Create my query with: .orderByChild("dateInverse")
Presto! :p
You are looking for limitToLast(int x). This will give you the last x elements of your query, i.e. the x highest ones, still in ascending order.
If your database contains {10,300,150,240,2,24,220},
this method:
myFirebaseRef.orderByChild("highScore").limitToLast(4)
will retrieve: {150,220,240,300}
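The same selection can be reproduced on a plain array to check the result. This is a stand-in for the query, not Firebase code; lastN is a hypothetical helper and assumes n is a positive number.

```javascript
const scores = [10, 300, 150, 240, 2, 24, 220];

// Mimics ordering by value and limiting to the last n: sort ascending, keep the last n.
function lastN(values, n) {
  return [...values].sort((a, b) => a - b).slice(-n);
}

console.log(lastN(scores, 4));           // [ 150, 220, 240, 300 ]
console.log(lastN(scores, 4).reverse()); // [ 300, 240, 220, 150 ]
```

The second line shows the extra client-side step needed if the highest value should come first.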
In Android there is a way to actually reverse the data in an Arraylist of objects through the Adapter. In my case I could not use the LayoutManager to reverse the results in descending order since I was using a horizontal Recyclerview to display the data. Setting the following parameters to the recyclerview messed up my UI experience:
llManager.setReverseLayout(true);
llManager.setStackFromEnd(true);
The only working way I found around this was through the BindViewHolder method of the RecyclerView adapter:
@Override
public void onBindViewHolder(final RecyclerView.ViewHolder holder, int position) {
final SuperPost superPost = superList.get(getItemCount() - position - 1);
}
Hope this answer will help all the devs out there who are struggling with this issue in Firebase.
Firebase: How to display a thread of items in reverse order with a limit for each request and an indicator for a "load more" button.
This will get the last 10 items of the list
FBRef.child("childName")
.limitToLast(loadMoreLimit) // loadMoreLimit = 10 for example
This will get the last 10 items. Grab the id of the last record in the list and save it for the load more functionality. Next, convert the collection of objects into an array and do a list.reverse().
LOAD MORE Functionality: The next call will do two things, it will get the next sequence of list items based on the reference id from the first request and give you an indicator if you need to display the "load more" button.
this.FBRef
.child("childName")
.endAt(null, lastThreadId) // Get this from the previous step
.limitToLast(loadMoreLimit+2)
You will need to strip the first and last item of this object collection. The first item is the reference to get this list. The last item is an indicator for the show more button.
I have a bunch of other logic that will keep everything clean. You will need to add this code only for the load more functionality.
list = snapObjectAsArray; // The list is an array from snapObject
lastItemId = key; // get the first key of the list
if (list.length < loadMoreLimit+1) {
lastItemId = false;
}
if (list.length > loadMoreLimit+1) {
list.pop();
}
if (list.length > loadMoreLimit) {
list.shift();
}
// Return the list.reverse() and lastItemId
// If lastItemId is an ID, it will be used for the next reference and a flag to show the "load more" button.
}
I'm using ReactFire for easy Firebase integration.
Basically, it helps me store the data in the component state as an array. Then all I have to do is use the reverse() function.
Here is how I achieve this :
import React, { Component, PropTypes } from 'react';
import ReactMixin from 'react-mixin';
import ReactFireMixin from 'reactfire';
import Firebase from '../../../utils/firebaseUtils'; // Firebase.initializeApp(config);
@ReactMixin.decorate(ReactFireMixin)
export default class Add extends Component {
constructor(args) {
super(args);
this.state = {
articles: []
};
}
componentWillMount() {
let ref = Firebase.database().ref('articles').orderByChild('insertDate').limitToLast(10);
this.bindAsArray(ref, 'articles'); // bind retrieved data to this.state.articles
}
render() {
return (
<div>
{
this.state.articles.slice().reverse().map(function(article) {
return <div>{article.title}</div>
})
}
</div>
);
}
}
There is a better way. You should order by negative server timestamp. How do you get a negative server timestamp even offline? There is a hidden field which helps. Related snippet from the documentation:
var offsetRef = new Firebase("https://<YOUR-FIREBASE-APP>.firebaseio.com/.info/serverTimeOffset");
offsetRef.on("value", function(snap) {
var offset = snap.val();
var estimatedServerTimeMs = new Date().getTime() + offset;
});
To add to Dave Vávra's answer, I use a negative timestamp as my sort_key like so
Setting
const timestamp = new Date().getTime();
const data = {
name: 'John Doe',
city: 'New York',
sort_key: timestamp * -1 // Gets the negative value of the timestamp
}
Getting
const ref = firebase.database().ref('business-images').child(id);
const query = ref.orderByChild('sort_key');
return $firebaseArray(query); // AngularFire function
This fetches all objects from newest to oldest. You can also add an .indexOn rule for sort_key to make it run even faster.
I had this problem too, and I found a very simple solution that doesn't involve manipulating the data in any way. If you are rendering the result to the DOM in a list of some sort, you can use flexbox and set up a class to reverse the elements in their container.
.reverse {
display: flex;
flex-direction: column-reverse;
}
myarray.reverse(); or this.myitems = items.map(item => item).reverse();
I did this by prepend.
query.orderByChild('sell').limitToLast(4).on("value", function(snapshot){
snapshot.forEach(function (childSnapshot) {
// PREPEND
});
});
Someone has pointed out that there are 2 ways to do this:
Manipulate the data client-side
Make a query that will order the data
The easiest way that I have found to do this is to use option 1, but through a LinkedList. I just append each of the objects to the front of the stack. It is flexible enough to still allow the list to be used in a ListView or RecyclerView. This way even though they come in order oldest to newest, you can still view, or retrieve, newest to oldest.
You can add a column named orderColumn where you save time as
Long refrenceTime = "large future time";
Long currentTime = "currentTime";
Long order = refrenceTime - currentTime;
Now save order in the column named orderColumn; when you retrieve the data with orderBy(orderColumn), you will get what you need.
Just use reverse() on the array. For example, if you are storing the values in an array items[], call this.items.reverse():
ref.subscribe(snapshots => {
this.loading.dismiss();
this.items = [];
snapshots.forEach(snapshot => {
this.items.push(snapshot);
});
this.items.reverse();
},
For me it was limitToLast that worked. I also found out that limitLast is NOT a function:)
const query = messagesRef.orderBy('createdAt', 'asc').limitToLast(25);
The above is what worked for me.
PRINT in reverse order
Let's think outside the box... If your information will be printed directly into user's screen (without any content that needs to be modified in a consecutive order, like a sum or something), simply print from bottom to top.
So, instead of inserting each new block of content to the end of the print space (A += B), add that block to the beginning (A = B+A).
If you want the elements in a numbered list, the DOM can add the numbers for you if you insert each element as a list item (<li>) inside an ordered list (<ol>).
This way you save space in your database by avoiding unnecessary reversed data.
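The append versus prepend idea from this answer, shown as a tiny sketch on strings. The item names are invented; in a page the same pattern would apply to inserting DOM nodes at the start of a container instead of at the end.

```javascript
// Items arrive oldest first, as they would from the database.
const arriving = ["oldest", "middle", "newest"];

let appended = "";
let prepended = "";
for (const item of arriving) {
  appended += item + "\n";             // A += B: keeps arrival order
  prepended = item + "\n" + prepended; // A = B + A: newest ends up on top
}

console.log(appended);  // oldest, middle, newest (one per line)
console.log(prepended); // newest, middle, oldest (one per line)
```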
I'm trying to write a view which shows me the top 10 tags used in my system. It's fairly easy to get the amount with _count in the reduce function, but that does not order the list by the numbers. Is there any way to do this?
function(doc, meta) {
if(doc.type === 'log') {
emit(doc.tag, 1);
}
}
_count
As a result I'd like to have:
Tag3 10
Tag1 7
Tag2 3
...
Instead of
Tag1 7
Tag2 3
Tag3 10
Most importantly, I do not want to transfer the full set to my application server and handle it there.
In Couchbase you can't sort results in or after the reduce, so you can't directly get a "Top 10" of something. In Couchbase views, rows are always sorted by key. The best way is:
1. Query your view, which returns key-value pairs (tag_name, count_value) ordered by tag_name.
2. Create a job that runs every N minutes, takes the results from step 1, sorts them, and writes the sorted results to a separate key (e.g. "Top10Tags").
3. In your app, query the key Top10Tags.
This could reduce traffic, but results can be outdated. You can also run that job on the same server that Couchbase runs on (e.g. as a small node.js app), so it consumes only loopback traffic and a small amount of CPU for sorting every N minutes.
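The sorting step of such a job could look roughly like the sketch below. It only covers the in-memory part: the tagRows sample and the topTags helper are hypothetical, and fetching the view rows and writing the result back under a key such as Top10Tags are left out.

```javascript
// Hypothetical rows, as a grouped _count view would return them (sorted by key).
const tagRows = [
  { key: "Tag1", value: 7 },
  { key: "Tag2", value: 3 },
  { key: "Tag3", value: 10 }
];

// Sorts a copy by count, highest first, and keeps the top `limit` entries.
function topTags(rows, limit) {
  return [...rows].sort((a, b) => b.value - a.value).slice(0, limit);
}

console.log(topTags(tagRows, 10).map(r => r.key + " " + r.value));
// [ 'Tag3 10', 'Tag1 7', 'Tag2 3' ]
```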
Also, if you're using _count reduce function, you don't need to emit any numbers, use just null:
function(doc, meta) {
if(meta.type === "json" && doc.type === 'log') {
emit(doc.tag, null);
}
}
And if you want to have docs tagged by multiple tags like
{
"type": "log",
"tags": ["tag1","tag2","tag3"]
}
Your map function should be:
function(doc, meta) {
if(meta.type === "json" && doc.type === 'log') {
for(var i = 0; i < doc.tags.length; i++){
emit(doc.tags[i], null);
}
}
}
One more thing about that top10 list. You can store it in memcache bucket if you don't want to store it on disk.
Something you think would be easy but isn't really.
In couchdb, I'd use a list function, and order the results with JavaScript sort(). That way it's all sorted on the server side, and you can have the list only return the top 10.
Bear in mind that with large data sets this will be slow.