CouchDB / Couchbase view ordered by number of keys

I'm trying to write a view which shows me the top 10 tags used in my system. It's fairly easy to get the amount with _count in the reduce function, but that does not order the list by the numbers. Is there any way to do this?
function(doc, meta) {
  if (doc.type === 'log') {
    emit(doc.tag, 1);
  }
}
_count
As a result I'd like to have:
Tag3 10
Tag1 7
Tag2 3
...
Instead of
Tag1 7
Tag2 3
Tag3 10
Most importantly, I do not want to transfer the full set to my application server and handle it there.

In Couchbase you can't sort the result during or after the reduce, so you can't directly get a "Top 10" of something. In Couchbase views, rows are always sorted by key. The best way is:
1. Query your view so that it returns key-value pairs (tag_name, count_value), ordered by tag_name.
2. Create a job that runs every N minutes, gets the results from step 1, sorts them, and writes the sorted results to a separate key (e.g. "Top10Tags").
3. In your app, query the key Top10Tags.
This reduces traffic, but the results can be outdated. You can also run that job on the same server that Couchbase runs on (e.g. as a small node.js app), so it consumes only loopback traffic and a small amount of CPU for sorting every N minutes.
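A minimal sketch of the sorting step in such a job, assuming the view was queried with group=true so that each row has the shape { key: tagName, value: count }. The function name topTags and the sample rows are illustrative; fetching the rows and writing the result to the Top10Tags key are left out because they depend on the SDK in use.

```javascript
// Sort grouped view rows by count (descending) and keep the first `limit`.
// Ties are broken by tag name so the output is deterministic.
function topTags(rows, limit) {
  return rows
    .slice() // copy, so the caller's array is not reordered
    .sort(function (a, b) {
      if (b.value !== a.value) return b.value - a.value;
      return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
    })
    .slice(0, limit);
}

// Sample rows in key order, as the view would return them (illustrative data).
var tagRows = [
  { key: 'Tag1', value: 7 },
  { key: 'Tag2', value: 3 },
  { key: 'Tag3', value: 10 }
];

console.log(topTags(tagRows, 2));
```

Because only the sorted top N is written back, the application reads a single small document instead of the whole tag list.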
Also, if you're using the _count reduce function, you don't need to emit a number; just emit null:
function(doc, meta) {
  if (meta.type === "json" && doc.type === 'log') {
    emit(doc.tag, null);
  }
}
And if you want to have docs tagged by multiple tags like
{
"type": "log",
"tags": ["tag1","tag2","tag3"]
}
Your map function should be:
function(doc, meta) {
  if (meta.type === "json" && doc.type === 'log') {
    for (var i = 0; i < doc.tags.length; i++) {
      emit(doc.tags[i], null);
    }
  }
}
One more thing about that top 10 list: you can store it in a memcached bucket if you don't want to store it on disk.

Something you think would be easy but isn't really.
In CouchDB, I'd use a list function and order the results with JavaScript's sort(). That way it's all sorted on the server side, and you can have the list return only the top 10.
Bear in mind that with large data sets this will be slow.
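A sketch of such a list function, assuming the view is queried through the list with group=true so that each row carries a tag and its count. getRow() and send() are the functions CouchDB provides inside list functions; the name topTagsList and the fixed limit of 10 are illustrative.

```javascript
// CouchDB list function: buffer all rows, sort by value, send the top 10.
// Every row has to be read before sorting, hence the cost on large views.
var topTagsList = function (head, req) {
  var row, rows = [];
  while ((row = getRow())) {
    rows.push({ tag: row.key, count: row.value });
  }
  rows.sort(function (a, b) { return b.count - a.count; });
  send(JSON.stringify(rows.slice(0, 10)));
};
```

In a design document the function body would be stored as a string under lists and requested with a URL of the form /db/_design/ddoc/_list/listName/viewName?group=true.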

Related

Checking for a key in an object containing array of objects saved in chrome.storage.local

I currently save a bunch of objects (thousands) into the chrome.storage.local and then when on a specific web page checking whether specific IDs on the web page are in fact saved in local storage.
Here's a pseudo code
Background script:
var storage = chrome.storage.local;
var json = '[{"kek1": {"aaa": "aaaValue", "bbb": "bbbValue", "ccc": "cccValue"}},{"kek2": {"ddd": "dddValue", "eee": "eeeValue", "fff": "fffValue"}}]';
var jsonParsed = JSON.parse(json);
jsonParsed.forEach(function(object) {
storage.set(object);
});
Content script (when on a specific page):
ids.forEach(function(id) {
storage.get(id, function(result){
if(!isEmpty(result)) {
//we found it, nice, now flag it as found
}
});
});
function isEmpty(obj) {
for(var key in obj) {
if(obj.hasOwnProperty(key))
return false;
}
return true;
}
Which is easy and nice since I only have to do storage.get(id, ...
Unfortunately, I save a lot of stuff in storage, and some of it needs to be removed periodically. That becomes a hassle, since I have to loop through all the objects and determine whether each one needs to be removed or needs to remain.
So I decided to use "parent objects": one object for settings, containing an array of objects with the different settings the user saves; one object for the stuff that needs to be removed, containing an array of objects; and so on.
Like so - all relevant info that I want to remove periodically will be under one key "test" (temp name):
var json = '{"test":[{"kek1": {"aaa": "aaaValue", "bbb": "bbbValue", "ccc": "cccValue"}},{"kek2": {"ddd": "dddValue", "eee": "eeeValue", "fff": "fffValue"}}]}';
I know how to access the nested objects and their values:
var jsonParsed = JSON.parse(json);
jsonParsed.test[0].kek1.aaa
But I don't know how I would easily check for the keys saved in the storage since I would have to specify the "element number" ([i]).
Do I just use a for loop iterating over the array, like so?
for (var i = 0; i < jsonParsed.test.length; i++) {
  var getKey = Object.keys(jsonParsed.test[i]);
  if (getKey[0] == 'theKeyImLookingFor') {
    //do stuff
  }
}
To me that feels like a non-ideal solution, since the for loop would have to run for each of the ids on the page, and there can sometimes be close to 4000 of them (4000 for loops back to back).
Is it a good idea to save a single object holding an array of thousands of other objects?
Am I doing it wrong or is this the way to go?

But I don't know how I would easily check for the keys saved in the storage
Use the standard Array methods like find or findIndex:
const i = arrayOfObjects.findIndex(o => 'someKey' in o);
Is it a good idea to save a single object holding an array of thousands of other objects?
It's a bad idea performance-wise.
What you probably need here is an additional value in the storage that would contain an array with ids of other values in the storage that need to be processed in some fashion e.g. expired/removed. It's basically like a database index so you would update it every time when writing an individual object. Since it contains only the ids, updating it is cheaper than rewriting the entire data.
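One way to sketch that index, assuming a reserved key (here the hypothetical removableIds) that holds the ids of all values due for removal. The storage argument is expected to behave like chrome.storage.local with its callback-style get, set and remove; the function names are illustrative.

```javascript
// Hypothetical key under which the index of removable ids is stored.
var INDEX_KEY = 'removableIds';

// Write one value and record its id in the index with a single set() call.
function saveRemovable(storage, id, value, done) {
  storage.get(INDEX_KEY, function (data) {
    var ids = data[INDEX_KEY] || [];
    if (ids.indexOf(id) === -1) ids.push(id);
    var items = {};
    items[id] = value;
    items[INDEX_KEY] = ids;
    storage.set(items, done);
  });
}

// Remove every indexed value with one remove() call, then reset the index.
function purgeRemovable(storage, done) {
  storage.get(INDEX_KEY, function (data) {
    var ids = data[INDEX_KEY] || [];
    storage.remove(ids, function () {
      var items = {};
      items[INDEX_KEY] = [];
      storage.set(items, done);
    });
  });
}
```

Each value stays under its own top-level key, so the content script can still look ids up directly with storage.get. The index update is a read-modify-write, so overlapping calls to saveRemovable would need to be serialized.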
Also, instead of performing lots of calls to the API, do just a single call:
// writing
chrome.storage.local.set(Object.assign({}, ...arrayOfObjects));
// reading
chrome.storage.local.get(arrayOfIds, data => {
for (const id of arrayOfIds) {
const value = data[id];
if (value !== undefined) {
// ok
}
}
});

CouchDB find paired documents and list remaining unpaired documents

I'm relatively new to NoSQL, but I have been enjoying the journey very much! I am however finding the map-reduce way of life a bit tricky! I need some help with a problem!
I have a database with two types of documents, opening transactions and closing transactions. For replication and offline functionality reasons I cannot merge the data into one document. The opening transaction document looks something like :
{
_id: "transaction-open-randomgeneratedstring",
type: "transactions-open",
vehicle: "vehicle-id",
created: "date string"
}
The closing document looks something like:
{
_id: "transaction-close-randomgeneratedstring",
type: "transactions-close",
openid: "transaction-open-randomgeneratedstring",
created: "date string"
}
The randomgeneratedstring of a closing transaction matches the randomgeneratedstring of the corresponding opening transaction.
I need a map-reduce that gives me the list of open transactions that do not have a corresponding closing transaction. This will basically give me a list of outstanding transactions.
This is the map-reduce I have thus far, but it is not doing the job.
{
"map": function(doc) {
if(doc.type == "transactions-open") {
emit([doc._id, 0], "OPEN");
}
if(doc.type == "transactions-close"){
emit([doc.openid, 1], "CLOSE");
}
},
"reduce": function(keys, values, rereduce) {
var unique_labels = {};
var open = {};
keys.forEach(function(label) {
if(!unique_labels[label[0]]) {
unique_labels[label[0]] = true;
} else {
open[label[0]] = true;
}
});
return open;
}
}
I am open for changes in the _id naming / structure, but I cannot combine the two documents into one.
Thanks!
EDIT
Based on response from Hod, I changed the reduce to look like:
function(keys, values, rereducer)
{
if(values.length == 1)
return true;
}
This is certainly a step in the right direction, but the unwanted transactions are still in the result set, the value is only null. Is there no way to get those out of the result set?

As described - what you would do with a Join in SQL you do with a reduce in CouchDB. Code something like this - not tested:
{
  "map": function(doc) {
    if (doc.type == "transactions-open") {
      emit([doc._id], 1);
    }
    if (doc.type == "transactions-close") {
      emit([doc.openid], -1);
    }
  },
  "reduce": "_sum"
}
So we emit a 1 for an open transaction under an ID and a -1 for a close under the same ID. Now when you reduce you will get a result for each ID of:
-1 = Closed with no record of an open (error condition).
0 = Opened and Closed
1 = Open and not yet closed.
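To make the three outcomes concrete, here is a plain JavaScript simulation of that map and a _sum reduce grouped by key (what CouchDB would compute when the view is queried with group=true). It runs outside CouchDB; the sample documents and the helper names are made up for illustration.

```javascript
// Sample documents (illustrative ids).
var docs = [
  { _id: 'transaction-open-a', type: 'transactions-open' },
  { _id: 'transaction-open-b', type: 'transactions-open' },
  { _id: 'transaction-close-a', type: 'transactions-close', openid: 'transaction-open-a' }
];

// Same logic as the map function above, with emit passed in as a callback.
function mapDoc(doc, emit) {
  if (doc.type === 'transactions-open') emit([doc._id], 1);
  if (doc.type === 'transactions-close') emit([doc.openid], -1);
}

// Sum the emitted values per distinct key, as _sum does with group=true.
function sumByKey(allDocs) {
  var totals = {};
  allDocs.forEach(function (doc) {
    mapDoc(doc, function (key, value) {
      var id = key[0];
      totals[id] = (totals[id] || 0) + value;
    });
  });
  return totals;
}

var totals = sumByKey(docs);
// Outstanding transactions are the ids whose total is 1.
var outstanding = Object.keys(totals).filter(function (id) {
  return totals[id] === 1;
});
console.log(outstanding);
```

Rows with a total of 0 still appear in the reduced view, so the filter on a value of 1 has to run in the client or in a list function.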

The problem is with the keys parameter in your reduce function. The reduce phase is not called once with all possible keys. It's called per distinct key, and based on the group_level you specify.
Looking at your code, if you haven't specified any group_level, your reduce function is going to get called for every document separately.
Because you're emitting the id of the open transaction doc for both open and close markers, if you grouped at the first level, you'd get open or open/close pairs. You're still only getting a reduction on a limited set of docs at a time.
You could fix this either in the logic that calls the query, or by emitting a key that lets you reduce on the entire set at once. (I imagine there are other ways too. These are the ones that come to mind.)
If you use the key approach, you'd need to emit something that looks like ["transaction", doc._id, 0]. Then a first-level grouping would give you the whole transaction set, as your current code expects.
EDIT (Adding information based on edit of question.)
The reduce function is going to get called with whatever grouping you set up. It's always going to return something, even if it's just no results emitted (i.e. null).
If you don't want to handle that in the logic that's running the queries and processing the results, you need to use an approach that will allow you to group all the transaction documents together, instead of just the documents for a single transaction.
Based on what you've done so far, another approach would be to forgo the reduce phase and just look at the number of results returned by a query that's limited to the unique doc id.
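A sketch of that last approach against the asker's original map function, which emits [openId, 0] for an open and [openId, 1] for a close. The helper names are hypothetical; the key range relies on CouchDB's collation, where an object sorts after any number inside an array key.

```javascript
// View query parameters selecting every row for one transaction, no reduce.
function paramsForTransaction(openId) {
  return {
    reduce: false,
    startkey: JSON.stringify([openId]),
    endkey: JSON.stringify([openId, {}])
  };
}

// A transaction is outstanding when only its "OPEN" row comes back.
function isOutstanding(rows) {
  return rows.length === 1 && rows[0].value === 'OPEN';
}

console.log(paramsForTransaction('transaction-open-abc'));
```

This needs one query per transaction, so it suits checking a single id rather than listing all outstanding transactions.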

CouchDB query issues

I will start off by saying while I am not new to CouchDB, I am new to querying the views using JavaScript and the web.
I have looked at multiple other questions on here, including CouchDB - Queries with params, couchDB queries, Couchdb query with AND operator, CouchDB Querying Dates, and Basic CouchDB Queries, just to list a few.
While all have good information in them, I haven't found one that has my particular problem in it.
I have a view set up like so:
function (docu) {
if(docu.status && docu.doc && docu.orgId.toString() && !docu.deleted){
switch(docu.status){
case "BASE":
emit(docu.name, docu);
break;
case "AIR":
emit(docu.eta, docu);
break;
case "CHECK":
emit(docu.checkTime, docu);
break;
}
}
}
with all documents having a status, doc, orgId, deleted, name, eta, and checkTime. (I changed doc to docu because of my custom doc key.)
I am trying to query and emit based on a set of keys, status, doc, orgId, where orgId is an integer.
My jQuery to do this looks like so:
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["status","doc",orgId],
success: function(data) {
console.log(data);
},
error: function(status) {
console.log(status);
}
});
I receive
{"total_rows":59,"offset":59,"rows":[
]}
Sometimes the offset is 0, sometimes it is 59. I feel I must be doing something wrong for this not to be working correctly.
So for my questions:
1. I did not mention this, but I had to set docu.orgId.toString() because I guess it parses the URL as a string. Is there a way to use this number as a numeric value?
2. How do I correctly view multiple documents based on multiple keys, i.e. if(key1 && key2) emit(doc.name, doc)
3. Am I doing something obviously wrong that I lack the knowledge to notice?
Thank you all.

You're so very close. To answer your questions
1. When you're using docu.orgId.toString() in that if-statement you're basically saying: this value must be truthy. If you didn't convert to string, any number, other than 0, would be true. Since you are converting to a string, any value other than an empty string will be true. Also, since you do not use orgId as the first argument in an emit call, at least not in the example above, you cannot query by it at all.
2. I'll get to this.
3. A little.
The thing to remember is emit creates a key-value table (that's really all a view is) that you can use to query. Let's say we have the following documents
{type:'student', dept:'psych', name:'josh'},
{type:'student', dept:'compsci', name:'anish'},
{type:'professor', dept:'compsci', name:'kender'},
{type:'professor', dept:'psych', name:'josh'},
{type:'mascot', name:'owly'}
Now let's say we know that for this one view, we want to query 1) everything but mascots, 2) we want to query by type, dept, and name, all of the available fields in this example. We would write a map function like this:
function(doc) {
if (doc.type === 'mascot') { return; } // don't do anything
// allow for queries by type
emit(doc.type, null); // the use of null is explained below
// allow queries by dept
emit(doc.dept, null);
// allow for queries by name
emit(doc.name, null);
}
Then, we would query like this:
// look for all joshs
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["josh"],
// ...
});
// look for everyone in the psych department
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["psych"],
// ...
});
// look for everyone that's a professor and everyone named josh
$.couch.db("myDB").view("designDoc/viewName", {
keys : ["professor", "josh"],
// ...
});
Notice the last query isn't and in the sense of a logical conjunction, it's in the sense of a union. If you wanted to restrict what was returned to documents that were only professors and also joshs, there are a few options. The most basic would be to concatenate the key when you emit. Like
emit('type-' + doc.type + '_name-' + doc.name, null);
You would then query like this: key: "type-professor_name-josh"
It doesn't feel very proper to rely on strings like this, at least it didn't to me when I first started doing it, but it is a quite common method for querying key-value stores. The characters - and _ have no special meaning in this example, I simply use them as delimiters.
Another option would be what you mentioned in your comment, to emit an array like
emit([ doc.type, doc.name ], null);
Then you would query like
key: ["professor", "josh"]
This is perfectly fine, but generally, the use case for emitting arrays as keys, is for aggregating returned rows. For example, you could emit([year, month, day]) and if you had a simple reduce function that basically passed the records through:
function(keys, values, rereduce) {
if (rereduce) {
return [].concat.apply([], values);
} else {
return values;
}
}
You could query with the url parameter group_level set to 1 or 2 and start querying by year and month or just year on the exact same view using arrays as keys. Compared to SQL or Mongo it's mad complicated and convoluted, but hey, it's there.
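The effect of group_level can be illustrated outside CouchDB. The sketch below truncates each array key to the given level and counts the rows per truncated key, which is what a _count reduce would return at that level. The function name and the sample rows are illustrative, not part of the original view.

```javascript
// Group rows by the first `level` elements of their array key and count them.
function countByGroupLevel(rows, level) {
  var groups = new Map();
  rows.forEach(function (row) {
    var prefix = JSON.stringify(row.key.slice(0, level));
    groups.set(prefix, (groups.get(prefix) || 0) + 1);
  });
  return Array.from(groups, function (entry) {
    return { key: JSON.parse(entry[0]), value: entry[1] };
  });
}

// Rows as produced by emit([year, month, day], null), already in key order.
var dateRows = [
  { key: [2013, 1, 5] },
  { key: [2013, 1, 20] },
  { key: [2013, 2, 1] },
  { key: [2014, 1, 1] }
];

console.log(countByGroupLevel(dateRows, 1)); // grouped by year
console.log(countByGroupLevel(dateRows, 2)); // grouped by year and month
```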
The use of null as the value is really about saving resources. When you query a view, each row contains the id of the document that emitted it, which you can use in a second ajax call to fetch the full documents, for example from _all_docs.
I hope that makes sense. If you need any clarification you can use the comments and I'll try my best.

How do I find records in Azure table storage that don't match an array of values?

I'm trying to perform a 'doesNotContainAllObjectsInArray' type operation on Azure Mobile Services. For example, let's say I have a table called Number and within that table are these records with these 'number' values: 11111, 22222, 33333, 44444.
I want to be able to write a query that will allow me to pass in an array of numbers that I specifically don't want, for example: [11111,44444] should yield [22222, 33333].
I've tried using JavaScript in my where operator, but I'm getting an error back stating that the expression isn't supported. This is what I've tried:
var numberTable = tables.getTable('Number');
var ignoreNumbers = ['11111', '44444'];
numberTable.where(function(numbers) {
return (numbers.indexOf(this.number) > -1);
}, ignoreNumbers).read({
success: function(foundNumbers) {
console.log('Found ' + foundNumbers.length + ' numbers!');
},
error: function(error) {
console.error('Error with query! ' + error);
}
});
Note: I can't hard code the ignoreNumbers values, since that array is produced from a previous query.
Can anyone recommend how I might go about executing a query like this? Would I need to build a SQL statement and execute it with mssql? (...is that even possible with Table Storage?)

You are describing the SQL Except operator which isn't supported in Table Queries. The only way I've found to do this is to load the table into memory (often not feasible due to size) and then use LINQ to do an Except query.

I managed to solve this by creating a SQL query and executing it through the request.service.mssql object, something like this:
SELECT * FROM Number WHERE (number != '11111' AND number != '22222')
The WHERE part of the query is built by iterating the ignoreNumbers array and building the SQL statement by string concatenation.
Not sure if it's the most efficient thing in the world, but in reality there are only going to be a couple of numbers (maybe 5-10) and so far it seems to work.
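A sketch of that query-building step, using one ? placeholder per value instead of concatenating the values themselves, and NOT IN in place of a chain of comparisons. The function name buildExcludeQuery is illustrative, and the sketch assumes the mssql object accepts a parameter array alongside the SQL text.

```javascript
// Build a parameterized "everything except these numbers" query.
function buildExcludeQuery(ignoreNumbers) {
  if (ignoreNumbers.length === 0) {
    // Nothing to exclude: return the whole table.
    return { sql: 'SELECT * FROM Number', params: [] };
  }
  var placeholders = ignoreNumbers.map(function () { return '?'; }).join(', ');
  return {
    sql: 'SELECT * FROM Number WHERE number NOT IN (' + placeholders + ')',
    params: ignoreNumbers.slice()
  };
}

var query = buildExcludeQuery(['11111', '44444']);
console.log(query.sql);
console.log(query.params);
```

If the service's mssql.query takes (sql, params, options), the result could be passed as mssql.query(query.sql, query.params, { success: ..., error: ... }). Keeping the values out of the SQL string also avoids quoting problems when the array comes from an earlier query.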

Map-Reduce Query to Count Tags

I have a database of documents which are tagged with keywords. I am trying to find (and then count) the unique tags which are used alongside each other. So for any given tag, I want to know what tags have been used alongside that tag.
For example, if I had one document which had the tags [fruit, apple, plant] then when I query [apple] I should get [fruit, plant]. If another document has tags [apple, banana] then my query for [apple] would give me [fruit, plant, banana] instead.
This is my map function which emits all the tags and their neighbours:
function(doc) {
if(doc.tags) {
doc.tags.forEach(function(tag1) {
doc.tags.forEach(function(tag2) {
emit(tag1, tag2);
});
});
}
}
So in my example above, it would emit
apple -- fruit
apple -- plant
apple -- banana
fruit -- apple
fruit -- plant
...
My question is: what should my reduce function be? The reduce function should essentially filter out the duplicates and group them all together.
I have tried a number of different attempts, but my database server (CouchDB) keeps giving me an Error: reduce_overflow_error. Reduce output must shrink more rapidly.
EDIT: I've found something that seems to work, but I'm not sure why. I see there is an optional "rereduce" parameter to the reduce function call. If I ignore these special cases, then it stops throwing reduce_overflow_errors. Can anyone explain why? And also, should I just be ignoring these, or will this bite me in the ass later?
function(keys, values, rereduce) {
if(rereduce) return null; // Throws error without this.
var a = [];
values.forEach(function(tag) {
if(a.indexOf(tag) < 0) a.push(tag);
});
return a;
}

Your answer is nice, and as I said in the comments, if it works for you, that's all you should care about. Here is an alternative implementation in case you ever bump into performance problems.
CouchDB likes tall lists, not fat lists. Instead of view rows keeping an array with every previous tag ever seen, this solution keeps the "sibling" tags in the key of the view rows, and then groups them together to guarantee one unique sibling tag per row. Every row is just two tags, but there could be thousands or millions of rows: a tall list, which CouchDB prefers.
The main idea is to emit a 2-array of tag pairs. Suppose we have one doc, tagged fruit, apple, plant.
// Pseudo-code visualization of view rows (before reduce)
// Key , Value
[apple, fruit ], 1
[apple, plant ], 1 // Basically this is every combination of 2 tags in the set.
[fruit, apple ], 1
[fruit, plant ], 1
[plant, apple ], 1
[plant, fruit ], 1
Next I tag something apple, banana.
// Pseudo-code visualization of view rows (before reduce)
// Key , Value
[apple, banana], 1 // This is from my new doc
[apple, fruit ], 1
[apple, plant ], 1
[banana, apple], 1 // This is also from my new doc
[fruit, apple ], 1
[fruit, plant ], 1
[plant, apple ], 1
[plant, fruit ], 1
Why is the value always 1? Because I can make a very simple built-in reduce function: _sum to tell me the count of all tag pairs. Next, query with ?group_level=2 and CouchDB will give you unique pairs, with a count of their total.
A map function to produce this kind of view might look like this:
function(doc) {
  // Emit "sibling" tags, keyed on tag pairs.
  var tags = doc.tags || []
  tags.forEach(function(tag1) {
    tags.forEach(function(tag2) {
      if(tag1 != tag2)
        emit([tag1, tag2], 1)
    })
  })
}
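To read the siblings of a single tag from this view, one option is a key range plus group_level=2. The sketch below builds those parameters and flattens the reduced rows; the helper names are hypothetical, and the range relies on CouchDB's collation placing an object after any string inside an array key.

```javascript
// Query parameters covering every [tag, sibling] key for one tag.
function siblingParams(tag) {
  return {
    group_level: 2,
    startkey: JSON.stringify([tag]),
    endkey: JSON.stringify([tag, {}])
  };
}

// Turn reduced rows ({ key: [tag, sibling], value: count }) into a flat list.
function siblingsFromRows(rows) {
  return rows.map(function (row) {
    return { tag: row.key[1], count: row.value };
  });
}

// Rows as the view would return them for "apple" in the example above.
var appleRows = [
  { key: ['apple', 'banana'], value: 1 },
  { key: ['apple', 'fruit'], value: 1 },
  { key: ['apple', 'plant'], value: 1 }
];
console.log(siblingsFromRows(appleRows));
```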

I have found a correct solution I am much happier with. The trick was that CouchDB must be set to reduce_limit = false so that it stops checking its heuristic against your query.
You can set this via Futon on http://localhost:5984/_utils/config.html under the query_server_config settings, by double clicking on the value.
Once that's done, here is my new map function which works better with the "re-reducing" part of the reduce function:
function(doc) {
if(doc.tags) {
doc.tags.forEach(function(tag1) {
doc.tags.forEach(function(tag2) {
emit(tag1, [tag2]); // Array with single value
});
});
}
}
And here is the reduce function:
function(keys, values) {
var a = [];
values.forEach(function(tags) {
tags.forEach(function(tag) {
if(a.indexOf(tag) < 0) a.push(tag);
});
});
return a;
}
Hope this helps someone!
