PouchDB + Conflict Resolution - javascript

I have a really simple question about hard topic:
How does conflict resolution work in PouchDB?
I looked at the documentation and did some quick googling, but it didn't help. So, how do I handle conflict management in an application that uses PouchDB?

Here's how you do it in CouchDB, which you can directly translate into PouchDB terms since the APIs are exactly the same.
You fetch a document, using conflicts=true to ask for conflicts (get() with {conflicts:true} in PouchDB):
http://localhost:5984/db1/foo?conflicts=true
You receive a doc like this:
{
"_id":"foo",
"_rev":"2-f3d4c66dcd7596419c76b2498b3ba21f",
"notgonnawork":"this is from the second db",
"_conflicts":["2-c1592ce7b31cc26e91d2f2029c57e621"]
}
There is a conflict introduced from another database, and that database's revision has (randomly) won. If you used bi-directional replication, both databases will provide this same answer.
Notice that both revisions start with "2-." This indicates that they are both the second revision to the document, and they both live at the same level of the revision tree.
Using the revision ID, you fetch the conflicting version (get() with {rev: ...} in PouchDB):
http://localhost:5984/db1/foo?rev=2-c1592ce7b31cc26e91d2f2029c57e621
You receive:
{
"_id":"foo",
"_rev":"2-c1592ce7b31cc26e91d2f2029c57e621",
"notgonnawork":"this is from the first database"
}
After presenting the two conflicting versions to the user, you can then PUT (put()) a third revision on top of both of these. Your third version can combine the results, choose the loser, or whatever you want.
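The whole flow can be sketched in PouchDB terms. This assumes a local PouchDB instance named `db`; the mergeConflict() helper and its keep-the-larger-value strategy are purely illustrative, not PouchDB API — a real app would apply domain-specific logic or ask the user:

```javascript
// Deterministically merge two conflicting revisions of a doc.
// Illustrative strategy: keep the lexicographically larger field value.
function mergeConflict(winner, loser) {
  return {
    _id: winner._id,
    _rev: winner._rev, // the new revision is written on top of the winner
    notgonnawork: winner.notgonnawork > loser.notgonnawork
      ? winner.notgonnawork
      : loser.notgonnawork
  };
}

// The PouchDB calls themselves (commented out so the sketch is
// self-contained; uncomment with a real `db`):
//
// const doc = await db.get('foo', { conflicts: true });
// const losers = await Promise.all(
//   (doc._conflicts || []).map(rev => db.get('foo', { rev }))
// );
// const resolved = mergeConflict(doc, losers[0]);
// await db.put(resolved);                 // writes revision 3 on top
// await db.remove('foo', losers[0]._rev); // delete the losing revision
```

Deleting the losing revisions after writing the merged document is what actually marks the conflict as resolved.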
Advanced reading:
The CouchDB docs on conflict resolution
The CouchDB wiki page on conflicts.
Understanding CouchDB Conflicts by Jan Lehnardt

Related

PouchDB or alternative where I can control how much data is stored locally?

In the design stage for an app that collects large amounts of data...
Ideally, I want it to be an offline-first app and was looking at PouchDB/CouchDB. However, the data needs to be kept for years for legal reasons, and my concern is that this will consume too much local storage over time.
My thoughts were:
Handle sync between PouchDB and CouchDB myself, allowing me to purge inactive documents from the local store without impacting CouchDB. This feels messy and is probably a lot of work.
Build a local store using Dexie.js and write the sync function entirely myself. It also looks like hard work, but maybe less, as I'm not trying to work around an existing sync function.
Search harder :)
Conceptually, I guess I'm looking for a 'DB cache': holding active JSON document versions and removing documents that have not been touched for X period. It might be that 'offline' mode is handled separately from the DB cache.
Not sure yet if this is the correct answer:
Set up a filter on CouchDB to screen out old documents (let's say we have a 'date_modified' field in the doc, and we filter out any docs with date_modified older than one month).
Have a local routine on the client that deletes documents older than one month from the local PouchDB (actually using the remove() method against the local PouchDB, not updating with _deleted: true). From https://pouchdb.com/2015/04/05/filtered-replication.html it appears removed documents don't sync.
Docs updated in PouchDB will replicate normally.
There might be a race condition here for replication; we'll see.
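The client-side purge step could be sketched like this. The selection logic is plain JS; the actual PouchDB remove() calls are commented out, and the `date_modified` ISO-string field is the one assumed in the steps above:

```javascript
const MONTH_MS = 30 * 24 * 60 * 60 * 1000;

// Return the docs whose date_modified is older than one month
// relative to `now` (injected rather than Date.now(), for testability).
function selectStale(docs, now) {
  return docs.filter(
    doc => now - Date.parse(doc.date_modified) > MONTH_MS
  );
}

// With a real local PouchDB `db`:
//
// const result = await db.allDocs({ include_docs: true });
// const stale = selectStale(result.rows.map(r => r.doc), Date.now());
// for (const doc of stale) {
//   await db.remove(doc); // local delete only, per the plan above
// }
```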

What is the recommended way to handle changes in local web storage from upstream?

The question is about the general approach. I'll present the problem using AngularJS with ngStorage.
Let's say I have something like this saved in local storage:
$scope.$storage = $localStorage;
$scope.$storage.food = { type: 'candy', eaten: false }
This way, the value is saved in local storage, so the next time a user visits my page, I know whether they have eaten the candy. However, in the future I might change my app and alter the structure of food.
So how should I update this? Two things must be taken care of:
Notify client of new structure for storing.
Integrate that change with the old storage.
My approach is using a version field to indicate changes, and upon seeing that, reset all clients storage.
This process is called "data migration" (i.e. upgrading a data structure as the application evolves). It's a well-known problem from the database world (and before that for config/preferences files).
The usual approach is to add a version in the header of the data structure. That means the header is always the same (or just changes in backwards-compatible ways) while the payload (the actual data) can change as much as it needs.
A simple solution just checks the version and uses defaults when the version doesn't match. More elaborate schemes contain migration code which can upgrade a data structure from version N to N+1. Control code will then apply all the migration steps necessary to upgrade all data structures to the latest version.
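The stepwise scheme described above could be sketched like this. The version field, the migration table, and the v1-to-v2 example are all illustrative assumptions, not part of any library:

```javascript
const CURRENT_VERSION = 2;

// Each entry upgrades the stored structure from version N to N + 1.
const migrations = {
  // v1 -> v2: the `eaten` boolean becomes an `eatenAt` timestamp (or null)
  1: data => ({
    version: 2,
    food: {
      type: data.food.type,
      eatenAt: data.food.eaten ? 0 : null // 0 = unknown time in the past
    }
  })
};

// Apply every migration step needed to reach CURRENT_VERSION;
// fall back to defaults when the stored shape is unrecognized.
function migrate(stored, defaults) {
  if (!stored || typeof stored.version !== 'number') return defaults;
  let data = stored;
  while (data.version < CURRENT_VERSION) {
    const step = migrations[data.version];
    if (!step) return defaults; // no upgrade path: reset to defaults
    data = step(data);
  }
  return data;
}
```

The control code then only ever deals with the latest shape, and each release adds one small migration function rather than rewriting the whole upgrade logic.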

Efficient search in MongoDB v2.4

I'm using MongoDB 2.4, which is working fine for my needs except for one thing: searching, as it doesn't support some advanced options like $search. So, is there a way to implement that kind of searching in v2.4? The reason I'm sticking to the older version is that I don't want to lose any of my data by upgrading, and I also don't want to stop a live Mongo server.
The result i want should be something similar as this query's result:
db.data.find({$text: { $search: 'query' } }, { score: {$meta: "textScore" }})
This query works fine on the latest versions of MongoDB. Also, if you suggest using the latest version, please provide some references that can help me upgrade MongoDB safely.
This is a little bit of a catch 22, introduced mainly by text search capabilities being considered "experimental" in earlier versions. Aside from being in an earlier development phase, the implementation is entirely different due to the fact that the "whole" query and index API has been re-written for MongoDB 2.6, largely in order to support the new types of indexes available and make the API for working with the data consistent.
So prior versions implement text search via the "command" interface directly and only. Things work a little differently, and the current "deprecation" notice means that this way of working will eventually be removed. But the "text" command will presently still operate as shown in the earlier documentation:
db.data.runCommand("text", { "search": "query" })
So there are limitations here, as covered in the existing documentation. Notably, the number of documents returned is bounded by the "limit" argument to that command, and there is no concept of "skip". Also, this is a "document" response and not a cursor, so the total results cannot exceed the BSON limit of 16MB.
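Because the command returns one document rather than a cursor, you post-process its `results` array yourself. A small helper like this (plain JS; the `{ results: [{ obj, score }] }` shape is assumed from that command-style response) can turn it into the score-sorted list the 2.6-style query would give you:

```javascript
// Flatten a "text" command response into an array of documents
// sorted by descending textScore, each annotated with its score.
function resultsByScore(commandResponse) {
  return (commandResponse.results || [])
    .slice() // don't mutate the original response
    .sort((a, b) => b.score - a.score)
    .map(entry => ({ ...entry.obj, score: entry.score }));
}
```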
That said, and a little off topic, consider your MongoDB 2.6 deployment scenario, mostly with regard to the following.
Future proofing. In the earlier versions this is an experimental feature, so general flaws and problems are not going to be "backported" with fixes while you hang on to that version. Some may be, but without a good reason to do so this mostly wanes over time. Remember, this is "experimental", so due warning was given about use in production.
Consistency/deprecation. The API for "text" and "geospatial" has changed. So the implementation in earlier releases is different, "deprecated", and will go away. The right way is to have the same structure as other queries, and to use it consistently in all query forms rather than via a direct command.
Deployment. You say you don't want to stop the server, but you really should not have only one server anyway. Apart from running counter to the general philosophy of why you use MongoDB in the first place, at the very least a "replica set" is a good idea for data redundancy and the "uptime" of your application. Removing a single point of failure means that you can individually bring down discrete nodes and upgrade them without application downtime.
So that strays "a little" off the programming topic, but for me, the last point is the most important. Better to make sure your application is without the failure points by building this into your deployment architecture. This then makes "staying ahead of the curve" a simpler decision. It is always worth noting the "experimental" clause with technologies before rolling out to production. Cover your bases.

Is it possible to control order of replication?

I have a huge master CouchDB database and a read-only slave CouchDB database that synchronizes with the master.
Because the rate of change is high and the channel between the servers is slow and unstable, I want to set an order/priority defining which documents come first. I need to ensure that the documents with the highest priority are definitely at the latest version, and I can ignore the documents at the end of the list.
SORTING, not FILTERING
If it is not possible, what solution could be?
Resource I have already looked at:
http://wiki.apache.org/couchdb/Replication
http://couchapp.org/page/index
UPDATE: the master database is actually the Node.js NPM registry, and the order is the list of most-depended-upon packages. I am trying to make a proxy, because cloning 50GB always fails after a while. But the fact is, "we don't need 90% of those modules, just quick & reliable access to those we depend on."
The short answer is no.
The long answer is that CouchDB provides ACID guarantees at the individual document level only, by design. The replicator will update each document atomically when it replicates (as can anyone; the replicator is just using the public API), but it does not guarantee ordering. This is mostly because it uses multiple HTTP connections to improve throughput. You can configure that down to 1 if you like, and you'll get better ordering, but it's not a panacea.
After the BigCouch merge, all bets are off: there will be multiple sources and multiple targets with no imposed total order.
CouchDB, out of the box, does not provide you with any options to control the order of replication. I'm guessing you could piece something together if you keep documents with different priorities in different databases on the master, though. Then, you could replicate the high-priority master database into the slave database first, replicate lower-priority databases after that, etc.
You could set up filtered replication or named document replication:
http://wiki.apache.org/couchdb/Replication#Filtered_Replication
http://wiki.apache.org/couchdb/Replication#Named_Document_Replication
Both of these are alternatives to replicating an entire database. You could do the replication in smaller batch sizes, and order the batches to match your priorities.
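The batch-and-order idea could be sketched like this, assuming you already know each document's priority (for the NPM case, its most-depended-upon rank). The batching helper is plain JS; the POST to _replicate is commented out, and the endpoint and database names are hypothetical:

```javascript
// Split [{ id, priority }, ...] into batches of `size`, highest
// priority first, so high-priority docs replicate in earlier batches.
function priorityBatches(docs, size) {
  const ordered = docs
    .slice()
    .sort((a, b) => b.priority - a.priority)
    .map(d => d.id);
  const batches = [];
  for (let i = 0; i < ordered.length; i += size) {
    batches.push(ordered.slice(i, i + size));
  }
  return batches;
}

// Then trigger named-document replication per batch, e.g.:
//
// for (const doc_ids of priorityBatches(allDocs, 100)) {
//   await fetch('http://localhost:5984/_replicate', {
//     method: 'POST',
//     headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify({ source: 'master', target: 'slave', doc_ids })
//   });
// }
```

Because each batch completes before the next starts, a dropped connection only costs you the current batch, and the highest-priority documents are already safely replicated.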

CouchDB: How to change a view function via JavaScript?

I am playing around with CouchDB to test if it is "possible" [1] to store scientific data (simulated and experimental raw data + metadata). A big pro is the schema-less approach of CouchDB: we have to be very flexible with the metadata, as the set of parameters changes very often.
Up to now I have some code to feed raw data, plots (both as attachments), and hierarchical metadata (as JSON) into CouchDB documents, and have written some prototype JavaScript for filtering and display. But the filtering is done on the client side (a.k.a. the browser): the map function simply returns everything.
How could I change (or push a second) map function of a specific _design document with simple browser JS?
I do not think that a temporary view would yield any performance gain...
Thanks for your time and answers.
[1]: of course it is possible, but is it also useful? feasible? reasonable?
[added]
Ah, the jquery.couch.js (version 0.9.0) provides a saveDoc() function, which could update the _design document with the new map function.
But I also tried out the query function, which uses a temporary view. Okay, "do not use this in the real product, only during development"... But scientific research is steady development, right?
Temporary views are cached, as I noticed, and this works well for ~1000 documents per DB. A second plus: all users (think 1 to 3, so full user management would be overkill) can work with their own temporary views.
Never ever use temporary views. They are really only there for dev and debugging purposes. For more information, see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views (specifically the bold "NOTE").
And yes, because design documents are really just documents with special powers, you can run your GET/POST/PUT/DELETE methods on them. However, you will usually need admin privileges to do this. So, if you allow a client-side piece of software to do that, you are making your entire database public for read/write access. This may be fine for your application, but it is important to remember.
For example, if you restrict access to your database but put the username and password in client-side JavaScript, then anyone can see that username and password.
Cheers.
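Updating a design document from the browser, as the answer above describes, could look like this sketch. Map functions are stored as strings in the design document; the view name, database name, and endpoint here are illustrative:

```javascript
// Example map function, kept as a string (CouchDB stores it that way).
const mapSource =
  'function (doc) { if (doc.type === "experiment") emit(doc._id, doc); }';

// Build a design document carrying one view. Pass the current _rev
// when updating an existing design doc, or CouchDB will reject the PUT.
function makeDesignDoc(name, mapSource, rev) {
  const doc = {
    _id: '_design/' + name,
    language: 'javascript',
    views: {
      filtered: { map: mapSource } // view name "filtered" is arbitrary
    }
  };
  if (rev) doc._rev = rev;
  return doc;
}

// Saving it (requires admin credentials, as noted above):
//
// await fetch('http://localhost:5984/mydb/_design/science', {
//   method: 'PUT',
//   headers: { 'Content-Type': 'application/json' },
//   body: JSON.stringify(makeDesignDoc('science', mapSource, currentRev))
// });
```

jquery.couch.js's saveDoc() wraps essentially this PUT, including the _rev handling.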
I've written some helper functions for jquery.couch and design docs; take a look at:
https://github.com/grischaandreew/jquery.couch.js
