Efficient search in MongoDB v2.4 - JavaScript

I'm using version 2.4 of MongoDB, which works fine for my needs except for one thing: searching, as it doesn't support some advanced options like $search. So, is there a way to implement that kind of searching in v2.4? The reason I'm sticking to the older version is that I don't want to lose any of my data by upgrading, and I also don't want to stop the live mongo server.
The result I want should be something similar to this query's result:
db.data.find({$text: { $search: 'query' } }, { score: {$meta: "textScore" }})
This query works fine on the latest versions of MongoDB. Also, if you suggest using the latest version, please provide some references that can help me upgrade MongoDB safely.

This is a little bit of a catch-22, introduced mainly by text search capabilities being considered "experimental" in earlier versions. Aside from being in an earlier development phase, the implementation is entirely different, because the "whole" query and index API was re-written for MongoDB 2.6, largely in order to support the new types of indexes available and to make the API for working with the data consistent.
So prior versions implement text search via the "command" interface directly, and only that way. Things work a little differently, and the current "deprecation" notice means that working in this way will eventually be removed. But the "text" command will presently still operate as shown in the earlier documentation:
db.data.runCommand("text", { "search": "query" })
So there are limitations here, as covered in the existing documentation. Notably, the number of documents returned is bounded by the "limit" argument to that command, and there is no concept of "skip". Also, this is a "document" response and not a cursor, so the total results cannot exceed the BSON limit of 16MB.
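For example, a minimal sketch of running the command with an explicit limit from the mongo shell; the results, score and obj field names follow the 2.4 text-command documentation:
// run the 2.4 "text" command and walk the single-document response
var res = db.data.runCommand("text", { "search": "query", "limit": 50 });
res.results.forEach(function (r) {
    printjson({ "score": r.score, "doc": r.obj });
});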
That said, and straying a little off topic, consider your MongoDB 2.6 deployment scenario, mostly on the following points.
Future Proofing. In the earlier forms this is an experimental feature, so general flaws and problems are not going to be "backported" with fixes while you hang on to the old version. Some may be, but without a good reason to do so this mostly wanes over time. Remember, this is "experimental", so due warning was given about use in production.
Consistency/Deprecation. The API for "text" and "geospatial" has changed. So the implementation in earlier releases is different, "deprecated", and will go away. The right way is to have the same structure as other queries, and to use it consistently in all query forms rather than as a direct command.
Deployment. You say you don't want to stop the server, but you really should not have just one server anyway. Apart from going against the general philosophy of why you need MongoDB in the first place, at the very least a "replica set" is a good idea for data redundancy and the "uptime" of your application. Removing a single point of failure means that you can individually "bring down" discrete nodes and "upgrade" them without any application downtime.
So that strays "a little" off the programming topic, but for me, the last point is the most important. Better to make sure your application is without the failure points by building this into your deployment architecture. This then makes "staying ahead of the curve" a simpler decision. It is always worth noting the "experimental" clause with technologies before rolling out to production. Cover your bases.

Related

Lerna, conventional commits and long term support releases

We have a number of packages in a monorepo, managed by Lerna and with mandatory conventional commits. While everyone's on the same page at the HEAD of master/latest version, things work great. But we now have a need for creating long term support releases, i.e. a major release that we keep backporting fixes to.
How is this supposed to work with Lerna? E.g.
1. Say I have a#1.0.0 and b#1.0.0, and b depends on a.
2. I make a breaking change to a and publish, giving me a#2.0.0 and also b#1.0.1 due to the version bump.
3. I discover a bug in a, fix it on master and publish, which creates a#2.0.1 and also b#1.0.2.
4. I create a branch from point 1 above and backport the fix (for purposes of long term support). When I publish, it correctly tries to create a#1.0.1 but fails when trying to create b#1.0.1, since that version already exists.
Any ideas?

Can you make a non-cryptographically secure random number generator secure?

This is more of a fundamental question, but the context is specifically in terms of JavaScript. Given that Math.random is not cryptographically secure, can the results still be considered secure when it has been called a certain number of times that cannot be predicted?
So if I were to generate a 32-bit number using window.crypto.getRandomValues, for example, and select one of the digits as an iteration count – calling Math.random that number of times and using the last result – is the result still predictable?
The purpose of this is to generate a set of secure random numbers between 0 and 1 (exclusive) without having the ability to manually seed Math.random.
My initial thoughts are that the result shouldn't be predictable – but I want to make sure I'm not overlooking something crucial.
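For concreteness, the scheme being asked about looks roughly like this (a sketch; picking the low decimal digit is just one illustrative choice):
// derive a small, secret iteration count from the CSPRNG
var seedWord = crypto.getRandomValues(new Uint32Array(1))[0];
var iterations = (seedWord % 10) + 1; // one "digit" of the 32-bit value
var result;
for (var i = 0; i < iterations; i++) {
    result = Math.random(); // keep only the last output
}
// the question: is `result` now unpredictable?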
Here is a simple Math.random()-style CSPRNG drop-in:
Math.randomer = function () {
    return crypto.getRandomValues(new Uint32Array(1))[0] / Math.pow(2, 32);
};
// usage demo:
alert(Math.randomer());
Unlike the unsafe Math.random(), this code is throttled by crypto.getRandomValues (each call does real CSPRNG work, and the amount of data per call is capped), but that's probably a good thing, and you can still get dozens of KB per second out of it.
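If call overhead ever matters, a common variation (a sketch; the pool size is chosen arbitrarily) is to fill a buffer in one getRandomValues call and hand values out from it:
// refill a 1024-word pool in a single call, then serve values from it
var pool = new Uint32Array(1024);
var poolIndex = pool.length; // forces a refill on first use
Math.randomer = function () {
    if (poolIndex >= pool.length) {
        crypto.getRandomValues(pool); // one call yields 4 KB of randomness
        poolIndex = 0;
    }
    return pool[poolIndex++] / Math.pow(2, 32);
};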
Let's start with a warning; just in case
Honestly, I'm not sure why you would want to use something beyond window.crypto.getRandomValues (or its Linux equivalent /dev/random). If you're planning to "stretch" its output for some reason, chances are you're doing it wrong. Whatever your scenario is, don't hardcode such a seed into your script before serving it to clients. Not even if your .js file is created dynamically on the server side. That would be like shipping encrypted data together with your encryption key… voiding any security gains at the root.
That being said, let's look at your question along your line of thinking…
About your idea
The output of Math.random is insecure, as it produces predictable outputs. Meaning: given a sequence of outputs, an attacker can recover the internal state and predict the outputs that follow. Seeding it with a cryptographically secure seed from window.crypto.getRandomValues (or its Linux equivalent /dev/random) will not fix that problem.
As a more secure approach you might want to take a look at ChaCha20, which is a cryptographically secure stream cipher. It definitely produces more secure output than Math.random, and I've seen several pure vanilla implementations of ChaCha20 on GitHub et al. So, using something "safer" than Math.random shouldn't be all too hard to implement in your script(s). Seed ChaCha20 with window.crypto.getRandomValues (or its Linux equivalent /dev/random) as you were planning to do, and you're set.
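To make that concrete, here is a compact, unaudited sketch of a ChaCha20-keyed generator seeded once from getRandomValues. It follows the IETF variant of the cipher, but treat it as an illustration of the idea, not a vetted library:
// returns a Math.random-style function whose keystream is ChaCha20,
// keyed once from the browser CSPRNG
function chachaRandom() {
    var rotl = function (x, n) { return ((x << n) | (x >>> (32 - n))) >>> 0; };
    var key = crypto.getRandomValues(new Uint32Array(8));   // 256-bit key
    var nonce = crypto.getRandomValues(new Uint32Array(3)); // 96-bit nonce
    var counter = 0, block = null, offset = 16;

    function quarterRound(s, a, b, c, d) {
        s[a] += s[b]; s[d] = rotl(s[d] ^ s[a], 16);
        s[c] += s[d]; s[b] = rotl(s[b] ^ s[c], 12);
        s[a] += s[b]; s[d] = rotl(s[d] ^ s[a], 8);
        s[c] += s[d]; s[b] = rotl(s[b] ^ s[c], 7);
    }

    function nextBlock() {
        // state layout: "expand 32-byte k" constants, key, block counter, nonce
        var init = new Uint32Array(16);
        init.set([0x61707865, 0x3320646e, 0x79622d32, 0x6b206574]);
        init.set(key, 4);
        init[12] = counter++;
        init.set(nonce, 13);
        var s = Uint32Array.from(init);
        for (var i = 0; i < 10; i++) { // 20 rounds = 10 double rounds
            quarterRound(s, 0, 4, 8, 12); quarterRound(s, 1, 5, 9, 13);
            quarterRound(s, 2, 6, 10, 14); quarterRound(s, 3, 7, 11, 15);
            quarterRound(s, 0, 5, 10, 15); quarterRound(s, 1, 6, 11, 12);
            quarterRound(s, 2, 7, 8, 13); quarterRound(s, 3, 4, 9, 14);
        }
        for (var j = 0; j < 16; j++) s[j] = (s[j] + init[j]) >>> 0;
        block = s; offset = 0;
    }

    return function () {
        if (offset >= 16) nextBlock();
        return block[offset++] / 4294967296; // float in [0, 1)
    };
}
// usage: var rand = chachaRandom(); console.log(rand());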
But…
Please note that I haven't dived into the use of JavaScript for crypto purposes itself. Doing so tends to introduce attack vectors, which is why you'd (at least) need HTTPS when your project is served online. I'll have to skip all the other related nitpicks… mainly because you didn't mention such details in your question, but also to prevent this answer from getting too broad/long. A quick search on Security.SE will enlighten you about the issues of using JavaScript for crypto.
Instead - use the Web Cryptographic API
Last but not least, I'd like to get back to what I said at the start and point out that you might as well simply use window.crypto.getRandomValues (or its Linux equivalent /dev/random) for all randomness purposes. The speed gains of not doing so are minimal in most scenarios.
Crypto is hard… don't break your neck trying to solve problems on your own. Even for JavaScript, an applicable solution already exists:
Web Cryptographic API - Example:
/* assuming that window.crypto.getRandomValues is available */
var array = new Uint32Array(10);
window.crypto.getRandomValues(array);
console.log("Your lucky numbers:");
for (var i = 0; i < array.length; i++) {
    console.log(array[i]);
}
See, most modern browsers support at least enough of the Crypto API to allow your clients to call window.crypto.getRandomValues() from within JavaScript, which is practically a call to the system's CSPRNG (/dev/random and friends).
The WebCrypto API was enabled by default starting in Chrome 37 (August 26, 2014)
Mozilla Firefox supports it
Internet Explorer 11 supports it
etc.
Some final words regarding polyfills
If you really must support outdated browsers, decent polyfills can close the gap. But when it comes to security, both "using old browsers" and "using polyfills" are nightmares waiting to go wrong. Instead, be professional and educate clients about the fact that it's easier to upgrade to a newer browser than to pick up polyfills and the problems that come with them.
Murphy's Law applies here: When using polyfills for security/cryptography, what can go wrong will go wrong!
In the end, it's always better to be safe and not use polyfills just to support some outdated browsers, than to be sorry when stuff hits the fan. A browser update will cost your client a few minutes. A cryptographic polyfill that fails ruins your reputation forever. Remember that!

JS "upgrade" patterns

Are there any patterns for cases when something (in my case, filters) is stored on the client (e.g. localStorage) and you need to run a script once per user/version to migrate the stored data? For example, initially there is a filter saved in localStorage under the key myFilter. After some time you decide you need to separate filters per environment, so you need separate dev-myFilter, train-myFilter, etc. You update your code to work with environment-dependent filters, but there are users who still have the old myFilter, and you want the next deployed version to run a script which updates the key of the saved filter if there is one.
Question is - what are patterns/best practices for that?
I don't know about "best practices", but the obvious technical solution, just like with any API or storage format, is to store a version number alongside the data. If you didn't do so from the start, assume version == 1 when absent.
You may be able to avoid this if the data structure is so unique between versions that the version can be determined simply by examining it.
Either way, you simply perform the translation whenever you spot that the user's data is in the old format.
The downside of this is that you have to keep checking; for a web application this is unlikely to be a bottleneck, but if you can make your data forward-compatible from the outset then you may save a bit of processing time on each request. But for the data to be useful you've got to read it anyway, so a little branching for as long as you wish to maintain backward-compatibility is, again, unlikely to be a big problem.
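A sketch of that versioned-storage pattern, applied to the filter example from the question (the key and version names here are illustrative, not prescribed):
// lazily migrate a v1 "myFilter" entry to the per-environment v2 layout
var FILTER_VERSION = 2;

function loadFilter(env) {
    var version = Number(localStorage.getItem("myFilterVersion") || 1);
    if (version < 2) {
        // v1 stored a single, environment-independent filter under "myFilter"
        var legacy = localStorage.getItem("myFilter");
        if (legacy !== null) {
            localStorage.setItem(env + "-myFilter", legacy);
            localStorage.removeItem("myFilter");
        }
        localStorage.setItem("myFilterVersion", String(FILTER_VERSION));
    }
    return localStorage.getItem(env + "-myFilter");
}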

PouchDB + Conflict Resolution

I have a really simple question about hard topic:
How does conflict resolution work in PouchDB?
I looked at the documentation, and also did some quick googling, but it didn't help. So, how do I handle conflict management in my application, which is using PouchDB?
Here's how you do it in CouchDB, which you can directly translate into PouchDB terms since the APIs are exactly the same.
You fetch a document, using conflicts=true to ask for conflicts (get() with {conflicts:true} in PouchDB):
http://localhost:5984/db1/foo?conflicts=true
You receive a doc like this:
{
  "_id": "foo",
  "_rev": "2-f3d4c66dcd7596419c76b2498b3ba21f",
  "notgonnawork": "this is from the second db",
  "_conflicts": ["2-c1592ce7b31cc26e91d2f2029c57e621"]
}
There is a conflict introduced from another database, and that database's revision has (randomly) won. If you used bi-directional replication, both databases will provide this same answer.
Notice that both revisions start with "2-". This indicates that they are both the second revision to the document, and they both live at the same level of the revision tree.
Using the revision ID, you fetch the conflicting version (get() with {rev: ...} in PouchDB):
http://localhost:5984/db1/foo?rev=2-c1592ce7b31cc26e91d2f2029c57e621
You receive:
{
  "_id": "foo",
  "_rev": "2-c1592ce7b31cc26e91d2f2029c57e621",
  "notgonnawork": "this is from the first database"
}
After presenting the two conflicting versions to the user, you can then PUT (put()) a third revision on top of both of these. Your third version can combine the results, choose the loser, or whatever you want.
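Put together in PouchDB terms, the whole flow looks roughly like this (a sketch; the db/doc names are illustrative and error handling is omitted):
// fetch the winner plus conflicts, fetch the losing rev, write a merged
// revision on top, then delete the losing revision to resolve the conflict
var db = new PouchDB('db1');

db.get('foo', { conflicts: true }).then(function (winner) {
    if (!winner._conflicts) return; // nothing to resolve
    return db.get('foo', { rev: winner._conflicts[0] }).then(function (loser) {
        var merged = {
            _id: winner._id,
            _rev: winner._rev,
            notgonnawork: winner.notgonnawork + ' | ' + loser.notgonnawork
        };
        return db.put(merged).then(function () {
            return db.remove(loser._id, loser._rev);
        });
    });
});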
Advanced reading:
The CouchDB docs on conflict resolution
The CouchDB wiki page on conflicts.
Understanding CouchDB Conflicts by Jan Lehnardt

CouchDB: How to change view function via javascript?

I am playing around with CouchDB to test if it is "possible" [1] to store scientific data (simulated and experimental raw data + metadata). A big pro is the schema-less approach of CouchDB: we have to be very flexible with the metadata, as the set of parameters changes very often.
Up to now I have some code to feed raw data, plots (both as attachments), and hierarchical metadata (as JSON) into CouchDB documents, and have written some prototype JavaScript for filtering and display. But the filtering is done on the client side (i.e. in the browser): the map function simply returns everything.
How could I change the (or push a second) map function of a specific _design-document with simple browser-JS?
I do not think that a temporary view would yield any performance gain...
Thanks for your time and answers.
[1]: of course it is possible, but is it also useful? feasible? reasonable?
[added]
Ah, the jquery.couch.js (version 0.9.0) provides a saveDoc() function, which could update the _design document with the new map function.
But I also tried the query function, which uses a temporary view. Okay, "do not use this in the real product, only during development"... But scientific research is steady development, right?
Temporary views get cached, as I noticed, and this works well for ~1000 documents per DB. A second plus: all users (think 1 to 3, so big user management would be quite an overkill) can work with their own temporary views.
Never ever use temporary views. They are really only there for dev and debugging purposes. For more information, see http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views (specifically the bold "NOTE").
And yes, because design documents are really just documents with special powers, you can run your GET/POST/PUT/DELETE methods on them. However, you will usually need admin privileges to do this. So, if you are allowing a client-side piece of software to do that, you are making your entire database public for read/write access – this may be fine for your application, but it is important to remember.
E.g., if you restrict access to your database but put the username and password in client-side JavaScript, then anyone can see that username and password.
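With that caveat in mind, a sketch of updating a view's map function from the browser via jquery.couch.js; the database name, design doc and view name here are made up for illustration:
// open the design doc, add/replace a view, and save it back (needs admin rights)
var db = $.couch.db("mydb");
db.openDoc("_design/filters", {
    success: function (ddoc) {
        ddoc.views = ddoc.views || {};
        ddoc.views.by_param = {
            map: "function (doc) { if (doc.metadata && doc.metadata.param) { emit(doc.metadata.param, null); } }"
        };
        db.saveDoc(ddoc, {
            success: function () { console.log("design doc updated"); }
        });
    }
});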
Cheers.
I've written some helper functions for jquery.couch and design docs; take a look at:
https://github.com/grischaandreew/jquery.couch.js
