Can you make a non-cryptographically secure random number generator secure? - javascript

This is more of a fundamental question, but the context is specifically in terms of JavaScript. Given that Math.random is not cryptographically secure, can the results still be considered secure when it has been called a certain number of times that cannot be predicted?
So if I were to generate a 32-bit number using window.crypto.getRandomValues, for example, select one of its digits as an iteration count, call Math.random that many times, and use the last result, is that result still predictable?
The purpose of this is to generate a set of secure random numbers between 0 and 1 (exclusive) without having the ability to manually seed Math.random.
My initial thoughts are that the result shouldn't be predictable – but I want to make sure I'm not overlooking something crucial.

Here is a simple Math.random()-style CSPRNG drop-in:
Math.randomer = function () {
  return crypto.getRandomValues(new Uint32Array(1))[0] / Math.pow(2, 32);
};
// usage demo:
alert(Math.randomer());
Unlike the unsafe Math.random(), this code is rate-limited by crypto.getRandomValues (each call is slower and capped at 65,536 bytes), but that's probably a good thing, and you can still get dozens of KB of random values per second with it.
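If you need many values at once, it is usually cheaper to fill one larger typed array per call than to call getRandomValues once per number. A minimal sketch of that batching approach (the batch size below is arbitrary):
// Fill a whole buffer in one call, then map each 32-bit word to [0, 1).
function randomBatch(count) {
  const words = crypto.getRandomValues(new Uint32Array(count)); // count * 4 bytes must stay <= 65,536
  const out = new Array(count);
  for (let i = 0; i < count; i++) {
    out[i] = words[i] / Math.pow(2, 32); // uniform in [0, 1), like Math.random()
  }
  return out;
}
// usage demo:
console.log(randomBatch(1024).slice(0, 5));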

Let's start with a warning; just in case
Honestly, I'm not sure why you would want to use something beyond window.crypto.getRandomValues (or its Linux equivalent /dev/random). If you're planning to "stretch" its output for some reason, chances are you're doing it wrong. Whatever your scenario is, don't hardcode such a seed into your script before serving it to clients. Not even if your .js file is created dynamically on the server side. That would be like shipping encrypted data together with your encryption key… voiding any security gains at the root.
That being said, let's look at your question in your line of thinking…
About your idea
The output of Math.random is insecure because it is predictable: given a sequence of outputs, an attacker can recover the internal state and predict the outputs that follow. Seeding it with a cryptographically secure seed from window.crypto.getRandomValues (or its Linux equivalent /dev/random) will not fix that problem.
As a more secure approach you might want to take a look at ChaCha20, which is a cryptographically secure stream cipher. It produces far less predictable output than Math.random, and I've seen several pure vanilla JavaScript implementations of ChaCha20 on GitHub et al. So using something safer than Math.random shouldn't be too hard to add to your script(s). Seed ChaCha20 with window.crypto.getRandomValues (or its Linux equivalent /dev/random) as you were planning to do and you're set.
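For illustration only, here is a sketch of what the seeding and extraction could look like; chacha20Block(key, nonce, counter) is a hypothetical stand-in for whichever ChaCha20 implementation you pick, assumed to return one 64-byte keystream block:
// chacha20Block(key, nonce, counter) is hypothetical: swap in the function
// exported by the ChaCha20 implementation you actually choose.
const key = crypto.getRandomValues(new Uint8Array(32));   // 256-bit seed
const nonce = crypto.getRandomValues(new Uint8Array(12)); // 96-bit nonce
let counter = 0;
let pool = new Uint8Array(0);
let offset = 0;
function nextRandom() {
  if (offset + 4 > pool.length) {               // refill the keystream pool
    pool = chacha20Block(key, nonce, counter++);
    offset = 0;
  }
  const view = new DataView(pool.buffer, pool.byteOffset + offset, 4);
  offset += 4;
  return view.getUint32(0) / Math.pow(2, 32);   // uniform in [0, 1)
}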
But…
Please note that I haven't dived into the use of JavaScript for crypto purposes itself. Doing so tends to introduce attack vectors, which is why you'd (at least) need HTTPS when your project is served online. I'll have to skip the other related nitpicks… mainly because you didn't mention such details in your question, but also to keep this answer from getting too broad/long. A quick search on Security.SE will enlighten you about the issues with using JavaScript for crypto.
Instead - use the Web Cryptographic API
Last but not least, I'd like to get back to what I said at the start and point out that you might as well simply use window.crypto.getRandomValues (or its Linux equivalent /dev/random) for all randomness purposes. The speed gains of not doing so are minimal in most scenarios.
Crypto is hard… don't break your neck trying to solve problems on your own. Even for JavaScript, an applicable solution already exists:
Web Cryptographic API - Example:
/* assuming that window.crypto.getRandomValues is available */
var array = new Uint32Array(10);
window.crypto.getRandomValues(array);
console.log("Your lucky numbers:");
for (var i = 0; i < array.length; i++) {
  console.log(array[i]);
}
See, most modern browsers support at least enough of the Crypto API to let your clients call window.crypto.getRandomValues() from within JavaScript - which is practically a call to the system's randomness source (e.g. /dev/random).
The WebCrypto API was enabled by default starting in Chrome 37 (August 26, 2014)
Mozilla Firefox supports it
Internet Explorer 11 supports it
etc.
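If you want to smooth over the remaining differences yourself (IE11 exposes the API under the vendor prefix msCrypto), a small feature-detection wrapper is usually enough; a sketch:
// Pick whichever crypto object the browser exposes (msCrypto covers IE11).
var cryptoObj = window.crypto || window.msCrypto;
if (cryptoObj && cryptoObj.getRandomValues) {
  var buf = new Uint32Array(10);
  cryptoObj.getRandomValues(buf);
  console.log(buf);
} else {
  // No CSPRNG available: fail hard instead of silently falling back to Math.random.
  throw new Error("No cryptographically secure RNG available");
}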
Some final words regarding polyfills
If you really must support outdated browsers, decent polyfills can close the gap. But when it comes to security, both "using old browsers" and "using polyfills" are nightmares waiting to go wrong. Instead, be professional and educate clients about the fact that it's easier to upgrade to a newer browser than to pick up polyfills and the problems that come with them.
Murphy's Law applies here: When using polyfills for security/cryptography, what can go wrong will go wrong!
In the end, it's always better to be safe and not use polyfills just to support some outdated browsers, than to be sorry when stuff hits the fan. A browser update will cost your client a few minutes. A cryptographic polyfill that fails ruins your reputation forever. Remember that!

Writing high-performance Javascript code without getting deoptimised

When writing performance-sensitive code in Javascript which operates on large numeric arrays (think a linear algebra package, operating on integers or floating-point numbers), one always wants the JIT to help out as much as possible. Roughly this means:
We always want our arrays to be packed SMIs (small integers) or packed Doubles, depending on whether we're doing integer or floating-point calculations.
We always want to be passing the same type of thing to functions, so that they don't get labelled "megamorphic" and deoptimised. For instance, we always want to be calling vec.add(x, y) with both x and y being packed SMI arrays, or both packed Double arrays.
We want functions to be inlined as much as possible.
When one strays outside of these cases, a sudden and drastic performance dropoff occurs. This can happen for various innocuous reasons:
You might turn a packed SMI array into a packed Double array via a seemingly innocuous operation, like the equivalent of myArray.map(x => -x). This is actually the "best" bad case, since packed Double arrays are still very fast.
You might turn a packed array into a generic boxed array, for example by mapping the array over a function which (unexpectedly) returned null or undefined. This bad case is fairly easy to avoid.
You might deoptimise a whole function such as vec.add() by passing in too many types of things and turning it megamorphic. This could happen if you want to do "generic programming", where vec.add() is used both in cases where you're not being careful about types (so it sees a lot of types come in) and in cases where you want to eke out maximum performance (it should only ever receive boxed doubles, for instance).
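For concreteness, a minimal sketch of the first two cases (the comments describe V8's packed-SMI / packed-double / generic element kinds, which the engine does not expose directly, so they state the expected behaviour rather than something you can assert):
const a = [0, 1, 2];                 // packed SMI elements
// -0 cannot be represented as a SMI, so the result becomes a packed
// double array (the "best" bad case: still fast).
const b = a.map(x => -x);            // [-0, -1, -2]
// A mapper that can return undefined/null yields a generic boxed array,
// which is the case you actually want to avoid.
const c = a.map(x => (x > 0 ? x : undefined)); // [undefined, 1, 2]
// Mixing integers and fractional values also gives (at best) doubles.
const d = [1, 2.5, 3];               // packed doubles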
My question is more of a soft question, about how one writes high-performance Javascript code in light of the considerations above, while still keeping the code nice and readable. Some specific sub-questions so that you know what kind of answer I'm aiming for:
Is there a set of guidelines somewhere on how to program while staying in the world of packed SMI arrays (for instance)?
Is it possible to do generic high-performance programming in Javascript without using something like a macro system to inline things like vec.add() into callsites?
How does one modularise high-performance code into libraries in light of things like megamorphic call sites and deoptimisations? For instance, if I am happily using Linear Algebra package A at high speed, and then I import a package B that depends on A, but B calls it with other types and deoptimises it, suddenly (without my code changing) my code runs slower.
Are there any good, easy-to-use measurement tools for checking what the Javascript engine is doing internally with types?
V8 developer here. Given the amount of interest in this question, and the lack of other answers, I can give this a shot; I'm afraid it won't be the answer you were hoping for though.
Is there a set of guidelines somewhere on how to program while staying in the world of packed SMI arrays (for instance)?
Short answer: it's right here: const guidelines = ["keep your integers small enough"].
Longer answer: giving a comprehensive set of guidelines is difficult for various reasons. In general, our opinion is that JavaScript developers should write code that makes sense to them and their use case, and JavaScript engine developers should figure out how to run that code fast on their engines. On the flip side, there are obviously some limitations to that ideal, in the sense that some coding patterns will always have higher performance costs than others, regardless of engine implementation choices and optimization efforts.
When we talk about performance advice, we try to keep that in mind, and carefully estimate what recommendations have a high likelihood of remaining valid across many engines and many years, and also are reasonably idiomatic/non-intrusive.
Getting back to the example at hand: using Smis internally is supposed to be an implementation detail that user code doesn't need to know about. It'll make some cases more efficient, and shouldn't hurt in other cases. Not all engines use Smis (for example, AFAIK Firefox/Spidermonkey historically hasn't; I've heard that for some cases they do use Smis these days; but I don't know any details and can't speak with any authority on the matter). In V8, the size of Smis is an internal detail, and has actually been changing over time and over versions. On 32-bit platforms, which used to be the majority use case, Smis have always been 31-bit signed integers; on 64-bit platforms they used to be 32-bit signed integers, which recently seemed like the most common case, until in Chrome 80 we shipped "pointer compression" for 64-bit architectures, which required lowering Smi size to the 31 bits known from 32-bit platforms. If you happened to have based an implementation on the assumption that Smis are typically 32 bits, you'd get unfortunate situations like this.
Thankfully, as you noted, double arrays are still very fast. For numerics-heavy code, it probably makes sense to assume/target double arrays. Given the prevalence of doubles in JavaScript, it is reasonable to assume that all engines have good support for doubles and double arrays.
Is it possible to do generic high-performance programming in Javascript without using something like a macro system to inline things like vec.add() into callsites?
"generic" is generally at odds with "high-performance". This is unrelated to JavaScript, or to specific engine implementations.
"Generic" code means that decisions have to be made at runtime. Every time you execute a function, code has to run to determine, say, "is x an integer? If so, take that code path. Is x a string? Then jump over here. Is it an object? Does it have .valueOf? No? Then maybe .toString()? Maybe on its prototype chain? Call that, and restart from the beginning with its result". "High-performance" optimized code is essentially built on the idea to drop all these dynamic checks; that's only possible when the engine/compiler has some way to infer types ahead of time: if it can prove (or assume with high enough probability) that x is always going to be an integer, then it only needs to generate code for that case (guarded by a type check if unproven assumptions were involved).
Inlining is orthogonal to all this. A "generic" function can still get inlined. In some cases, the compiler might be able to propagate type information into the inlined function to reduce polymorphism there.
(For comparison: C++, being a statically compiled language, has templates to solve a related problem. In short, they let the programmer explicitly instruct the compiler to create specialized copies of functions (or entire classes), parameterized on given types. That's a nice solution for some cases, but not without its own set of drawbacks, for example long compile times and large binaries. JavaScript, of course, has no such thing as templates. You could use eval to build a system that's somewhat similar, but then you'd run into similar drawbacks: you'd have to do the equivalent of the C++ compiler's work at runtime, and you'd have to worry about the sheer amount of code you're generating.)
How does one modularise high-performance code into libraries in light of things like megamorphic call sites and deoptimisations? For instance, if I am happily using Linear Algebra package A at high speed, and then I import a package B that depends on A, but B calls it with other types and deoptimises it, suddenly (without my code changing) my code runs slower.
Yes, that's a general problem with JavaScript. V8 used to implement certain builtins (things like Array.sort) in JavaScript internally, and this problem (which we call "type feedback pollution") was one of the primary reasons why we have entirely moved away from that technique.
That said, for numerical code, there aren't all that many types (only Smis and doubles), and as you noted they should have similar performance in practice, so while type feedback pollution is indeed a theoretical concern, and in some cases can have significant impact, it's also fairly likely that in linear algebra scenarios you won't see a measurable difference.
Also, inside the engine there are many more situations than "one type == fast" and "more than one type == slow". If a given operation has seen both Smis and doubles, that's totally fine. Loading elements from two kinds of arrays is fine too. We use the term "megamorphic" for the situation when a load has seen so many different types that it's given up on tracking them individually and instead uses a more generic mechanism that scales better to large numbers of types -- a function containing such loads can still get optimized. A "deoptimization" is the very specific act of having to throw away optimized code for a function because a new type is seen that hasn't been seen previously, and that the optimized code therefore isn't equipped to handle. But even that is fine: just go back to unoptimized code to collect more type feedback, and optimize again later. If this happens a couple of times, then it's nothing to worry about; it only becomes a problem in pathologically bad cases.
So the summary of all that is: don't worry about it. Just write reasonable code, let the engine deal with it. And by "reasonable", I mean: what makes sense for your use case, is readable, maintainable, uses efficient algorithms, doesn't contain bugs like reading beyond the length of arrays. Ideally, that's all there is to it, and you don't need to do anything else. If it makes you feel better to do something, and/or if you're actually observing performance issues, I can offer two ideas:
Using TypeScript can help. Big fat warning: TypeScript's types are aimed at developer productivity, not execution performance (and as it turns out, those two perspectives have very different requirements from a type system). That said, there is some overlap: e.g. if you consistently annotate things as number, then the TS compiler will warn you if you accidentally put null into an array or function that's supposed to only contain/operate on numbers. Of course, discipline is still required: a single number_func(random_object as number) escape hatch can silently undermine everything, because the correctness of the type annotations is not enforced anywhere.
Using TypedArrays can also help. They have a little more overhead (memory consumption and allocation speed) per array compared to regular JavaScript arrays (so if you need many small arrays, then regular arrays are probably more efficient), and they're less flexible because they can't grow or shrink after allocation, but they do provide the guarantee that all elements have exactly one type.
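A small sketch of that trade-off:
// Regular array: flexible (it can grow and hold anything), but its element
// representation can silently degrade to boxed values.
const flexible = [];
for (let i = 0; i < 4; i++) flexible.push(i * 0.5);
// Float64Array: fixed length and a little per-array overhead, but every
// element is guaranteed to be an unboxed 64-bit float.
const vec = new Float64Array(4);
for (let i = 0; i < 4; i++) vec[i] = i * 0.5;
// vec.push(1.0);     // not possible: TypedArrays cannot grow or shrink
// vec[0] = "oops";   // would be coerced to NaN rather than stored as a string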
Are there any good easy to use measurement tools for checking what the Javascript engine is doing internally with types?
No, and that's intentional. As explained above, we don't want you to specifically tailor your code to whatever patterns V8 can optimize particularly well today, and we don't believe that you really want to do that either. That set of things can change in either direction: if there's a pattern you'd love to use, we might optimize for that in a future version (we have previously toyed with the idea of storing unboxed 32-bit integers as array elements... but work on that hasn't started yet, so no promises); and sometimes if there's a pattern we used to optimize for in the past, we might decide to drop that if it gets in the way of other, more important/impactful optimizations. Also, things like inlining heuristics are notoriously difficult to get right, so making the right inlining decision at the right time is an area of ongoing research and corresponding changes to engine/compiler behavior; which makes this another case where it would be unfortunate for everyone (you and us) if you spent a lot of time tweaking your code until some set of current browser versions does approximately the inlining decisions you think (or know?) are best, only to come back half a year later to realize that then-current browsers have changed their heuristics.
You can, of course, always measure performance of your application as a whole -- that's what ultimately matters, not what choices specifically the engine made internally. Beware of microbenchmarks, for they are misleading: if you only extract two lines of code and benchmark those, then chances are that the scenario will be sufficiently different (e.g., different type feedback) that the engine will make very different decisions.

Is there a better way than crypto.randomBytes to generate unique ids, performance-wise?

The Node.js documentation strongly discourages the use of crypto.randomBytes(). However, as I read in a Stack Overflow answer, among all the methods of random string generation (such as using timestamps, etc.), the best way to achieve the highest entropy is crypto.randomBytes().
I would like to use this uuid strategy to generate validation keys in my node.js system. Is there any better way, performance-wise?
If you want to use CSPRNG, not really.
Using uuid was suggested, but it simply calls crypto.randomBytes(16) and converts it to a hex string. randomBytes blocking isn't really a problem, because it offers an asynchronous API as well (the second argument is a callback). When generating such small amounts of data, using the sync API might be faster though.
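A minimal sketch of both forms (16 random bytes rendered as a hex id, roughly what uuid v4 does under the hood, minus the version bits):
const crypto = require('crypto');
// Synchronous: fine for small amounts of data, simplest to use.
const idSync = crypto.randomBytes(16).toString('hex');
// Asynchronous: hands the work to the threadpool and never blocks
// the event loop while waiting.
crypto.randomBytes(16, (err, buf) => {
  if (err) throw err;
  const idAsync = buf.toString('hex');
  console.log(idSync, idAsync);
});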
The docs do still mention that a lack of entropy may cause a longer block than usual. It should only be a problem right after boot, though, and even in that case blocking can be avoided by using the asynchronous API.
The crypto.randomBytes() method will not complete until there is sufficient entropy available. This should normally never take longer than a few milliseconds. The only time when generating the random bytes may conceivably block for a longer period of time is right after boot, when the whole system is still low on entropy.

Am I hashing passwords correctly?

My current project is my first in Node.js (also using MongoDB, Mongoose, and Express, if it matters), and being easily distracted, I have fallen down the rabbit hole of crypto while deciding how to handle user authentication. (No other encryption is needed on this project).
Following the pattern on this page (the pattern, not the code - I am having problems installing node.bcrypt but not node-sodium) and also this page, my process is:
new user submits password over https
the schema generates a salt
schema hashes a concatenation of the password and salt
schema stores the salt and the password with the user information
Now I don't know if this is my personal deficiency, but I am having trouble following the libsodium documentation. node-sodium does not provide any additional information for hashing (though it does have an example for encryption).
This is the code I want to use to generate the hash:
let buf = Buffer.alloc(sodium.crypto_pwhash_STRBYTES); // Buffer.alloc replaces the deprecated new Buffer()
sodium.randombytes_buf(buf, sodium.crypto_pwhash_STRBYTES);
let salt = buf.toString();
let preBuffer = "somePass" + salt;
let passwordBuf = Buffer.from(preBuffer);
let hash = sodium.crypto_pwhash_str(passwordBuf, sodium.crypto_pwhash_OPSLIMIT_INTERACTIVE, sodium.crypto_pwhash_MEMLIMIT_INTERACTIVE);
So the question is two parts. Is this a good process, and is the code appropriate?
I've used the scrypt-for-humans package in the past for exactly this reason.
https://github.com/joepie91/scrypt-for-humans
Scrypt is a very secure hashing function and this higher-level wrapper makes it hard for you to mess anything up. It's also specifically designed for securely hashing passwords, so that's a positive as well :)
At the moment the best password hashing algorithm is Argon2. There is a module called secure-password written by Emil Bay. He talks more about cryptographically secure password hashing and best practices on this podcast. Here is a snippet of what he said about Argon2.
Normally when you lay out a threat model, perfect security from a mathematical point of view is almost never practical. (In cryptography this can be referred to as perfect secrecy, which means that even if you have an enormous computer the size of the universe, it doesn't matter how big it is, you can never break the security; but that's not really practical in the real world.) Instead you go for something called computational secrecy. Which means you can break this, but it will cost you too much money and take too much time.
The goal of these hash functions is to make it so expensive to brute force these algorithms that there would be no point in trying. In a threat model, you know that you are not going to get perfect security, but you can make it so expensive for your adversary to attack you that it isn't worth their while.
Argon2 has two parameters that make it resistant to large-scale GPU attacks. You can control how much memory the function is allowed to use, and you can control how much computation time it takes to make a hash. A CPU usually has a lot of memory but a few cores. A GPU has very little memory but thousands of cores. Argon2 dials up so much memory that you can only do about 4 or 8 simultaneous Argon2 hashes on a single GPU, which makes it too expensive to try and crack. In secure-password, I've taken the values that Frank Denis, who made libsodium (which it's built on), figured out. They are within the bounds of what an interactive service like a website can afford, creating reasonable security without slowing things down. To hash a password you need about 16 or 32 MB of memory, and those parameters can be controlled in Argon2.
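For reference, a rough sketch of what this can look like with the argon2 npm package (not mentioned above, but it exposes the same Argon2id primitive; secure-password offers a similar hash/verify API on top of sodium-native). The cost parameters are illustrative, not recommendations:
const argon2 = require('argon2');
async function register(password) {
  // The returned string encodes the salt and the parameters alongside the hash.
  return argon2.hash(password, {
    type: argon2.argon2id,
    memoryCost: 2 ** 15, // in KiB, i.e. ~32 MB, as discussed above
    timeCost: 3,
  });
}
async function login(password, storedHash) {
  return argon2.verify(storedHash, password); // resolves to true or false
}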
Personally I've used crypto and I do exactly the same 4 steps you are doing right now (after checking a few conditions: at least 7 characters, one symbol, one number…). I'll share the code using crypto.
var crypto = require('crypto');
var rand = require('csprng'); // presumably the 'csprng' package: rand(bits, radix)
var salt = rand(160, 36);
var salted_pass = salt + password;
var token = crypto.randomBytes(64).toString('hex'); // I even generate a token for my users
var hashed_password = crypto.createHash('sha512').update(salted_pass).digest("hex");
EDIT: Warning, this is not a completely safe method of doing it, as it may turn out to be predictable. Refer to the comments below, which explain why it is not a good method.

Generating a short, pseudo-random verifiable alpha numeric code

I have a situation where I need to generate short pseudo-random alphanumeric tokens which are unique, verifiable, and easily type-able by a human. These will be generated from a web app. The tokens don't need to be highly secure - they're used in a silly web game to claim a silly prize. For various reasons, the client wants these tokens to be human-readable and handled via email. This is non-negotiable (I know... but this is how it has to be for reasons beyond my control).
In other words, let's say we get the code "ABCDE12345"
There has to be a way to say "ABCDE12345" is "valid". For example: maybe two or three characters at the start, run through an algorithm I write, will generate the right sequence of remaining characters. E.g., f("AB") === "CDE12345"
Two people playing the game shouldn't be likely to generate the same token. In my mind, I'd be happy to use the current time in millis + game-character name & score to seed a home-made RNG. (which is to say, NOT use Math.random, since this is a web app). This would seed the two or three character sequence mentioned above.
Am I missing anything? I'm not looking for a concrete algorithm but rather your suggestions. Anything I'm missing?
If you think your token is comparable to an authenticated message saying "give this person a prize" you could look at https://en.wikipedia.org/wiki/Hash-based_message_authentication_code, recoding as necessary with e.g. https://en.wikipedia.org/wiki/Base64 to make the thing printable. Of course, HMAC uses a secret key which you will have to KEEP secret. A public key signature system would not require that you keep the key secret, but I would expect the signature to be longer, and I expect that it is already too long for you if you want non-trivial security.
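A minimal sketch of that idea in Node.js (the payload fields and the 8-character truncation are illustrative; truncating the tag trades security for type-ability):
const crypto = require('crypto');
const SECRET = process.env.TOKEN_SECRET; // must be set and kept server-side
// Derive a short, printable tag from the player data.
function makeToken(playerName, score) {
  const payload = `${playerName}:${score}`;
  const tag = crypto
    .createHmac('sha256', SECRET)
    .update(payload)
    .digest('base64')       // printable
    .replace(/[+/=]/g, '')  // drop characters that are awkward to type
    .slice(0, 8)
    .toUpperCase();
  return `${payload}:${tag}`;
}
// Verify by recomputing the tag from the claimed payload.
function isValid(token) {
  const i = token.lastIndexOf(':');
  const payload = token.slice(0, i);
  const [name, score] = payload.split(':');
  return makeToken(name, Number(score)) === token;
}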
A simple solution (and an easy one to hack) would be to generate a meaningful term (one way to do so is to choose a random article from Wikipedia), encrypt it with a pre-known password, and take the least significant x bits.
Now, the key you generate is word-<x bits as a number>.
This is easily verifiable by machine: simply re-encrypt the word and check whether the bits match. It also offers a simple trade-off between readability and security (bigger x -> less readable, but harder to fake).
The main problem with this approach, though, is that if your game is not communicating with any server, you will need to ship the pre-shared secret to your clients somehow, and they will be able to reverse engineer it.
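A sketch of that scheme, using a truncated HMAC in place of "encrypt and take the least significant x bits" (the word list and the 20-bit truncation are illustrative):
const crypto = require('crypto');
const SECRET = 'pre-shared password';          // must stay secret
const WORDS = ['otter', 'quartz', 'meadow'];   // e.g. drawn from Wikipedia titles
const X_BITS = 20;                             // readability vs. security dial
function makeKey() {
  const word = WORDS[crypto.randomInt(WORDS.length)]; // Node 14.10+
  const mac = crypto.createHmac('sha256', SECRET).update(word).digest();
  const bits = mac.readUIntBE(0, 3) & ((1 << X_BITS) - 1); // keep x bits
  return `${word}-${bits}`;
}
function verifyKey(key) {
  const [word, bits] = key.split('-');
  const mac = crypto.createHmac('sha256', SECRET).update(word).digest();
  return (mac.readUIntBE(0, 3) & ((1 << X_BITS) - 1)) === Number(bits);
}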

efficient search in mongodb v2.4

I'm using version 2.4 of MongoDB, which is working fine for my needs except for one thing: searching, as it doesn't support some advanced options like $search. So, is there a way to implement that kind of searching in v2.4? The reason I'm sticking to the older version is that I don't want to lose any of my data by upgrading, and I also don't want to stop the live mongo server.
The result I want should be something similar to this query's result:
db.data.find({$text: { $search: 'query' } }, { score: {$meta: "textScore" }})
This query works fine on the latest versions of MongoDB. Also, if you suggest that I use the latest version, please provide some references that can help me upgrade MongoDB safely.
This is a little bit of a catch 22, introduced mainly by text search capabilities being considered "experimental" in earlier versions. Aside from being in an earlier development phase, the implementation is entirely different due to the fact that the "whole" query and index API has been re-written for MongoDB 2.6, largely in order to support the new types of indexes available and make the API for working with the data consistent.
So prior versions implement text search via the "command" interface directly and only. Things work a little differently and the current "deprecation" notice means that working in this way will be removed. But the "text" command will presently still operate as shown in the earlier documentation:
db.data.runCommand("text", { "search": "query" })
So there are limitations here, as covered in the existing documentation. Notably, the number of documents returned is capped by the "limit" argument to that command, and there is no concept of "skip". Also, this is a "document" response and not a cursor, so the total results cannot exceed the BSON limit of 16MB.
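For illustration, this is roughly what that looks like in the 2.4 shell (the response field names are abbreviated here):
// The limit is part of the command itself; there is no cursor to paginate.
db.data.runCommand("text", {
  search: "query",
  limit: 10,
  project: { title: 1 }  // optional projection of the returned fields
});
// The response is a single document, roughly of the form:
// { results: [ { score: 1.5, obj: { _id: ..., title: ... } }, ... ],
//   stats: { ... }, ok: 1 }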
That said, and going a little off topic, do consider your MongoDB 2.6 deployment scenario, mostly with respect to the following.
Future Proofing. In the earlier forms this is an experimental feature. So any general flaws and problems are not generally going to be "backported" with fixes while you hang on to that version. Some may be, but without a good reason to do so this mostly wanes over time. Remember this is "experimental", so due warning was given about use in production.
Consistency/Deprecation. The API for "text" and "geospatial" has changed. So implementation in earlier releases is different and "deprecated", and will go away. The right way is to have the same structure as other queries, and consistently use it in all query forms rather than a direct command.
Deployment. You say you don't want to stop the server, but you really should not have just one server anyway. Apart from running counter to the general philosophy of why you need MongoDB in the first place, at the very least a "replica set" is a good idea for data redundancy and the "uptime" of your application. Removing a single point of failure means that you can individually "bring down" discrete nodes and "upgrade" without affecting application downtime.
So that strays "a little" off the programming topic, but for me, the last point is the most important. Better to make sure your application is without the failure points by building this into your deployment architecture. This then makes "staying ahead of the curve" a simpler decision. It is always worth noting the "experimental" clause with technologies before rolling out to production. Cover your bases.
