MD5 for seeded-random number generation, better approaches?

I am making a game, it will likely be built in JavaScript - but this question is rather platform agnostic...
The game involves generation of a random campaign, however to dissuade hacking and reduce the amount of storage space needed to save game (which may potentially be cloud-based) I wanted the campaign generation to be seed based.
Trying to think of ways to accomplish this, I considered an MD5 based approach. For example, let's say at the start of the game the user is given the random seed "ABC123". When selecting which level template to use for each game level, I could generate MD5 hashes...
MD5("ABC123" + "level1"); // = 3f19bf4df62494495a3f23bedeb82cce
MD5("ABC123" + "level2"); // = b499e3184b3c23d3478da9783089cc5b
MD5("ABC123" + "level3"); // = cf240d23885e6bd0228677f1f3e1e857
Ideally, there are only 16 templates. There will be more, but for the sake of demonstration, if I take the first hex digit from each hash I have a random number out of 16 which I could reproduce with the same seed, forever.
Level 1 for this seed is always "3" (#3), Level 2 is always "b" (#11), Level 3 is always "c" (#12)
This approach has a few drawbacks I'm sure many will be quick to point out...
MD5 generation is CPU intensive, particularly if used in loops etc...
JavaScript doesn't come with a built-in MD5 function - you'll need to DIY...
That only gives you 16 numbers - or 256 if you use a second hex digit. How do you 'round' the number to your required range?
I considered this actually. Divide the number by the number of possible values (16, or 256...), then multiply it by the random range needed. As long as the range remains the same, so too will the result... but that too is a constraint...
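The scaling idea above can be sketched as follows. This is only a minimal sketch, assuming the digest is already available as a hex string; `hashToRange` is a hypothetical helper name, and it reads the first 8 hex digits (32 bits) instead of a single digit so that ranges larger than 16 are possible.

```javascript
// Hypothetical helper: map a hex digest to an integer in [0, range).
// Assumes `hex` has at least 8 hex digits (an MD5 digest has 32).
function hashToRange(hex, range) {
  const value = parseInt(hex.slice(0, 8), 16); // 0 .. 2^32 - 1
  // Divide by the number of possible values, then multiply by the range.
  return Math.floor((value / 0x100000000) * range);
}

// Digest quoted in the question for MD5("ABC123" + "level1").
const level1 = '3f19bf4df62494495a3f23bedeb82cce';
const template = hashToRange(level1, 16);    // equals the first hex digit
const enemyCount = hashToRange(level1, 100); // some value in 0..99
```

With a range of 16 this reduces to the "first hex digit" scheme from the question, so existing seeds keep their templates; changing the range changes every result, which is the constraint mentioned above.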
Given those drawbacks, is there a better approach which is simple and doesn't require an entire framework? In my case all I really need is an MD5 hash function and my implementation is basically complete.
Any advice is appreciated. I guess the "chosen answer" will be the suggestions or approach which is the most useful or practical given everything I've mentioned.

I think you overcomplicate the solution.
1) You don't need the MD5 hash. Since in your case the statistical quality of the hash is of no interest, almost any hash function would be satisfactory. You can use any string hash algorithm which is cheaper to evaluate. If you only accept ASCII characters, then the Pearson hash is also an option - it is fast, simple and easy to port to any language.
2) Do you really need string seeds from the user, or is a single integer seed also acceptable? If it is, you can use an integer hash function, which is significantly faster than a string hash algorithm and also very simple and easy to port.
3) Any decent pseudo-random number generator (PRNG) will give you a radically different sequence for each different seed value. This means that as the level increases you can simply increment the seed (++seed) and generate random numbers from that. I recommend using a simple, fast custom random number generator instead of JavaScript's Math.random(), which cannot be seeded. You can use some variant of xorshift.
With these 3 points all your listed drawbacks are addressed and no framework needed.
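The three points can be combined roughly as below. This is a sketch under stated assumptions, not a definitive implementation: FNV-1a stands in for "any cheap string hash", the MurmurHash3 32-bit finalizer stands in for "an integer hash function", and plain xorshift32 is the generator. The helper names (`hashString`, `mixInt`, `levelRng`, `pickTemplate`) are made up for illustration. The seed is mixed before use because xorshift32 started from adjacent raw seeds produces related first outputs.

```javascript
// FNV-1a over UTF-16 code units: a cheap 32-bit string hash.
function hashString(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Integer mixer (MurmurHash3 32-bit finalizer), so that seed, seed + 1, ...
// start the generator from unrelated states.
function mixInt(n) {
  let h = n >>> 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// xorshift32: returns a function producing floats in [0, 1).
function xorshift32(seed) {
  let state = (seed >>> 0) || 0x9e3779b9; // the state must never be zero
  return function next() {
    state ^= state << 13;
    state ^= state >>> 17;
    state ^= state << 5;
    state >>>= 0;
    return state / 0x100000000;
  };
}

// One independent, reproducible generator per (user seed, level) pair.
function levelRng(userSeed, level) {
  return xorshift32(mixInt(hashString(userSeed) + level));
}

function pickTemplate(userSeed, level, templateCount) {
  return Math.floor(levelRng(userSeed, level)() * templateCount);
}
```

Calling `pickTemplate('ABC123', 1, 16)` returns the same template index on every run and every machine, and the generator returned by `levelRng` can keep supplying numbers for the rest of that level's layout.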
I wouldn't worry about hacking. As @apokryfos pointed out in the comments, even your original solution with MD5 is not secure, and I think level generation in games is not the best example of where you need cryptography. Think about it: even big-title commercial games are hackable.

Related

Am I hashing passwords correctly?

My current project is my first in Node.js (also using MongoDB, Mongoose, and Express, if it matters), and being easily distracted, I have fallen down the rabbit hole of crypto while deciding how to handle user authentication. (No other encryption is needed on this project).
Following the pattern on this page (pattern, not code - I am having problems with installing node.bcrypt but not with node-sodium) and also this page my process is
1) new user submits password over https
2) the schema generates a salt
3) schema hashes a concatenation of the password and salt
4) schema stores the salt and the hashed password with the user information
Now I don't know if this is my personal deficiency, but I am having trouble following the libsodium documentation. node-sodium does not provide any additional information for hashing (though it does have an example for encryption).
This is the code I want to use to generate the hash:
let buf = Buffer.alloc(sodium.crypto_pwhash_STRBYTES); // new Buffer(size) is deprecated
sodium.randombytes_buf(buf, sodium.crypto_pwhash_STRBYTES);
let salt = buf.toString();
let preBuffer = "somePass" + salt;
let passwordBuf = Buffer.from(preBuffer);
let hash = sodium.crypto_pwhash_str(passwordBuf, sodium.crypto_pwhash_OPSLIMIT_INTERACTIVE, sodium.crypto_pwhash_MEMLIMIT_INTERACTIVE);
So the question is two parts. Is this a good process, and is the code appropriate?

I've used the scrypt-for-humans package in the past for exactly this reason.
https://github.com/joepie91/scrypt-for-humans
Scrypt is a key derivation function designed specifically for securely hashing passwords, and this higher-level wrapper makes it hard for you to mess anything up, so that's a positive as well :)
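For comparison, here is a minimal sketch of the same salt-hash-store process using scrypt. Note the assumption: it uses Node's built-in crypto module (`crypto.scryptSync`) rather than the scrypt-for-humans package, whose API is not shown here. It also relies on Node's default scrypt cost parameters, and the `salt:hash` storage format and the function names are made up for illustration.

```javascript
const crypto = require('crypto');

// Returns "saltHex:hashHex"; the salt is stored next to the hash.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);               // fresh random salt per user
  const key = crypto.scryptSync(password, salt, 64); // 64-byte derived key
  return salt.toString('hex') + ':' + key.toString('hex');
}

// Re-derives the key with the stored salt and compares in constant time.
function verifyPassword(password, stored) {
  const [saltHex, keyHex] = stored.split(':');
  const expected = Buffer.from(keyHex, 'hex');
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const stored = hashPassword('somePass');
const ok = verifyPassword('somePass', stored);  // true
const bad = verifyPassword('wrongPass', stored); // false
```

In a request handler the asynchronous `crypto.scrypt` would be the better fit, since the synchronous call blocks the event loop while the hash is computed.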

At the moment the best password hashing algorithm is Argon2. There is a module called secure-password written by Emil Bay. He talks more about cryptographically secure password hashing and best practices on this podcast. Here is a snippet of what he said about Argon2.
Normally when you lay out a threat model, perfect security from a mathematical point of view is almost never practical. (In cryptography, can be referred to as perfect secrecy which means, even if you have an enormous computer the size of the universe, it doesn’t matter how big it is, you can never break the security, but that’s not really practical in the real world.) Instead you go for something called computational secrecy. Which means you can break this, but it will cost you too much money and take too much time.
The goal of these hash functions is to make it so expensive to brute force these algorithms that there would be no point in trying. In a threat model, you know that you are not going to get perfect security but can you make it so expensive for your adversary to attack you.
Argon2 has two parameters that make it resistant to large-scale GPU attacks. You can control how much memory the function is allowed to use, and you can control how much computation time it takes to make a hash. A CPU usually has a lot of memory but few cores. A GPU has very little memory but thousands of cores. Argon2 dials up so much memory that you can only do about 4 or 8 simultaneous Argon2 hashes on a single GPU, which makes it too expensive to try and crack. In secure-password, I've taken the values figured out by Frank Denis, who made sodium, which it's built on. They are within the bounds of what an interactive service like a website can afford, giving reasonable security without slowing down. To hash a password, you need about 16 or 32 MB of memory, and those parameters can be controlled in Argon2.

Personally I've used crypto and I do exactly the same 4 steps you are doing right now (after checking a few conditions: 7-character password, one symbol, one number...). I'll share the code using crypto.
var salt =rand(160, 36);
var salted_pass = salt + password;
var token = crypto.randomBytes(64).toString('hex'); // I even generate a token for my users
var hashed_password = crypto.createHash('sha512').update(salted_pass).digest("hex");
EDIT: Warning: this is not a completely safe method, as the result may turn out to be predictable. Refer to the comments below, which explain why it is not a good method.

How can I predict Math.random results?

How can I predict the results from a roulette gaming website csgopolygon.com, given that it is calling Math.random and Math.floor?

Your hunch that it is, in theory, possible to predict the results of Math.random is correct. This is why, if you ever want to build a gaming/gambling application, you should make sure to use a cryptographically secure pseudo-random number generator. If they are using such, then forget about it.
If however you are correct and they are using System.time as the seed to the standard Random generator that comes with Java, there might be a way. It would involve generating millions of number sequences with millions of numbers in each sequence, based on seeds corresponding to (future) timestamps, then observing the actual random numbers generated by the website and trying to find the specific sequence among the millions you generated beforehand. If you have a match, you found the seed. And if you have the seed and know where in the sequence they are, you could then theoretically predict the next numbers.
Problems with this approach:
1) You need to know the exact algorithm they are using, so you can make sure you are using the same
2) It would take huge amounts of processing power to generate all the sequences
3) It would take huge amounts of storage to store them
4) It would take huge amounts of processing power to search the observed sequence among the stored sequences
5) You don't have the full picture. Even if you found the right seed and position in that seed's sequence, you still need to predict the next number that you will get, but as it's a multiplayer site (I assume), they might be giving that number to another player.
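To make the seed-search idea concrete, here is a toy sketch. Everything in it is an assumption for illustration: the generator is a small 32-bit LCG (Numerical Recipes constants), not Java's Random and not the algorithm of any real site, the "timestamp" seed is invented, and the attacker is assumed to see every roll in order, which the last problem in the list says is not the case in practice.

```javascript
// Toy generator: 32-bit LCG producing roulette rolls in 0..14.
function makeLcg(seed) {
  let state = seed >>> 0;
  return function nextRoll() {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return Math.floor((state / 0x100000000) * 15);
  };
}

// Try every candidate seed in [fromSeed, toSeed] and keep those whose
// first rolls reproduce the observed sequence exactly.
function findSeeds(observed, fromSeed, toSeed) {
  const matches = [];
  for (let seed = fromSeed; seed <= toSeed; seed++) {
    const next = makeLcg(seed);
    if (observed.every((roll) => next() === roll)) matches.push(seed);
  }
  return matches;
}

// The "site" seeds its generator with a timestamp we only know roughly.
const secretSeed = 1700000000;
const site = makeLcg(secretSeed);
const observed = Array.from({ length: 12 }, () => site());

// Search a window of plausible timestamps around our guess.
const candidates = findSeeds(observed, secretSeed - 5000, secretSeed + 5000);
```

Once a candidate seed survives, replaying the generator past the observed rolls yields the "future" rolls. A window of ten thousand seeds is trivial; with a cryptographically secure generator or an unknown, large seed space the same loop becomes infeasible.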
In other answers it is said that predicting the results of Math.random is impossible. This is incorrect. Math.random is actually very predictable, once you know the seed and the iteration (how many numbers were generated since the seed was set). I actually once built a game that generated random levels. I was messing with the seed and found that if I always set the same seed, I would always get the same levels. Which was cool because my game had infinite levels, but level 27 (for example) always looked exactly the same.
However,
They are not using Java. Check out the 'Provably Fair' link at the top. They discuss how you can verify past rolls yourself by executing PHP code.
These guys are smart. They publish the old seeds once they retire them. This allows you (using the predictable behavior of pseudo-random number generators) to re-generate all rolls and verify that they never tampered with them.
Basically, you want to find the seed that is currently in use... However, point 5 I mentioned above still holds: you don't have the full picture, so how would you predict what roll you would be getting? Apart from that, finding the seed will prove near impossible. These guys know cryptography, so you can bet they are using a secure random number generator.

You can't, and you probably shouldn't develop a gambling addiction as a 16-year-old. That said, even if you could, that site isn't using JavaScript to generate a number, but a server-side language like PHP or ASP.NET.

Generating a short, pseudo-random verifiable alpha numeric code

I have a situation where I need to generate short pseudo-random alphanumeric tokens which are unique, verifiable, and easily type-able by a human. These will be generated from a web app. The tokens don't need to be highly secure - they're used in a silly web game to claim a silly prize. For various reasons, the client wants these tokens to be human-readable and handled via email. This is non-negotiable (I know... but this is how it has to be for reasons beyond my control).
In other words, let's say we get the code "ABCDE12345"
There has to be a way to say "ABCDE12345" is "valid". For example: maybe two or three characters at the start run through an algorithm I write will generate the right sequence of remaining characters. E.g., f("AB")==="CDE12345"
Two people playing the game shouldn't be likely to generate the same token. In my mind, I'd be happy to use the current time in millis + game-character name & score to seed a home-made RNG. (which is to say, NOT use Math.random, since this is a web app). This would seed the two or three character sequence mentioned above.
Am I missing anything? I'm not looking for a concrete algorithm but rather your suggestions. Anything I'm missing?

If you think your token is comparable to an authenticated message saying "give this person a prize" you could look at https://en.wikipedia.org/wiki/Hash-based_message_authentication_code, recoding as necessary with e.g. https://en.wikipedia.org/wiki/Base64 to make the thing printable. Of course, HMAC uses a secret key which you will have to KEEP secret. A public key signature system would not require that you keep the key secret, but I would expect the signature to be longer, and I expect that it is already too long for you if you want non-trivial security.
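The HMAC suggestion might look roughly like this in Node. It is a sketch under assumptions: the token is 4 random characters followed by 6 characters derived from an HMAC-SHA256 of those 4, the 32-character alphabet (no 0/O/1/I) replaces Base64 to keep the code easy to type, and the function names and the example key are hypothetical.

```javascript
const crypto = require('crypto');

// 32 unambiguous characters; 256 % 32 === 0, so `byte % 32` has no bias.
const ALPHABET = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';

function encode(bytes, length) {
  let out = '';
  for (let i = 0; i < length; i++) out += ALPHABET[bytes[i] % 32];
  return out;
}

// f(prefix): the part of the token that proves the prefix was issued by us.
function tagFor(prefix, secretKey) {
  const mac = crypto.createHmac('sha256', secretKey).update(prefix).digest();
  return encode(mac, 6);
}

function makeToken(secretKey) {
  const prefix = encode(crypto.randomBytes(4), 4);
  return prefix + tagFor(prefix, secretKey);
}

function verifyToken(token, secretKey) {
  if (typeof token !== 'string' || token.length !== 10) return false;
  return tagFor(token.slice(0, 4), secretKey) === token.slice(4);
}

const key = 'server-side-secret'; // hypothetical; must never reach the client
const token = makeToken(key);
const valid = verifyToken(token, key); // true
```

Four random characters give only about a million distinct prefixes, so uniqueness would still need a server-side check (or a longer prefix), and six tag characters give roughly 30 bits against guessing, which matches the "not highly secure" requirement.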

A simple solution (and easy to hack) would be to generate a meaningful term (one way to achieve such is choose a random article from wikipedia), encrypt it with a pre-known password, and take the least significant x bits.
Now, the key you generate is word-<x bits as a number>.
This is easily verifiable by machine, simply re-encode the word and check if the bits fit, and offers a simple tradeoff of readability vs security (bigger x -> less readable, harder to fake).
Main problem with this approach though, is assuming your game is not communicating with any server, you will need to deploy the preshared secret somehow to your clients, and they will be able to reverse engineer it.

What is the space efficiency of a directed acyclic word graph (dawg)? and is there a javascript implementation?

I have a dictionary of keywords that I want to make available for autocomplete/suggestion on the client side of a web application. The ajax turnaround introduces too much latency, so it would nice to store the entire word list on the client.
The list could be hundreds of thousands of words, maybe a couple of million. I did a little bit of research, and it seems that a dawg structure would provide space and lookup efficiency, but I can't find real world numbers.
Also, feel free to suggest other possibilities for achieving the same functionality.

I have recently implemented a DAWG for a word-game playing program. It uses a dictionary of 2.7 million words of the Polish language. The source plain-text file is about 33MB in size. The same word list represented as a DAWG in a binary file takes only 5MB. Actual size may vary, as it depends on the implementation, so the number of vertices (154k) and the number of edges (411k) are more important figures.
Still, that amount of data is far too big for JavaScript to handle, as stated above. Trying to process several MB of data will hang the JavaScript interpreter for a few minutes, effectively hanging the whole browser.

My mind cringes at the two facts "couple of million" and "JavaScript". JS is meant to shuffle little pieces of data around, not megabytes. Just imagine how long users would have to wait for your page to load!
There must be a reason why AJAX turnaround is so slow in your case. Google serves billions of AJAX requests every day and their type-ahead is snappy (just try it on www.google.com). So there must be something broken in your setup. Find it and fix it.

Your solution sounds practical, but you still might want to look at, for example, jQuery's autocomplete implementation(s) to see how they deal with latency.

A couple of million words in memory (in JavaScript in a browser)? That sounds big regardless of what kind of structure you decide to store it in. You might consider other kinds of optimizations instead, like loading subsets of your wordlist based on the characters typed.
For example, if the user enters "a" then you'd start retrieving all the words that start with "a". Then you could optimize your wordlist by returning more common words first, so the more likely ones will match up "instantly" while less common words may load a little slower.

From my understanding, DAWGs are good for storing and searching for words, but not when you need to generate lists of matches. Once you have located the prefix, you will have to browse through all its children to reconstruct the words which start with this prefix.
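The traversal described above can be sketched with a plain trie. Note the assumption: this is an unminimized trie, not a DAWG (a DAWG would additionally merge shared suffixes), but locating the prefix and then walking all of its children works the same way. The function names and the `limit` parameter are made up for illustration.

```javascript
function createNode() {
  return { children: new Map(), isWord: false };
}

function insert(root, word) {
  let node = root;
  for (const ch of word) {
    if (!node.children.has(ch)) node.children.set(ch, createNode());
    node = node.children.get(ch);
  }
  node.isWord = true;
}

// Locate the prefix node, then walk its subtree to rebuild whole words.
function complete(root, prefix, limit = 10) {
  let node = root;
  for (const ch of prefix) {
    node = node.children.get(ch);
    if (!node) return []; // nothing starts with this prefix
  }
  const results = [];
  (function walk(current, path) {
    if (results.length >= limit) return;
    if (current.isWord) results.push(path);
    for (const [ch, child] of current.children) walk(child, path + ch);
  })(node, prefix);
  return results;
}

const root = createNode();
for (const word of ['car', 'cart', 'cat', 'dog']) insert(root, word);
const suggestions = complete(root, 'ca'); // words starting with "ca"
```

The cost of a suggestion is proportional to the size of the subtree under the prefix, which is why a result limit matters for short prefixes.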
I agree with others, you should consider server-side search.

Writing a JavaScript zip code validation function

I would like to write a JavaScript function that validates a zip code, by checking if the zip code actually exists. Here is a list of all zip codes:
http://www.census.gov/tiger/tms/gazetteer/zips.txt (I only care about the 2nd column)
This is really a compression problem. I would like to do this for fun. OK, now that's out of the way, here is a list of optimizations over a straight hashtable that I can think of, feel free to add anything I have not thought of:
Break zipcode into 2 parts, first 2 digits and last 3 digits.
Make a giant if-else statement first checking the first 2 digits, then checking ranges within the last 3 digits.
Or, convert the zips into hex, and see if I can do the same thing using smaller groups.
Find out if within the range of all valid zip codes there are more valid zip codes vs invalid zip codes. Write the above code targeting the smaller group.
Break up the hash into separate files, and load them via Ajax as user types in the zipcode. So perhaps break into 2 parts, first for first 2 digits, second for last 3.
Lastly, I plan to generate the JavaScript files using another program, not by hand.
Edit: performance matters here. I do want to use this, if it doesn't suck. Performance of the JavaScript code execution + download time.
Edit 2: JavaScript only solutions please. I don't have access to the application server, plus, that would make this into a whole other problem =)

You could do the unthinkable and treat the code as a number (remember that it's not actually a number). Convert your list into a series of ranges, for example:
zips = [10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]
// becomes
zips = [[10000,10003], [23001,23003], [36001,36001]]
// make sure to keep this sorted
then to test:
function isValidZip(myzip) {
    // linear scan over the [low, high] ranges
    for (var i = 0, l = zips.length; i < l; ++i) {
        if (myzip >= zips[i][0] && myzip <= zips[i][1]) {
            return true;
        }
    }
    return false;
}
isValidZip(23002); // true
This is just using a very naive linear search (O(n)). If you kept the list sorted and used binary searching, you could achieve O(log n).
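A sketch of both steps, collapsing a sorted list into ranges and then binary searching the ranges, could look like this. The function names are made up for illustration, and the input list is assumed to be sorted ascending.

```javascript
// Collapse a sorted list of numeric zips into [low, high] ranges.
function toRanges(sortedZips) {
  const ranges = [];
  for (const zip of sortedZips) {
    const last = ranges[ranges.length - 1];
    if (last && zip === last[1] + 1) {
      last[1] = zip;           // extends the current run
    } else if (!last || zip > last[1]) {
      ranges.push([zip, zip]); // starts a new run (duplicates are skipped)
    }
  }
  return ranges;
}

// Binary search over the sorted, non-overlapping ranges: O(log n).
function isValidZip(ranges, zip) {
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (zip < ranges[mid][0]) hi = mid - 1;
    else if (zip > ranges[mid][1]) lo = mid + 1;
    else return true;
  }
  return false;
}

const ranges = toRanges([10000, 10001, 10002, 10003, 23001, 23002, 23003, 36001]);
// ranges is [[10000, 10003], [23001, 23003], [36001, 36001]]
const found = isValidZip(ranges, 23002); // true
```

Since the ranges would be generated by another program anyway, `toRanges` only needs to run at build time; the browser just ships the range array and the lookup.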

"I would like to write a JavaScript function that validates a zip code"
Might be more effort than it's worth, keeping it updated so that at no point someone's real valid ZIP code is rejected. You could also try an external service, or do what everyone else does and just accept any 5-digit number!
"here is a list of optimizations over a straight hashtable that I can think of"
Sorry to spoil the potential Fun, but you're probably not going to manage much better actual performance than JavaScript's Object gives you when used as a hashtable. Object member access is one of the most common operations in JS and will be super-optimised; building your own data structures is unlikely to beat it even if they are potentially better structures from a computer science point of view. In particular, anything using ‘Array’ is not going to perform as well as you think because Array is actually implemented as an Object (hashtable) itself.
Having said that, a possible space compression tool if you only need to know 'valid or not' would be to use a 100000-bit bitfield, packed into a string. For example for a space of only 100 ZIP codes, where codes 032-043 are ‘valid’:
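var zipfield= '\x00\x00\x00\x00\xFF\x0F\x00\x00\x00\x00\x00\x00\x00';
function isvalid(zip) {
    // require exactly three digits; an unanchored pattern would accept any string containing three digits
    if (!/^[0-9]{3}$/.test(zip))
        return false;
    var z= parseInt(zip, 10);
    // byte z/8 holds the flag for code z in bit z%8
    return !!( zipfield.charCodeAt(Math.floor(z/8)) & (1<<(z%8)) );
}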
Now we just have to work out the most efficient way to get the bitfield to the script. The naive '\x00'-filled version above is pretty inefficient. Conventional approaches to reducing that would be eg. to base64-encode it:
var zipfield= atob('AAAAAP8PAAAAAAAAAA==');
That would get the 100000 flags down to 16.6kB. Unfortunately atob was Mozilla-only at the time of writing (current browsers all provide it), so an additional base64 decoder was needed for other browsers. (It's not too hard, but it's a bit more startup time to decode.) It might also be possible to use an AJAX request to transfer a direct binary string (encoded in ISO-8859-1 text to responseText). That would get it down to 12.5kB.
But in reality probably anything, even the naive version, would do as long as you served the script using mod_deflate, which would compress away a lot of that redundancy, and also the repetition of '\x00' for all the long ranges of ‘invalid’ codes.
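Since the data file would be generated by another program anyway, the bitfield itself could be built along these lines. This is a sketch that swaps the packed string for a `Uint8Array`, which is an assumption on my part rather than what the answer uses; the bit layout (code `z` lives in byte `z >> 3`, bit `z & 7`) is the same, and the function names are hypothetical.

```javascript
// Build a bitfield with one flag per code in [0, size).
function buildBitfield(validCodes, size) {
  const bytes = new Uint8Array(Math.ceil(size / 8));
  for (const code of validCodes) {
    bytes[code >> 3] |= 1 << (code & 7);
  }
  return bytes;
}

// Test a numeric code against the bitfield.
function isValidCode(bytes, code) {
  if (!Number.isInteger(code) || code < 0 || code >= bytes.length * 8) return false;
  return (bytes[code >> 3] & (1 << (code & 7))) !== 0;
}

// Same toy data as above: a space of 100 codes where 032-043 are valid.
const validCodes = [];
for (let c = 32; c <= 43; c++) validCodes.push(c);
const field = buildBitfield(validCodes, 100);
// field[4] is 0xFF (codes 32-39) and field[5] is 0x0F (codes 40-43)
```

For the full ZIP space, `buildBitfield(allZips, 100000)` gives the 12.5kB array mentioned above, which can then be base64-encoded or served as a binary file.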

I use Google Maps API to check whether a zipcode exists.
It's more accurate.

Assuming you've got the zips in a sorted array (seems fair if you're controlling the generation of the datastructure), see if a simple binary search is fast enough.

So... You're doing client-side validation and want to optimize for file size? You probably cannot beat general compression. Fortunately, most browsers support gzip, so you get that much for free.
How about a simple JSON-encoded dict or list with the zip codes in sorted order, and a lookup on the dict? It'll compress well, since it's a predictable sequence; import easily, since it's JSON and uses the browser's built-in parser; and lookup will probably be very fast too, since that's a JavaScript primitive.

This might be useful:
PHP Zip Code Range and Distance Calculation
As well as List of postal codes.
