I want to be able to change any string containing any utf-8 into a random number between 0 and 1.
I can convert any seed that is a number with the following:
Math.abs(Math.sin(seed));
From this I'm able to generate a pseudo seeded Math.random()-like number.
So it's converting a string into a number. I looked into using crypto and found that making a digest of the string works but is incredibly slow, and is a bit overkill.
Any ideas on how to accomplish this?
Using String.prototype.charCodeAt() you can generate an integer representation:
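function stringToSeed(str) {
    var values = [];
    for (var i = 0, len = str.length; i < len; i++) {
        values.push(str.charCodeAt(i));
    }
    // concatenate the char codes and coerce the digit string to a number
    // (long strings will exceed the precision of a JavaScript number)
    return Number(values.join(''));
}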
var seed = stringToSeed(string);
You can then pass this seed to sin as you were before.
The thought behind concatenation instead of simply adding the values is to ensure that order is taken into account for randomness, otherwise "AB" and "BA" would produce the same value, for example.
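If the concatenated digits get too long for a JavaScript number, one alternative is a small non-cryptographic hash. The sketch below swaps in 32-bit FNV-1a (applied to UTF-16 code units rather than bytes, which is a simplification) and scales the result into [0, 1). The names fnv1a32 and stringToUnit are made up for illustration.

```javascript
// 32-bit FNV-1a over the UTF-16 code units of the string.
// Math.imul keeps the multiplication within 32 bits.
function fnv1a32(str) {
    let h = 0x811c9dc5;               // FNV offset basis
    for (let i = 0; i < str.length; i++) {
        h ^= str.charCodeAt(i);
        h = Math.imul(h, 0x01000193); // FNV prime
    }
    return h >>> 0;                   // force an unsigned 32-bit result
}

// Scale the 32-bit hash into the half-open range [0, 1).
function stringToUnit(str) {
    return fnv1a32(str) / 4294967296;
}
```

The same string always gives the same number, and order matters, so "AB" and "BA" hash differently.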
Related
I am looking for a portable algorithm for creating a hashCode for binary data. None of the binary data is very long -- I am Avro-encoding keys for use in kafka.KeyedMessages -- we're probably talking anywhere from 2 to 100 bytes in length, but most of the keys are in the 4 to 8 byte range.
So far, my best solution is to convert the data to a hex string, and then do a hashCode of that. I'm able to make that work in both Scala and JavaScript. Assuming I have defined b: Array[Byte], the Scala looks like this:
b.map("%02X" format _).mkString.hashCode
It's a little more elaborate in JavaScript -- luckily someone already ported the basic hashCode algorithm to JavaScript -- but the point is being able to create a Hex string to represent the binary data, I can ensure the hashing algorithm works off the same inputs.
On the other hand, I have to create an object twice the size of the original just to create the hashCode. Luckily most of my data is tiny, but still -- there has to be a better way to do this.
Instead of padding the data as its hex value, I presume you could just coerce the binary data into a String so the String has the same number of bytes as the binary data. It would be all garbled, more control characters than printable characters, but it would be a string nonetheless. Do you run into portability issues though? Endian-ness, Unicode, etc.
Incidentally, if you got this far reading and don't already know this -- you can't just do:
val b: Array[Byte] = ...
b.hashCode
Luckily I already knew that before I started, because I ran into that one early on.
Update
Based on the first answer given, it appears at first blush that java.util.Arrays.hashCode(Array[Byte]) would do the trick. However, if you follow the javadoc trail, you'll see that this is the algorithm behind it, which is based on the algorithm for List and the algorithm for byte combined.
int hashCode = 1;
for (Byte e : list) hashCode = 31*hashCode + (e==null ? 0 : e.intValue());
As you can see, all it's doing is creating a Long representing the value. At a certain point, the number gets too big and it wraps around. This is not very portable. I can get it to work for JavaScript, but you have to import the npm module long. If you do, it looks like this:
function bufferHashCode(buffer) {
    const Long = require('long');
    var hashCode = new Long(1);
    // accumulate 31 * hashCode + byte for every byte in the buffer
    for (var value of buffer.values()) { hashCode = hashCode.multiply(31).add(value); }
    return hashCode;
}
bufferHashCode(Buffer.from([1,2,3]));
// hashCode = Long { low: 30817, high: 0, unsigned: false }
And you do get the same results when the data wraps around, sort of, though I'm not sure why. In Scala:
java.util.Arrays.hashCode(Array[Byte](1,2,3,4,5,6,7,8,9,10))
// res30: Int = -975991962
Note that the result is an Int. In JavaScript:
bufferHashCode(Buffer.from([1,2,3,4,5,6,7,8,9,10]));
// hashCode = Long { low: -975991962, high: 197407, unsigned: false }
So I have to take the low 32 bits and ignore the high ones, but otherwise I get the same results.
This functionality is already available in Java standard library, look at the Arrays.hashCode() method.
Because your binary data are Array[Byte], here is how you can verify it works:
println(java.util.Arrays.hashCode(Array[Byte](1,2,3))) // prints 30817
println(java.util.Arrays.hashCode(Array[Byte](1,2,3))) // prints 30817
println(java.util.Arrays.hashCode(Array[Byte](2,2,3))) // prints 31778
Update: It is not true that the Java implementation boxes the bytes. Of course, there is conversion to int, but there's no way around that. This is the Java implementation:
public static int hashCode(byte a[]) {
    if (a == null) return 0;
    int result = 1;
    for (byte element : a) result = 31 * result + element;
    return result;
}
Update 2
If what you need is a JavaScript implementation that gives the same results as a Scala/Java implementation, then you can extend the algorithm by, e.g., taking only the rightmost 31 bits:
def hashCode(a: Array[Byte]): Int = {
    if (a == null) {
        0
    } else {
        var hash = 1
        var i: Int = 0
        while (i < a.length) {
            hash = 31 * hash + a(i)
            hash = hash & Int.MaxValue // taking only the rightmost 31 bits
            i += 1
        }
        hash
    }
}
and JavaScript:
var hashCode = function(arr) {
    if (arr == null) return 0;
    var hash = 1;
    for (var i = 0; i < arr.length; i++) {
        hash = hash * 31 + arr[i];
        hash = hash % 0x80000000; // taking only the rightmost 31 bits in integer representation
    }
    return hash;
};
Why do the two implementations produce the same results? In Java, integer overflow behaves as if the addition were performed without loss of precision and the bits above the 32nd were then thrown away; & Int.MaxValue additionally throws away the 32nd bit. In JavaScript, there is no loss of precision for integers up to 2^53, which is a limit the expression 31 * hash + a(i) never exceeds. % 0x80000000 then behaves as taking the rightmost 31 bits, provided the array elements are non-negative. The case without overflows is obvious.
This is the meat of the algorithm used in the Java library:
int result = 1;
for (byte element : a) result = 31 * result + element;
You comment:
this algorithm isn't very portable
Incorrect. If we are talking about Java, then provided that we all agree on the type of the result, then the algorithm is 100% portable.
Yes the computation overflows, but it overflows exactly the same way on all valid implementations of the Java language. A Java int is specified to be 32 bits signed two's complement, and the behavior of the operators when overflow occurs is well-defined ... and the same for all implementations. (The same goes for long ... though the size is different, obviously.)
I'm not an expert, but my understanding is that Scala's numeric types have the same properties as Java's. JavaScript is different, being based on IEEE 754 double precision floating point. However, with care you should be able to code the Java algorithm portably in JavaScript. (I think @Mifeet's version is wrong ...)
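As a rough sketch of what "with care" could look like: Math.imul performs a 32-bit integer multiply and | 0 wraps the sum back to a signed 32-bit value, which mirrors Java's int overflow. The function name javaArraysHashCode is made up, and the sign conversion assumes the input holds byte values (0..255 as in a Node Buffer, or -128..127).

```javascript
// Mimics java.util.Arrays.hashCode(byte[]) using 32-bit integer arithmetic.
function javaArraysHashCode(bytes) {
    if (bytes == null) return 0;
    let result = 1;
    for (const b of bytes) {
        // Java bytes are signed, so reinterpret 128..255 as -128..-1
        const signed = (b << 24) >> 24;
        // Math.imul wraps the product to 32 bits; | 0 wraps the sum
        result = (Math.imul(31, result) + signed) | 0;
    }
    return result;
}
```

For the inputs quoted earlier in the thread, [1,2,3] gives 30817 and [1,2,3,4,5,6,7,8,9,10] gives -975991962, without needing a long library.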
I'm working on the Rosalind problem Mortal Fibonacci Rabbits and the website keeps telling me my answer is wrong when I use my algorithm written in JavaScript. When I use the same algorithm in Python I get a different (and correct) answer.
The inconsistency only happens when the result gets large. For example fibd(90, 19) returns 2870048561233730600 in JavaScript but in Python I get 2870048561233731259.
Is there something about numbers in JavaScript that gives me a different answer, or am I making a subtle mistake in my JavaScript code?
The JavaScript solution:
function fibd(n, m) {
    // Create an array of length m and set all elements to 0
    // (map() skips the empty slots of new Array(m), so fill() is used)
    var rp = new Array(m).fill(0);
    rp[0] = 1;
    for (var i = 1; i < n; i++) {
        // prepend the sum of all elements from 1 to the end of the array
        rp.splice(0, 0, rp.reduce(function (s, e) { return s + e; }) - rp[0]);
        // Remove the final element
        rp.pop();
    }
    // Sum up all the elements
    return rp.reduce(function (s, e) { return s + e; });
}
The Python solution:
def fibd(n, m):
    # Create an array of length m and set all elements to 0
    rp = [0] * m
    rp[0] = 1
    for i in range(n-1):
        # The sum of all elements from 1 to the end and dropping the final element
        rp = [sum(rp[1:])] + rp[:-1]
    return sum(rp)
I think Javascript only has a "Number" datatype, and this is actually an IEEE double under the hood. 2,870,048,561,233,730,600 is too large to hold precisely in an IEEE double, so it is approximated. (Notice the trailing "00" - 17 significant decimal digits is about right for a double.)
Python on the other hand has bignum support, and will quite cheerfully deal with 4096 bit integers (for those that play around with cryptographic algorithms, this is a huge boon).
You may well be able to find a Javascript bignum library if you search - for example http://silentmatt.com/biginteger/
Just doing a bit of research, this article seems interesting. Javascript only supports 53-bit integers.
The result given by Python is indeed out of the maximum safe range for JS. If you try to do
parseInt('2870048561233731259')
It will indeed return
2870048561233731000
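For what it's worth, newer JavaScript engines have a built-in BigInt type, so a bignum library may not be needed. Here is a minimal sketch of the same algorithm with BigInt values, assuming an engine that supports the 0n literal syntax. It follows the Python version rather than the splice/pop version.

```javascript
// Mortal Fibonacci Rabbits with arbitrary-precision integers.
function fibd(n, m) {
    // rp[k] holds the number of rabbit pairs that are k months old
    let rp = new Array(m).fill(0n);
    rp[0] = 1n;
    for (let i = 1; i < n; i++) {
        // newborns come from every pair that is at least one month old
        const newborns = rp.slice(1).reduce((sum, e) => sum + e, 0n);
        // everyone ages by a month; the oldest group dies off
        rp = [newborns, ...rp.slice(0, -1)];
    }
    return rp.reduce((sum, e) => sum + e, 0n);
}
```

fibd(6, 3) returns 4n, and fibd(90, 19).toString() should reproduce the Python result quoted in the question.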
Are there any standard hash functions/methods that map an arbitrary 9 digit integer into another (unique) 9 digit integer, such that it is somewhat difficult to map back (without using brute force)?
Hashes should not collide, so every output 1 ≤ y < 10^9 needs to be mapped from one and only one input value in 1 ≤ x < 10^9.
The problem you describe is really what Format-Preserving Encryption aims to solve.
One standard is currently being worked out by NIST: the new FFX mode of encryption for block ciphers.
It may be more complex than what you expected though. I cannot find any implementation in Javascript, but some examples exist in other languages: here (Python) or here (C++).
You are requiring a non-colliding hash function with only about 30 bits. That's going to be a tall order for any hash function. Actually, what you need is not a Pseudo Random Function such as a hash but a Pseudo Random Permutation.
You could use an encryption function for this, but you would obviously need to keep the key secret. Furthermore, encryption functions normally take bits as input and output, and 10^9 is not likely to use an exact number of bits. So if you are going for such an option you may have to use format preserving encryption.
You may also use any other function that is a PRP within the group 0..10^9-2 (after decrementing the value by 1), but if an attacker finds out what parameters you are using then it becomes really simple to revert back to the original. An example would be a multiplication with a number that is relatively prime to 10^9-1, modulo 10^9-1.
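To make the multiplication idea concrete, here is a rough sketch. The multiplier A is an arbitrary value picked for illustration (anything coprime to 10^9-1 works), and permute/unpermute are made-up names. BigInt is used because the intermediate product does not fit in a double. As noted, this is trivially reversible by anyone who learns A.

```javascript
const N = 999999999n;   // number of valid values: 1 .. 10^9 - 1
const A = 123456791n;   // illustrative multiplier, must be coprime to N

// Modular inverse via the extended Euclidean algorithm.
function modInverse(a, n) {
    let [r0, r1] = [n, a % n];
    let [t0, t1] = [0n, 1n];
    while (r1 !== 0n) {
        const q = r0 / r1;
        [r0, r1] = [r1, r0 - q * r1];
        [t0, t1] = [t1, t0 - q * t1];
    }
    if (r0 !== 1n) throw new Error('multiplier is not coprime to modulus');
    return ((t0 % n) + n) % n;
}

const A_INV = modInverse(A, N);

// Map x in 1 .. 10^9 - 1 to a unique y in the same range.
function permute(x) {
    return Number(((BigInt(x) - 1n) * A) % N) + 1;
}

// Undo permute(); this is why the multiplier has to stay secret.
function unpermute(y) {
    return Number(((BigInt(y) - 1n) * A_INV) % N) + 1;
}
```

One wrinkle with plain multiplication is that 1 always maps to 1, since it becomes 0 after the decrement.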
This is what I can come up with:
var used = {};
var hash = function (num) {
    num = md5(num);
    if (used[num] !== undefined) {
        return used[num];
    } else {
        var newNum;
        do {
            newNum = Math.floor(Math.random() * 1000000000) + 1;
        } while (contains(newNum))
        used[num] = newNum;
        return newNum;
    }
};
var contains = function (num) {
    for (var i in used) {
        if (used[i] === num) {
            return true;
        }
    }
    return false;
};
var md5 = function (num) {
    //method that return an md5 (or any other) hash
};
I should note however that it will run into problems when you try to hash a lot of different numbers because the do..while will produce random numbers and compare them with already generated numbers. If you have already generated a lot of numbers it will get more and more unlikely to find the remaining ones.
How would I take a textual input from the user (anything their keyboard would allow them to type), and transfer it to a number?
From there, I would probably take that number, and feed it into a seeded random number generator.
I'm getting the idea from Minecraft's random seed option, but I can't find anything on it.
There are probably more interesting algorithms but this just converts the characters to ints and multiplies it by the position in order to weight them (so that 'abc' is different than 'cba'). You could also use a hash function of some kind as well, but I thought that might be overkill for this purpose.
var input = 'askljfhasjfh', num = 0;
for (var i = 0, len = input.length; i < len; ++i) {
    num += input.charCodeAt(i) * (i + 1);
}
console.log(num);
Keep in mind though, you can't seed the Math.random() function in Javascript; the engine picks the seed itself and doesn't expose it, so you would need your own generator.
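Since Math.random() can't be seeded, the number has to go into a generator you write yourself. Below is a minimal sketch that wraps the loop above in a function (stringToNumber is a made-up name) and feeds the result into mulberry32, a small public-domain 32-bit generator that is not part of this answer's original code. It is fine for games, not for anything security related.

```javascript
// Position-weighted sum of char codes, as in the loop above.
function stringToNumber(input) {
    let num = 0;
    for (let i = 0, len = input.length; i < len; ++i) {
        num += input.charCodeAt(i) * (i + 1);
    }
    return num;
}

// mulberry32: returns a function producing numbers in [0, 1).
function mulberry32(seed) {
    let a = seed | 0;
    return function () {
        a = (a + 0x6D2B79F5) | 0;
        let t = Math.imul(a ^ (a >>> 15), 1 | a);
        t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// The same text always yields the same sequence.
const random = mulberry32(stringToNumber('askljfhasjfh'));
console.log(random(), random(), random());
```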
We are trying to create a random number generator to create serial numbers for products on a virtual assembly line.
We got the random numbers to generate, however since they are serial numbers we don't want it to create duplicates.
Is there a way that it can go back and check to see if the number generated has already been generated, and then to tell it that if it is a duplicate to generate a new number, and to repeat this process until it has a "unique" number.
The point of a serial number is that they're NOT random. Serial, by definition, means that something is arranged in a series. Why not just use an incrementing number?
The easiest way to fix this problem is to avoid it. Use something that is monotonically increasing (like time) to form part of your serial number. To that you can prepend some fixed value that identifies the line or something.
So your serial number format could be NNNNYYYYMMDDHHMMSS, where NNNN is a 4-digit line number and YYYY is the 4 digit year, MM is a 2 digit month, ...
If you can produce multiple things per second per line, then add date components until you get to the point where only one per unit time is possible -- or simply add the count of items produced this day to the YYYYMMDD component (e.g., NNNNYYYYMMDDCCCCCC).
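A rough sketch of the NNNNYYYYMMDDCCCCCC variant, assuming the caller keeps track of the per-day count and that UTC dates are acceptable. The name makeSerial and its parameters are made up for illustration.

```javascript
// Builds a serial of the form NNNNYYYYMMDDCCCCCC.
// line:  production line number (up to 4 digits)
// date:  a Date object, read in UTC
// count: number of the item produced on that line that day (up to 6 digits)
function makeSerial(line, date, count) {
    const pad = (value, width) => String(value).padStart(width, '0');
    return pad(line, 4) +
        pad(date.getUTCFullYear(), 4) +
        pad(date.getUTCMonth() + 1, 2) + // getUTCMonth() is zero-based
        pad(date.getUTCDate(), 2) +
        pad(count, 6);
}

console.log(makeSerial(7, new Date(Date.UTC(2024, 0, 5)), 42));
// prints "000720240105000042"
```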
With a truly random number you would have to store the entire collection and review it for each number. Obviously this would mean that your generation would become slower and slower the larger the number of keys you generate (since it would have to retry more and more often and compare to a larger dataset).
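To illustrate the bookkeeping this implies, here is a minimal sketch that remembers every value handed out in a Set and retries on duplicates. makeUniqueRandom is a made-up name, and the retry loop gets slower as the range fills up, which is exactly the problem described above.

```javascript
// Returns a function that yields random integers in 0 .. max-1
// without ever repeating one.
function makeUniqueRandom(max, rng = Math.random) {
    const seen = new Set(); // every value handed out so far
    return function next() {
        if (seen.size >= max) {
            throw new Error('all ' + max + ' values have been used');
        }
        let candidate;
        do {
            candidate = Math.floor(rng() * max);
        } while (seen.has(candidate)); // retry until we hit an unused value
        seen.add(candidate);
        return candidate;
    };
}
```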
This is entirely why truly random numbers are rarely used for this purpose. For serial numbers the standard is always to just do a sequential number - is there any real reason for them to be random?
Unique IDs don't have to be random - time-based (version 1) GUIDs and the like are built from the system time and (most often) the MAC address. They're globally unique because of the algorithm used and the machine specifics, not because of any randomness. (Random version 4 GUIDs do exist, but they rely on the value being so large that collisions are extremely unlikely.)
Personally I would do everything I could to either use a sequential value (perhaps with a unique prefix if you have multiple channels) or, better, use a real GUID for your purpose.
Is this what you are looking for?
var rArray;
// Fill rArray with the integers 0 .. range-1
function fillArray(range) {
    rArray = [];
    for (var x = 0; x < range; x++) {
        rArray[x] = x;
    }
}
// Return a random integer in 0 .. range-1 that has not been returned
// since the array was last refilled
function randomND(range) {
    if (rArray == null || rArray.length < 1) {
        fillArray(range);
    }
    var pos = Math.floor(Math.random() * rArray.length);
    // splice removes the chosen element and closes the gap in one step
    return rArray.splice(pos, 1)[0];
}