Portable hashCode implementation for binary data - javascript

I am looking for a portable algorithm for creating a hashCode for binary data. None of the binary data is very long -- I am Avro-encoding keys for use in kafka.KeyedMessages -- we're probably talking anywhere from 2 to 100 bytes in length, but most of the keys are in the 4 to 8 byte range.
So far, my best solution is to convert the data to a hex string, and then do a hashCode of that. I'm able to make that work in both Scala and JavaScript. Assuming I have defined b: Array[Byte], the Scala looks like this:
b.map("%02X" format _).mkString.hashCode
It's a little more elaborate in JavaScript -- luckily someone already ported the basic hashCode algorithm to JavaScript -- but the point is that by creating a hex string to represent the binary data, I can ensure the hashing algorithm works off the same inputs.
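For reference, a minimal sketch of what that looks like in Node (the name hexHashCode is mine; the loop is the widely circulated JavaScript port of Java's String.hashCode):

function hexHashCode(buf) {
    var hex = buf.toString('hex').toUpperCase(); // match Scala's "%02X" formatting
    var hash = 0;
    for (var i = 0; i < hex.length; i++) {
        hash = ((hash << 5) - hash) + hex.charCodeAt(i); // hash * 31 + charCode
        hash |= 0; // truncate to a signed 32-bit int, as Java does
    }
    return hash;
}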
On the other hand, I have to create an object twice the size of the original just to create the hashCode. Luckily most of my data is tiny, but still -- there has to be a better way to do this.
Instead of encoding the data as hex, I presume you could just coerce the binary data into a String so that the String has the same number of bytes as the binary data. It would be all garbled, more control characters than printable characters, but it would be a string nonetheless. Do you run into portability issues, though? Endianness, Unicode, etc.
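In Node, for instance, there is a one-character-per-byte decoding; a sketch (both sides would have to agree on the same single-byte charset, e.g. ISO-8859-1, to dodge exactly those issues):

var b = Buffer.from([0x01, 0x7f, 0xff]);
// 'latin1' (alias 'binary') maps each byte to the Unicode code point of the
// same value, so the string length equals the byte length and no byte order
// or multi-byte encoding is involved
var s = b.toString('latin1');
s.length === b.length; // true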
Incidentally, if you got this far reading and don't already know this -- you can't just do:
val b: Array[Byte] = ...
b.hashCode
Luckily I already knew that before I started, because I ran into that one early on.
Update
Based on the first answer given, it appears at first blush that java.util.Arrays.hashCode(Array[Byte]) would do the trick. However, if you follow the javadoc trail, you'll see that this is the algorithm behind it, which is based on the algorithm for List combined with the algorithm for byte.
int hashCode = 1;
for (byte e : list) hashCode = 31*hashCode + (e==null ? 0 : e.intValue());
As you can see, all it's doing is accumulating a single number representing the value. At a certain point, the number gets too big and it wraps around. This is not very portable. I can get it to work in JavaScript, but you have to import the npm module long. If you do, it looks like this:
const Long = require('long');

function bufferHashCode(buffer) {
    var hashCode = new Long(1);
    for (var value of buffer.values()) {
        hashCode = hashCode.multiply(31).add(value);
    }
    return hashCode;
}
bufferHashCode(new Buffer([1,2,3]));
// hashCode = Long { low: 30817, high: 0, unsigned: false }
And you do get the same results when the data wraps around, sort of, though I'm not sure why. In Scala:
java.util.Arrays.hashCode(Array[Byte](1,2,3,4,5,6,7,8,9,10))
// res30: Int = -975991962
Note that the result is an Int. In JavaScript:
bufferHashCode(new Buffer([1,2,3,4,5,6,7,8,9,10]));
// hashCode = Long { low: -975991962, high: 197407, unsigned: false }
So I have to take the low 32 bits and ignore the high, but otherwise I get the same results.
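If I read the long API correctly, Long#toInt() returns exactly those low 32 bits, so the last step can be done directly:

// toInt() returns the low 32 bits as a signed JavaScript number
bufferHashCode(new Buffer([1,2,3,4,5,6,7,8,9,10])).toInt();
// -975991962, matching the Scala Int above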

This functionality is already available in the Java standard library; look at the Arrays.hashCode() method.
Because your binary data are Array[Byte], here is how you can verify it works:
println(java.util.Arrays.hashCode(Array[Byte](1,2,3))) // prints 30817
println(java.util.Arrays.hashCode(Array[Byte](1,2,3))) // prints 30817
println(java.util.Arrays.hashCode(Array[Byte](2,2,3))) // prints 31778
Update: It is not true that the Java implementation boxes the bytes. Of course, there is conversion to int, but there's no way around that. This is the Java implementation:
public static int hashCode(byte a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (byte element : a)
        result = 31 * result + element;

    return result;
}
Update 2
If what you need is a JavaScript implementation that gives the same results as a Scala/Java implementation, then you can extend the algorithm by, e.g., taking only the rightmost 31 bits:
def hashCode(a: Array[Byte]): Int = {
  if (a == null) {
    0
  } else {
    var hash = 1
    var i: Int = 0
    while (i < a.length) {
      hash = 31 * hash + a(i)
      hash = hash & Int.MaxValue // taking only the rightmost 31 bits
      i += 1
    }
    hash
  }
}
and JavaScript:
var hashCode = function(arr) {
    if (arr == null) return 0;
    var hash = 1;
    for (var i = 0; i < arr.length; i++) {
        hash = hash * 31 + arr[i];
        hash = hash % 0x80000000; // taking only the rightmost 31 bits in integer representation
    }
    return hash;
};
Why do the two implementations produce the same results? In Java, integer overflow behaves as if the addition were performed without loss of precision and then the bits higher than 32 were thrown away, and & Int.MaxValue throws away the 32nd bit. In JavaScript, there is no loss of precision for integers up to 2^53, which is a limit the expression 31 * hash + arr[i] never exceeds. % 0x80000000 then behaves as taking the rightmost 31 bits. The case without overflows is obvious.
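A quick sanity check with the values used earlier:

console.log(hashCode([1, 2, 3])); // 30817 -- same as java.util.Arrays.hashCode above, since nothing overflows here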

This is the meat of the algorithm used in the Java library:

int result = 1;
for (byte element : a) result = 31 * result + element;
You comment:
this algorithm isn't very portable
Incorrect. If we are talking about Java, then provided we all agree on the type of the result, the algorithm is 100% portable.
Yes the computation overflows, but it overflows exactly the same way on all valid implementations of the Java language. A Java int is specified to be 32 bits signed two's complement, and the behavior of the operators when overflow occurs is well-defined ... and the same for all implementations. (The same goes for long ... though the size is different, obviously.)
I'm not an expert, but my understanding is that Scala's numeric types have the same properties as Java's. JavaScript is different, being based on IEEE 754 double-precision floating point. However, with care you should be able to code the Java algorithm portably in JavaScript. (I think @Mifeet's version is wrong ...)
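For example, a minimal sketch of such a port (my own illustration, assuming ES2015's Math.imul is available; the name javaArrayHashCode is hypothetical):

// Reproduces Java's 32-bit int arithmetic: Math.imul wraps the
// multiplication at 32 bits, and | 0 truncates the addition.
function javaArrayHashCode(bytes) {
    var hash = 1;
    for (var i = 0; i < bytes.length; i++) {
        var b = (bytes[i] << 24) >> 24; // sign-extend, matching Java's signed byte
        hash = (Math.imul(hash, 31) + b) | 0;
    }
    return hash;
}

javaArrayHashCode([1,2,3,4,5,6,7,8,9,10]); // -975991962, same as the Scala result above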

Related

Hex String to INT32 - Little Endian (DCBA Format) Javascript

Implementing something based on pathetic documentation with no info at all. The example is just this:

(7F02AAF7)H => (F7AA027F)H = -139853185

Even if I convert 7F02AAF7 to F7AA027F, the output of parseInt('F7AA027F', 16) is still different from what I am expecting.
I did some Google searching and found this website: http://www.scadacore.com/field-tools/programming-calculators/online-hex-converter/

Here, when you input 7F02AAF7, it is processed to the wanted number under the INT32 - Little Endian (DCBA) system. I tried that as a search term, but no luck.

Can you please tell me what exactly I am supposed to do here, and is there any Node.js library which can help me with this?
You could adapt the excellent answer of T.J. Crowder and use DataView#setUint8 for the given bytes with DataView#getInt32 and an indicator for littleEndian.
var data = '7F02AAF7'.match(/../g);

// Create a buffer
var buf = new ArrayBuffer(4);
// Create a data view of it
var view = new DataView(buf);

// set bytes
data.forEach(function (b, i) {
    view.setUint8(i, parseInt(b, 16));
});

// get an int32 with little endian
var num = view.getInt32(0, 1);
console.log(num);
Node can do that pretty easily using buffers:
Buffer.from('7F02AAF7', 'hex').readInt32LE()
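For the example input from the question, this likewise evaluates to -139853185.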
JavaScript integers are usually stored as a Number value:
4.3.19
primitive value corresponding to a double-precision 64-bit binary format IEEE 754 value
So the result of parseInt is a float where the value losslessly fits into the fraction part of the float (52 bits of capacity). parseInt also doesn't parse it as two's complement notation.
If you want to force anything that you read into 32 bit, then the easiest would be to force it to be automatically converted to 32 bit by applying a binary operation. I would suggest:
parseInt("F7AA027F", 16) | 0
The binary OR (|) with 0 is essentially a no-op, but it converts any integer to 32 bit. This trick is often used in order to convert numbers to 32 bit in order to make calculations on it faster.
Also, this is portable.
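Applied to the example value from the question:

console.log(parseInt('F7AA027F', 16) | 0); // -139853185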
In my case, I am trying to send accelerometer data from Arduino to Pi.
The raw data I am reading is that [0x0, 0x0, 0x10, 0xBA].
If you lack knowledge about the topic, as I did, use the scadacore.com website to find out what your data should correspond to. In my case it is Float - Little Endian (DCBA), which outputs -0.03126526. Now we know what kind of conversion we need.

Then, check the available functions for your language. In my case, the Node.js buffer library offers the buf.readFloatLE([offset]) function, which is the one I need.
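A sketch of what that looks like with the raw bytes from above (readFloatLE interprets them as a little-endian 32-bit IEEE 754 float):

Buffer.from([0x00, 0x00, 0x10, 0xBA]).readFloatLE(0);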

Javascript is giving a different answer to same algorithm in Python

I'm working on the Rosalind problem Mortal Fibonacci Rabbits, and the website keeps telling me my answer is wrong when I use my algorithm written in JavaScript. When I use the same algorithm in Python I get a different (and correct) answer.

The inconsistency only happens when the result gets large. For example, fibd(90, 19) returns 2870048561233730600 in JavaScript, but in Python I get 2870048561233731259.

Is there something about numbers in JavaScript that gives me a different answer, or am I making a subtle mistake in my JavaScript code?
The JavaScript solution:
function fibd(n, m) {
    // Create an array of length m and set all elements to 0
    // (note: map skips the holes of new Array(m), but the holes
    // contribute nothing to the reduce calls below, so they act like 0s)
    var rp = new Array(m);
    rp = rp.map(function(e) { return 0; });
    rp[0] = 1;
    for (var i = 1; i < n; i++) {
        // prepend the sum of all elements from 1 to the end of the array
        rp.splice(0, 0, rp.reduce(function (e, s) { return s + e; }) - rp[0]);
        // Remove the final element
        rp.pop();
    }
    // Sum up all the elements
    return rp.reduce(function (e, s) { return s + e; });
}
The Python solution:
def fibd(n, m):
    # Create an array of length m and set all elements to 0
    rp = [0] * m
    rp[0] = 1
    for i in range(n-1):
        # The sum of all elements from 1 to the end, dropping the final element
        rp = [sum(rp[1:])] + rp[:-1]
    return sum(rp)
I think JavaScript only has a "Number" datatype, and this is actually an IEEE double under the hood. 2,870,048,561,233,730,600 is too large to hold precisely in an IEEE double, so it is approximated. (Notice the trailing "00" -- 17 decimal places is about right for a double.)
Python on the other hand has bignum support, and will quite cheerfully deal with 4096 bit integers (for those that play around with cryptographic algorithms, this is a huge boon).
You might be able to find a JavaScript bignum library if you search -- for example http://silentmatt.com/biginteger/
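These days JavaScript's built-in BigInt, which did not exist when this was asked, gives you arbitrary precision without a library. A sketch of the same algorithm using it:

function fibd(n, m) {
    var rp = new Array(m).fill(0n); // BigInt zeros keep every sum exact
    rp[0] = 1n;
    for (var i = 1; i < n; i++) {
        // newborns = offspring of every pair at least one month old
        var newborns = rp.slice(1).reduce(function (a, b) { return a + b; }, 0n);
        rp.pop();             // the oldest pairs die
        rp.unshift(newborns); // everyone else ages one month
    }
    return rp.reduce(function (a, b) { return a + b; }, 0n);
}

fibd(90, 19).toString(); // '2870048561233731259', the exact value Python reports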
Just doing a bit of research, this article seems interesting: JavaScript only supports 53-bit integers.
The result given by Python is indeed out of the maximum safe range for JS. If you try to do
parseInt('2870048561233731259')
It will indeed return
2870048561233731000

Node.js and 64-bit varints

I'm in the process of writing a Node.js based application which talks via TCP to a C++ based server. The server speaks a binary protocol, quite similar to Protocol Buffers, but not exactly the same.
One data type the server returns is an unsigned 64-bit integer (uint64_t), serialized as a varint, where the most significant bit of each byte is used to indicate whether the next byte is also part of the int.

I am currently unable to parse this out in JavaScript due to the 32-bit limitation on bitwise operations, and also the fact that JS doesn't do 64-bit ints natively. Does anyone have any suggestions on how I could do this?
My varint reading code is very similar to that shown here: https://github.com/chrisdickinson/varint/blob/master/decode.js
I thought I could use node-bignum to represent the number, but I'm unsure how to turn a Buffer consisting of varint bytes into this.
Cheers,
Nathan
I simply took the existing varint read module and modified it to yield a Bignum object instead of a regular number:
var Bignum = require('bignum');

module.exports = read;

var MSB = 0x80
  , REST = 0x7F;

function read(buf, offset) {
    var res = Bignum(0)
      , offset = offset || 0
      , counter = offset
      , b
      , shift = 0
      , l = buf.length;

    do {
        if (counter >= l) {
            read.bytesRead = 0;
            return undefined;
        }
        b = buf[counter++];
        res = res.add(Bignum(b & REST).shiftLeft(shift));
        shift += 7;
    } while (b >= MSB);

    read.bytes = counter - offset;
    return res;
}
Use it exactly the same way as you would have used the original decode module.
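For example, the two-byte varint 0xAC 0x02 encodes 300 and decodes like this (assuming the module above is saved as read.js):

var read = require('./read');
var num = read(Buffer.from([0xAC, 0x02]));
num.toString(); // '300' -- (0xAC & 0x7F) plus 0x02 shifted left by 7
read.bytes;     // 2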

creating a simple one-way hash

Are there any standard hash functions/methods that map an arbitrary 9-digit integer into another (unique) 9-digit integer, such that it is somewhat difficult to map back (without using brute force)?
Hashes should not collide, so every output 1 ≤ y < 10^9 needs to be mapped from one and only one input value in 1 ≤ x < 10^9.
The problem you describe is really what Format-Preserving Encryption aims to solve.
One standard is currently being worked out by NIST: the new FFX mode of encryption for block ciphers.
It may be more complex than what you expected, though. I cannot find any implementation in JavaScript, but some examples exist in other languages: here (Python) or here (C++).
You are requiring a non-colliding hash function with only about 30 bits. That's going to be a tall order for any hash function. Actually, what you need is not a Pseudo Random Function such as a hash but a Pseudo Random Permutation.
You could use an encryption function for this, but you would obviously need to keep the key secret. Furthermore, encryption functions normally take bits as input and output, and 10^9 does not correspond to an exact number of bits. So if you are going for such an option you may have to use format-preserving encryption.
You may also use any other function that is a PRP within the group 0..10^9-1 (after decrementing the value by 1), but if an attacker finds out what parameters you are using, then it becomes really simple to revert back to the original. An example would be multiplication with a number that is relatively prime to 10^9-1, modulo 10^9-1.
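As an illustration of that last idea, a sketch (the multiplier below is a hypothetical secret I picked; BigInt avoids overflowing the intermediate product):

const N = 999999999n;    // 10^9 - 1
const MULT = 444444443n; // hypothetical secret; must satisfy gcd(MULT, N) = 1

// Bijection on 1..999999999: shift to 0..N-1, multiply modulo N, shift back.
// As warned above, it is easy to invert once MULT is known.
function scramble(x) {
    return Number(((BigInt(x) - 1n) * MULT) % N + 1n);
}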
This is what I can come up with:
var used = {};

var hash = function (num) {
    num = md5(num);
    if (used[num] !== undefined) {
        return used[num];
    } else {
        var newNum;
        do {
            newNum = Math.floor(Math.random() * 1000000000) + 1;
        } while (contains(newNum));
        used[num] = newNum;
        return newNum;
    }
};

var contains = function (num) {
    for (var i in used) {
        if (used[i] === num) {
            return true;
        }
    }
    return false;
};

var md5 = function (num) {
    // method that returns an md5 (or any other) hash
};
I should note, however, that it will run into problems when you try to hash a lot of different numbers, because the do..while produces random numbers and compares them with already generated numbers. If you have already generated a lot of numbers, it becomes more and more unlikely to find the remaining ones.

JavaScript 'var' Data/Object Sizes

Does JavaScript optimize the size of variables stored in memory? For instance, will a variable that has a boolean value take up less space than one that has an integer value?
Basically, will the following array:
var array = new Array(8192);
for (var i = 0; i < array.length; i++)
    array[i] = true;
be any smaller in the computer's memory than:
var array = new Array(8192);
for (var i = 0; i < array.length; i++)
    array[i] = 9;
Short answer: Yes.
Booleans generally (and it will depend on the user agent and implementation) will take up 4 bytes, while integers will take up 8.
Check out this other StackOverflow question to see how some others managed to measure memory footprints in JS: JavaScript object size
Edit: Section 8.5 of the ECMAScript Spec states the following:
The Number type has exactly 18437736874454810627 (that is, 2^64 - 2^53 + 3) values, representing the double-precision 64-bit format IEEE 754 values as specified in the IEEE Standard for Binary Floating-Point Arithmetic
... so all numbers should, regardless of implementation, be 8 bytes.
Well, JS has only one number type, which is a 64-bit float. Each character in a string is 16 bits (source: Douglas Crockford's JavaScript: The Good Parts). Handling of bools is thus probably interpreter/implementation specific. If I remember correctly, though, the V8 engine handles the Boolean object as a C bool.
