Looking at these implementations, I am wondering if someone could explain the reasoning behind the specific operations. Not coming from a computer science background, I am not sure why these decisions were made.
function binb2rstr(input) {
  var str = []
  for (var i = 0, n = input.length * 32; i < n; i += 8) {
    var code = (input[i >> 5] >>> (24 - i % 32)) & 0xFF
    var val = String.fromCharCode(code)
    str.push(val)
  }
  return str.join('')
}
function rstr2binb(input) {
  var output = Array(input.length >> 2)
  for (var i = 0, n = output.length; i < n; i++) {
    output[i] = 0
  }
  for (var i = 0, n = input.length * 8; i < n; i += 8) {
    output[i >> 5] |= (input.charCodeAt(i / 8) & 0xFF) << (24 - i % 32)
  }
  return output
}
What I understand so far:
i += 8 is for iterating through bytes.
0xFF is 255, which is 2^8 - 1, i.e. one byte.
32 is the size of a word in bits, or 4 bytes.
| is bitwise OR; <<, >>>, and & are likewise bitwise operators.
The % modulus operator keeps a value below a maximum: x % max is always less than max.
What I don't understand is:
i >> 5, how that was picked.
& 0xFF, how that was picked.
24 - i % 32, where the 24 came from.
var code = (input[i >> 5] >>> (24 - i % 32)) & 0xFF, how the character code is computed from that.
input.length >> 2
I am wondering if this is just a standard computer science function, because it's hard to tell where these values come from and how one would learn this. It seems like they must be part of a standard algorithm based on byte length, but I can't tell how to get there with these open questions. Thank you for your help.
This code consists of some pretty clever bit-fiddling based on 32-bit values.
But let's work on your points:
i >> 5, how that was picked.
This divides i by 32, corresponding to the overall length n = input.length * 32. Considering the whole algorithm, this means that one input value is processed four times (at offsets 0, 8, 16, 24) before the next input value is selected.
& 0xFF, how that was picked.
This simply selects the lowest 8 bits of an n-bit value.
24 - i % 32, where the 24 came from.
This relates to i += 8. The expression i % 32 takes four different values over the iterations (32/8 = 4), namely temp = (0, 8, 16, 24). So 24 - temp results in (24, 16, 8, 0).
var code = (input[i >> 5] >>> (24 - i % 32)) & 0xFF, how the character code is computed from that.
1. 1st iteration: i=0; 24-0=24; (input[0] >>> 24) & 0xFF = highest byte of input[0], shifted down to the lowest position
2. 2nd iteration: i=8; 24-8=16; (input[0] >>> 16) & 0xFF = 2nd highest byte of input[0], shifted down to the lowest position
3. 3rd iteration: i=16; 24-16=8; (input[0] >>> 8) & 0xFF = 2nd lowest byte of input[0], shifted down to the lowest position
4. 4th iteration: i=24; 24-24=0; (input[0] >>> 0) & 0xFF = lowest byte of input[0], already in the lowest position
This is the big-endian conversion: the most significant byte of each word becomes the first character.
The next iteration has i=32 and moves on to the next input value, since input[32 >> 5] = input[1].
Overall, this algorithm shifts the 32-bit value to the right and masks off the lowest 8 bits to be used as a character code by String.fromCharCode(code).
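To make this concrete, here is the first algorithm applied by hand to a single word (my own illustration; 0x41424344 holds the ASCII codes of 'A', 'B', 'C', 'D'):
var word = 0x41424344; // bytes 0x41 0x42 0x43 0x44 = 'A' 'B' 'C' 'D'
String.fromCharCode((word >>> 24) & 0xFF); // 'A' (highest byte, first character)
String.fromCharCode((word >>> 16) & 0xFF); // 'B'
String.fromCharCode((word >>> 8) & 0xFF);  // 'C'
String.fromCharCode((word >>> 0) & 0xFF);  // 'D' (lowest byte, last character)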
The last one is from the other function, rstr2binb, and input.length >> 2 simply does a division by 4, discarding any remainder (0-3): four 8-bit characters are packed into each 32-bit word.
Concerning your last question:
It seems like these values must be a standard algorithm based on byte length but I can't tell how to get there with these open questions.
This is far from a standard algorithm. It is just a clever bit-manipulation based on bytes.
In assembler this code would be even easier to understand.
There is even one instruction called BSWAP to swap between 32-bit Big-Endian and Little-Endian values in a register.
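For the curious, the same byte swap can be written in JavaScript as well (a sketch of my own, not part of the original code):
function bswap32(x) {
  // reverse the byte order of a 32-bit value
  return (((x >>> 24) & 0x000000FF) |
          ((x >>> 8) & 0x0000FF00) |
          ((x << 8) & 0x00FF0000) |
          ((x << 24) & 0xFF000000)) >>> 0;
}
console.log(bswap32(0x12345678).toString(16)); // "78563412"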
So we have this to break a 32-bit integer into 8-bit chunks:
var chunks = [
  (num & 0xff000000) >>> 24, // >>> keeps the top byte from being sign-extended
  (num & 0x00ff0000) >> 16,
  (num & 0x0000ff00) >> 8,
  (num & 0x000000ff)
]
How can you tell how many chunks it will be before computing the chunks? Basically I would like to know if it will be 1, 2, 3, or 4 bytes before I chunk it into the array. Some bit trick or something, on the 32-bit integer.
function countBytes(num) {
  // ???
}
There are several approaches I can think of, depending on your preference and/or codebase style.
The first one uses more analytical maths than the other and can perform a little worse than the bitwise one below:
// We will need a logarithm with base 16 since you are working with hexadecimals
const BASE_16 = Math.log(16);
const log16 = (num) => Math.log(num) / BASE_16;
// This is a function that gives you the number of chunks (bytes) the number occupies
const getNumChunks = (num) => {
  // The floor of the base-16 logarithm, plus one, is the number of
  // hex digits the number occupies
  const numHexDigits = Math.floor(log16(num)) + 1;
  // Each chunk covers two hex digits (8 bits), so round up
  const numChunks = Math.ceil(numHexDigits / 2);
  return numChunks;
}
The second one is strictly bitwise:
const getNumChunks = (num) => {
  let probe = 0xff;
  let numChunks = 1;
  num = num >>> 0; // treat the input as an unsigned 32-bit value
  // Grow the probe one byte at a time until it covers the number;
  // stop at 4 chunks, since the probe cannot grow past 32 bits
  while (num > (probe >>> 0) && numChunks < 4) {
    probe = (probe << 8) | 0xff;
    numChunks++;
  }
  return numChunks;
}
Or this one-liner, making use of the Math.clz32 function to determine how many bytes of a 32-bit unsigned int are being utilized:
function numOfBytes( x ) {
  return x === 0 ? 1 : (((31 - Math.clz32( x )) / 8) + 1) | 0;
}
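A few sample calls, to make the behaviour concrete (my own quick checks, not from the original answer):
console.log(numOfBytes(0));          // 1
console.log(numOfBytes(255));        // 1
console.log(numOfBytes(256));        // 2
console.log(numOfBytes(65536));      // 3
console.log(numOfBytes(0xffffffff)); // 4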
Say I have a bitarray like this
101110101010101010101111111011001100100100001011001111101000101001
The two operations I want to do are:
Read a contiguous subset of the bits into a single bitarray (or integer in the case of JavaScript)
Read a noncontigous subset of bits into a single integer in JavaScript.
So for (1), let's say I want this:
1011[101010101010101011111]11011001100100100001011001111101000101001
== 101010101010101011111
= 1398111 in decimal
Bits 4-25 or so.
For (2), I would like to most optimally select a non-contiguous subset of bits and combine them optimally into a final value.
1011[101]0101010101010[11]1111[1]011001100100100001011001111101000101001
== 101 ++ 11 ++ 1
= 101111
= 47 in decimal
Bits 4-6, 21-22, and 27 or so.
What is the right/optimal way of doing this?
The format is still a bit vague, but here's a way to do something like this. I'm making some assumptions that make the problem easier, namely:
At most 32 bits are extracted at once (so it fits in a Number without weird hacks)
Bits are in an Uint32Array (or compatible storage, as long as it has 32 bits per element)
The least significant bit of the 0th entry of the array is number 0 overall
The bit string represented this way is ... + tobits(array[1]) + tobits(array[0]); for example, [ 0, 256 ] represents 00000000000000000000000100000000_00000000000000000000000000000000 (the underscore indicates the boundary between the pieces). Maybe that's the wrong way around; it can be changed, but this way is simple.
The ith bit is in the i >> 5-th (aka i / 32 but with integer division) word, at offset i & 31 (aka i % 32) within that word. That's what makes this order simple to work with.
By the first assumption, at most 2 entries/words in the array are spanned by the range, so there are only two cases:
The bottom of the range is in one word and the top is in the next.
The range is wholly contained in a single word. Touching a second word should be avoided, as it might be beyond the end of the array. Also, even if the second word could be touched, it wouldn't be as easy to deal with the excess bits, because shift counts are taken modulo 32, so high << 32 wouldn't do the trick.
In code (not tested)
function extractRange(bits, begin, end) {
  // extract bits [begin .. end],
  // begin==end extracts a single bit
  var beginIdx = begin >> 5;
  var endIdx = end >> 5;
  var beginOfs = begin & 31;
  var endOfs = end & 31;
  var len = end - begin + 1;
  var msk = -1 >>> (32 - len);
  console.assert(len > 0 && len <= 32);
  if (beginIdx == endIdx) {
    // begin and end are in the same word
    // discard the bits before the begin of the range and mask off the high bits
    return ((bits[endIdx] >>> beginOfs) & msk) >>> 0;
  }
  else {
    var low = bits[beginIdx];
    var high = bits[endIdx];
    // the high word contains the high bits of the result, in its lowest bits
    // the low word contains the low bits of the result, in its highest bits:
    //   xxxxhhhh_llllxxxx
    // high word must be shifted left by the number of bits in the low word
    var bitsInLow = 32 - beginOfs;
    return (((low >>> beginOfs) | (high << bitsInLow)) & msk) >>> 0;
  }
}
Examples:
[0xdeadbeef, 0xcafebabe] means that the string is really 0xcafebabedeadbeef (in bits)
extractRange([0xdeadbeef, 0xcafebabe],  0, 31).toString(16) = deadbeef
extractRange([0xdeadbeef, 0xcafebabe],  4, 35).toString(16) = edeadbee
extractRange([0xdeadbeef, 0xcafebabe],  8, 39).toString(16) = bedeadbe
extractRange([0xdeadbeef, 0xcafebabe], 60, 63).toString(16) = c
extractRange([0xdeadbeef, 0xcafebabe], 30, 33).toString(16) = b // ...ed... in binary 11101101, taking the middle 4 bits: 1011 = b
For non-contiguous ranges, you could extract every individual range and then append them. I don't think there is a nicer way in general.
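As a sketch of that approach (my own addition, untested): extract each [begin, end] range with extractRange, then append the pieces, most significant piece first. Multiplication is used instead of << so that combined results longer than 32 bits do not overflow the bitwise operators:
function extractRanges(bits, ranges) {
  // ranges is an array of [begin, end] pairs, most significant piece first
  var result = 0;
  for (var i = 0; i < ranges.length; i++) {
    var piece = extractRange(bits, ranges[i][0], ranges[i][1]);
    var len = ranges[i][1] - ranges[i][0] + 1;
    result = result * Math.pow(2, len) + piece;
  }
  return result;
}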
A streamlined version that uses generators up to the final step, thus avoiding loading the whole input into memory.
// for a single number
function* gUintToBits(input, dim) {
  let i = 0;
  while (i < dim) {
    // yield bits MSB-first (left to right)
    yield (input >> (dim - 1 - i++)) & 1;
    // or like this, if bits should instead go LSB-first:
    // yield (input >> i++) & 1;
  }
}
// for an array of numbers
function* gUintArrayToBits(input, dim) {
  for (let item of input) {
    yield* gUintToBits(item, dim);
  }
}
// apply the intervals mask directly to the generator
function* genWithIntervalsApplied(iterOfBits, intervals) {
  // fast, if the number of intervals is not too big
  const isInsideIntervals = (n, itrvs) => {
    for (let intv of itrvs) {
      if (n >= intv[0] && n < intv[1]) return true;
    }
    return false;
  };
  let index = 0;
  for (let item of iterOfBits) {
    if (isInsideIntervals(index++, intervals)) {
      yield item;
    }
  }
}
// finally, consume the generator
function extractIntervalsFromUint8Array(input, intervals) {
  let result = '';
  for (let x of genWithIntervalsApplied(gUintArrayToBits(input, 8), intervals)) {
    result += `${x}`;
  }
  return result;
}
const result = extractIntervalsFromUint8Array(
  [1, 3, 9, 127],
  [[8, 16], [24, 32]],
);
const dec = parseInt(result, 2);
console.log(result);
console.log(dec);
output:
// 0000001101111111
// 895
Say you have two integers 10 and 20. That is 00001010 and 00010100. I would then like to just basically concat these as strings, but have the result be a new integer.
00001010 + 00010100 == 0000101000010100
That final number is 2580.
However, I am looking for a way to do this without actually converting them to string. Looking for something more efficient that just does some bit twiddling on the integers themselves. I'm not too familiar with that, but I imagine it would be along the lines of:
var a = 00001010 // == 10
var b = 00010100 // == 20
var c = a << b // == 2580
Note, I would like for this to work with any sequences of bits. So even:
var a = 010101
var b = 01110
var c = a + b == 01010101110
Your basic equation is:
c = b + (a << 8).
The trick here is that you always need to shift by 8. But since a and b do not always use all 8 bits in a byte, JavaScript will omit any leading zeros. We need to recover the number of leading zeros of b (or trailing zeros of a) and prepend them back before adding, so that all the bits stay in their proper positions. This requires an equation like this:
c = b + (a << s + r)
Where s is the highest set bit (going from right to left) in b, and r is the remaining number of bits such that s + r = 8.
Essentially, all you are doing is shifting the first operand a over by 8 bits, effectively adding trailing zeros to a or, equally, padding leading zeros onto the second operand b. Then you add normally. This can be accomplished using logarithms, shifting, and the bitwise OR operation to provide an O(1) solution for arbitrary positive integers a and b, where the number of bits in a and b does not exceed some positive integer n. In the case of a byte, n = 8.
// Bitwise log base 2 in O(1) time
function log2(n) {
  // Check if n > 0
  let bits = 0;
  if (n > 0xffff) {
    n >>= 16;
    bits = 0x10;
  }
  if (n > 0xff) {
    n >>= 8;
    bits |= 0x8;
  }
  if (n > 0xf) {
    n >>= 4;
    bits |= 0x4;
  }
  if (n > 0x3) {
    n >>= 2;
    bits |= 0x2;
  }
  if (n > 0x1) {
    bits |= 0x1;
  }
  return bits;
}
// Computes the max set bit,
// counting from right to left starting
// at 0. For 20 (10100) we get bit # 4.
function msb(n) {
  n |= n >> 1;
  n |= n >> 2;
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  n = n + 1;
  // The folding above sets every bit below the highest one,
  // so n+1 is now the smallest power of 2 larger than the
  // original n. For 20 (10100), folding gives 11111 = 31 and
  // n+1 = 32; shifting right once gives 16, and log2(16) = 4
  // is the position of the highest set bit.
  return log2(n >> 1);
}
// Operands
let a = 0b00001010 // 10
let b = 0b00010100 // 20
// Max number of bits in
// the binary number
let n = 8
// Max set bit is the 16 bit, which is in position
// 4. We will need to pad 4 more zeros
let s = msb(b)
// How many zeros to pad on the left
// 8 - 4 = 4
let r = Math.abs(n - s)
// Shift a over by the computed
// number of bits including padded zeros
let c = b + (a << s + r)
console.log(c)
Output:
2580
Notes:
This is NOT commutative.
Add error checking to log2() for negative numbers, and other edge cases.
References:
https://www.geeksforgeeks.org/find-significant-set-bit-number/
https://github.com/N02870941/java_data_structures/blob/master/src/main/java/util/misc/Mathematics.java
So, the problem:
a is 10 (in binary 0000 1010)
b is 20 (in binary 0001 0100)
You want to get 2580 using a bit shift somehow.
If you left shift a by 8 using a <<= 8 (this is the same as multiplying a by 2^8), you get 1010 0000 0000, which is the same as 10 * 2^8 = 2560. Since the lower bits of a are now all 0's (<< fills the new bits with 0), you can just add b on top of it: 1010 0000 0000 + 0001 0100 gives you 1010 0001 0100.
So in one line of code, it's var result = (a << 8) + b. The parentheses matter: + binds more tightly than <<, so a << 8 + b would shift a by 8 + b places instead. Remember that most programming languages have no explicit built-in type for "binary", but everything is binary in its nature: an int is binary, an object is binary, etc. When you want to do binary operations on some data, you can just use the datatype you have as an operand for the binary operators.
This is a more general version of how to concatenate two numbers' binary representations using no string operations:
/*
This function concatenates b to the end of a and puts `padding` 0's in between them.
b is treated as starting with its first 1 as its most significant bit.
b needs to be bigger than 0; otherwise, Math.log2 will give -Infinity for 0 and NaN for negative b.
padding is the number of 0's to add at the end of a.
*/
function concate_bits(a, b, padding) {
  // add the padding 0's to a
  a <<= padding;
  // this gets the position of the largest power of 2 in b
  var power_of_2 = Math.floor(Math.log2(b));
  var power_of_2_value;
  while (power_of_2 >= 0) {
    power_of_2_value = 2 ** power_of_2;
    a <<= 1;
    if (b >= power_of_2_value) {
      a += 1;
      b -= power_of_2_value;
    }
    power_of_2--;
  }
  return a;
}
//this will print 2580 as the result
let result = concate_bits(10, 20, 3);
console.log(result);
Note, I would like for this to work with any sequences of bits. So even:
var a = 010101
var b = 01110
var c = a + b == 01010101110
This isn't going to be possible unless you convert to a string or otherwise store the number of bits in each number. 10101 010101 0010101 etc are all the same number (21), and once this is converted to a number, there is no way to tell how many leading zeroes the number originally had.
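If you do store the number of bits alongside each value, the concatenation itself becomes trivial; a sketch of that idea (my own addition, not from the answers above):
// concatenate two bit sequences whose lengths are tracked explicitly
function concatBits(aVal, aLen, bVal, bLen) {
  return { value: (aVal << bLen) | bVal, length: aLen + bLen };
}
// 010101 (6 bits) ++ 01110 (5 bits) = 01010101110 (11 bits)
console.log(concatBits(0b010101, 6, 0b01110, 5)); // { value: 686, length: 11 }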
I am developing a multiplayer game server.
In my case, every single byte really matters, for both the gaming experience and saving bandwidth.
The client and server will send some integer values to each other.
The integers will mostly have values lower than 100.
In some cases, those integers could have values between 0 and 100000.
All those integers will be sent in the same sequence. (Imagine that they form an integer array.)
Using an 8-bit or 16-bit integer array is not an option for me because of the possible values greater than 65535.
And I do not want to use a 32-bit integer array just for the values that occur rarely.
So, I developed an algorithm for that (here is the JavaScript port):
function write(buffer, number){
  while(number > 0x7f){
    buffer.push(0x80 | (number & 0x7f));
    number >>= 7;
  }
  buffer.push(number);
}
function read(buffer){
  // The constant packs the shift counts 35, 28, 21, 14, 7, 0 into 6-bit
  // fields: ((((35 << 6 | 28) << 6 | 21) << 6 | 14) << 6 | 7) << 6,
  // evaluated with 64-bit arithmetic. JavaScript shift counts are taken
  // modulo 32, so the lowest 6-bit field is always the effective shift,
  // and shift >>= 6 exposes the next field after each continuation byte.
  var cur, result = 0, shift = 0x8DC54E1C0;
  while((cur = buffer.shift()) > 0x7f)
  {
    result |= (cur & 0x7f) << shift;
    shift >>= 6;
  }
  return result | (cur << shift);
}
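For reference, this write/read pair is essentially the variable-length integer (varint) encoding used by formats such as Protocol Buffers. The decoder can also be written without the packed-shift constant; a sketch, where the shift simply grows by 7 for each continuation byte:
function readSimple(buffer){
  var cur, result = 0, shift = 0;
  while((cur = buffer.shift()) > 0x7f){
    result |= (cur & 0x7f) << shift;
    shift += 7;
  }
  return result | (cur << shift);
}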
var d = [];
var number = 127;
write(d, number);
alert("value bytes: " + d);
var newResult = read(d);
alert("equals : " + (number === newResult));
My question is: is there a better way to solve this problem?
Thanks in advance.
*"Efficient" here basically means in terms of smaller size (to reduce the IO waiting time), and speedy retrieval/deserialization times. Storing times are not as important.
I have to store a couple of dozen arrays of integers, each with 1800 values in the range 0-50, in the browser's localStorage, that is, as a string.
Obviously, the simplest method is to just JSON.stringify it; however, that adds a lot of unnecessary information, considering that the range of the data is well known. An average size for one of these arrays is then ~5500 bytes.
Here are some other methods I've tried (the resultant size, and the time to deserialize 1000 times, are listed in the results below):
zero-padding the numbers so each was 2 characters long, eg:
[5, 27, 7, 38] ==> "05270738"
base 50 encoding it:
[5, 11, 7, 38] ==> "5b7C"
just using the value as a character code (adding 32 to avoid the weird control characters at the start):
[5, 11, 7, 38] ==> "%+'F" (String.fromCharCode(37), String.fromCharCode(43) ...)
Here are my results:
                 size    Chrome 18    Firefox 11
-------------------------------------------------
JSON.stringify   5286    60ms         99ms
zero-padded      3600    354ms        703ms
base 50          1800    315ms        400ms
charCodes        1800    21ms         178ms
My question is if there is an even better method I haven't yet considered?
Update
MДΓΓБДLL suggested using compression on the data, so I combined this LZW implementation with the base 50 and charCode data. I also tested aroth's code (packing 4 integers into 3 bytes). These are the results:
                 size    Chrome 18    Firefox 11
-------------------------------------------------
LZW base 50      1103    494ms        999ms
LZW charCodes    1103    194ms        882ms
bitpacking       1350    2395ms       331ms
If your range is 0-50, then you can pack 4 numbers into 3 bytes (6 bits per number). This would allow you to store 1800 numbers using ~1350 bytes. This code should do it:
window._firstChar = 48;
window.decodeArray = function(encodedText) {
  var result = [];
  var temp = [];
  for (var index = 0; index < encodedText.length; index += 3) {
    // skipping bounds checking because the encoded text is assumed to be valid
    var firstChar = encodedText.charAt(index).charCodeAt() - _firstChar;
    var secondChar = encodedText.charAt(index + 1).charCodeAt() - _firstChar;
    var thirdChar = encodedText.charAt(index + 2).charCodeAt() - _firstChar;
    temp.push((firstChar >> 2) & 0x3F); // 6 bits, 'a'
    temp.push(((firstChar & 0x03) << 4) | ((secondChar >> 4) & 0xF)); // 2 bits + 4 bits, 'b'
    temp.push(((secondChar & 0x0F) << 2) | ((thirdChar >> 6) & 0x3)); // 4 bits + 2 bits, 'c'
    temp.push(thirdChar & 0x3F); // 6 bits, 'd'
  }
  // filter out 'padding' numbers, if present; this is an extremely inefficient way to do it
  for (var index = 0; index < temp.length; index++) {
    if (temp[index] != 63) {
      result.push(temp[index]);
    }
  }
  return result;
};
window.encodeArray = function(array) {
  var encodedData = [];
  for (var index = 0; index < array.length; index += 4) {
    var num1 = array[index];
    var num2 = index + 1 < array.length ? array[index + 1] : 63;
    var num3 = index + 2 < array.length ? array[index + 2] : 63;
    var num4 = index + 3 < array.length ? array[index + 3] : 63;
    encodeSet(num1, num2, num3, num4, encodedData);
  }
  return encodedData;
};
window.encodeSet = function(a, b, c, d, outArray) {
  // we can encode 4 numbers in 3 bytes
  var firstChar = ((a & 0x3F) << 2) | ((b >> 4) & 0x03); // 6 bits for 'a', 2 from 'b'
  var secondChar = ((b & 0x0F) << 4) | ((c >> 2) & 0x0F); // remaining 4 bits from 'b', 4 from 'c'
  var thirdChar = ((c & 0x03) << 6) | (d & 0x3F); // remaining 2 bits from 'c', 6 bits for 'd'
  // add _firstChar so that all values map to a printable character
  outArray.push(String.fromCharCode(firstChar + _firstChar));
  outArray.push(String.fromCharCode(secondChar + _firstChar));
  outArray.push(String.fromCharCode(thirdChar + _firstChar));
};
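A quick roundtrip check (my own usage sketch; it relies on the padding value 63 never appearing as real data, which holds since values are 0-50):
var data = [5, 27, 7, 38, 50];
var encoded = encodeArray(data).join('');
console.log(encoded.length);       // 6 characters for 5 numbers
console.log(decodeArray(encoded)); // [5, 27, 7, 38, 50]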
Here's a quick example: http://jsfiddle.net/NWyBx/1
Note that storage size can likely be further reduced by applying gzip compression to the resulting string.
Alternately, if the ordering of your numbers is not significant, then you can simply do a bucket-sort using 51 buckets (assuming 0-50 includes both 0 and 50 as valid numbers) and store the counts for each bucket instead of the numbers themselves. That would likely give you better compression and efficiency than any other approach.
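A sketch of that bucket-count idea (my own addition; note it preserves only the multiset of values, not their order):
function encodeCounts(values) {
  var counts = [];
  for (var v = 0; v <= 50; v++) counts[v] = 0;
  for (var i = 0; i < values.length; i++) counts[values[i]]++;
  return counts; // 51 counts instead of 1800 values
}
function decodeCounts(counts) {
  var values = [];
  for (var v = 0; v <= 50; v++) {
    for (var k = 0; k < counts[v]; k++) values.push(v);
  }
  return values; // sorted, since the original order is gone
}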
Assuming (as in your test) that compression takes more time than the size reduction saves you, your char encoding is the smallest you'll get without bitshifting. You're currently using one byte for each number, but if they're guaranteed to be small enough you could put two numbers in each byte. That would probably be an over-optimization, unless this is a very hot piece of your code.
You might want to consider using Uint8Array or ArrayBuffer. This blogpost shows how it's done. Copying his logic, here's an example, assuming you have an existing Uint8Array named arr.
function arrayBufferToBinaryString(buffer, cb) {
  // note: BlobBuilder is long deprecated; in current browsers the same
  // blob can be created with new Blob([buffer])
  var blobBuilder = new BlobBuilder();
  blobBuilder.append(buffer);
  var blob = blobBuilder.getBlob();
  var reader = new FileReader();
  reader.onload = function (e) {
    cb(reader.result);
  };
  reader.readAsBinaryString(blob);
}
arrayBufferToBinaryString(arr.buffer, function(s) {
  // do something with s
});