In bitwise operations, what does | 0x80 do? I know (& 0xFF) converts a value to an 8-bit integer, but what about (| 0x80)?
I have the following code:
const { createHash } = require('crypto')

const buf = createHash('sha256').update('test').digest()
const n = buf.length            // number of bytes to process
const destBuf = Buffer.alloc(n) // output buffer (implied by the original snippet)
let mask = 0                    // carry bit from the previous byte, 0 on the first pass

for (let i = 0; i < n; i++) {
  const ubyte = buf.readUInt8(i)
  const shifted = (ubyte >> 1) | mask
  destBuf.writeUInt8(shifted, i)
  mask = (ubyte & 1) * 0x80 // mask is 0 or 128
}
Can anyone explain that for me?
0x... means that what comes next is a hexadecimal number.
0x80 is the hexadecimal representation of the number 128. In binary, this equals 10000000.
The | character is the bitwise OR operator. Suppose you have an 8-bit number:
a = xxxxxxxx
with x being either a 0 or a 1. Now, masking this number with 0x80 means:
xxxxxxxx | 10000000 = 1xxxxxxx
So it basically means you will have a 1 for your most significant (leftmost) bit, while keeping all the other bits the same.
Now, in your code you use this mask in the line:
const shifted = (ubyte >> 1) | mask
What this does is takes the number ubyte:
ubyte = xxxxxxxy // x and y can be either 1 or 0
It shifts it right by one digit:
ubyte >> 1 = zxxxxxxx // y gets lost; z is 0 here, since ubyte is an unsigned byte (0-255)
Now it masks this number with your mask. When the mask is 128, the result is:
(ubyte >> 1) | 10000000 = 1xxxxxxx
So you will have a 1 as your most significant bit, and the other bits are unchanged.
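To make that concrete, here's a small trace of one loop iteration with an arbitrary sample byte (a sketch, not part of the original code):
const ubyte = 0b01010101 // 85
const mask = 0x80        // suppose the previous byte's low bit was 1

const shifted = (ubyte >> 1) | mask
console.log(shifted.toString(2))              // "10101010"
console.log(((ubyte & 1) * 0x80).toString(2)) // "10000000", the mask for the next byte
Over the whole loop, each byte's dropped low bit becomes the most significant bit of the next byte, so the buffer's bit string is shifted right by one with the carry propagated byte to byte.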
It combines the bits of both participating numbers with the logical "or":
const b = 0x7;
function tst(a, b) {
  console.log(a.toString(2).padStart(8) + " first number: " + a)
  console.log(b.toString(2).padStart(8) + " second number: " + b)
  console.log((a | b).toString(2).padStart(8) + " bitwise overlay: " + (a | b))
  console.log("----")
}
[0x80, 0x6A, 0x70, 0x8f].forEach(a => tst(a, b))
I'm trying to understand the gzip specification (http://www.zlib.org/rfc-gzip.html),
especially section 2, "Overall conventions":
Bytes stored within a computer do not have a "bit order", since they are always treated as a unit. However, a byte considered as an integer between 0 and 255 does have a most- and least-significant bit, and since we write numbers with the most-significant digit on the left, we also write bytes with the most-significant bit on the left. In the diagrams below, we number the bits of a byte so that bit 0 is the least-significant bit, i.e., the bits are numbered:
+--------+
|76543210|
+--------+
This document does not address the issue of the order in which bits of a byte are transmitted on a bit-sequential medium, since the data format described here is byte- rather than bit-oriented.
Within a computer, a number may occupy multiple bytes. All multi-byte numbers in the format described here are stored with the least-significant byte first (at the lower memory address). For example, the decimal number 520 is stored as:
0 1
+--------+--------+
|00001000|00000010|
+--------+--------+
^ ^
| |
| + more significant byte = 2 x 256
+ less significant byte = 8
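For reference, the spec's 520 example can be reproduced with a Node.js Buffer (a quick sketch, not part of the quoted spec):
const demo = Buffer.alloc(2);
demo.writeUInt16LE(520, 0); // least-significant byte stored first
console.log(demo); // <Buffer 08 02> -> 00001000 00000010, matching the diagram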
The problem that I have is, I'm not sure how to calculate the length of the FEXTRA header:
+---+---+=================================+
| XLEN |...XLEN bytes of "extra field"...| (more-->)
+---+---+=================================+
If I have one (sub)field with a string of length 1600 bytes (characters), then my complete FEXTRA length should be 1600 (payload) + 2 (SI1 & SI2, the subfield ID), right?
But the length bytes are set to 73 & 3 and I am not sure why.
Can someone clarify how I can calculate the complete FEXTRA length from the two length bytes?
I'm using Node.js for the operations on the .tgz/.gz file.
Demo code:
const fs = require("fs");
//const bitwise = require("bitwise");
// http://www.zlib.org/rfc-gzip.html
// http://www.onicos.com/staff/iz/formats/gzip.html
// https://de.wikipedia.org/wiki/Gzip
// https://dev.to/somedood/bitmasks-a-very-esoteric-and-impractical-way-of-managing-booleans-1hlf
// https://www.npmjs.com/package/bitwise
// https://stackoverflow.com/questions/1436438/how-do-you-set-clear-and-toggle-a-single-bit-in-javascript
fs.readFile("./test.gz", (err, bytes) => {
if (err) {
console.log(err);
process.exit(100);
}
console.log("bytes: %d", bytes.length);
let header = bytes.slice(0, 10);
let flags = header[3];
let eFlags = header[8];
let OS = header[9];
console.log("Is tarfile:", header[0] === 31 && header[1] === 139);
console.log("compress method:", header[2] === 8 ? "deflate" : "other");
console.log("M-Date: %d%d%d%d", bytes[4], bytes[5], bytes[6], bytes[7]);
console.log("OS", OS);
console.log("flags", flags);
console.log();
// | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
// +---+---+---+---+---+---+---+---+---+---+
// |ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
// +---+---+---+---+---+---+---+---+---+---+
//
//
// |10 |11 |
// +---+---+=================================+
// | XLEN |...XLEN bytes of "extra field"...| (more-->)
// +---+---+=================================+
// (if FLG.FEXTRA set)
//
// bit 0 FTEXT
// bit 1 FHCRC
// bit 2 FEXTRA
// bit 3 FNAME
// bit 4 FCOMMENT
// bit 5 reserved
// bit 6 reserved
// bit 7 reserved
// bitwise operation on header flags
const FLAG_RESERVED_3 = (bytes[3] >> 7) & 1;
const FLAG_RESERVED_2 = (bytes[3] >> 6) & 1;
const FLAG_RESERVED_1 = (bytes[3] >> 5) & 1;
const FLAG_COMMENT = (bytes[3] >> 4) & 1;
const FLAG_NAME = (bytes[3] >> 3) & 1;
const FLAG_EXTRA = (bytes[3] >> 2) & 1;
const FLAG_CRC = (bytes[3] >> 1) & 1;
const FLAG_TEXT = (bytes[3] >> 0) & 1;
console.log("FLAG_RESERVED_3", FLAG_RESERVED_3);
console.log("FLAG_RESERVED_2", FLAG_RESERVED_2);
console.log("FLAG_RESERVED_1", FLAG_RESERVED_1);
console.log("FLAG_COMMENT", FLAG_COMMENT);
console.log("FLAG_NAME", FLAG_NAME);
console.log("FLAG_EXTRA", FLAG_EXTRA);
console.log("FLAG_CRC", FLAG_CRC);
console.log("FLAG_TEXT", FLAG_TEXT);
console.log();
if (FLAG_EXTRA) {
let len1 = bytes[10];
let len2 = bytes[11];
console.log("Extra header lenght", len1, len2);
}
});
EDIT 2:
After reading it over and over and over again, I think I got it:
if (FLAG_EXTRA) {
let len1 = bytes[10];
let len2 = bytes[11];
console.log("Extra header lenght", len1 + (len2 * 256));
}
len1 (byte 0) holds values up to 255; len2 is a multiplier for 256.
len2 * 256 + len1 = FEXTRA header length.
Can someone correct me if I'm wrong?!
Thank you guys!
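For the record, that reading matches the spec's little-endian rule, and Node's Buffer can read the 16-bit value in one call. A minimal sketch (assuming bytes is the Buffer from the demo code above):
if (FLAG_EXTRA) {
  // XLEN is stored least-significant byte first, so
  // bytes[10] + bytes[11] * 256 === bytes.readUInt16LE(10)
  console.log("Extra header length", bytes.readUInt16LE(10));
}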
Say you have two integers 10 and 20. That is 00001010 and 00010100. I would then like to just basically concat these as strings, but have the result be a new integer.
00001010 + 00010100 == 0000101000010100
That final number is 2580.
However, I am looking for a way to do this without actually converting them to strings. I'm looking for something more efficient that just does some bit twiddling on the integers themselves. I'm not too familiar with that, but I imagine it would be along the lines of:
var a = 00001010 // == 10
var b = 00010100 // == 20
var c = a << b // == 2580
Note, I would like for this to work with any sequences of bits. So even:
var a = 010101
var b = 01110
var c = a + b == 01010101110
Your basic equation is:
c = b + (a << 8).
The trick here is that you need to always shift by 8. But since a and b do not always use all 8 bits of the byte, JavaScript effectively drops any leading zeros. We need to recover the number of leading zeros of b (equivalently, the trailing zeros a would need) and account for them before adding, so that all the bits stay in their proper positions. This requires an equation like this:
c = b + (a << s + r)
Where s is the highest set bit (going from right to left) in b, and r is the remaining number of bits such that s + r = 8.
Essentially, all you are doing is shifting the first operand a over by 8 bits, which effectively adds trailing zeros to a or, equally speaking, pads leading zeros onto the second operand b. Then you add normally. This can be accomplished using logarithms, shifting, and the bitwise OR operation to provide an O(1) solution for arbitrary positive integers a and b, where the number of bits in a and b does not exceed some positive integer n. In the case of a byte, n = 8.
// Bitwise log base 2 in O(1) time
function log2(n) {
// Check if n > 0
let bits = 0;
if (n > 0xffff) {
n >>= 16;
bits = 0x10;
}
if (n > 0xff) {
n >>= 8;
bits |= 0x8;
}
if (n > 0xf) {
n >>= 4;
bits |= 0x4;
}
if (n > 0x3) {
n >>= 2;
bits |= 0x2;
}
if (n > 0x1) {
bits |= 0x1;
}
return bits;
}
// Computes the max set bit
// counting from the right to left starting
// at 0. For 20 (10100) we get bit # 4.
function msb(n) {
n |= n >> 1;
n |= n >> 2;
n |= n >> 4;
n |= n >> 8;
n |= n >> 16;
n = n + 1;
// We take the log here because
// n would otherwise be the largest
// magnitude of base 2. So, for 20,
// n+1 would be 16. Which, to
// find the number of bits to shift, we must
// take the log base 2
return log2(n >> 1);
}
// Operands
let a = 0b00001010 // 10
let b = 0b00010100 // 20
// Max number of bits in
// the binary numbers
let n = 8
// The max set bit of b (20 = 10100) is in position 4.
// We will need to pad 4 more zeros
let s = msb(b)
// How many zeros to pad on the left
// 8 - 4 = 4
let r = Math.abs(n - s)
// Shift a over by the computed
// number of bits including padded zeros
let c = b + (a << s + r)
console.log(c)
Output:
2580
Notes:
This is NOT commutative.
Add error checking to log2() for negative numbers, and other edge cases.
References:
https://www.geeksforgeeks.org/find-significant-set-bit-number/
https://github.com/N02870941/java_data_structures/blob/master/src/main/java/util/misc/Mathematics.java
so the problem:
a is 10 (in binary 0000 1010)
b is 20 (in binary 0001 0100)
you want to get 2580 using bit shifts somehow.
if you left shift a by 8 using a <<= 8 (this is the same as multiplying a by 2^8) you get 1010 0000 0000, which is the same as 10 * 2^8 = 2560. Since the lower bits of a are now all 0's (<< fills the new bits with 0), you can just add b on top of it: 1010 0000 0000 + 0001 0100 gives you 1010 0001 0100.
so in one line of code, it's var result = (a << 8) + b. Note the parentheses: << binds more loosely than +, so a << 8 + b would actually shift a by 8 + b bits. Remember that most programming languages have no explicit built-in type for "binary", but everything is binary in its nature: an int is "binary", an object is "binary", etc. When you want to do binary operations on some data, you can just use the datatype you have as an operand for the binary operators.
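A quick runnable check of the one-liner and of the precedence pitfall (a standalone sketch, not from the original answer):
var a = 10; // 0000 1010
var b = 20; // 0001 0100
console.log((a << 8) + b); // 2580 -> 1010 0001 0100
console.log(a << 8 + b);   // -1610612736: parsed as a << (8 + b) = 10 << 28, which overflows 32 bits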
this is a more general version of how to concatenate two numbers' binary representations, using no string operations:
/*
This function concatenates b to the end of a and puts 0's in between them.
b will be treated starting with its first 1 as its most significant bit.
b needs to be bigger than 0; otherwise, Math.log2 will give -Infinity for 0 and NaN for negative b
padding is the number of 0's to add at the end of a
*/
function concate_bits(a, b, padding) {
//add the padding 0's to a
a <<= padding;
//this gets the largest power of 2
var power_of_2 = Math.floor(Math.log2(b));
var power_of_2_value;
while (power_of_2 >= 0) {
power_of_2_value = 2 ** power_of_2;
a <<= 1;
if (b >= power_of_2_value) {
a += 1;
b -= power_of_2_value;
}
power_of_2--;
}
return a;
}
//padding of 3 plus the 5 significant bits of 20 makes the full 8-bit shift; this prints 2580
let result = concate_bits(10, 20, 3);
console.log(result);
Note, I would like for this to work with any sequences of bits. So even:
var a = 010101
var b = 01110
var c = a + b == 01010101110
This isn't going to be possible unless you convert to a string or otherwise store the number of bits in each number. 10101 010101 0010101 etc are all the same number (21), and once this is converted to a number, there is no way to tell how many leading zeroes the number originally had.
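If you do track the width of b separately, the concatenation itself becomes a one-liner. A minimal sketch (bWidth is my own parameter, not from the question):
// a shifted left by bWidth bits, then b OR'd into the freed low bits
function concatBits(a, b, bWidth) {
  return (a << bWidth) | b
}

console.log(concatBits(0b1010, 0b10100, 8))   // 2580: 10 and 20, with b padded to 8 bits
console.log(concatBits(0b010101, 0b01110, 5)) // 686 = 0b1010101110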
Looking at these implementations, I am wondering if one could explain the reason behind the specific operations. Not coming from computer science, I am not sure why these decisions were made.
function binb2rstr(input) {
var str = []
for (var i = 0, n = input.length * 32; i < n; i += 8) {
var code = (input[i >> 5] >>> (24 - i % 32)) & 0xFF
var val = String.fromCharCode(code)
str.push(val)
}
return str.join('')
}
function rstr2binb(input) {
var output = Array(input.length >> 2)
for (var i = 0, n = output.length; i < n; i++) {
output[i] = 0
}
for (var i = 0, n = input.length * 8; i < n; i += 8) {
output[i >> 5] |= (input.charCodeAt(i / 8) & 0xFF) << (24 - i % 32)
}
return output
}
What I understand so far are:
i += 8 is for iterating through bytes.
0xFF is 255, which is 2^8 - 1, so 1 byte.
32 which is the size of a word, or 4 bytes
| is bitwise OR, <<, >>>, and & are likewise bit operators.
The % modulus keeps a value below a maximum: x % max always lies in the range 0 to max - 1.
What I don't understand is:
i >> 5, how that was picked.
& 0xFF, how that was picked.
24 - i % 32, where the 24 came from.
var code = (input[i >> 5] >>> (24 - i % 32)) & 0xFF, how the character code is computed from that.
input.length >> 2
I'm wondering if this is a standard computer-science technique, because it's hard to tell where these variables come from and how one learns this. It seems like these values must come from a standard algorithm based on byte lengths, but I can't tell how to get there from these open questions. Thank you for your help.
This code consists of some pretty clever bit-fiddling based on 32-bit values.
But let's work on your points:
i >> 5, how that was picked.
This divides i by 32, corresponding to the n = input.length * 32 overall length. Considering the whole algorithm, this means that one input value is processed four times (at bit offsets 0, 8, 16, 24) before the next input value is selected.
& 0xFF, how that was picked.
This simply selects the lowest 8 bits of an n-bit value.
24 - i % 32, where the 24 came from.
This relates to i += 8. The i % 32 term cycles through four values per word (32/8 = 4), temp = (0, 8, 16, 24), so 24 - temp yields (24, 16, 8, 0).
var code = (input[i >> 5] >>> (24 - i % 32)) & 0xFF, how the character code is computed from that.
1. 1st iteration: i=0; 24-0=24; (input[0] >>> 24) & 0xFF = highest byte of input[0], shifted down to the lowest position
2. 2nd iteration: i=8; 24-8=16; (input[0] >>> 16) & 0xFF = 2nd highest byte of input[0]
3. 3rd iteration: i=16; 24-16=8; (input[0] >>> 8) & 0xFF = 2nd lowest byte of input[0]
4. 4th iteration: i=24; 24-24=0; (input[0] >>> 0) & 0xFF = lowest byte of input[0]
This is the big-endian conversion: bytes are extracted from the most significant down to the least significant.
The next iteration has i=32 and moves on to the next input value: input[32/32] = input[1].
Overall, this algorithm shifts each 32-bit value to the right and masks off the lowest 8 bits to be used as a char code by String.fromCharCode(code).
The last one is from a different function: input.length >> 2 simply divides by 4, discarding any remainder (0 to 3).
Concerning your last question:
It seems like these values must be a standard algorithm based on byte length but I can't tell how to get there with these open questions.
This is far from a standard algorithm. It is just a clever bit-manipulation based on bytes.
In assembler this code would be even easier to understand.
There is even one instruction called BSWAP to swap between 32-bit Big-Endian and Little-Endian values in a register.
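As an illustration of the extraction pattern itself (a sketch with an arbitrary example word, not from the code in question):
// Split one 32-bit word into its four bytes, most significant first
var word = 0x41424344 // the bytes 'A', 'B', 'C', 'D'
for (var shift = 24; shift >= 0; shift -= 8) {
  var code = (word >>> shift) & 0xFF
  console.log(String.fromCharCode(code)) // prints A, B, C, D in order
}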
This is a function where bytes comes from a string; each letter's char code was grabbed to create an array of bytes:
function checksum(bytes) {
var a = 0;
var b = 0;
for (var i = 0; i < bytes.length; i++) {
a = (a + bytes[i]) % 0xff;
b = (b + a) % 0xff;
}
return (b << 8) | a;
}
I'm mostly confused at the role of % 0xff and b << 8. Could someone help me break this down?
a is equal to the sum of all the elements modulo 255
b is equal to the sum of all the values that a assumes modulo 255 (so 1 time the last element + 2 times the one before that ....)
The final value is a 16 bit number where the higher 8 bits are b and the lower 8 bits are a.
That is actually Fletcher's checksum:
https://en.wikipedia.org/wiki/Fletcher%27s_checksum
This function calculates a and b, and generates the checksum from it.
a is calculated as follows:
add the value of the current char to a
a must be between 0 and 254, hence modulo 255 is applied
b is calculated as the cumulative value of a modulo 255
increase the value of b by a
b must be between 0 and 254, hence modulo 255 is applied
at the end the checksum is generated by concatenating a to b
take b's bits and move them 8 places to the left
set a to the right side of b
The result will have a length of 2 bytes (16 bits), where the first byte is b and the second a.
Example:
(c is the current char value)
c   | a   | b
-----------------
8   | 8   | 8
13  | 21  | 29
5   | 26  | 55
0   | 26  | 81
180 | 206 | 32
100 | 51  | 83
checksum = (83 << 8) | 51 = 0x5333 = 0101 0011 0011 0011
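Going the other way, the two bytes can be unpacked from the checksum with a shift and a mask (a small sketch):
var sum = 0x5333           // checksum from the example
var a = sum & 0xFF         // low byte: 51
var b = (sum >> 8) & 0xFF  // high byte: 83
console.log(a, b)          // 51 83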
Thanks to everyone in advance:
alert((~1).toString(2));
This outputs: -10
But in PHP/Java it outputs 11111111111111111111111111111110
Am I missing something? Why does Javascript add a "-" to the output?
I know Java uses two's complement to represent negative numbers, and 11111111111111111111111111111110 in binary, which is what ~1 gives, represents -2. Or, represented in binary with a negative sign, -10, which is what you got.
The way you calculate the negative of 10 (in base 2) using two's complement is that you first invert all of the bits, giving you:
11111111111111111111111111111101
then you add 1, giving you:
11111111111111111111111111111110
I guess the same is happening in Javascript.
You can use the shift operator >>> to convert the number to an unsigned integer before converting to binary:
(~1 >>> 0).toString(2) // "11111111111111111111111111111110"
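If you also want the fixed-width look of the PHP/Java output, or just the low 8 bits, masking and padding work too (a small sketch; padStart is standard since ES2017):
console.log((~1 >>> 0).toString(2))                   // "11111111111111111111111111111110"
console.log((~5 >>> 0).toString(2).padStart(32, "0")) // "11111111111111111111111111111010"
console.log((~1 & 0xFF).toString(2))                  // "11111110" (8-bit view)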
Short answer:
A bitwise NOT (~1) performs a 1's complement conversion (it inverts every bit), which gives us -2.
The .toString() function basically takes the decimal without the sign, 2, converts it to binary, 10, and adds a - sign, which gives us -10.
A more detailed answer:
It's in the function .toString(). When you output a number via .toString():
If the numObj is negative, the sign is preserved. This is the case
even if the radix is 2; the string returned is the positive binary
representation of the numObj preceded by a - sign, not the two's
complement of the numObj.
Taken from developer.mozilla.org, this formula describes what happens when you perform a NOT (~) on an integer:
Bitwise NOTing any number x yields -(x + 1). For example, ~5 yields
-6.
Maybe it's better explained with this table and an example:
+-------------------------+-----+-----+-----+-----+-----+-----+------+
| Base 10 Integer | -3 | -2 | -1 | 0 | 1 | 2 | 3 |
+-------------------------+-----+-----+-----+-----+-----+-----+------+
| Base 10 1's Complement | 2 | 1 | 0 | -1 | -2 | -3 | -4 |
+-------------------------+-----+-----+-----+-----+-----+-----+------+
| Base 2 | | | | 0 | 1 | 10 | 11 |
+-------------------------+-----+-----+-----+-----+-----+-----+------+
| Result ~x.toString(2) | 10 | 1 | 0 | -1 | -10 | -11 | -100 |
+-------------------------+-----+-----+-----+-----+-----+-----+------+
Starting with the base-10 integer 2:
The 1's complement of the base-10 integer 2 is -3. This is the same as performing a NOT (~).
The .toString function takes the unsigned value (= 3 in base 10, = 11 in base 2).
The .toString function adds a "-" symbol.
.toString outputs "-11".
This assumes that you are working in 32 bits...
var valueToNot = parseInt("11110000", 2);
var notResult = 0xFFFFFFFF - valueToNot;
console.log(notResult.toString(2));
results in
11111111111111111111111100001111
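The same result can be had with the ~ operator plus the unsigned shift from the earlier answer; a quick sketch of the equivalence (reusing valueToNot from above):
console.log((~valueToNot >>> 0).toString(2))                 // 11111111111111111111111100001111
console.log((~valueToNot >>> 0) === 0xFFFFFFFF - valueToNot) // true for 32-bit values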
Here's a solution to implement NOT in javascript. It ain't pretty but it works.
// ~x yields -(x + 1), so ~(num - 1) is just -num.
var num = 9;
num.toString(2); // returns "1001"
(~(num - 1)).toString(2); // returns "-1001"
// WHAT the hell?? I guess the negative sign acts as a sign bit.
If you want to view the binary string of a decimal after a NOT (bit toggle), then use the following code.
// Programer: Larry Battle
// Purpose: Provide a bit toggle function for javascript.
var getStrCopy = function (str, copies) {
var newStr = str;
copies = (copies > 0) ? copies : 1;
while (--copies) {
newStr += str;
}
return newStr;
};
var convertDecToBase = function ( dec, base, length, padding ) {
padding = padding || '0' ;
var num = dec.toString( base );
length = length || num.length;
if (num.length !== length) {
if (num.length > length) {
throw new Error("convertDecToBase(): num(" + num + ") > length(" + length + ") too long.");
}
num = getStrCopy( padding, (length - num.length)) + num;
}
return num;
};
var formatBinaryStr = function( str ){
return str.replace( /\d{4}/g, '$& ' ).replace( /\s$/,'');
};
var toggleBits = function( dec, length, doFormat ){
length = length || 8;
var str = convertDecToBase( dec, 2, length || 8 );
var binaryStr = str.replace( /0/g, 'o' ).replace( /1/g, '0').replace( /o/g, '1' );
return ( doFormat ) ? formatBinaryStr( binaryStr ) : binaryStr ;
};
// The following requires Firebug or Google Chrome Dev Tools
clear();
console.log( toggleBits( 1 ) ); // returns "11111110"
console.log( toggleBits( 2 ) ); // returns "11111101"
console.log( toggleBits( 50, 16 ) );// returns "1111111111001101"
console.log( toggleBits( 15, 8, true ) ); // returns "1111 0000"
console.log( toggleBits( 520, 16, true ) ); //returns "1111 1101 1111 0111"