Understanding byte order in a protocol specification (gzip) - JavaScript

I'm trying to understand the gzip specification (http://www.zlib.org/rfc-gzip.html), especially section 2, "Overall conventions":
Bytes stored within a computer do not have a "bit order", since they are always treated as a unit. However, a byte considered as an integer between 0 and 255 does have a most- and least-significant bit, and since we write numbers with the most-significant digit on the left, we also write bytes with the most-significant bit on the left. In the diagrams below, we number the bits of a byte so that bit 0 is the least-significant bit, i.e., the bits are numbered:
+--------+
|76543210|
+--------+
This document does not address the issue of the order in which bits of a byte are transmitted on a bit-sequential medium, since the data format described here is byte- rather than bit-oriented.
Within a computer, a number may occupy multiple bytes. All multi-byte numbers in the format described here are stored with the least-significant byte first (at the lower memory address). For example, the decimal number 520 is stored as:
     0        1
 +--------+--------+
 |00001000|00000010|
 +--------+--------+
  ^        ^
  |        |
  |        + more significant byte = 2 x 256
  + less significant byte = 8
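As a quick sanity check, the spec's 520 example can be decoded directly in Node, since Buffer has a built-in little-endian reader:

```javascript
// The spec's 520 example: least-significant byte stored first.
const example = Buffer.from([0b00001000, 0b00000010]); // 8, then 2
console.log(example.readUInt16LE(0)); // 520 = 8 + 2 * 256
```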
The problem I have is that I'm not sure how to calculate the length of the FEXTRA header:
+---+---+=================================+
| XLEN |...XLEN bytes of "extra field"...| (more-->)
+---+---+=================================+
If I have one subfield with a string payload of 1600 bytes (characters), then my complete FEXTRA length should be 1600 (payload) + 2 (SI1 & SI2, the subfield ID), right?
But the two length bytes are set to 73 and 3, and I am not sure why.
Can someone clarify how I can calculate the complete FEXTRA length from the two length bytes?
I'm using Node.js for the operations on the .tgz/.gz file.
Demo code:
const fs = require("fs");
//const bitwise = require("bitwise");
// http://www.zlib.org/rfc-gzip.html
// http://www.onicos.com/staff/iz/formats/gzip.html
// https://de.wikipedia.org/wiki/Gzip
// https://dev.to/somedood/bitmasks-a-very-esoteric-and-impractical-way-of-managing-booleans-1hlf
// https://www.npmjs.com/package/bitwise
// https://stackoverflow.com/questions/1436438/how-do-you-set-clear-and-toggle-a-single-bit-in-javascript
fs.readFile("./test.gz", (err, bytes) => {
  if (err) {
    console.log(err);
    process.exit(100);
  }
  console.log("bytes: %d", bytes.length);

  let header = bytes.slice(0, 10);
  let flags = header[3];
  let eFlags = header[8];
  let OS = header[9];
  console.log("Is gzip file:", header[0] === 31 && header[1] === 139);
  console.log("compress method:", header[2] === 8 ? "deflate" : "other");
  console.log("M-Date: %d %d %d %d", bytes[4], bytes[5], bytes[6], bytes[7]);
  console.log("OS", OS);
  console.log("flags", flags);
  console.log();

  // | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
  // +---+---+---+---+---+---+---+---+---+---+
  // |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
  // +---+---+---+---+---+---+---+---+---+---+
  //
  // |10 |11 |
  // +---+---+=================================+
  // | XLEN |...XLEN bytes of "extra field"...| (more-->)
  // +---+---+=================================+
  // (if FLG.FEXTRA set)
  //
  // bit 0   FTEXT
  // bit 1   FHCRC
  // bit 2   FEXTRA
  // bit 3   FNAME
  // bit 4   FCOMMENT
  // bit 5   reserved
  // bit 6   reserved
  // bit 7   reserved

  // bitwise operations on the header flags
  const FLAG_RESERVED_3 = (bytes[3] >> 7) & 1;
  const FLAG_RESERVED_2 = (bytes[3] >> 6) & 1;
  const FLAG_RESERVED_1 = (bytes[3] >> 5) & 1;
  const FLAG_COMMENT = (bytes[3] >> 4) & 1;
  const FLAG_NAME = (bytes[3] >> 3) & 1;
  const FLAG_EXTRA = (bytes[3] >> 2) & 1;
  const FLAG_CRC = (bytes[3] >> 1) & 1;
  const FLAG_TEXT = (bytes[3] >> 0) & 1;
  console.log("FLAG_RESERVED_3", FLAG_RESERVED_3);
  console.log("FLAG_RESERVED_2", FLAG_RESERVED_2);
  console.log("FLAG_RESERVED_1", FLAG_RESERVED_1);
  console.log("FLAG_COMMENT", FLAG_COMMENT);
  console.log("FLAG_NAME", FLAG_NAME);
  console.log("FLAG_EXTRA", FLAG_EXTRA);
  console.log("FLAG_CRC", FLAG_CRC);
  console.log("FLAG_TEXT", FLAG_TEXT);
  console.log();

  if (FLAG_EXTRA) {
    let len1 = bytes[10];
    let len2 = bytes[11];
    console.log("Extra header length", len1, len2);
  }
});
EDIT 2:
After reading it over and over again, I think I got it:
if (FLAG_EXTRA) {
  let len1 = bytes[10];
  let len2 = bytes[11];
  console.log("Extra header length", len1 + (len2 * 256));
}
len1 (byte 0) is a number up to 255, and len2 is a multiplier for 256:
len2 * 256 + len1 = FEXTRA header length.
Can someone correct me if I'm wrong?!
Thank you guys!
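For what it's worth, that matches RFC 1952: XLEN is a two-byte little-endian integer, so Node can read it directly with the built-in Buffer API. (Note also that per the RFC, each subfield carries its own two-byte LEN after SI1/SI2, so a single subfield contributes 4 + payload bytes to XLEN, not 2 + payload.) A minimal sketch:

```javascript
// XLEN is stored least-significant byte first (little-endian).
const lenBytes = Buffer.from([73, 3]); // the two length bytes from the question
const xlen = lenBytes.readUInt16LE(0); // same as 73 + 3 * 256
console.log(xlen); // 841
```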

Related

Converting from a Uint8Array to an ArrayBuffer prepends 32 bytes?

Consider an input bytes that is a 400-byte Uint8Array. The following code converts it first to an ArrayBuffer via the .buffer property and subsequently to a Float32Array:
static bytesToFloatArray(bytes) {
  let buffer = bytes.buffer; // Get the ArrayBuffer from the Uint8Array.
  let floats = new Float32Array(buffer);
  return floats;
}
The surprise is that the conversion to the ArrayBuffer prepends 32 bytes, which shows up as 8 extra 0 float values at the beginning of the resulting Float32Array.
Why does the buffer property add the 32 bytes, and how can that be avoided (or corrected)?
Why does the buffer method add the 32 bytes?
It didn't. The buffer had 432 bytes in the first place, even before the Uint8Array was created on it. The difference comes from the typed array using an offset and/or a length which essentially restrict the view to a slice of the buffer.
And how can that be avoided (or corrected)?
Use the same offset and adjusted length:
function bytesToFloatArray(bytes) {
  return new Float32Array(
    bytes.buffer,
    bytes.byteOffset,
    bytes.byteLength / Float32Array.BYTES_PER_ELEMENT
  );
}
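A small self-contained repro may make this clearer (the sizes below are made up to match the question: a 432-byte buffer viewed through a Uint8Array starting at offset 32):

```javascript
// A 432-byte buffer with a Uint8Array viewing only bytes 32..431.
const raw = new ArrayBuffer(432);
const bytes = new Uint8Array(raw, 32, 400);

// Naive: wraps the WHOLE buffer, including the 32 bytes before the view.
const naive = new Float32Array(bytes.buffer);
console.log(naive.length); // 108  (432 / 4)

// Respecting the view's offset and length:
const floats = new Float32Array(
  bytes.buffer,
  bytes.byteOffset,
  bytes.byteLength / Float32Array.BYTES_PER_ELEMENT
);
console.log(floats.length); // 100  (400 / 4)
```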
PS: @Bergi's answer is correct; I did not even look at the buffer the Uint8Array was created from.
I was not able to reproduce the behavior you had in the Chrome console, but you can always resort to using a DataView for fine-grained control, something like this
(beware of the endianness; I did not test the code below, I might have made a mistake with the byte order):
let test8 = new Uint8Array(400);
test8.forEach((d, i, a) => a[i] = (0xff * Math.random()) | 0);

function c8to32(uint8, endian = false) { // big endian by default
  var cpBuff = new ArrayBuffer(uint8.length),
      view = new DataView(cpBuff);
  for (let i = 0, v = void 0; i < uint8.length; i += 4) {
    if (!endian) { // big
      v = uint8[i] << 24
        | uint8[i + 1] << 16
        | uint8[i + 2] << 8
        | uint8[i + 3];
    } else { // little
      v = uint8[i]
        | uint8[i + 1] << 8
        | uint8[i + 2] << 16
        | uint8[i + 3] << 24;
    }
    // Write the assembled word back as raw bits (setUint32, not setFloat32,
    // which would store the IEEE-754 encoding of the integer value); the
    // buffer can then be reinterpreted with new Float32Array(cpBuff).
    view.setUint32(i, v, false);
  }
  return cpBuff;
}

document.getElementById("resultbig").textContent = c8to32(test8).byteLength;
document.getElementById("resultlittle").textContent = c8to32(test8, true).byteLength;
document.getElementById("resultbig").textContent = c8to32(test8).byteLength;
document.getElementById("resultlittle").textContent = c8to32(test8, true).byteLength;
<div id="resultbig">test</div>
<div id="resultlittle">test</div>

What does OR 0x80 do?

In bitwise operations, what does | 0x80 do? I know (& 0xFF) converts a value to an 8-bit integer, but what about (| 0x80)?
I have the following code:
const buf = createHash('sha256').update('test').digest()
for (let i = 0; i < n; i++) {
  const ubyte = buf.readUInt8(i)
  const shifted = (ubyte >> 1) | mask
  destBuf.writeUInt8(shifted, i)
  mask = (ubyte & 1) * 0x80 // mask is 0 or 128
}
Can anyone explain that for me?
0x... means that what comes next is a hexadecimal number.
0x80 is the hexadecimal representation of the number 128. In binary, this equals 10000000.
The | character is the bitwise OR operator. Suppose you have an 8-bit number:
a = xxxxxxxx
with x being either a 0 or a 1. Now, masking this number with 0x80 means:
xxxxxxxx | 10000000 = 1xxxxxxx
So it basically means you will have a 1 for your leftmost significant bit, while keeping all the other bits the same.
Now, in your code you use this mask in the line:
const shifted = (ubyte >> 1) | mask
What this does is takes the number ubyte:
ubyte = xxxxxxxy // x and y can be either 1 or 0
It shifts it right by one digit:
ubyte >> 1 = zxxxxxxx // y gets lost, and z is a 0 if ubyte was unsigned.
Now it masks this number with your mask. When the mask is 128, the result is:
(ubyte >> 1) | 10000000 = 1xxxxxxx
So you will have a 1 as your most significant bit, and the other bits are unchanged.
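Putting the pieces together: the loop in the question effectively shifts the whole buffer right by one bit, carrying each byte's low bit into the top of the next byte. A self-contained sketch (the helper name is made up for illustration):

```javascript
// Shift a whole byte buffer right by one bit, carrying each byte's low bit
// into the most significant bit of the following byte.
function shiftRightOneBit(src) {
  const dest = Buffer.alloc(src.length);
  let mask = 0; // carries the previous byte's low bit, as 0 or 0x80
  for (let i = 0; i < src.length; i++) {
    const ubyte = src.readUInt8(i);
    dest.writeUInt8((ubyte >> 1) | mask, i);
    mask = (ubyte & 1) * 0x80; // 0 or 128, depending on the dropped bit
  }
  return dest;
}

// 0b00000011 becomes 0b00000001, and the dropped 1 reappears as the
// next byte's MSB:
const out = shiftRightOneBit(Buffer.from([0b00000011, 0b00000000]));
console.log(out[0].toString(2), out[1].toString(2)); // "1" "10000000"
```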
It combines the bits of both participating numbers with the logical "or":
const b = 0x7;

function tst(a, b) {
  console.log(a.toString(2).padStart(8) + " first number: " + a)
  console.log(b.toString(2).padStart(8) + " second number: " + b)
  console.log((a | b).toString(2).padStart(8) + " bitwise overlay: " + (a | b))
  console.log("----")
}

[0x80, 0x6A, 0x70, 0x8f].forEach(a => tst(a, b))

How to calculate the number of bytes a 32-bit number will take when chunked into bytes

So we have this to break a 32-bit integer into 8-bit chunks:
var chunks = [
  (num & 0xff000000) >>> 24, // unsigned shift, so the top byte isn't sign-extended
  (num & 0x00ff0000) >> 16,
  (num & 0x0000ff00) >> 8,
  (num & 0x000000ff)
]
How can you tell how many chunks it will be before computing the chunks? Basically I would like to know if it will be 1, 2, 3, or 4 bytes before I chunk it into the array. Some bit trick or something, on the 32-bit integer.
function countBytes(num) {
  // ???
}
There are several approaches I can think of, depending on your preference and/or codebase style.
The first one uses more analytical maths than the other and can perform a little worse than the bitwise one below:
// We will need a logarithm with base 16 since you are working with hexadecimals
const BASE_16 = Math.log(16);
const log16 = (num) => Math.log(num) / BASE_16;

// This is a function that gives you the number of non-zero chunks you get out
// (assumes num > 0; beware of floating-point precision near exact powers of 16)
const getNumChunks = (num) => {
  // The floored base-16 logarithm, plus one, is the number of hex digits
  // in the number.
  const numHexDigits = Math.floor(log16(num)) + 1;
  // Divide by 2 (rounding up) since each byte holds two hex digits.
  const numChunks = Math.ceil(numHexDigits / 2);
  return numChunks;
}
The second one is strictly bitwise:
const getNumChunks = (num) => {
  let probe = 0xff;
  let numChunks = 1;
  // >>> 0 treats both sides as unsigned, so the probe doesn't go negative
  // once it reaches the top byte; the numChunks < 4 guard stops at 32 bits.
  while ((num >>> 0) > (probe >>> 0) && numChunks < 4) {
    probe = (probe << 8) | 0xff;
    numChunks++;
  }
  return numChunks;
}
Or this one-liner, making use of the clz32 function to determine how many bytes of a 32-bit unsigned int are being utilized:
function numOfBytes( x ) {
  return x === 0 ? 1 : (((31 - Math.clz32( x )) / 8) + 1) | 0;
}
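A quick check of the clz32 approach (runnable in Node or any modern browser; the function is repeated here so the snippet is self-contained):

```javascript
// Math.clz32 counts leading zero bits in the 32-bit representation;
// 31 minus that is the index of the highest set bit.
function numOfBytes(x) {
  return x === 0 ? 1 : (((31 - Math.clz32(x)) / 8) + 1) | 0;
}

console.log(numOfBytes(0));          // 1
console.log(numOfBytes(0xff));       // 1
console.log(numOfBytes(0x100));      // 2
console.log(numOfBytes(0xffffffff)); // 4
```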

Decreasing byte size of integer block in socket communication

I am developing a multiplayer game server. In my case, every single byte really matters for the gaming experience and for saving bandwidth.
Client and server will send some integer values to each other.
The integers will mostly have values lower than 100, but in some cases they can be between 0 and 100000.
All the integers will be sent in the same sequence (imagine they form an integer array).
Using an 8-bit or 16-bit integer array is not an option because of the possible values greater than 65535, and I do not want to use a 32-bit integer array just for values that occur rarely.
So I developed an algorithm for that (here is the JavaScript port):
function write(buffer, number) {
  while (number > 0x7f) {
    buffer.push(0x80 | (number & 0x7f));
    number >>= 7;
  }
  buffer.push(number);
}

function read(buffer) {
  var cur, result = 0, shift = 0x8DC54E1C0; // ((((((28 << 6) | 21) << 6) | 14) << 6) | 7) << 6;
  while ((cur = buffer.shift()) > 0x7f) {
    result |= (cur & 0x7f) << shift;
    shift >>= 6;
  }
  return result | (cur << shift);
}

var d = [];
var number = 127;
write(d, number);
alert("value bytes: " + d);
var newResult = read(d);
alert("equals : " + (number === newResult));
var d = [];
var number = 127;
write(d, number);
alert("value bytes: " + d);
var newResult = read(d);
alert("equals : " + (number === newResult));
My question is: is there a better way to solve this problem?
Thanks in advance.
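For comparison, the conventional LEB128-style varint (used by e.g. Protocol Buffers) produces the same byte layout as the write() above, but the decoder simply accumulates the shift upward instead of packing the shift amounts into one constant. A sketch:

```javascript
// Standard varint: 7 payload bits per byte, high bit set means "more bytes follow".
function writeVarint(buffer, number) {
  while (number > 0x7f) {
    buffer.push(0x80 | (number & 0x7f));
    number >>>= 7;
  }
  buffer.push(number);
}

function readVarint(buffer) {
  let result = 0;
  let shift = 0;
  let cur;
  while ((cur = buffer.shift()) > 0x7f) {
    result |= (cur & 0x7f) << shift;
    shift += 7;
  }
  return result | (cur << shift);
}

const buf = [];
writeVarint(buf, 100000);
console.log(buf.length);      // 3 bytes instead of 4
console.log(readVarint(buf)); // 100000
```

Values up to 127 take one byte, up to 16383 two bytes, and the whole 0-100000 range fits in three.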

Most efficient way to store large arrays of integers in localStorage with Javascript

*"Efficient" here basically means in terms of smaller size (to reduce the IO waiting time), and speedy retrieval/deserialization times. Storing times are not as important.
I have to store a couple of dozen arrays of integers, each with 1800 values in the range 0-50, in the browser's localStorage -- that is, as a string.
Obviously, the simplest method is to just JSON.stringify it, however, that adds a lot of unnecessary information, considering that the ranges of the data is well known. An average size for one of these arrays is then ~5500 bytes.
Here are some other methods I've tried (with the resulting size, and the time to deserialize it 1000 times, at the end):
zero-padding the numbers so each is 2 characters long, e.g.:
[5, 27, 7, 38] ==> "05270738"
base-50 encoding:
[5, 11, 7, 38] ==> "5b7C"
just using the value as a character code (adding 32 to avoid the control characters at the start):
[5, 11, 7, 38] ==> "%+'F" (String.fromCharCode(37), String.fromCharCode(43), ...)
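The charCode variant above can be sketched as a two-line round trip:

```javascript
// value + 32 -> one printable character per number, and back again.
const encode = (arr) => arr.map((n) => String.fromCharCode(n + 32)).join("");
const decode = (str) => Array.from(str, (c) => c.charCodeAt(0) - 32);

console.log(encode([5, 11, 7, 38])); // "%+'F"
console.log(decode("%+'F"));         // [ 5, 11, 7, 38 ]
```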
Here are my results:
                 size    Chrome 18   Firefox 11
-------------------------------------------------
JSON.stringify   5286       60ms        99ms
zero-padded      3600      354ms       703ms
base 50          1800      315ms       400ms
charCodes        1800       21ms       178ms
My question is if there is an even better method I haven't yet considered?
Update
MДΓΓБДLL suggested using compression on the data, so I combined this LZW implementation with the base-50 and charCode data. I also tested aroth's code (packing 4 integers into 3 bytes). I got these results:
                 size    Chrome 18   Firefox 11
-------------------------------------------------
LZW base 50      1103      494ms       999ms
LZW charCodes    1103      194ms       882ms
bitpacking       1350     2395ms       331ms
If your range is 0-50, then you can pack 4 numbers into 3 bytes (6 bits per number). This would allow you to store 1800 numbers using ~1350 bytes. This code should do it:
window._firstChar = 48;
window.decodeArray = function(encodedText) {
  var result = [];
  var temp = [];
  for (var index = 0; index < encodedText.length; index += 3) {
    // skipping bounds checking because the encoded text is assumed to be valid
    var firstChar = encodedText.charAt(index).charCodeAt() - _firstChar;
    var secondChar = encodedText.charAt(index + 1).charCodeAt() - _firstChar;
    var thirdChar = encodedText.charAt(index + 2).charCodeAt() - _firstChar;
    temp.push((firstChar >> 2) & 0x3F); // 6 bits, 'a'
    temp.push(((firstChar & 0x03) << 4) | ((secondChar >> 4) & 0xF)); // 2 bits + 4 bits, 'b'
    temp.push(((secondChar & 0x0F) << 2) | ((thirdChar >> 6) & 0x3)); // 4 bits + 2 bits, 'c'
    temp.push(thirdChar & 0x3F); // 6 bits, 'd'
  }
  // filter out 'padding' numbers, if present; this is an extremely inefficient way to do it
  for (var index = 0; index < temp.length; index++) {
    if (temp[index] != 63) {
      result.push(temp[index]);
    }
  }
  return result;
};
window.encodeArray = function(array) {
  var encodedData = [];
  for (var index = 0; index < array.length; index += 4) {
    var num1 = array[index];
    var num2 = index + 1 < array.length ? array[index + 1] : 63;
    var num3 = index + 2 < array.length ? array[index + 2] : 63;
    var num4 = index + 3 < array.length ? array[index + 3] : 63;
    encodeSet(num1, num2, num3, num4, encodedData);
  }
  // join into a single string so decodeArray can consume it
  return encodedData.join("");
};
window.encodeSet = function(a, b, c, d, outArray) {
  // we can encode 4 numbers in 3 bytes
  var firstChar = ((a & 0x3F) << 2) | ((b >> 4) & 0x03); // 6 bits for 'a', 2 from 'b'
  var secondChar = ((b & 0x0F) << 4) | ((c >> 2) & 0x0F); // remaining 4 bits from 'b', 4 from 'c'
  var thirdChar = ((c & 0x03) << 6) | (d & 0x3F); // remaining 2 bits from 'c', 6 bits for 'd'
  // add _firstChar so that all values map to a printable character
  outArray.push(String.fromCharCode(firstChar + _firstChar));
  outArray.push(String.fromCharCode(secondChar + _firstChar));
  outArray.push(String.fromCharCode(thirdChar + _firstChar));
};
Here's a quick example: http://jsfiddle.net/NWyBx/1
Note that storage size can likely be further reduced by applying gzip compression to the resulting string.
Alternately, if the ordering of your numbers is not significant, then you can simply do a bucket-sort using 51 buckets (assuming 0-50 includes both 0 and 50 as valid numbers) and store the counts for each bucket instead of the numbers themselves. That would likely give you better compression and efficiency than any other approach.
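The bucket-counting idea above can be sketched in a few lines (assuming, as stated, that the order of the values doesn't matter):

```javascript
// Count-encoding: 1800 values in 0..50 collapse to just 51 counts.
function toCounts(values) {
  const counts = new Array(51).fill(0);
  for (const v of values) counts[v]++;
  return counts;
}

// Reconstructs the same multiset of values, in sorted order.
function fromCounts(counts) {
  const values = [];
  counts.forEach((n, v) => {
    for (let i = 0; i < n; i++) values.push(v);
  });
  return values;
}

const data = [5, 11, 7, 38, 5, 5];
console.log(fromCounts(toCounts(data))); // [ 5, 5, 5, 7, 11, 38 ]
```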
Assuming (as in your test) that compression takes more time than the size reduction saves you, your char encoding is the smallest you'll get without bitshifting. You're currently using one byte for each number, but if they're guaranteed to be small enough you could put two numbers in each byte. That would probably be an over-optimization, unless this is a very hot piece of your code.
You might want to consider using Uint8Array or ArrayBuffer. This blog post shows how it's done. Copying its logic, here's an example, assuming you have an existing Uint8Array named arr.
function arrayBufferToBinaryString(buffer, cb) {
  // The original post used the now-deprecated BlobBuilder;
  // modern browsers construct a Blob directly.
  var blob = new Blob([buffer]);
  var reader = new FileReader();
  reader.onload = function (e) {
    cb(reader.result);
  };
  reader.readAsBinaryString(blob);
}

arrayBufferToBinaryString(arr.buffer, function (s) {
  // do something with s
});
