Fast conversion of an array of unsigned integers to base64 - javascript

I have a VBArray in Javascript containing a long set of 8-bit unsigned integers, more than 1'000'000 entries usually.
I can easily transform it to a conventional array or Uint8Array, and my goal is to obtain its base64 representation.
I have tried the methods here, but running
var b64encoded = btoa(String.fromCharCode.apply(null, _uint8Array));
throws an out of stack space exception.
The conversion in itself is not a problem, because I could write my own conversion method which does the following:
create an empty bit string
for each value in the array:
    get the binary with toString(2)
    pad the binary to make it 8 bits
    add it to the bit string
Base64 conversion is then trivial.
Performance, as you can imagine, is rather poor. Any suggestions on how to improve this?

You could try something like this to limit the number of arguments, thus reducing the required stack space:
var A = new Uint8Array(10000000), s = '';
// Encode at most 49152 bytes at a time
for (var i = 0; i < A.length; i += 49152) {
    s += btoa(String.fromCharCode.apply(null, A.subarray(i, i + 49152)));
}
You can change the number 49152 to anything that is both under the browser's limit and divisible by 3.
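For convenience, the same idea can be wrapped in a small helper (a sketch only; uint8ToBase64 is just an illustrative name, not a standard function). The chunk size has to stay under the engine's argument limit and be divisible by 3: otherwise every chunk except the last would end in '=' padding, and the concatenated result would not be one valid base64 string.
function uint8ToBase64(bytes, chunkSize) {
    chunkSize = chunkSize || 49152; // must be divisible by 3
    var s = '';
    for (var i = 0; i < bytes.length; i += chunkSize) {
        // encode one slice at a time to keep the argument count low
        s += btoa(String.fromCharCode.apply(null, bytes.subarray(i, i + chunkSize)));
    }
    return s;
}
// var b64encoded = uint8ToBase64(_uint8Array);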


JavaScript Equivalent to BinaryReader.ReadString() from C#

I am converting some C# code into JavaScript. While this file has multiple data types and I found matching functionality in JavaScript across the libraries, I am not able to find one particular function in JS.
That function is https://learn.microsoft.com/en-us/dotnet/api/system.io.binaryreader.readstring?view=net-7.0
There are a couple of questions that I have:
First of all, what confuses me is: isn't a string inherently a variable-length value? If so, how can this function not take a length argument?
Let's assume that there is some cap on the length of the string. If so, does JS/TS have any similar functionality? Or any package that I can download to mimic the C# functionality?
Thank you in advance.
BinaryReader expects strings to be encoded in a specific format - the format BinaryWriter writes them in. As stated in the documentation:
Reads a string from the current stream. The string is prefixed with
the length, encoded as an integer seven bits at a time
So the length of the string is stored right before the string itself, encoded "as an integer seven bits at a time". We can get more info about that from BinaryWriter.Write7BitEncodedInt:
The integer of the value parameter is written out seven bits at a time, starting with the seven least-significant bits. The high bit of a byte indicates whether there are more bytes to be written after this one.
If value will fit in seven bits, it takes only one byte of space. If value will not fit in seven bits, the high bit is set on the first byte and written out. value is then shifted by seven bits and the next byte is written. This process is repeated until the entire integer has been written.
So it's a variable-length encoding: unlike the usual approach of always using 4 bytes for an Int32 value, this approach uses a variable number of bytes. That way the length of a short string can take less than 4 bytes (strings shorter than 128 bytes will take just 1 byte, for example).
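To make the encoding concrete, here is a small sketch of the writer side in plain JavaScript (illustrative only, not part of the original answer; write7BitEncodedInt is just a name chosen to mirror the C# method):
function write7BitEncodedInt(value) {
    var bytes = [];
    while (value >= 0x80) {
        bytes.push((value & 0x7F) | 0x80); // low 7 bits, high bit set = "more bytes follow"
        value >>>= 7;
    }
    bytes.push(value); // last byte has the high bit clear
    return bytes;
}
// write7BitEncodedInt(12)  -> [12]
// write7BitEncodedInt(300) -> [0xAC, 0x02]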
You can reproduce this logic in JavaScript - just read one byte at a time. The lowest 7 bits represent (part of) the length information, and the highest bit indicates whether the next byte also holds length information (otherwise it's the start of the actual string).
Then, once you have the length, use TextDecoder to decode the byte array into a string of the given encoding. Here is the same function in TypeScript. It accepts a buffer (Uint8Array), an offset into that buffer, and an encoding (UTF-8 by default; check the TextDecoder docs for other available encodings):
class BinaryReader {
    getString(buffer: Uint8Array, offset: number, encoding: string = "utf-8") {
        let length = 0; // length of following string
        let cursor = 0;
        let nextByte: number;
        do {
            // just grab next byte
            nextByte = buffer[offset + cursor];
            // grab 7 bits of current byte, then shift them according to this byte position
            // that is if that's first byte - do not shift, second byte - shift by 7, etc
            // then merge into length with or.
            length = length | ((nextByte & 0x7F) << (cursor * 7));
            cursor++;
        }
        while (nextByte >= 0x80); // do this while most significant bit is 1
        // get a slice of the length we got
        let sliceWithString = buffer.slice(offset + cursor, offset + cursor + length);
        let decoder = new TextDecoder(encoding);
        return decoder.decode(sliceWithString);
    }
}
It is worth adding various sanity checks to the above code if it will be used in production (that we do not read too many bytes while reading the length, that the calculated length is actually within the bounds of the buffer, etc.).
Small test, using the binary representation of the string "TEST STRING!", written by BinaryWriter.Write(string) in C#:
let buffer = new Uint8Array([12, 84, 69, 83, 84, 32, 83, 84, 82, 73, 78, 71, 33]);
let reader = new BinaryReader();
console.log(reader.getString(buffer, 0, "utf-8"));
// outputs TEST STRING!
Update. You mention in the comments that in your data the length of the string is represented by 4 bytes, so for example a length of 29 is represented by [0, 0, 0, 29]. That means your data was not written using BinaryWriter, and so cannot be read using BinaryReader, so you don't actually need an analog of BinaryReader.ReadString, contrary to what your question asks.
Anyway, if you need to handle such a case, you can do it:
class BinaryReader {
    getString(buffer: Uint8Array, offset: number, encoding: string = "utf-8") {
        // create a view over first 4 bytes starting at offset
        let view = new DataView(buffer.buffer, offset, 4);
        // read those 4 bytes as int 32 (big endian, since your example is like that)
        let length = view.getInt32(0);
        // get a slice of the length we got
        let sliceWithString = buffer.slice(offset + 4, offset + 4 + length);
        let decoder = new TextDecoder(encoding);
        return decoder.decode(sliceWithString);
    }
}

Issue with combining large array of numbers into one single number

I am trying to convert an array of numbers into one single number, for example
[1,2,3] to 123.
However, my code can't handle big arrays, since it can't return the exact number. For example,
[6,1,4,5,3,9,0,1,9,5,1,8,6,7,0,5,5,4,3] returns 6145390195186705000
Is there any way that I could properly convert it into a single number? I would really appreciate any help.
var integer = 0;
var digits = [1,2,3,4]
//combine array of digits into int
digits.forEach((num,index,self) => {
    integer += num * Math.pow(10, self.length - index - 1)
});
The biggest integer value JavaScript can safely hold is +/- 9007199254740991 (Number.MAX_SAFE_INTEGER). Note that the bitwise and shift operators operate on 32-bit ints, so in that case the max safe integer is 2^31 - 1, or 2147483647.
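A quick illustration in the console shows where the precision is lost:
console.log(Number.MAX_SAFE_INTEGER);               // 9007199254740991
console.log(9007199254740993 === 9007199254740992); // true - beyond the safe range both literals collapse to the same double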
In my opinion, you can choose one of the following:
store the numbers as strings and manipulate them as numbers; you might have to implement special functions to add/subtract/multiply/divide them (these are classic algorithmic problems)
use BigInt; BigInts are a new numeric primitive in JavaScript that can represent integers with arbitrary precision. With BigInts, you can safely store and operate on large integers even beyond the safe integer limit. Unfortunately, they work only with Chrome right now. If you want to work with other browsers, you might check this or even this if you work with angularjs or nodejs.
Try the following code in Chrome's console:
let x = BigInt([6,1,4,5,3,9,0,1,9,5,1,8,6,7,0,5,5,4,3].join(''));
console.log(x);
This will print 6145390195186705543n. The n suffix marks that it is a big integer.
Cheers!
You can use the JavaScript Array join() method and parse the result into an integer.
Example:
parseInt([6,1,4,5,3,9,0,1,9,5,1,8,6,7,0,5].join(''))
results:
6145390195186705
Edited: use BigInt instead of parseInt, but it works only in the Chrome browser.
The largest integer JavaScript can safely represent is
+/- 9007199254740991
Use BigInt. Join all the numbers into a string and pass it to the BigInt global function to convert it into an integer:
var integer = '';
var digits = [1,2,3,4]
// combine the array of digits into one string (start from '' so += concatenates instead of summing)
digits.forEach((num) => {
    integer += num;
});
integer = BigInt(integer);
Note: works only on Chrome as of now. You can use other libraries like BigInteger.js or MathJS.

How can I split() a binary-containing string at a specific binary value?

Overview:
I'm building a Javascript tool inside a web page. Except for loading that page, the tool will run without server communication. A user will select a local file containing multiple binary records, each with a x'F0 start byte and x'F0 end byte. The data in between is constrained to x'00 - x'7F and consists of:
bit maps
1-byte numbers
2-byte numbers, low order byte first
a smattering of ASCII characters
The records vary in lengths and use different formats.
[It's a set of MIDI Sysex messages, probably not relevant].
The local file is read via reader.readAsArrayBuffer and then processed thus:
var contents = event.target.result;
var bytes = new Uint8Array(contents);
var rawAccum = '';
for (x = 0; x < bytes.length; x++) {
    rawAccum += bytes[x];
}
var records = rawAccum.split(/\xF0/g);
I expect this to split the string into an array of its constituent records, deleting the x'F0 start byte in the process.
It actually does very little. records.length is 1 and records[0] contains the entire input stream.
[The actual split code is: var records = rawAccum.split(/\xF0\x00\x00\x26\x02/g); which should remove several identical bytes from the start of each record. When this failed I tried the abbreviated version above, with identical (non)results.]
I've looked at the doc on split() and at several explanations of \xXX among regex references. Clearly something does not work as I have deduced. My experience with JavaScript is minimal and sporadic.
How can I split a string of binary data at the occurrence of a specific binary byte?
The splitting appears to work correctly:
var rawAccum = "\xf0a\xf0b\xf0c\xf0"
console.log( rawAccum.length); // 7
var records = rawAccum.split(/\xF0/g);
console.log(records); // "", "a", "b", "c", ""
but the conversion of the array buffer to a string looks suspicious. Try converting the unsigned byte value to a string before appending it to rawAccum:
for (x = 0; x < bytes.length; x++) {
    rawAccum += String.fromCharCode( bytes[x]);
}
Data conversions (update after comment)
The FileReader reads the file into an array buffer in memory, but JavaScript does not provide access to array buffers directly. You can either create and initialize a typed array from the buffer (e.g. using the Uint8Array constructor as in the post), or access bytes in the buffer using a DataView object. Methods of DataView objects can convert sequences of bytes at specified positions to integers of varying types, such as the 16-bit integers in the MIDI sysex records.
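As a sketch (reusing the question's contents variable and an arbitrary offset purely for illustration), reading values straight from the ArrayBuffer with a DataView looks like this:
var view = new DataView(contents);      // contents is the ArrayBuffer from readAsArrayBuffer
var startByte = view.getUint8(0);       // a single byte, e.g. the 0xF0 start byte
var twoByte = view.getUint16(1, true);  // a 2-byte number, low-order byte first (little-endian), at an example offset of 1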
JavaScript strings use sequences of 16 bit values to hold characters, where each character uses one or two 16 bit values encoded using UTF-16 character encoding. 8 bit characters use only the lower 8 bits of a single 16 bit value to store their Unicode code point.
It is possible to convert an array buffer of octet values into a "binary string", by storing each byte value from the buffer in the low order bits of a 16 bit character and appending it to an existing string. This is what the post attempts to do. But in JavaScript strings (and individual characters which have a string length of 1) are not a subset of integer numbers and have their own data type, "string".
So to convert an unsigned 8 bit number to a JavaScript 16 bit character of type "string", use the fromCharCode static method of the global String object, as in
rawAccum += String.fromCharCode( bytes[x]);
Calling String.fromCharCode is also how to convert an ASCII character code located within MIDI data to a character in JavaScript.
To convert a binary string character derived from an 8 bit value back into a number, use the String instance method charCodeAt on a string value and provide the character position:
var byteValue = "\xf0".charCodeAt(0);
returns the number 0xf0, or 240 decimal.
If you append a number to a string, as in the question, the number is implicitly converted to a decimal string representation of its value first:
"" + 0xf0 + 66 // becomes the string "24066"
Note that an array buffer can be inspected using a Uint8Array created from it, sliced into pieces using the buffer's slice method, and have integers of various types extracted from it using data views. Please review whether creating a binary string remains the best way to extract and interpret the MIDI record contents.
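If the string round-trip turns out to be unnecessary, one alternative is to split the Uint8Array itself on the 0xF0 delimiter. This is a sketch under that assumption (splitBytes is an illustrative name, not from the original answer); it mirrors String.prototype.split by producing empty pieces at the ends:
function splitBytes(bytes, delimiter) {
    var records = [];
    var start = 0;
    for (var i = 0; i < bytes.length; i++) {
        if (bytes[i] === delimiter) {
            records.push(bytes.subarray(start, i)); // one record, delimiter excluded
            start = i + 1;
        }
    }
    records.push(bytes.subarray(start)); // trailing piece after the last delimiter
    return records;
}
// var records = splitBytes(bytes, 0xF0);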

How to create a binary string from an array in the browser?

I want to send the data captured from my microphone (converted to unsigned 16bit integers) in the browser to my server as a binary string, but I'm having a hard time doing that.
I tried using String.fromCodePoint on my array, but it seems that the resulting String is not a valid one. I also tried using DataView, but not sure how to get a binary String out of that either.
Does anyone know how that can be achieved?
EDIT: I am referring to binary data, as in a "binary file", and not to the "binary representation of an integer".
If it's only in the browser, you can use the btoa function to encode to a base64 string.
Edit:
I am assuming you have an array of integers.
Final Edit:
Maybe try something like:
arr.map(function(item) {
    var value = item.toString(2);
    var padLength = 16 - value.length;
    var binaryVal = "0".repeat(padLength) + value;
    return binaryVal;
}).join("");
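If the goal is to ship the raw capture bytes rather than a textual "0101..." representation, one option hinted at above is btoa. This is only a sketch, assuming the samples sit in a Uint16Array; for large captures the chunked String.fromCharCode approach shown earlier on this page would be needed to avoid stack limits:
var samples = new Uint16Array([0, 1, 32768, 65535]); // example data
var bytes = new Uint8Array(samples.buffer);          // view the same memory as raw bytes
var binary = '';
for (var i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
}
var base64 = btoa(binary); // safe to send as text, decode back to bytes on the server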

JSON increases Float32Array buffer size many folds when sending through websocket

I had a strange experience. When I send the data from this array buffer setup:
var f32s = new Float32Array(2048);
for (var i = 0; i < f32s.length; i++) {
    f32s[i] = buffer[i]; // fill the array
}
ws.send(f32s); // send the filled array as binary
The buffer size I got at the other end is 8192 bytes.
But when I send a chunk of the buffer in JSON format like below:
var obj = {
    buffer_id: 4,
    data: f32s[i]
};
var json = JSON.stringify({ type:'buffer', data: obj });
ws.send(json);
The buffer size I got at the other end bloats to 55,xxx bytes with data filled in, and 17,xxx bytes with no data filled.
Why this happen and how do I keep the buffer size low?
I want to do this because the stream is choppy when I render it at the other end.
Thank you.
I would expect this is happening because a Float32Array requires exactly 32 bits per number within the data structure, whereas JSON, being a text format, represents each number with a series of 8-bit characters, plus another 8 bits for the comma and possibly more for the decimal point and the delimiting whitespace.
Therefore the data [0.1234545, 111.3242, 523.12341], for instance, requires 3 * 32 = 96 bits to represent within a Float32Array, but as a JSON string it requires 8 bits for each of the 32 characters in this example, which comes to 256 bits.
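A quick way to see the difference (a sketch; ws is assumed to be an already-open WebSocket as in the question):
var f32s = new Float32Array(2048);
for (var i = 0; i < f32s.length; i++) {
    f32s[i] = Math.random(); // some non-trivial sample data
}
// Binary: always 2048 * 4 = 8192 bytes on the wire.
ws.send(f32s.buffer);
// JSON: every float becomes a long decimal string plus separators,
// so the same samples take tens of thousands of characters.
var json = JSON.stringify({ type: 'buffer', data: Array.from(f32s) });
console.log(f32s.buffer.byteLength, json.length); // 8192 vs. a much larger number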
