How can I efficiently write numeric data to a file? - javascript

Say I have an array containing a million random numbers:
[ 0.17309080497872764, 0.7861753816498267, ...]
I need to save them to disk, to be read back later. I could store them in a text format like JSON or CSV, but that would waste space. I'd prefer a binary format where each number takes up only 8 bytes on disk.
How can I do this using node?
UPDATE
I did not find an answer to this specific question, with a full example, in the supposedly duplicate question. I was able to solve it myself, but in a verbose way that could surely be improved:
// const a = map(Math.random, Array(10));
const a = [
0.9651891365487693,
0.7385397746441058,
0.5330173086062189,
0.08100066198727673,
0.11758119861500771,
0.26647845473863674,
0.0637438360410223,
0.7070151519015955,
0.8671093412761386,
0.20282735866103718
];
// write the array to file as raw bytes (80B total)
const fs = require('fs');
const wstream = fs.createWriteStream('test.txt');
a.forEach(num => {
  const b = Buffer.alloc(8); // new Buffer(8) is deprecated
  b.writeDoubleLE(num);
  wstream.write(b);
});
wstream.end(() => {
  // read it back
  const buff = fs.readFileSync('test.txt');
  const aa = a.map((_, i) => buff.readDoubleLE(8 * i));
  console.log(aa);
});

I think this was answered in Read/Write bytes of float in JS
The ArrayBuffer solution is probably what you are looking for.
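For reference, a more compact round trip along those lines, using a Float64Array over a single ArrayBuffer, might look like this (a sketch, not the asker's code; it assumes a little-endian machine so that it matches the writeDoubleLE calls above):

const fs = require('fs');

// pack all the doubles into one contiguous buffer (8 bytes each)
const floats = Float64Array.from(a);
fs.writeFileSync('test.txt', Buffer.from(floats.buffer));

// read them back by viewing the file's bytes as a Float64Array;
// copy into a fresh ArrayBuffer to avoid alignment issues with Node's buffer pool
const buf = fs.readFileSync('test.txt');
const restored = new Float64Array(buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.length));
console.log(Array.from(restored));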

Related

Javascript Webworker how to put json information into array buffer

I have thousands of small strings that I have to pass from a web worker back to the main page; each one is something like this:
"this string needs to be sent"
How can I put them into an ArrayBuffer in order to increase the transfer speed? I understand how to use numbers with ArrayBuffers, but how do you use strings? I am looking for something like this:
var strings = ["str1","str2","str3",...]
for (var i = 0; i < strings.length; i++) {
  arraybuffer[i] = //Whatever operation works to add strings[i]
}
It's worth measuring and comparing the performance of the various techniques. The worker could use SharedArrayBuffer if it's supported in your target browsers (not shown below); otherwise, Transferable objects can be used with postMessage(). TextEncoder converts strings to Uint8Arrays, whose underlying ArrayBuffers can be transferred.
Individual strings can be transferred as they are encoded:
const encoder = new TextEncoder()
strings.forEach(s => {
  const encoded = encoder.encode(s)
  postMessage(encoded, [encoded.buffer])
})
An array of strings could be transferred in batches:
const encoded = strings.map(s => encoder.encode(s))
postMessage(encoded, encoded.map(bytes => bytes.buffer))
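On the receiving end, the main page can decode the transferred bytes back into strings with TextDecoder. A rough sketch, assuming worker is the Worker instance created on the page:

// main page: decode the transferred bytes back into strings
const decoder = new TextDecoder();

worker.onmessage = event => {
  const data = event.data;
  // either one encoded string, or an array of them from a batched transfer
  const received = Array.isArray(data)
    ? data.map(bytes => decoder.decode(bytes))
    : [decoder.decode(data)];
  console.log(received);
};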

Stream JSON-parsable array to file

So you're reading data from a file, cleaning it up, and writing it back to another file, but the new file isn't valid JSON.
You need to fill an object in the new file. You get a chunk from the file, alter it, and save it to the new file.
For this you stream the data out, edit the chunks, and stream it back into the other file. Great.
You make sure to add a comma after each item so the array stays parsable later on,
but now the last item has a trailing comma...
You don't know the count of items in the original file, and you also don't know when the reader is at the last item.
You use something like JSONStream on that array but JSONStream also does not provide the index.
The only end events are for your writers and readers.
How do you remove the trailing comma before/after writing?
const fs = require('fs');

const read_file = 'animals.json';     // very large file
const write_file = 'brown_dogs.json'; // moderately large file

let read_stream = fs.createReadStream(read_file);
let write_stream = fs.createWriteStream(write_file);
let dog_stream = require('JSONStream').parse('array_of_animals.dogs.*');

write_stream
  .on('finish', () => {
    // the writer is done writing my list of dogs, but my array has a
    // trailing comma, so now my brown_dogs.json isn't parsable
  })
  .write('{"brown_dogs": ['); // let's start

read_stream
  .pipe(dog_stream)
  .on('data', dog => {
    // basic logic before we save the item
    if (dog._fur_colour === 'brown') {
      let _dog = {
        type: dog._type,
        colour: dog._fur_colour,
        size: dog._height
      };
      // we write our accepted dog
      write_stream.write(JSON.stringify(_dog) + ',');
    }
  })
  .on('end', () => {
    // done reading animals.json
    write_stream.write(']}');
  });
--
If your resulting JSON file is small, you may simply add all the dogs to an array and only save all the contents to the file in one go. This means the file is not only JSON friendly, but also small enough to simply open with JSON.parse()
If your resulting JSON file is large, you may need to stream the items out in any case. Luckily JSONStream allows us to not only extract each dog individually but also ignore the trailing comma.
This is what I understand to be the solution... but I don't think it's perfect. Why can't the file be valid JSON, regardless of its size?
This is actually very simple.
Add an empty-string variable before you start inserting items, and set it to a separator after the first insert.
// update this string after the first insert
let separator = '';

read_stream
  .pipe(dog_stream)
  .on('data', dog => {
    // basic logic before we save the item
    if (dog._fur_colour === 'brown') {
      let _dog = {
        type: dog._type,
        colour: dog._fur_colour,
        size: dog._height
      };
      // we write our accepted dog, prefixed by the separator
      write_stream.write(separator + JSON.stringify(_dog));
      // after the first insert this becomes ','
      separator = ',';
    }
  })
I think that's all you need.
I added a toJSONArray method to scramjet exactly for this; see the docs here. It puts a comma only between the chunks.
The code would look like this:
const fs = require('fs');
const { DataStream } = require('scramjet');

fs.createReadStream(read_file)
  .pipe(require('JSONStream').parse('array_of_animals.dogs.*'))
  .pipe(new DataStream())
  .filter(dog => dog._fur_colour === 'brown') // this will filter out the non-brown dogs
  .map(dog => { // remap the data
    return {
      type: dog._type,
      colour: dog._fur_colour,
      size: dog._height
    };
  })
  .toJSONArray(['{"brown_dogs": [', ']}']) // add your enclosure
  .pipe(fs.createWriteStream(write_file));
This code should produce valid JSON.

How do I reverse a buffer.toString() on a hex buffer?

const uuidc = '9acf0decef304b229ea1560d4b3bf7d0';
const packed = Buffer.from(uuidc, 'hex');
const packedAndStringified = 'm:' + packed;
I have some keys stored in a redis database that were stored like above. The problem is that once a string is appended to packed, the buffer is (I'm guessing) effectively converted into a binary string.
The stringified output looks something like: K;��V��
Is there any way for me to get packedAndStringified back to packed, and ultimately get the uuidc pulled back out?
https://nodejs.org/api/buffer.html#buffer_buf_tostring_encoding_start_end
Shouldn't it be const packedAndStringified = 'm:' + packed.toString('hex'); here?
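In other words, store the hex text rather than the raw bytes; then the key can be reversed. A small sketch of that idea:

const uuidc = '9acf0decef304b229ea1560d4b3bf7d0';

// store the hex representation instead of the raw bytes
const packed = Buffer.from(uuidc, 'hex');
const packedAndStringified = 'm:' + packed.toString('hex');

// later: strip the 'm:' prefix and rebuild the buffer
const recovered = Buffer.from(packedAndStringified.slice(2), 'hex');
console.log(recovered.toString('hex') === uuidc); // true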

node encode and decode utf-16 buffer

I'm working on a JavaScript/Node.js application that needs to talk to a C++ TCP/UDP socket. It seems I receive a UTF-16 buffer from the old C++ clients. I haven't found a way to convert it to a readable string yet, and the other direction seems to be the same problem.
Is there an easy way to handle both directions?
Kind regards
If you have a UTF-16-encoded buffer, you can convert it to a JavaScript string like this:
let string = buffer.toString('utf16le');
To read these from a stream, it's easiest to collect the chunks and convert them to a string at the very end:
let chunks = [];
stream.on('data', chunk => chunks.push(chunk))
  .on('end', () => {
    let buffer = Buffer.concat(chunks);
    let string = buffer.toString('utf16le');
    ...
  });
To convert a JS string to UTF-16:
let buffer = Buffer.from(string, 'utf16le')
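A tiny round-trip sketch combining both directions (the example string is just an assumption):

// JS string -> UTF-16LE bytes -> back to a string
const buffer = Buffer.from('héllo wörld', 'utf16le');
console.log(buffer.length);              // two bytes per UTF-16 code unit
console.log(buffer.toString('utf16le')); // 'héllo wörld'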

What is the fastest way to read and parse a file of numerical ASCII pairs in Node.js?

I'm using Node.js to read and parse a file of pairs encoding numbers. I have a file like this:
1561 0506
1204 900
6060 44
And I want to read it as an array, like this:
[[1561,0506],[1204,900],[6060,44]]
For that, I am using a readStream, reading the file as chunks and using native string functions to do the parsing:
fileStream.on("data",function(chunk){
var newLineIndex;
file = file + chunk;
while ((newLineIndex = file.indexOf("\n")) !== -1){
var spaceIndex = file.indexOf(" ");
edges.push([
Number(file.slice(0,spaceIndex)),
Number(file.slice(spaceIndex+1,newLineIndex))]);
file = file.slice(newLineIndex+1);
};
});
That took way too much time, though (4 s for the file I need on my machine). I see some reasons:
Use of strings;
use of "Number";
Dynamic array of arrays.
I've rewritten the algorithm without the built-in string functions, using plain loops instead, and to my surprise it became much slower! Is there any way to make it faster?
Caveat: I have not tested the performance of this solution, but it's complete, so it should be easy to try.
How about using this liner implementation, based on the notes in this question?
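If you don't have that module handy, here is a minimal sketch of such a liner (a line-splitting Transform stream; not the exact implementation referenced above):

// liner.js: split incoming chunks into lines, emitting one line per read()
const { Transform } = require('stream')

let lastLine = ''
const liner = new Transform({
  readableObjectMode: true,
  transform(chunk, encoding, done) {
    const data = lastLine + chunk.toString()
    const lines = data.split('\n')
    lastLine = lines.pop() // keep any partial trailing line for the next chunk
    lines.forEach(line => this.push(line))
    done()
  },
  flush(done) {
    if (lastLine) this.push(lastLine)
    done()
  }
})

module.exports = liner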
Using the liner:
var fs = require('fs')
var liner = require('./liner')

var edges = []
var source = fs.createReadStream('mypathhere')
source.pipe(liner)

liner.on('readable', function () {
  var line
  while (line = liner.read()) {
    var parts = line.split(" ");
    edges.push([Number(parts[0]), Number(parts[1])]);
  }
})
As you can see, I also push a constant-sized two-element array onto edges, separate from the array returned by split, which I'm guessing would speed up allocation. You could even try using indexOf(" ") instead of split(" ").
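A sketch of that indexOf variant, reusing the liner and edges from the snippet above (untested):

liner.on('readable', function () {
  var line
  while (line = liner.read()) {
    // locate the single space separating the pair instead of calling split()
    var spaceIndex = line.indexOf(' ')
    edges.push([
      Number(line.slice(0, spaceIndex)),
      Number(line.slice(spaceIndex + 1))
    ])
  }
})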
Beyond this you could instrument the code to identify any further bottlenecks.
