JavaScript Web Worker: how to put JSON information into an ArrayBuffer

I have thousands of small strings that I have to pass from a web worker back to the main page. Each one is something like this:
"this string needs to be sent"
How can I include them in an ArrayBuffer to increase the transfer speed? I understand how to use numbers with array buffers, but how do you use strings? I am looking for something like this:
var strings = ["str1","str2","str3",...]
for (var i = 0; i < strings.length; i++) {
  arraybuffer[i] = // whatever operation works to add strings[i]
}

It's worth measuring and comparing the performance of the various techniques. The worker could use SharedArrayBuffer if it is supported in your target browsers (not exemplified below); otherwise Transferable objects can be used with postMessage(). TextEncoder encodes strings into Uint8Arrays, whose underlying ArrayBuffers can be transferred.
Individual strings can be transferred as they are encoded:
const encoder = new TextEncoder()
strings.forEach(s => {
  const encoded = encoder.encode(s)
  postMessage(encoded, [encoded.buffer])
})
An array of strings could be transferred in batches:
const encoded = strings.map(s => encoder.encode(s))
postMessage(encoded, encoded.map(bytes => bytes.buffer))
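On the main page, the received Uint8Arrays can be turned back into strings with TextDecoder. A minimal sketch of the receiving side, assuming the worker was created as worker and either of the two sending styles above:
const decoder = new TextDecoder()
worker.onmessage = e => {
  // e.data is a single Uint8Array or an array of them, depending on
  // whether the strings were sent individually or in a batch
  const chunks = Array.isArray(e.data) ? e.data : [e.data]
  const received = chunks.map(bytes => decoder.decode(bytes))
  // use received...
}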

Related

URL Parse Exercise (JavaScript)

So here is a description of the problem that I've been asked to solve:
We need some logic that extracts the variable parts of a url into a hash. The keys
of the extracted hash will be the "names" of the variable parts of a url, and the
values of the hash will be the values. We will be supplied with:
A url format string, which describes the format of a url. A url format string
can contain constant parts and variable parts, in any order, where "parts"
of a url are separated with "/". All variable parts begin with a colon. Here is
an example of such a url format string:
'/:version/api/:collection/:id'
A particular url instance that is guaranteed to have the format given by
the url format string. It may also contain url parameters. For example,
given the example url format string above, the url instance might be:
'/6/api/listings/3?sort=desc&limit=10'
Given this example url format string and url instance, the hash we want that
maps all the variable parts of the url instance to their values would look like this:
{
  version: 6,
  collection: 'listings',
  id: 3,
  sort: 'desc',
  limit: 10
}
So I technically have a semi-working solution to this, but my questions are:
Am I understanding the task correctly? I'm not sure if I'm supposed to be dealing with two inputs (URL format string and URL instance) or if I'm just supposed to be working with one URL as a whole. (my solution takes two separate inputs)
In my solution, I keep reusing the split() method to chunk the array/s down and it feels a little repetitive. Is there a better way to do this?
If anyone can help me understand this challenge better and/or help me clean up my solution, it would be greatly appreciated!
Here is my JS:
const obj = {};

function parseUrl(str1, str2) {
  const keyArr = [];
  const valArr = [];
  const splitStr1 = str1.split("/");
  const splitStr2 = str2.split("?");
  let val1 = splitStr2[0].split("/");
  let val2 = splitStr2[1].split("&");
  splitStr1.forEach((i) => {
    keyArr.push(i);
  });
  val1.forEach((i) => {
    valArr.push(i);
  });
  val2.forEach((i) => {
    keyArr.push(i.split("=")[0]);
    valArr.push(i.split("=")[1]);
  });
  for (let i = 0; i < keyArr.length; i++) {
    if (keyArr[i] !== "" && valArr[i] !== "") {
      obj[keyArr[i]] = valArr[i];
    }
  }
  return obj;
};
console.log(parseUrl('/:version/api/:collection/:id', '/6/api/listings/3?sort=desc&limit=10'));
And here is a link to my codepen so you can see my output in the console:
https://codepen.io/TOOTCODER/pen/yLabpBo?editors=0012
Am I understanding the task correctly? I'm not sure if I'm supposed to
be dealing with two inputs (URL format string and URL instance) or if
I'm just supposed to be working with one URL as a whole. (my solution
takes two separate inputs)
Yes, your understanding of the problem seems correct to me. What this task is asking you to do is implement a route-parameter and query-string parser. These often come up when you want to extract data from parts of the URL on the server side (although you don't usually need to implement this logic yourself). Do keep in mind, though, that you only want to pick up the path parameters that have a : in front of them (currently you're keeping every non-empty part), not all parts (e.g. api in your answer should be excluded from the object (i.e. the hash)).
In my solution, I keep reusing the split() method to chunk the array/s
down and it feels a little repetitive. Is there a better way to do
this?
The number of .split() calls that you have may seem like a lot, but each of them serves its own purpose of extracting the data required. You can, however, change your code to make use of other array methods such as .map(), .filter(), etc. to cut your code down a little. The code below also handles the case when no query string (i.e. ?key=value) is provided:
function parseQuery(queryString) {
  return queryString.split("&").map(qParam => qParam.split("="));
}

function parseUrl(str1, str2) {
  const keys = str1.split("/")
    .map((key, idx) => [key.replace(":", ""), idx, key.charAt(0) === ":"])
    .filter(([, , keep]) => keep);
  const [path, query = ""] = str2.split("?");
  const pathParts = path.split("/");
  const entries = keys.map(([key, idx]) => [key, pathParts[idx]]);
  return Object.fromEntries(query ? [...entries, ...parseQuery(query)] : entries);
}
console.log(parseUrl('/:version/api/:collection/:id', '/6/api/listings/3?sort=desc&limit=10'));
It would be even better if you didn't have to reinvent the wheel and instead made use of the URL constructor, which lets you extract the required information from your URLs (such as the search parameters) more easily. This, however, requires that both strings are valid URLs:
function parseUrl(str1, str2) {
  const { pathname, searchParams } = new URL(str2);
  const keys = new URL(str1).pathname.split("/")
    .map((key, idx) => [key.replace(":", ""), idx, key.startsWith(":")])
    .filter(([, , keep]) => keep);
  const pathParts = pathname.split("/");
  const entries = keys.map(([key, idx]) => [key, pathParts[idx]]);
  return Object.fromEntries([...entries, ...searchParams]);
}
console.log(parseUrl('https://www.example.com/:version/api/:collection/:id', 'https://www.example.com/6/api/listings/3?sort=desc&limit=10'));
Above, we still need to write our own custom logic to obtain the URL parameters; however, we don't need to write any logic to extract the query-string data, as this is done for us by URLSearchParams. We're also able to lower the number of .split()s used, since the URL constructor already gives us an object with the URL parsed. If you end up using a library (such as express), you will get the above functionality out of the box, as sketched below.
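For illustration, a minimal sketch of the out-of-the-box behaviour in express (the route and port are assumptions, and both params and query values arrive as strings):
const express = require('express');
const app = express();

app.get('/:version/api/:collection/:id', (req, res) => {
  // req.params -> { version: '6', collection: 'listings', id: '3' }
  // req.query  -> { sort: 'desc', limit: '10' }
  res.json({ ...req.params, ...req.query });
});

app.listen(3000);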

How can I efficiently write numeric data to a file?

Say I have an array containing a million random numbers:
[ 0.17309080497872764, 0.7861753816498267, ...]
I need to save them to disk, to be read back later. I could store them in a text format like JSON or CSV, but that would waste space. I'd prefer a binary format where each number takes up only 8 bytes on disk.
How can I do this using node?
UPDATE
I did not find an answer to this specific question, with a full example, in the supposedly duplicate question. I was able to solve it myself, but in a verbose way that could surely be improved:
const fs = require('fs');

// const a = map(Math.random, Array(10));
const a = [
  0.9651891365487693,
  0.7385397746441058,
  0.5330173086062189,
  0.08100066198727673,
  0.11758119861500771,
  0.26647845473863674,
  0.0637438360410223,
  0.7070151519015955,
  0.8671093412761386,
  0.20282735866103718
];

// write the array to file as raw bytes (80B total)
var wstream = fs.createWriteStream('test.txt');
a.forEach(num => {
  const b = Buffer.alloc(8); // new Buffer(8) is deprecated
  b.writeDoubleLE(num);
  wstream.write(b);
});
wstream.end(() => {
  // read it back
  const buff = fs.readFileSync('test.txt');
  const aa = a.map((_, i) => buff.readDoubleLE(8 * i));
  console.log(aa);
});
I think this was answered in Read/Write bytes of float in JS.
The ArrayBuffer solution is probably what you are looking for.
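As a rough sketch of that ArrayBuffer approach in Node.js (the file name is an assumption), the whole array can be packed into one Float64Array and written in a single call; note that typed arrays use the platform's byte order, which is little-endian on most machines:
const fs = require('fs');

const a = [0.17309080497872764, 0.7861753816498267, 0.5330173086062189]; // example values
const f64 = Float64Array.from(a); // 8 bytes per number
fs.writeFileSync('numbers.bin', Buffer.from(f64.buffer));

// read it back
const buf = fs.readFileSync('numbers.bin');
const aligned = new Uint8Array(buf); // copy so the bytes start at offset 0
console.log(Array.from(new Float64Array(aligned.buffer)));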

What is the fastest way to read and parse a file of numerical ASCII pairs in Node.js?

I'm using Node.js to read and parse a file of pairs encoding numbers. I have a file like this:
1561 0506
1204 900
6060 44
And I want to read it as an array, like this:
[[1561,0506],[1204,900],[6060,44]]
For that, I am using a readStream, reading the file as chunks and using native string functions to do the parsing:
fileStream.on("data", function (chunk) {
  var newLineIndex;
  file = file + chunk;
  while ((newLineIndex = file.indexOf("\n")) !== -1) {
    var spaceIndex = file.indexOf(" ");
    edges.push([
      Number(file.slice(0, spaceIndex)),
      Number(file.slice(spaceIndex + 1, newLineIndex))
    ]);
    file = file.slice(newLineIndex + 1);
  }
});
That took way too much time, though (4s for the file I need on my machine). I see some reasons:
Use of strings;
use of "Number";
Dynamic array of arrays.
I've rewritten the algorithm without using the built-in string functions, using loops instead, and, to my surprise, it became much slower! Is there any way to make it faster?
Caveat: I have not tested the performance of this solution, but it's complete, so it should be easy to try.
How about using this liner implementation, based on the notes in this question?
Using the liner:
var fs = require('fs')
var liner = require('./liner')
var source = fs.createReadStream('mypathhere')
source.pipe(liner)
liner.on('readable', function () {
  var line
  while (line = liner.read()) {
    var parts = line.split(" ");
    edges.push([Number(parts[0]), Number(parts[1])]);
  }
})
As you can see, I also moved the edge array to be an inline constant-sized array separate from the split parts, which I'm guessing would speed up allocation. You could even try using indexOf(" ") instead of split(" ").
Beyond this you could instrument the code to identify any further bottlenecks.
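For reference, './liner' is not shown here; a minimal sketch of such a line-splitting Transform stream (buffering the trailing partial line between chunks) could look like this:
var stream = require('stream');

var liner = new stream.Transform({ objectMode: true });
liner._lastLine = '';

liner._transform = function (chunk, encoding, done) {
  var data = this._lastLine + chunk.toString();
  var lines = data.split('\n');
  this._lastLine = lines.pop(); // keep the trailing partial line for the next chunk
  lines.forEach(line => this.push(line));
  done();
};

liner._flush = function (done) {
  if (this._lastLine) this.push(this._lastLine);
  this._lastLine = '';
  done();
};

module.exports = liner;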

Appending ArrayBuffers

What is the preferable way of appending/combining ArrayBuffers?
I'm receiving and parsing network packets with a variety of data structures. Incoming messages are read into ArrayBuffers. If a partial packet arrives I need to store it and wait for the next message before re-attempting to parse it.
Currently I'm doing something like this:
function appendBuffer(buffer1, buffer2) {
  var tmp = new Uint8Array(buffer1.byteLength + buffer2.byteLength);
  tmp.set(new Uint8Array(buffer1), 0);
  tmp.set(new Uint8Array(buffer2), buffer1.byteLength);
  return tmp.buffer;
}
Obviously you can't get around having to create a new buffer, as ArrayBuffers are of a fixed length, but is it necessary to initialize typed arrays? Upon arrival I just want to be able to treat the buffers as buffers; types and structures are of no concern.
Why not use a Blob? (I realize it might not have been available at that time.)
Just create a Blob with your data, like var blob = new Blob([array1,array2,string,...]) and turn it back into an ArrayBuffer (if needed) using a FileReader (see this).
Check this: What's the difference between BlobBuilder and the new Blob constructor?
And this: MDN Blob API
EDIT:
I wanted to compare the efficiency of these two methods (Blobs, and the method used in the question) and created a JSPerf: http://jsperf.com/appending-arraybuffers
It seems like using Blobs is slower (in fact, I guess it's the use of FileReader to read the Blob that takes most of the time). So now you know ;)
Maybe it would be more efficient when there are more than 2 ArrayBuffers (like reconstructing a file from its chunks).
function concat(views: ArrayBufferView[]) {
  let length = 0
  for (const v of views)
    length += v.byteLength

  let buf = new Uint8Array(length)
  let offset = 0
  for (const v of views) {
    const uint8view = new Uint8Array(v.buffer, v.byteOffset, v.byteLength)
    buf.set(uint8view, offset)
    offset += uint8view.byteLength
  }

  return buf
}
It seems you've already concluded that there is no way around creating a new array buffer. However, for performance's sake, it could be beneficial to append the contents of the buffer to a standard array object, then create a new array buffer or typed array from that.
var data = [];

function receive_buffer(buffer) {
  var i, len = data.length;

  for (i = 0; i < buffer.length; i++)
    data[len + i] = buffer[i];

  if (buffer_stream_done())
    callback(new Uint8Array(data));
}
Most javascript engines will already have some space set aside for dynamically allocated memory. This method will utilize that space instead of creating numerous new memory allocations, which can be a performance killer inside the operating system kernel. On top of that you'll also shave off a few function calls.
A second, more involved option would be to allocate the memory beforehand.
If you know the maximum size of any data stream then you could create an array buffer of that size, fill it up (partially if necessary) then empty it when done.
Finally, if performance is your primary goal, and you know the maximum packet size (instead of the entire stream) then start out with a handful of array buffers of that size. As you fill up your pre-allocated memory, create new buffers between network calls -- asynchronously if possible.
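A minimal sketch of that pre-allocation idea (the maximum size and the completion check are assumptions for illustration):
const MAX_STREAM_SIZE = 1024 * 1024; // assumed upper bound on a whole stream
const pool = new Uint8Array(MAX_STREAM_SIZE); // allocated once, up front
let used = 0;

function receiveChunk(chunk) { // chunk: Uint8Array
  pool.set(chunk, used);
  used += chunk.byteLength;
}

function finish() {
  const message = pool.subarray(0, used); // a view, no copy
  used = 0; // "empty" the buffer so it can be reused
  return message;
}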
You could always use DataView (http://www.khronos.org/registry/typedarray/specs/latest/#8) rather than a specific typed array, but as has been mentioned in the comments to your question, you can't actually do much with ArrayBuffer on its own.
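For example, a small sketch of reading mixed fields out of a reassembled buffer with DataView, assuming combined is a Uint8Array produced by one of the concatenation approaches above and a field layout made up for illustration:
// e.g. a packet that starts with a 16-bit length followed by a 32-bit float
const view = new DataView(combined.buffer, combined.byteOffset, combined.byteLength);
const payloadLength = view.getUint16(0, true); // true = little-endian
const firstValue = view.getFloat32(2, true);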

Line-oriented streams in Node.js

I'm developing a multi-process application using Node.js. In this application, a parent process will spawn a child process and communicate with it using a JSON-based messaging protocol over a pipe. I've found that large JSON messages may get "cut off", such that a single "chunk" emitted to the data listener on the pipe does not contain the full JSON message. Furthermore, small JSON messages may be grouped in the same chunk. Each JSON message will be delimited by a newline character, and so I'm wondering if there is already a utility that will buffer the pipe read stream such that it emits one line at a time (and hence, for my application, one JSON document at a time). This seems like it would be a pretty common use case, so I'm wondering if it has already been done.
I'd appreciate any guidance anyone can offer. Thanks.
Maybe Pedro's carrier can help you?
Carrier helps you implement newline-terminated protocols over node.js.
The client can send you chunks of lines and carrier will only notify you on each completed line.
My solution to this problem is to send JSON messages each terminated with some special Unicode character, a character that you would never normally get in a JSON string. Call it TERM.
So the sender just does "JSON.stringify(message) + TERM;" and writes it.
The receiver then splits incoming data on the TERM and parses the parts with JSON.parse(), which is pretty quick.
The trick is that the last message may not parse, so we simply save that fragment and add it to the beginning of the next message when it comes. Receiving code goes like this:
s.on("data", function (data) {
var info = data.toString().split(TERM);
info[0] = fragment + info[0];
fragment = '';
for ( var index = 0; index < info.length; index++) {
if (info[index]) {
try {
var message = JSON.parse(info[index]);
self.emit('message', message);
} catch (error) {
fragment = info[index];
continue;
}
}
}
});
Where "fragment" is defined somwhere where it will persist between data chunks.
But what is TERM? I have used the unicode replacement character '\uFFFD'. One could also use the technique used by twitter where messages are separated by '\r\n' and tweets use '\n' for new lines and never contain '\r\n'
I find this to be a lot simpler than messing with including lengths and such like.
The simplest solution is to send the length of the JSON data before each message as a fixed-length prefix (4 bytes?) and have a simple un-framing parser which buffers small chunks or splits bigger ones, as sketched below.
You can try node-binary to avoid writing the parser manually. Look at the scan(key, buffer) documentation example: it does exactly line-by-line reading.
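A minimal hand-rolled sketch of that length-prefix framing (the socket variable and handler name are assumptions):
// sender
function send(socket, message) {
  const body = Buffer.from(JSON.stringify(message));
  const header = Buffer.alloc(4);
  header.writeUInt32BE(body.length, 0); // 4-byte length prefix
  socket.write(Buffer.concat([header, body]));
}

// receiver
let pending = Buffer.alloc(0);
socket.on('data', chunk => {
  pending = Buffer.concat([pending, chunk]);
  while (pending.length >= 4) {
    const len = pending.readUInt32BE(0);
    if (pending.length < 4 + len) break; // wait for the rest of the frame
    const message = JSON.parse(pending.slice(4, 4 + len).toString());
    pending = pending.slice(4 + len);
    handleMessage(message); // assumed handler
  }
});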
As long as newlines (or whatever delimiter you use) only delimit the JSON messages and are never embedded in them, you can use the following pattern:
let buf = ''
s.on('data', data => {
  buf += data.toString()
  const idx = buf.indexOf('\n')
  if (idx < 0) { return } // No '\n', no full message

  let lines = buf.split('\n')
  buf = lines.pop() // if ends in '\n' then buf will be empty
  for (let line of lines) {
    // Handle the line
  }
})
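Alternatively, Node's built-in readline module can do the line buffering for you; a minimal sketch, assuming s is the pipe's read stream:
const readline = require('readline')

const rl = readline.createInterface({ input: s })
rl.on('line', line => {
  if (!line) return // skip empty lines
  const message = JSON.parse(line) // one JSON document per line
  // handle message...
})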
