JavaScript web workers: multithreaded string search slower than single thread?

I have a fuzzy-search function and a list of 52k words. Running the function against every word takes about 30ms.
I tried splitting the work across 8 web worker threads by sending 1/8th of the list to each worker (I have 8 threads on my CPU) using myWorker.postMessage({targets: slice, search}). But this is much slower, around 100ms.
My question is: can multithreading be faster here, or is it simply too much data to copy around to finish in under 30ms threaded? Is it possible to not copy the memory and have some kind of shared memory?
(It seems that simply sending the data to the workers takes longer than searching all of it on one thread!)

is it possible to not copy the memory and have some kind of shared memory?
You can use the second parameter of Worker.postMessage() to transfer ownership of an object from the main thread to a worker, or from a worker back to the main thread.
// transfer data to the `Worker` instance
worker.postMessage(data.buffer, [data.buffer]) // where `data` is a typed array, e.g. a `Uint8Array`
// transfer data back from the `Worker` instance
self.postMessage(data.buffer, [data.buffer]) // `data.buffer` is the underlying `ArrayBuffer`
Passing data by transferring ownership (transferable objects)
Google Chrome 17+ and Firefox 18+ contain an additional way to pass certain types of objects (transferable objects, that is, objects implementing the Transferable interface) to or from a worker with high performance. Transferable objects are transferred from one context to another with a zero-copy operation, which results in a vast performance improvement when sending large data sets.
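As a minimal sketch of such a transfer (the worker.js file name is an assumption for illustration), note that the sending side loses access to the buffer once it is transferred:
// main thread: hand a large buffer to a worker without copying it
const worker = new Worker('worker.js') // file name is illustrative
const data = new Uint8Array(8 * 1024 * 1024) // 8 MiB payload
worker.postMessage(data.buffer, [data.buffer]) // zero-copy transfer
console.log(data.buffer.byteLength) // 0 -- the buffer is now detached in this context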

Related

JavaScript threading model for multiple WebSockets

I have a JavaScript client which has to communicate with more than one WebSocket server. One of these servers sends small, high-frequency payloads that I can process quickly, while the other sends larger, low-frequency data that takes a long time to process:
this.hifreq = new WebSocket("ws://192.168.1.2:4646/hi");
this.hifreq.onmessage = this.onHighfreqMessage;
this.lofreq = new WebSocket("ws://192.168.1.3:4646/lo");
this.lofreq.onmessage = this.onLowfreqMessage;
I cannot find any precise documentation indicating how the threading model works. Everyone seems to say that the browser model is single-threaded, so there is no way I can receive two payloads and work on them simultaneously, but I can't find a single piece of concrete documentation that says so. Is that correct? And if so, is there a way to handle the messages on different threads?
I want to make the page as responsive as possible, and my current understanding is that once I start processing the large payload, I cannot update the page in the background with the high frequency data (which I can process almost instantaneously).
I am coming from a C++/Java background, so I am trying to understand what my options are here.
You can use a Web Worker to run heavy background tasks. Note that JavaScript still behaves as single-threaded within each context. You have no access to the window object of the page from the worker thread. You should use postMessage and onmessage on DedicatedWorkerGlobalScope to communicate with the main script.
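For illustration, a minimal sketch of that pattern; the lofreq-worker.js file name and the processing function are placeholders:
// main thread: route the heavy, low-frequency payloads to a worker
const heavyWorker = new Worker('lofreq-worker.js');
this.lofreq.onmessage = (e) => heavyWorker.postMessage(e.data);
heavyWorker.onmessage = (e) => { /* update the page with the processed result */ };

// lofreq-worker.js: the slow work runs here, off the main thread
self.onmessage = (e) => {
  const result = processLargePayload(e.data); // placeholder for the expensive work
  self.postMessage(result);
};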

How to dynamically increase the size of a SharedArrayBuffer

I am writing a multi-threaded program. The main thread constantly receives network data, and the amount of data is relatively large, so sub-threads are used to process it.
The data arrives in 100-byte packets. Each time I receive a packet, I create a 100-byte SharedArrayBuffer and send it to the child thread via postMessage(). But the main thread receives data very quickly, so it has to call postMessage very frequently to notify the sub-thread, which leads to high CPU usage and hurts the responsiveness of the main thread.
So I was thinking: if the SharedArrayBuffer could grow dynamically, the received data could simply be appended at its end, and I would only have to notify the child thread once for it to access all of the data.
I would like to ask how to dynamically increase the length of a SharedArrayBuffer. I have tried to implement it in a chained way, storing a SharedArrayBuffer object in another SharedArrayBuffer object, but the browser does not allow this.
I would like to ask how to dynamically increase the length of SharedArrayBuffer.
From the MDN web docs (emphasis mine):
"The SharedArrayBuffer object is used to represent a generic, fixed-length raw binary data buffer, similar to the ArrayBuffer object, but in a way that they can be used to create views on shared memory."
Fixed-length means you can't resize it...so it's not surprising it doesn't have a resize() method.
(Note: one thing that does cross my mind is that I believe there is a very new ability for a SharedArrayBuffer to be used in WebAssembly as "linear memory", which has a grow_memory operator. I would imagine taking advantage of this would be very difficult, if it is possible at all, and it likely would not be supported in many browsers even if it were.)
I have tried to implement it in a chained way, storing a SharedArrayBuffer object in another SharedArrayBuffer object, but the browser does not allow this.
Nope. You can only write numbers.
It might seem that you could use a number to index into a table of SharedArrayBuffers, and link them that way. But then you have to worry about how to share that table between threads--same problem.
So no matter what you do, whichever thread makes the decision to update the shared buffering structure will need to notify the others of the update somehow. For that notification to be able to transfer SharedArrayBuffers, it will have to use postMessage to do it.
Have you considered allocating a larger SharedArrayBuffer to start with and treating it like a circular buffer, so that one thread reads what the other writes, in a "producer/consumer" pattern?
If you insist on implementing resizes, you might consider having some portion of the buffer hold an indicator that it is "stale" and a new one must be requested from the thread that resized it. You'll have to control that with synchronization. If you make a small sample that does this, it would probably make a good technical article...and if you have trouble with the small sample, it would be a good basis for further questions here.
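As a rough sketch of that producer/consumer idea (single producer, single consumer; wrap-around and overflow handling are omitted, and note that Atomics.wait is only permitted in worker threads):
// shared setup: one Int32 slot for the tail index, the rest for data
const CAPACITY = 1 << 20;
const sab = new SharedArrayBuffer(4 + CAPACITY);
const tail = new Int32Array(sab, 0, 1);
const data = new Uint8Array(sab, 4);

// producer thread: append a packet, then publish and signal the new tail
function push(packet) {
  const t = Atomics.load(tail, 0);
  data.set(packet, t);                       // no wrap-around handling in this sketch
  Atomics.store(tail, 0, t + packet.length);
  Atomics.notify(tail, 0);                   // wake the sleeping consumer
}

// consumer (worker) thread: sleep until the tail moves past what we've seen
let seen = 0;
function pull() {
  Atomics.wait(tail, 0, seen);               // blocks only while tail === seen
  const t = Atomics.load(tail, 0);
  const chunk = data.slice(seen, t);         // copy out the new bytes
  seen = t;
  return chunk;
}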
There is no way to resize, only to copy into a larger buffer via a typed array.
Note, though, that no RAM is actually committed until the memory is used. Under Node.js (v14.14.0) you can watch the RAM usage increase gradually as the buffer is filled, or jump essentially instantly if array.fill is used.
const sharedBuffer = new SharedArrayBuffer(512 * 1024 * 1024) // reserve 512 MiB of address space
const array = new Uint8Array(sharedBuffer)
// array.fill(1) // causes the RAM to be committed right away
console.log(process.memoryUsage().rss) // stays small until the pages are actually touched
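And for completeness, a sketch of the copy-based "resize" mentioned above; the new buffer still has to be re-sent to the other threads via postMessage:
// "grow" a SharedArrayBuffer by allocating a bigger one and copying
function grow(oldSab, newByteLength) {
  const newSab = new SharedArrayBuffer(newByteLength)
  new Uint8Array(newSab).set(new Uint8Array(oldSab)) // copy the old contents
  return newSab // each thread must receive this new buffer via postMessage
}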

WebWorkers and Asynchronous shared data access. How in Scala.js?

Please consider a Scala.js class that contains a large JavaScript typed array called xArr.
A process called p(xArr) consumes xArr as input but takes a long time to complete. In order to avoid script timeout warnings, p(xArr) runs in a Web Worker.
Recall these constraints on communication between the main thread and the Web Worker thread:
Communication in either direction takes the form of message passing.
Message data must conform to the requirements of JavaScript's structured clone algorithm.
Unless specified in the optional transfer list, message data gets duplicated rather than transferred to/from the main and worker threads.
To transfer message data instead of copying it to/from the worker thread, the data must implement the Transferable interface and the transfer list must contain a reference to the transferable data.
If a transferable object transfers between threads, the sending thread loses access to it.
Because of xArr's size, sending a copy of it to the worker thread would incur severe memory costs, but because of p(xArr)'s run time, it cannot run in the main thread.
Fortunately, a typed array's underlying buffer implements the Transferable interface, so to save compute and memory resources, the program invokes p(xArr) by transferring xArr to the Web Worker, which invokes p(xArr) and then transfers xArr back to the main thread.
Unfortunately, other asynchronous methods in the main thread must access xArr, which may have been transferred to the worker's scope at invocation time.
What Scala language features could govern access to xArr so that method calls execute immediately when the main thread owns xArr but wait for it to return to scope when the worker owns xArr?
In other words: How would you handle a class variable that continuously alternates between defined and undefined over time?
Would you suggest locks? Promise/Callback queues? Would you approach the problem in an entirely different way? If so, how?
Remember that this is a Scala.js library, so we must disqualify JVM specific features.
I understand your very real pain here. This used to work with SharedArrayBuffer but it is currently disabled in Chrome. Sadly there is no alternative for shared memory:
Note that SharedArrayBuffer was disabled by default in all major browsers on 5 January, 2018 in response to Spectre.
There are plans to re-add SharedArrayBuffer once proper security auditing is complete. I guess we'll have to wait.
If you were running your code in Node, this would be hard but possible.
Thanks to all who considered this issue. A solution exists as of 19 May 2018; hopefully a better one can replace it soon.
The current version works as follows:
Problem 1: How can we associate function calls from the main thread with function definitions in the worker thread?
S1: A map of Promise objects: Map[Long, PromiseWrapper]() associates a method invocation ID with a promise that can process the result. This simple multiplexing mechanism evolved from another Stack Overflow question. Thanks again to Justin du Coeur.
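The original mechanism is in Scala.js; a plain-JavaScript sketch of the same multiplexing idea (the pending map and the message shape are assumptions) might look like this:
// each call gets an ID; the worker echoes the ID back with the result
const pending = new Map(); // invocation ID -> { resolve, reject }
let nextId = 0;

function callWorker(payload, transferables = []) {
  return new Promise((resolve, reject) => {
    const id = nextId++;
    pending.set(id, { resolve, reject });
    worker.postMessage({ id, payload }, transferables);
  });
}

worker.onmessage = (e) => {
  const { id, result } = e.data;
  pending.get(id).resolve(result); // route the reply to the waiting promise
  pending.delete(id);
};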
Problem 2: How can we invoke functions in the worker thread from the main thread?
S1: Pass a text representation of the function to the worker, then parse it with eval and invoke the resulting function. Unfortunately, eval comes with security risks. Besides, having to write pure JavaScript code in string values defeats most of the advantages of Scala.js, namely type safety and Scala syntax.
S2: Store function definitions in a lookup table in worker scope and invoke them by passing the keys. This could work, but it feels clunky in Scala because different functions take parameters that vary in number and type.
S3: Wrap the functions into serializable case classes, then send the serialized bytes from the main scope to the worker scope and invoke the function there. You can think of these case classes as message classes. The current solution uses this approach. It relies on BooPickle by Otto Chrons. The serialized class wraps the method call and any trivial function parameters, e.g. numbers, short strings, and simple case classes. Large data, like the TypedArray values featured in this question, transfers from the main thread to the worker thread through a mechanism discussed later. Unfortunately, this approach means that all operations on the TypedArray values must be defined before compile time, because BooPickle relies on macros, not reflection, to serialize and deserialize classes.
Problem 3: How can we pass the values of the TypedArray class variable, xArr to and from the worker thread without duplicating it?
S1: Because xArr conforms to the Transferable interface, it can be transferred wholly between the main and worker scopes. At the same time, the serialized classes that wrap the function calls conform to a trait that specifies an apply method with this signature:
def apply(parameters: js.Array[Transferable]): js.Array[Transferable]
By convention, the parameters array contains a serialized version of the message case class in index 0. Subsequent indices contain the TypedArray values. Each message class has its own unique implementation of this apply method.
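In plain JavaScript, that convention might be sketched like this (serialize stands in for the BooPickle step and is a placeholder):
// index 0: the serialized message; later indices: the large typed arrays
const message = serialize(call);      // placeholder serializer producing a Uint8Array
const params = [message.buffer, xArr.buffer];
worker.postMessage(params, params);   // the buffers ride in the transfer list, so nothing is copied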
Problem 4: How can we pass the result of the computation back to the promise that waits for it in the main thread?
S1: The apply methods mentioned in Problem 3.S1 return a new array of Transferable objects with another serialized message class at its head. That message class wraps the return value from the computation p(xArr) and, with an apply method of its own, instructs the main thread on how to interpret the array. In cases where p(xArr) returns large objects, like other TypedArray values, those occupy subsequent positions in the array.
Problem 5: What if statements in the main thread try to access xArr while it is transferred to the worker thread?
S1: Now, any code in the main thread can only access xArr through a checkOut method and must restore it by calling a checkIn method. The checkOut method returns a Future that completes when xArr returns from the worker thread. Concurrent calls to checkOut get pushed onto a queue of promises. Any code that calls checkOut must call checkIn to pass control of xArr on to the next promise waiting in the queue. Unfortunately, this design burdens the programmer with the responsibility of restoring xArr to its encompassing class, and schemes like this resemble classical concurrency models with locks and memory-allocation calls like malloc and free, which tend toward buggy code that freezes or crashes.
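A JavaScript sketch of that checkOut/checkIn queue (names are illustrative; the Scala.js original wraps this in a class):
// callers queue up; each checkIn hands xArr to the next waiter
let xArr = new Float64Array(1 << 20); // owned by the main thread initially
const waiters = [];                   // queued resolve callbacks
let checkedOut = false;

function checkOut() {
  if (!checkedOut) {
    checkedOut = true;
    return Promise.resolve(xArr);
  }
  return new Promise((resolve) => waiters.push(resolve)); // wait for a checkIn
}

function checkIn(arr) {
  xArr = arr;                         // restored, e.g. after a worker round trip
  const next = waiters.shift();
  if (next) next(xArr);               // ownership passes straight to the next waiter
  else checkedOut = false;
}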
Problem 6: After p(xArr) executes in the worker thread, how can xArr return to the class that encapsulated it in the main thread?
S1: Message case classes meant to invoke p(xArr) now inherit from a trait called Boomerang. As the name implies, these messages transfer from the main thread to the worker thread, invoke p(xArr) while there, then return, unchanged, to the main thread. Once back in the main thread, Boomerang objects call the relevant checkIn methods to restore xArr values to their original encapsulating objects.
For simplicity, this answer leaves out details about different types of Transferable parameters, operations that mutate xArr instead of simply reading and restoring it, operations that take no parameters but still yield large TypedArray responses, and operations that take multiple large TypedArray parameters; minor modifications to the solutions articulated above met those objectives.
With this as a baseline, can we:
Simplify this design?
Incorporate user defined operations?
Find safer alternatives to the checkOut, checkIn methods?

Throttling events in the event queue

In a recent SO question, I outlined an OOM condition that I'm running into while processing a large number of CSV files with millions of records in each.
The more I look into the problem and the more I read up on Node.js, the more convinced I become that the OOM isn't happening because of a memory leak but because I'm not throttling the data input into the system.
The code just blindly sucks in all data, creating a single callback event for each line. The events keep getting added to the main event loop, which eventually becomes so large that it exhausts all available memory.
What are Node's idiomatic patterns for dealing with this scenario? Should I tie the reading of CSV files to a blocking queue of some sort that, once full, blocks the file reader from parsing more data? Are there any good examples of dealing with processing large data sets?
Update: To put this differently and more simply: Node can read input faster than it can write output, and the slack is stored in memory (queued as events for the event queue). Because there is a lot of slack, memory eventually gets exhausted. So the question is: what's the idiomatic way of throttling the input down to the output's rate?
Your best bet is to set things up as streams and rely on the built-in backpressure semantics. The Streams Handbook has a really good overview of this.
As in Unix, the Node stream module's primary composition operator is called .pipe(), and you get a backpressure mechanism for free to throttle writes for slow consumers.
Update
I've not used the readline module for anything other than terminal input before, but reading the docs, it looks like it accepts an input stream and an output stream. If you frame your DB writer as a writable stream, you should be able to let readline pipe to it for you internally.
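A sketch of that idea in Node (the file name and insertRecord are placeholders); the key point is that write() returning false signals backpressure:
const fs = require('fs');
const readline = require('readline');
const { Writable } = require('stream');

const rl = readline.createInterface({ input: fs.createReadStream('records.csv') });

// frame the DB writer as a writable stream: done() marks the record as flushed
const dbWriter = new Writable({
  write(line, _encoding, done) {
    insertRecord(line.toString(), done); // placeholder for the real DB insert
  },
});

rl.on('line', (line) => {
  if (!dbWriter.write(line)) {                 // writer's buffer is full...
    rl.pause();                                // ...stop reading...
    dbWriter.once('drain', () => rl.resume()); // ...until it catches up
  }
});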

Improving Performance on massive IndexedDB Insert

We are trying to pre-cache a large amount of data into IndexedDB when our web application loads. From my performance testing, the speed is decent in a desktop browser (e.g. Internet Explorer), where I can insert 10,000 records in around 2 seconds. But the exact same functionality on the iPad drops to 30 seconds. That comparison just blew my mind.
Does anyone know of any hints or tricks for inserting large data sets into IndexedDB? I don't know if it is possible at all, but could we build up a copy of an IndexedDB server-side with all the data pre-populated and then just ship it over to the client, which stores it straight into the browser? Is anything along these lines doable?
Thanks
I had problems with massive bulk inserts (100,000-200,000 records). I've solved all my IndexedDB performance problems using the Dexie library. It has this important feature:
Dexie has kick-ass performance. Its bulk methods take advantage of a not-well-known feature in IndexedDB that makes it possible to store stuff without listening to every onsuccess event. This speeds up the performance to a maximum.
Dexie: https://github.com/dfahlander/Dexie.js
Some pretty bad IndexedDB performance problems can be caused by a prolonged period of the browser just calling onsuccess callbacks, running into event-loop overhead after the work is actually done. The performance pattern I observed in my app, which was doing this, was that it did a bunch of work and then spent ages answering thousands of callbacks very inefficiently:
[Profiler screenshot omitted; its right-hand part showed the callbacks firing on every request.] The solution is, of course, to not put a callback on every request, but it was previously unclear to me how to do this.
The way Dexie.js accomplishes this (for details, see src/dbcore/dbcore-indexeddb.ts) is that it saves the last request sent (e.g. from IDBObjectStore.put) and sets an onsuccess callback on that one alone, which then collects the results from the rest of the requests. Thus it avoids the callback hell.
Another approach is to use the IDBTransaction.oncomplete event and not worry about callbacks on the individual requests at all.
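A minimal sketch of that approach (the store name is illustrative): issue all the puts without per-request handlers and resolve once the whole transaction completes:
function bulkPut(db, records) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('items', 'readwrite');
    const store = tx.objectStore('items');
    for (const r of records) store.put(r); // no onsuccess per request
    tx.oncomplete = () => resolve();       // one callback for the whole batch
    tx.onerror = () => reject(tx.error);
  });
}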
(Note: yes, I know how old this question is; I had this problem today and wanted to add something more useful for this question, which ranks high in Google results.)
How is your data stored in IndexedDB? Is everything in a single object store, or do you use multiple object stores? Do you need all the cached data immediately?
If you only have a single object store, you can start by storing all the data you initially need, commit that transaction, and start a new one for all the rest. This way you can start retrieving the initial data while inserting the rest. IndexedDB is async, so it shouldn't block you.
If you have multiple object stores, you can use the same strategy: first fill up the object store you need immediately and delay the others.
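A sketch of that two-phase idea under the same illustrative names:
function seed(db, urgentRecords, restRecords) {
  const tx1 = db.transaction('items', 'readwrite');
  for (const r of urgentRecords) tx1.objectStore('items').put(r);
  tx1.oncomplete = () => {
    // the app can start querying the urgent data while the rest streams in
    const tx2 = db.transaction('items', 'readwrite');
    for (const r of restRecords) tx2.objectStore('items').put(r);
  };
}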
Or maybe consider using the AppCache API instead of the IndexedDB API. With it, you can just cache a JavaScript file containing all the JSON objects you want to cache. This is more suitable for cases where you don't need a lot of querying on the data.
