How to share context between worker threads

I would like to create a worker thread in a Node.js app and pass the current context to the new thread, so that I can access my variables and functions within it. Is there a library that supports this? If not, can I at least pass an anonymous function between them?

There is no way to share a context with a worker thread. This isn't "an ideology of the Node.js team", instead it's a limitation of the JavaScript language, which doesn't allow concurrency (such as concurrent access to objects from a worker thread).
The one exception is that you can share numerical data between multiple threads by using a SharedArrayBuffer.
Aside from that, the way to send data to or receive it from a worker thread is to use postMessage. See also Node's full worker threads documentation.
For completeness: there is an early-stage proposal to add a new kind of cross-thread-shareable object to JavaScript. As with all early-stage proposals, there's no guarantee that it'll be finalized at all or how long that might take, but it does indicate that there's some interest in this space.
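As a rough illustration of the two mechanisms mentioned above, the following sketch starts a Node.js worker, shares an integer counter through a SharedArrayBuffer, and uses postMessage for everything else. The helper name addInWorker and the inline worker source are assumptions made for this example; a real application would normally keep the worker in its own file.

```javascript
const { Worker } = require('worker_threads');

// Worker source passed inline (eval: true) only to keep the sketch in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Same memory as in the main thread, not a copy.
  const counter = new Int32Array(workerData.shared);
  parentPort.on('message', (n) => {
    Atomics.add(counter, 0, n);      // write into the shared memory
    parentPort.postMessage('done');  // ordinary message back to the main thread
  });
`;

// Hypothetical helper: asks a worker to add n to a shared counter.
function addInWorker(n) {
  const shared = new SharedArrayBuffer(4);
  const counter = new Int32Array(shared);
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { shared } });
    worker.once('error', reject);
    worker.once('message', () => {
      const value = Atomics.load(counter, 0); // read what the worker wrote
      worker.terminate().then(() => resolve(value));
    });
    worker.postMessage(n);
  });
}
```

Only the numbers in the buffer are shared; functions, closures, and ordinary objects still cannot cross the thread boundary by reference.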

Related

Can we use JavaScript FileSystemAPI from a Webworker?

Can we use the JavaScript FileSystemAPI from a webworker?
https://developer.mozilla.org/en-US/docs/Web/API/FileSystem
Their documentation does not mention anything about this, but when I use it, it throws "window is not defined".
Thanks!

Not really...
There used to be a self.webkitRequestFileSystemSync() method accessible in Worker scopes, but it's been deprecated. And the FileSystem object you'd get from a drop event can't be serialized, and thus can't be posted to Worker from the main thread.
However, I suspect you don't really want to work with the FileSystem API, which is not really useful in web contexts, but instead you may want to prefer the File System Access API, which gives your page access to the user's file system (even though it's still only available in Chromium based browsers).
But to use this API from a Web Worker is not simple either.
To make the request to the File System Access API, we need to be handling a user gesture. Web Workers don't have access to the UI, and thus they don't have access to UI events either (yet).
So we must make the request from the UI thread.
However, contrary to FileSystem objects, FileSystemHandles are serializable and can be posted through postMessage(), so once you get the handle, you can post it to your worker and do your work from there.
In the UI thread:
btn.onclick = async (evt) => {
  const dirHandle = await showDirectoryPicker();
  worker.postMessage(dirHandle);
};
Then in the Worker thread you can receive that handle in the MessageEvent.data, and work with it as you would from the main thread.
Here is a live demo, and its source.
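The worker side is not shown above, so here is a minimal sketch of what it might look like. The function name listEntries is an assumption for illustration; it only relies on the handle exposing an async entries() iterator, which FileSystemDirectoryHandle does.

```javascript
// Collects "kind: name" strings for every entry in a directory handle.
async function listEntries(dirHandle) {
  const names = [];
  for await (const [name, handle] of dirHandle.entries()) {
    names.push(`${handle.kind}: ${name}`); // kind is "file" or "directory"
  }
  return names;
}

// Only wire up the message handler when actually running inside a worker.
if (typeof WorkerGlobalScope !== 'undefined' && self instanceof WorkerGlobalScope) {
  self.onmessage = async (evt) => {
    // evt.data is the directory handle posted from the UI thread
    self.postMessage(await listEntries(evt.data));
  };
}
```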

A Web Worker's global object is named self, not window.
And there is no FileSystem API

WebWorkers and Asynchronous shared data access. How in Scala.js?

Please consider a Scala.js class that contains a large JavaScript typed array called xArr.
A process called p(xArr) consumes xArr as input but takes a long time to complete. In order to avoid script timeout warnings, p(xArr) runs in a Web Worker.
Recall these constraints on communication between the main thread and the Web Worker thread:
Communication in either direction takes the form of message passing.
Message data must conform to the requirements of JavaScript's structured clone algorithm.
Unless specified in the optional transfer list, message data gets duplicated instead of transferred to/from the main and Worker threads.
To transfer message data instead of copying it to/from the worker thread, the data must implement the Transferable interface and the transfer list must contain a reference to the transferable data.
If a transferable object transfers between threads, the sending thread loses access to it.
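The difference between copying and transferring listed above can be observed without a worker. This small sketch uses structuredClone, which applies the same structured clone algorithm as postMessage; it assumes a runtime that provides it (Node.js 17 or later, or a current browser).

```javascript
const xArr = new Float64Array([1.5, 2.5, 3.5]);

// Copy: the data is duplicated and the original stays usable.
const copy = structuredClone(xArr);
const lengthAfterCopy = xArr.length;

// Transfer: the underlying buffer moves and the original is detached.
const moved = structuredClone(xArr, { transfer: [xArr.buffer] });
const lengthAfterTransfer = xArr.length;
```

After the transfer, xArr has length 0 while moved holds the three values, which is the "sending thread loses access" behaviour described in the list.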
Because of xArr's size, sending a copy of it to the worker thread will incur severe memory costs, but because of p(xArr)'s run time, it cannot run in the main thread.
Fortunately, typed arrays implement the Transferable interface, so to save compute and memory resources, the program invokes p(xArr) by transferring xArr to the WebWorker which invokes p(xArr) then transfers xArr back to the main thread.
Unfortunately, other asynchronous methods in the main thread must access xArr which may have transferred to the worker's scope at invocation time.
What Scala language features could govern access to xArr so that method calls execute immediately when the main thread owns xArr but wait for it to return to scope when the worker owns xArr?
In other words: How would you handle a class variable that continuously alternates between defined and undefined over time?
Would you suggest locks? Promise/Callback queues? Would you approach the problem in an entirely different way? If so, how?
Remember that this is a Scala.js library, so we must disqualify JVM specific features.

I understand your very real pain here. This used to work with SharedArrayBuffer but it is currently disabled in Chrome. Sadly there is no alternative for shared memory:
Note that SharedArrayBuffer was disabled by default in all major browsers on 5 January, 2018 in response to Spectre.
There are plans to re-add SharedArrayBuffer after proper security auditing will be complete. I guess we'll have to wait.
If you were running your code in Node - this would be hard but possible.

Thanks to all who considered this issue. A solution exists as of 19 May 2018; hopefully a better one can replace it soon.
The current version works as follows:
Problem 1: How can we associate function calls from the main thread with function definitions in the worker thread?
S1: A map of Promise objects: Map[Long, PromiseWrapper]() associates a method invocation ID with a promise that can process the result. This simple multiplexing mechanism evolved from another Stack Overflow question. Thanks again to Justin du Coeur.
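The multiplexing idea can be sketched in plain JavaScript rather than Scala.js. The class name PendingCalls and its methods are assumptions for this example and are not taken from the library described in this answer.

```javascript
// Associates an invocation ID with the promise that waits for its result.
class PendingCalls {
  constructor() {
    this.nextId = 0;
    this.pending = new Map();
  }

  // Called before posting a request; the id travels with the message.
  register() {
    const id = this.nextId++;
    const promise = new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
    });
    return { id, promise };
  }

  // Called from the "message" handler when a reply carrying that id arrives.
  settle(id, result) {
    const entry = this.pending.get(id);
    if (!entry) return false; // unknown or already settled id
    this.pending.delete(id);
    entry.resolve(result);
    return true;
  }
}
```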
Problem 2: How can we invoke functions in the worker thread from the main thread?
S1: Pass a text representation of the function to the worker, then parse it with eval and invoke the resulting function. Unfortunately, eval comes with security risks. Besides, having to write pure JavaScript code in string values defeats most of the advantages of Scala.js, namely type safety and Scala syntax.
S2: Storing function definitions in a lookup table in worker scope and invoking the functions by passing the keys. This could work, but feels clunky in Scala because different functions take parameters that vary in number and type.
S3: Wrap the functions into serializable case classes, then send the serialized bytes from the main scope to the worker scope and invoke the function there. You can think of these case classes as message classes. The current solution uses this approach. It relies on BooPickle by Otto Chrons. The serialized class wraps the method call and any trivial function parameters, e.g. numbers, short strings, and simple case classes. Large data, like the TypedArray values featured in this question transfer from the main thread to the worker thread through a mechanism discussed later. Unfortunately, this approach means that all operations on the TypedArray values must be defined before compile time because BooPickle relies on macros, not reflection, to serialize and deserialize classes.
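For comparison, the lookup-table approach from S2 might look roughly like this in plain JavaScript. The operation names and the message shape ({ id, op, args }) are hypothetical and chosen only for this sketch; the answer's actual solution uses serialized case classes instead.

```javascript
// Worker-side table of operations, addressed by key.
const operations = {
  sum: (values) => values.reduce((a, b) => a + b, 0),
  scale: (values, factor) => values.map((v) => v * factor),
};

// Turns an incoming message into a reply message.
function dispatch(message) {
  const { id, op, args = [] } = message;
  if (!Object.prototype.hasOwnProperty.call(operations, op)) {
    return { id, ok: false, error: `unknown operation: ${op}` };
  }
  try {
    return { id, ok: true, result: operations[op](...args) };
  } catch (err) {
    return { id, ok: false, error: String(err) };
  }
}
```

In untyped JavaScript the varying parameter lists are simply spread from an array, which is exactly the part that feels clunky once static types are involved.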
Problem 3: How can we pass the values of the TypedArray class variable, xArr to and from the worker thread without duplicating it?
S1: Because xArr conforms to the Transferable interface, it can transfer wholly between the main and worker scopes. At the same time, the serialized classes that wrap the function calls conform to a trait that specifies an apply method with this signature:
def apply(parameters: js.Array[Transferable]): js.Array[Transferable]
By convention, the parameters array contains a serialized version of the message case class in index 0. Subsequent indices contain the TypedArray values. Each message class has its own unique implementation of this apply method.
Problem 4: How can we pass the result of the computation back to the promise that waits for it in the main thread?
S1: The apply methods mentioned in Problem 3.S1 return a new array of Transferable objects with another serialized message class at its head. That message class wraps the return value from the computation p(xArr) and, with an apply method of its own, instructs the main thread on how to interpret the array. In cases where p(xArr) returns large objects like other TypedArray values, those occupy subsequent positions in the array.
Problem 5: What if statements in the main thread try to access xArr when it has transferred to the worker thread?
S1: Now, any code in the main thread can only access xArr through a checkOut method and must restore it by calling a checkIn method. The checkOut method returns a Future that completes when xArr returns from the worker thread. Concurrent calls to checkOut get pushed onto a queue of promises. Any code that calls checkOut must call checkIn to pass control of xArr on to the next Promise waiting in the queue. Unfortunately, this design burdens the programmer with the responsibility of restoring xArr to its encompassing class. Schemes like this resemble classical concurrency models with locks and manual memory management such as malloc and free, and they tend toward buggy code that freezes or crashes.
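A plain JavaScript sketch of such a checkOut/checkIn queue is shown here. The class name Lease and the withValue wrapper are assumptions added for illustration; the wrapper is one possible way to make forgetting checkIn harder, not part of the design described in this answer.

```javascript
// Grants exclusive, queued access to a single value.
class Lease {
  constructor(value) {
    this.value = value;
    this.held = false;
    this.waiters = []; // resolve functions of callers waiting their turn
  }

  // Resolves immediately if the value is free, otherwise when it is checked in.
  checkOut() {
    if (!this.held) {
      this.held = true;
      return Promise.resolve(this.value);
    }
    return new Promise((resolve) => this.waiters.push(resolve));
  }

  // Hands the (possibly replaced) value to the next waiter, or frees it.
  checkIn(value) {
    this.value = value;
    const next = this.waiters.shift();
    if (next) next(value);
    else this.held = false;
  }

  // Hypothetical convenience: check out, run fn, and always check in again.
  async withValue(fn) {
    const value = await this.checkOut();
    let next = value;
    try {
      const returned = await fn(value);
      if (returned !== undefined) next = returned;
    } finally {
      this.checkIn(next);
    }
    return next;
  }
}
```

Callers that go through withValue cannot leave the value checked out by accident, because checkIn runs in a finally block even when fn throws.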
Problem 6: After p(xArr) executes in the worker thread, how can xArr return to the class that encapsulated it in the main thread?
S1: Message case classes meant to invoke p(xArr) now inherit from a trait called Boomerang. As the name implies, these messages transfer from the main thread to the worker thread, invoke p(xArr) while there, then return, unchanged, to the main thread. Once returned to the main thread, Boomerang objects call relevant checkIn methods to restore xArr values to their original encapsulating objects.
For simplicity, this answer leaves out details about different types of Transferable parameters, operations that mutate xArr instead of simply reading it and restoring it, operations that don't take any parameters but still yield large TypedArray responses, and operations that take multiple large TypedArray parameters. Minor modifications to the solutions articulated above met those objectives.
With this as a baseline, can we:
Simplify this design?
Incorporate user defined operations?
Find safer alternatives to the checkOut, checkIn methods?

Are nodejs data-structures thread-safe by design?

I read in a Node.js-related document that it is a single-threaded server, so I am unsure whether all data structures are thread-safe by default in a Node server.
I have multiple callbacks accessing a global object like this:
function callback1() {
  global_var['key'] = val;
}
function callback2() {
  global_var['key'] = val;
}
'key' may be the same at times and different at others. Will global_var be thread-safe?
The callbacks, as intended, get called as and when something completes, in no particular order.

Node.JS contains a "dispatcher." It accepts web requests and hands them off for asynchronous processing. That dispatcher is single threaded. But the dispatcher spins up a new thread for each task, and quickly hands off the task to the new thread, freeing the dispatcher's thread for servicing a new request.
To the extent that those task threads are kept separate (i.e. they don't modify each other's state), yes, they are threadsafe.

All of the JavaScript you write for your Node.js application executes as if it were running in a single thread.
Any multithreading occurs behind the scenes, in the I/O code and in other native modules. So there's no need to worry about the thread safety of any application code, regardless.
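A small sketch of what this means in practice: two callbacks scheduled through different mechanisms both perform a read-modify-write on the same object, and no update is lost because each callback runs to completion before the next one starts. The names shared, addMany, and runCallbacks are made up for this example.

```javascript
const shared = { key: 0 };

// A non-atomic read-modify-write; safe here because callbacks never interleave.
function addMany(times) {
  for (let i = 0; i < times; i++) {
    const current = shared.key; // read
    shared.key = current + 1;   // write
  }
}

// Schedules two callbacks and resolves with the final counter value.
function runCallbacks() {
  return new Promise((resolve) => {
    let finished = 0;
    const done = () => {
      finished += 1;
      if (finished === 2) resolve(shared.key);
    };
    setTimeout(() => { addMany(100000); done(); }, 0);
    setImmediate(() => { addMany(100000); done(); });
  });
}
```

Logical races are still possible when one operation is split across several callbacks or await points, since other callbacks may run in between.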

Sandboxing Node.js modules - can it be done?

I'm learning Node.js (-awesome-), and I'm toying with the idea of using it to create a next-generation MUD (online text-based game). In such games, there are various commands, skills, spells etc. that can be used to kill bad guys as you run around and explore hundreds of rooms/locations. Generally speaking, these features are pretty static - you can't usually create new spells, or build new rooms. I however would like to create a MUD where the code that defines spells and rooms etc. can be edited by users.
That has some obvious security concerns; a malicious user could for example upload some JS that forks the child process 'rm -r /'. I'm not as concerned with protecting the internals of the game (I'm securing as much as possible, but there's only so much you can do in a language where everything is public); I could always track code changes wiki-style, and punish users who e.g. crash the server, or boost their power over 9000, etc. But I'd like to solidly protect the server's OS.
I've looked into other SO answers to similar questions, and most people suggest running a sandboxed version of Node. This won't work in my situation (at least not well), because I need the user-defined JS to interact with the MUD's engine, which itself needs to interact with the filesystem, system commands, sensitive core modules, etc. etc. Hypothetically all of those transactions could perhaps be JSON-encoded in the engine, sent to the sandboxed process, processed, and returned to the engine via JSON, but that is an expensive endeavour if every single call to get a player's hit points needs to be passed to another process. Not to mention it's synchronous, which I would rather avoid.
So I'm wondering if there's a way to "sandbox" a single Node module. My thought is that such a sandbox would need to simply disable the 'require' function, and all would be bliss. So, since I couldn't find anything on Google/SO, I figured I'd pose the question myself.

Okay, so I thought about it some more today, and I think I have a basic strategy:
var require = function (module) {
  throw new Error("Uh-oh, untrusted code tried to load module '" + module + "'");
};
var module = null;
// use similar strategy for anything else susceptible
var loadUntrusted = function (code) {
  // direct eval sees the shadowed names above instead of Node's originals
  eval(code);
};
Essentially, we just use variables in a local scope to hide the Node API from eval'ed code, and run the code. Another point of vulnerability would be objects from the Node API that are passed into untrusted code. If e.g. a buffer was passed to an untrusted object/function, that object/function could work its way up the prototype chain, and replace a key buffer function with its own malicious version. That would make all buffers used for e.g. File IO, or piping system commands, etc., vulnerable to injection.
So, if I'm going to succeed in this, I'll need to partition untrusted objects into their own world - the outside world can call methods on it, but it cannot call methods on the outside world. Anyone can of course feel free to please tell me of any further security vulnerabilities they can think of regarding this strategy.

multi-core programming using JavaScript?

So I have this seriously recursive function that I would like to use in my code. The issue is that it doesn't really take advantage of dual-core machines because JavaScript is single-threaded. I have tried using web workers but don't really know much about multicore programming. Could someone point me to some material that explains how it is done? I googled and found this sample link, but it's not really much help without documentation! =/
I would be glad if someone could show me how this could be done without webworkers though! That would be just awesome! =)
I came across this link on whatwg. This is really weird because it explains how to use multicore programming in webworkers etc, but on executing on my chrome browser it throws errors. Same goes with other browsers.
Error: 9Uncaught ReferenceError: Worker is not defined in worker.js

UPDATE (2018-06-21): For people coming here in search of multi-core programming in JavaScript, not necessarily browser JavaScript (for that, the answer still applies as-is): Node.js now supports multi-threading behind a feature flag (--experimental-worker): release info, relevant issue.
Writing this off the top of my head, no guarantees for source code. Please go easy on me.
As far as I know, you cannot really program in threads with JavaScript. Webworkers are a form of multi-programming; yet JavaScript is by its nature single-threaded (based on an event loop).
A web worker is a separate thread of execution in the sense that it doesn't share anything with the script that started it; there is no reference to the script's global object (typically called "window" in the browser), and no reference to any of your main script's variables other than data you send to the thread.
Think of the web worker as a little "server" that gets asked a question and provides an answer. You can only send strings to that server, and it can only parse the string and send back what it has computed.
// in the main script, one starts a worker by passing the file name of the
// script containing the worker to the constructor.
var w = new Worker("myworker.js");
// you want to react to the "message" event, if your worker wants to inform
// you of a result. The function typically gets the event as an argument.
w.addEventListener("message", function (evt) {
  // process evt.data, which is the message from the
  // worker thread
  alert("The answer from the worker is " + evt.data);
});
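You can then send a message (a string) to this thread using its postMessage() method. The sample worker below parses its input as JSON, so the message is JSON-encoded first:
w.postMessage(JSON.stringify("Hello, this is my message!"));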
A sample worker script (an "echo" server) can be:
// this is another script file, like "myworker.js"
self.addEventListener("message", function (evt) {
  var data = JSON.parse(evt.data);
  /* as an echo server, we send this right back */
  self.postMessage(JSON.stringify(data));
});
Whatever you post to that thread will be decoded, re-encoded, and sent back. Of course, you can do whatever processing you want in between. The worker stays active until you end it, either by calling terminate() on it from your main script (that would be w.terminate()) or by calling self.close() inside the worker.
To summarize: what you can do is you zip up your function parameters into a JSON string which gets sent using postMessage, decoded, and processed "on the other side" (in the worker). The computation result gets sent back to your "main" script.
To explain why this is not easier: More interaction is not really possible, and that limitation is intentional. Because shared resources (an object visible to both the worker and the main script) would be subject to two threads interfering with them at the same time, you would need to manage access (i.e., locking) to that resource in order to prevent race conditions.
The message-passing, shared-nothing approach is not that well-known mainly because most other programming languages (C and Java for example) use threads that operate on the same address space (while others, like Erlang, for instance, don't). Consider this:
It is really hard to code a larger project with mutexes (a mutual exclusion mechanism) because of the associated deadlock/race condition complexities. This is stuff that can make grown men cry!
It is really easy in comparison to do message-passing, shared-nothing semantics. The code is isolated; you know exactly what goes into your worker and what comes out of your worker. Deadlocks and race conditions are impossible to achieve!
Just try it out; it is capable of doing interesting things, probably all you want. Bear in mind that it is still implementation defined whether it takes advantage of multicore as far as I know.
NB. I just got informed that at least some implementations will handle JSON encoding of messages for you.
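To illustrate the note above: in runtimes where postMessage applies the structured clone algorithm, objects can be sent as-is without JSON.stringify. This sketch uses a MessageChannel instead of a real worker so it stays in one file; it assumes a runtime with a global MessageChannel (current browsers, or Node.js 15 and later), and the function name roundTrip is made up for the example.

```javascript
// Sends a value through a message port and resolves with what arrives.
function roundTrip(value) {
  return new Promise((resolve) => {
    const { port1, port2 } = new MessageChannel();
    port2.onmessage = (evt) => {
      port1.close(); // closing one end shuts down the channel
      resolve(evt.data);
    };
    port1.postMessage(value); // no manual JSON encoding
  });
}

const original = { op: 'sum', values: [1, 2, 3], when: new Date(0) };
const received = roundTrip(original);
```

The received object is a deep copy rather than the same reference, and types such as Date survive the trip, which plain JSON encoding would not preserve.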
So, to give an answer to your question (it's all above; tl;dr version): No, you cannot do this without web workers. But there is nothing really wrong about web workers aside from browser support, as is the case with HTML5 in general.

As far as I remember this is only possible with the new HTML5 standard. The keyword is "Web-Worker"
See also:
HTML5: JavaScript Web Workers
JavaScript Threading With HTML5 Web Workers

Web workers are the answer on the client side. For Node.js there are several approaches. The most popular is to spawn several processes with pm2 or a similar tool; another is to run a single process and spawn or fork child processes. You can search for these and will find plenty of samples and tactics.
Web workers are already well supported by all browsers. https://caniuse.com/#feat=webworkers
API & samples: https://developer.mozilla.org/en-US/docs/Web/API/Web_Workers_API/Using_web_workers
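For the Node.js side, here is a rough sketch of splitting a CPU-bound computation across several threads. Note that it uses the worker_threads module rather than the process-based approaches named above, and that the function names and the sum-of-squares workload are placeholders chosen for this example.

```javascript
const { Worker } = require('worker_threads');
const os = require('os');

// Inline worker source (eval: true) to keep the sketch in one file.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  let sum = 0;
  for (let i = workerData.from; i < workerData.to; i++) sum += i * i;
  parentPort.postMessage(sum);
`;

// Runs one chunk of the range [from, to) in its own worker.
function sumSquaresChunk(from, to) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: { from, to } });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// Splits [0, n) into chunks, one per worker, and adds up the partial sums.
async function parallelSumSquares(n, workers = Math.max(1, Math.min(4, os.cpus().length))) {
  const chunk = Math.ceil(n / workers);
  const jobs = [];
  for (let from = 0; from < n; from += chunk) {
    jobs.push(sumSquaresChunk(from, Math.min(n, from + chunk)));
  }
  const parts = await Promise.all(jobs);
  return parts.reduce((a, b) => a + b, 0);
}
```

Whether this is faster than a single thread depends on the workload: starting workers has a cost, so it tends to pay off only for computations that run considerably longer than the startup time.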
