AFAIK, when passing arrays from JS to Emscripten-compiled C/C++ functions, we are essentially copying the array into a JS-simulated heap (like Module.HEAPU8), which is shared by JS code and C/C++ code.
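For context, this is the pattern I mean (a minimal sketch; the exported sum function is a made-up example):

    // Assume the C side exports: int sum(uint8_t *data, int len)
    const data = new Uint8Array([1, 2, 3, 4]);
    const ptr = Module._malloc(data.length);      // reserve space in the Emscripten heap
    Module.HEAPU8.set(data, ptr);                 // copy the JS array into that space
    const result = Module.ccall('sum', 'number',
                                ['number', 'number'], [ptr, data.length]);
    Module._free(ptr);                            // give the space back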
This works fine in a single-threaded environment, but how about a multi-threaded environment, like worker threads? Is there some built-in mechanism to guarantee the thread safety for this simulated HEAP?
If not, does it mean we need to call Module._malloc() & Module._free() to dynamically manage heap space for each thread? If so, this sounds like a potential performance bottleneck, given that the cost of copying arrays and allocating/freeing space might cancel out the benefit we gain from using worker threads.
Reference: ref1 ref2
Your understanding is correct, but it is currently impossible to share a WebAssembly.Memory across workers. JavaScript has SharedArrayBuffer but WebAssembly doesn't yet support the equivalent (and compatible) WebAssembly.Memory with the shared=true attribute.
Once it is supported you'll be able to postMessage a WebAssembly.Memory and use it to instantiate multiple modules across workers. You'll also be able to postMessage the underlying SharedArrayBuffer, and read / write to and from it using JavaScript, concurrently with WebAssembly.
In all these cases the memory won't be copied. The WebAssembly malloc / free implementation isn't specified, but what you'll get from e.g. Emscripten will be thread safe. It won't use grow_memory initially (the design currently disallows growing a shared memory), but will rather pre-allocate and make sure that's thread safe for you (like any multi-threaded C implementation does).
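To make that concrete, here is roughly what the sharing will look like once WebAssembly.Memory accepts shared: true (a sketch of the proposed API, not something you can run today):

    // main.js
    const memory = new WebAssembly.Memory({ initial: 256, maximum: 256, shared: true });
    const worker = new Worker('worker.js');
    worker.postMessage(memory);                    // the backing SharedArrayBuffer is shared, not copied

    // worker.js
    onmessage = (e) => {
      const memory = e.data;
      const view = new Int32Array(memory.buffer);  // same bytes the main thread sees
      Atomics.store(view, 0, 42);                  // write visible to the other threads
    };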
I was thinking about this today and I realized I don't have a clear picture here.
Here are some statements I think to be true (please correct me if I'm wrong):
the DOM is a collection of interfaces specified by W3C.
when parsing HTML source code, the browser creates a DOM tree which has nodes that implement DOM interfaces.
the ECMAScript spec makes no reference to browser host objects (DOM, BOM, HTML5 APIs etc.).
how the DOM is actually implemented depends on browser internals and is probably different among most of them.
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
I am curious about what happens behind the scenes when I call document.getElementById('foo'). Does the call get delegated to browser native code by the interpreter or does the browser have JS implementations of all host objects? Do you know about any optimizations they do in regard to this?
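One thing that can be observed from script (though it doesn't say much about the internals) is that the method stringifies as native code:

    console.log(document.getElementById.toString());
    // typically prints something like: function getElementById() { [native code] }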
I read this overview of browser internals but it didn't mention anything about this. I will look through the Chrome and FF source when I have time, but I thought about asking here first. :)
All of your bullet points are correct, except:
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
should be "...and translate it to native code". SpiderMonkey (the JS engine in Firefox) worked as a bytecode interpreter for a long time before the current JS speed arms race.
On Mozilla's JS-to-DOM bridge:
The host objects are typically implemented in C++, though there is an experiment underway to implement DOM in JS. So when a web page calls document.getElementById('foo'), the actual work of retrieving the element by its ID is done in a C++ method, as hsivonen noted.
The specific way the underlying C++ implementation gets called depends on the API and also changed over time (note that I'm not involved in the development, so might be wrong about some details, here's a blog post by jst, who was actually involved in creating much of this code):
At the lowest level every JS engine provides APIs to define host objects. For example, the browser can call JS_DefineFunctions (as demonstrated in the SpiderMonkey User Guide) to let the engine know that whenever script calls a function with the specified name, a provided C callback should be called. Same for other aspects of the host objects (e.g. enumeration, property getters/setters, etc.)
For the core ECMAScript functionality and in some tricky DOM cases the JS engine/the browser uses these APIs directly to define host objects and their behaviors, but it requires a lot of common boilerplate code for e.g. checking parameter types, converting them to the appropriate C++ types, error handling etc.
For reasons I won't go into, let's say historically, Mozilla made heavy use of XPCOM for many of its objects, including much of the DOM. One feature of XPCOM is its binding to JS called XPConnect. Among other things, XPConnect can take an interface definition in IDL (such as nsIDOMDocument; or more precisely its compiled representation), expose an object with the specified properties to the script, and later, when a script calls getElementById, perform the necessary parameter checks/conversions and route the call directly to a C++ method (nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn))
The way XPConnect worked was quite inefficient: it registered generic functions as callbacks to be executed when a script accesses a host object, and these generic functions figured out what they needed to do in every particular case dynamically. This post about quickstubs walks you through one example.
The "quick stubs" mentioned in the previous link are a way to optimize JS->C++ call time by trading some code size for it: instead of always using generic C++ functions that know how to make any kind of call, specialized code is automatically generated at Firefox build time for a pre-defined list of "hot" calls.
Later on the JIT (TraceMonkey at that time) was taught to generate the code calling C++ methods as part of the native code generated for "hot" paths in JS. I'm not sure how the newer JITs (JägerMonkey) work in this regard.
With "Paris bindings" the objects are exposed to webpage JS without any reliance on XPConnect; instead, all the necessary glue JSClass code is generated based on WebIDL (instead of XPCOM-era IDL). See also posts by developers who worked on this: jst and khuey. Also see How is the web-exposed DOM implemented?
I'm fuzzy on the details of the last three points in particular, so take them with a grain of salt.
The most recent improvements are listed as dependencies of bug 622298, but I don't follow them closely.
JS calls to DOM methods like getElementById cause the JS engine to call into the C++ code that implements the DOM. For example, in Firefox, the call ends up in nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn).
As you can see, Firefox maintains a hashtable in C++ that maps IDs to elements as an optimization here, so it doesn't walk the whole DOM tree looking for the ID.
The DOM is implemented as a language-independent library in pretty much all major browser implementations, which means it lives in a different library from the JavaScript engine. For example, in IE the JS engine is implemented in jscript.dll while the DOM is implemented in mshtml.dll. Safari has Nitro (JS) and WebCore (DOM). Chrome has V8 (JS) and WebCore (DOM), and Firefox has SpiderMonkey/TraceMonkey (JS) and Gecko (DOM).
What this means is that any time your JS has to access the DOM, it has to reach over to the DOM library, which is inherently slow because of all the marshaling that has to take place. An analogy that has been used is two pieces of land connected by a toll bridge: any time you touch the DOM, you must cross over the bridge and back, paying a performance toll.
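To make the toll-bridge point concrete, here is an illustrative sketch (my own, not from the references below; the #list element is just a placeholder): cross the bridge as few times as possible by caching DOM references and batching work.

    // Slower: several DOM crossings (lookup, read, write) on every iteration
    for (let i = 0; i < 1000; i++) {
      document.getElementById('list').innerHTML += '<li>' + i + '</li>';
    }

    // Faster: one lookup, one write
    const list = document.getElementById('list');
    let html = '';
    for (let i = 0; i < 1000; i++) {
      html += '<li>' + i + '</li>';
    }
    list.innerHTML = html;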
References
Video: Building High Performance Web Applications and Sites
Book: High Performance Javascript (Chapter 3 on the DOM)
Is a web worker just a normal native thread created by the browser, which runs and communicates with other browser threads through a message queue? Or is there more to it than that when one is created?
I'm looking at the experimental pthread support in Emscripten: multiple threads in C++ are translated to web workers once compiled. But will it have the same level of performance as native code? After all, fine-grained multithreading is a key feature of C++.
At the moment, WebWorkers are pretty heavyweight because the VM duplicates a bunch of its internal state (sometimes even re-JITs code for a worker). That state is much bigger than a native thread's few MiBs of initial stack space and associated state.
Some of this can be fixed by implementations, and I expect that if SharedArrayBuffer or WebAssembly + threads become popular then browser engines will want to optimize things.
That being said, the end of your question hints at a misunderstanding of what thread overheads are, and how the proposal for SharedArrayBuffer (which Emscripten relies on to support pthreads) works. WebWorkers are heavy at the moment, but they can communicate through SABs in exactly the same way native code such as C++ can: by accessing exactly the same memory, at the same virtual address. SAB adds a new kind of ArrayBuffer to JavaScript which doesn't get neutered when you postMessage it to another worker. Multiple workers can then see each other's updates to the buffer in exactly the same way C++ code does when you use std::atomic.
At the same time, workers can't block the main thread by definition and therefore have a more "native" feel. Some web APIs aren't available to all workers, but that's been changing. This becomes relevant if you e.g. write a game and have network / audio / render / AI / input in different threads. The web is slowly finding its own way of doing these things.
The details are a bit trickier
SAB currently only supports non-atomic accesses and sequentially-consistent accesses (i.e. the only atomic access available at the moment is the same as C++'s std::memory_order_seq_cst). Doing non-atomic accesses should be about as performant as C++'s non-atomics (big compiler caveats here which I won't get into), and using Atomics should be about as performant as C++'s std::atomic default (which is std::memory_order_seq_cst). C++ has 5 other memory orders (relaxed, consume, acquire, release, acq_rel) which SAB doesn't support at the moment. These other memory orders allow native code to be faster in some circumstances, but are harder to use correctly and portably. They might be added to future updates to SAB, e.g. through this issue.
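In code, the two kinds of access look like this (a minimal sketch, assuming the Int32Array view is over a SharedArrayBuffer that another worker also holds):

    const sab = new SharedArrayBuffer(1024);
    const i32 = new Int32Array(sab);

    i32[0] = 1;                      // non-atomic access, like a plain store in C++
    Atomics.store(i32, 1, 42);       // sequentially consistent, like std::memory_order_seq_cst
    const v = Atomics.load(i32, 1);  // sequentially consistent load
    Atomics.add(i32, 2, 1);          // atomic read-modify-write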
SAB also supports futex operations, which native programs rely on under the hood to implement efficient mutexes.
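A rough sketch of the kind of lock this enables (using the Atomics.wait / Atomics.notify names; blocking waits are only allowed inside workers, not on the main thread):

    // 0 = unlocked, 1 = locked; i32 is an Int32Array over a SharedArrayBuffer
    function lock(i32, idx) {
      while (Atomics.compareExchange(i32, idx, 0, 1) !== 0) {
        Atomics.wait(i32, idx, 1);     // sleep while the slot still holds 1
      }
    }

    function unlock(i32, idx) {
      Atomics.store(i32, idx, 0);
      Atomics.notify(i32, idx, 1);     // wake at most one waiter
    }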
There are even trickier details when comparing to C++; I've detailed some of them, but there are even more.
I've heard this kind of statement many times, but personally I don't think it quite makes sense. I think people are confusing JavaScript as a language specification with JavaScript in practice (browser, Node, etc.). Of course, in most cases JavaScript is executed in a single-threaded environment; but AFAIK nothing in the language specification requires it to be so. I think this is just like saying Python is "interpreted", while it's in fact entirely a matter of implementation.
So, is it accurate to say JavaScript is a "single-thread" language?
By JavaScript you seem to mean ECMAScript.
There's already multithreading in the browser, built with web workers and based on strong isolation of data: workers only communicate by message passing; nothing is shared.
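That isolation looks like this in practice (a minimal sketch; the file names and the sum task are just placeholders):

    // main.js
    const worker = new Worker('worker.js');
    worker.postMessage({ numbers: [1, 2, 3] });        // data is copied (structured clone), not shared
    worker.onmessage = (e) => console.log(e.data.sum);

    // worker.js
    onmessage = (e) => {
      const sum = e.data.numbers.reduce((a, b) => a + b, 0);
      postMessage({ sum });
    };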
If you want more intricate multithreading, with data sharing, then that doesn't look possible now. Nothing in ECMAScript explicitly forbids multithreading, but you can't do multithreading without
facilities to create "threads" (in a general sense, that could be coroutines)
mutexes and facilities to synchronize accesses
low-level support to ensure, for example, that a property change won't corrupt data in case of simultaneous accesses. None of the current engines has been designed with this kind of robustness (yes, some of them support multiple threads, but only in isolation).
The fact that ECMAScript wasn't designed to include multithreading is, for now, enough to prevent engines from supporting it (other than the isolated, message-passing multithreading that already exists, which is a very limited kind of multithreading).
You have to realize that
data-sharing multithreading is very expensive (not even speaking of simultaneous actions on the DOM)
you would rarely use it in JavaScript
Why do I say you would rarely use it? Because most of the IO-blocking tasks (file reading, requests, db queries, etc.), most of the low-level tasks (for example image decoding or page rendering), most of the UI management (the event queue), and most of the scheduling (timeouts and intervals) are done outside your code for you.
Multi-threading behavior is available in both HTML5 and Node.js, BUT there is no native threading API in the JavaScript language, so I guess the answer to your contrived question (I mean that in the nicest possible way, of course) is "yes, JavaScript is a single-threaded language."
AFAIK nothing in the language specification requires it to be so.
The ECMAScript specification (TC39) says:
At any point in time, there is at most one execution context per agent that is actually executing code.
That appears to me to be the crucial guarantee that you never have to synchronize access to variables in ECMAScript, which is what I expect is meant when someone says a language is single-threaded.
Of course, most ECMAScript host environments use more than one thread in their implementation for things like garbage collection. And ECMAScript itself allows for multiple separate contexts of execution that could each be driven by their own thread, though the standard makes it clear you could also just drive all of them with the same thread.
The point, again, is you never have to protect any ECMAScript variable with a mutex, semaphore, or the like (which is why ECMAScript provides no such facilities) since the language promises there will never be two threads of control with simultaneous access to the same context.
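A small illustration of what that promise buys you (my own sketch):

    let counter = 0;

    function increment() {
      const before = counter;  // no other callback can run between these two lines,
      counter = before + 1;    // so no mutex is needed even with many timers pending
    }

    // Each callback runs to completion before the next one starts.
    setTimeout(increment, 0);
    setTimeout(increment, 0);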
I don't know of any JavaScript implementation that violates this promise either, though there certainly could be.
By now, most mainstream browsers have started integrating optimizing JIT compilers into their JavaScript interpreters/virtual machines. That's good for everyone. Now, I'd be hard-pressed to say exactly which optimizations they perform and how best to take advantage of them. What are references on the optimizations in each of the major JavaScript engines?
Background:
I'm working on a compiler that generates JavaScript from a higher-level & safer language (shameless plug: it's called OPA and it's very cool) and, given the size of applications I'm generating, I'd like my JavaScript code to be as fast and as memory-efficient as possible. I can handle high-level optimizations, but I need to know more about which runtime transformations are performed, so as to know which low-level code will produce best results.
One example, off the top of my head: the language I'm compiling will soon integrate support for laziness. Do JIT engines behave well with lazy function definitions?
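To illustrate what I mean by laziness, the generated code would look something like this memoizing thunk (names invented for this example, not OPA's actual output):

    function lazy(compute) {
      let evaluated = false;
      let value;
      return function force() {
        if (!evaluated) {        // run the computation at most once
          value = compute();
          evaluated = true;
        }
        return value;
      };
    }

    const expensive = lazy(() => {
      let s = 0;
      for (let i = 0; i < 1e6; i++) s += i;
      return s;
    });

    // the cost is only paid if and when the value is forced:
    console.log(expensive());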
This article series discusses the optimisations of V8. In summary:
It generates native machine code - not bytecode (V8 Design Elements)
Precise garbage collection (Wikipedia)
Inline caching of called methods (Wikipedia)
Storing class transition information so that objects with the same properties are grouped together (V8 Design Elements)
The first two points might not help you very much in this situation. The third might give insight into getting things cached together. The last might help you create objects with the same properties so they use the same hidden classes.
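For the last point, the usual advice looks like this (a sketch of the idea, not code from the article):

    // Initialize properties in the same order so all instances share one hidden class.
    function Point(x, y) {
      this.x = x;   // every Point gets x, then y
      this.y = y;
    }

    const a = new Point(1, 2);
    const b = new Point(3, 4);   // same hidden class as a

    // Adding properties in a different order, or after construction,
    // forces a transition to a different hidden class:
    const c = new Point(5, 6);
    c.z = 7;                     // c no longer shares a's hidden class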
This blog post discusses some of the optimisations of SquirrelFish Extreme:
Bytecode optimizations
Polymorphic inline cache (like V8)
Context threaded JIT (introduction of native machine code generation, like V8)
Regular expression JIT
TraceMonkey is optimised via tracing. I don't know much about it, but it looks like it detects the type of a variable in some "hot code" (code that runs often, in loops) and creates optimised code based on what the type of that variable is. If the type of the variable changes, it must recompile the code. Based on this, I'd say you should stay away from changing the type of a variable within a loop.
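In other words (an illustrative sketch of my own):

    // Type-unstable: sum flips from number to string mid-loop,
    // which can force the engine to throw away its specialised code.
    let sum = 0;
    for (let i = 0; i < 10; i++) {
      sum = (i === 5) ? sum + '!' : sum + i;
    }

    // Type-stable: each variable keeps a single type for the whole loop.
    let total = 0;
    let suffix = '';
    for (let i = 0; i < 10; i++) {
      total += i;
      if (i === 5) suffix += '!';
    }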
I found an additional resource:
What does V8 do with that loop?