According to this W3C Working Draft, the ScriptProcessorNode is deprecated and will be replaced by AudioWorkerNode.
And Chrome has recently implemented AudioWorklet to replace ScriptProcessorNode.
Are those two APIs the same thing? Does Chrome just implement it with a different name?
Yes, it has just been renamed to AudioWorklet, even in the spec, probably to mark the distinction between this API and Web Workers, since this API is more like the Worklet API proposed by the CSS-WG.
The key points where Worklets differ from Workers are (according to the Houdini page):
Worklets are similar to web workers however they:
Are thread-agnostic. That is, they are not defined to run on a particular thread. Rendering engines may run them wherever they choose.
Are able to have multiple duplicate instances of the global scope created for the purpose of parallelism.
Are not event API based. Instead classes are registered on the global scope, whose methods are to be invoked by the user agent.
Have a reduced API surface on the global scope.
Have a lifetime for the global scope which is defined by subsequent specifications or user agents. They aren’t tied to the lifetime of the document.
As worklets have a relatively high overhead, they should be used sparingly. Due to this worklets are expected to be shared between separate scripts.
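For example, the class-based registration model described in the list above looks roughly like this with an AudioWorklet (a minimal sketch; the processor name 'gain-processor', the file name gain-processor.js and the fixed 0.5 gain are just placeholders):

// gain-processor.js — runs inside the AudioWorkletGlobalScope
class GainProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    const input = inputs[0];
    const output = outputs[0];
    for (let channel = 0; channel < output.length; channel++) {
      for (let i = 0; i < output[channel].length; i++) {
        // copy the input at half volume (silence if nothing is connected)
        output[channel][i] = (input[channel] ? input[channel][i] : 0) * 0.5;
      }
    }
    return true; // keep the processor alive
  }
}
registerProcessor('gain-processor', GainProcessor);

// main thread
const ctx = new AudioContext();
ctx.audioWorklet.addModule('gain-processor.js').then(() => {
  const node = new AudioWorkletNode(ctx, 'gain-processor');
  node.connect(ctx.destination);
});

Note there is no onmessage-style event handler in the processor itself: you register a class and the user agent calls its process() method, which is exactly the Worklet model described in the list above.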
Related
I'm working on an application where I have multiple webworkers running together. These webworkers are developed by third parties, and are not trusted. They provide postmessage APIs to each other.
I would like to enable the webworkers to have safe access to local storage. IndexedDB is the standard choice, however I need to ensure that a malicious webworker cannot interfere with the data of another webworker.
My original idea was that I could 'domain' each webworker somehow. Each one gets access to its own piece of IndexedDB, and cannot see the storage put in other pieces by other webworkers. At the moment, I do not believe this is possible since I need the workers to exist together in one iframe.
My next idea was to have a single, trusted webworker that has IndexedDB access, and set up sandbox rules for all of the other webworkers such that they can't use IndexedDB at all, but instead must communicate with the API of the trusted webworker to store and retrieve local data. My current understanding is that I can get this to work if I use two iframes, where the first iframe has access to IndexedDB and runs the trusted webworker, and the second iframe is in a different domain where non-malicious webworkers know not to use the storage.
I am not a huge fan of the two-iframe solution - it's complex, has performance overheads, and requires webworker devs to know they can't safely use local storage even though they actually have access - and I'm looking for a better way to sandbox specific webworkers away from IndexedDB.
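For what it's worth, the broker half of that design is straightforward; here is a minimal sketch (the database name 'app-store', the message shape and the namespacing scheme are all made up for illustration, and waiting for the DB to open is omitted). It does not by itself stop an untrusted worker from opening IndexedDB directly - that still needs the origin/iframe isolation discussed above.

// storage-broker.js — the single trusted worker that owns IndexedDB.
// The trusted page hands it one MessagePort per untrusted worker,
// together with that worker's namespace.
let db;
const open = indexedDB.open('app-store', 1);
open.onupgradeneeded = () => open.result.createObjectStore('kv');
open.onsuccess = () => { db = open.result; };

onmessage = (e) => {
  const { port, namespace } = e.data; // sent by the trusted page
  port.onmessage = (msg) => {
    const { id, op, key, value } = msg.data;
    const scopedKey = namespace + ':' + key; // workers can't touch each other's keys
    const tx = db.transaction('kv', 'readwrite');
    const store = tx.objectStore('kv');
    const req = op === 'set' ? store.put(value, scopedKey) : store.get(scopedKey);
    req.onsuccess = () => port.postMessage({ id, result: req.result });
    req.onerror = () => port.postMessage({ id, error: String(req.error) });
  };
};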
I was thinking about this today and I realized I don't have a clear picture here.
Here are some statements I think to be true (please correct me if I'm wrong):
the DOM is a collection of interfaces specified by W3C.
when parsing HTML source code, the browser creates a DOM tree which has nodes that implement DOM interfaces.
the ECMAScript spec makes no reference to browser host objects (DOM, BOM, HTML5 APIs, etc.).
how the DOM is actually implemented depends on browser internals and is probably different among most of them.
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
I am curious about what happens behind the scenes when I call document.getElementById('foo'). Does the call get delegated to browser native code by the interpreter or does the browser have JS implementations of all host objects? Do you know about any optimizations they do in regard to this?
I read this overview of browser internals but it didn't mention anything about this. I will look through the Chrome and FF source when I have time, but I thought I'd ask here first. :)
All of your bullet points are correct, except:
modern JS interpreters use JIT to improve the code performance and translate it to bytecode
should be "...and translate it to native code". SpiderMonkey (the JS engine in Firefox) worked as a bytecode interpreter for a long time before the current JS speed arms race.
On Mozilla's JS-to-DOM bridge:
The host objects are typically implemented in C++, though there is an experiment underway to implement DOM in JS. So when a web page calls document.getElementById('foo'), the actual work of retrieving the element by its ID is done in a C++ method, as hsivonen noted.
The specific way the underlying C++ implementation gets called depends on the API and has also changed over time (note that I'm not involved in the development, so I might be wrong about some details; here's a blog post by jst, who was actually involved in creating much of this code):
At the lowest level every JS engine provides APIs to define host objects. For example, the browser can call JS_DefineFunctions (as demonstrated in the SpiderMonkey User Guide) to let the engine know that whenever script calls a function with the specified name, a provided C callback should be called. The same goes for other aspects of the host objects (e.g. enumeration, property getters/setters, etc.).
For the core ECMAScript functionality and in some tricky DOM cases the JS engine/the browser uses these APIs directly to define host objects and their behaviors, but it requires a lot of common boilerplate code for e.g. checking parameter types, converting them to the appropriate C++ types, error handling etc.
For reasons I won't go into (let's say historical ones), Mozilla made heavy use of XPCOM for many of its objects, including much of the DOM. One feature of XPCOM is its binding to JS called XPConnect. Among other things, XPConnect can take an interface definition in IDL (such as nsIDOMDocument; or more precisely its compiled representation), expose an object with the specified properties to the script, and later, when a script calls getElementById, perform the necessary parameter checks/conversions and route the call directly to a C++ method (nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn)).
The way XPConnect worked was quite inefficient: it registered generic functions as callbacks to be executed when a script accesses a host object, and these generic functions figured out what they needed to do in every particular case dynamically. This post about quickstubs walks you through one example.
"Quick stubs", mentioned in the previous link, are a way to optimize JS->C++ call time by trading some code size for it: instead of always using generic C++ functions that know how to make any kind of call, specialized code is automatically generated at Firefox build time for a pre-defined list of "hot" calls.
Later on, the JIT (TraceMonkey at that time) was taught to generate the code calling C++ methods as part of the native code generated for "hot" paths in JS. I'm not sure how the newer JITs (JägerMonkey) work in this regard.
With "Paris bindings" the objects are exposed to webpage JS without any reliance on XPConnect; instead, all the necessary glue JSClass code is generated based on WebIDL (instead of XPCOM-era IDL). See also posts by developers who worked on this: jst and khuey. Also see How is the web-exposed DOM implemented?
I'm fuzzy on the details of the last three points in particular, so take them with a grain of salt.
The most recent improvements are listed as dependencies of bug 622298, but I don't follow them closely.
JS calls to DOM methods like getElementById cause the JS engine to call into the C++ code that implements the DOM. For example, in Firefox, the call ends up in nsDocument::GetElementById(const nsAString& aId, nsIDOMElement** aReturn).
As you can see, Firefox maintains a hashtable that maps ids to elements in C++ as an optimization in this case, so it doesn't walk the whole DOM tree looking for the id.
The DOM is implemented as a language-independent library in pretty much all major browser implementations, which means it's in a different library from the JavaScript engine. For example, in IE the JS engine is implemented in jscript.dll while the DOM is implemented in mshtml.dll. Safari has Nitro (JS) and WebCore (DOM). Chrome has V8 (JS) and WebCore (DOM), and Firefox has SpiderMonkey/TraceMonkey (JS) and Gecko (DOM).
What this means is that any time your JS has to access the DOM, it has to reach over to the DOM library - which is inherently slow because of all the marshaling that has to take place. An analogy that has been used is two pieces of land connected by a toll bridge: any time you touch the DOM, you must cross over the bridge and cross back, paying a performance toll.
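To make that concrete, here is an illustrative snippet (not taken from the book): every property read on a DOM object is a trip across that bridge, so caching references and values on the JS side keeps the number of crossings down.

const list = document.getElementById('items'); // one crossing
const count = list.children.length;            // cache the value once
for (let i = 0; i < count; i++) {
  // work with list.children[i] ...
}
// versus calling document.getElementById('items').children.length on every
// iteration, which reaches into the DOM library again and again.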
References
Video: Building High Performance Web Applications and Sites
Book: High Performance JavaScript (Chapter 3 on the DOM)
Is a web worker just a normal native thread created by the browser, which runs and communicates with other browser threads through a message queue? Or does it involve more than that when it is created?
I'm looking at the experimental support for pthreads in emscripten; multiple threads in C++ will be translated to web workers once compiled. But will it have the same level of performance as native code? After all, fine-grained multithreading is a key feature in C++.
At the moment, WebWorkers are pretty heavyweight because the VM duplicates a bunch of its internal state (sometimes even re-JITs code for a worker). That state is much bigger than a native thread's few MiBs of initial stack space and associated state.
Some of this can be fixed by implementations, and I expect that if SharedArrayBuffer or WebAssembly + threads become popular then browser engines will want to optimize things.
That being said, the end of your question hints at a misunderstanding of what thread overheads are, and how the proposal for SharedArrayBuffer (which Emscripten relies on to support pthreads) works. WebWorkers are heavy at the moment, but they can communicate through SABs in exactly the same way native code such as C++ can: by accessing exactly the same memory, at the same virtual address. SAB adds a new kind of ArrayBuffer to JavaScript which doesn't get neutered when you postMessage it to another worker. Multiple workers can then see other workers' updates to the buffer in exactly the same way C++ code does when you use std::atomic.
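A minimal sketch of what that looks like (the file name worker.js is a placeholder; note that current browsers additionally require the page to be cross-origin isolated before SharedArrayBuffer is exposed):

// main.js
const sab = new SharedArrayBuffer(4);
const view = new Int32Array(sab);
Atomics.store(view, 0, 42);
const worker = new Worker('worker.js');
worker.postMessage(sab); // the buffer is shared, not copied or neutered

// worker.js
onmessage = (e) => {
  const view = new Int32Array(e.data);
  console.log(Atomics.load(view, 0)); // reads the same memory: 42
};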
At the same time, workers can't block the main thread by definition and therefore have a more "native" feel. Some web APIs aren't available to all workers, but that's been changing. This becomes relevant if you e.g. write a game and have network / audio / render / AI / input in different threads. The web is slowly finding its own way of doing these things.
The details are a bit trickier:
SAB currently only supports non-atomic accesses, and sequentially-consistent atomic accesses (i.e. the only available Atomics access at the moment is the same as C++'s std::memory_order_seq_cst). Doing non-atomic accesses should be about as performant as C++'s non-atomics (big compiler caveats here which I won't get into), and using Atomics should be about as performant as C++'s std::atomic default (which is std::memory_order_seq_cst). C++ has 5 other memory orders (relaxed, consume, acquire, release, acq_rel) which SAB doesn't support at the moment. These other memory orders allow native code to be faster in some circumstances, but are harder to use correctly and portably. They might be added in future updates to SAB, e.g. through this issue.
SAB also supports futex operations, which native programs rely on under the hood to implement efficient mutexes.
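For example, a futex lets you build a simple mutex out of the Atomics operations (a sketch; Atomics.notify was originally named Atomics.wake, and only workers, not the main thread, are allowed to block in Atomics.wait):

// `view` is an Int32Array over a SharedArrayBuffer; index 0 holds the lock:
// 0 = unlocked, 1 = locked
function lock(view) {
  while (Atomics.compareExchange(view, 0, 0, 1) !== 0) {
    // someone else holds the lock: sleep until a notify, then retry
    Atomics.wait(view, 0, 1);
  }
}

function unlock(view) {
  Atomics.store(view, 0, 0);
  Atomics.notify(view, 0, 1); // wake one waiter
}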
There are even trickier details when comparing to C++; I've detailed some of them, but there are even more.
Is there any way for me to share a variable between two web workers? (Web workers are basically threads in Javascript)
In languages like C# you have:
using System;
using System.Threading;

class Program
{
    public static string message = "";

    static void Main()
    {
        message = "asdf";
        new Thread(mythread).Start();
    }

    public static void mythread()
    {
        Console.WriteLine(message); // outputs "asdf"
    }
}
I know that's a bad example, but in my JavaScript application, I have a thread doing heavy computations that can be spread across multiple threads [since I have a big chunk of data in the form of an array. All the elements of the array are independent of each other. In other words, my worker threads don't have to care about locking or anything like that].
I've found that the only way to "share" a variable between two threads is to create a getter/setter [via prototyping] and then use postMessage/onmessage... although this seems really inefficient [especially with objects, which I have to serialize to JSON, AFAIK].
LocalStorage/Database access from workers has been taken out of the HTML5 specification because it could result in deadlocks, so that isn't an option [sadly]...
The other possibility I have found was to use PHP, with getVariable.php and setVariable.php pages which use localstorage to store ints/strings... once again, objects [which includes arrays/null] have to be converted to JSON... and then later, JSON.parse()'d.
As far as I know, JavaScript worker threads are totally isolated from the main page thread [which is why JavaScript worker threads can't access DOM elements].
Although postMessage works, it is slow.
Web workers are deliberately shared-nothing -- everything in a worker is completely hidden from other workers and from pages in the browser. If there were any way to share non-"atomic" values between workers, the semantics of those values would be nearly impossible to use with predictable results. Now, one could introduce locks as a way to use such values, to a certain extent -- you acquire the lock, examine and maybe modify the value, then release the lock -- but locks are very tricky to use, and since the usual failure mode is deadlock you would be able to "brick" the browser pretty easily. That's no good for developers or users (especially when you consider that the web environment is so amenable to experimentation by non-programmers who've never even heard of threads, locks, or message-passing), so the alternative is no state shared between workers or pages in the browser. You can pass messages (which one can think of as being serialized "over the wire" to the worker, which then creates its own copy of the original value based on the serialized information) without having to address any of these problems.
Really, message-passing is the right way to support parallelism without letting the concurrency problems get completely out of control. Orchestrate your message handoffs properly and you should have every bit as much power as if you could share state. You really don't want the alternative you think you want.
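For the array-crunching case in the question, that looks roughly like this (a sketch; worker.js and the squaring work are placeholders): the main thread posts one chunk to each worker and reassembles the results as they come back.

// main.js
const data = [];
for (let i = 0; i < 1000000; i++) data.push(i);
const numWorkers = 4;
const chunkSize = Math.ceil(data.length / numWorkers);
const results = [];
let pending = numWorkers;

for (let w = 0; w < numWorkers; w++) {
  const worker = new Worker('worker.js');
  worker.onmessage = (e) => {
    results[w] = e.data; // a structured clone of the processed chunk
    if (--pending === 0) console.log('all chunks done');
  };
  worker.postMessage(data.slice(w * chunkSize, (w + 1) * chunkSize));
}

// worker.js
onmessage = (e) => {
  postMessage(e.data.map((x) => x * x)); // elements are independent, no locking needed
};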
There are two options to share data between dedicated workers:
1. Shared Workers
The SharedWorker interface represents a specific kind of worker that can be accessed from several browsing contexts, such as several windows, iframes or even workers.
Spawning a Shared Worker in a Dedicated Worker
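A minimal sketch of the shared-worker route (the file name shared.js and the broadcast behaviour are just for illustration; whether the SharedWorker constructor is available inside a dedicated worker still varies by browser, which is what the linked question is about):

// shared.js — one instance serves every connecting context
const ports = [];
onconnect = (e) => {
  const port = e.ports[0];
  ports.push(port);
  port.onmessage = (msg) => {
    // relay each message to every connected page/worker
    ports.forEach((p) => p.postMessage(msg.data));
  };
};

// in a page (or, where supported, in a dedicated worker):
const shared = new SharedWorker('shared.js');
shared.port.onmessage = (e) => console.log('got', e.data);
shared.port.start();
shared.port.postMessage({ hello: 'world' });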
2. Channel Messaging API
The Channel Messaging API allows two separate scripts running in different browsing contexts attached to the same document (e.g., two IFrames, or the main document and an IFrame, two documents via a SharedWorker, or two workers) to communicate directly, passing messages between one another through two-way channels (or pipes) with a port at each end.
How to call shared worker from the web worker?
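And a minimal sketch of the channel-messaging route (worker file names a.js and b.js are placeholders): the main document creates the channel and transfers one port to each worker, after which the two workers talk to each other directly.

// main page
const workerA = new Worker('a.js');
const workerB = new Worker('b.js');
const channel = new MessageChannel();
workerA.postMessage({ port: channel.port1 }, [channel.port1]);
workerB.postMessage({ port: channel.port2 }, [channel.port2]);

// a.js
onmessage = (e) => {
  const port = e.data.port;
  port.onmessage = (msg) => console.log('A received:', msg.data);
  port.postMessage('hello from A');
};

// b.js mirrors a.js with its own port.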
No, but you can send messages to web workers, and those messages can be arrays, objects, numbers, strings, booleans, ImageData, or any combination of these. Web workers can send messages back too.
I recently read about (but have not used) shared workers. According to "Share the work! Opera comes with SharedWorker support", support is only in the newest browsers (Opera 10.6, Chrome 5, Safari 5).