I'm new to JavaScript so forgive me for being a n00b.
When intensive calculation is required, it more than likely involves loops, recursive or otherwise. Sometimes this may mean having a recursive loop that runs four functions, where each of those functions walks the entire DOM tree, reads positions, and does some math for collision detection or whatever.
While the first function is walking the DOM tree, the next one has to wait for the first to finish, and so forth. Instead of doing this, why not launch those loops-within-loops separately, outside the main program, and act on their calculations in another loop that runs slower because it isn't doing those calculations itself?
Dumb or clever?
Thanks in advance!
Long-running computations are exactly what Web Workers are for. What you describe is the common pattern of producer and consumer threads. While you could do this using Web Workers, the synchronization overhead would likely outweigh any gains, even on highly parallel systems.
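For illustration, here is a minimal sketch of that pattern in the browser; collision-worker.js and applyCollisions are hypothetical names, not part of any real API:

    // main thread -- hand the heavy math to a worker, keep the UI thread free
    const worker = new Worker('collision-worker.js'); // hypothetical worker script

    worker.onmessage = (e) => {
      applyCollisions(e.data); // hypothetical function that applies results to the DOM
    };

    // Workers can't see the DOM, so read positions on the main thread first:
    const rects = [...document.querySelectorAll('.sprite')].map((el) => {
      const r = el.getBoundingClientRect();
      return { x: r.left, y: r.top, w: r.width, h: r.height };
    });
    worker.postMessage(rects); // plain objects are cheap to structured-clone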
JavaScript is not the ideal language for computationally demanding applications. Also, the processing power of the machines browsers run on varies wildly (think a low-end smartphone vs. a 16-core workstation). Therefore, consider calculating the complex stuff on the server and sending the result to the client for display.
For your everyday web application, you should take a single-threaded approach and analyze performance once it becomes a problem. Heck, why not ask for help about your performance problem here?
JavaScript was never meant to perform such computationally intensive tasks, and even though this is changing, the fact remains that JavaScript is inherently single-threaded. The recent web workers technology provides a limited form of multi-threading, but worker threads can't access the DOM directly; they can only send messages to and receive messages from the main thread, which can then access the DOM on their behalf.
Currently, the only way to have real parallel processing in JS is to use Web Workers, but they are only supported by very recent browsers. And if your program requires such a thing, it could mean that you are not using the right tools (for example, walking the DOM tree is generally done using DOM selectors like querySelectorAll).
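For example, instead of hand-rolling a recursive tree walk (the .box selector here is just illustrative):

    const boxes = document.querySelectorAll('.box'); // one call instead of a manual tree walk
    for (const el of boxes) {
      const rect = el.getBoundingClientRect();       // read the position for collision math
      // ... do the math ...
    }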
Related
Is a web worker just a normal native thread, created by the browser, that runs and communicates with other browser threads through a message queue? Or does it involve more than that when created?
I'm looking at the experimental support for pthreads in Emscripten: multiple threads in C++ are translated into web workers once compiled. But will it have the same level of performance as native code? After all, fine-grained multithreading is a key feature in C++.
At the moment, WebWorkers are pretty heavyweight because the VM duplicates a bunch of its internal state (sometimes even re-JITs code for a worker). That state is much bigger than a native thread's few MiBs of initial stack space and associated state.
Some of this can be fixed by implementations, and I expect that if SharedArrayBuffer or WebAssembly + threads become popular then browser engines will want to optimize things.
That being said, the end of your question hints at a misunderstanding of what thread overheads are, and of how the proposal for SharedArrayBuffer (which Emscripten relies on to support pthreads) works. WebWorkers are heavy at the moment, but they can communicate through SABs in exactly the same way native code such as C++ can: by accessing exactly the same memory, at the same virtual address. SAB adds a new kind of ArrayBuffer to JavaScript which doesn't get neutered when you postMessage it to another worker. Multiple workers can then see other workers' updates to the buffer in exactly the same way C++ code does when you use std::atomic.
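A minimal sketch of that sharing, assuming a hypothetical sab-worker.js (note that modern browsers also require cross-origin isolation headers before they expose SharedArrayBuffer):

    // main thread
    const sab = new SharedArrayBuffer(16);
    const shared = new Int32Array(sab);

    Atomics.store(shared, 0, 42);       // sequentially-consistent write

    const worker = new Worker('sab-worker.js');
    worker.postMessage(sab);            // shared, not neutered or copied

    // sab-worker.js
    onmessage = (e) => {
      const shared = new Int32Array(e.data);  // the very same memory
      console.log(Atomics.load(shared, 0));   // sees 42, like std::atomic in C++
    };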
At the same time, workers can't block the main thread by definition and therefore have a more "native" feel. Some web APIs aren't available to all workers, but that's been changing. This becomes relevant if you e.g. write a game and have network / audio / render / AI / input in different threads. The web is slowly finding its own way of doing these things.
The details are a bit trickier
SAB currently only supports non-atomic accesses, and sequentially-consistent accesses (i.e. the only available Atomic access at the moment is the same as C++'s std::memory_order_seq_cst). Doing non-atomic accesses should be about as performant as C++'s non-atomics (big compiler caveats here which I won't get into), and using Atomic should be about as performant as C++'s std::atomic default (which is std::memory_order_seq_cst). C++ has 5 other memory orders (relaxed, consume, acquire, release, acq_rel) which SAB doesn't support at the moment. These other memory orders allow native code to be faster in some circumstances, but are harder to use correctly and portably. They might be added to future updates to SAB, e.g. through this issue.
SAB also supports futexes, which native programs rely on under the hood to implement efficient mutexes.
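A minimal mutex sketch built on top of that (in current engines the futex operations are exposed as Atomics.wait/Atomics.notify, and Atomics.wait is only allowed inside workers, not on the browser's main thread):

    const UNLOCKED = 0, LOCKED = 1;

    function lock(i32, idx) {
      while (Atomics.compareExchange(i32, idx, UNLOCKED, LOCKED) !== UNLOCKED) {
        Atomics.wait(i32, idx, LOCKED);  // sleep while the value is still LOCKED
      }
    }

    function unlock(i32, idx) {
      Atomics.store(i32, idx, UNLOCKED);
      Atomics.notify(i32, idx, 1);       // wake one waiting thread
    }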
There are even trickier details when comparing to C++; I've detailed some of them, but there are even more.
As if it weren't enough that JavaScript isn't multithreaded, apparently JavaScript doesn't even get a thread of its own; it shares a single thread with a load of other stuff. Even in most modern browsers JavaScript sits in the same queue as painting, updating styles, and handling user actions.
Why is that?
From my experience, an immensely improved user experience could be gained if JavaScript ran on its own thread: JS would no longer block UI rendering, and developers would be liberated from the intricate message-queue boilerplate (yes, you too, web workers!) that they currently have to write all over the place to keep the UI responsive.
I'm interested in understanding the motivation which governs such a seemingly unfortunate design decision, is there a convincing reason from a software architecture perspective?
User Actions Require Participation from JS Event Handlers
User actions can trigger JavaScript events (clicks, focus events, key events, etc.) that participate in and potentially influence the user action. Clearly the single JS thread can't be executing while user actions are being processed: if it were already doing something else, it couldn't participate in the user action. So the browser doesn't process the default user actions until the JS thread is available to participate in that process.
Rendering
Rendering is more complicated. A typical DOM modification sequence goes like this:
1) The DOM is modified by JS and the layout is marked dirty.
2) The JS thread finishes executing, so the browser now knows that JS is done modifying the DOM.
3) The browser does a layout pass to re-lay out the changed DOM.
4) The browser paints the screen as needed.
Step 2 is important here. If the browser did a new layout and screen paint after every single JS DOM modification, the whole process would be incredibly inefficient whenever the JS was going to make a bunch of DOM modifications. Plus, there would be thread-synchronization issues: if JS modified the DOM at the same time as the browser was trying to do a relayout and repaint, you'd have to synchronize that activity (e.g. block somebody so an operation could complete without the underlying data being changed by another thread).
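For example (assuming a hypothetical list element already in the page), none of these thousand appends triggers a layout on its own:

    const list = document.querySelector('ul');   // hypothetical target element
    for (let i = 0; i < 1000; i++) {
      const li = document.createElement('li');
      li.textContent = 'item ' + i;
      list.appendChild(li);
    }
    // Layout (step 3) and paint (step 4) happen once, after this script returns.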
FYI, there are some work-arounds that can be used to force a relayout or to force a repaint from within your JS code (not exactly what you were asking, but useful in some circumstances).
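For example, reading a layout-dependent property after a style change forces a synchronous relayout ('element' here is a hypothetical DOM node):

    element.style.width = '200px';
    const h = element.offsetHeight;  // reading this forces the browser to do layout now

    requestAnimationFrame(() => {
      // runs just before the browser's next style/layout/paint pass
    });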
Multiple Threads Accessing the DOM Is Really Complex
The DOM is essentially a big shared data structure. The browser constructs it when the page is parsed. Then loading scripts and various JS events have a chance to modify it.
If you suddenly had multiple JS threads with access to the DOM running concurrently, you'd have a really complicated problem. How would you synchronize access? You couldn't even write the most basic DOM operation, one that finds a DOM object in the page and then modifies it, because that wouldn't be an atomic operation. The DOM could get changed between the time you found the DOM object and when you made your modification.
Instead, you'd probably have to acquire a lock on at least a sub-tree of the DOM, preventing it from being changed by some other thread while you were manipulating or searching it. Then, after making the modifications, you'd have to release the lock and discard any knowledge of the state of the DOM from your code (because as soon as you release the lock, some other thread could be changing it). And if you didn't do things correctly, you could end up with deadlocks or all sorts of nasty bugs.
In reality, you'd have to treat the DOM like a concurrent, multi-user datastore. This would be a significantly more complex programming model.
Avoid Complexity
There is one unifying theme behind the "single-threaded JS" design decision: keep things simple. Don't require an understanding of a multithreaded environment, thread-synchronization tools, and multithreaded debugging in order to write solid, reliable browser JavaScript.
One reason browser JavaScript is a successful platform is that it is very accessible to developers of all levels and relatively easy to learn and write solid code in. While browser JS may get more advanced features over time (as we got with WebWorkers), you can be absolutely sure these will be added in a way that keeps simple things simple, while more advanced things can be done by more advanced developers without breaking the things that keep it simple now.
FYI, I've written a multi-user web server application in node.js and I am constantly amazed at how much less complicated much of the server design is because of the single-threaded nature of node.js JavaScript. Yes, there are a few things that are more of a pain to write (learn promises for writing lots of async code), but the simplifying assumption that your JS code is never interrupted by another request drastically simplifies design and testing, and reduces the hard-to-find-and-fix bugs that concurrent designs and code are always fraught with.
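A minimal sketch of what that buys you (a hypothetical hit counter): this read-modify-write needs no lock, because no other request's handler can run between the two statements.

    const http = require('http');
    let hits = 0;

    http.createServer((req, res) => {
      hits += 1;                       // safe: handlers run one at a time, to completion
      res.end('hit #' + hits + '\n');
    }).listen(8080);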
Discussion
Certainly the first issue could be solved by allowing user action event handlers to run in their own thread so they could occur any time. But, then you immediately have multi-threaded Javascript and now require a whole new JS infrastructure for thread synchronization and whole new classes of bugs. The designers of browser Javascript have consistently decided not to open that box.
The rendering issue could be improved if desired, but at the cost of significant complication in the browser code. You'd have to invent some way to guess when the running JS code seems to be done changing the DOM (perhaps some number of ms passing with no more changes), because you have to avoid doing a relayout and screen paint immediately on every DOM change. If the browser did that, some JS operations would become 100x slower than they are today (the 100x is a wild guess, but the point is they'd be a lot slower). And you'd have to implement thread synchronization between layout, painting, and JS DOM modifications, which is doable but complicated, a lot of work, and fertile ground for browser implementation bugs. And you'd have to decide what to do when you're part-way through a relayout or repaint and the JS thread makes a DOM modification (none of the answers are great).
Disclaimer: my knowledge of node.js comes from a few articles, mostly summarized by this one: http://en.wikipedia.org/wiki/Node.js
That said, my understanding is that it's supposed to be very fast because it avoids the overhead of threading: it puts everything into a single loop instead of paying the cost of switching between threads.
I assume there is a reason why such a sophisticated mechanism for switching contexts between threads exists. My question is: what is the benefit of having threads over the node.js approach?
Node.js is extremely fast with IO-intensive tasks, since its event model supports IO delays perfectly. On the other hand, it is completely incapable of doing CPU-intensive tasks without stopping everything. Thus, if you need some heavy calculation, you will want to fork off a worker to do it for you.
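A minimal sketch using Node's worker_threads module (child_process.fork works similarly in older versions); fib-worker.js is a hypothetical file name:

    // main.js
    const { Worker } = require('worker_threads');

    const worker = new Worker('./fib-worker.js');
    worker.on('message', (result) => console.log('result:', result));
    worker.postMessage(1e9);          // heavy input; the event loop stays responsive

    // fib-worker.js
    const { parentPort } = require('worker_threads');
    parentPort.on('message', (n) => {
      let sum = 0;
      for (let i = 0; i < n; i++) sum += i;  // CPU-bound loop, harmless off-thread
      parentPort.postMessage(sum);
    });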
The threaded model switches contexts automatically, whatever a thread is doing, and can thus handle CPU-intensive jobs without impacting other threads too negatively. (Or rather, they will still work, only slower once CPU capacity is reached.)
The new JavaScript engine takes advantage of multiple CPU cores through Windows to interpret, compile, and run code in parallel. - http://technet.microsoft.com/en-us/library/gg699435.aspx
and
The Chakra engine interprets, compiles, and executes code in parallel and takes advantage of multiple CPU cores, when available. - http://msdn.microsoft.com/en-us/ie/ff468705.aspx
Wait, what?!? Does this mean we've got multi-threaded parallel JavaScript code execution (outside of web-workers) in IE9?
I'm thinking this is just a bad marketing gimmick but would like to see some more info on this. Maybe they mean different browser windows/tabs/processes can utilize multiple CPUs?
Conclusions, based largely on the comments and hence provided as a community wiki answer so that this question ends up with an actual answer:
It's likely that Microsoft means that the separate tasks of (i) interpreting and/or running code and (ii) compiling it occur in parallel. It's probable that they've applied technology like Sun's old HotSpot JVM, so that the JavaScript virtual machine interprets code in the first instance (because it can start doing that instantly) and JIT-compiles any code that appears to be used frequently enough for compilation to be a benefit. It may even have different levels of compiler optimisation that it slowly dials up. In that case it may be using multiple cores to interpret or run one fragment of code while also compiling arbitrarily many others, or even while recompiling and better optimising the same piece of code that is being run.
However, it's also possible on a technical level that you could perform static analysis to determine when callbacks are mutually independent in terms of state, and allow those callbacks to execute in parallel if the triggering events prompted them to do so. In that way a Javascript virtual machine could actually interpret/run code in parallel without affecting the semantically serial nature of the language. Such a system would be logically similar to the operation of superscalar CPUs, albeit at a much greater remove and with significantly greater complexity.
A Node.js server works on an event-based model with callback functions. But I am not able to understand how it is better than a traditional thread-based server where threads wait for system IO. In the thread-based model, when a thread needs to wait for IO it gets preempted, so it doesn't consume CPU cycles and hence doesn't contribute to wait time.
How Node.js improves wait time?
when a thread needs to wait for IO, it gets preempted
Actually, it's not preempted. Preemption is something completely different. What happens is that the thread is blocked.
For an event-based model something similar happens. Event-based interpreters are basically state machines. Only, the state machine is abstracted away and is not visible to the user. When something is waiting for an event, it passes control back to the interpreter. When the interpreter has nothing else to process, it blocks itself waiting for I/O. Only, unlike traditional threaded code, the interpreter waits on multiple I/O operations at once.
What's happening at the C level is that the interpreter is using something like select(), poll(), epoll() and friends (depends on the OS and library installed) to do the blocking and waiting for I/O.
Now, why does a select()/poll() based mechanism generally perform better? Actually, 'generally' here depends on what you mean. A select() based server executes all code in a single process/thread. The biggest performance gain from this is that it avoids context switching - every time the OS transfers control over from one thread to another it has to save all the relevant registers, memory map, stack pointers, FPU context etc. so that the other thread can resume execution where it left off. The overhead of doing this can be quite significant.
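In Node terms, that looks like the sketch below: one thread, many concurrent connections, no per-connection context switches (under the hood, Node's libuv layer uses epoll/kqueue/etc. for the waiting):

    const net = require('net');

    net.createServer((socket) => {
      socket.on('data', (chunk) => socket.write(chunk));  // echo; never blocks
    }).listen(8080);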
In fact, there is a historical example of how extreme the overhead can be. Back in the early 2000s someone started benchmarking web servers. To everyone's surprise, tclhttpd outperformed Apache for serving static files. Now, tcl is not only an interpreted language, but back in 2000 it was a very slow interpreted language because it didn't have a separate compilation phase (it sort of does now). Tcl scripts are interpreted directly in string form, making tcl around 400x slower than C. Apache is obviously written in C, so what was making tclhttpd faster?
It turned out that tclhttpd is event based running only on a single thread while Apache was multithreaded. The overhead of constant thread switching turned out to give tclhttpd enough advantage to perform better than Apache.
Of course, there is always a compromise. A single-threaded server like tclhttpd or node.js cannot take advantage of multiple CPUs. Back in the early 2000s multiple CPUs were uncommon; these days they are almost the default. Not to mention that most CPUs are also hyperthreaded (hyperthreading adds hardware to the CPU to make context switching cheap).
The best servers these days have learned from history and combine both approaches. Apache 2 and Nginx use thread pools: they are multithreaded, but each thread serves more than a single connection. This is a hybrid of the two approaches, but it is more complex to manage.
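Node can approximate that hybrid with its cluster module, running one event-loop process per core (isPrimary is the name in recent Node versions; older ones call it isMaster):

    const cluster = require('cluster');
    const http = require('http');
    const os = require('os');

    if (cluster.isPrimary) {
      for (let i = 0; i < os.cpus().length; i++) cluster.fork();
    } else {
      http.createServer((req, res) => res.end('ok\n'))
          .listen(8080);   // the primary process distributes incoming connections
    }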
Read the following article for a more in-depth discussion on this topic: The C10K problem
Threads are relatively heavy-weight objects that have a resource footprint extending all the way into the kernel. When you park a thread in a blocking syscall or on a mutex or condition variable, you are tying up all those resources but doing nothing. Now the OS has to find more resources so your program can create another thread... Then you idle them too. It doesn't take long before the OS is struggling to scavenge more resources for your program to waste.
CPU time is just one small part of the bigger picture. :-)
Simply put:
In a threaded server, no matter how many threads you have, you can always end up with all of them waiting for IO.
In node, no matter how many IO operations are pending, you always have your event loop ready to do the next thing.
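For example, all ten reads below are in flight at once while the single thread stays free (the file names are hypothetical):

    const fs = require('fs');

    for (let i = 0; i < 10; i++) {
      fs.readFile('file' + i + '.txt', (err, data) => {
        if (!err) console.log('file' + i + ': ' + data.length + ' bytes');
      });
    }
    console.log('all reads started; the event loop is not blocked');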
With a lot of threads you are going to have a lot of context switching, which is expensive. You won't have this overhead when using node.js's event loop.
Context Switch
A context switch is the computing process of storing and restoring the state (context) of a CPU so that execution can be resumed from the same point at a later time.
Event loop
In computer science, the event loop, message dispatcher, message loop or message pump is a programming construct that waits for and dispatches events or messages in a program.
I think you've absorbed some myths regarding threads and the cost of context switching. Discover the truth for yourself.