Is WebAssembly slow with big functions?

Emscripten has a flag, OUTLINING_LIMIT, to break code into smaller functions that can be optimized by the browser's JIT, unlike huge functions, which get interpreted. Does this also apply to WASM, in the sense that WASM with big functions will be slower?

The documentation you quote is inaccurate for current implementations of WebAssembly. Only Chakra has an interpreter right now, and any "hot" function gets JIT-compiled regardless of size. The JavaScriptCore implementation of WebAssembly only JIT-compiles, and "hot" functions get re-compiled at a higher optimization level.
That being said, outlining has a few advantages:
The .wasm binary can get smaller. That means it downloads faster.
In theory, engines could re-inline small outlined functions if we start seeing them a lot on the Web, so you wouldn't get a performance loss from outlining.
Big functions sometimes take longer to JIT-compile, since compilation time is often non-linear in function size (though again, engines change over time and could handle large functions better if that becomes a widespread problem).
Engines often compile in parallel at per-function granularity, so many small functions parallelize better and keep the compilation pipeline full (especially towards the end of compilation: if you had only a few large functions left to compile, your cores wouldn't be fully utilized). This is a pretty minor point; I wouldn't worry about it much.
All this is in flux, though: engine implementors react to what we see on the Web and tune their engines to better handle real-world code. It's often good to do what's right, and to file bugs on each engine if you see pathologies. Here that might mean reducing download size by using outlining, and expecting good re-inlining to occur.
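If you want to see how this plays out in the engines you target, you can time compilation yourself. A minimal sketch, assuming a module at the hypothetical path "./module.wasm" and an engine that supports WebAssembly.instantiateStreaming; build your module with and without outlining and compare:

    // Time how long the engine takes to compile and instantiate a module.
    // "./module.wasm" is a placeholder; point it at your own build.
    async function timeWasmCompile(url) {
      const start = performance.now();
      const { instance } = await WebAssembly.instantiateStreaming(
        fetch(url),
        {} // import object; empty here, your module may need imports
      );
      console.log(url + " compiled in " + (performance.now() - start).toFixed(1) + " ms");
      return instance;
    }

    timeWasmCompile("./module.wasm");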

Related

Impact on performance with require node.js module in individual functions versus globally

I have a node.js file with several functions. Suppose each of these functions requires the underscore module. What is the impact on performance of adding var underS = require("underscore"); in each and every function, versus declaring a single var underS = require("underscore"); globally at the top? Which is better for performance?
I just want to add a generalized answer on top of a precise one.
Never let performance impact the broad organization of your codebase, unless you seriously have a good (measured) reason not to.
"Measured" in this context doesn't mean some micro-benchmark. It means profiled in the context of your actual project.
Ignoring this rule early on in my career bit me hard. It didn't help me write efficient code; it simply diminished my productivity to a minimum by making my codebase difficult to maintain, and difficult to stay in love with (a motivating human quality I think is too often neglected).
So on top of the precise answer ("no, you shouldn't worry about the performance of importing modules"), I would suggest even more strongly: no, you shouldn't worry about it even if it did impact performance.
If you want to design for performance, performance-critical loopy code tends to make up a very small fraction of a codebase. It's tempting to think of it as a bunch of teeny inefficiencies accumulating to produce a bottleneck, but getting in the habit of profiling will quickly eliminate that superstition. In that little teeny section of your codebase, you might get very elaborate, maybe even implementing some of it in native code. And there, for this little teeny section of your codebase, you might study computer architecture, data-oriented design for cache locality, SIMD, etc. You can get extremely elaborate optimizing that little section of code which actually constitutes a profiling hotspot.
But the key to computational speed is first your own developer speed at producing and maintaining code efficiently by zooming out and focusing on more important priorities. That's what buys you time to zoom in on those little parts that actually do constitute measured hotspots, and affords you the time to optimize them as much as you want. First and foremost is productivity, and this is coming from one working in a field where competitive performance is often one of the sought-out qualities (mesh processing, raytracing, image processing, etc).
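If you do want a quick, in-context measurement in Node before reorganizing anything, the built-in perf_hooks module is enough. A minimal sketch, where doWork is a hypothetical stand-in for a suspected hotspot in your own project:

    // In-context timing with Node's built-in perf_hooks module.
    const { performance } = require("perf_hooks");

    function doWork() { // stand-in for a suspected hotspot
      let sum = 0;
      for (let i = 0; i < 1e7; i++) sum += i;
      return sum;
    }

    const start = performance.now();
    doWork();
    console.log("doWork took " + (performance.now() - start).toFixed(2) + " ms");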
Modules are cached after the first time they are loaded. This means (among other things) that every call to require('foo') will get exactly the same object returned, if it would resolve to the same file.
So basically, no, it will not impact performance.
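You can see the cache in action directly. A minimal sketch, assuming underscore is installed locally:

    // require() caches by resolved filename, so both calls
    // return exactly the same object.
    const a = require("underscore");
    const b = require("underscore");
    console.log(a === b); // true

    // The cache itself is inspectable; its keys are resolved paths.
    const cached = Object.keys(require.cache);
    console.log(cached.some(p => p.includes("underscore"))); // true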

Shouldn't asm.js be slow if it has to manage the heap itself?

So, I'm not sure if I'm understanding this correctly.
According to the asm.js spec, an asm.js module is given a single large typed array that acts as the heap. Why is it set up this way? Doesn't that slow things down, because the asm.js module itself has to keep track of the heap?
I'm asking because I had concerns about this when I first saw the spec, but wasn't sure if I was missing something. Then I found that my emscripten-compiled code, which does heavy heap allocation/deallocation all the time, was 10-20 times slower than native, rather than the 1.5-2x slower that seems to be seen in all the benchmarks (with asm.js code being slightly slower than normal JS).
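For reference, this is the setup the question describes: the caller hands the module one big ArrayBuffer, and the module views it through typed arrays. A minimal sketch of the module shape (the store/load pair is hypothetical; emscripten emits far more machinery), which runs as ordinary JavaScript even in engines that don't specially validate asm.js:

    function MyModule(stdlib, foreign, heap) {
      "use asm";
      var HEAP32 = new stdlib.Int32Array(heap);

      // ptr is a byte offset into the heap, as emscripten-style code uses.
      function store(ptr, value) {
        ptr = ptr | 0;
        value = value | 0;
        HEAP32[ptr >> 2] = value;
      }

      function load(ptr) {
        ptr = ptr | 0;
        return HEAP32[ptr >> 2] | 0;
      }

      return { store: store, load: load };
    }

    var heap = new ArrayBuffer(0x10000); // 64 KiB, a valid power-of-two heap size
    var mod = MyModule(globalThis, null, heap);
    mod.store(8, 42);
    console.log(mod.load(8)); // 42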

V8 javascript for 16mb ram ARM device

I'm part of a team developing embedded applications for ARM9 devices with 16 MB RAM and their own OS. We're currently developing in C, but are all geared towards switching to another language.
Currently C++ and Haskell are good candidates, but I'm also thinking of CoffeeScript. The question is whether Chrome's V8 engine would use too much RAM for this to be a viable alternative. If so, is there any other engine that might fit the bill?
Forgot to mention: I need easy interop with the C libraries installed on the system. Since most of the code we have today is C and there will be a long rewriting period, calling C functions should not be a hassle (no elaborate bindings to create, etc.).
Unfortunately, we are also bound to an old compiler (GCC 3.4.3).
Any language with automatic memory management will always have memory overhead and any dynamically typed language will always add some more overhead. So if you are limited to 16 MiB and want to squeeze out a lot of it, go with something with static typing and explicit memory management, which means C++.
Modern C++ (OK, no C++11 features in GCC 3.4.3, but the standard library was already there and boost should compile) will still do most memory management for you while keeping the overhead low. And being almost backward compatible with C makes interoperating with existing libraries trivial.
If you don't need to squeeze out that much, many languages will do. Mono seems quite promising, as it's one of the smallest managed runtimes, decently fast, portable, and has multiple languages targeting it (C#, F#, boo, etc.). But I suppose even JavaScript should do; its interpreter is very small, and if you don't need that many objects in memory, they will fit even with all the overhead of allocating everything separately.

Do comments affect performance?

Am I correct to say that JavaScript code isn't compiled, not even JIT? If so, does that mean that comments have an effect on performance, and I should be very careful where I place my comments? Such as placing function comments above and outside the function definition when possible, and definitely avoiding comments inside loops, if I wanted to maximize performance? I know that in most cases (at least in non-loop cases), the change in performance would be negligible, but I think this would be something good to know and be aware of, especially for front-end/JS developers. Also, a relevant question was asked on a JS assessment I recently took.
Am I correct to say that JavaScript code isn't compiled, not even JIT?
No. Although JavaScript is traditionally an "interpreted" language (although it needn't necessarily be), most JavaScript engines compile it on-the-fly whenever necessary. V8 (the engine in Chrome and Node.js) used to compile immediately and quickly, then go back and aggressively optimize any code that was used a lot (the old FullCodegen+Crankshaft stack); a while back, having done lots of real-world measurement, they switched to initially parsing to bytecode and interpreting, and then compiling if code is reused much at all (the new Ignition+TurboFan stack), gaining a significant memory savings by not compiling run-once setup code. Even engines that are less aggressive almost certainly at least parse the text into some form of bytecode, discarding comments early.
Remember that "interpreted" vs. "compiled" is usually more of an environmental thing than a language thing; there are C interpreters, and there are JavaScript compilers. Languages tend to be closely associated with environments (like how JavaScript tends to be associated with the web browser environment, even though it's always been used more widely than that, even back in 1995), but even then (as we've seen), there can be variation.
If so, does that mean that comments have an effect on performance...
A very, very, very minimal one, on the initial parsing stage. But comments are very easy to scan past; nothing to worry about.
If you're really worried about it, though, you can minify your script with tools like jsmin or the Closure Compiler (even with just simple optimizations). The former will just strip comments and unnecessary whitespace, stuff like that (still pretty effective); the latter does that and actually understands the code and does some inlining and such. So you can comment freely, and then use those tools to ensure that whatever minuscule impact those comments may have when the script is first loaded is bypassed by using minifying tools.
Of course, the thing about JavaScript performance is that it's hard to predict reliably cross-engine, because the engines vary so much. So experiments can be fun:
Here's an experiment which (in theory) reparses/recreates the function every time
Here's one that just parses/creates the function once and reuses it
Result? My take is that there's no discernible difference within the measurement error of the test.
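The original experiments are gone, but they're easy to reproduce. A minimal sketch, using new Function so the source (comments included) is re-parsed on every iteration; the names and sizes here are arbitrary, and engines may cache identical sources, hence "in theory":

    // Re-create (and therefore, in theory, re-parse) a function on every
    // iteration, once with a large comment embedded and once without.
    const comment = "/* " + "x".repeat(10000) + " */\n";
    const body = "return a + b;";

    function bench(label, source) {
      const start = performance.now();
      for (let i = 0; i < 10000; i++) {
        const fn = new Function("a", "b", source);
        fn(i, i);
      }
      console.log(label + ": " + (performance.now() - start).toFixed(1) + " ms");
    }

    bench("with comment", comment + body);
    bench("without comment", body);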
The biggest effect comments have is to bloat the file size and thereby slow down the download of the script. That's why professional sites run a minifier over the production version to cut the JS down to as small as it gets.
It may have some effect, but a very minimal one (even IE6 handles comments correctly! ...to be confirmed).
However, most people use a minifier that strips off comments. So it's okay.
Also:
V8 increases performance by compiling JavaScript to native machine code before executing it.
Source
Comments can push a function's source length past an engine's inlining threshold, which prevents it from being inlined and affects performance, though this shouldn't really happen in practice.
In some perhaps isolated circumstances, comments definitely somehow bog down code execution. I am writing a lengthy userscript, running in the latest Firefox on Mac via TamperMonkey, and several days of increasingly frustrated troubleshooting came to an end when I stripped the lengthy comments from the script: suddenly the script execution stopped hanging completely on Facebook. Multiple back-and-forth comparisons running the same exact script on a fresh user account, with the only difference being the comments, proved this to be the case.

What optimizations do modern JavaScript engines perform?

By now, most mainstream browsers have integrated optimizing JIT compilers into their JavaScript interpreters/virtual machines. That's good for everyone. Now, I'd be hard-pressed to know exactly which optimizations they perform and how best to take advantage of them. What are good references on the optimizations in each of the major JavaScript engines?
Background:
I'm working on a compiler that generates JavaScript from a higher-level & safer language (shameless plug: it's called OPA and it's very cool) and, given the size of applications I'm generating, I'd like my JavaScript code to be as fast and as memory-efficient as possible. I can handle high-level optimizations, but I need to know more about which runtime transformations are performed, so as to know which low-level code will produce best results.
One example, from the top of my mind: the language I'm compiling will soon integrate support for laziness. Do JIT engines behave well with lazy function definitions?
This article series discusses the optimisations of V8. In summary:
It generates native machine code - not bytecode (V8 Design Elements)
Precise garbage collection (Wikipedia)
Inline caching of called methods (Wikipedia)
Storing class transition information so that objects with the same properties are grouped together (V8 Design Elements)
The first two points might not help you very much in this situation. The third might give insight into getting things cached together. The last might help you create objects with the same properties so they use the same hidden classes.
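A small sketch of what that last point means in practice: construct objects with the same properties in the same order, so property-access sites stay monomorphic and inline caches can do their job (the names here are arbitrary, and actual speedups vary by engine):

    // Both objects get the same hidden class: same properties, same order.
    function makePoint(x, y) {
      return { x: x, y: y };
    }

    // An access site that only ever sees one object shape can be served
    // by a monomorphic inline cache.
    function norm2(p) {
      return p.x * p.x + p.y * p.y;
    }

    const a = makePoint(1, 2);
    const b = makePoint(3, 4);
    console.log(norm2(a) + norm2(b));

    // By contrast, building the same logical object two different ways
    // produces two hidden classes and a polymorphic access site:
    const c = { x: 1, y: 2 };
    const d = { y: 4, x: 3 }; // different insertion order, different shape
    console.log(norm2(c) + norm2(d)); // still correct, just less cacheable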
This blog post discusses some of the optimisations of SquirrelFish Extreme:
Bytecode optimizations
Polymorphic inline cache (like V8)
Context threaded JIT (introduction of native machine code generation, like V8)
Regular expression JIT
TraceMonkey is optimised via tracing. I don't know much about it, but it looks like it detects the type of a variable in "hot code" (code run often in loops) and creates optimised code based on the type of that variable. If the type of the variable changes, it must recompile the code. Based on this, I'd say you should stay away from changing the type of a variable within a loop (see the sketch below).
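A tiny illustration of that advice; engine behavior varies and the loops here are arbitrary, but this is the pattern being warned against:

    // Type-stable: `acc` is always a number, so a tracing or optimizing
    // JIT can specialize the loop for numeric addition.
    let acc = 0;
    for (let i = 0; i < 1e6; i++) {
      acc += i;
    }

    // Type-unstable: `v` flips between number and string inside the loop,
    // forcing the engine to bail out of (or never produce) specialized code.
    let v = 0;
    for (let i = 0; i < 1e6; i++) {
      v = (i % 2 === 0) ? i : String(i);
    }
    console.log(acc, v);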
I found an additional resource:
What does V8 do with that loop?
