Why can't JavaScript shellcode exploits be fixed via "data execution prevention"?

The "heap spraying" wikipedia article suggests that many javascript exploits involve positioning a shellcode somewhere in the script's executable code or data space memory and then having interpreter jump there and execute it. What I don't understand is, why can't the interpreter's entire heap be marked as "data" so that interpreter would be prevented from executing the shellcode by DEP? Meanwhile the execution of javascript derived bytecode would be done by virtual machine that would not allow it to modify memory belonging to the interpreter (this wouldn't work on V8 that seems to execute machine code, but probably would work on Firefox that uses some kind of bytecode).
I guess the above sounds trivial, and probably something a lot like it is in fact being done. So I am trying to understand where the flaw is in the reasoning, or the flaw in existing interpreter implementations. For example, does the interpreter rely on the system's memory allocation instead of implementing its own internal allocator when JavaScript asks for memory, making it unduly hard to separate memory belonging to the interpreter from memory belonging to JavaScript? Or why is it that DEP-based methods cannot completely eliminate shellcode?

To answer your question we first need to define Data Execution Prevention, Just-In-Time compilation and JIT spraying.
Data Execution Prevention (DEP) is a security feature that prohibits the execution of code from a non-executable memory area. DEP can be implemented by hardware mechanisms such as the NX bit and/or by software mechanisms that add runtime checks.
Just-In-Time (JIT) compilers are dynamic compilers that translate bytecode to machine code at run time. The goal is to combine the flexibility of interpreted code with the speed of compiled code. A method should be compiled only if the extra time spent in compilation can be amortized by the performance gain expected from the compiled code. [1]
JIT spraying is the process of coercing the JIT engine to write many executable pages with embedded shellcode.
[....]
For example, a JavaScript statement such as “var x = 0x41414141 + 0x42424242;” might be compiled with the two 4-byte constants embedded in the executable image (for example, “mov eax, 0x41414141; mov ecx, 0x42424242; add eax, ecx”). By starting execution in the middle of these constants, a completely different instruction stream is revealed.
[....]
The key insight is that the JIT is predictable and must copy some constants to the executable page. Given a uniform statement (such as a long sum or any repeating pattern), those constants can encode small instructions and then control flow to the next constant's location. [2]
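For illustration only, the kind of JavaScript an attacker would feed the engine looks roughly like the sketch below. The immediate value is a placeholder, not working shellcode, and whether a given engine actually emits the constants this predictably (or constant-folds the expression instead) is engine- and version-specific.

// Illustrative sketch of the repeated-constant pattern described in [2].
// 0x3c909090 is a placeholder immediate, not real shellcode.
var spray = [];
for (var i = 0; i < 1000; i++) {
    spray[i] = 0x3c909090 ^ 0x3c909090 ^ 0x3c909090 ^ 0x3c909090;
}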
Advanced techniques, beyond the scope of this answer, must then be used to find the address of the JIT sprayed block and trigger the exploit.
It should now be clear that
If the attacker’s code is generated by the JIT engine, it will also reside in an executable area. In other words, DEP is not involved in the protection of code emitted by the JIT compiler. [3]
References
[1] A Dynamic Optimization Framework for a Java Just-in-Time Compiler
[2] Interpreter Exploitation: Pointer Inference and JIT Spraying
[3] JIT spraying and mitigations

Related

User space available to an interpreted language

Can a program written in an interpreted language (e.g. JavaScript) have less usable memory than a program written in a compiled language?
My point is that with a compiled program, the OS brings it into main memory and gives it a virtual address space of, say, 4 GB; in contrast, a program written in an interpreted language runs on top of a process that already has those 4 GB allocated for it.
To be clear, take two identical programs, one written in C++ and the other in JavaScript, Python, whatever. The C++ program can allocate data up to, say, 3 GB (the virtual memory left after removing the space reserved for the OS), while the JS program can allocate, say, 2 GB (what is left after removing the OS and the interpreter's overhead).
Is my point right, or am I missing something?
Is the memory available for allocation to a compiled program the same as for an interpreted one?
How much memory a program can allocate is generally independent from which language it was written in, and also independent from whether that language is typically compiled or interpreted. (Note that the latter distinction is not always clear, as some "interpreters" perform just-in-time compilation under the hood; and some languages are first compiled to a bytecode which is then interpreted.)
Some runtime environments, for example V8, impose fixed limits on the amount of memory a program can allocate. In case of V8, you can control the heap size with command-line flags. As a counter-example, as far as I know Python does not impose a fixed limit on programs (it takes system memory into account).
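For example (a rough sketch, assuming Node.js as the V8 host and a hypothetical file name heap-info.js), the ceiling can be raised with a flag such as --max-old-space-size and inspected from the script itself:

// Run with e.g.:  node --max-old-space-size=4096 heap-info.js
// (raises V8's old-space ceiling to roughly 4 GB; exact behavior varies by version)
const v8 = require('v8');

const limitMB = v8.getHeapStatistics().heap_size_limit / 1024 / 1024;
const usedMB = process.memoryUsage().heapUsed / 1024 / 1024;
console.log('heap size limit (MB):', limitMB.toFixed(0));
console.log('heap used       (MB):', usedMB.toFixed(0));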
It's fair to assume that "interpreters" (i.e. runtime environments for "interpreted" languages) consume a bit of memory themselves, which can mean that the executed program itself has a little less memory to play with than the entire process does. However, nothing prevents "compiled" languages from performing certain allocations upon process startup (for libraries, or "system-provided" things, or whatever), so it really depends on what environments exactly you are comparing.
Also, in practice, the data structures used can vary wildly in their implementation, which can have a big impact on effective memory limits. For example, consider a "hash map": different implementations of the C++ standard library might make different tradeoffs affecting memory consumption for a given number of map entries; different JavaScript engines might have different implementations that might consume more or less memory for the same number of entries than a C++ version would.

Javascript - Compiled language?

I am new to Web development, and I am studying JavaScript.
From a course at Stanford:
JavaScript is an interpreted language, not a compiled language. A program such as C++ or Java needs to be compiled before it is run. The source code is passed through a program called a compiler, which translates it into bytecode that the machine understands and can execute. In contrast, JavaScript has no compilation step. Instead, an interpreter in the browser reads over the JavaScript code, interprets each line, and runs it. More modern browsers use a technology known as Just-In-Time (JIT) compilation, which compiles JavaScript to executable bytecode just as it is about to run.
And from You Don't Know JS: Scope & Closures by Kyle Simpson:
... but despite the fact that JavaScript falls under the general category of “dynamic” or “interpreted” languages, it is in fact a compiled language.
Let’s just say, for simplicity sake, that any snippet of JavaScript has to be compiled before (usually right before!) it’s executed. So, the JS compiler will take the program var a = 2; and compile it first, and then be ready to execute it, usually right away.
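That claim is easy to observe with the same var a = 2; example: the declaration is handled before any statement runs (a minimal sketch):

// var declarations are registered during the compile phase, so this logs
// "undefined" rather than throwing a ReferenceError; the assignment itself
// only happens when execution reaches that line.
console.log(a);   // undefined
var a = 2;
console.log(a);   // 2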
And from some questions at Stack Overflow, there are some ideas like: it depends on the actual implementation of the language.
Do you have any ideas?
The Chrome browser uses the V8 engine for compiling JavaScript, just as other browsers may use Rhino or SpiderMonkey.
V8 is a JavaScript engine built by Google written in C++. It is used for compiling JS, in both client-side (Google Chrome) and server-side (node.js) applications. In order to obtain speed, V8 translates JavaScript code into more efficient machine code instead of using an interpreter.
V8 compiles JavaScript code into machine code at script execution by implementing a JIT (Just-In-Time) compiler like a lot of modern JavaScript engines such as SpiderMonkey or Rhino (Mozilla) are doing. The main difference with V8 is that it doesn’t produce bytecode or any intermediate code. It just compiles JavaScript on the fly.
Hope this helps!
Well, you can probably get into semantics and terminology differences, but two important points:
Javascript (in a web page) is distributed in its source code form (or at least in minimized text form) and not as a binary compiled ahead-of-time
Javascript is not compiled into executable machine code even by the browser (although some parts of it may be these days as a performance optimization), but executed via a virtual machine
The term "compiled" used in this context refers to a language that is normally assumed to be translated directly into the language or format native to the machine it is going to be run on. Typical cases include both C and C++. But, understand that both of these languages can also be interpreted. An example of a C interpreter is Pico C, which handles a subset of C.
So, the real question is about environments. A language may be run in a compiled or interpreted environment. A clear distinction between the two cases can be made by the following test:
Does the language possess a "command level" mode
in which forward references are inherently impossible?
Think about this for a moment. A language that is interpreted is reading its specification in real time. A forward reference is to something that does not exist at the time the specification is made. Since machines have not (yet) been endowed with the facility of precognition or time travel (i.e. "time loop logic"), then such references are inherently unresolvable.
If such a level is defined as a mandatory part of the language, then the language may be said to be interpreted; otherwise, it may be said to be compiled. BASIC is interpreted, as some of its commands make direct reference to this layer (e.g. the "list" command). Similarly, the high-level AI language, Prolog, is - by this criterion - an interpreted language, since it also possesses commands that make direct reference to this layer. The "?-" command, itself, is an actual prompt, for instance; but its database commands also refer to and maintain the current state of the command-level layer.
However, this does not preclude parts of an interpreted language from being subject to compilation or to the methods used by compilers, or a compiled language from being run at a command mode level. In effect, that's what a debugger for a language like C or C++ already is, just to give an example.
Most languages that are defined to have a command level layer, normally have to compile to something. In particular, if the language satisfies the following condition, then it is almost mandatory that at least parts of it compile into something:
Does the language possess a facility for user-defined codelets,
for instance: subroutines, functions, lambdas, etc.?
The reason is simple: where are you going to put that code after it's defined but before it's used, and in what format? It is extremely inefficient to save and run it verbatim, so normally it will be translated into another form that is either: (a) a language-internal normal form (in which case the rest of the language may be considered "syntactic sugar" for the reduced subset language that the normal forms reside in), (b) a language-external normal form (i.e. "byte code"), or (c) a combination of both - it may do language-internal normalization first, before translating it into byte code.
So, most "interpreted" languages are compiled - into something. The only real question is: (1) what they are compiled into, and (2) when/how does the code that it is compiled into run - which is connected to the issue of the above-mentioned "command level" mode.
If the codelets are being compiled into a target-independent form - which is normally what is referred to when speaking of "byte code" - then it is not "compiled" in the sense the term is normally taken to mean. The term compiled normally refers to translation into the language that is native to the machine the language is being run on. In that case, there will be as many translators as there are types of machines that the language may run on - the translator is inherently machine-dependent.
The two cases are not mutually exclusive. So, a byte-code translator may appear as a stage for native-code compilation, whereby the codelets of an interpreted language are translated and stored directly in the native language of the machine that the language is being run on. That's called "Just In Time" compilation (or JIT).
The distinction is a bit blurry. Even compiled languages, like C or C++, may run on systems that have codelets that are either compiled or even pre-compiled that are loaded while the program is running.
I don't know enough about JS (yet) to say anything definitive about it - other than what can be inferred from observation.
First, since JS code is stored as codelets and is normally run in web clients on a need-to-use basis, it is likely that an implementation of JS will compile (or pre-compile) the codelets into an intermediate byte-code form.
Second, for reasons of security, it is unlikely that it will compile directly into the native code of the machine it is running on, since this may compromise the security of the machine by providing channels through which malicious code could be sneaked onto it. That's the "sandbox" feature that browsers are supposed to adhere to.
Third, it is not normally used directly by a person on the other end as a language like Basic or even Prolog is used. However, in many (or even most) implementations it does have a "debug" mode. The browser, for instance, may allow even an ordinary user to both view and edit/debug JS code. Notwithstanding that, there really isn't a command-layer, per se, other than what appears in a web browser itself. Unresolved here is the question of whether the browser allows forward references in JS code. If it does, then it's not really a command level environment. But it may be browser-dependent. It might, for instance, load in an entire web page before ever starting up any JS code, rather than trying to run the JS in real time while a page is loading, in which case forward references would be possible.
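For what it's worth, within a single script JavaScript does allow forward references to function declarations, because declarations are processed before the statements run; a minimal sketch (forward references across separate <script> tags are a different matter, since a later script has not been seen when an earlier one runs):

// The call textually precedes the declaration, yet it works, because function
// declarations are hoisted when the script is compiled.
greet('world');

function greet(name) {
    console.log('hello, ' + name);
}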
Fourth, if the language wants to be efficient in terms of its execution speed, it will have some form of JIT - but this would require stringent validation of the JS compiler itself to ensure that nothing can slip out of the "sandbox" through the JIT into forbidden code on the host machine.
I'm pretty sure there are JS editors/interpreters out there, simply to have a way to develop JS. But I don't know if any references to a command-layer are a mandatory part of the specification for JS. If such specifications exist, then we can call it a bona fide interpreted language. Otherwise, it straddles the border line between the two language types as a language meant to be run in real time like an interpreted language, but which permits compilation directly to the native code of the machine it is running on.
The issue came to a head, for me, recently when I tried to directly translate an old stand-by text-based game (lunar lander) directly from the (interpreted) language FOCAL into C-BC (a C-like extension to POSIX BC whose source is located on GitHub here https://github.com/RockBrentwood/CBC). C-BC, like POSIX BC, is interpreted but allows user-defined codelets, so that implementations of BC normally define a "byte code" language to go with it (historically: this was "dc").
The FOCAL language has a run-time language - which theoretically could be compiled, but also a command-layer subset (e.g. the "library" or "erase" commands) which does not permit forward references that haven't yet been defined, though the run-time language permits forward references.
Unlike GNU-BC, C-BC has goto statements and labels, so it is possible to directly translate the game. However, at the command level (which in a BC file is the top level in the file's scope), this is not possible, since the top level of a file's code is - as far as a BC interpreter is concerned - making reference to things that might not yet exist, since the program could just as well have been entered by a user in real time. Instead, the entire source would have to be enclosed in { ... } brackets - which gets compiled, in its entirety, to byte code first, before being executed. So, that's an example of a user-defined codelet, and a text-book example of why most interpreted languages have to have some facility for compiling into something.

Javascript Performance: While vs For Loops

The other day during a tech interview, one of the question asked was "how can you optimize Javascript code"?
To my own surprise, he told me that while loops were usually faster than for loops.
Is that even true? And if yes, why is that?
You should have countered that a negative while loop would be even faster! See: JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing.
In while versus for, these two sources document the speed phenomenon pretty well by running various loops in different browsers and comparing the results in milliseconds:
https://blogs.oracle.com/greimer/entry/best_way_to_code_a and:
http://www.stoimen.com/blog/2012/01/24/javascript-performance-for-vs-while/.
Conceptually, a for loop is basically a packaged while loop, specifically geared towards incrementing or decrementing (progressing through the logic according to some order or length). For example,
for (let k = 0; k < 20; ++k) {…}
can be sped up by making it a negative while loop:
var k = 20;
while (--k) {…}   // note: this runs 19 times (k = 19 down to 1), one fewer iteration than the for loop above
and as you can see from the measurements in the links above, the time saved really does add up for very large numbers.
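To make the "packaged while loop" point concrete, the two forms below do exactly the same work (a sketch; doStuff is just a stand-in for the loop body):

// The for form...
for (let k = 0; k < 20; ++k) {
    doStuff(k);
}

// ...is mechanically equivalent to this while form: initializer, condition and
// step are simply spelled out around the body.
let j = 0;
while (j < 20) {
    doStuff(j);
    ++j;
}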
While this is a great answer in its minute detection of speed and efficiency, I'd have to digress to @Pointy's original statement.
The right answer would have been that it's generally pointless to
worry about such minutia, since any effort you put into such
optimizations could be rendered a complete waste by the next checkin
to V8 or SpiderMonkey
Since JavaScript is determined by the client, and originally had to be coded per browser for full cross-browser compatibility (back before ECMA was even involved, it was worse), the speed difference may not even be a meaningful answer at this point, given the significant optimization and adoption of JavaScript by browsers and their compiler engines.
We're not even talking about non-strict, script-only environments such as applications in GAS, so while the answers and questions are fun, they are most likely more trivial than useful in real-world application.
To expound on this topic you first need to understand where it originally comes from: compiling vs. interpreting. Let's take a brief history of the evolution of languages and then jump back to compiling vs. interpreting. While not required reading, you can just read Compiling vs Interpreting for the quick answer, but for in-depth understanding I'd recommend reading through both Compiling vs Interpreting and the Evolution of Programming (showing how they are applied today).
COMPILING VS INTERPRETING
Compiled language coding is a method of programming in which you write your code in a compilable manner that a compiler understands, some of the more recognized languages today are Java, C++ and C#. These languages are written with the intent that a compiler program then translates the code into the machine code or bytecode used by your target machine.
Interpreted code
is code that is processed just in time (JIT), at the time of execution, without compiling first; it skips this step and allows for quicker writing, debugging, additions/changes, etc. It also never stores the script's interpretation for future use; it re-interprets the script each time a method is called. The interpreted code is run within a defined and intended runtime environment (for JavaScript this is usually a browser), which, once it has interpreted the code, produces the desired result. Interpreted scripts are never meant to be stand-alone software; they always look to plug into a valid runtime environment to be interpreted. This is why a script is not executable: it never communicates directly with the operating system. If you look at the system processes occurring, you'll never see your script being processed; instead you see the program being processed, which is processing your script in its runtime environment.
So writing a hello script in JavaScript means that the browser interprets the code, defines what hello is, and while this occurs translates the code back down to machine-level code, saying "I have this script and my environment wants to display the word hello", so the machine then processes that into a visual representation of your script. It's a constant process, which is why you have processors in computers and a constant action of processing occurring on the system. Nothing is ever static; processes are constantly being performed no matter the situation.
Compilers
usually compile the code into a defined bytecode system, or machine-code language, that is now a static version of your code. It will not be re-interpreted by the machine unless the source code is recompiled. This is why you see a runtime error after compilation, which a programmer then has to debug in the source and recompile. Scripts intended for an interpreter (like JavaScript or PHP) are simply instructions that are not compiled before being run, so the source code is easily edited and fixed without additional compiling steps, as the compilation is done in real time.
Not All Compiled Code is Created Equal
An easy way to illustrate this is video game systems: the PlayStation vs. the Xbox. Xbox systems are built to support the .NET Framework to optimize coding and development. C# utilizes this framework in conjunction with a Common Language Runtime in order to compile the code into bytecode. Bytecode is not a strict definition of compiled code; it's an intermediate step placed in the process that allows code to be written more quickly and on a grander scale, and that is then interpreted when the code is executed at runtime using, you guessed it, Just-In-Time (JIT) compilation. The difference is that this code is only interpreted once; once compiled, the program will not re-interpret that code again unless restarted.
Interpreted script languages never compile the code, so a function in an interpreted script is constantly being re-processed, whereas a compiled bytecode's function is interpreted once and the instructions are stored until the program's runtime is stopped. The benefit is that the bytecode can be ported to another machine's architecture, provided you have the necessary resources in place. This is why you have to install .NET, and possibly updates and frameworks, on your system in order for a program to work correctly.
The PlayStation does not use a .NET framework for its machine. You need to code in C++; C++ is meant to be compiled and assembled for a particular system architecture. The code is never interpreted and needs to be exactly correct in order to run. You can never easily move this type of language like you can an intermediate language. It's made specifically for that machine's architecture and will never be interpreted otherwise.
So you see, even compiled languages are not inherently finalized versions of compiled code. Compiled languages are meant, in their strict definition, to be compiled fully for use. Interpreted languages are meant to be interpreted by a program, but they are also the most portable languages in programming, since they only need a program installed that understands the script; they also use the most resources, due to constantly being interpreted. Intermediate languages (such as Java and C#) are hybrids of the two: compiled in part, but also requiring outside resources in order to be functional. Once run, they compile again, which is a one-time interpretation at runtime.
Evolution of Programming
Machine Code
The lowest form of coding, this code is strictly binary in its representation (I won't get into ternary computation, as its theory and practical application are beyond this discussion). Computers understand natural values: on/off, true/false. This is machine-level numerical code, which is different from the next level, assembly code.
Assembly Code
The direct next level of code is assembly language. This is the first point at which a language is translated for use by a machine. This code consists of mnemonics, symbols and operands that are then turned into machine-level code and sent to the machine. This is important to understand, because when people first start programming most make the assumption that it's either this or that, meaning either I compile or I interpret. No coding language beyond low-level machine code is compile-only or interpret-only!!!
We went over this in "Not All Compiled Code is Created Equal". Assembly language is the first instance of this. Machine code is what the machine reads, but assembly language is what a human can read. As computers process faster, through better technological advancements, our lower-level languages become more condensed in nature and no longer need to be implemented manually. Assembly language used to be the high-level coding language, as it was the quicker method of coding a machine. It was essentially a syntax language that, once assembled (the lowest form of compiling), converted directly to machine language. An assembler is a compiler, but not all compilers are assemblers.
High Level Coding
High-level coding languages are languages one step above assembly, though there may be an even higher level above them (bytecode/intermediate languages). These languages are compiled from their defined syntax structure into either the machine code needed, bytecode to be interpreted, or a hybrid of the previous methods combined with a special compiler that allows assembly to be written inline. High-level coding, like its predecessor assembly, is meant to reduce the workload of the developer and remove any chance of critical errors in redundant tasks, like the building of executable programs. In today's world you will rarely see a developer work in assembly just for the sake of crunching data down for the benefit of size alone. More often a developer may have a situation, as in video game console development, where they need a speed increase in the process. Because high-level compilers are tools that seek to ease the development process, they may not compile the code in the most efficient manner for that system architecture 100% of the time. In that case assembly code would be written to maximize the system's resources. But you'll never see a person writing in machine code, unless you just met a weirdo.
THE SUMMARY
If you made it this far, congratulations! You just listened to more about this stuff in one sitting than my wife can in a lifetime. The OP's question was about the performance of while vs for loops. The reason this is a moot point by today's standards is twofold.
Reason One
The days of interpreting JavaScript are gone. All major browsers (yes, even Opera and Netscape) utilize a JavaScript engine that is made to compile the script before executing it. The performance tweaks discussed by JS developers in terms of non-call-out methods are obsolete methods of study when looking at native functions within the language. The code is already compiled and optimized before ever being part of the DOM. It's not interpreted again while that page is up, because that page is the runtime environment. JavaScript has really become an intermediate language more than an interpreted script. The reason it will never be called an intermediate scripting language is that JavaScript is never compiled ahead of time; that's the only reason. Besides that, its behavior in a browser environment is a scaled-down version of what happens with bytecode.
Reason Two
The chances of you writing a script, or library of scripts, that would ever require as much processing power as a desktop application on a website are almost nil. Why? Because JavaScript was never created with the intent of being an all-encompassing language. Its creation was simply to provide a medium-level programming method that would allow processes to be done that weren't provided by HTML and CSS, while alleviating the development struggles of requiring dedicated high-level coding languages, specifically Java.
CSS and JS were not supported for most of the early ages of web development. Until around 1997 CSS was not a safe integration, and JS fought even longer. Everything besides HTML is a supplemental language in the web world.
HTML is specifically the building block of a site. You'd never write JavaScript to fully frame a website. At most you'd do DOM manipulation while building a site.
You'd never style your site in JS, as it's just not practical. CSS handles that process.
You'd never store data, except temporarily, using JavaScript. You'd use a database.
So what are we left with then? Increasingly, just functions and processes. CSS3 and its future iterations are going to take over all methods of styling from JavaScript. You see that already with animations and pseudo-states (hover, active, etc.).
The only valid argument for optimization of code in JavaScript at this point is for badly written functions, methods and operations that could be helped by optimization of the user's formula/code pattern. As long as you learn proper and efficient coding patterns, JavaScript in today's age has no loss of performance from its native functions.
for (var k = 0; k < 20; ++k) { ... }
can be sped up by making it a negative while loop:
var k = 20; while(--k){ ... };
A more accurate test would be to use for to the same extent as while. The only difference is that using for loops offers more description. If we wanted to be super crazy we could just forgo the entire block:
var k = 0;
for (;;) { // do stuff until a break condition is reached
    if (++k >= 20) break;
}
// or we could do everything in the loop header
for (var i = 1, d = i * 2, f = Math.pow(d, i); f < 1E9; i++, d = i * 2, f = Math.pow(d, i)) { console.log(f); }
Either way... in Node.js v0.10.38 I'm handling a JavaScript loop of 10⁹ iterations in a quarter second, with for being on average about 13% faster. But that really has no effect on my future decisions about which loop to use or the amount I choose to describe in a loop.
> t=Date.now();i=1E9;
> while(i){--i;b=i+1}console.log(Date.now()-t);
292
> t=Date.now();i=1E9;
> while(--i){b=i+1}console.log(Date.now()-t);
285
> t=Date.now();i=1E9;
> for(;i>0;--i){b=i+1}console.log(Date.now()-t);
265
> t=Date.now();i=1E9;
> for(;i>0;){--i;b=i+1}console.log(Date.now()-t);
246
2016 Answer
In JavaScript the reverse for loop is the fastest. For loops are trivially faster than while loops. Be more focused on readability.
Here is some benchmarking.
The following loops were tested:
var i,
    len = 100000,
    lenRev = len - 1;

i = 0;
while (i < len) {
    1 + 1;
    i += 1;
}

i = lenRev;
while (-1 < i) {
    1 + 1;
    i -= 1;
}

for (i = 0; i < len; i += 1) {
    1 + 1;
}

for (i = lenRev; -1 < i; i -= 1) {
    1 + 1;
}
2017 Answer
jsperf for vs foreach on Chrome 59
Here you can see Array.forEach has become the fastest in the latest version of Chrome (59) as of the date of writing (7/31/17). You can find average times for other browser versions here: https://jsperf.com/for-vs-foreach/66.
This goes to show that ES engine optimization changes what is most efficient at any given time.
My recommendation is that you use whichever is more expressive for your use case.
Performance differences within the same order of magnitude will mostly be irrelevant in the future as computers become exponentially faster per Moore's Law.

Microsoft says IE9 has Parallel Javascript Rendering and Execution

The new JavaScript engine takes advantage of multiple CPU cores through Windows to interpret, compile, and run code in parallel. - http://technet.microsoft.com/en-us/library/gg699435.aspx
and
The Chakra engine interprets, compiles, and executes code in parallel and takes advantage of multiple CPU cores, when available. - http://msdn.microsoft.com/en-us/ie/ff468705.aspx
Wait, what?!? Does this mean we've got multi-threaded parallel JavaScript code execution (outside of web-workers) in IE9?
I'm thinking this is just a bad marketing gimmick but would like to see some more info on this. Maybe they mean different browser windows/tabs/processes can utilize multiple CPUs?
Conclusions, based largely on the comments and hence provided as a community wiki answer so that this question ends up with an actual answer:
It's likely that Microsoft mean that the separate tasks of (i) interpreting and/or running; and (ii) compiling occur in parallel. It's probable that they've applied technology like Sun's old HotSpot JVM so that the Javascript virtual machine interprets code at the first instance because it can start doing that instantly. It also JIT compiles any code that appears to be used sufficiently frequently for doing so to be a benefit. It may even have different levels of compiler optimisation that it slowly dials up. In that case it may be using multiple cores to interpret or run one fragment of code while also compiling arbitrarily many others, or even while recompiling and better optimising the same piece of code that is being run.
However, it's also possible on a technical level that you could perform static analysis to determine when callbacks are mutually independent in terms of state, and allow those callbacks to execute in parallel if the triggering events prompted them to do so. In that way a Javascript virtual machine could actually interpret/run code in parallel without affecting the semantically serial nature of the language. Such a system would be logically similar to the operation of superscalar CPUs, albeit at a much greater remove and with significantly greater complexity.

What optimizations do modern JavaScript engines perform?

By now, most mainstream browsers have started integrating optimizing JIT compilers into their JavaScript interpreters/virtual machines. That's good for everyone. Now, I'd be hard-pressed to know exactly which optimizations they perform and how best to take advantage of them. What are references on the optimizations in each of the major JavaScript engines?
Background:
I'm working on a compiler that generates JavaScript from a higher-level & safer language (shameless plug: it's called OPA and it's very cool) and, given the size of applications I'm generating, I'd like my JavaScript code to be as fast and as memory-efficient as possible. I can handle high-level optimizations, but I need to know more about which runtime transformations are performed, so as to know which low-level code will produce best results.
One example, from the top of my mind: the language I'm compiling will soon integrate support for laziness. Do JIT engines behave well with lazy function definitions?
This article series discusses the optimisations of V8. In summary:
It generates native machine code - not bytecode (V8 Design Elements)
Precise garbage collection (Wikipedia)
Inline caching of called methods (Wikipedia)
Storing class transition information so that objects with the same properties are grouped together (V8 Design Elements)
The first two points might not help you very much in this situation. The third might give insight into getting things cached together. The last might help you create objects with the same properties so they use the same hidden classes.
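As a rough illustration of that last point (a sketch of the general technique, not official V8 guidance): creating objects with the same properties in the same order lets them share hidden classes and keeps property-access sites friendly to inline caches.

// Same properties assigned in the same order: instances can share one hidden
// class, and accesses like p.x stay monomorphic for the inline cache.
function Point(x, y) {
    this.x = x;
    this.y = y;
}
var a = new Point(1, 2);
var b = new Point(3, 4);

// Different initialization orders (or optional properties) tend to split
// objects across several hidden classes and make access sites polymorphic.
var c = { x: 1, y: 2 };
var d = { y: 2, x: 1 };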
This blog post discusses some of the optimisations of SquirrelFish Extreme:
Bytecode optimizations
Polymorphic inline cache (like V8)
Context threaded JIT (introduction of native machine code generation, like V8)
Regular expression JIT
TraceMonkey is optimised via tracing. I don't know much about it, but it looks like it detects the type of a variable in some "hot code" (code run often in loops) and creates optimised code based on the type of that variable. If the type of the variable changes, it must recompile the code; based on this, I'd say you should stay away from changing the type of a variable within a loop.
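A minimal illustration of that advice (a sketch; the point is simply to keep a variable's observed type stable inside a hot loop):

// Type-stable: total is always a number, so a tracing/optimizing JIT can
// specialize the hot loop for numeric addition.
var total = 0;
for (var i = 0; i < 1e6; i++) {
    total += i;
}

// Type-unstable: value alternates between number and string, which forces the
// engine to abandon or recompile its specialized code.
var value = 0;
for (var i = 0; i < 1e6; i++) {
    value = (i % 2 === 0) ? i : String(i);
}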
I found an additional resource:
What does V8 do with that loop?
