JavaScript Performance: While vs For Loops

The other day during a tech interview, one of the questions asked was "How can you optimize JavaScript code?"
To my own surprise, the interviewer told me that while loops were usually faster than for loops.
Is that even true? And if yes, why is that?

You should have countered that a negative while loop would be even faster! See: JavaScript loop performance - Why is to decrement the iterator toward 0 faster than incrementing.
In while versus for, these two sources document the speed phenomenon pretty well by running various loops in different browsers and comparing the results in milliseconds:
https://blogs.oracle.com/greimer/entry/best_way_to_code_a and:
http://www.stoimen.com/blog/2012/01/24/javascript-performance-for-vs-while/.
Conceptually, a for loop is basically a packaged while loop that is specifically geared towards incrementing or decrementing (progressing over the logic according to some order or some length). For example,
for (let k = 0; k < 20; ++k) {…}
can be sped up by making it a negative while loop:
var k = 20;
while (k--) {…} // k-- rather than --k, so the body still runs 20 times
and as you can see from the measurements in the links above, the time saved really does add up for very large numbers.
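Before timing anything, it is worth checking that the loops being compared do the same amount of work. A minimal sketch, assuming a hypothetical helper named countIterations (not part of the original answer) and n >= 1, that counts how many times each loop shape runs its body:

```javascript
// Hypothetical helper: counts body executions for three loop shapes.
// Assumes n >= 1 (with n = 0 the pre-decrement loop would not stop at zero).
function countIterations(n) {
  let forward = 0;
  for (let k = 0; k < n; ++k) { forward++; }

  let preDecrement = 0;
  let a = n;
  while (--a) { preDecrement++; }   // decrements first, so the body runs n - 1 times

  let postDecrement = 0;
  let b = n;
  while (b--) { postDecrement++; }  // tests first, then decrements: the body runs n times

  return { forward, preDecrement, postDecrement };
}

console.log(countIterations(20));
// { forward: 20, preDecrement: 19, postDecrement: 20 }
```

So a pre-decrement loop starting at 20 does one iteration fewer than the for loop, which would flatter it slightly in any benchmark.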

While this is a great answer on the fine details of speed and efficiency, I'd have to defer to @Pointy's original statement:
The right answer would have been that it's generally pointless to
worry about such minutia, since any effort you put into such
optimizations could be rendered a complete waste by the next checkin
to V8 or SpiderMonkey
Since JavaScript runs on the client and originally had to be coded per browser for full cross-browser compatibility (it was even worse before ECMA was involved), the speed difference may not even be a meaningful answer at this point, given how widely browsers have adopted JavaScript and how heavily their engines now optimize it.
We're not even talking about non-strict, script-only environments such as applications in GAS, so while the answers and questions are fun, they are more likely trivia than something useful in real-world applications.
To expound on this topic you first need to understand where it comes from, and the difference between compiling and interpreting. Let's take a brief look at the evolution of languages and then jump back to compiling vs interpreting. You can read just Compiling vs Interpreting for the quick answer, but for in-depth understanding I'd recommend reading through both Compiling vs Interpreting and the Evolution of Programming (showing how they are applied today).
COMPILING VS INTERPRETING
Compiled language coding is a method of programming in which you write your code in a compilable manner that a compiler understands. Some of the more recognized languages today are Java, C++ and C#. These languages are written with the intent that a compiler program then translates the code into the machine code or bytecode used by your target machine.
Interpreted code
is code that is processed Just In Time (JIT) at the time of execution without being compiled first. Skipping this step allows for quicker writing, debugging, additions/changes, etc. It also will never store the script's interpretation for future use; it will re-interpret the script each time a method is called. The interpreted code is run within a defined and intended runtime environment (for JavaScript this is usually a browser), which interprets it and then outputs the desired result. Interpreted scripts are never meant to be stand-alone software and are always looking to plug into a valid runtime environment to be interpreted. This is why a script is not executable. It will never communicate directly with an operating system. If you look at the system processes you'll never see your script being processed; instead you see the program that is processing your script in its runtime environment.
So writing a hello script in JavaScript means that the browser interprets the code and defines what hello is. While this occurs, the browser is translating this code back down to machine-level code, in effect saying: I have this script and my environment wants to display the word hello. The machine then processes that into a visual representation of your script. It's a constant process, which is why you have processors in computers and a constant action of processing occurring on the system. Nothing is ever static; processes are constantly being performed no matter the situation.
Compilers
usually compile the code into a defined bytecode system, or machine code language, that is now a static version of your code. It will not be re-interpreted by the machine unless the source code is recompiled. This is why, when you see a runtime error after compilation, a programmer has to debug the source and recompile. Scripts intended for an interpreter (like JavaScript or PHP) are simply instructions that are not compiled before being run, so the source code is easily edited and fixed without additional compiling steps, as the compilation is done in real time.
Not All Compiled Code is Created Equal
An easy way to illustrate this is video game systems: the Playstation vs the Xbox. Xbox systems are built to support the .net framework to optimize coding and development. C# utilizes this framework in conjunction with a Common Language Runtime in order to compile the code into bytecode. Bytecode is not a strict definition of compiled code; it's an intermediate step placed in the process that allows code to be written more quickly and on a grander scale, and it is then interpreted when the code is executed at runtime using, you guessed it, Just In Time (JIT). The difference is that this code is only interpreted once; once compiled, the program will not re-interpret that code again unless restarted.
Interpreted script languages will never compile the code, so a function in an interpreted script is constantly being re-processed while a compiled bytecode's function is interpreted once and the instructions are stored until the program's runtime is stopped. The benefit is that the bytecode can be ported to another machine's architecture provided you have the necessary resources in place. This is why you have to install .net and possibly updates and frameworks to your system in order for a program to work correctly.
The Playstation does not use a .net framework for its machine. You will need to code in C++, which is meant to be compiled and assembled for a particular system architecture. The code will never be interpreted and will need to be exactly correct in order to run. You can never move this type of language as easily as you could an intermediate language. It's made specifically for that machine's architecture and will never be interpreted otherwise.
So you see, even compiled languages are not inherently finalized versions of a compiled language. Compiled languages are meant, in their strict definition, to be compiled fully for use. Interpreted languages are meant to be interpreted by a program; they are the most portable languages in programming, since they only need an installed program that understands the script, but they also use the most resources because they are constantly being interpreted. Intermediate languages (such as Java and C#) are hybrids of these two, compiling in part but also requiring outside resources in order to be functional. Once run, they compile again, which is a one-time interpretation during runtime.
Evolution of Programming
Machine Code
The lowest form of coding, this code is strictly binary in its representation (I won't get into ternary computation, as it's based on theory rather than practical application for this discussion). Computers understand the natural values: on/off, true/false. This is machine-level numerical code, which is different from the next level, assembly code.
Assembly Code
The direct next level of code is assembly language. This is the first point in which a language is interpreted to be used by a machine. This code is meant to interpret mnemonics, symbols and operands that are then sent to the machine in machine level code. This is important to understand because when you first start programming most people make the assumption it's either this or that meaning either I compile or interpret. No coding language beyond low level machine code is either compile only instructions or interpret only instructions!!!
We went over this in "Not All Compiled Code is Created Equal". Assembly language is the first instance of this. Machine code is what the machine reads but assembly language is what a human can read. As computers process faster, through better technological advancements, our lower level languages begin to become more condensed in nature and not needed to be manually implemented. Assembly language used to be the high level coding language as it was the quicker method to coding a machine. It was essentially a syntax language that once assembled (the lowest form of compiling) directly converted to machine language. An assembler is a compiler but not all compilers are assemblers.
High Level Coding
High level coding languages are languages that are one step above assembly but may even contain an even higher level (this would be bytecode/intermediate languages). These languages are compiled from their defined syntax structure into either the machine code needed, the bytecode to be interpreted, or a hybrid of either of the previous methods combined with a special compiler that allows assembly to be written inline. High level coding, like its predecessor, assembly, is meant to reduce the workload of the developer and remove any chance for critical errors in redundant tasks, like the building of executable programs. In today's world you will rarely see a developer work in assembly for the sake of crunching data for the benefit of size alone. More often than not, a developer may have a situation, like in video game console development, where they need a speed increase in the process. Because high level compilers are tools that seek to ease the development process, they may not compile the code in the most efficient manner for that system architecture 100% of the time. In that case assembly code would be written to maximize the system's resources. But you'll never see a person writing in machine code, unless you just meet a weirdo.
THE SUMMARY
If you made it this far, congratulations! You just listened more in one sitting than my wife can, about this stuff, for a lifetime. The OP's question was about performance of while vs for loops. The reason this is a moot point in today's standards is two-fold.
Reason One
The days of interpreting JavaScript are gone. All major browsers (yes, even Opera and Netscape) utilize a JavaScript engine that is made to compile the script before implementing it. The performance tweaks discussed by JS developers in terms of non-call out methods are obsolete methods of study when looking at native functions within the language. The code is already compiled and optimized before ever being a part of the DOM. It's not interpreted again while that page is up because that page is the runtime environment. JavaScript has really become an intermediate language more so than an interpreted script. The only reason it will never be called an intermediate scripting language is that JavaScript is never compiled ahead of time by the developer. Besides that, its function in a browser environment is a minified version of what happens with bytecode.
Reason Two
The chances of you writing a script, or library of scripts, for a website that would ever require as much processing power as a desktop application are almost nil. Why? Because JavaScript was never created with the intent to be an all-encompassing language. Its creation was simply to provide a medium-level programming language that would allow processes to be done that weren't provided by HTML and CSS, while alleviating the development struggles of requiring dedicated high level coding languages, specifically Java.
CSS and JS were not supported for most of the early ages of web development. Until around 1997 CSS was not a safe integration, and JS fought even longer. Everything besides HTML is a supplemental language in the web world.
HTML is specifically the building blocks for a site. You'd never write JavaScript to fully frame a website. At most you'd do DOM manipulation, but not build a site.
You'd never style your site in JS as it's just not practical. CSS handles that process.
You'd never store, besides temporarily, using Javascript. You'd use a database.
So what are we left with then? Increasingly just functions and processes. CSS3 and its future iterations are going to take all methods of styling from JavaScript. You see that already with animations and pseudo states (hover, active, etc.).
The only valid argument of optimization of code in Javascript at this point is for badly written functions, methods and operations that could be helped by optimization of the user's formula/code pattern. As long as you learn proper and efficient coding patterns Javascript, in today's age, has no loss of performance from its native functions.

for(var k=0; k< 20; ++k){ ... }
can be sped up by making it a negative
while loop:
var k = 20; while(--k){ ... };
A more accurate test would be to use for to the same extent as while. The only difference will be that using for loops offers more description. If we wanted to be super crazy we could just forgo the entire block;
var k = 0;
for(;;){ if (++k === 20) break; /* do stuff until break */ }
//or we could do everything
for (var i=1, d=i*2, f=Math.pow(d, i); f < 1E9; i++, d=i*2, f=Math.pow(d,i)){console.log(f)}
Either way... in NodeJS v0.10.38 I'm handling a JavaScript loop of 10^9 iterations (1E9 in the code below) in about a quarter second, with for being on average about 13% faster. But that really has no effect on my future decisions about which loop to use or how much I choose to describe in a loop.
> t=Date.now();i=1E9;
> while(i){--i;b=i+1}console.log(Date.now()-t);
292
> t=Date.now();i=1E9;
> while(--i){b=i+1}console.log(Date.now()-t);
285
> t=Date.now();i=1E9;
> for(;i>0;--i){b=i+1}console.log(Date.now()-t);
265
> t=Date.now();i=1E9;
> for(;i>0;){--i;b=i+1}console.log(Date.now()-t);
246
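Single runs like the ones above are noisy. A minimal sketch of a slightly steadier harness, assuming any environment with Date.now(); the names timeMedian, forLoop and whileLoop are hypothetical, and the iteration count is deliberately much smaller than the 1E9 used above so that it finishes quickly:

```javascript
// Runs fn several times and returns the median elapsed time in milliseconds.
function timeMedian(fn, runs = 5) {
  const samples = [];
  for (let r = 0; r < runs; r++) {
    const start = Date.now();
    fn();
    samples.push(Date.now() - start);
  }
  samples.sort((x, y) => x - y);
  return samples[Math.floor(samples.length / 2)];
}

const N = 1e6; // assumption: a small count chosen only to keep the sketch fast

function forLoop() {
  let b = 0;
  for (let i = N; i > 0; --i) { b = i + 1; }
  return b; // returning b keeps the loop body from being trivially dead code
}

function whileLoop() {
  let b = 0;
  let i = N;
  while (i) { --i; b = i + 1; }
  return b;
}

console.log('for   median ms:', timeMedian(forLoop));
console.log('while median ms:', timeMedian(whileLoop));
```

The numbers will differ by engine and version, which is rather the point of this answer.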

2016 Answer
In JavaScript the reverse for loop is the fastest. For loops are trivially faster than while loops. Be more focused on readability.
Here is some benchmarking.
The following loops were tested:
var i,
    len = 100000,
    lenRev = len - 1;

i = 0;
while (i < len) {
    1 + 1;
    i += 1;
}

i = lenRev;
while (-1 < i) {
    1 + 1;
    i -= 1;
}

for (i = 0; i < len; i += 1) {
    1 + 1;
}

for (i = lenRev; -1 < i; i -= 1) {
    1 + 1;
}

2017 Answer
jsperf for vs foreach on Chrome 59
Here you can see Array.forEach has become fastest on the latest version of Chrome (59) as of the date written (7/31/17). You can find average times for other browser versions here: https://jsperf.com/for-vs-foreach/66.
This goes to show that ES engine optimizations change what is most efficient at any given time.
My recommendation is that you use whichever is more expressive for your use case.
Performance differences within the same order of magnitude will mostly be irrelevant in the future as computers keep getting faster, as predicted by Moore's Law.
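To make "more expressive" concrete, here is a sketch of the same sum written three ways. The array and function names are made up for illustration; which version reads best depends on the surrounding code.

```javascript
const prices = [5, 10, 15];

// Index-based for loop: useful when the index itself matters.
function sumWithFor(items) {
  let total = 0;
  for (let i = 0; i < items.length; i++) {
    total += items[i];
  }
  return total;
}

// Array.forEach: states "do this for every element" directly.
function sumWithForEach(items) {
  let total = 0;
  items.forEach(function (item) {
    total += item;
  });
  return total;
}

// Reverse while loop: the form the older answers measured as fastest.
function sumWithWhile(items) {
  let total = 0;
  let i = items.length;
  while (i--) {
    total += items[i];
  }
  return total;
}

console.log(sumWithFor(prices), sumWithForEach(prices), sumWithWhile(prices)); // 30 30 30
```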

Related

Does Javascript simplify statements before execution?

I have recently been working on a small to medium sized game in JavaScript to familiarize myself with it and come from a background in C and Java. In my game I am using a few constant numbers to render all of my objects and I was wondering about the performance impact those would have. Both C and Java are languages with compilers that will automatically simplify simple statements such as 2 + 3 into 5 before the code is actually run. This is great because it means I can make my code more readable by having something like WINDOW_MAX_Y - SHIP_Y in my code and understand what it does later. Looking over my code recently however, I found a few lines of code that become very long when written in this style, and because JavaScript does not have a compiler I was wondering whether it would improve performance if I simplified everything into "magic numbers" or if JavaScript had some function that would automatically simplify the lines before execution like the C compiler. I would like to keep my code readable, however, if JavaScript does not simplify those statements I would love to know as they are run thousands of times per second.
Not necessarily before execution but when the interpreter detects that the code is being used heavily.
You mention Java and this is good because I think you may be familiar with the concept since Java was one of the first languages to implement it. It's called just-in-time compilation (or JIT for short).
There is no standard that covers how JIT is to be implemented in JavaScript. And there is no standard that mandates that a JavaScript interpreter must implement JIT (this is exactly the same for Java, by the way: it is perfectly legal to implement a JVM that does not implement a JIT compiler). However, market forces have ensured that all current major JavaScript interpreters have implemented JIT (which may seem a bit odd since all major JavaScript interpreters are free, but they are competing for market share, not profit).
Similarly you mentioned C, which is nice because it also means you should be familiar with another concept. In C there is no standard defining what a = 1 + 2 should be compiled to. There is a concept in the C standard that specifies the behaviour of such optimisations though: the as-if rule. It basically states that if a compiler implements optimisations, the end result must be exactly the same as if there were no optimisation.
All major javascript interpreters currently have most of the common optimisations that compilers typically have. They are all slightly different (just as all C compilers are slightly different) but they all compile code just-in-time. In addition, most interpreters also perform optimisations during compilation to byte code.
Yes, there is byte code, just like Java. While there is no compilation step performed by the programmer, all current javascript interpreters compile to bytecode before execution just like Java. This is true for all modern "scripting" languages such as Python and Ruby.
My advice is basically if you don't worry about such things in C or Java then you should not need to worry about it in javascript. For things that you need to worry about in C (such as micro-optimisations) there is a subset of javascript called asm.js that is guaranteed to be compiled down to machine code instead of bytecode (if set up correctly).
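If you would rather not depend on what a particular engine folds, one option is to compute the derived value once, outside the hot path, and keep the readable names. A minimal sketch: WINDOW_MAX_Y and SHIP_Y come from the question, while their values, SHIP_SCREEN_Y and drawShipAt are placeholders invented here.

```javascript
const WINDOW_MAX_Y = 600; // placeholder value
const SHIP_Y = 40;        // placeholder value

// Evaluated once when the script loads, not on every frame.
const SHIP_SCREEN_Y = WINDOW_MAX_Y - SHIP_Y;

// Hypothetical render step: returns the draw position instead of touching a canvas.
function drawShipAt(x) {
  return { x: x, y: SHIP_SCREEN_Y };
}

console.log(drawShipAt(100)); // { x: 100, y: 560 }
```

This keeps the expression readable in one place and makes the per-frame code a plain variable read, whether or not the engine would have folded it anyway.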

why can't Javascript shellcode exploits be fixed via "data execution prevention"?

The "heap spraying" wikipedia article suggests that many javascript exploits involve positioning a shellcode somewhere in the script's executable code or data space memory and then having interpreter jump there and execute it. What I don't understand is, why can't the interpreter's entire heap be marked as "data" so that interpreter would be prevented from executing the shellcode by DEP? Meanwhile the execution of javascript derived bytecode would be done by virtual machine that would not allow it to modify memory belonging to the interpreter (this wouldn't work on V8 that seems to execute machine code, but probably would work on Firefox that uses some kind of bytecode).
I guess the above sounds trivial and probably something a lot like that is in fact being done. So, I am trying to understand where is the flaw in the reasoning, or the flaw in existing interpreter implementations. E.g. does the interpreter rely on system's memory allocation instead of implementing its own internal allocation when javascript asks for memory, hence making it unduly hard to separate memory belonging to interpreter and to javascript? Or why is it that the DEP based methods cannot completely eliminate shellcodes?
To answer your question we first need to define Data Execution Prevention, Just In Time Compilation and JIT Spraying.
Data Execution Prevention is a security feature that prohibits the execution of code from a non-executable memory area. DEP can be implemented by hardware mechanisms such as the NX bit and/or by software mechanisms that add runtime checks.
Just In Time (JIT) compilers are dynamic compilers that translate byte codes during run time to machine code. The goal is to combine the advantages of interpreted code and the speed of compiled code. It should compile methods only if the extra time spent in compilation can be amortized by the performance gain expected from the compiled code. [1]
JIT spraying is the process of coercing the JIT engine to write many executable pages with embedded shellcode.
[....]
For example, a Javascript statement such as “var x = 0x41414141 + 0x42424242;” might be compiled to contain two 4 byte constants in the executable image (for example, “mov eax, 0x41414141; mov ecx, 0x42424242; add eax, ecx”). By starting execution in the middle of these constants, a completely different instructions stream is revealed.
[....]
The key insight is that the JIT is predictable and must copy some constants to the executable page. Given a uniform statement (such as a long sum or any repeating pattern), those constants can encode small instructions and then control flow to the next constant's location. [2]
Advanced techniques, beyond the scope of this answer, must then be used to find the address of the JIT sprayed block and trigger the exploit.
It should now be clear that
If the attacker’s code is generated by JIT engine it will also reside in the executable area. In other words, DEP is not involved in the protection of code emitted by the JIT compiler. [3]
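The point about script constants ending up inside executable memory can be illustrated with byte layout alone. The sketch below only builds the byte sequence for the x86 instruction mov eax, imm32 (opcode 0xB8 followed by the 32-bit constant in little-endian order) and shows that the constant from the quoted example appears verbatim in it. encodeMovEax is a hypothetical helper written for this illustration; what a real JIT emits differs by engine and version.

```javascript
// Hypothetical helper: encodes "mov eax, imm32" as 5 bytes.
function encodeMovEax(imm32) {
  const bytes = new Uint8Array(5);
  bytes[0] = 0xb8; // opcode for mov eax, imm32
  new DataView(bytes.buffer).setUint32(1, imm32, true); // constant, little-endian
  return bytes;
}

// Formats a byte array as space-separated hex.
function toHex(bytes) {
  return Array.from(bytes, function (b) {
    return b.toString(16).padStart(2, '0');
  }).join(' ');
}

console.log(toHex(encodeMovEax(0x41414141))); // b8 41 41 41 41
```

Every byte after the opcode is chosen by the script, yet it sits on a page that has to be executable for the compiled code to run at all, which is why marking the heap as data does not cover this case.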
References
[1] A Dynamic Optimization Framework for a Java Just-in-Time Compiler
[2] Interpreter Exploitation: Pointer Inference and JIT Spraying
[3] JIT spraying and mitigations

Javascript - Compiled language?

I am new to Web development, and I am studying JavaScript.
From a course at Stanford:
JavaScript is an interpreted language, not a compiled language. A program such as C++ or Java needs to be compiled before it is run. The source code is passed through a program called a compiler, which translates it into bytecode that the machine understands and can execute. In contrast, JavaScript has no compilation step. Instead, an interpreter in the browser reads over the JavaScript code, interprets each line, and runs it. More modern browsers use a technology known as Just-In-Time (JIT) compilation, which compiles JavaScript to executable bytecode just as it is about to run.
And from You Don't Know JS: Scope & Closures by Kyle Simpson:
... but despite the fact that JavaScript falls under the general category of “dynamic” or “interpreted” languages, it is in fact a compiled language.
Let’s just say, for simplicity sake, that any snippet of JavaScript has to be compiled before (usually right before!) it’s executed. So, the JS compiler will take the program var a = 2; and compile it first, and then be ready to execute it, usually right away.
And from some questions on Stack Overflow, there are ideas like: it depends on the actual implementation of the language.
Do you have any ideas?
The Chrome browser uses the V8 engine for compiling JavaScript, just as other browsers may use Rhino or SpiderMonkey.
V8 is a JavaScript engine built by Google written in C++. It is used for compiling JS, in both client-side (Google Chrome) and server-side (node.js) applications. In order to obtain speed, V8 translates JavaScript code into more efficient machine code instead of using an interpreter.
V8 compiles JavaScript code into machine code at script execution by implementing a JIT (Just-In-Time) compiler like a lot of modern JavaScript engines such as SpiderMonkey or Rhino (Mozilla) are doing. The main difference with V8 is that it doesn’t produce bytecode or any intermediate code. It just compiles JavaScript on the fly.
Hope this helps!
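The claim quoted from You Don't Know JS, that a snippet is processed as a whole before any of it executes, can be observed directly. A small sketch, assuming an environment where eval is permitted; the log array is made up for the demonstration.

```javascript
const log = [];
let caught = null;

try {
  // The first statement is valid, but the second one is a syntax error.
  eval("log.push('first statement ran'); var = 2;");
} catch (err) {
  caught = err;
}

console.log(caught instanceof SyntaxError); // true
console.log(log.length);                    // 0
```

If the engine read and ran the text one statement at a time, the push would have happened before the error. It did not, because the whole snippet is parsed first and rejected before execution starts.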
Well, you can probably get into semantics and terminology differences, but two important points:
Javascript (in a web page) is distributed in its source code form (or at least in minimized text form) and not as a binary compiled ahead-of-time
Javascript is not compiled into executable machine code even by the browser (although some parts of it may be these days as a performance optimization), but executed via a virtual machine
The term "compiled" used in this context refers to a language that is normally assumed to be translated directly into the language or format native to the machine it is going to be run on. Typical cases include both C and C++. But, understand that both of these languages can also be interpreted. An example of a C interpreter is Pico C, which handles a subset of C.
So, the real question is about environments. A language may be run in a compiled or interpreted environment. A clear distinction between the two cases can be made by the following test:
Does the language possess a "command level" mode
in which forward references are inherently impossible?
Think about this for a moment. A language that is interpreted is reading its specification in real time. A forward reference is to something that does not exist at the time the specification is made. Since machines have not (yet) been endowed with the facility of precognition or time travel (i.e. "time loop logic"), then such references are inherently unresolvable.
If such a level is defined as a mandatory part of the language, then the language may be said to be interpreted; otherwise, it may be said to be compiled. BASIC is interpreted, as some of its commands make direct reference to this layer (e.g. the "list" command). Similarly, the high-level AI language, Prolog, is - by this criterion - an interpreted language, since it also possesses commands that make direct reference to this layer. The "?-" command, itself, is an actual prompt, for instance; but its database commands also refer to and maintain the current state of the command-level layer.
However, this does not preclude parts of an interpreted language from being subject to compilation or to the methods used by compilers, or a compiled language from being run at a command mode level. In effect, that's what a debugger for a language like C or C++ already is, just to give an example.
Most languages that are defined to have a command level layer, normally have to compile to something. In particular, if the language satisfies the following condition, then it is almost mandatory that at least parts of it compile into something:
Does the language possess a facility for user-defined codelets,
for instance: subroutines, functions, lambdas, etc.?
The reason is simple: where are you going to put that code, after it's defined before it's used, and in what format? It is extremely inefficient to save and run it verbatim, so normally it will be translated into another form that is either: (a) a language-internal normal form (in which case, the rest of the language may be considered as "syntactic sugar" for the reduced subset language that the normal forms reside in), (b) into a language-external normal form (i.e. "byte-code"), or (c) a combination of both - it may do language-internal normalization first, before translating it into byte code.
So, most "interpreted" languages are compiled - into something. The only real question is: (1) what they are compiled into, and (2) when/how does the code that it is compiled into run - which is connected to the issue of the above-mentioned "command level" mode.
If the codelets are being compiled into a target-independent form - which is what is normally meant when speaking of "byte code" - then it is not "compiled" in the sense the term is normally taken to refer to. The term compiled normally refers to translation into the language that is native to the machine that the language is being run on. In that case, there will be as many translators as there are types of machines that the language may run on - the translator is inherently machine-dependent.
The two cases are not mutually exclusive. So, a byte-code translator may appear as a stage for native-code compilation, whereby the codelets of an interpreted language are translated and stored directly in the native language of the machine that the language is being run on. That's called "Just In Time" compilation (or JIT).
The distinction is a bit blurry. Even compiled languages, like C or C++, may run on systems that have codelets that are either compiled or even pre-compiled that are loaded while the program is running.
I don't know enough about JS (yet) to say anything definitive about it - other than what can be inferred from observation.
First, since JS code is stored as codelets and is normally run in web clients on a need-to-use basis, it is likely that an implementation will compile (or pre-compile) the codelets into an intermediate byte-code form.
Second, for reasons of security, it is unlikely that it will compile directly into the native code of the machine it is running on, since this may compromise the security of the machine by providing leaks through which malicious code can be sneaked into and through. That's the "sandbox" feature that browsers are supposed to adhere to.
Third, it is not normally used directly by a person on the other end as a language like Basic or even Prolog is used. However, in many (or even most) implementations it does have a "debug" mode. The browser, for instance, may allow even an ordinary user to both view and edit/debug JS code. Notwithstanding that, there really isn't a command-layer, per se, other than what appears in a web browser itself. Unresolved here is the question of whether the browser allows forward references in JS code. If it does, then it's not really a command level environment. But it may be browser-dependent. It might, for instance, load in an entire web page before ever starting up any JS code, rather than trying to run the JS in real time while a page is loading, in which case forward references would be possible.
Fourth, if the language wants to be efficient in terms of its execution speed, it will have some form of JIT - but this would require stringent validation of the JS compiler itself to ensure that nothing can slip out of the "sandbox" through the JIT into forbidden code on the host machine.
I'm pretty sure there are JS editors/interpreters out there, simply to have a way to develop JS. But I don't know if any references to a command-layer are a mandatory part of the specification for JS. If such specifications exist, then we can call it a bona fide interpreted language. Otherwise, it straddles the border line between the two language types as a language meant to be run in real time like an interpreted language, but which permits compilation directly to the native code of the machine it is running on.
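On the forward-reference question raised in the third point above, the behaviour can be checked in any engine. A minimal sketch with invented names: function declarations are hoisted, so a call may appear before the declaration in the same script, whereas a let binding cannot be read before its declaration has been evaluated.

```javascript
// The call appears before the declaration below. This works because
// function declarations are hoisted when the script is processed.
const early = greet('world');

function greet(name) {
  return 'hello ' + name;
}

// A let binding, by contrast, cannot be read before its declaration runs.
let tdzError = null;
try {
  (function () {
    later;          // throws: the binding exists but is not yet initialized
    let later = 1;
  })();
} catch (err) {
  tdzError = err;
}

console.log(early);                              // hello world
console.log(tdzError instanceof ReferenceError); // true
```

By the test proposed earlier in this answer, a language that resolves the first call is not being read strictly line by line.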
The issue came to a head, for me, recently when I tried to directly translate an old stand-by text-based game (lunar lander) directly from the (interpreted) language FOCAL into C-BC (a C-like extension to POSIX BC whose source is located on GitHub here https://github.com/RockBrentwood/CBC). C-BC, like POSIX BC, is interpreted but allows user-defined codelets, so that implementations of BC normally define a "byte code" language to go with it (historically: this was "dc").
The FOCAL language has a run-time language - which theoretically could be compiled, but also a command-layer subset (e.g. the "library" or "erase" commands) which does not permit forward references that haven't yet been defined, though the run-time language permits forward references.
Unlike GNU-BC, C-BC has goto statements and labels, so it is possible to directly translate the game. However, at the command level (which in a BC file is the top level in the file's scope), this is not possible, since the top-level of a file's code is - as far as a BC interpreter is concerned - making reference to things that might not yet exist, since the program could just as well have been being entered by a user in real-time. Instead, the entire source would have to be enclosed into { ... } brackets - which gets compiled, in its entirety, to byte-code first, before being executed. So, that's an example of a user-defined codelet, and a text-book example of why most interpreted languages have to have some facility for compiling into something.

Why is source code transformed to bytecode in SpiderMonkey & JSC?

A JavaScript engine usually transforms source code into bytecode, and then transforms the bytecode into native code.
1) Why transform to bytecode at all? Does transforming source code directly into native code give poor performance?
2) If the source code is very simple (e.g. an a+b function), is transforming it directly into native code a good idea?
Complexity and portability.
Transforming from source code to any kind of object code, whether it's bytecode for a virtual machine or machine code for a real machine, is a complex process. Bytecode more closely mimics what most real machines do, and so it's easier to work with: better for optimizing the code to run faster, transforming to machine code for an even bigger boost, or even turning into other formats if the situation calls for it.
Because of this, it usually turns out to be easier to write a front end whose only job is to transform the source code to bytecode (or some other intermediate language), and then a back end that works on the intermediate language: optimizes it, outputs machine code, and all that jazz. More traditional compilers for languages like C have done this for a long time. Java could be considered an unusual application of this principle: its build process usually stops with the intermediate representation (i.e. Java bytecode), and then developers ship that out, so that the JVM can "finish the job" when the user runs it.
There are two big benefits to working this way, aside from making the code easier to work with. The first big advantage is that you can reuse the backend to work with other languages. This doesn't matter so much for JavaScript (which doesn't have a standardized backend), but it's how projects like LLVM and GCC eventually grow to cover so many different languages. Writing the frontend is hard work, but let's say I made, for example, a Lua frontend for Mozilla's JavaScript backend. Then I could tap into all of the optimization work that Mozilla had put into that backend. This saves me a lot of work.
The other big advantage is that you can reuse the frontend to work with more machines. This one does have practical implications for JavaScript. If I were to write a JavaScript interpreter, I'd probably write my first backend for x86 (the architecture most PCs use) because that's where I'd probably be doing the development work. But most cell phones don't use an x86-based architecture (ARM is more common these days), so if I wanted to run fast on cell phones, I'd need to add an ARM backend. But I could do that without having to rewrite the whole frontend, so once again, I've saved myself a lot of work. If I wanted to run on the Wii U (or the previous generation of game consoles, or older Macs) then I'd need a POWER backend, but again, I could do that without rewriting the frontend.
The bottom line is that while it seems more complex to do two transformations, in the long run it actually turns out to be easier. This is one of those strange and unintuitive things that pops up sometimes in software design, but the benefits are real.
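As a rough sketch of that split (a toy, not how SpiderMonkey or JSC is actually organized), here is one front end feeding two interchangeable back ends. The instruction names `push`, `add` and `mul`, and all function names, are invented for illustration; the second back end emits a JavaScript expression string as a stand-in for machine code.

```javascript
// Front end: parse expressions made of numbers, + and * into bytecode
// for a hypothetical stack machine.
function compileToBytecode(source) {
  const tokens = source.match(/\d+|[+*]/g) || [];
  let pos = 0;
  const code = [];

  function term() {            // term := number ('*' number)*
    code.push(["push", Number(tokens[pos++])]);
    while (tokens[pos] === "*") {
      pos++;
      code.push(["push", Number(tokens[pos++])]);
      code.push(["mul"]);
    }
  }
  function expr() {            // expr := term ('+' term)*
    term();
    while (tokens[pos] === "+") {
      pos++;
      term();
      code.push(["add"]);
    }
  }
  expr();
  return code;
}

// Back end 1: interpret the bytecode directly.
function interpret(code) {
  const stack = [];
  for (const [op, arg] of code) {
    if (op === "push") {
      stack.push(arg);
    } else {
      const b = stack.pop(), a = stack.pop();
      stack.push(op === "add" ? a + b : a * b);
    }
  }
  return stack.pop();
}

// Back end 2: translate the same bytecode into target code
// (a JavaScript expression string standing in for machine code).
function emitJs(code) {
  const stack = [];
  for (const [op, arg] of code) {
    if (op === "push") {
      stack.push(String(arg));
    } else {
      const b = stack.pop(), a = stack.pop();
      stack.push(`(${a} ${op === "add" ? "+" : "*"} ${b})`);
    }
  }
  return stack.pop();
}

const bytecode = compileToBytecode("1 + 2 * 3");
const interpreted = interpret(bytecode);
const emitted = emitJs(bytecode);
const compiledResult = new Function(`return ${emitted};`)();
console.log(interpreted, emitted, compiledResult);
```

Adding a third target would mean writing another function like `emitJs`, with `compileToBytecode` left untouched, which is the reuse described above.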

Tools for compiling Python / Boo / Ruby like syntax to C / C++ / LLVM / Javascript (using JS ArrayBuffer for speed)

I'm trying to automatically compile / convert code written with Pythonic semantics into native and fast Javascript code.
What tools can do this, with nice debugging support possible like with Java etc?
Has anyone done this?
Why?
I'm trying to write some visualisation code with a complex main loop, a timeline, some physics simulation, and some complex interaction. I.e., it IS an actual CPU-bound problem.
Writing JavaScript and testing it in its browser environment is harder to debug than, say, Java, .NET or Python running in a decent IDE.
But for doing actual large scale web development with complex client side code, it's necessary to at least compile to Javascript, if not directly write in it.
Background: Recent advances
Emscripten allows compiling C/C++ to JavaScript that can run with increasing efficiency in the browser, thanks to typed array support (ArrayBuffer) and new browser JS engines; asm.js and LLJS take advantage of Mozilla's recent speed improvements (which other vendors will likely soon follow).
Altjs.org has a laundry list of JavaScript alternatives, but it doesn't yet focus specifically on the recent speed improvements or on nice semantics. Still, it is becoming commonplace for people to code for browsers with better tools. Emscripten in particular has loads of amazing demos.
Possible options already considered:
Shedskin - I have tried getting Shedskin working, but I have limited C++/C skills (Emscripten only exposes a C API for the Boehm-inspired garbage collector it uses, and Shedskin needs a C++ garbage collection class for its objects, which doesn't exist yet).
Unladen Swallow / RPython, to LLVM - have not been able to set it up correctly on Ubuntu yet
Boo to Java then to LLVM - have not been able to set it up on my Ubuntu system yet
Additional constraints:
I need to use this on my Ubuntu system.
The compiled Javascript should probably be less than 1 MB
Debugging in the native language which is also cross compiled, should still be possible, allowing taking advantage of existing debug tools.
"This process of constructing instruction tables should be very fascinating. There need be no real danger of it ever becoming a drudge, for any processes that are quite mechanical may be turned over to the machine itself." -- Alan M. Turing, 1946
You want a high level dynamic language that compiles down to efficient low level JavaScript? There is no such thing. If dynamic languages were fast we would not need asm.js in the first place.
If you want to write code that compiles to efficient JavaScript, you will have to learn a lower-level language. Emscripten is fast because it compiles from a low-level language (C/C++) that allows for greater compiler optimization than regular JavaScript. That is also the reason why asm.js and LLVM can be faster: they get their speed from not having dynamic types, garbage collection (this specifically is what makes it possible to use ArrayBuffer for memory) and other high-level features.
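To make the ArrayBuffer point concrete, here is a minimal hand-written sketch in the asm.js style. It is not validated asm.js and not real Emscripten output; `allocDoubles`, `sumOfSquares` and the 1024-byte heap size are assumptions chosen for illustration. All data lives in one fixed buffer and is addressed by byte offset, so there are no objects for the garbage collector to manage.

```javascript
// One fixed block of raw memory, viewed as 64-bit floats.
const heapBuffer = new ArrayBuffer(1024);
const HEAPF64 = new Float64Array(heapBuffer);

let nextFreeByte = 0;

// Hypothetical bump allocator: hands out byte offsets and never frees,
// so the garbage collector has nothing to track.
function allocDoubles(count) {
  const ptr = nextFreeByte;
  nextFreeByte += count * 8; // 8 bytes per double
  if (nextFreeByte > heapBuffer.byteLength) throw new RangeError("out of heap");
  return ptr;
}

// asm.js-style coercions: `x | 0` marks an integer, `+x` marks a double.
function sumOfSquares(ptr, count) {
  ptr = ptr | 0;
  count = count | 0;
  let total = 0.0;
  for (let i = 0; (i | 0) < (count | 0); i = (i + 1) | 0) {
    const v = +HEAPF64[(ptr >> 3) + i]; // byte offset -> element index
    total = +(total + v * v);
  }
  return +total;
}

const vec = allocDoubles(3);
HEAPF64.set([3, 4, 12], vec >> 3);
const sumResult = sumOfSquares(vec, 3);
console.log(sumResult);
```

Because every value has a fixed numeric type and a fixed location, an engine has far less to guess about than it does with ordinary dynamic JavaScript objects.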
The bottom line is that there are no tools for compiling a language with Pythonic semantics into native and fast JavaScript code. And depending on what you mean by semantics, it is unlikely that such a thing will appear, since Python is a slow language in itself.
The best option right now for generating fast JavaScript is Emscripten. You could also consider LLJS or writing fast JavaScript by hand (Chrome has debugging tools for this).
Also, considering the title of your question you are very concerned about syntax. You should not be. When choosing the right language for the job the syntax is one of the least important factors.
Since you mentioned Shedskin yourself, I would imagine that you can share some of your experience (and explain what exactly, in your opinion, Shedskin is missing, apart from the fact that its input is a restricted Python grammar). I would also assume that Cython/Pyrex are not acceptable (due to grammar restrictions).
If Shedskin is too much in the alpha stage for you, then you might be looking for something like the Numba project, which includes a compiler from dynamic Python to LLVM, as well as llvm-py, which allows you to link LLVM-exposed bytecode (similar to how ctypes allows you to link shared libraries) and build LLVM IR compilers.
Here is an excerpt from the blog showing how one can use Numba as a JIT for numpy (incl. performance comparison with equivalent Cython code):
import numpy as np
from numba import double
from numba.decorators import jit

# Decorator arguments as given in the blog post; the Numba API has changed since.
@jit(arg_types=[double[:,:], double[:,:]])
def pairwise_numba(X, D):
    # X: (M, N) array of M points; D: (M, M) output array of distances
    M = X.shape[0]
    N = X.shape[1]
    for i in range(M):
        for j in range(M):
            d = 0.0
            for k in range(N):
                tmp = X[i, k] - X[j, k]
                d += tmp * tmp
            D[i, j] = np.sqrt(d)
Emscripten should allow you to expose and call your python -> llvm -> JS code as described here: https://github.com/kripken/emscripten/wiki/Interacting-with-code
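For comparison with the Numba kernel above, here is a hand-written JavaScript sketch of the same pairwise-distance loop using typed arrays. The function name `pairwiseDistances` and the flat row-major layout are assumptions made for this example; it is not output from Emscripten or any other compiler.

```javascript
// X is an M-by-N matrix stored row-major in a flat Float64Array;
// D receives the M-by-M matrix of Euclidean distances.
function pairwiseDistances(X, M, N, D) {
  for (let i = 0; i < M; i++) {
    for (let j = 0; j < M; j++) {
      let d = 0.0;
      for (let k = 0; k < N; k++) {
        const tmp = X[i * N + k] - X[j * N + k];
        d += tmp * tmp;
      }
      D[i * M + j] = Math.sqrt(d);
    }
  }
  return D;
}

// Three 2-D points: (0, 0), (3, 4), (6, 8).
const points = new Float64Array([0, 0, 3, 4, 6, 8]);
const distances = pairwiseDistances(points, 3, 2, new Float64Array(9));
console.log(Array.from(distances));
```

Keeping the data in flat `Float64Array` buffers avoids nested arrays of boxed numbers, which is the same memory layout idea that Emscripten-generated code relies on.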
