Instead of having V8 compile JavaScript on the fly and then execute it, isn't it possible to just compile the JavaScript beforehand and then embed the machine code in the page instead of embedding JavaScript in the page?
There are two main problems with shipping machine code on the web:
Portability. No server can afford to provide appropriate machine code for all possible system architectures out there (present and future). E.g., V8 already supports 10 different CPU architectures.
Security. No client can afford to run random machine code on their machine without knowing if it can be trusted.
To address (1) you'd generally need to cross-compile machine code, which is more difficult and costly than compiling down from a high-level language. To address (2), you'd need to validate the machine code you receive, which is more difficult and costly than compiling a high-level language.
Machine code also tends to be much larger than high-level code, so there is a bandwidth issue as well.
Now, JavaScript may not be a particularly great choice of high-level language. But it is what we are stuck with as the language of the web.
The way I understand it, the V8 JavaScript engine compiles to machine code anyway, so why not just do it beforehand?
According to the W3C HTML5 Scripting specification, there's no standards-based reason why a browser couldn't support machine code with special type attributes (as Chrome does with the Dart language):
The following lists the MIME type strings that user agents must recognize, and the languages to which they refer:
"application/ecmascript"
"application/javascript"
...
User agents may support other MIME types for other languages...
Currently, no browser has implemented such a feature.
I suspect the primary shortcoming of such an approach is that each chip architecture would require a machine-code version of the script compiled for it specifically. This means that in order to support three architectures, a page would need to include a compiled script three times. (And it should be included a fourth time, as plain JavaScript, as a fallback for architectures that you didn't include, or for browsers that can't/don't support compiled code.) This could significantly bloat the size of the page with data that is mostly useless. The increase in load time would seem to significantly offset or completely outweigh whatever time you save on compilation.
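As a purely hypothetical sketch of that multiple-includes scheme, the loader below ships one payload per architecture plus a plain-JavaScript fallback; the "application/x-machine-code" type, the .bin payloads, and the detectArchitecture() helper are all invented for illustration, since no browser supports anything like this.

// Hypothetical sketch only: no browser recognizes "application/x-machine-code",
// and the .bin payloads are invented. The point is that the page must ship
// every architecture-specific payload plus a plain-JavaScript fallback.
const payloads = {
  "x86-64": "app.x86_64.bin",
  "arm64":  "app.arm64.bin",
  "mips":   "app.mips.bin",
};

// Stand-in for whatever mechanism would report the client's architecture;
// nothing like this exists for web pages today.
function detectArchitecture() {
  return null;
}

const arch = detectArchitecture();
const script = document.createElement("script");
if (arch !== null && arch in payloads) {
  script.type = "application/x-machine-code"; // scripts with unknown types are simply not executed
  script.src = payloads[arch];
} else {
  script.type = "application/javascript";     // the plain-JavaScript fallback
  script.src = "app.js";
}
document.head.appendChild(script);

Even with a loader like this, the server still has to build, host, and keep in sync one payload per architecture plus the JavaScript fallback, which is the size and maintenance problem described above.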
An architecture-independent compromise solution like bytecode seems pretty poor: you still need to include the script twice (once as bytecode, once as plain JavaScript for browsers that don't support it), and you still need to do some kind of run-time processing on the bytecode to turn it into machine code.
The multiple-includes-with-fallback problem is exactly why other scripting languages have not made it into the Web environment: they would need coordinated cross-vendor support to be useful. Google is trying with Dart, but it remains to be seen what degree of success they will have.
Note that Chrome does cache compiled versions of scripts so a script only needs to be compiled once and then the compiled code is cached for reuse when the user re-visits the page.
I recently started some web development, with ASP.NET and some JavaScript, and something is confusing me a lot.
I always read that JavaScript used to be interpreted until JIT slowly made it so chunks are compiled to machine code (which made browsers a lot faster).
This makes no sense to me. How can JavaScript compile to native machine code, if traditional JavaScript apps don't target the machine/CPU to begin with?
I understand if an electron.js app gets compiled to machine code using the NodeJS runtime. That I get. Because it natively compiles to machine code and as far as I understand it, doesn't run in a browser.
If traditional JavaScript apps run in a browser, why must they be compiled to machine code? The browser is responsible for running the code, not the CPU. The CPU runs the browser itself. I actually don't see how the native OS can influence anything that happens in the browser at all, or vice versa. Seems like a security issue as well.
Sorry if it's a stupid question, but I can't find any resource that will go beyond saying "Javascript uses JIT"
Thank you!
Lauren
At the end of the day, the CPU has to run the code.
JIT-compiling it down to machine code is one way to make that faster.
How can JavaScript compile to native machine code, if traditional JavaScript apps don't target the machine/CPU to begin with?
It is not "Javascript" that is doing it, it is the browser (or rather, the Javascript execution engine inside the browser), and since it is "JIT" it knowns exactly which CPU to target (this is not done in a generic way, this is done for the specific CPU that the browser is currently running on).
So, yes, there is some mismatch, since JavaScript does not use low-level primitive types that the CPU can work with directly, which is why there is a lot of indirection and speculative type-inference guesswork. The resulting machine code is much different from what you would get from hand-coded assembly, but it can still be a net positive. To help with this, WASM was developed, which is closer to "normal" machine code.
Other intermediate, non-CPU-specific formats like JVM bytecode, CLR bytecode, or LLVM bitcode are in a similar situation (in that they can also be compiled to machine code they do not themselves target directly) -- but they have already been "lowered" from language source code to something close to machine code.
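To make that "speculative type inference guesswork" concrete, here is a minimal sketch; the exact behavior varies by engine, so treat it as an illustration of the idea rather than a description of V8 internals.

// Nothing in the language fixes the types of a and b, so a JIT typically
// records the types it actually observes, emits fast machine code for that
// case, and has to fall back (deoptimize) if a later call breaks the assumption.
function add(a, b) {
  return a + b;
}

for (let i = 0; i < 100000; i++) {
  add(i, 1);      // only small integers observed: an integer add can be emitted
}

add("1", 1);      // now "+" means string concatenation; the optimized code's
                  // type assumption no longer holds and must be thrown away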
Seems like a security issue as well.
Yes, it can be. The browser has to be careful in what it is doing here, and the OS should sandbox the browser as much as possible.
Executing native instructions directly is cheaper than running an interpreter, and JIT seeks to take advantage of this for a performance boost. All programs running on your computer become machine code at some point; the only question is which instructions are executed.
let x = 0;
for (let i = 0; i < 100; ++i) {
  x += 2;
}
Since it is clear that there are no side effects in a block of code like this, it is faster to compile instructions directly, rather than interpreting each line.
# NIOS II assembly (sorry, it's the only one I know)
        movi  r2, 0          # x = 0
        movi  r3, 0          # i = 0
        movi  r4, 100        # loop bound
loop:
        addi  r2, r2, 2      # x += 2
        addi  r3, r3, 1      # ++i
        blt   r3, r4, loop   # repeat while i < 100
Executing this will be faster than executing the parsing logic for each individual instruction.
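To see where that per-instruction cost comes from, here is a toy sketch of an interpreter for the same loop; the instruction format is invented for the example.

// Toy sketch (invented instruction format): an interpreter pays the decode and
// dispatch cost on every single step, whereas the compiled loop above pays it once.
const program = [
  ["set", "x", 0],
  ["set", "i", 0],
  ["add", "x", 2],                 // loop body starts at index 2
  ["add", "i", 1],
  ["jump_if_less", "i", 100, 2],   // jump back to index 2 while i < 100
];

function run(program) {
  const vars = {};
  let pc = 0;
  while (pc < program.length) {
    const [op, a, b, target] = program[pc];  // decode: paid on every iteration
    switch (op) {                            // dispatch: paid on every iteration
      case "set": vars[a] = b; pc++; break;
      case "add": vars[a] += b; pc++; break;
      case "jump_if_less": pc = vars[a] < b ? target : pc + 1; break;
    }
  }
  return vars;
}

console.log(run(program)); // { x: 200, i: 100 }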
TLDR: All programs are always running CPU instructions, so it is faster to minimize the number of instructions by skipping the parsing stage when possible
As far as I understood, JavaScript cannot be compiled ahead of time because of its dynamic nature, so interpretation and just-in-time compilation happen at run time, and that affects JavaScript performance. This is where WebAssembly comes into the picture: languages can be compiled ahead of time into an intermediate format (WASM), which gives good performance since there is less runtime overhead.
My question is: why can't the JVM be used in place of the WebAssembly VM? Java is compiled into an intermediate format (bytecode). This bytecode can be given to the browser, and the JVM can execute it. The JVM also supports JIT, which helps to achieve near-native performance.
So what is the need for the new WebAssembly? Why can't the JVM be integrated into the browser to achieve high performance by leveraging the existing, most popular Java language?
There are a great many reasons why the JVM was not deemed a suitable runtime in place of WebAssembly ...
WebAssembly was designed with delivery over HTTP and in-browser execution in mind. For that reason, it supports streaming compilation, i.e. you can start compiling the code while it is still downloading (see the sketch after this list).
WebAssembly was designed to have rapid compilation times (resulting in web pages that load quickly), this is supported by having very simple validation rules compared to Java / JVM languages.
WebAssembly was designed with the concept of a 'host' environment, i.e. the browser.
WebAssembly was designed to be secure and simple, minimising the overall attack surface.
WebAssembly was designed to support a great many languages (C, C++, Rust, ...), whereas the JVM was initially designed for a single language, Java.
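As a rough illustration of the streaming-compilation point, here is a minimal sketch using the standard WebAssembly JavaScript API; the module name "module.wasm" and the empty import object are assumptions made for the example.

// Minimal sketch (assumes a browser with WebAssembly support and a module
// served at "module.wasm"): instantiateStreaming starts compiling while the
// HTTP response is still downloading instead of waiting for all the bytes.
async function loadModule() {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("module.wasm"),   // compilation overlaps with the download
    {}                      // import object; empty for this sketch
  );
  return instance.exports;  // exported Wasm functions are callable from JS
}

loadModule().then((exports) => console.log(Object.keys(exports)));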
As a general observation, WebAssembly was designed to support multiple languages on the web. The JVM was designed to support Java on the desktop. It doesn't make either one better than the other in a more general sense.
Finally, the JVM was integrated with the browser (Java Applets), but that didn't work out in the end!
A quote from the High-Level Goals of WebAssembly:
a Minimum Viable Product (MVP) for the standard with roughly the same functionality as asm.js, primarily aimed at C/C++;
So their original goal was running C/C++ programs in a web browser, not running Java code.
JVM can run:
JavaScript
Python (Jython)
Ruby (JRuby)
Groovy
Scala
C++ (using JNI)
Unfortunately, support for Java was removed from the browser because Sun (the former maintainer of Java) could not provide adequate support.
Just like Flash ended up losing, too.
How can I run compiled files directly using Google Native Client (PNaCl)? I tried checking their documentation. It said:
Native Client is a sandbox for running compiled C and C++ code in the browser efficiently and securely, independent of the user’s operating system.
But in their documentation, they only deal with the sources of the application. Is there any way to run compiled code directly? I want to run files with .exe and .deb extensions.
I'm not limiting the answer to Native Client. Any mechanism which can do that sort of work will work for me.
You can't run pre-compiled code within NaCl or PNaCl. You have to use the compilers provided by the SDK. There are three main reasons for this:
NaCl is an execution sandbox which relies on crafting machine code (x86-32, x86-64, ARM, MIPS) in a very particular way. This is regular machine code from the CPU's point of view, but allows the sandbox to run a validator and make sure that the code can't do anything malicious. This is called Software Fault Isolation, and is explained in this paper. The other ISA sandboxes are also documented.
PNaCl targets NaCl, but is an architecture-agnostic intermediate representation. This means that you ship what can be thought of as bytecode, and the browser figures out which type of machine code (x86-32, x86-64, ARM, MIPS) to generate based on the user's machine. The developer doesn't generate 4 binaries.
In both of the above cases, the code can execute as-is on Windows, Mac OS X, Linux, ChromeOS, and (while not usually shipping) Android. This means that the NaCl sandbox presents itself as an operating system and offers the same APIs everywhere. These APIs are different from those of other OSes, though they're pretty close to POSIX, especially if you use nacl_io.
The above points require that you use the compilers provided by the SDK.
It is technically possible to run binaries built for other architectures or operating systems, since the system is Turing-complete. That's what QEMU does, what Rosetta did, what Transmeta did, and what the Android Runtime for Chrome (ARC) enables. This usually requires binary translation and emulation of all operating system calls. It is technically difficult to implement and often has a severe performance cost. I do not recommend exploring this option.
As @JFBastien pointed out, emulation is the only option for executing pre-compiled native code in the browser environment. But it is an option nevertheless. Depending on your performance requirements, it might even be a viable one.
Click here, for example, to boot up an emulator running Windows (a very old version, though) in your browser.
From the menu pick, for example, notepad.exe (using the cursor down key on your keyboard) and hit enter. There you have it: an unmodified, precompiled, native notepad.exe running inside your browser! (and probably even faster than back in the day when this OS was new).
There are a lot of emulators written in Javascript all around the web. Running a small Linux distribution with usable performance and even with networking(!), graphics and sound is actually possible. Check out the OpenRISC emulator. You can even run an ssh daemon and log into it from your local machine!
Firefox has the SpiderMonkey JavaScript engine. Chrome has the V8 JavaScript engine.
Obviously those engines are separate products, and browsers use some kind of interface API to interact with them.
On the other hand, programmers have long wished for their favorite language in the browser. So much so that we have products like GWT (for Java), Parenscript (for Common Lisp), HJScript (for Haskell), and I'm sure many other libraries for many other languages that allow programmers to stay with their favorite language and generate client-side code as well.
The idea is so obvious that I am surprised there's no implementation of it yet. Why not publish the browser's interface API to the language engine and allow web sites to provide custom language engines as downloadable bundles? With current internet speeds, a one-time download of 3-4 megabytes is not a problem for the majority of applications, even more so for intranet usage.
So where are our pluggable engines?
You don't really need pluggable engines, just an agreed-upon bytecode format. Google is going down that path now with NaCl and PNaCl, which is based on LLVM. So any program that compiles down to a safe subset of LLVM bytecode could be run in the browser.
Browser vendors can't even agree on a common video format (see the HTML5 <video> debate) or on what the document DOM object should look like, and you want them to agree on a whole language interface?
Good luck.
I guess you forgot about applets and embeds. Both offer exactly what you want. And both suck for the very same reason.
We've been down this route in the past.
Older versions of IE supported VBScript as a scripting language in addition to JScript.
The result was a whole load of sites that only worked in IE.
This isn't what the web needs again. As a developer, I may desperately want to write code using my favourite language, but as a user I want to be able to browse all sites on the web without having to worry about which plug-ins I need for any given site, or whether my preferred browser can even use those plug-ins.
This is the problem that Microsoft's Silverlight has had. It might be a marvellous technology, but to the end user, why do I want another plug-in? Silverlight has managed to gain some market share thanks to the sheer power of Microsoft, but really not that much.
Now, if the code reaching the end user is consistent, regardless of the language it's written in, then the language doesn't matter. But this effectively means compiled code (or at least bytecode), which is a whole different kettle of fish to running a scripting language in the browser.
I would like to know how browsers allow viruses to pass through to our computers. The response we receive is a text response. The only executable thing in the response is JavaScript, which does not have many privileges, so what makes a browser allow certain files to be passed through to the computer?
The short list:
Browser plugins. ActiveX* in general and Flash in particular are notorious for having holes.
Buffer overflows. Forming either HTML pages or Javascript in a specific way can lead to being able to write anything you want into memory... which can then lead to remote execution.
Other errors. I recall bugs in the past where the browser could be tricked into downloading files into a known location, then execute them.
*Google is working on expanding this particular kind of hole to other browsers with Native Client.
Things like ActiveX controls allow native code to be executed on local machines with essentially full privileges. Most viruses propagate through known security holes in unpatched browsers and don't use Javascript directly.
Browser bugs and misconfiguration can allow sites that should be in the "Internet" (secure) security zone to execute code as if they were trusted. They can then use ActiveX components to install malware.
Exploiting software bugs. Commonly, when rendering images, interpreting HTML/CSS/JavaScript, or loading ActiveX components or Flash files.
Once a bug is exploited, the procedure is to inject "shell code" (a chunk of native compiled code), into the process memory to get executed.