Java-based interpreter for JavaScript


As a school project I have to write a JavaScript interpreter. (Everything, including the entire backend, has to be written by me.)
Everything has to be written in Java; I use ANTLR for parsing and generating ASTs.
Currently I can parse some .js code into an AST, and now I need to translate this AST into some kind of intermediate representation that can be executed on a bytecode machine.
I have some experience writing compilers for statically typed languages, but I am very much in doubt about how to proceed from here, since JS is a dynamically typed language.
If you can give me some good advice on how to proceed, I would be grateful!
Personally I think I have to build the bytecode machine first and then make the IR fit this machine afterwards. Unfortunately I can't really find any good tutorials on how to write a bytecode machine.
PS. I'm familiar with the following books on the topic:
"Modern Compiler Implementation in Java" (Appel),
"Programming Language Processors in Java" (Watt & Brown),
"Language Implementation Patterns" (Parr)
Regards, Sune

If you only want to execute the JavaScript, you do not need to transform the AST into IR and then into bytecode, which would also force you to write a bytecode executor.
Why not just execute the JavaScript AST directly in a Java "engine"? You can store all values in a Map<String, Object> and interpret them as you walk the AST. Each function call gets a new environment/context (a new Map<...>).
If you cannot find a value in the current context, you fall back on the enclosing contexts and finally on the global context (also a Map).
For the "dynamic" behaviour: if you need a double for an addition, you simply parse the Object.toString() value into a double the standard way (more dynamic than that is hard to get :) ):
// look up the raw value, then coerce it to a number when arithmetic needs one
Object value = contextMap.get(key);
double dvalue = Double.parseDouble(value.toString());
....
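A minimal sketch of the environment lookup described above. It is written in JavaScript only to keep it short; in the Java engine each scope would wrap a Map<String, Object> instead. The `Scope` class and the node shapes (`num`, `var`, `add`, `let`) are made up for illustration and are not what ANTLR produces.

```javascript
// Hypothetical scope: a map of variables plus a link to the enclosing scope.
class Scope {
  constructor(parent = null) {
    this.vars = new Map();
    this.parent = parent;
  }
  define(name, value) {
    this.vars.set(name, value);
  }
  lookup(name) {
    // Walk outward until the name is found; the last scope is the global one.
    for (let s = this; s !== null; s = s.parent) {
      if (s.vars.has(name)) return s.vars.get(name);
    }
    throw new ReferenceError(name + " is not defined");
  }
}

// Evaluate a tiny, made-up AST by walking it recursively.
function evaluate(node, scope) {
  switch (node.type) {
    case "num":
      return node.value;
    case "var":
      return scope.lookup(node.name);
    case "add":
      // Numeric path only: both operands are coerced at run time.
      // Real JavaScript `+` concatenates if either operand is a string.
      return Number(evaluate(node.left, scope)) + Number(evaluate(node.right, scope));
    case "let": {
      // A new child scope, like the fresh Map a function call would get.
      const inner = new Scope(scope);
      inner.define(node.name, evaluate(node.value, scope));
      return evaluate(node.body, inner);
    }
    default:
      throw new Error("unknown node type: " + node.type);
  }
}

const globalScope = new Scope();
globalScope.define("x", "40"); // stored as a string on purpose

const ast = {
  type: "let",
  name: "y",
  value: { type: "num", value: 2 },
  body: {
    type: "add",
    left: { type: "var", name: "x" },  // found in the global scope
    right: { type: "var", name: "y" }, // found in the inner scope
  },
};
const result = evaluate(ast, globalScope);
console.log(result); // 42
```

A bytecode machine can still be added later; the scope chain stays the same and only the way nodes are visited changes.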

Related

Add syntax to JavaScript with Clojure

I want to add new syntax to my JavaScript files much like Sweet.js, but using Clojure/ClojureScript to do the transformation.
Here is what I would like to do:
//original
specialFunction add(a, b) {
  return a + b;
}
//transformed
function add(a, b) {
  return a + b;
}
registerSpecialFunction({
  name: "add",
  args: ["a", "b"],
  func: add
});
Correct me if I'm wrong, but I think it would work best by:
1) parse the JS file into an AST
2) do the transformation
3) print the resulting AST back out as JavaScript
Any idea on how to do parts 1 and 3? Do I even have the right idea here?

Just as a heads up, this may be a non-trivial task, but probably a great learning exercise.
You'll have to re-implement the concept of macros from Sweet.js or the idea of custom parser extensions from Babylon (the parser for Babel) if you want the same control from Clojure.
Either way, you'll need to write a parser that understands a superset of JavaScript's syntax. You might want to look at parser generators such as instaparse (Clojure) and peg.js (JavaScript).
Then you need to decide whether you want to make a fixed number of additions to the language grammar (like Babel) or allow macros to define their own grammar extensions/replacement rules (like Sweet.js). At this point, you'll need to write some kind of engine for transforming the AST generated by your parser.
Macros can be implemented in a number of ways, everything from replacement rules like you'd find in C and C++ to full blown compile-time evaluated functions that work directly with the AST like you'd find in Clojure.
After parsing and transforming the AST with this new tool, you'll need to transform it into a valid JavaScript AST. It'll make things easier to maintain compatibility with the ESTree specification as this will allow you to use tools like escodegen to actually generate the JavaScript code from the AST itself.
Of course, piggybacking on tools like peg.js and escodegen is only possible if you're writing your tool in ClojureScript and compiling and running it against Node.js. The other option is to find compatible tools within the JVM ecosystem and use them with JVM-compiled Clojure instead.
The JavaScript ecosystem has a range of good tools available for parsing, transforming and generating ES code (have a look through the Babel packages for example/inspiration) but you'll have to remember that if you are writing ClojureScript and running it under Node, you are in fact creating a JavaScript executable and it might have just been easier to start with JavaScript in the first place.
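To make the transformation step concrete, here is a rough sketch in plain JavaScript that works on an ESTree-shaped object. It assumes a custom parser has already turned `specialFunction` into an ordinary FunctionDeclaration node carrying a hypothetical `special: true` flag, and that all parameters are simple identifiers. The function names `makeRegistration` and `transform` are made up for this example.

```javascript
// Build the ESTree node for: registerSpecialFunction({ name, args, func })
function makeRegistration(fn) {
  const prop = (key, value) => ({
    type: "Property",
    key: { type: "Identifier", name: key },
    value,
    kind: "init",
    computed: false,
    method: false,
    shorthand: false,
  });
  return {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: { type: "Identifier", name: "registerSpecialFunction" },
      arguments: [{
        type: "ObjectExpression",
        properties: [
          prop("name", { type: "Literal", value: fn.id.name }),
          prop("args", {
            type: "ArrayExpression",
            // Assumes every parameter is a plain Identifier.
            elements: fn.params.map((p) => ({ type: "Literal", value: p.name })),
          }),
          prop("func", { type: "Identifier", name: fn.id.name }),
        ],
      }],
    },
  };
}

// Replace each flagged declaration with a plain function plus its registration call.
function transform(program) {
  const body = [];
  for (const node of program.body) {
    if (node.type === "FunctionDeclaration" && node.special) {
      const { special, ...plain } = node; // drop the hypothetical marker
      body.push(plain, makeRegistration(node));
    } else {
      body.push(node);
    }
  }
  return { ...program, body };
}

// Hand-written input standing in for what the custom parser would emit.
const input = {
  type: "Program",
  body: [{
    type: "FunctionDeclaration",
    special: true,
    id: { type: "Identifier", name: "add" },
    params: [{ type: "Identifier", name: "a" }, { type: "Identifier", name: "b" }],
    body: { type: "BlockStatement", body: [] },
  }],
};
const output = transform(input);
console.log(output.body.length); // 2
```

The resulting tree uses only standard ESTree node types, so it should be suitable input for an ESTree-based code generator; that last step is not shown here.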

How to add a custom class in v8?

Thank you for reading my question.
I am trying to embed Google V8 in my game engine.
Now I need to add a Bytes (or Buffer, as in Node.js) class to my program. I have read the implementation of Buffer in Node.js, but it is too ugly for my taste.
The class I want is like the bytes class in Python, which can be decoded to a string and encoded from a string, and which has its own operators such as '+' and '*'.
It seems the only way is to modify V8 itself?
I spent 2 days reading the V8 code, but it only got more and more confusing. For example, the String class has 2 declarations, one in v8.h and one in objects.h, and there are big differences between them. The terribly big macros are driving me crazy, too.
My question is the same as this one: "How to add a new class to Google V8?", but yiding thinks he/she does not need to modify V8.
So I am asking the same questions, too:
Where can I find guides about modifying V8's code?
Or where can I find docs about V8's design architecture?

Are there any javascript frameworks for parsing/auto-completing a domain specific language?

I have a grammar for a domain specific language, and I need to create a javascript code editor for that language. Are there any tools that would allow me to generate
a) a javascript incremental parser
b) a javascript auto-complete / auto-suggest engine?
Thanks!

An Example of implementing content assist (auto-complete)
using Chevrotain Javascript Parsing DSL:
https://github.com/SAP/chevrotain/tree/master/examples/parser/content_assist
Chevrotain was designed specifically to build parsers used (as part of) language services tools in Editors/IDEs.
Some of the relevant features are:
Automatic Error Recovery / Fault tolerance because editors and IDEs need to be able to handle 'mostly valid' inputs.
Every Grammar rule may be used as the starting rule as an Editor/IDE may only want to implement incremental parsing for performance reasons.

You may want jison, a JS parser generator. In terms of auto-complete / auto-suggest, most of the stuff out there that I know of is based on word completion rather than code completion. But once you have a parser running, I don't think that part is too difficult.

This is difficult. I'm doing the same sort of thing myself.
One approach is:
What you need is a parser which will give you an array of the currently possible ASTs for the text up until the token before the current cursor position.
From there you can see the next token can be of a number of types (usually just one), and do the completion, based on the partial text.
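The last step, going from "expected token types" to suggestions, might look roughly like this. The `candidatesByTokenType` table and the `complete` function are hypothetical; a real parser would supply the expected types, which are simply passed in here.

```javascript
// Hypothetical table: token type -> the literal texts it can produce.
const candidatesByTokenType = {
  Keyword: ["select", "set", "from", "where"],
  Operator: ["=", "<", ">"],
};

// Given the token types the parser would accept next and the text typed so far,
// return the matching suggestions in alphabetical order.
function complete(expectedTypes, partialText) {
  const prefix = partialText.toLowerCase();
  return expectedTypes
    .flatMap((type) => candidatesByTokenType[type] || [])
    .filter((text) => text.toLowerCase().startsWith(prefix))
    .sort();
}

console.log(complete(["Keyword"], "se")); // [ 'select', 'set' ]
```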
If I ever get my incremental parser working, I'll send a link.
Good luck, and let me know if you find a package which does this.
Chris.

Parse JavaScript to instrument code

I need to split a JavaScript file into single instructions. For example
a = 2;
foo()
function bar() {
  b = 5;
  print("spam");
}
has to be separated into three instructions. (assignment, function call and function definition).
Basically I need to instrument the code, injecting code between these instructions to perform checks. Splitting by ";" obviously wouldn't work, because you can also end instructions with newlines, and maybe I don't want to instrument code inside function and class definitions (I don't know yet). I took a course about grammars with flex/Bison, but in this case the semantic action for this rule would be "print all the descendants in the parse tree and put my code at the end", which I think can't be done with basic Bison. How do I do this? I also need to split the code because I need to interface with Python through python-spidermonkey.
Or... is there a library out there already which saves me from reinventing the wheel? It doesn't have to be in Python.

Why not use a JavaScript parser? There are lots, including a Python API for ANTLR and a Python wrapper around SpiderMonkey.
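Once a parser has produced an AST, the top-level statements are already separated, so instrumenting becomes a list operation. A small sketch, assuming the parser emits an ESTree-style Program node whose `body` array holds the statements; the hook name `__check` and the helper names are invented for this example.

```javascript
// Build an ESTree statement for: __check(index)  (hypothetical hook name)
function makeCheck(index) {
  return {
    type: "ExpressionStatement",
    expression: {
      type: "CallExpression",
      callee: { type: "Identifier", name: "__check" },
      arguments: [{ type: "Literal", value: index }],
    },
  };
}

// Insert a check after every top-level statement; function bodies are left alone.
function instrument(program) {
  const body = program.body.flatMap((stmt, i) => [stmt, makeCheck(i)]);
  return { ...program, body };
}

// Skeleton of the example from the question: assignment, call, function definition.
const program = {
  type: "Program",
  body: [
    { type: "ExpressionStatement" }, // a = 2;
    { type: "ExpressionStatement" }, // foo()
    { type: "FunctionDeclaration" }, // function bar() { ... }
  ],
};
const instrumented = instrument(program);
console.log(instrumented.body.length); // 6
```

Because only `program.body` is touched, the statements inside `bar` stay uninstrumented; recursing into function bodies would be a separate decision.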

JavaScript is tricky to parse; you need a full JavaScript parser.
The DMS Software Reengineering Toolkit can parse full JavaScript and build a corresponding AST.
AST operators can then be used to walk over the tree to "split it". Even easier, however, is to apply source-to-source transformations that look for one surface syntax (JavaScript) pattern and replace it with another. You can use such transformations to insert the instrumentation into the code, rather than splitting the code to make holes in which to do the insertions. After the transformations are complete, DMS can regenerate valid JavaScript code (complete with the original comments if unaffected).

Why not use an existing JavaScript interpreter like Rhino (Java) or python-spidermonkey (not sure whether this one is still alive)? It will parse the JS and then you can examine the resulting parse tree. I'm not sure how easy it will be to recreate the original code but that mostly depends on how readable the instrumented code must be. If no one ever looks at it, just generate a really compact form.
pyjamas might also be of interest; this is a Python to JavaScript transpiler.
[EDIT] While this doesn't solve your problem at first glance, you might use it for a different approach: Instead of instrumenting JavaScript, write your code in Python instead (which can be easily instrumented; all the tools are already there) and then convert the result to JavaScript.
Lastly, if you want to solve your problem in Python but can't find a parser: Use a Java engine to add comments to the code which you can then search for in Python to instrument the code.

Why not try a javascript beautifier?
For example http://jsbeautifier.org/
Or see Command line JavaScript code beautifier that works on Windows and Linux

Forget my parser. https://bitbucket.org/mvantellingen/pyjsparser is a great and complete parser. I've fixed a couple of its bugs here: https://bitbucket.org/nullie/pyjsparser

How to do localizable javascript?

I have a web application that uses TONS of javascript, and as usual, there are a lot of textual constants that will be displayed to the user embedded in the code itself.
What do you think is the best way to make this localizable?
I know I need to take those strings off of the code and replace them with constants, which will be defined into some external place.
For the server side, ASP.Net provides some very neat capabilities for dealing with this.
What's the best way to do this in JavaScript?
The best idea I have is to have a JS file with ALL the string constants of the site's JS (I'd have a different copy of this for each language), and then on each page, I include this script first, before all the others.
This seems like the most centralized way, that also wastes the least bandwidth.
Are there any other better approaches?
Thanks!

Here's how we did it (in ASP.NET), pretty much along the lines of what you've mentioned:
1) Created two JavaScript files: one which defines all JavaScript functions/DOM manipulations required by the site, and a second one called Messages.js, which defines all the string literals that need to be localized, something like var ALERT_MSG = "Alert message in english".
2) Created a different version of Messages.js for each locale that we support, and localized the strings. The localized JS files were named using the messages.locale.js naming convention (e.g. messages.fr-FR.js).
3) Included the js files within the "ScriptManager" and provided the ResourceUICultures for the Messages.js file: this ensures that the correct localized file is embedded in the html output (if you are not using ASP.net you can build this basic functionality by doing some culture sniffing and including the appropriate js file).
4) Voila!

Your approach makes sense. Details:
I'd have the strings for each language in an object.
localized={"cat":"chat","dog":"chien"};
Then in code:
localized["cat"]
The quotes around the keys and the bracket notation (rather than the more common object dot notation) are there to avoid collisions with JavaScript reserved words.
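A slightly fuller sketch of the same idea, with one table per language and a fallback when a key is missing. The `messages` layout, the `t` function and the `$1` placeholder convention are assumptions made for this example, not part of any particular library.

```javascript
// One table per language; in practice each would live in its own file.
const messages = {
  en: { cat: "cat", dog: "dog", greeting: "Hello, $1!" },
  fr: { cat: "chat", dog: "chien" },
};

// Own-property check, so keys like "toString" are not found on the prototype.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Look up a key in the requested language, fall back to English, then to the key itself.
// "$1", "$2", ... are replaced with the extra arguments.
function t(lang, key, ...params) {
  const table = has(messages, lang) ? messages[lang] : {};
  let template = key;
  if (has(table, key)) template = table[key];
  else if (has(messages.en, key)) template = messages.en[key];
  return template.replace(/\$(\d+)/g, (match, digits) => {
    const n = Number(digits);
    return n >= 1 && n <= params.length ? String(params[n - 1]) : match;
  });
}

console.log(t("fr", "cat"));             // chat
console.log(t("fr", "greeting", "Ana")); // Hello, Ana!  (falls back to English)
console.log(t("fr", "missing"));         // missing
```

Falling back to the key itself keeps untranslated strings visible during development instead of rendering as "undefined".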

There is a gettext library but I haven't used it.

Your approach sounds good enough.
If you have lots of strings and you are concerned about the bulkiness of the file you may want to consider a script that creates a single javascript file for each language by concatenating the code javascript and the locale javascript and then applying something like Minify.
You'll waste some CPU cycles on publishing but you'll save some round trips...

There's a library for localizing JavaScript applications: https://github.com/wikimedia/jquery.i18n
The strings are stored in JSON files, as pretty much everybody else suggests, but it has a few more features:
It can do parameter replacement, supports gender (clever he/she handling), number (clever plural handling, including languages that have more than one plural form), and custom grammar rules that some languages need.
The only requirement is jQuery.
