I am wondering how to do sort of transpilation of source to source using LLVM at a high level. Given that LLVM converts while loops to using br and the like, I am wondering how to then take that specific IR chunk and convert it back to a while loop in a language such as JavaScript.
C while loop -> LLVM IR -> JavaScript while loop
This article suggests that Emscripten converts LLVM code to JavaScript, so it probably handles this somewhere.
I'm wondering just the general strategy for converting it, if there is one. It seems a bit tricky from a distance, figuring out the statements to piece together a while loop from IR.
During the translation from C to LLVM IR, instructions can be decorated with metadata where needed. That metadata can then be used when converting LLVM IR to JavaScript, e.g. to indicate whether the circular branching between basic blocks is a while loop (this information is available in the C context). See Intrinsics & Metadata Attributes.
For more information regarding LLVM Metadata see LLVM-Metadata.
In Emscripten, the algorithm for recreating high-level control structures is called Relooping and is described in this paper. I'm not sure the information is still up to date, but it probably answers your question.
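To make the question concrete, here is a hand-written sketch (an assumption for illustration, not actual Emscripten output) of the fallback a relooper-style pass uses when it cannot recognize a loop: an outer while (1) plus a label variable selecting which "basic block" runs next, mirroring the br-based control flow in the IR:

```javascript
// Hypothetical: a C counter loop, lowered to basic blocks with conditional
// branches, re-expressed as a dispatcher. Each case plays the role of one
// IR basic block; "label" plays the role of the br target.
function sumToN(n) {
  let i = 0;
  let sum = 0;
  let label = 1;              // start at the loop header
  while (1) {
    switch (label) {
      case 1:                 // loop header: the IR's conditional br
        label = (i < n) ? 2 : 3;
        break;
      case 2:                 // loop body, then br back to the header
        sum += i;
        i += 1;
        label = 1;
        break;
      case 3:                 // exit block
        return sum;
    }
  }
}
```

The relooper's job is to notice that blocks 1 and 2 form a natural loop whose only exit is block 3, and to emit an idiomatic while (i < n) { sum += i; i += 1; } instead of the dispatcher.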
While using the heap-stats tool for V8, I found that memory is represented by some internal terms (possibly the V8-internal representations) under the JS sections.
Where can I find descriptions of what these are?
Also, what do these represent?
SCRIPT_SOURCE_EXTERNAL_TWO_BYTE_TYPE and SCRIPT_SOURCE_EXTERNAL_ONE_BYTE_TYPE under the Code section. Are these my source code? Why are they represented as two-byte and one-byte types separately?
The data types under the JS section.
Is there any documentation describing which (JS) data types they represent?
Where can I find descriptions of what these are?
They're "instance types", and they're essentially the way V8 distinguishes different types of (internal) objects on its heap. Being an internal implementation detail, they're not publicly described or documented. They can (and do!) change at any time, and unless you work on V8, you're not supposed to have a reason to care about them. They're defined here and here.
SCRIPT_SOURCE_EXTERNAL_TWO_BYTE_TYPE and SCRIPT_SOURCE_EXTERNAL_ONE_BYTE_TYPE under the Code section. Are these my source code?
Yes.
Why are they represented as two-byte and one-byte types separately?
Special-casing one-byte strings is a memory-saving optimization.
The data types under the JS section. Is there any documentation describing which (JS) data types they represent?
No, but in most cases it should be pretty obvious from their names, e.g. JS_ARRAY_BUFFER_TYPE is for ArrayBuffer objects.
data-types in V8 Compiler
To be nitpicky: this doesn't have anything to do with the compiler; it's a part of the internal object model.
I am creating a sorting comparison workbench using AWS to host sorting lambdas.
I have a bubble sort algorithm implemented in python (python3.8) and javascript (nodejs12.x) lambdas. Both have 512MB memory allocated.
When I run these against each other with array lengths N from 1 to 5500, I get the following graph, with N on the x axis and time taken in milliseconds on the y axis:
While I expect bubble sort to be slow, I didn't expect Python to be 100x slower than JavaScript. The maximum for JS is ~120 ms, versus ~11100 ms for Python.
Perhaps there is an AWS related explanation, or my implementation is very slow?
Update:
I switched the runtime from CPython to PyPy, and this reduced the bubble sort time by about 100x. The graphs are now much closer together, so the time difference was down to the Python implementation:
cpython and node differ in how the code is interpreted:
Python (the standard CPython) is compiled to bytecode and then interpreted, so the interpreter spends most of its time in a loop containing a big switch-case. (See https://github.com/python/cpython/blob/5f18c223391eef8c7d01241b51a7b2429609dd84/Python/ceval.c#L1622)
Node.js (using V8) and PyPy use just-in-time compilation. This kind of problem (tight loop, homogeneous types) is what JITs are best at: if the interpreter sees that some piece of code runs very often with the same types, the JIT generates and optimizes native code for it.
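The benchmarked implementation isn't shown in the question, but a typical bubble sort looks like the sketch below: a tight nested loop over numbers with homogeneous types, which is exactly the shape of code a JIT compiles down to fast native code.

```javascript
// Plain bubble sort: repeatedly swap adjacent out-of-order elements.
// The hot inner loop touches only numbers, so a JIT can specialize it.
function bubbleSort(arr) {
  const a = arr.slice();                    // work on a copy
  for (let i = 0; i < a.length - 1; i++) {
    for (let j = 0; j < a.length - 1 - i; j++) {
      if (a[j] > a[j + 1]) {
        const tmp = a[j];
        a[j] = a[j + 1];
        a[j + 1] = tmp;
      }
    }
  }
  return a;
}
```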
CPython has to compile the code to bytecode and then interpret it, which adds an extra layer of processing and therefore takes more time.
You can see more here
Node.js compiles the JavaScript down to native instructions, so the code runs much faster.
Of course, "the problem is that you are implementing a bubble(!) sort!"
Why the heck are you implementing your own 'sort algorithm,' when either of the languages that you are now using can already "sort things"?
"You are not in school anymore!"
I want to add new syntax to my JavaScript files much like Sweet.js, but using Clojure/ClojureScript to do the transformation.
Here is what I would like to do:
// original
specialFunction add(a, b) {
  return a + b;
}

// transformed
function add(a, b) {
  return a + b;
}

registerSpecialFunction({
  name: "add",
  args: ["a", "b"],
  func: add
});
Correct me if I'm wrong, but I think it would work best by:
1. parse the JS file into an AST
2. do the transformation
3. print the resulting AST back out as JavaScript
Any idea on how to do parts 1 and 3? Do I even have the right idea here?
Just as a heads up, this may be a non-trivial task, but probably a great learning exercise.
You'll have to re-implement the concept of macros from Sweet.js or the idea of custom parser extensions from Babylon (the parser for Babel) if you want the same control from Clojure.
Either way, you'll need to write a parser that understands a superset of JavaScript's syntax. You might want to look at parser generators such as instaparse (Clojure) and peg.js (JavaScript).
Then you need to decide whether you want to make a fixed number of additions to the language grammar (like Babel) or allow macros to define their own grammar extensions/replacement rules (like Sweet.js). At this point, you'll need to write some kind of engine for transforming the AST generated by your parser.
Macros can be implemented in a number of ways, everything from replacement rules like you'd find in C and C++ to full blown compile-time evaluated functions that work directly with the AST like you'd find in Clojure.
After parsing and transforming the AST with this new tool, you'll need to transform it into a valid JavaScript AST. It'll make things easier to maintain compatibility with the ESTree specification as this will allow you to use tools like escodegen to actually generate the JavaScript code from the AST itself.
Of course, piggybacking tools like peg.js and escodegen is only possible if you're writing your tool as ClojureScript and compiling and running it against NodeJS. The other option is to find compatible tools within the JVM ecosystem and use them with JVM compiled Clojure instead.
The JavaScript ecosystem has a range of good tools available for parsing, transforming and generating ES code (have a look through the Babel packages for example/inspiration) but you'll have to remember that if you are writing ClojureScript and running it under Node, you are in fact creating a JavaScript executable and it might have just been easier to start with JavaScript in the first place.
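As a toy illustration of the parse/transform/generate idea, here is a string-level sketch of the specialFunction rewrite from the question. A real tool would work on an AST (built with instaparse, peg.js, or similar) rather than a regex, which only handles this one simple shape (no nested braces):

```javascript
// Toy transform: rewrite "specialFunction name(args) { body }" into a
// plain function plus a registerSpecialFunction(...) call. Regex-based,
// so it breaks on nested braces; it is only meant to show the shape of
// the source-to-source step.
function transform(source) {
  return source.replace(
    /specialFunction\s+(\w+)\s*\(([^)]*)\)\s*\{([\s\S]*?)\}/g,
    (match, name, args, body) => {
      const argNames = args.split(',').map(s => s.trim()).filter(Boolean);
      return `function ${name}(${args}) {${body}}\n` +
             `registerSpecialFunction({\n` +
             `  name: "${name}",\n` +
             `  args: [${argNames.map(a => `"${a}"`).join(', ')}],\n` +
             `  func: ${name}\n` +
             `});`;
    }
  );
}
```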
JavaScript does not support operator overloading, so matrix libraries in JavaScript cannot offer simplified notation. I would like to get operator overloading with a simple trick, by allowing syntax like z = x++y. This is not a valid statement in JavaScript.
That is why I would like to create an include method that parses existing JavaScript files and replaces those statements with actual JavaScript code. This is somewhat related to CoffeeScript, where the compiler runs inside JavaScript. What would be the best way to approach this?
I have a string manipulation solution:
"z=x++y;".replace(/(.*)=(.*)\+\+(.*)/i,"for(var _i_=0;_i_<$1.length;_i_++){ $1[i] = $2[i]+$3[i]}")
Example run:
for(var _i_=0;_i_<c.length;_i_++){ c[i] = data[0][i]+data[1][i];}
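A minimal end-to-end sketch of this approach (assumptions: the loop bound here uses the first operand's length so the target array can start empty, and eval stands in for whatever the include method would do with the rewritten code):

```javascript
// Rewrite "z=x++y;" (element-wise add, not valid JS) into an explicit
// loop, then evaluate the generated code.
const src = "z=x++y;";
const js = src.replace(
  /(\w+)=(\w+)\+\+(\w+);/,
  "for(var _i_=0;_i_<$2.length;_i_++){$1[_i_]=$2[_i_]+$3[_i_];}"
);
var x = [1, 2, 3];
var y = [10, 20, 30];
var z = [];
eval(js);                 // z is now [11, 22, 33]
```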
Obtaining a MATLAB- or NumPy-like environment in JavaScript would be very convenient for easily deploying scientific models as web applications and avoiding the computational burden on the server side. Also, parallelization would be as easy as opening another browser tab.
As a school project I have to make a JavaScript interpreter. (Everything, including the entire backend, has to be made by me.)
Everything has to be written in Java; I use ANTLR for parsing and generating ASTs.
Currently I can parse some .js code into an AST, and I now need to translate this AST into some kind of intermediate representation that can be executed on a bytecode machine.
I have some experience writing compilers for statically typed languages, but I'm very much in doubt about how to proceed from here, since JS is a dynamically typed language.
If you can give me some good advice on how to proceed, I would be grateful!
Personally I think I have to make the bytecode machine first and then make the IR fit this machine afterwards. Unfortunately I can't really find any good tutorials on how to write a bytecode machine.
PS. I'm familiar with the following books on the topic:
"Modern Compiler Implementation in Java" (Appel),
"Programming Language Processors in Java" (Watt & Brown),
"Language Implementation Patterns" (Parr)
Regards, Sune
If you only want to execute the JavaScript, you do not need to transform the AST into IR and then into (some?) bytecode, which would also force you to write a bytecode executor.
Why not just execute the JavaScript AST in a Java "engine"? You can store all values in a Map<String, Object> and interpret them as you walk the AST. Each new function gets its own environment/context (a new Map<...>).
If you cannot find a value in the current context, you fall back on the global context (also a Map).
For the "dynamic" behaviour: if you need a double for an addition, you just parse the value's toString() result into a double in the standard way (more dynamic than that is hard to get :) ):
Object value = contextMap.get(key);   // the map is Map<String, Object>
Double dvalue = Double.parseDouble(value.toString());
....
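The idea is language-agnostic, so here is a minimal sketch of the context-chain lookup and AST walk (written in JavaScript for brevity; the project itself must of course be in Java, with contexts as Map<String, Object>). The AST node shapes here are made up for illustration:

```javascript
// Each context has its own variables plus a parent link; lookups that
// miss locally fall back to the enclosing (ultimately global) context.
function lookup(ctx, name) {
  while (ctx) {
    if (name in ctx.vars) return ctx.vars[name];
    ctx = ctx.parent;
  }
  throw new ReferenceError(name + " is not defined");
}

// Tiny tree-walking evaluator for a made-up AST shape.
function evaluate(node, ctx) {
  switch (node.type) {
    case "num":    return node.value;
    case "var":    return lookup(ctx, node.name);
    case "add":    // dynamic "+": coerce both sides to numbers
      return Number(evaluate(node.left, ctx)) +
             Number(evaluate(node.right, ctx));
    case "assign": // write into the current (innermost) context
      return (ctx.vars[node.name] = evaluate(node.value, ctx));
  }
}

// x lives in the global context; the function-local context falls back to it.
const globalCtx = { vars: { x: "2" }, parent: null };
const localCtx  = { vars: {}, parent: globalCtx };
evaluate({ type: "assign", name: "y",
           value: { type: "add",
                    left:  { type: "var", name: "x" },
                    right: { type: "num", value: 3 } } },
         localCtx);
// localCtx.vars.y === 5  ("2" coerced to 2, plus 3)
```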