Here is the code for a small syntax highlighter I made. It's supposed to search through every code tag in my .html file and make highlights to certain parts of it. Once this runs in the browser, I get this weird "keyword"> text in front of every line I have a keyword in. After some debugging, I still haven't came across a solution or a reason to why this problem occurs. I did notice that when I removed the ids from the code tag, the "keyword"> disappears, but the text still doesn't get highlighted.
If I code everything manually rather than using JavaScript to do it for me, everything works fine, but the problem is that the process is very time consuming and I have many more code snippets to do this for. Is there any way that I can get the matching keyword from keywords and include it in somehow? Is there a way to iterate over regular expressions like in arrays? Thanks in advance :)
image 1 (partial success):
I had to manually add span tags (removed unwanted text), but then highlighting doesn't work
image 2 (error image):
CSS code (inline)
.keyword {color: #c792ed;}
.identifier {color: #a6acce;}
.string {color: #6401e5;}
.int {color: #f78c6a;}
.comment {color: #69aa00;}
/*
#code0, #code1
^-----> ^-----> not defined
*/
Html code
<p>
<code id="code0">
let text = "I'd like 2 pizzas please"; // This is a string. It holds text characters
</code>
<br>
<code id="code1">
let ovenOn = true; // This is a Boolean. Its values are either 'true' or 'false'.
</code>
</p>
JavaScript code (inline)
let d = document;
let i = 0;
//----------------> Defining variables
function highlight() {
let tags = d.getElementsByTagName("code");
let reserved = "abstract arguments await boolean break byte case catch char class const continue debugger default delete do double else enum eval export extends false final finally float for function goto if implements import in instanceof int interface let long native new null package private protected public return short static super switch synchronized this throw throws transient true try typeof var void volatile while with yield";
for(i; i<tags.length; i++){
let code = tags[i].innerHTML;
let keywords = new RegExp(reserved.split(' ').join('|'));
// ^------- keywords = ["abstract", "await", "etc..."]
// ^------- then becomes: keywords = /abstract|await|etc.../
code = code.replace(keywords, '<span class="keyword">$1</span>');
// ^---> Should capture the keyword in keywords (fails)
// ^---> see image2
code = code.replace(/"(.*?)"/g, '<span class="string">"$1"</span>');
// ^---> captures strings and places contents between quotes (works)
code = code.replace(/\/\/(.*)/g,'<span class="comment">// $1 </span>');
// ^---> captures single-line comments (works)
tags[i].innerHTML = code;
}
}
window.addEventListener('load', highlight);
// ^-----> Calling highlight();
I fixed some things in the function.
function highlight() {
let tags = d.getElementsByTagName("code");
// I moved the reserved and keyword variables out the for loop so it will not be reinitialized for every code element
// Its best to wrap the reserved words in word boundaries so if the reserved word is inside another word, it won't get replaced.
// Example: 'with' as a reserved word will affect 'without'
let reserved = "\\b(abstract arguments await Boolean break byte case catch char class const continue debugger default delete do double else enum eval export extends false final finally float for function goto if implements import in instanceof int interface let long native new null package private protected public return short static super switch synchronized this throw throws transient true try typeof var void volatile while with yield)\\b";
//'I passed a global flag to the RegExp so all occurences of a reserved word will be replaced'
const keywords = new RegExp(reserved.split(' ').join('|'), 'g');
// The for loop doesn't run because i wasn't initialized.
for (let i = 0; i < tags.length; i++) {
let code = tags[i];
code = code.replace(keywords, `<span class="keyword">$&</span>`);
// After it the matches and replaces the reserved word, the class of the span tag is wrapped in quotes, so i fixed the regex to skip any match that is wrapped in the html tags <>
code = code.replace(/(?!.*>)"(.*?)"/g, '<span class="string">"$1"</span>');
code = code.replace(/\/\/(.*)/g, '<span class="comment">// $1 </span>');
tags[i].innerHTML = code;
}
}
// One of your examples should return this.
// <span class="keyword">let</span> text = <span class="string">"I'd like 2 pizzas please"</span>; <span class="comment">// This is a string. It holds text characters</span>
I think the problem is in how you are writing the HTML string. If you use a template literal (backtick) instead of quotes I think you get the functionality you need.
Here's an example which I think works?
let reserved =
'abstract arguments await boolean'; // have simplified
const keywords = new RegExp(reserved.split(' ').join('|'));
const sample = `<code>let a = new boolean(true);</code>`;
const example = `<span class="keyword">$1</span>`;
console.log(sample.replace(keywords, example));
/* Returns:
<code>
let a = new <span class="keyword">$1</span>(true);
</code>
*/
I am trying to create a JS multiline template literal like this:
function _on_message_arrived(_m) {
// Feedback.
console.log(
`FUNCTION: "_on_message_arrived()":
String (glyphs): ${_m.payloadString}
String (hex): ${_m.payloadBytes}`
);
}
But because there are tabs in 2nd and 3rd lines, these tabs are also printed in the browser's console (I colored them in red):
How can I format the JS source code so that it resembles what I get in multiple lines? That is when I am using tabs in the source code to indent the code and I am also using template literal.
You will have to resort to wrecking your code's indentation, unfortunately:
function _on_message_arrived(_m) {
// Feedback.
console.log(
`FUNCTION: "_on_message_arrived()":
String (glyphs): ${_m.payloadString}
String (hex): ${_m.payloadBytes}`
);
}
Get rid of whitespaces and try to make poor man's content management system.
const data = {
payloadString: '26.9',
payloadBytes: '50,54,46,57'
}
function _on_message_arrived(_m) {
const padStuff = (match, offset, string) => {
return Array(Number(match.replace('<', '').replace('>', ''))).fill('\t').join('');
}
// Feedback.
console.log(
`FUNCTION: "_on_message_arrived()":
<1>String (glyphs): ${_m.payloadString}
<2>String (hex): ${_m.payloadBytes}`.replace(/ +?/g, '').replace(/\<(?<year>\d+)\>/g, padStuff));
}
_on_message_arrived(data);
Using ANTLR4 v4.8
I am in the process of writing transpiler exploring use of ANTLR (javascript target with visitor).
Grammar -> lex/parse is fine and I now sit on parse tree.
Grammar
grammar Mygrammar;
/*
* parser rules
*/
progm : stmt+;
stmt
: progdecl
| print
;
progdecl : PROGDECLKW ID '..';
print : WRITEKW STRLIT '..';
/*
* lexer rules
*/
PROGDECLKW : 'DECLAREPROGRAM';
WRITEKW : 'PRINT';
// Literal
STRLIT : '\'' .*? '\'' ;
// Identifier
ID : [a-zA-Z0-9]+;
// skip
LINE_COMMENT : '*' .*? '\n' -> skip;
TERMINATOR : [\r\n]+ -> skip;
WS : [ \t\n\r]+ -> skip;
hw.mg
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
index.js
...
const myVisitor = require('./src/myVisitor').myVisitor;
const input = './src_sample/hw.mg';
const chars = new antlr4.FileStream(input);
...
parser.buildParseTrees = true;
const myVisit = new myVisitor();
myVisit.visitPrint(parser.print());
Use of visitor didn't seem straightforward, and this SO post helps to an extent.
On use of context. Is there a good way to track ctx, when I hit each node?
Using myVisit.visit(tree) as starting context is fine. When I start visiting each node, using non-root context
myVisit.visitPrint(parser.print()) throws me error.
Error:
PrintContext {
parentCtx: null,
invokingState: -1,
ruleIndex: 3,
children: null,
start: CommonToken {
source: [ [MygrammarLexer], [FileStream] ],
type: -1,
channel: 0,
start: 217,
together with exception: InputMismatchException [Error]
I believe it is because children is null instead of being populated.
Which, in turn, is due to
line 9:0 mismatched input '<EOF>' expecting {'DECLAREPROGRAM', 'PRINT'}
Question:
Is above the only way to pass the context or am I doing this wrong?
If the use is correct, then I incline towards looking at reporting this as bug.
edit 17.3 - added grammar and source
When you invoke parser.print() but feed it the input:
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
it will not work. For print(), the parser expects input like this PRINT 'Hello World!'... For the entire input, you will have to invoke prog() instead. Also, it is wise to "anchor" your starting rule with the EOF token which will force ANTLR to consume the entire input:
progm : stmt+ EOF;
If you want to parse and visit an entire parse tree (using prog()), but are only interested in the print node/context, then it is better to use a listener instead of a visitor. Check this page how to use a listener: https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md
EDIT
Here's how a listener works (a Python demo since I don't have the JS set up properly):
import antlr4
from playground.MygrammarLexer import MygrammarLexer
from playground.MygrammarParser import MygrammarParser
from playground.MygrammarListener import MygrammarListener
class PrintPreprocessor(MygrammarListener):
def enterPrint_(self, ctx: MygrammarParser.Print_Context):
print("Entered print: `{}`".format(ctx.getText()))
if __name__ == '__main__':
source = """
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
"""
lexer = MygrammarLexer(antlr4.InputStream(source))
parser = MygrammarParser(antlr4.CommonTokenStream(lexer))
antlr4.ParseTreeWalker().walk(PrintPreprocessor(), parser.progm())
When running the code above, the following will be printed:
Entered print: `PRINT'Hello World!'..`
So, in short: this listener accepts the entire parse tree of your input, but only "listens" when we enter the print parser rule.
Note that I renamed print to print_ because print is protected in the Python target.
Someone recently asked if there was a simple way to transform custom markup as follows, including nested markings. Examples included...
for \k[hello] the output will be <b>hello</b>
for \i[world], the output will be <em>world</em>
for hello \k[dear \i[world]], the output will be hello <b>dear <em>world</em></b>
for \b[some text](url), the output will be <a href=”url”>some text</a>
for \r[some text](url), the output will be <img alt=”some text” src=”url” />
Interestingly enough, transpiling the above to javascript, including consideration for nesting, is remarkably straightforward, particularly if the markup grammar is consistent.
//
// Define the syntax and translation to javascript.
//
const grammar = {
syntax: {
k: {markUp: `\k[`, javascript: `"+grammar.oneArg("k","`, pre: `<b>`, post: `</b>`},
i: {markUp: `\i[`, javascript: `"+grammar.oneArg("i","`, pre: `<em>`, post: `</em>`},
b: {markUp: `\b[`, javascript: `"+grammar.twoArgs("b","`, pattern: `$1`},
r: {markUp: `\r[`, javascript: `"+grammar.twoArgs("r","`, pattern: `<img alt="$1" src="$2"/>`},
close0: {markUp: `](`, javascript: `","`},
close1: {markUp: `)`, javascript: `")+"`},
close2: {markUp: `]`, javascript: `")+"`}
},
oneArg: function( command, arg1 ) {
return grammar.syntax[ command ].pre + arg1 + grammar.syntax[ command ].post;
},
twoArgs: function( command, arg1, arg2 ) {
return grammar.syntax[ command ].pattern.split( `$1` ).join( arg1 ).split( `$2` ).join( arg2 );
}
}
function transpileAndExecute( markUpString ) {
// Convert the markUp to javascript.
for ( command in grammar.syntax ) {
markUpString = markUpString.split( grammar.syntax[ command ].markUp ).join( grammar.syntax[ command ].javascript );
}
// With the markUp now converted to javascript, let's execute it!
return new Function( `return "${markUpString}"` )();
}
var markUpTest = `Hello \k[dear \i[world!]] \b[\i[Search:] \k[Engine 1]](http://www.google.com) \r[\i[Search:] \k[Engine 2]](http://www.yahoo.com)`;
console.log( transpileAndExecute( markUpTest ) );
Note that there are obviously pre-processing issues that must also be addressed, such as how to handle the inclusion of tokens in normal text. Eg, including a ']' as part of a text string will throw the transpiler a curve ball, so enforcing a rule such as using '\]' to represent a ']', and then replacing all such occurrences of '\]' with innocuous text before transpiling and then re-replacing afterwards solves this problem simply...
In terms of transpiling, using the grammar defined above, the following markup...
Hello \k[dear \i[world!]] \b[\i[Search:] \k[Engine 1]](http://www.google.com) \r[\i[Search:] \k[Engine 2]](http://www.yahoo.com)
...is transpiled to...
"Hello world! "+grammar.oneArg("k","dear "+grammar.oneArg("i","world")+"")+" "+grammar.twoArgs("b",""+grammar.oneArg("i","Search:")+" "+grammar.oneArg("k","Engine 1")+"","http://www.google.com")+" "+grammar.twoArgs("r",""+grammar.oneArg("i","Search:")+" "+grammar.oneArg("k","Engine 2")+"","http://www.yahoo.com")+""
...and once executed as a javascript Function, results in...
Hello <b>dear <em>world!</em></b> <em>Search:</em> <b>Engine 1</b> <img alt="<em>Search:</em> <b>Engine 2</b>" src="http://www.yahoo.com"/>
The real challenge though is the handling of syntax errors, particularly if one has large amounts of markup to transpile. The crystal clear answer by CertainPerformance (see Find details of SyntaxError thrown by javascript new Function() constructor ) provides a means of capturing the line number and character number of a syntax error from a dynamically compiled javascript function, but am not quite sure of the best means of mapping a syntax error of the transpiled code back to the original markup.
Eg, if an extra ']' is out of place (after "Goodbye")...
Hello World! \b[\i[Goodbye]]] \k[World!]]
...this transpiles to...
"Hello World! "+grammar.twoArgs("b",""+grammar.oneArg("i","Goodbye")+"")+"")+" "+grammar.oneArg("k","World!")+"")+""
^
...and CertainPerformance's checkSyntax function returns "Error thrown at: 1:76", as expected, marked above with the "^".
The question is, how to map this back to the original markup to aid in narrowing down the error in the markup? (Obviously in this case, it's simple to see the error in the markup, but if one has pages of markup being transpiled, then assistance in narrowing down the syntax error is a must.) Maintaining a map between the markup and the transpiled code seems tricky, as the transpiler is mutating the markup to javascript code step-by-step as it walks the grammar transformation matrix. My gut tells me there's a simpler way... Thanks for looking.
I would suggest you write a syntax checker, kinda like jsonlint or jslint etc... that checks if everything is checked and closed properly, before actually compiling the text to human readable text.
This allows for debugging, and prevents from malformed code running haywire, and allows you to provide an error highlighted document editor when they are editing the text.
Below a proof of concept which just checks if brackets are closed properly.
var grammarLint = function(text) {
var nestingCounter = 0;
var isCommand = char => char == '\\';
var isOpen = char => char == '[';
var isClose = char => char == ']';
var lines = text.split('\n');
for(var i = 0; i < lines.length; i++) {
text = lines[i];
for(var c = 0; c < text.length; c++) {
var char = text.charAt(c);
if(isCommand(char) && isOpen(text.charAt(c+2))) {
c += 2;
nestingCounter++;
continue;
}
if(isClose(char)) {
nestingCounter--;
if(nestingCounter < 0) {
throw new Error('Command closed but not opened at on line '+(i+1)+' char '+(c+1));
}
}
}
}
if(nestingCounter > 0) {
throw new Error(nestingCounter + ' Unclosed command brackets found');
}
}
text = 'Hello World! \\b[\\i[Goodbye]]] \\k[World!]]';
try {
grammarLint(text);
}
catch(e) {
console.error(e.message);
}
text = 'Hello World! \\b[\\i[Goodbye \\k[World!]]';
try {
grammarLint(text);
}
catch(e) {
console.error(e.message);
}
Chased down the ability to leverage the javascript compiler to capture syntax errors in the transpiled code, and reference this back to the original markup. In short, this involves a scheme of incorporating comments in the transpiled code to permit references back to the markup, providing the means of narrowing down the markup error. (There is a bit of shortcoming in that the error message is really a transpiler syntax error, and doesn't necessarily correspond exactly to the markup error, but gives one a fighting chance to figure out where the markup issue lies.)
This algorithm also leverages the concepts of CertainPerformance's technique ( Find details of SyntaxError thrown by javascript new Function() constructor ) of using setTimeout to capture the syntax errors of the transpiled code. I have interspersed a javascript Promise to smooth the flow.
"use strict";
//
// Define the syntax and translation to javascript.
//
class Transpiler {
static _syntaxCheckCounter = 0;
static _syntaxCheck = {};
static _currentSyntaxCheck = null;
constructor() {
this.grammar = {
syntax: {
k: {markUp: `\k[`, javascript: `"►+grammar.oneArg("k",◄"`, pre: `<b>`, post: `</b>`},
i: {markUp: `\i[`, javascript: `"►+grammar.oneArg("i",◄"`, pre: `<em>`, post: `</em>`},
b: {markUp: `\b[`, javascript: `"►+grammar.twoArgs("b",◄"`, pattern: `$1`},
r: {markUp: `\r[`, javascript: `"►+grammar.twoArgs("r",◄"`, pattern: `<img alt="$1" src="$2"/>`},
close0: {markUp: `](`, javascript: `"►,◄"`},
close1: {markUp: `)`, javascript: `"►)+◄"`},
close2: {markUp: `]`, javascript: `"►)+◄"`}
},
marker: { // https://www.w3schools.com/charsets/ref_utf_geometric.asp
begMarker: `►`, // 25ba
endMarker: `◄`, // 25c4
begComment: `◆`, // 25c6
endComment: `◇`, // 25c7
fillerChar: `●` // 25cf
},
oneArg: function( command, arg1 ) {
return this.syntax[ command ].pre + arg1 + this.syntax[ command ].post;
},
twoArgs: function( command, arg1, arg2 ) {
return this.syntax[ command ].pattern.split( `$1` ).join( arg1 ).split( `$2` ).join( arg2 );
}
};
};
static transpilerSyntaxChecker(err) {
// Uncomment the following line to disable default console error message.
//err.preventDefault();
let transpiledLine = Transpiler._syntaxCheck[ Transpiler._currentSyntaxCheck ].transpiledFunction.split(`\n`)[1];
let lo = parseInt( transpiledLine.substr( transpiledLine.substr( 0, err.colno ).lastIndexOf( `●` ) + 1 ) );
let hi = parseInt( transpiledLine.substr( transpiledLine.substr( err.colno ).indexOf( `●` ) + err.colno + 1 ) );
let markUpLine = Transpiler._syntaxCheck[ Transpiler._currentSyntaxCheck ].markUp;
let errString = markUpLine.substring( lo - 40, hi + 40 ).split(`\n`).join(`↵`) + `\n`;
errString += ( `.`.repeat( lo ) + `^`.repeat( hi - lo ) ).substring( lo - 40, hi + 40 );
Transpiler._syntaxCheck[Transpiler._currentSyntaxCheck].rejectFunction( new Error(`'${ err.message }' in transpiled code, corresponding to character range ${ lo }:${ hi } in the markup.\n${ errString }`) );
window.removeEventListener('error', Transpiler.transpilerSyntaxChecker);
delete Transpiler._syntaxCheck[Transpiler._currentSyntaxCheck];
};
async transpileAndExecute( markUpString ) {
// Convert the markUp to javascript.
console.log( markUpString );
let gm = this.grammar.marker;
let markUpIndex = markUpString;
let transpiled = markUpString;
for ( let n in this.grammar.syntax ) {
let command = this.grammar.syntax[ n ];
let markUpIndexSplit = markUpIndex.split( command.markUp );
let transpiledSplit = transpiled.split( command.markUp );
if ( markUpIndexSplit.length !== transpiledSplit.length ) {
throw `Ambiguous grammar when searching for "${ command.markUp }" to replace with "${ command.javascript }".`;
}
for ( let i = 0; i < markUpIndexSplit.length; i++ ) {
if ( i === 0 ) {
markUpIndex = markUpIndexSplit[ 0 ];
transpiled = transpiledSplit[ 0 ];
} else {
let js = command.javascript.replace( gm.begMarker, gm.begComment + gm.fillerChar + markUpIndex.length + gm.endComment );
markUpIndex += gm.fillerChar.repeat( command.markUp.length );
js = js.replace( gm.endMarker, gm.begComment + gm.fillerChar + markUpIndex.length + gm.endComment );
markUpIndex += markUpIndexSplit[ i ];
transpiled += js + transpiledSplit[ i ];
}
}
};
transpiled = transpiled.split( gm.begComment ).join( `/*` );
transpiled = transpiled.split( gm.endComment ).join( `*/` );
transpiled = `/*${ gm.fillerChar }0*/"${ transpiled }"/*${ gm.fillerChar }${ markUpIndex.length + 1 }*/`;
console.log( markUpIndex );
console.log( transpiled );
let self = this;
var id = ++Transpiler._syntaxCheckCounter;
Transpiler._syntaxCheck[id] = {};
let transpiledFunction = `"use strict"; if ( run ) return\n${ transpiled.split(`\n`).join(` `) }`;
Transpiler._syntaxCheck[id].markUp = markUpString;
Transpiler._syntaxCheck[id].transpiledFunction = transpiledFunction;
//
// Here's where it gets tricky. (See "CertainPerformance's" post at
// https://stackoverflow.com/questions/35252731
// for details behind the concept.) In this implementation a Promise
// is created, which on success of the JS compiler syntax check, is resolved
// immediately. Otherwise, if there is a syntax error, the transpilerSyntaxChecker
// routine, which has access to a reference to the Promise reject function,
// calls the reject function to resolve the promise, returning the error back
// to the calling process.
//
let checkSyntaxPromise = new Promise((resolve, reject) => {
setTimeout( () => {
Transpiler._currentSyntaxCheck = id;
window.addEventListener('error', Transpiler.transpilerSyntaxChecker);
// Perform the syntax check by attempting to compile the transpiled function.
new Function( `grammar`, `run`, transpiledFunction )( self.grammar );
resolve( null );
window.removeEventListener('error', Transpiler.transpilerSyntaxChecker);
delete Transpiler._syntaxCheck[id];
});
Transpiler._syntaxCheck[id].rejectFunction = reject;
});
let result = await checkSyntaxPromise;
// With the markUp now converted to javascript and syntax checked, let's execute it!
return ( new Function( `grammar`, `run`, transpiledFunction.replace(`return\n`,`return `) )( this.grammar, true ) );
};
}
Here are some sample runs with botched markup, and the corresponding console output. The following markup has an extra ]...
let markUp = `Hello World \k[Goodbye]] World`;
new Transpiler().transpileAndExecute( markUp ).then(result => console.log( result )).catch( err => console.log( err ));
...resulting in transpiled code of...
/*●0*/""/*●0*/+grammar.oneArg("i",/*●2*/"Hello World"/*●13*/)+/*●14*/" "/*●15*/+grammar.oneArg("k",/*●17*/""/*●17*/+grammar.oneArg("i",/*●19*/"Goodbye"/*●26*/)+/*●27*/" World"/*●34*/
Note the interspersed comments, which point back to the character position in the original markup. Then, when the javascript compiler throws an error, it is trapped by transpilerSyntaxChecker which uses the embedded comments to identify the location in the markup, dumping the following results to the console...
Uncaught SyntaxError: Unexpected token )
at new Function (<anonymous>)
at markUp.html:127
Error: 'Uncaught SyntaxError: Unexpected token )' in transpiled code, corresponding to character range 22:23 in the markup.
Hello World k[Goodbye]] World
......................^
at transpilerSyntaxChecker (markUp.html:59)
Note that the Unexpected token ) message refers to the transpiled code, not the markup script, but the output points to the offending ].
Here's another sample run, in this case missing a close ]...
let markUp = `\i[Hello World] \k[\i[Goodbye] World`;
new Transpiler().transpileAndExecute( markUp ).then(result => console.log( result )).catch(err => console.log( err ));
...which produces the following transpiled code...
/*●0*/""/*●0*/+grammar.oneArg("i",/*●2*/"Hello World"/*●13*/)+/*●14*/" "/*●15*/+grammar.oneArg("k",/*●17*/""/*●17*/+grammar.oneArg("i",/*●19*/"Goodbye"/*●26*/)+/*●27*/" World"/*●34*/
...throwing the following error...
Uncaught SyntaxError: missing ) after argument list
at new Function (<anonymous>)
at markUp.html:127
Error: 'Uncaught SyntaxError: missing ) after argument list' in transpiled code, corresponding to character range 27:34 in the markup.
i[Hello World] k[i[Goodbye] World
...........................^^^^^^^
at transpilerSyntaxChecker (markUp.html:59)
Maybe not the best solution, but a lazy man's solution. Tschallacka's response has merit (ie, a custom syntax checker or using something like Jison) in performing a true syntax check against the markup, without the setTimeout / Promise complexities nor the somewhat imprecise method of using the transpiler error messages to refer to the original markup...
For example, if you have an expression like this:
Expression<Func<int, int>> fn = x => x * x;
Is there anything that will traverse the expression tree and generate this?
"function(x) { return x * x; }"
It's probably not easy, but yes, it's absolutely feasible. ORMs like Entity Framework or Linq to SQL do it to translate Linq queries into SQL, but you can actually generate anything you want from the expression tree...
You should implement an ExpressionVisitor to analyse and transform the expression.
EDIT: here's a very basic implementation that works for your example:
Expression<Func<int, int>> fn = x => x * x;
var visitor = new JsExpressionVisitor();
visitor.Visit(fn);
Console.WriteLine(visitor.JavaScriptCode);
...
class JsExpressionVisitor : ExpressionVisitor
{
private readonly StringBuilder _builder;
public JsExpressionVisitor()
{
_builder = new StringBuilder();
}
public string JavaScriptCode
{
get { return _builder.ToString(); }
}
public override Expression Visit(Expression node)
{
_builder.Clear();
return base.Visit(node);
}
protected override Expression VisitParameter(ParameterExpression node)
{
_builder.Append(node.Name);
base.VisitParameter(node);
return node;
}
protected override Expression VisitBinary(BinaryExpression node)
{
base.Visit(node.Left);
_builder.Append(GetOperator(node.NodeType));
base.Visit(node.Right);
return node;
}
protected override Expression VisitLambda<T>(Expression<T> node)
{
_builder.Append("function(");
for (int i = 0; i < node.Parameters.Count; i++)
{
if (i > 0)
_builder.Append(", ");
_builder.Append(node.Parameters[i].Name);
}
_builder.Append(") {");
if (node.Body.Type != typeof(void))
{
_builder.Append("return ");
}
base.Visit(node.Body);
_builder.Append("; }");
return node;
}
private static string GetOperator(ExpressionType nodeType)
{
switch (nodeType)
{
case ExpressionType.Add:
return " + ";
case ExpressionType.Multiply:
return " * ";
case ExpressionType.Subtract:
return " - ";
case ExpressionType.Divide:
return " / ";
case ExpressionType.Assign:
return " = ";
case ExpressionType.Equal:
return " == ";
case ExpressionType.NotEqual:
return " != ";
// TODO: Add other operators...
}
throw new NotImplementedException("Operator not implemented");
}
}
It only handles lambdas with a single instruction, but anyway the C# compiler can't generate an expression tree for a block lambda.
There's still a lot of work to do of course, this is a very minimal implementation... you probably need to add method calls (VisitMethodCall), property and field access (VisitMember), etc.
Script# is used by Microsoft internal developers to do exactly this.
Take a look at Lambda2Js, a library created by Miguel Angelo for this exact purpose.
It adds a CompileToJavascript extension method to any Expression.
Example 1:
Expression<Func<MyClass, object>> expr = x => x.PhonesByName["Miguel"].DDD == 32 | x.Phones.Length != 1;
var js = expr.CompileToJavascript();
Assert.AreEqual("PhonesByName[\"Miguel\"].DDD==32|Phones.length!=1", js);
Example 2:
Expression<Func<MyClass, object>> expr = x => x.Phones.FirstOrDefault(p => p.DDD > 10);
var js = expr.CompileToJavascript();
Assert.AreEqual("System.Linq.Enumerable.FirstOrDefault(Phones,function(p){return p.DDD>10;})", js);
More examples here.
The expression has already been parsed for you by the C# compiler; all that remains is for you to traverse the expression tree and generate the code. Traversing the tree can be done recursively, and each node could be handled by checking what type it is (there are several subclasses of Expression, representing e.g. functions, operators, and member lookup). The handler for each type can generate the appropriate code and traverse the node's children (which will be available in different properties depending on which expression type it is). For instance, a function node could be processed by first outputting "function(" followed by the parameter name followed by ") {". Then, the body could be processed recursively, and finally, you output "}".
A few people have developed open source libraries seeking to solve this problem. The one I have been looking at is Linq2CodeDom, which converts expressions into a CodeDom graph, which can then be compiled to JavaScript as long as the code is compatible.
Script# leverages the original C# source code and the compiled assembly, not an expression tree.
I made some minor edits to Linq2CodeDom to add JScript as a supported language--essentially just adding a reference to Microsoft.JScript, updating an enum, and adding one more case in GenerateCode. Here is the code to convert an expression:
var c = new CodeDomGenerator();
c.AddNamespace("Example")
.AddClass("Container")
.AddMethod(
MemberAttributes.Public | MemberAttributes.Static,
(int x) => "Square",
Emit.#return<int, int>(x => x * x)
);
Console.WriteLine(c.GenerateCode(CodeDomGenerator.Language.JScript));
And here is the result:
package Example
{
public class Container
{
public static function Square(x : int)
{
return (x * x);
}
}
}
The method signature reflects the more strongly-typed nature of JScript. It may be better to use Linq2CodeDom to generate C# and then pass this to Script# to convert this to JavaScript. I believe the first answer is the most correct, but as you can see by reviewing the Linq2CodeDom source, there is a lot of effort involved on handling every case to generate the code correctly.