ctx in ANTLR4 JavaScript visitor

Using ANTLR4 v4.8
I am in the process of writing a transpiler and exploring the use of ANTLR (JavaScript target with a visitor).
Grammar -> lex/parse works fine, and I now have a parse tree.
Grammar
grammar Mygrammar;
/*
* parser rules
*/
progm : stmt+;
stmt
: progdecl
| print
;
progdecl : PROGDECLKW ID '..';
print : WRITEKW STRLIT '..';
/*
* lexer rules
*/
PROGDECLKW : 'DECLAREPROGRAM';
WRITEKW : 'PRINT';
// Literal
STRLIT : '\'' .*? '\'' ;
// Identifier
ID : [a-zA-Z0-9]+;
// skip
LINE_COMMENT : '*' .*? '\n' -> skip;
TERMINATOR : [\r\n]+ -> skip;
WS : [ \t\n\r]+ -> skip;
hw.mg
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
index.js
...
const myVisitor = require('./src/myVisitor').myVisitor;
const input = './src_sample/hw.mg';
const chars = new antlr4.FileStream(input);
...
parser.buildParseTrees = true;
const myVisit = new myVisitor();
myVisit.visitPrint(parser.print());
Using the visitor didn't seem straightforward, and this SO post helps to an extent.
On the use of context: is there a good way to track ctx when I hit each node?
Using myVisit.visit(tree) as the starting context works fine. But when I start visiting individual nodes with a non-root context,
myVisit.visitPrint(parser.print()) throws an error.
Error:
PrintContext {
parentCtx: null,
invokingState: -1,
ruleIndex: 3,
children: null,
start: CommonToken {
source: [ [MygrammarLexer], [FileStream] ],
type: -1,
channel: 0,
start: 217,
together with exception: InputMismatchException [Error]
I believe it is because children is null instead of being populated.
Which, in turn, is due to
line 9:0 mismatched input '<EOF>' expecting {'DECLAREPROGRAM', 'PRINT'}
Question:
Is the above the only way to pass the context, or am I doing this wrong?
If the usage is correct, I am inclined to report this as a bug.
edit 17.3 - added grammar and source

When you invoke parser.print() but feed it the input:
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
it will not work. For print(), the parser expects input that starts with a print statement, such as PRINT 'Hello World!'.. For the entire input, you have to invoke progm() instead. It is also wise to "anchor" your start rule with the EOF token, which forces ANTLR to consume the entire input:
progm : stmt+ EOF;
If you want to parse and visit an entire parse tree (using progm()) but are only interested in the print node/context, it is better to use a listener instead of a visitor. See this page for how to use a listener: https://github.com/antlr/antlr4/blob/master/doc/javascript-target.md
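If you prefer to keep a visitor, a common pattern is to start at the root (myVisit.visit(parser.progm())) and override only visitPrint: the default visitChildren descends through the tree and hands visitPrint a fully populated ctx for every print node. The sketch below imitates that dispatch with small hypothetical stand-in classes (Ctx, ProgmContext, StmtContext, PrintContext, BaseVisitor, PrintCollector) instead of the generated MygrammarParser/MygrammarVisitor code, so it runs without ANTLR; it illustrates the mechanism, not the runtime's exact API.

```javascript
// Hypothetical stand-ins for the generated context classes; the real ones
// (from MygrammarParser.js) also carry tokens, parentCtx, start/stop, etc.
class Ctx {
  constructor(children = []) {
    this.children = children;
  }
  // Rules without a dedicated visit method fall back to visitChildren.
  accept(visitor) {
    return visitor.visitChildren(this);
  }
}
class ProgmContext extends Ctx {}
class StmtContext extends Ctx {}
class PrintContext extends Ctx {
  constructor(text) {
    super([]);
    this.text = text;
  }
  getText() {
    return this.text;
  }
  accept(visitor) {
    return visitor.visitPrint(this);
  }
}

// Same shape as the runtime's visitor: visit() dispatches via accept(),
// visitChildren() recurses into every child, visitPrint() defaults to recursing.
class BaseVisitor {
  visit(ctx) {
    return ctx.accept(this);
  }
  visitChildren(ctx) {
    return ctx.children ? ctx.children.map((child) => this.visit(child)) : null;
  }
  visitPrint(ctx) {
    return this.visitChildren(ctx);
  }
}

// Only visitPrint is overridden; every other node is reached through visitChildren.
class PrintCollector extends BaseVisitor {
  constructor() {
    super();
    this.seen = [];
  }
  visitPrint(ctx) {
    this.seen.push(ctx.getText());
    return ctx.getText();
  }
}

// Roughly the tree parser.progm() would build for hw.mg (progdecl omitted).
const tree = new ProgmContext([
  new StmtContext([new PrintContext("PRINT'Hello World!'..")]),
]);
const collector = new PrintCollector();
collector.visit(tree); // start at the root, not at parser.print()
console.log(collector.seen);
```

The point of the design is that the ctx passed to visitPrint is always one the parser built while consuming the whole input, rather than the result of a fresh parser.print() call on input that does not start with a print statement.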
EDIT
Here's how a listener works (a Python demo, since I don't have the JS target set up properly):
import antlr4
from playground.MygrammarLexer import MygrammarLexer
from playground.MygrammarParser import MygrammarParser
from playground.MygrammarListener import MygrammarListener
class PrintPreprocessor(MygrammarListener):
    def enterPrint_(self, ctx: MygrammarParser.Print_Context):
        print("Entered print: `{}`".format(ctx.getText()))
if __name__ == '__main__':
    source = """
***************
* Hello world
***************
DECLAREPROGRAM hw..
PRINT 'Hello World!'..
"""
    lexer = MygrammarLexer(antlr4.InputStream(source))
    parser = MygrammarParser(antlr4.CommonTokenStream(lexer))
    antlr4.ParseTreeWalker().walk(PrintPreprocessor(), parser.progm())
When running the code above, the following will be printed:
Entered print: `PRINT'Hello World!'..`
So, in short: this listener accepts the entire parse tree of your input, but only "listens" when we enter the print parser rule.
Note that I renamed print to print_ because print is protected in the Python target.

Related

csv-parse is throwing Invalid Opening Quote: a quote is found inside a field at line

I know there are other posts out there, but none of them seem to fix my issue. I am using csv-parse with Node.js. This is the CSV header and record that I'm trying to parse.
sku,description,productUnitOfMeasure,unitCost,retailPrice,wholesalePrice,dropShipPrice,assemblyCost,planner,comments,productId,fileUpdate,SkuStatus,Master Planning Family,Category,Sub-Category,Brand,ShortCode,Import/Domestic,Inventory Value,Master Pack Quantity,Pallet / TI / HI,40HC Quantity,Product Group ID
032406021945-GreenBay,TFAL B2080264 INIT TNS GRY SAUTE PN 8",EA,7.72,13.99,0.00,0.00,0,Whitney Ehlke-2307,,032406021945,2022-01-25,New,COOKWARE,OPENSTOCK,NONE,T-FAL,B2080264,Domestic,208.44,3,0/0/0,0,23
I have no control over this file; I just need to be able to parse it. Note the double quote at the end of the description: TFAL B2080264 INIT TNS GRY SAUTE PN 8".
I need the double quote to stay there and the value to parse as one field. I keep getting this error:
Invalid Opening Quote: a quote is found inside a field at line 2.
The quote is not an opening quote; technically, it's a closing one. Regardless, it will not parse.
This is currently my code:
const parser = fs.createReadStream(filePath).pipe(
parse({ columns: true, relax_quotes: true, escape: '\\', ltrim: true, rtrim: true })
)
I have removed some of the options and tried others, to no avail. Any ideas?
This code works fine with the latest csv-parse version (5.0.4). Which version of the csv-parse package are you using? I ask because it looks like the option may have been renamed from relax to relax_quotes only recently.
So, I think the solution is either:
upgrade to the latest csv-parse, and indicate relax_quotes, or
stay with your current version of csv-parse, and indicate relax
Just to be sure relax_quotes works with the current library, I tested the following code and it worked as expected:
const csv = require('csv-parse');
const fs = require('fs');
const parser = fs.createReadStream("70880341.csv").pipe(
csv.parse({ columns: true, relax_quotes: true, escape: '\\', ltrim: true, rtrim: true })
)
const records = [];
parser.on('readable', function() {
let record;
while ((record = parser.read()) !== null) {
records.push(record);
}
});
parser.on('error', function(err) {
console.error(err.message);
});
parser.on('end', function() {
console.log(records);
});
Result:
[{
sku: '032406021945-GreenBay',
description: 'TFAL B2080264 INIT TNS GRY SAUTE PN 8"',
productUnitOfMeasure: 'EA',
...
}]
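To see why relax_quotes helps here, the following is a rough hand-written sketch of the rule it enables (csv-parse describes the option as preserving quotes inside unquoted fields): a quote only opens a quoted field when it is the first character of that field; anywhere else it is kept as plain text. splitRecord is a hypothetical helper for illustration, not part of csv-parse, and it ignores multi-line fields, the escape option and trimming.

```javascript
// Hypothetical helper, not part of csv-parse: splits one CSV record, treating a
// quote as a delimiter only when it is the first character of a field.
function splitRecord(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  let atFieldStart = true;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote of a quoted field
      } else {
        field += ch;
      }
    } else if (ch === '"' && atFieldStart) {
      inQuotes = true; // opening quote: only valid at the start of a field
      atFieldStart = false;
    } else if (ch === ',') {
      fields.push(field);
      field = '';
      atFieldStart = true;
    } else {
      field += ch; // a stray quote mid-field ends up here as literal text
      atFieldStart = false;
    }
  }
  fields.push(field);
  return fields;
}

const fields = splitRecord('032406021945-GreenBay,TFAL B2080264 INIT TNS GRY SAUTE PN 8",EA');
console.log(fields[1]); // TFAL B2080264 INIT TNS GRY SAUTE PN 8"
```

Without that relaxation, a strict parser has to reject the stray quote, which is the "Invalid Opening Quote" error from the question.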
For me, the full error message was: CsvError: Invalid Opening Quote: a quote is found on field 0 at line 1, value is "" (utf8 bom)
Adding the bom: true option to csv.parse fixed it.
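The "(utf8 bom)" part means the file starts with the byte order mark U+FEFF. Presumably it sits in front of the first header, so the quote that should open that field is no longer the field's first character. As a rough illustration of what bom: true takes care of, the hypothetical stripBom helper below removes the mark by hand; with csv-parse itself, the bom option is the simpler fix.

```javascript
// Hypothetical helper: drop a leading UTF-8 byte order mark (U+FEFF) so the
// first field really starts with its opening quote.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const raw = '\uFEFF"sku","description"\n';
console.log(raw.charCodeAt(0).toString(16)); // feff
console.log(stripBom(raw).startsWith('"sku"')); // true
```

Note that reading the file with fs.readFileSync(path, 'utf8') keeps the BOM as the first character, so it has to be removed either by the parser option or by hand.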

How to apply a regular expression in JavaScript

I am trying to get the message log from Azure Application Insights like this:
az monitor app-insights query --app [app id] --analytics-query [condition like specific message id]
Then I get a message like this:
"message": [
"Receiving message: {"type":"CTL","traceId":"f0d11b3dbf27b8fc57ac0e40c4ed9e48","spanId":"a5508acb0926fb1a","id":{"global":"GLkELDUjcRpP4srUt9yngY","caller":null,"local":"GLkELDUisjnGrSK5wKybht"},"eventVersion":"format version","timeStamp":"2021-10-01T14:55:59.8168722+07:00","eventMetadata":{"deleteTimeStamp":null,"ttlSeconds":null,"isFcra":null,"isDppa":true,"isCCPA":true,"globalProductId":null,"globalSubProductId":null,"mbsiProductId":null},"eventBody":{"sys":"otel","msg":"Testing Centralized Event Publisher with App1 (using logback)","app":{"name":"otel","service":"postHouse","status":"status name","method":"POST","protocol":"HTTP","resp_time_ms":"250","status_code":"4"},}}"
] }
I would like to apply a regular expression to this message to extract only the part from {"type"... to "status_code":"4"},}} and convert it to JSON.
I have code like this in my .js file
Then('extract json from {string}', function(message){
message = getVal(message, this);
const getmess = message.match(/{(.*)}/g);
const messJson = JSON.parse(getmess);
console.log(messJson);
})
But it doesn't work for me:
SyntaxError: Unexpected token \ in JSON at position 1
How can I do this in JavaScript? Thank you so much for your help.
Try this, but keep in mind that the regex is tied to the exact output format shown above. If the wrapper structure of the output changes, the regex may no longer work.
// Text from app
const STDOUT = `
"message": [ "Receiving message: {"type":"CTL","traceId":"f0d11b3dbf27b8fc57ac0e40c4ed9e48","spanId":"a5508acb0926fb1a","id":{"global":"GLkELDUjcRpP4srUt9yngY","caller":null,"local":"GLkELDUisjnGrSK5wKybht"},"eventVersion":"format version","timeStamp":"2021-10-01T14:55:59.8168722+07:00","eventMetadata":{"deleteTimeStamp":null,"ttlSeconds":null,"isFcra":null,"isDppa":true,"isCCPA":true,"globalProductId":null,"globalSubProductId":null,"mbsiProductId":null},"eventBody":{"sys":"otel","msg":"Testing Centralized Event Publisher with App1 (using logback)","app":{"name":"otel","service":"postHouse","status":"status name","method":"POST","protocol":"HTTP","resp_time_ms":"250","status_code":"4"},}}"
] }
`;
// Match JSON part string
let JSONstr = /.*\[\s*\"Receiving message:\s*(.*?)\s*\"\s*]\s*}\s*$/.exec(STDOUT)[1];
// Remove the trailing comma(s) after the last quoted value, e.g. `"4"},}}` -> `"4"}}}`
JSONstr = JSONstr.replace(/^(.*\")([^\"]+)$/, (s, m1, m2) => `${m1}${m2.replace(/,/g, "")}`);
// Convert to object
const JSONobj = JSON.parse(JSONstr);
// Result
console.log(JSONobj);
Try this one:
/.*?({"type":.*?,"status_code":"\d+"\})/
In JavaScript, the part inside the parentheses is capture group 1, i.e.:
const messJson = JSON.parse(message.match(/.*?({"type":.*?,"status_code":"\d+"\})/)[1]);
Reference here: https://regexr.com/66mf2
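A less format-dependent alternative, sketched under two assumptions: the payload runs from the first { to the last } of the message, and its quotes are not backslash-escaped (if the string still contains \" sequences, as the "Unexpected token \" error suggests, unescape them first). Trailing commas such as },} are not valid JSON, so they are removed before JSON.parse. extractJson is a hypothetical helper name, and the sample is a trimmed-down version of the message above.

```javascript
// Hypothetical helper: take everything between the first "{" and the last "}",
// drop trailing commas, then parse.
function extractJson(message) {
  const start = message.indexOf('{');
  const end = message.lastIndexOf('}');
  if (start === -1 || end < start) throw new Error('no JSON object found');
  const candidate = message.slice(start, end + 1);
  // Remove commas directly before a closing brace or bracket. Naive: it would
  // also alter a ",}" sequence that appears inside a string value.
  return JSON.parse(candidate.replace(/,\s*([}\]])/g, '$1'));
}

const sample = 'Receiving message: {"type":"CTL","eventBody":{"app":{"status_code":"4"},}}';
const obj = extractJson(sample);
console.log(obj.eventBody.app.status_code); // 4
```

Unlike a regex that stops at "status_code", slicing to the last } keeps the braces balanced, so the nested eventBody and outer objects are closed before parsing.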

JavaScript - multiline template literals when source code has tabs in front

I am trying to create a JS multiline template literal like this:
function _on_message_arrived(_m) {
	// Feedback.
	console.log(
		`FUNCTION: "_on_message_arrived()":
		String (glyphs): ${_m.payloadString}
		String (hex): ${_m.payloadBytes}`
	);
}
But because the 2nd and 3rd lines of the literal are indented with tabs, those tabs are also printed in the browser's console.
How can I format the JS source so that the output looks the way I want, while still indenting the code with tabs and using a template literal?
You will have to resort to wrecking your code's indentation, unfortunately:
function _on_message_arrived(_m) {
	// Feedback.
	console.log(
		`FUNCTION: "_on_message_arrived()":
String (glyphs): ${_m.payloadString}
String (hex): ${_m.payloadBytes}`
	);
}
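Alternatively, get rid of the source indentation and build a poor man's content management system: mark the wanted indentation with <n> placeholders and expand them into n tabs.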
const data = {
	payloadString: '26.9',
	payloadBytes: '50,54,46,57'
}
function _on_message_arrived(_m) {
	// Replace an "<n>" marker with n tab characters.
	const padStuff = (match, count) => '\t'.repeat(Number(count));
	// Feedback.
	console.log(
		`FUNCTION: "_on_message_arrived()":
		<1>String (glyphs): ${_m.payloadString}
		<2>String (hex): ${_m.payloadBytes}`
			.replace(/^[ \t]+/gm, '')          // strip only the leading source indentation
			.replace(/<(\d+)>/g, padStuff));   // expand the markers into tabs
}
_on_message_arrived(data);
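Another common option is a tag function for the template literal that removes the indentation shared by all continuation lines, so the source can stay indented. The dedent tag below is a hypothetical, minimal sketch (not a built-in): it measures indentation in the literal's static text only, inserts interpolated values unchanged, and leaves the first line alone.

```javascript
// Hypothetical `dedent` tag: strips the smallest common leading indentation
// from every line that starts inside the literal's static text.
function dedent(strings, ...values) {
  // Collect the indentation of each non-blank line that follows a newline.
  const indents = [];
  for (const chunk of strings) {
    chunk.split('\n').slice(1).forEach((line) => {
      if (line.trim() !== '') indents.push(line.match(/^[ \t]*/)[0].length);
    });
  }
  const common = indents.length ? Math.min(...indents) : 0;
  const strip = new RegExp(`\\n[ \\t]{0,${common}}`, 'g');
  // Remove up to `common` spaces/tabs after each newline, then interleave
  // the interpolated values between the cleaned chunks.
  return strings
    .map((chunk) => chunk.replace(strip, '\n'))
    .reduce((acc, chunk, i) => acc + values[i - 1] + chunk);
}

const payload = { payloadString: '26.9', payloadBytes: '50,54,46,57' };
const message = dedent`FUNCTION: "_on_message_arrived()":
    String (glyphs): ${payload.payloadString}
    String (hex): ${payload.payloadBytes}`;
console.log(message);
```

Inside _on_message_arrived, console.log(dedent`...`) would then print the continuation lines without the source tabs, whatever depth the surrounding code is indented to.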

Bug in rhino CodeGenerator Token.EXPR_RESULT?

I'm currently looking at the code for Rhino 1.7.5 and 1.7.6.
In CodeGenerator.java there is this code (line 380+):
case Token.EXPR_VOID:
case Token.EXPR_RESULT:
updateLineNumber(node);
visitExpression(child, 0);
addIcode((type == Token.EXPR_VOID) ? Icode_POP : Icode_POP_RESULT);
stackChange(-1);
break;
where child is (line 232):
Node child = node.getFirstChild();
ExpressionStatement is the node that triggers the case above, but it never calls addChildToBack(), which is what would set first.
So when the code above is executed, child is null and I get a NullPointerException in CodeGenerator.visitExpression(Node, int).
I can't see how this code could ever work. But at the same time, it's such a core feature that I can't imagine how people could have missed it for 6 years.
[EDIT] I managed to create a test case:
import static org.junit.Assert.*;
import org.junit.Test;
import org.mozilla.javascript.CompilerEnvirons;
import org.mozilla.javascript.Interpreter;
import org.mozilla.javascript.Parser;
import org.mozilla.javascript.ast.ScriptNode;
public class RhinoTest {
    @Test
    public void testCompileExpression() throws Exception {
        String expression = "row[\"COL_Col1\"]";
        CompilerEnvirons compilerEnv = new CompilerEnvirons();
        Parser p = new Parser( compilerEnv, compilerEnv.getErrorReporter() );
        ScriptNode script = p.parse( expression, null, 0 );
        Interpreter compiler = new Interpreter( );
        Object compiledOb = compiler.compile( compilerEnv, script, null, false );
        assertNotNull( compiledOb );
    }
}
If I run this, I get this exception:
java.lang.NullPointerException
at org.mozilla.javascript.CodeGenerator.visitExpression(CodeGenerator.java:497)
at org.mozilla.javascript.CodeGenerator.visitStatement(CodeGenerator.java:383)
at org.mozilla.javascript.CodeGenerator.visitStatement(CodeGenerator.java:276)
at org.mozilla.javascript.CodeGenerator.generateICodeFromTree(CodeGenerator.java:113)
at org.mozilla.javascript.CodeGenerator.compile(CodeGenerator.java:83)
at org.mozilla.javascript.Interpreter.compile(Interpreter.java:194)
at com.avanon.basic.birt.RhinoTest.testCompileExpression(RhinoTest.java:21)
With the introduction of the AST API, code generation needs an additional step to convert the "raw" parse tree into something suitable for codegen.
To fix the test case above, change the line:
ScriptNode script = p.parse( expression, null, 0 );
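into the following (this also needs imports for org.mozilla.javascript.IRFactory and org.mozilla.javascript.ast.AstRoot, and tree must then be passed to compiler.compile() instead of script):
AstRoot ast = p.parse( expression, null, 0 );
IRFactory irf = new IRFactory(compilerEnv, compilerEnv.getErrorReporter());
ScriptNode tree = irf.transformTree(ast);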
You can find an example of how to prepare the tree for codegen in Context.compileImpl().

How to get Abstract Syntax Tree (AST) out of JISON parser?

So I have generated a parser via JISON:
// mygenerator.js
var Parser = require("jison").Parser;
// a grammar in JSON
var grammar = {
"lex": {
"rules": [
["\\s+", "/* skip whitespace */"],
["[a-f0-9]+", "return 'HEX';"]
]
},
"bnf": {
"hex_strings" :[ "hex_strings HEX",
"HEX" ]
}
};
// `grammar` can also be a string that uses jison's grammar format
var parser = new Parser(grammar);
// generate source, ready to be written to disk
var parserSource = parser.generate();
// you can also use the parser directly from memory
// returns true
parser.parse("adfe34bc e82a");
// throws lexical error
parser.parse("adfe34bc zxg");
My question is: how do I retrieve the AST now? I can see that I can run the parser against input, but it just returns true if the input is valid and throws an error if not.
For the record, I am using JISON: http://zaach.github.com/jison/docs/
I discovered an easier and cleaner way than the one in the other answer.
This post is divided into 2 parts:
General way: Read how to implement my way.
Actual answer: An implementation of the previously described way specific to OP's request.
General way
Add a return statement to your start rule.
Example:
start
: xyz EOF
{return $1;}
;
xyz is another production rule. $1 accesses the value of the first symbol (either terminal or non-terminal) of the associated production rule. In the above code $1 contains the result from xyz.
Add $$ = ... statements to all other rules.
Warning: Use $$ = ..., don't return! return will immediately abort further execution by returning the specified value, as the name indicates.
Example:
multiplication
: variable '*' variable
{$$ = {
type: 'multiplication',
arguments: [
$1,
$3
]
};
}
;
The above production rule will pass the object $$ to the higher level (i.e. the production rule which used this rule).
Let's complete the grammar around the multiplication rule to get a runnable example:
/* lexical grammar */
%lex
%%
\s+ /* skip whitespace */
[0-9]+("."[0-9]+)?\b return 'NUMBER'
[a-zA-Z]+ return 'CHARACTER'
"*" return '*'
<<EOF>> return 'EOF'
. return 'INVALID'
/lex
%start start
%% /* language grammar */
start
: multiplication EOF
{return $1;}
;
multiplication
: variable '*' variable
{$$ = {
type: 'multiplication',
arguments: [
$1,
$3
]
};
}
;
variable
: 'NUMBER'
{$$ = {
type: 'number',
arguments: [$1]
};
}
| 'CHARACTER'
{$$ = {
type: 'character',
arguments: [$1]
};
}
;
You can try it online: http://zaach.github.io/jison/try/. At the time of this edit (12.02.2017), the online generator sadly throws an error, regardless of the Jison file you feed in. See the addendum below for hints on how to generate the parser on your local machine.
If you input, for example, a*3, you get the following object structure:
{
"type": "multiplication",
"arguments": [
{
"type": "character",
"arguments": ["a"]
},
{
"type": "number",
"arguments": ["3"]
}
]
}
Clean the code and generated AST by injecting custom objects
When using the Jison-generated parser, you can inject arbitrary objects into the scope of the 'code blocks' in the syntax file:
const MyParser = require('./my-parser.js');
MyParser.parser.yy = {
MultiplicationTerm
/*, AdditionTerm, NegationTerm etc. */
};
let calculation = MyParser.parse("3*4");
// Using the modification below, calculation will now be an object of type MultiplicationTerm
If MultiplicationTerm had a constructor accepting both factors, the new part for multiplication would look like this:
multiplication
: variable '*' variable
{$$ = new yy.MultiplicationTerm($1, $3);}
;
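For illustration, here is a class that could be injected this way. Only the name MultiplicationTerm comes from the answer; its fields (left, right), the evaluate method and the operandValue helper are assumptions, and the operands are taken to be the plain {type, arguments} objects produced by the variable rule.

```javascript
// Hypothetical node class for `$$ = new yy.MultiplicationTerm($1, $3);`.
class MultiplicationTerm {
  constructor(left, right) {
    this.type = 'multiplication';
    this.left = left;
    this.right = right;
  }
  // Evaluate against an environment that maps CHARACTER names to numbers.
  evaluate(env = {}) {
    return operandValue(this.left, env) * operandValue(this.right, env);
  }
}

// Resolves the plain objects built by the `variable` rule.
function operandValue(node, env) {
  if (node.type === 'number') return Number(node.arguments[0]);
  const name = node.arguments[0];
  if (!(name in env)) throw new Error(`unbound variable ${name}`);
  return env[name];
}

// What the parser action would construct for the input a*3:
const term = new MultiplicationTerm(
  { type: 'character', arguments: ['a'] },
  { type: 'number', arguments: ['3'] },
);
console.log(term.evaluate({ a: 2 })); // 6
```

It would be handed to the generated parser as in the snippet above, via MyParser.parser.yy = { MultiplicationTerm }; the benefit over plain objects is that behaviour such as evaluate lives on the AST nodes themselves.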
Addendum on how to create the Jison parser:
Install the jison npm module. You can then generate the parser either with Jison's command-line tool or by running new jison.Generator(fileContents).generate() in your build script and writing the returned string to a file of your choice, e.g. my-parser.js.
Actual answer
Applying the rules above leads to the Jison file below.
The Jison file format and the JavaScript API (as stated in the question) are interchangeable as far as I know.
Also note that this Jison file only produces a flat tree (i.e. a list) since the input format is only a list as well (or how would you nest concatenated hex strings in a logical way?).
/* lexical grammar */
%lex
%%
\s+ /* skip whitespace */
[a-f0-9]+ return 'HEX'
<<EOF>> return 'EOF'
. return 'INVALID'
/lex
%start start
%% /* language grammar */
start
: hex_strings EOF
{return $1;}
;
hex_strings
: hex_strings HEX
{$$ = $1.concat([$2]);}
| HEX
{$$ = [$1];}
;
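To make the propagation of $$ concrete, here is a hand simulation in plain JavaScript (no Jison involved) of what the two hex_strings actions compute. lexHex and parseHexStrings are hypothetical names; the regular expression mirrors the lexer rules above, and the loop follows the order in which an LR parser reduces the left-recursive rule, one reduction per additional HEX token.

```javascript
// Mirrors the lexer: skip whitespace, HEX for [a-f0-9]+, anything else is INVALID.
function lexHex(input) {
  const tokens = [];
  const re = /\s+|([a-f0-9]+)|(.)/g;
  let m;
  while ((m = re.exec(input)) !== null) {
    if (m[2] !== undefined) throw new Error(`INVALID token '${m[2]}' at ${m.index}`);
    if (m[1] !== undefined) tokens.push(m[1]);
  }
  return tokens;
}

function parseHexStrings(input) {
  const tokens = lexHex(input);
  if (tokens.length === 0) throw new Error('expected at least one HEX');
  // First reduction: hex_strings : HEX          {$$ = [$1];}
  let value = [tokens[0]];
  // Each further token: hex_strings : hex_strings HEX  {$$ = $1.concat([$2]);}
  for (const hex of tokens.slice(1)) value = value.concat([hex]);
  // start : hex_strings EOF {return $1;}
  return value;
}

console.log(parseHexStrings('adfe34bc e82a')); // [ 'adfe34bc', 'e82a' ]
```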
I'm not too familiar with Jison's inner workings, so I don't know of a built-in method that would do it.
But in case you're interested in a little brute force to solve this problem, try this:
First, create an object to hold the AST:
function jisonAST(name, x) { this.name = name; this.x = x; }
// return the indented AST
jisonAST.prototype.get = function(indent){
  // create an indentation for level l
  function indentString(l) { var r = ""; for (var i = 0; i < l; i++) { r += " "; } return r; }
  var r = indentString(indent) + "[" + this.name + ": ";
  var rem = this.x;
  if (rem.length == 1 && !(rem[0] instanceof jisonAST)) r += "'" + rem[0] + "'";
  else for (var i = 0; i < rem.length; i++) {
    if (rem[i] instanceof jisonAST) r += "\n" + rem[i].get(indent + 1);
    else { r += "\n" + indentString(indent + 1); r += "'" + rem[i] + "'"; }
  }
  return r + "]";
}
Add a little helper function for Jison's BNF:
function o( s ){
  var r = "$$ = new yy.jisonAST('"+s+"',[";
  for( var i = 1; i <= s.split(" ").length; i++ ){ r += "$"+i+"," }
  r = r.slice(0,-1) + "]);";
  return [s,r];
}
With this, continue to the example code (slight modification):
var Parser = require("jison").Parser;
// a grammar in JSON
var grammar = {
"lex": {
"rules": [
["\\s+", "/* skip whitespace */"],
["[a-f0-9]+", "return 'HEX';"]
]
},
"bnf": {
// had to add a start/end, see below
"start" : [ [ "hex_strings", "return $1" ] ],
"hex_strings" :[
o("hex_strings HEX"),
o("HEX")
]
}
};
var parser = new Parser(grammar);
// expose the AST object to Jison
parser.yy.jisonAST = jisonAST
Now you can try parsing:
console.log( parser.parse("adfe34bc e82a 43af").get(0) );
This will give you:
[hex_strings HEX:
 [hex_strings HEX:
  [HEX: 'adfe34bc']
  'e82a']
 '43af']
Small note: I had to add a "start" rule so that only one statement returns the result. It is not clean (the BNF works fine without it), so set it as the entry point to be sure.
