Using Rhino parser in javascript code to parse strings in javascript


I am new to the Rhino parser. Can I use Rhino from JavaScript code to extract the Abstract Syntax Tree of the JavaScript code in any HTML file? If so, how should I start? The goal is to analyze the AST in order to compute the ratio between keywords and words used in the JavaScript, to identify common decryption schemes, and to count occurrences of certain classes of function calls, such as fromCharCode(), eval(), and some string functions that are commonly used for the decryption and execution of drive-by-download exploits.

As far as I know, you can't access the AST from JavaScript in Rhino. I would look at the Esprima parser though. It's a complete JavaScript parser written in JavaScript and it has a simple API for doing code analysis.
Here's a simple example that calculates the keyword to identifier ratio:
// script holds the JavaScript source to analyze (e.g. the text of a <script> element)
var tokens = esprima.parse(script, { tokens: true }).tokens;
var identifierCount = 0;
var keywordCount = 0;

tokens.forEach(function (token) {
    if (token.type === 'Keyword') {
        keywordCount++;
    }
    else if (token.type === 'Identifier') {
        identifierCount++;
    }
});

var ratio = keywordCount / identifierCount;
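For the fromCharCode()/eval() part of the question, the same parse result can be walked to count call expressions. The sketch below assumes an ESTree-shaped tree, such as the one esprima.parseScript(source) returns; the helper names calleeName and countCalls and the set of "suspicious" function names are hypothetical choices for illustration. To stay self-contained it runs on a hand-written AST for eval(String.fromCharCode(104, 105)) instead of calling Esprima.

```javascript
// Names treated as "suspicious" are an assumption; adjust to your own list.
const SUSPICIOUS_CALLS = new Set(['eval', 'fromCharCode', 'unescape', 'escape', 'charCodeAt', 'atob']);

// Return the called function's name, or null when it cannot be determined statically.
function calleeName(callee) {
  if (callee.type === 'Identifier') return callee.name; // eval(...)
  if (callee.type === 'MemberExpression' && !callee.computed &&
      callee.property.type === 'Identifier') {
    return callee.property.name; // String.fromCharCode(...)
  }
  return null; // e.g. window['ev' + 'al'](...) is not counted
}

// Visit every node of an ESTree AST and count calls whose name is in `names`.
function countCalls(ast, names = SUSPICIOUS_CALLS) {
  const counts = {};
  (function visit(node) {
    if (Array.isArray(node)) { node.forEach(visit); return; }
    if (node === null || typeof node !== 'object') return;
    if (node.type === 'CallExpression') {
      const name = calleeName(node.callee);
      if (name !== null && names.has(name)) counts[name] = (counts[name] || 0) + 1;
    }
    for (const key of Object.keys(node)) visit(node[key]); // callee, arguments, body, ...
  })(ast);
  return counts;
}

// Hand-written ESTree for: eval(String.fromCharCode(104, 105))
const obfuscatedAst = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: 'eval' },
      arguments: [{
        type: 'CallExpression',
        callee: {
          type: 'MemberExpression', computed: false,
          object: { type: 'Identifier', name: 'String' },
          property: { type: 'Identifier', name: 'fromCharCode' },
        },
        arguments: [{ type: 'Literal', value: 104 }, { type: 'Literal', value: 105 }],
      }],
    },
  }],
};

console.log(countCalls(obfuscatedAst)); // { eval: 1, fromCharCode: 1 }
```

Calls whose target is computed at run time (string concatenation, bracket access) are exactly what obfuscated code tends to use, so a real analysis would likely also flag computed member calls rather than ignore them.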

Related

Creating a DSL expressions parser / rules engine

I'm building an app which has a feature for embedding expressions/rules in a config YAML file. For example, a user can reference a variable defined in the YAML file like ${variables.name == 'John'} or ${is_equal(variables.name, 'John')}. I can probably get by with simple expressions, but I want to support complex rules/expressions such as ${variables.name == 'John'} and (${variables.age > 18} OR ${variables.adult == true}).
I'm looking for a parsing/DSL/rules-engine library that can support these types of expressions and normalize them. I'm open to using Ruby, JavaScript, Java, or Python if anyone knows of a library for those languages.
One option I thought of was to just support JavaScript as conditions/rules and basically pass it through eval with the right context set up, with access to variables and other referenceable vars.

I don't know whether you use Go, but if you do, I recommend https://github.com/antonmedv/expr.
I have used it to parse the strategy of a stock options bot. This is from my unit test:
package strategy // hypothetical package name

import (
	"regexp"
	"strings"
	"testing"
)

func TestPattern(t *testing.T) {
	a := "pattern('asdas asd 12dasd') && lastdigit(23asd) < sma(50) && sma(14) > sma(12) && ( macd(5,20) > macd_signal(12,26,9) || macd(5,20) <= macd_histogram(12,26,9) )"

	// Find every indicator call such as sma(50) or macd(5,20)
	r := regexp.MustCompile(`(\w+)(\s+)?[(]['\d.,\s\w]+[)]`)
	indicator := r.FindAllString(a, -1)
	t.Logf("%v\n", indicator)
	t.Logf("%v\n", len(indicator))

	for _, i := range indicator {
		t.Logf("%v\n", i)
		if strings.HasPrefix(i, "pattern") {
			// Extract the quoted argument of pattern('...')
			r = regexp.MustCompile(`pattern(\s+)?\('(.+)'\)`)
			check1 := r.ReplaceAllString(i, "$2")
			t.Logf("%v\n", check1)
			r = regexp.MustCompile(`[^du]`)
			check2 := r.FindAllString(check1, -1)
			t.Logf("%v\n", len(check2))
		} else if strings.HasPrefix(i, "lastdigit") {
			// Extract the argument of lastdigit(...)
			r = regexp.MustCompile(`lastdigit(\s+)?\((.+)\)`)
			args := r.ReplaceAllString(i, "$2")
			r = regexp.MustCompile(`[^\d]`)
			parameter := r.FindAllString(args, -1)
			t.Logf("%v\n", parameter)
		}
	}
}
Combine it with regular expressions and you have a good (if not great) string translator.
For Java, I personally use https://github.com/ridencww/expression-evaluator, though not in production. It has features similar to the library linked above.
It supports many conditions, and you don't have to worry about parentheses and brackets:
Assignment   =
Operators    + - * / DIV MOD % ^
Logical      < <= == != >= > AND OR NOT
Ternary      ? :
Shift        << >>
Property     ${<id>}
DataSource   #<id>
Constants    NULL PI
Functions    CLEARGLOBAL, CLEARGLOBALS, DIM, GETGLOBAL, SETGLOBAL, NOW, PRECISION
Hope it helps.

You might be surprised to see how far you can get with a syntax parser and 50 lines of code!
Check this out. The Abstract Syntax Tree (AST) on the right represents the code on the left in nice data structures. You can use these data structures to write your own simple interpreter.
I wrote a little example of one:
https://codesandbox.io/s/nostalgic-tree-rpxlb?file=/src/index.js
Open up the console (button at the bottom), and you'll see the result of the expression!
This example can only handle || and >, but looking at the code (line 24), you can see how you could make it support any other JS operator: just add a case to the branch, evaluate both sides, and do the calculation in JS.
Parentheses and operator precedence are all handled by the parser for you.
I'm not sure if this is the solution for you, but it will for sure be fun ;)
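The core of such an interpreter is a switch over node types and operators. The sketch below is a hypothetical evaluate function over ESTree-shaped nodes (the format that parsers such as Acorn and Esprima produce), not the code from the sandbox link. The AST is written by hand for age > 18 || adult so the example has no parser dependency; only the listed node types and operators are supported, and anything else throws, which is also what keeps arbitrary code from running.

```javascript
// Evaluate an ESTree-shaped expression AST against a scope of variables.
function evaluate(node, scope) {
  switch (node.type) {
    case 'Literal':
      return node.value;
    case 'Identifier':
      if (!Object.prototype.hasOwnProperty.call(scope, node.name)) {
        throw new Error(`Unknown variable: ${node.name}`);
      }
      return scope[node.name];
    case 'LogicalExpression': {
      const left = evaluate(node.left, scope);
      if (node.operator === '||') return left || evaluate(node.right, scope);
      if (node.operator === '&&') return left && evaluate(node.right, scope);
      break;
    }
    case 'BinaryExpression': {
      const left = evaluate(node.left, scope);
      const right = evaluate(node.right, scope);
      switch (node.operator) {
        case '>': return left > right;
        case '<': return left < right;
        case '===': return left === right;
        case '+': return left + right;
        // add more operators here, one case each
      }
      break;
    }
  }
  throw new Error(`Unsupported syntax: ${node.type} ${node.operator || ''}`);
}

// Hand-written AST for: age > 18 || adult
const ruleAst = {
  type: 'LogicalExpression', operator: '||',
  left: {
    type: 'BinaryExpression', operator: '>',
    left: { type: 'Identifier', name: 'age' },
    right: { type: 'Literal', value: 18 },
  },
  right: { type: 'Identifier', name: 'adult' },
};

console.log(evaluate(ruleAst, { age: 16, adult: false })); // false
console.log(evaluate(ruleAst, { age: 21, adult: false })); // true
```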

One option I thought of was to just support JavaScript as conditions/rules and basically pass it through eval with the right context set up, with access to variables and other referenceable vars.
I would personally lean towards something like this. If you are getting into complexities such as logic comparisons, a DSL can become a beast since you are basically almost writing a compiler and a language at that point. You might want to just not have a config, and instead have the configurable file just be JavaScript (or whatever language) that can then be evaluated and then loaded. Then whoever your target audience is for this "config" file can just supplement logical expressions as needed.
The only reason I would not do this is if this configuration file was being exposed to the public or something, but in that case security for a parser would also be quite difficult.
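If the audience is trusted and the app runs on Node.js, the built-in vm module can evaluate such JavaScript rules with only the intended variables in scope. A minimal sketch, assuming Node.js; evaluateRule is a hypothetical helper name. The Node.js documentation states that vm is not a security mechanism, which is consistent with the caveat above about configuration exposed to the public.

```javascript
const vm = require('vm');

// Evaluate a JavaScript rule string with only `variables` visible as a global.
// Not a sandbox: use only for trusted configuration.
function evaluateRule(rule, variables) {
  const context = { variables };
  // timeout (ms) stops runaway synchronous loops in a rule
  return vm.runInNewContext(rule, context, { timeout: 100 });
}

const rule = "variables.name === 'John' && (variables.age > 18 || variables.adult === true)";
console.log(evaluateRule(rule, { name: 'John', age: 17, adult: true })); // true
console.log(evaluateRule(rule, { name: 'Jane', age: 30, adult: true })); // false
```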

I did something like that once; you can probably pick it up and adapt it to your needs.
TL;DR: thanks to Python's eval, doing this is a breeze.
The problem was to parse dates and durations in textual form. I created a YAML file mapping regex patterns to results. Each mapping was a Python expression that would be evaluated with the match object, and it had access to other functions and variables defined elsewhere in the file.
For example, the following self-contained snippet recognizes dates like "l'11 agosto del 1993" (Italian for "August 11th, 1993").
__meta_vars__:
  month: (gennaio|febbraio|marzo|aprile|maggio|giugno|luglio|agosto|settembre|ottobre|novembre|dicembre)
  prep_art: (il\s|l\s?'\s?|nel\s|nell\s?'\s?|del\s|dell\s?'\s?)
  schema:
    date: http://www.w3.org/2001/XMLSchema#date

__meta_func__:
  - >
    def month_to_num(month):
        """ gennaio -> 1, febbraio -> 2, ..., dicembre -> 12 """
        try:
            return index_in_or(meta_vars['month'], month) + 1
        except ValueError:
            return month

Tempo:
  - \b{prep_art}(?P<day>\d{{1,2}}) (?P<month>{month}) {prep_art}?\s*(?P<year>\d{{4}}): >
      '"{}-{:02d}-{:02d}"^^<{schema}>'.format(match.group('year'),
      month_to_num(match.group('month')),
      int(match.group('day')),
      schema=schema['date'])
__meta_func__ and __meta_vars__ (not the best names, I know) define functions and variables that are accessible to the match transformation rules. To make the rules easier to write, the pattern is formatted using the meta-variables, so that {month} is replaced with the pattern matching all months. The transformation rule calls the meta-function month_to_num to convert the month to a number from 1 to 12, and reads from the schema meta-variable. In the example above, the match results in the string "1993-08-11"^^<http://www.w3.org/2001/XMLSchema#date>, but some other rules would produce a dictionary.
Doing this is quite easy in Python, as you can use exec to evaluate strings as Python code (obligatory warning about security implications). The meta-functions and meta-variables are evaluated and stored in a dictionary, which is then passed to the match transformation rules.
The code is on github, feel free to ask any questions if you need clarifications. Relevant parts, slightly edited:
class DateNormalizer:
    def _meta_init(self, specs):
        """ Reads the meta variables and the meta functions from the specification

        :param dict specs: The specifications loaded from the file
        :return: None
        """
        self.meta_vars = specs.pop('__meta_vars__')

        # compile meta functions in a dictionary
        self.meta_funcs = {}
        for f in specs.pop('__meta_func__'):
            exec(f, self.meta_funcs)

        # make meta variables available to the meta functions just defined
        # (self.meta_funcs is the global namespace of those functions)
        self.meta_funcs['meta_vars'] = self.meta_vars

        self.globals = self.meta_funcs
        self.globals.update(self.meta_vars)

    def normalize(self, expression):
        """ Find the first matching part in the given expression

        :param str expression: The expression in which to search the match
        :return: Tuple with (start, end), category, result
        :rtype: tuple
        """
        expression = expression.lower()
        # self.regexes (built elsewhere) maps category -> [(compiled regex, transform), ...]
        for category, regexes in self.regexes.items():
            for regex, transform in regexes:
                match = regex.search(expression)
                if match:
                    result = eval(transform, self.globals, {'match': match})
                    return match.span(), category, result
        return None

Here are some categorized Ruby options and resources:
Insecure
Pass expression to eval in the language of your choice.
It must be mentioned that eval is technically an option, but extraordinary trust must exist in its inputs and it is safer to avoid it altogether.
Heavyweight
Write a parser for your expressions and an interpreter to evaluate them
A cost-intensive solution would be implementing your own expression language. That is, to design a lexicon for your expression language, implement a parser for it, and an interpreter to execute the code that's parsed.
Some Parsing Options (ruby)
Parslet
TreeTop
Citrus
Roll-your-own with StringScanner
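To give a feel for the roll-your-own route, here is a hypothetical tokenizer and recursive-descent evaluator for the rule syntax from the question, written in JavaScript rather than Ruby. The grammar is an assumption made for this sketch: each ${...} holds a single comparison of the form path operator literal, comparisons are combined with AND/OR (any case) and parentheses, and AND binds tighter than OR.

```javascript
// Tokens: ${...} comparisons, parentheses, and the keywords AND / OR.
function tokenizeRule(src) {
  const re = /\s*(?:\$\{([^}]*)\}|([()])|(and|or)\b)/iy;
  const tokens = [];
  while (!/^\s*$/.test(src.slice(re.lastIndex))) {
    const at = re.lastIndex;
    const m = re.exec(src);
    if (m === null) throw new SyntaxError(`Unexpected input at position ${at}`);
    if (m[1] !== undefined) tokens.push({ type: 'cmp', text: m[1] });
    else if (m[2] !== undefined) tokens.push({ type: m[2] });
    else tokens.push({ type: m[3].toLowerCase() });
  }
  return tokens;
}

// Literal on the right-hand side of a comparison.
function parseLiteral(raw) {
  if (/^'[^']*'$|^"[^"]*"$/.test(raw)) return raw.slice(1, -1);
  if (raw === 'true') return true;
  if (raw === 'false') return false;
  if (/^-?\d+(\.\d+)?$/.test(raw)) return Number(raw);
  throw new SyntaxError(`Unsupported literal: ${raw}`);
}

// Evaluate "path op literal", e.g. "variables.age > 18", against ctx.
function evalComparison(text, ctx) {
  const m = /^\s*([A-Za-z_][\w.]*)\s*(==|!=|>=|<=|>|<)\s*(.+?)\s*$/.exec(text);
  if (m === null) throw new SyntaxError(`Unsupported comparison: ${text}`);
  const [, path, op, raw] = m;
  // Follow own properties only, so paths like "__proto__" resolve to undefined.
  const left = path.split('.').reduce((obj, key) =>
    (obj != null && Object.prototype.hasOwnProperty.call(obj, key) ? obj[key] : undefined), ctx);
  const right = parseLiteral(raw);
  switch (op) {
    case '==': return left === right;
    case '!=': return left !== right;
    case '>': return left > right;
    case '<': return left < right;
    case '>=': return left >= right;
    default: return left <= right; // '<='
  }
}

// or := and (OR and)* ; and := atom (AND atom)* ; atom := cmp | '(' or ')'
function evaluateRuleExpression(src, ctx) {
  const tokens = tokenizeRule(src);
  let i = 0;
  const peek = () => (i < tokens.length ? tokens[i].type : null);

  function parseOr() {
    let value = parseAnd();
    while (peek() === 'or') { i++; const right = parseAnd(); value = value || right; }
    return value;
  }
  function parseAnd() {
    let value = parseAtom();
    while (peek() === 'and') { i++; const right = parseAtom(); value = value && right; }
    return value;
  }
  function parseAtom() {
    const tok = tokens[i++];
    if (tok === undefined) throw new SyntaxError('Unexpected end of rule');
    if (tok.type === 'cmp') return evalComparison(tok.text, ctx);
    if (tok.type === '(') {
      const value = parseOr();
      if (peek() !== ')') throw new SyntaxError('Expected )');
      i++;
      return value;
    }
    throw new SyntaxError(`Unexpected token: ${tok.type}`);
  }

  const result = parseOr();
  if (i !== tokens.length) throw new SyntaxError('Unexpected trailing tokens');
  return result;
}

const ruleText = "${variables.name == 'John'} and (${variables.age > 18} OR ${variables.adult == true})";
console.log(evaluateRuleExpression(ruleText, { variables: { name: 'John', age: 17, adult: true } })); // true
console.log(evaluateRuleExpression(ruleText, { variables: { name: 'Jane', age: 30, adult: true } })); // false
```

Both operands of AND/OR are always parsed (the evaluation is not short-circuited at the parsing level), so a malformed right-hand side is reported even when the left-hand side already decides the result.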
Medium Weight
Pick an existing language to write expressions in and parse / interpret those expressions.
This route assumes you can pick a known language to write your expressions in. The benefit is that a parser likely already exists for that language to turn it into an Abstract Syntax Tree (data structure that can be walked for interpretation).
A ruby example with the Parser gem
require 'parser/current'

class MyInterpreter
  # https://whitequark.github.io/ast/AST/Processor/Mixin.html
  include ::AST::Processor::Mixin

  def on_str(node)
    node.children.first
  end

  def on_int(node)
    node.children.first.to_i
  end

  def on_if(node)
    expression, truthy, falsey = *node.children
    if process(expression)
      process(truthy)
    else
      process(falsey)
    end
  end

  # `a && b` parses to an :and node
  def on_and(node)
    left, right = *node.children
    process(left) && process(right)
  end

  def on_true(_node)
    true
  end

  def on_false(_node)
    false
  end

  def on_lvar(node)
    # look up a variable by name = node.children.first
  end

  def on_send(node, &block)
    # allow things like ==, string methods? whatever
    # note: a bare name such as `adult` also arrives here, as (send nil :adult)
  end

  # ... etc
end

ast = Parser::CurrentRuby.parse(<<~RUBY)
  name == 'John' && adult
RUBY
MyInterpreter.new.process(ast)
# => true, once on_send and the variable lookup are filled in
The benefit here is that the parser and syntax are predetermined, and you interpret only what you need to (and prevent malicious code from executing by controlling what on_send and on_const allow).
Templating
This is more markup-oriented and possibly doesn't apply, but you could find some use in a templating library, which parses and evaluates expressions for you. Depending on the library, you can control which variables are supplied to the expressions. The output of the expression could then be checked for truthiness.
Liquid
Jinja

Some thoughts and things you should consider:
1. Unified Expression Language (EL)
Another option is EL, specified as part of the JSP 2.1 standard (JSR-245). Official documentation.
They have some nice examples that can give you a good overview of the syntax. For example:
El Expression: `${100.0 == 100}` Result= `true`
El Expression: `${4 > 3}` Result= `true`
You can use this to evaluate small script-like expressions, and there are several implementations: JUEL is one open-source implementation of EL.
2. Audience and Security
All the answers recommend using different interpreters or parser generators, and all are valid ways to add functionality to process complex data. But I would like to add an important note here.
Every interpreter has a parser, and injection attacks target those parsers, tricking them into interpreting data as commands. You should have a clear understanding of how the interpreter's parser works, because that is the key to reducing the chances of a successful injection attack. Real-world parsers have many corner cases and flaws that may not match the spec, so be clear about the measures you will take to mitigate possible flaws.
And even if your application is not public-facing, external or internal actors can still abuse this feature.

I'm building an app which has a feature for embedding expressions/rules in a config yaml file.
I'm looking for a parsing/DSL/rules-engine library that can support these types of expressions and normalize them. I'm open to using Ruby, JavaScript, Java, or Python if anyone knows of a library for those languages.
One possibility might be to embed a rule interpreter such as ClipsRules inside your application. You could then code your application in C++ (perhaps inspired by my clips-rules-gcc project) and link to it some C++ YAML library such as yaml-cpp.
Another approach could be to embed some Python interpreter inside a rule interpreter (perhaps the same ClipsRules) and some YAML library.
A third approach could be to use Guile (or SBCL, or JavaScript V8) and extend it with some "expert system shell".
Before starting to code, be sure to read several books such as the Dragon Book, the Garbage Collection handbook, Lisp In Small Pieces, Programming Language Pragmatics. Be aware of various parser generators such as ANTLR or GNU bison, and of JIT compilation libraries like libgccjit or asmjit.
You might need to contact a lawyer about legal compatibility of various open source licenses.

Assert simple conditions in Javascript without using eval

I'm trying to build a JavaScript program that can interpret simple conditions:
let condition = '5 + 6 > 10';
if (assert(condition)) {
    // ... do something
} else {
    // ... do other thing
}
A simple but insecure way to implement assert is to use eval. It works, but it opens a security hole. How could I work around this?
I need support for addition and subtraction, string and number comparisons, and parentheses.

There are several options:
You can write a tokenizer and parser for your expressions by hand. For a simple grammar like this, it isn't complicated and is actually a very good programming exercise. Look up the "shunting-yard algorithm" or "recursive descent parser".
Alternatively, you can use a parser generator like PEG.js or nearley. With a generator, you write your grammar in its specific language and let the generator create a parsing function for you. Check out their examples: https://pegjs.org/online , https://github.com/kach/nearley/tree/master/examples/calculator
Finally, you can use a full-blown JavaScript parser like Esprima or Acorn. This way you get a complete JavaScript AST, which you can walk and evaluate as you go. This method is far more complicated than the others, but it gives you the full power of JavaScript in your expressions, e.g. functions, regexes, etc.
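As a sketch of the first option, the hand-written recursive-descent evaluator below handles numbers, quoted strings, + and -, the comparisons < <= > >= == !=, and parentheses, all without eval. The function names and the choice to treat == as strict equality (===) are assumptions made for this example.

```javascript
// Split the source into numbers, quoted strings, operators and parentheses.
function tokenizeCondition(src) {
  const re = /\s*(\d+(?:\.\d+)?|'[^']*'|"[^"]*"|<=|>=|==|!=|[-+<>()])/y;
  const tokens = [];
  while (!/^\s*$/.test(src.slice(re.lastIndex))) {
    const at = re.lastIndex;
    const m = re.exec(src);
    if (m === null) throw new SyntaxError(`Unexpected character near position ${at}`);
    tokens.push(m[1]);
  }
  return tokens;
}

// Grammar: comparison := additive (cmpOp additive)?
//          additive   := unary (('+' | '-') unary)*
//          unary      := '-' unary | number | string | '(' comparison ')'
function assertCondition(src) {
  const tokens = tokenizeCondition(src);
  let i = 0;

  function unary() {
    const tok = tokens[i++];
    if (tok === undefined) throw new SyntaxError('Unexpected end of input');
    if (tok === '-') return -unary();
    if (tok === '(') {
      const value = comparison();
      if (tokens[i++] !== ')') throw new SyntaxError('Expected )');
      return value;
    }
    if (/^\d/.test(tok)) return Number(tok);
    if (tok[0] === "'" || tok[0] === '"') return tok.slice(1, -1);
    throw new SyntaxError(`Unexpected token ${tok}`);
  }

  function additive() {
    let value = unary();
    while (tokens[i] === '+' || tokens[i] === '-') {
      const op = tokens[i++];
      const right = unary();
      value = op === '+' ? value + right : value - right; // left-associative
    }
    return value;
  }

  function comparison() {
    const left = additive();
    const op = tokens[i];
    if (!['<', '<=', '>', '>=', '==', '!='].includes(op)) return left;
    i++;
    const right = additive();
    switch (op) {
      case '<': return left < right;
      case '<=': return left <= right;
      case '>': return left > right;
      case '>=': return left >= right;
      case '==': return left === right;
      default: return left !== right; // '!='
    }
  }

  const result = comparison();
  if (i !== tokens.length) throw new SyntaxError(`Unexpected token ${tokens[i]}`);
  return result;
}

console.log(assertCondition('5 + 6 > 10'));        // true
console.log(assertCondition('(10 - 2) - 3 == 5')); // true
console.log(assertCondition("'abc' < 'abd'"));     // true
```

Anything outside the grammar, such as alert(1), is rejected by the tokenizer with a SyntaxError instead of being executed.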

If the interpreter supports toString() on functions, you can do this for example:
function test(f) {
    let source = f.toString().replace(/^\(\) => /, '');
    if (f()) {
        console.log(source + " succeeded");
    } else {
        console.log(source + " failed");
    }
}

let condition = () => 5 + 6 > 10;
test(condition);
test(() => 5 + 4 > 10);

Traversing JavaScript AST (Esprima.Net) to Tree in C#

I used Esprima.NET (https://github.com/Diullei/Esprima.NET) to get the AST (Abstract Syntax Tree) of some JavaScript code. It returns a List<Dynamic> consisting of many child and sub-child nodes. I wonder how best to traverse all these nodes in C# for analysis. Basically, I want to get the function names, the variable names, and the function each variable is under.
For example, in the following JavaScript code:
var y = 45;
function fTest(d)
{
    var key = arguments.callee;
    var cars = 'Hello';
    for (i = 0; i < cars.length; i++)
    {
        text += cars[i];
    }
}
I wish to get the following result at the end:
variable: 45
function:parameter:'d'
function:variable:argument.callee
function:variable:'Hello'
function:loop:variable:object
I'm having difficulty traversing the List<Dynamic> returned by Esprima.NET. Any ideas on how to process or traverse this list as a tree, or any other structure, so that I can access the nodes? Thanks.

I ended up not using Esprima.NET but Esprima JS (http://esprima.org/). I added Esprima JS to a web page and created an external JavaScript file that calls the Esprima parser to create the AST. Once I had the AST, I used estraverse (https://github.com/estools/estraverse) to traverse it and get the results.
Hope this helps others.
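What estraverse does is essentially a recursive walk over the ESTree objects Esprima returns. As a rough, dependency-free sketch of that idea (not estraverse's API), the hypothetical describeDeclarations function below visits every node, tracks the enclosing function's name, and records function names, parameters, and variable declarations in a format loosely modeled on the output the question asks for. The AST is hand-written for a cut-down version of the question's code; loops, arrow functions, and other node types would need extra cases.

```javascript
// Recursively visit an ESTree AST, tracking the name of the enclosing function.
function describeDeclarations(ast) {
  const out = [];
  (function visit(node, fn) {
    if (Array.isArray(node)) { node.forEach((child) => visit(child, fn)); return; }
    // Skip primitives and non-node objects such as `loc` or `range`.
    if (node === null || typeof node !== 'object' || typeof node.type !== 'string') return;
    const prefix = fn === null ? '' : `function ${fn}:`;

    if (node.type === 'FunctionDeclaration' || node.type === 'FunctionExpression') {
      const name = node.id ? node.id.name : '(anonymous)';
      out.push(`${prefix}function:${name}`);
      for (const p of node.params) {
        if (p.type === 'Identifier') out.push(`function ${name}:parameter:${p.name}`);
      }
      visit(node.body, name); // everything inside belongs to this function
      return;
    }
    if (node.type === 'VariableDeclarator' && node.id.type === 'Identifier') {
      let init = '(expression)';
      if (node.init === null) init = '(none)';
      else if (node.init.type === 'Literal') init = JSON.stringify(node.init.value);
      out.push(`${prefix}variable:${node.id.name}=${init}`);
    }
    for (const key of Object.keys(node)) visit(node[key], fn);
  })(ast, null);
  return out;
}

// Hand-written ESTree for:  var y = 45;  function fTest(d) { var cars = 'Hello'; }
const programAst = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration', kind: 'var',
      declarations: [{ type: 'VariableDeclarator',
        id: { type: 'Identifier', name: 'y' }, init: { type: 'Literal', value: 45 } }],
    },
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'fTest' },
      params: [{ type: 'Identifier', name: 'd' }],
      body: {
        type: 'BlockStatement',
        body: [{
          type: 'VariableDeclaration', kind: 'var',
          declarations: [{ type: 'VariableDeclarator',
            id: { type: 'Identifier', name: 'cars' }, init: { type: 'Literal', value: 'Hello' } }],
        }],
      },
    },
  ],
};

console.log(describeDeclarations(programAst).join('\n'));
// variable:y=45
// function:fTest
// function fTest:parameter:d
// function fTest:variable:cars="Hello"
```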

You can use Jint which is a JavaScript interpreter in .NET, and has an internal port of Esprima (ES5). It returns the same AST as Esprima.
Or you can use this other Esprima.NET that is based on ES6 and distributed separately from Jint.

How would you test a Scala.js library for equivalence of JVM and JS implementations?

I'm using Scala.js, and have written a trait that is implemented for both JVM and JS. I'm using third-party JVM and JS libraries to implement it in the two sides, which should provide functionally equivalent results in the JVM and browser. But, I need to write a test to verify that!
If I were just testing two vanilla Scala implementations, I'd know how to do it. I'd write generators of the trait's inputs, and drive each function from those, comparing the results of each. (I can assume that either the function results are booleans, integers, longs, strings, collections of same, or could be toString()'d.)
Is anyone out there doing this kind of testing?
How would I do this when one implementation is in JavaScript? PhantomJS? (Can I pass a generated JS file to it, rather than simple JS-as-strings?) Something else?

You can use Scala's reflective toolbox in a macro to execute your test code at compilation time (on the JVM). You can then use the result and generate code that compares the value.
So we want to write a macro, that given the following code:
FuncTest.test { (1.0).toString }
Can generate something like this:
assert("1.0" == (1.0).toString)
This actually sounds harder than it is. Let's start with a macro skeleton for FuncTest:
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context

object FuncTest {
  def test[T](x: => T): Unit = macro FuncTestImpl.impl[T]
}

class FuncTestImpl(val c: Context) {
  import c.universe._
  def impl[T : WeakTypeTag](x: Tree): Tree = ???
}
Inside impl, we want to run the code in x and then generate an assertion (or whatever suits the test framework you use):
import scala.reflect.runtime.{universe => ru}
import scala.tools.reflect.ToolBox

def impl[T : WeakTypeTag](x: Tree): Tree = {
  // Make a tool box (runtime compiler and evaluator)
  val mirror = ru.runtimeMirror(getClass.getClassLoader)
  val toolBox = mirror.mkToolBox()

  // Import trees from compile time to runtime universe
  val importer = ru.mkImporter(c.universe)
  val tree = toolBox.untypecheck(importer.importTree(x))

  // Evaluate expression and make a literal tree
  val result = toolBox.eval(tree)
  val resultTree = reifyLiteral(result)

  // Emit assertion
  q"assert($x == $resultTree)"
}
The only problem we have is reifyLiteral. It is supposed to take an arbitrary value and create a literal out of it. This is hard or impossible in general. However, it is very easy for some basic values (primitives, strings, etc.):
/** Creates a literal tree out of a value (if possible) */
private def reifyLiteral(x: Any): Tree = x match {
  case x: Int => q"$x"
  case x: String => q"$x"
  // Example for Seq
  case x: Seq[_] =>
    val elems = x.map(reifyLiteral)
    q"Seq(..$elems)"
  case _ =>
    c.abort(c.enclosingPosition, s"Cannot reify $x of type ${x.getClass}")
}
That's it. You can now write:
FuncTest.test { /* your code */ }
To automatically generate tests for computational libraries.
Caveat: the toolbox does not get the right classpath injected at the moment. So if you use an external library (which I assume you do), you will need to tweak that as well. Let me know if you need help there.

How to convert Csharp code directly in to javascript code

I am a newbie in JavaScript. I want to convert a function written in C# to JavaScript so that it does the same thing it did in C#. In the process I came across some online converters, such as DuoCode, SharpKit, JSIL, JSC, and Script#, that are supposed to do this, but they did not work. Maybe I am making some mistake while using them.
Here is the C# code I want to convert to a JavaScript function:
public static string Decrypt(string data)
{
    var rsa = new RSACryptoServiceProvider();
    var dataArray = data.Split(new char[] { ',' });
    byte[] dataByte = new byte[dataArray.Length];
    for (int i = 0; i < dataArray.Length; i++)
    {
        dataByte[i] = Convert.ToByte(dataArray[i]);
    }

    rsa.FromXmlString(_privateKey);
    var decryptedByte = rsa.Decrypt(dataByte, false);
    return _encoder.GetString(decryptedByte);
}
Any suggestions /help will be really appreciated.

What you want to do is not possible with the code you have. There are ways to convert code from one language to another, but only if the code is simple enough and does not use non-basic external libraries/classes (i.e. the converter can convert loops and other basic logic).
Your code does not contain any notable logic (except maybe the for loop); it only calls external libraries (RSACryptoServiceProvider et al.) to do the actual job. Those are definitely not standard libraries in JavaScript, so no automated tool can help you there.
Instead, search Google (or Stack Overflow) for a code snippet that does the same thing in JavaScript: RSA encryption/decryption in JavaScript (like RSA Encryption Javascript and Decrypt Java).
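The only part that translates directly is the loop that turns the comma-separated string into bytes. A minimal sketch of that step (parseByteList is a hypothetical name), roughly mirroring Convert.ToByte by rejecting anything that is not an integer in 0..255. The RSA step still needs a crypto library: the false argument to rsa.Decrypt selects PKCS#1 v1.5 padding rather than OAEP, and the browser Web Crypto API only offers RSA-OAEP for RSA encryption and decryption.

```javascript
// "12,34,255" -> Uint8Array [12, 34, 255], like the for loop in the C# Decrypt method.
function parseByteList(data) {
  const parts = data.split(',');
  const bytes = new Uint8Array(parts.length);
  parts.forEach((part, i) => {
    const text = part.trim();
    // Reject non-numeric values and values outside 0..255 instead of wrapping them
    if (!/^\d+$/.test(text) || Number(text) > 255) {
      throw new RangeError(`Not a byte value: "${part}"`);
    }
    bytes[i] = Number(text);
  });
  return bytes;
}

console.log(Array.from(parseByteList('12,34,255'))); // [ 12, 34, 255 ]
```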
