Parse JavaScript to instrument code - javascript

I need to split a JavaScript file into single instructions. For example:
a = 2;
foo()
function bar() {
b = 5;
print("spam");
}
has to be separated into three instructions (assignment, function call and function definition).
Basically I need to instrument the code, injecting code between these instructions to perform checks. Splitting on ";" obviously wouldn't work, because instructions can also end with a newline, and I may not want to instrument code inside function and class definitions (I don't know yet). I took a course on grammars with flex/Bison, but in this case the semantic action for the rule would be "print all the descendants in the parse tree and put my code at the end", which I don't think can be done with basic Bison. How do I do this? I also need to split the code because I need to interface with Python through python-spidermonkey.
Or... is there a library out there already which saves me from reinventing the wheel? It doesn't have to be in Python.

Why not use a JavaScript parser? There are lots, including a Python API for ANTLR and a Python wrapper around SpiderMonkey.
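To illustrate what a parser buys you here, below is a minimal sketch. It assumes the parser returns an ESTree-style Program node whose top-level entries carry start/end character offsets into the source (Acorn, for example, reports offsets this way). The AST is built by hand so the sketch runs without any library, and the names instrument, span and __check are hypothetical, not part of any tool mentioned in this thread.

```javascript
// Hypothetical helper: insert `hook` after every top-level statement.
// `program` is assumed to be an ESTree-style Program node whose body
// entries carry `start`/`end` character offsets into `source`.
function instrument(source, program, hook) {
  let out = '';
  let cursor = 0;
  for (const node of program.body) {
    // Copy everything up to the end of this statement, then add the hook.
    out += source.slice(cursor, node.end) + '\n' + hook + '\n';
    cursor = node.end;
  }
  // Keep whatever follows the last statement (trailing whitespace, comments).
  return out + source.slice(cursor);
}

const source = 'a = 2;\nfoo()\nfunction bar() {\nb = 5;\nprint("spam");\n}';

// Hand-built stand-in for a real parser's output: offsets are looked up
// in the text instead of being produced by a parser.
function span(type, text) {
  const start = source.indexOf(text);
  return { type, start, end: start + text.length };
}

const program = {
  type: 'Program',
  body: [
    span('ExpressionStatement', 'a = 2;'),
    span('ExpressionStatement', 'foo()'),
    span('FunctionDeclaration', 'function bar() {\nb = 5;\nprint("spam");\n}'),
  ],
};

const instrumented = instrument(source, program, '__check();');
console.log(instrumented);
```

Because only program.body is visited, the statements inside bar() are left alone; instrumenting them as well would mean recursing into the function body's own statement list.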

JavaScript is tricky to parse; you need a full JavaScript parser.
The DMS Software Reengineering Toolkit can parse full JavaScript and build a corresponding AST.
AST operators can then be used to walk over the tree to "split it". Even easier, however, is to apply source-to-source transformations that look for one surface syntax (JavaScript) pattern and replace it with another. You can use such transformations to insert the instrumentation into the code, rather than splitting the code to make holes in which to do the insertions. After the transformations are complete, DMS can regenerate valid JavaScript code (complete with the original comments if unaffected).

Why not use an existing JavaScript interpreter like Rhino (Java) or python-spidermonkey (not sure whether this one is still alive)? It will parse the JS and then you can examine the resulting parse tree. I'm not sure how easy it will be to recreate the original code but that mostly depends on how readable the instrumented code must be. If no one ever looks at it, just generate a really compact form.
pyjamas might also be of interest; this is a Python to JavaScript transpiler.
[EDIT] While this doesn't solve your problem at first glance, you might use it for a different approach: Instead of instrumenting JavaScript, write your code in Python instead (which can be easily instrumented; all the tools are already there) and then convert the result to JavaScript.
Lastly, if you want to solve your problem in Python but can't find a parser: Use a Java engine to add comments to the code which you can then search for in Python to instrument the code.

Why not try a javascript beautifier?
For example http://jsbeautifier.org/
Or see Command line JavaScript code beautifier that works on Windows and Linux

Forget my parser. https://bitbucket.org/mvantellingen/pyjsparser is a great and complete parser. I've fixed a couple of its bugs here: https://bitbucket.org/nullie/pyjsparser

Related

How does atom text editor parse / tokenise code? (syntax-highlighting)

So CodeMirror uses modes to tokenise its code.
It breaks the document up into lines and turns each line into a stream, which is then fed to the predefined mode. A mode can handle constructs that span multiple lines by using its state parameter.
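A rough sketch of that idea, line-by-line tokenising with a state object carried from one line to the next, might look like the following. This is not CodeMirror's actual mode API (which hands the mode a stream object); toyMode, highlight and the token(line, pos, state) signature are simplified stand-ins made up for illustration, and the mode only knows block comments and "everything else".

```javascript
// Toy mode: knows only block comments and plain text.
const toyMode = {
  startState() {
    return { inComment: false };
  },
  // Consume one token of `line` starting at `pos`; return [style, nextPos].
  token(line, pos, state) {
    if (state.inComment) {
      const close = line.indexOf('*/', pos);
      if (close === -1) return ['comment', line.length]; // comment continues on the next line
      state.inComment = false;
      return ['comment', close + 2];
    }
    if (line.startsWith('/*', pos)) {
      state.inComment = true;
      return ['comment', pos + 2];
    }
    const open = line.indexOf('/*', pos);
    return ['text', open === -1 ? line.length : open];
  },
};

// Run the mode over every line, sharing one state object across lines.
function highlight(text, mode) {
  const state = mode.startState();
  return text.split('\n').map((line) => {
    const tokens = [];
    let pos = 0;
    while (pos < line.length) {
      const [style, next] = mode.token(line, pos, state);
      tokens.push({ style, text: line.slice(pos, next) });
      pos = next;
    }
    return tokens;
  });
}

const lines = highlight('a = 1; /* start\nstill comment */ b = 2;', toyMode);
console.log(JSON.stringify(lines, null, 2));
```

The second line starts out as a comment only because state.inComment survived the line break, which is the role the state parameter plays.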
It seems ACE has a similar method.
Neither of these methods uses RegExp inherently (but obviously whoever creates the mode can use RegExp inside it).
From what I've read of Atom's code and style, it calls its syntax highlighters grammars, and they closely resemble the grammars from TextMate.
These grammars resemble JSON objects which contain classnames and RegExps (see how to write a TextMate grammar).
I can't figure out for the life of me how exactly Atom Text Editor actually performs the parsing of code, keeping its state and also extending through various scopes.
If someone could point me in the right direction that would be great.
You're probably better off asking your question in the Atom forums, since they are frequented by the Atom developers.
The question was answered here.
Atom uses its first-mate module, which relies on oniguruma for parsing Regular Expressions.
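As a rough illustration of how a grammar made of scope names and regular expressions can drive tokenising, here is a small sketch. It is not first-mate's implementation: it uses JavaScript's built-in RegExp instead of oniguruma, supports only single-line match rules (real TextMate grammars also have begin/end rules and keep a rule stack between lines), and the grammar object and tokenizeLine function are hypothetical.

```javascript
// Hypothetical toy grammar: each rule pairs a scope name with a sticky RegExp.
const grammar = {
  scopeName: 'source.toy',
  patterns: [
    { name: 'comment.line.toy', match: /\/\/.*/y },
    { name: 'constant.numeric.toy', match: /\d+/y },
    { name: 'keyword.control.toy', match: /\b(?:if|else|return)\b/y },
  ],
};

function tokenizeLine(line, grammar) {
  const tokens = [];
  let pos = 0;
  let plainStart = 0;
  // Emit the unmatched text between two rule matches with only the root scope.
  const flushPlain = (end) => {
    if (end > plainStart) {
      tokens.push({ value: line.slice(plainStart, end), scopes: [grammar.scopeName] });
    }
  };
  while (pos < line.length) {
    let matched = false;
    for (const rule of grammar.patterns) {
      rule.match.lastIndex = pos; // sticky flag: only match exactly at `pos`
      const m = rule.match.exec(line);
      if (m && m[0].length > 0) {
        flushPlain(pos);
        tokens.push({ value: m[0], scopes: [grammar.scopeName, rule.name] });
        pos += m[0].length;
        plainStart = pos;
        matched = true;
        break; // first matching rule wins
      }
    }
    if (!matched) pos += 1;
  }
  flushPlain(line.length);
  return tokens;
}

const tokens = tokenizeLine('return 42 // done', grammar);
console.log(tokens);
```

Keeping state and extending through nested scopes, as asked in the question, would need the begin/end rules and the rule stack that this sketch leaves out.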

ECMAScript pull parser

There seem to be a lot of resources on XML pull parsing, but is it possible to build a pull parser for JavaScript? Why is this not something people pursue? Pull parsing lets you stream the file while parsing it, which allows for arbitrarily large files (for example) and concurrent use.
The problem I encounter is that I need to divide the code into small units. I thought statements would be a good way to split the code: each call to the pull parser would yield another statement (or function declaration). However, this goes wrong with function expressions. They force the statements themselves to be split up, because any statement can contain a function with more statements inside it.
How would I go about implementing such a parser? Or do you think this is an unwise design?
I'm trying to build a fast minifier.
EDIT: see http://www.infoq.com/articles/HIgh-Performance-Parsers-in-Java-V2 for more info on sequential access parsers. They only describe JSON and XML...
Also see https://github.com/qfox/zeparser2 for a fast streaming JS parser.
EDIT2:
I can think of a few options:
Return each grammar type, even nested ones. (Most) tokens will then be returned multiple times as part of different grammars (like an expression inside a statement). For example, you first return the statement 'var a = b + c;' and then return the expression 'b + c'. The caller can check whether the returned grammar is a var-statement and do something with that...
Work with event functions; this is push parsing. For example, call the var-statement handler or the expression handler.
Full-blown AST generation with early return?
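One way to picture the pull style is a generator that hands out one event per call, so the caller decides how far to read. The sketch below is only a toy under loud assumptions: pullEvents is a made-up name, the tokeniser handles identifiers, integers and single-character punctuation only (no strings, comments or regex literals), and nesting is tracked purely by counting braces rather than by a real grammar.

```javascript
// Yields one event per call to next(); nothing is tokenised ahead of the caller.
function* pullEvents(source) {
  // Sticky RegExp: skip whitespace, then capture one toy token.
  const tokenRe = /\s*([A-Za-z_$][\w$]*|\d+|[^\s\w])/y;
  let depth = 0;
  let m;
  while ((m = tokenRe.exec(source)) !== null) {
    const value = m[1];
    if (value === '{') {
      yield { type: 'blockStart', depth: depth++ };
    } else if (value === '}') {
      yield { type: 'blockEnd', depth: --depth };
    } else {
      yield { type: 'token', value, depth };
    }
  }
}

const code = 'var f = function () { var a = b + c; };';

// The caller pulls events and may stop early, e.g. at the first nested block.
for (const event of pullEvents(code)) {
  console.log(event);
  if (event.type === 'blockStart') break;
}
```

Nested function bodies show up as blockStart/blockEnd events with a depth, so the consumer can choose to treat the inner statements as units of their own or skip over them.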

ternJS - Generate JSON type definition file

ternJS has several .json definition files that contain the definitions of libraries. Can someone explain to me how I can best generate my own for my JavaScript libraries, or just for individual objects?
Is there really no common procedure for this?
There's a tool for this included in Tern. See condense at http://ternjs.net/doc/manual.html#utils . It runs Tern on your file and tries to output the types that it finds. It's far from flawless, but for simple programs it works well. For files with a complicated structure or interface, you'll often have to hand-write the definitions.
There are three ways I have thought about to solve your problem:
Using Abstract Syntax Tree Parser and Visitor
One way to solve your problem would be to use abstract syntax tree parser and visitor in order to automate the task of scanning through the code and documenting it.
The resources here will be of help:
-http://ramkulkarni.com/blog/understanding-ast-created-by-mozilla-rhino-parser/
-What is JavaScript AST, how to play with it?
You usually use a parser to retrieve a tree, and then use a visitor to visit all the nodes and do your work within there.
You will essentially have a tree representing the specific library and then you must write the code to store this in the def format you link to.
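The parser-plus-visitor approach can be sketched as follows. The tree is a hand-built ESTree-style AST for `function add(a, b) {}` so that the example runs without a parser, visit is a hypothetical generic walker, and the collected output is a made-up shape for illustration, not Tern's definition format.

```javascript
// Generic walker: call visitor[node.type] if present, then descend into children.
function visit(node, visitor) {
  if (!node || typeof node.type !== 'string') return;
  if (visitor[node.type]) visitor[node.type](node);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach((c) => visit(c, visitor));
    } else if (child && typeof child === 'object') {
      visit(child, visitor);
    }
  }
}

// Hand-built stand-in for what a parser would return for: function add(a, b) {}
const ast = {
  type: 'Program',
  body: [
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'add' },
      params: [
        { type: 'Identifier', name: 'a' },
        { type: 'Identifier', name: 'b' },
      ],
      body: { type: 'BlockStatement', body: [] },
    },
  ],
};

// Collect one entry per function declaration (illustrative output shape only).
const defs = {};
visit(ast, {
  FunctionDeclaration(node) {
    defs[node.id.name] = { params: node.params.map((p) => p.name) };
  },
});

console.log(JSON.stringify(defs));
```

Writing defs out in the definition format the library expects is then a separate serialisation step.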
Getting a Documentation Generator and Modifying
Another idea is to download the source code for a documentation generator, e.g. https://github.com/yui/yuidoc/
By modifying the styling/output format you can generate "documentation" in the appropriate json format.
Converting Existing Documentation (HTML doc) into JSON
You can make a parser that takes a standard documentation format (just as Javadoc is one for Java, I'm sure there is one for JavaScript) and write a converter that extracts the relevant information and stores it in a JSON definition.

Check if text is valid JavaScript, using JavaScript, without using eval()

Is there any way to do this? Or to limit the execution time of eval() (e.g. to no more than 1 second)?
You could try one of the minifiers, such as UglifyJS. They all include a parser which might be fairly easy to extract (UglifyJS contains a file called "parse-js.js", although I haven't looked at it in detail).
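If the check can run under Node.js (an assumption; the question does not say where the code runs), the built-in vm module covers both halves of the question without a third-party parser. Constructing a vm.Script compiles the text without executing it, and the timeout option of vm.runInNewContext bounds execution time. The helper names isValidJavaScript and runWithTimeout are made up for this sketch.

```javascript
const vm = require('vm');

// Returns true if `code` compiles as a script. Nothing is executed.
function isValidJavaScript(code) {
  try {
    new vm.Script(code);
    return true;
  } catch (err) {
    if (err && err.name === 'SyntaxError') return false;
    throw err; // anything else is unexpected, so let it propagate
  }
}

// Runs `code` in a fresh context and throws if it takes longer than `ms` milliseconds.
function runWithTimeout(code, ms) {
  return vm.runInNewContext(code, {}, { timeout: ms });
}

console.log(isValidJavaScript('a = 2;')); // valid script
console.log(isValidJavaScript('a = ;')); // syntax error

try {
  runWithTimeout('while (true) {}', 50);
} catch (err) {
  console.log('stopped:', err.message);
}
```

Two caveats: the text is compiled as a classic script, so ES module syntax such as import declarations is rejected, and the vm module is not a security sandbox, so the timeout only guards against runaway loops, not hostile code.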

Are there any javascript frameworks for parsing/auto-completing a domain specific language?

I have a grammar for a domain specific language, and I need to create a javascript code editor for that language. Are there any tools that would allow me to generate
a) a javascript incremental parser
b) a javascript auto-complete / auto-suggest engine?
Thanks!
An Example of implementing content assist (auto-complete)
using Chevrotain Javascript Parsing DSL:
https://github.com/SAP/chevrotain/tree/master/examples/parser/content_assist
Chevrotain was designed specifically to build parsers used (as part of) language services tools in Editors/IDEs.
Some of the relevant features are:
Automatic Error Recovery / Fault tolerance because editors and IDEs need to be able to handle 'mostly valid' inputs.
Every Grammar rule may be used as the starting rule as an Editor/IDE may only want to implement incremental parsing for performance reasons.
You may want jison, a JS parser generator. In terms of auto-complete / auto-suggest, most of the stuff out there that I know of is based on word completion rather than code completion. But once you have a parser running I don't think that part is too difficult.
This is difficult. I'm doing the same sort of thing myself.
One approach is:
You need a parser which will give you an array of the currently possible ASTs for the text up until the token before the current cursor position.
From there you can see the next token can be of a number of types (usually just one), and do the completion, based on the partial text.
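The last step, going from "the next token can be of these types" plus the partial text to a list of suggestions, can be sketched on its own. Everything here is hypothetical: the vocabulary table, the token type names and the complete function are invented for illustration, and the list of expected types is assumed to come from whatever parser you end up with.

```javascript
// Hypothetical vocabulary: candidate texts for each token type of the DSL.
const vocabulary = {
  Keyword: ['select', 'set', 'from'],
  Identifier: ['salary', 'name'],
};

// expectedTypes: token types the parser says may come next.
// partial: the text already typed before the cursor.
function complete(expectedTypes, partial) {
  const suggestions = [];
  for (const type of expectedTypes) {
    for (const candidate of vocabulary[type] || []) {
      if (candidate.startsWith(partial)) {
        suggestions.push({ text: candidate, type });
      }
    }
  }
  return suggestions;
}

console.log(complete(['Keyword'], 'se'));
console.log(complete(['Keyword', 'Identifier'], 's'));
```

In a real editor the Identifier candidates would come from the symbols visible at the cursor rather than from a fixed table.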
If I ever get my incremental parser working, I'll send a link.
Good luck, and let me know if you find a package which does this.
Chris.
