Related
I'm building an app which has a feature for embedding expressions/rules in a config yaml file. So for example user can reference a variable defined in yaml file like ${variables.name == 'John'} or ${is_equal(variables.name, 'John')}. I can probably get by with simple expressions but I want to support complex rules/expressions such ${variables.name == 'John'} and (${variables.age > 18} OR ${variables.adult == true})
I'm looking for a parsing/dsl/rules-engine library that can support these type of expressions and normalize it. I'm open using ruby, javascript, java, or python if anyone knows of a library for that languages.
One option I thought of was to just support javascript as conditons/rules and basically pass it through eval with the right context setup with access to variables and other reference-able vars.
I don't know if you use Golang or not, but if you use it, I recommend this https://github.com/antonmedv/expr.
I have used it for parsing bot strategy that (stock options bot). This is from my test unit:
func TestPattern(t *testing.T) {
a := "pattern('asdas asd 12dasd') && lastdigit(23asd) < sma(50) && sma(14) > sma(12) && ( macd(5,20) > macd_signal(12,26,9) || macd(5,20) <= macd_histogram(12,26,9) )"
r, _ := regexp.Compile(`(\w+)(\s+)?[(]['\d.,\s\w]+[)]`)
indicator := r.FindAllString(a, -1)
t.Logf("%v\n", indicator)
t.Logf("%v\n", len(indicator))
for _, i := range indicator {
t.Logf("%v\n", i)
if strings.HasPrefix(i, "pattern") {
r, _ = regexp.Compile(`pattern(\s+)?\('(.+)'\)`)
check1 := r.ReplaceAllString(i, "$2")
t.Logf("%v\n", check1)
r, _ = regexp.Compile(`[^du]`)
check2 := r.FindAllString(check1, -1)
t.Logf("%v\n", len(check2))
} else if strings.HasPrefix(i, "lastdigit") {
r, _ = regexp.Compile(`lastdigit(\s+)?\((.+)\)`)
args := r.ReplaceAllString(i, "$2")
r, _ = regexp.Compile(`[^\d]`)
parameter := r.FindAllString(args, -1)
t.Logf("%v\n", parameter)
} else {
}
}
}
Combine it with regex and you have good (if not great, string translator).
And for Java, I personally use https://github.com/ridencww/expression-evaluator but not for production. It has similar feature with above link.
It supports many condition and you don't have to worry about Parentheses and Brackets.
Assignment =
Operators + - * / DIV MOD % ^
Logical < <= == != >= > AND OR NOT
Ternary ? :
Shift << >>
Property ${<id>}
DataSource #<id>
Constants NULL PI
Functions CLEARGLOBAL, CLEARGLOBALS, DIM, GETGLOBAL, SETGLOBAL
NOW PRECISION
Hope it helps.
You might be surprised to see how far you can get with a syntax parser and 50 lines of code!
Check this out. The Abstract Syntax Tree (AST) on the right represents the code on the left in nice data structures. You can use these data structures to write your own simple interpreter.
I wrote a little example of one:
https://codesandbox.io/s/nostalgic-tree-rpxlb?file=/src/index.js
Open up the console (button in the bottom), and you'll see the result of the expression!
This example can only handle (||) and (>), but looking at the code (line 24), you can see how you could make it support any other JS operator. Just add a case to the branch, evaluate the sides, and do the calculation on JS.
Parenthesis and operator precedence are all handled by the parser for you.
I'm not sure if this is the solution for you, but it will for sure be fun ;)
One option I thought of was to just support javascript as
conditons/rules and basically pass it through eval with the right
context setup with access to variables and other reference-able vars.
I would personally lean towards something like this. If you are getting into complexities such as logic comparisons, a DSL can become a beast since you are basically almost writing a compiler and a language at that point. You might want to just not have a config, and instead have the configurable file just be JavaScript (or whatever language) that can then be evaluated and then loaded. Then whoever your target audience is for this "config" file can just supplement logical expressions as needed.
The only reason I would not do this is if this configuration file was being exposed to the public or something, but in that case security for a parser would also be quite difficult.
I did something like that once, you can probably pick it up and adapt it to your needs.
TL;DR: thanks to Python's eval, you doing this is a breeze.
The problem was to parse dates and durations in textual form. What I did was to create a yaml file mapping regex pattern to the result. The mapping itself was a python expression that would be evaluated with the match object, and had access to other functions and variables defined elsewhere in the file.
For example, the following self-contained snippet would recognize times like "l'11 agosto del 1993" (Italian for "August 11th, 1993,).
__meta_vars__:
month: (gennaio|febbraio|marzo|aprile|maggio|giugno|luglio|agosto|settembre|ottobre|novembre|dicembre)
prep_art: (il\s|l\s?'\s?|nel\s|nell\s?'\s?|del\s|dell\s?'\s?)
schema:
date: http://www.w3.org/2001/XMLSchema#date
__meta_func__:
- >
def month_to_num(month):
""" gennaio -> 1, febbraio -> 2, ..., dicembre -> 12 """
try:
return index_in_or(meta_vars['month'], month) + 1
except ValueError:
return month
Tempo:
- \b{prep_art}(?P<day>\d{{1,2}}) (?P<month>{month}) {prep_art}?\s*(?P<year>\d{{4}}): >
'"{}-{:02d}-{:02d}"^^<{schema}>'.format(match.group('year'),
month_to_num(match.group('month')),
int(match.group('day')),
schema=schema['date'])
__meta_func__ and __meta_vars (not the best names, I know) define functions and variables that are accessible to the match transformation rules. To make the rules easier to write, the pattern is formatted by using the meta-variables, so that {month} is replaced with the pattern matching all months. The transformation rule calls the meta-function month_to_num to convert the month to a number from 1 to 12, and reads from the schema meta-variable. On the example above, the match results in the string "1993-08-11"^^<http://www.w3.org/2001/XMLSchema#date>, but some other rules would produce a dictionary.
Doing this is quite easy in Python, as you can use exec to evaluate strings as Python code (obligatory warning about security implications). The meta-functions and meta-variables are evaluated and stored in a dictionary, which is then passed to the match transformation rules.
The code is on github, feel free to ask any questions if you need clarifications. Relevant parts, slightly edited:
class DateNormalizer:
def _meta_init(self, specs):
""" Reads the meta variables and the meta functions from the specification
:param dict specs: The specifications loaded from the file
:return: None
"""
self.meta_vars = specs.pop('__meta_vars__')
# compile meta functions in a dictionary
self.meta_funcs = {}
for f in specs.pop('__meta_funcs__'):
exec f in self.meta_funcs
# make meta variables available to the meta functions just defined
self.meta_funcs['__builtins__']['meta_vars'] = self.meta_vars
self.globals = self.meta_funcs
self.globals.update(self.meta_vars)
def normalize(self, expression):
""" Find the first matching part in the given expression
:param str expression: The expression in which to search the match
:return: Tuple with (start, end), category, result
:rtype: tuple
"""
expression = expression.lower()
for category, regexes in self.regexes.iteritems():
for regex, transform in regexes:
match = regex.search(expression)
if match:
result = eval(transform, self.globals, {'match': match})
start, end = match.span()
return (first_position + start, first_position + end) , category, result
Here are some categorized Ruby options and resources:
Insecure
Pass expression to eval in the language of your choice.
It must be mentioned that eval is technically an option, but extraordinary trust must exist in its inputs and it is safer to avoid it altogether.
Heavyweight
Write a parser for your expressions and an interpreter to evaluate them
A cost-intensive solution would be implementing your own expression language. That is, to design a lexicon for your expression language, implement a parser for it, and an interpreter to execute the code that's parsed.
Some Parsing Options (ruby)
Parslet
TreeTop
Citrus
Roll-your-own with StringScanner
Medium Weight
Pick an existing language to write expressions in and parse / interpret those expressions.
This route assumes you can pick a known language to write your expressions in. The benefit is that a parser likely already exists for that language to turn it into an Abstract Syntax Tree (data structure that can be walked for interpretation).
A ruby example with the Parser gem
require 'parser'
class MyInterpreter
# https://whitequark.github.io/ast/AST/Processor/Mixin.html
include ::Parser::AST::Processor::Mixin
def on_str(node)
node.children.first
end
def on_int(node)
node.children.first.to_i
end
def on_if(node)
expression, truthy, falsey = *node.children
if process(expression)
process(truthy)
else
process(falsey)
end
end
def on_true(_node)
true
end
def on_false(_node)
false
end
def on_lvar(node)
# lookup a variable by name=node.children.first
end
def on_send(node, &block)
# allow things like ==, string methods? whatever
end
# ... etc
end
ast = Parser::ConcurrentRuby.parse(<<~RUBY)
name == 'John' && adult
RUBY
MyParser.new.process(ast)
# => true
The benefit here is that a parser and syntax is predetermined and you can interpret only what you need to (and prevent malicious code from executing by controller what on_send and on_const allow).
Templating
This is more markup-oriented and possibly doesn't apply, but you could find some use in a templating library, which parses expressions and evaluates for you. Control and supplying variables to the expressions would be possible depending on the library you use for this. The output of the expression could be checked for truthiness.
Liquid
Jinja
Some toughs and things you should consider.
1. Unified Expression Language (EL),
Another option is EL, specified as part of the JSP 2.1 standard (JSR-245). Official documentation.
They have some nice examples that can give you a good overview of the syntax. For example:
El Expression: `${100.0 == 100}` Result= `true`
El Expression: `${4 > 3}` Result= `true`
You can use this to evaluate small script-like expressions. And there are some implementations: Juel is one open source implementation of the EL language.
2. Audience and Security
All the answers recommend using different interpreters, parser generators. And all are valid ways to add functionality to process complex data. But I would like to add an important note here.
Every interpreter has a parser, and injection attacks target those parsers, tricking them to interpret data as commands. You should have a clear understanding how the interpreter's parser works, because that's the key to reduce the chances to have a successful injection attack Real world parsers have many corner cases and flaws that may not match the specs. And have clear the measures to mitigate possible flaws.
And even if your application is not facing the public. You can have external or internal actors that can abuse this feature.
I'm building an app which has a feature for embedding expressions/rules in a config yaml file.
I'm looking for a parsing/dsl/rules-engine library that can support these type of expressions and normalize it. I'm open using ruby, javascript, java, or python if anyone knows of a library for that languages.
One possibility might be to embed a rule interpreter such as ClipsRules inside your application. You could then code your application in C++ (perhaps inspired by my clips-rules-gcc project) and link to it some C++ YAML library such as yaml-cpp.
Another approach could be to embed some Python interpreter inside a rule interpreter (perhaps the same ClipsRules) and some YAML library.
A third approach could be to use Guile (or SBCL or Javascript v8) and extend it with some "expert system shell".
Before starting to code, be sure to read several books such as the Dragon Book, the Garbage Collection handbook, Lisp In Small Pieces, Programming Language Pragmatics. Be aware of various parser generators such as ANTLR or GNU bison, and of JIT compilation libraries like libgccjit or asmjit.
You might need to contact a lawyer about legal compatibility of various open source licenses.
I'm following the Douglas Crockford's code convention, but I can't get the correct identation in JS mode in Emacs. I tried to customize the indent options of the mode, tried another modes like js3, but nothing seems to work.
When I have parenthesis, and I have to break the expression, Emacs indent like this:
this.offices.each(this.addOfficesToMap,
this);
While the convention that I'm following, says that I should leave just 4 spaces when an expression is broken up. So the indentation should look like:
this.offices.each(this.addOfficesToMap,
this);
Any idea of how I can change the indentation on broken up expressions?
The behaviour you want to change is hard-coded into a function called js--proper-indentation. An inelegant fix to your problem would be to replace the function in your .emacs:
(require 'cl)
(eval-after-load "js" '(defun js--proper-indentation (parse-status)
"Return the proper indentation for the current line."
(save-excursion
(back-to-indentation)
(cond ((nth 4 parse-status)
(js--get-c-offset 'c (nth 8 parse-status)))
((nth 8 parse-status) 0) ; inside string
((js--ctrl-statement-indentation))
((eq (char-after) ?#) 0)
((save-excursion (js--beginning-of-macro)) 4)
((nth 1 parse-status)
;; A single closing paren/bracket should be indented at the
;; same level as the opening statement. Same goes for
;; "case" and "default".
(let ((same-indent-p (looking-at
"[]})]\\|\\_<case\\_>\\|\\_<default\\_>"))
(continued-expr-p (js--continued-expression-p)))
(goto-char (nth 1 parse-status)) ; go to the opening char
(if (looking-at "[({[]\\s-*\\(/[/*]\\|$\\)")
(progn ; nothing following the opening paren/bracket
(skip-syntax-backward " ")
(when (eq (char-before) ?\)) (backward-list))
(back-to-indentation)
(cond (same-indent-p
(current-column))
(continued-expr-p
(+ (current-column) (* 2 js-indent-level)
js-expr-indent-offset))
(t
(+ (current-column) js-indent-level
(case (char-after (nth 1 parse-status))
(?\( js-paren-indent-offset)
(?\[ js-square-indent-offset)
(?\{ js-curly-indent-offset))))))
;; If there is something following the opening
;; paren/bracket, everything else should be indented at
;; the same level.
;; Modified code here:
(unless same-indent-p
(move-beginning-of-line 1)
(forward-char 4))
;; End modified code
(current-column))))
((js--continued-expression-p)
(+ js-indent-level js-expr-indent-offset))
(t 0)))) )
I have modified three lines of code towards the bottom of the function. If you want your indentation to be 8 chars instead of 4, change the (forward-char 4) line accordingly.
Note that js--proper-indentation (and the js library) requires the cl.el library, but that using eval-after-load mucks this up. So you need to explicitly require cl in your .emacs for this to work.
Note that this 'solution' hard codes a 4 space indentation only for the situation you indicate, and does not handle nested code at all. But knowing the point in the code that deals with your situation should at least point you towards the bit that needs work for a more sophisticated solution.
you can try https://github.com/mooz/js2-mode ...it's a fork js2-mode but with some impovements like good indentation...other way is read this article: http://mihai.bazon.net/projects/editing-javascript-with-emacs-js2-mode .. but sincerely it's better idea replace the old js2-mode ..it has several improvements https://github.com/mooz/js2-mode/wiki/Changes-from-the-original-mode ...hope this can help you...
You can file a feature request on js3-mode at https://github.com/thomblake/js3-mode/issues
Do you have a link to a style guide?
BTW, while the indentation conventions vary from language to language, and the preferences can even vary between users (such as in the above case), there is a fair bit of overlap and there are often ways to write your code such that there is little disagreement.
E.g. your above code could be written:
this.offices.each(
this.addOfficesToMap,
this
);
or
this.offices.each
(this.addOfficesToMap,
this);
and most indentation styles would largely agree on how to indent it.
According to JSHint, a Javascript programmer should not add a space after the first parenthesis and before the last one.
I have seen a lot of good Javascript libraries that add spaces, like this:
( foo === bar ) // bad according to JSHint
instead of this way:
(foo === bar) // good according to JSHint
Frankly, I prefer the first way (more spaces) because it makes the code more readable. Is there a strong reason to prefer the second way, which is recommended by JSHint?
This is my personal preference with reasons as to why.
I will discuss the following items in the accepted answer but in reverse order.
note-one not picking on Alnitak, these comments are common to us all...
note-two Code examples are not written as code blocks, because syntax highlighting deters from the actual question of whitespace only.
I've always done it that way.
Not only is this never a good reason to defend a practice in programming, but it also is never a good reason to defend ANY idea opposing change.
JS file download size matters [although minification does of course fix that]
Size will always matter for Any file(s) that are to be sent over-the-wire, which is why we have minification to remove unnecessary whitespace. Since JS files can now be reduced, the debate over whitespace in production code is moot.
moot: of little or no practical value or meaning; purely academic.
moot definition
Now we move on to the core issue of this question. The following ideas are mine only, and I understand that debate may ensue. I do not profess that this practice is correct, merely that it is currently correct for me. I am willing to discuss alternatives to this idea if it is sufficiently shown to be a poor choice.
It's perfectly readable and follows the vast majority of formatting conventions in Javascript's ancestor languages
There are two parts to this statement: "It's perfectly readable,"; "and follows the vast majority of formatting conventions in Javascript's ancestor languages"
The second item can be dismissed as to the same idea of I've always done it that way.
So let's just focus on the first part of the statement It's perfectly readable,"
First, let's make a few statements regarding code.
Programming languages are not for computers to read, but for humans to read.
In the English language, we read left to right, top to bottom.
Following established practices in English grammar will result in more easily read code by a larger percentage of programmers that code in English.
NOTE: I am establishing my case for the English language only, but may apply generally to many Latin-based languages.
Let's reduce the first statement by removing the adverb perfectly as it assumes that there can be no improvement. Let's instead work on what's left: "It's readable". In fact, we could go all JS on it and create a variable: "isReadable" as a boolean.
THE QUESTION
The question provides two alternatives:
( foo === bar )
(foo === bar)
Lacking any context, we could fault on the side of English grammar and go with the second option, which removes the whitespace. However, in both cases "isReadable" would easily be true.
So let's take this a step further and remove all whitespace...
(foo===bar)
Could we still claim isReadable to be true? This is where a boolean value might not apply so generally. Let's move isReadable to an Float where 0 is unreadable and 1 is perfectly readable.
In the previous three examples, we could assume that we would get a collection of values ranging from 0 - 1 for each of the individual examples, from each person we asked: "On a scale of 0 - 1, how would you rate the readability of this text?"
Now let's add some JS context to the examples...
if ( foo === bar ) { } ;
if(foo === bar){};
if(foo===bar){};
Again, here is our question: "On a scale of 0 - 1, how would you rate the readability of this text?"
I will make the assumption here that there is a balance to whitespace: too little whitespace and isReadable approaches 0; too much whitespace and isReadable approaches 0.
example: "Howareyou?" and "How are you ?"
If we continued to ask this question after many JS examples, we may discover an average limit to acceptable whitespace, which may be close to the grammar rules in the English language.
But first, let's move on to another example of parentheses in JS: the function!
function isReadable(one, two, three){};
function examineString(string){};
The two function examples follow the current standard of no whitespace between () except after commas. The next argument below is not concerned with how whitespace is used when declaring a function like the examples above, but instead the most important part of the readability of code: where the code is invoked!
Ask this question regarding each of the examples below...
"On a scale of 0 - 1, how would you rate the readability of this text?"
examineString(isReadable(string));
examineString( isReadable( string ));
The second example makes use of my own rule
whitespace in-between parentheses between words, but not between opening or closing punctuation.
i.e. not like this examineString( isReadable( string ) ) ;
but like this examineString( isReadable( string ));
or this examineString( isReadable({ string: string, thing: thing });
If we were to use English grammar rules, then we would space before the "(" and our code would be...
examineString (isReadable (string));
I am not in favor of this practice as it breaks apart the function invocation away from the function, which it should be part of.
examineString(); // yes; examineString (): // no;
Since we are not exactly mirroring proper English grammar, but English grammar does say that a break is needed, then perhaps adding whitespace in-between parentheses might get us closer to 1 with isReadable?
I'll leave it up to you all, but remember the basic question:
"Does this change make it more readable, or less?"
Here are some more examples in support of my case.
Assume functions and variables have already been declared...
input.$setViewValue(setToUpperLimit(inputValue));
Is this how we write a proper English sentence?
input.$setViewValue( setToUpperLimit( inputValue ));
closer to 1?
config.urls['pay-me-now'].initialize(filterSomeValues).then(magic);
or
config.urls[ 'pay-me-now' ].initialize( fitlerSomeValues ).then( magic );
(spaces just like we do with operators)
Could you imagine no whitespace around operators?
var hello='someting';
if(type===undefined){};
var string="I"+"can\'t"+"read"+"this";
What I do...
I space between (), {}, and []; as in the following examples
function hello( one, two, three ){
return one;
}
hello( one );
hello({ key: value, thing1: thing2 });
var array = [ 1, 2, 3, 4 ];
array.slice( 0, 1 );
chain[ 'things' ].together( andKeepThemReadable, withPunctuation, andWhitespace ).but( notTooMuch );
There are few if any technical reasons to prefer one over the other - the reasons are almost entirely subjective.
In my case I would use the second format, simply because:
It's perfectly readable, and follows the vast majority of formatting conventions in Javascript's ancestor languages
JS file download size matters [although minification does of course fix that]
I've always done it that way.
Quoting Code Conventions for the JavaScript Programming Language:
All binary operators except . (period) and ( (left parenthesis) and [ (left bracket) should be separated from their operands by a space.
and:
There should be no space between the name of a function and the ( (left parenthesis) of its parameter list.
I prefer the second format. However there are also coding style standards out there that insist on the first. Given the fact that javascript is often transmitted as source (e.g. any client-side code), one could see a slightly stronger case with it than with other languages, but only marginally so.
I find the second more readable, you find the first more readable, and since we aren't working on the same code we should each stick as we like. Were you and I to collaborate then it would probably be better that we picked one rather than mixed them (less readable than either), but while there have been holy wars on such matters since long before javascript was around (in other languages with similar syntax such as C), both have their merits.
I use the second (no space) style most of the time, but sometimes I put spaces if there are nested brackets - especially nested square brackets which for some reason I find harder to read than nested curved brackets (parentheses). Or to put that another way, I'll start any given expression without spaces, but if I find it hard to read I insert a few spaces to compare, and leave 'em in if they helped.
Regarding JS Hint, I wouldn't worry- this particular recommendation is more a matter of opinion. You're not likely to introduce bugs because of this one.
I used JSHint to lint this code snippet and it didn't give such an advice:
if( window )
{
var me = 'me';
}
I personally use no spaces between the arguments in parentheses and the parentheses themselves for one reason: I use keyboard navigation and keyboard shortcuts. When I navigate around the code, I expect the cursor to jump to the next variable name, symbol etc, but adding spaces messes things up for me.
It's just personal preference as it all gets converted to the same bytecode/binary at the end of the day!
Standards are important and we should follow them, but not blindly.
To me, this question is about that syntax styling should be all about readability.
this.someMethod(toString(value),max(value1,value2),myStream(fileName));
this.someMethod( toString( value ), max( value1, value2 ), myStream( fileName ) );
The second line is clearly more readable to me.
In the end, it may come down to personal preference, but I would ask those who prefer the 1st line if they really make their choice because "they are used it" or because they truly believe it's more readable.
If it's something you are used to, then a short time investment into a minor discomfort for a long term benefit might be worth the switch.
I've recently abandoned mouse-driven, platform-specific GUI editors and committed entirely to vim. The experience so far has been fantastic, but I'm stuck when it comes to Javascript.
The ever-popular taglist utility (using Exuberant Ctags) has been great for everything but Javascript. With the language's overly-free form and structure, taglist could only pick up a handful of functions when I opened it up -- only those defined in the format:
function FUNCNAME (arg1, arg2) {
but no variables or function objects defined like:
var myFunc = function (arg1, arg2) {
So I googled a bit and found the following definition set for ctags, which I put in my ~/.ctags file:
--langdef=js
--langmap=js:.js
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\{/\1/,object/
--regex-js=/([A-Za-z0-9._$()]+)[ \t]*[:=][ \t]*function[ \t]*\(/\1/,function/
--regex-js=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*([^])])/\1/,function/
--regex-js=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\[/\1/,array/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^""]'[^'']*/\1/,string/
--regex-js=/([^= ]+)[ \t]*=[ \t]*[^'']"[^""]*/\1/,string/
After that, running ctags from the command line was fantastic. It found every function and object that I needed it to find.
The problem is that the taglist.vim plugin isn't seeing those new results. When I open my javascript file in vim and hit :TlistToggle, I get the exact same meager handful of functions I got before. I hit 'u' to update the list, with no effect.
Digging into taglist.vim, I found this:
" java language
let s:tlist_def_java_settings = 'java;p:package;c:class;i:interface;' .
\ 'f:field;m:method'
" javascript language
let s:tlist_def_javascript_settings = 'javascript;f:function'
...which implies we're only looking at one specific kind of output from the ctags utility for javascript. Unfortunately, I'm not savvy enough with taglist or vim in general (yet) to discover what change I can make to get all those wonderful ctags command-line results to show up in vim.
Help appreciated!
Got it! I dove into the taglist.vim code for awhile, and this is what I found:
taglist.vim forces ctags to use the same filetype that vim is using. So even though the ~/.ctags snippet I found via google is assigning my much-needed definitions to the new "js" language and applying it to files that end in .js, taglist is forcing ctags into using the "JavaScript" filetype that vim is using -- which is built right into ctags already.
The solution is to change the ~/.ctags file from what I've posted above to this:
--regex-JavaScript=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*new[ \t]+Object\(/\1/o,object/
--regex-JavaScript=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\{/\1/o,object/
--regex-JavaScript=/([A-Za-z0-9._$()]+)[ \t]*[:=][ \t]*function[ \t]*\(/\1/f,function/
--regex-JavaScript=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*\([^\]\)]*\)/\1/f,function/
--regex-JavaScript=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*new[ \t]+Array\(/\1/a,array/
--regex-JavaScript=/([A-Za-z0-9._$]+)[ \t]*[:=][ \t]*\[/\1/a,array/
--regex-JavaScript=/([^= ]+)[ \t]*=[ \t]*[^""]'[^'']*/\1/s,string/
--regex-JavaScript=/([^= ]+)[ \t]*=[ \t]*[^'']"[^""]*/\1/s,string/
which alters the pre-existing JavaScript language definition directly, rather than creating a new language definition within ctags. Now, when taglib forces vim's registered filetype, the new definitions are used. Also missing from the previously posted ~/.ctags lines was the "kind" letter that Al mentioned in his answer, so those are included in my updated version as well.
From there, drop the following into your ~/.vimrc to activate the new types:
let g:tlist_javascript_settings = 'javascript;s:string;a:array;o:object;f:function'
All-in-all, the new regex lines aren't perfect -- they'll definitely need some tweaking to avoid a lot of false positives, and it might be nice to separate out constants and such. But now, at least, I have the ability to do that :).
Edit: Added instructions on how to activate types without editing the plugin, and vastly improved the main ctags function regex to avoid some false-positives.
Edit 2: Added more array and object definitions to the ctags regex.
I ran into this post on a google search, and although your findings are excellent, I think we can improve them. This is the results of a bit of hacking on your solution:
.ctags
--regex-JavaScript=/^var[ \t]+([a-zA-Z0-9_$]+) = \[/\1/a,array/
--regex-JavaScript=/^var[ \t]+([a-zA-Z0-9_$]+) = \{/\1/o,object/
--regex-JavaScript=/^var[ \t]+([a-zA-Z0-9_$]+) = (^{^[)+/\1/r,var/
--regex-JavaScript=/^[ \t]*(this\.)?([A-Za-z0-9_$()]+)[ \t]*[:=][ \t]*function[ \t]*\(\)/\2/u,function/
--regex-JavaScript=/^[ \t]*function ([a-z0-9]+[A-Za-z0-9_]*)/\1/u,function/
--regex-JavaScript=/^[ \t]*([A-Za-z0-9]+)\.prototype\.([a-z0-9]+[A-Za-z0-9_]*)/\1 : \2/u,function/
--regex-JavaScript=/^[ \t]*function ([A-Z]+[A-Za-z0-9_]*)/\1/o,object/
.vimrc
let g:tlist_javascript_settings = 'javascript;r:var;s:string;a:array;o:object;u:function'
This gets rid of a few more false positives, and adds some more features in, as a tradeoff for getting rid of some of the more problematic regexes. I'll keep updating if I find I need more.
Edit: I've gotten everything working really nicely now; I feel like this result is solid. The only major deficiency is that it doesn't work on comma separated variable definitions. That seems particularly nasty. Maybe another day. :)
Note also that I changed the .vimrc. This isn't because I'm a freak; it's because somehow taglist or ctags or something has some default values set, and if you don't change it, then you get a lot of doubles where functions and vars are concerned, which really drives me insane (I pay super attention to detail.. :P )
Edit: More tweaks. It picks up on prototype function declarations now, and doesn't do some other stupid stuff.
The best-practice solution, which is also very new, neat and easy way to get JavaScript source-code browsing / tag-list in Vim, is using Mozilla's DoctorJS (formerly known as jsctags).
See my answer for this question for more info.
Enjoy. :)
I've not used javascript or taglist much, but looking through :help taglist-extend, it looks like your definitions (listed above) rename the javascript output to js, so you'll probably need something like (in your vimrc):
let tlist_js_settings = 'js;f:function;m:method'
This is assuming that the ctags 'kind' is 'f' for function and 'm' for method. Have a look at your tags file and see what the 'kind' column looks like. By way of example, my C code tags file includes this line:
ADC_CR1_AWDCH_0 .\LibraryModules\CMSIS\Headers\stm32f10x.h 2871;" d
This is a #define of a symbol ADC_CR1_AWDCH_0, which is in the listed file at line 2871. The 'd' is the ctags 'kind' for a defined name. Hopefully that will give you enough to get you going.
As an aside, I'm not sure whether the override will work correctly, so it might be worth naming your file 'myfile.mjs' and changing your langmap to js:.mjs until it's working properly. Then at least you'll know whether your problems are associated with misidentification of files or the actual parsing.
Hi thanks to Tom Frost for his question and research, I think there is a little problem with the 4th line regexp of your final answer:
--regex-JavaScript=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*\([^\]\)]*\)/\1/f,function/
Doesn't worked for me, I pulled it a bit and now works ok:
--regex-JavaScript=/function[ \t]+([A-Za-z0-9._$]+)[ \t]*\([^\)]*\)/\1/f,function/
PD. The others answers' regexps posted here doesn't work at all at least for me :-?
To avoid duplicate entries from ctags' built in javascript support I define 'js' language as in original post and help taglist use ctags with it. I also make sure that tagnames are stripped from some less useful bits (quotes, "this.", ".prototype"). I don't use object/array/string/var regexps, but it's easy to combine my regexps with the other suggestions.
~/.ctags:
--langdef=js
--langmap=js:.js
--regex-js=/["']?(this\.)?([A-Za-z0-9_$]+)["']?((\.prototype)?(\.[A-Za-z0-9_$]+))?[ \t]*[:=][ \t]*function/\2\5/f,function/
--regex-js=/function[ \t]+([A-Za-z0-9_$]+)/\1/f,function/
~/.vimrc:
let g:tlist_javascript_settings = 'js;f:function'
Is there any way to get the source line number in Javascript, like __LINE__ for C or PHP?
There is a way, although more expensive: throw an exception, catch it immediately, and dig out the first entry from its stack trace. See example here on how to parse the trace. The same trick can also be used in plain Java (if the code is compiled with debugging information turned on).
Edit: Apparently not all browsers support this. The good news is (thanks for the comment, Christoph!) that some browsers export source file name and line number directly through the fileName and lineNumber properties of the error object.
The short answer is no.
The long answer is that depending on your browser you might be able to throw & catch an exception and pull out a stack trace.
I suspect you're using this for debugging (I hope so, anyway!) so your best bet would be to use Firebug. This will give you a console object; you can call console.trace() at any point to see what your programme is doing without breaking execution.
You can try to run C preprocessor (f.e. cpp from GNU Compiler Collection) on your javascript files -- either dynamically with each request or statically, by making this operation be applied every time you change your javascript files. I think the javascript syntax is similar enough for this to work.
Then you'd have all the power of C preprocessor.
You can use this in vanilla JS:
function getLine(offset) {
var stack = new Error().stack.split('\n'),
line = stack[(offset || 1) + 1].split(':');
return parseInt(line[line.length - 2], 10);
}
Object.defineProperty(window, '__LINE__', {
get: function () {
return getLine(2);
}
});
You will now have access to the global variable __LINE__
A __LINE__ in C is expanded by a preprocessor which literally replaces it with the line number of the current input. So, when you see
printf("Line Number: %d\r\n", __LINE__);
the compiler sees:
printf("Line Number: %d\r\n", 324);
In effect the number (324 in this case) is HARDCODED into the program. It is only this two-pass mechanism that makes this possible.
I do not know how PHP achieves this (is it preprocessed also?).
I think preprocessing makes more sense, in that it adds no runtime overhead. An alternative to the C preprocessor is using perl, as in the 2 step procedure below:
1 – add “Line # 999 \n” to each line in the script that you want numbered, e.g.,
alert ( "Line # 999 \n"+request.responseText);
2 – run the perl below:
cat my_js.js | perl -ane "{ s/Line # \d+ /Line # $. /; print $_;}" > C_my_js.js; mv C_my_js.js my_js.js
There is one workaround.
Usually the __ LINE __ combined with the __ FILE __ is used for marking a locations in code and the marking is done to find that location later.
However, it is possible to achieve the same effect by using Globally Unique Identifiers (GUID-s) in stead of the __ LINE __ and __ FILE __. Details of the solution reside in the COMMENTS.txt of a BSD-licensed toolset that automates the process.