When parsing Javascript, what determines the meaning of a slash? - javascript

Javascript has a tricky grammar to parse. Forward-slashes can mean a number of different things: division operator, regular expression literal, comment introducer, or line-comment introducer. The last two are easy to distinguish: if the slash is followed by a star, it starts a multiline comment. If the slash is followed by another slash, it is a line-comment.
But the rules for disambiguating division and regex literal are escaping me. I can't find it in the ECMAScript standard. There the lexical grammar is explicitly divided into two parts, InputElementDiv and InputElementRegExp, depending on what a slash will mean. But there's nothing explaining when to use which.
And of course the dreaded semicolon insertion rules complicate everything.
Does anyone have an example of clear code for lexing Javascript that has the answer?

It's actually fairly easy, but it requires making your lexer a little smarter than usual.
The division operator must follow an expression, and a regular expression literal can't follow an expression, so in all other cases you can safely assume you're looking at a regular expression literal.
You already have to identify Punctuators as multiple-character strings, if you're doing it right. So look at the previous token, and see if it's any of these:
. ( , { } [ ; , < > <= >= == != === !== + - * % ++ --
<< >> >>> & | ^ ! ~ && || ? : = += -= *= %= <<= >>= >>>=
&= |= ^= / /=
For most of these, you now know you're in a context where you can find a regular expression literal. Now, in the case of ++ --, you'll need to do some extra work. If the ++ or -- is a pre-increment/decrement, then the / following it starts a regular expression literal; if it is a post-increment/decrement, then the / following it starts a DivPunctuator.
Fortunately, you can determine whether it is a "pre-" operator by checking its previous token. First, post-increment/decrement is a restricted production, so if ++ or -- is preceded by a linebreak, then you know it is "pre-". Otherwise, if the previous token is any of the things that can precede a regular expression literal (yay recursion!), then you know it is "pre-". In all other cases, it is "post-".
Of course, the ) punctuator doesn't always indicate the end of an expression - for example if (something) /regex/.exec(x). This is tricky because it does require some semantic understanding to disentangle.
Sadly, that's not quite all. There are some operators that are not Punctuators, and other notable keywords to boot. Regular expression literals can also follow these. They are:
new delete void typeof instanceof in do return case throw else
If the IdentifierName you just consumed is one of these, then you're looking at a regular expression literal; otherwise, it's a DivPunctuator.
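The rules so far can be sketched in JavaScript. This is only a minimal sketch, not a complete lexer: the token shape ({ type, value, newlineBefore }) and the names regexAllowedAfter and isPrefixIncDec are made up for illustration. I have also left } out of the punctuator set because, like ), the previous token alone is not enough to decide it; this sketch simply treats both as ending an expression.

```javascript
// Punctuators after which a "/" is assumed to start a regular expression literal.
const PUNCTUATORS_BEFORE_REGEX = new Set([
  ".", "(", ",", "{", "[", ";", "<", ">", "<=", ">=", "==", "!=", "===", "!==",
  "+", "-", "*", "%", "<<", ">>", ">>>", "&", "|", "^", "!", "~", "&&", "||",
  "?", ":", "=", "+=", "-=", "*=", "%=", "<<=", ">>=", ">>>=", "&=", "|=", "^=",
  "/", "/=",
]);

// Keywords after which a "/" is assumed to start a regular expression literal.
const KEYWORDS_BEFORE_REGEX = new Set([
  "new", "delete", "void", "typeof", "instanceof", "in", "do", "return",
  "case", "throw", "else",
]);

// tokens[i] is the token just before the slash (hypothetical token shape).
function regexAllowedAfter(tokens, i) {
  if (i < 0) return true; // nothing precedes the slash: start of the program
  const { type, value } = tokens[i];
  if (type === "name") return KEYWORDS_BEFORE_REGEX.has(value);
  if (type !== "punctuator") return false; // numbers, strings etc. end an expression
  if (value === "++" || value === "--") return isPrefixIncDec(tokens, i);
  return PUNCTUATORS_BEFORE_REGEX.has(value); // ")", "]" and "}" fall through to false
}

function isPrefixIncDec(tokens, i) {
  // Restricted production: a line break before ++/-- rules out the postfix reading.
  if (tokens[i].newlineBefore) return true;
  // Otherwise it is a prefix operator exactly when no operand can have ended before it.
  return regexAllowedAfter(tokens, i - 1);
}

const name = (value) => ({ type: "name", value });
const punct = (value, newlineBefore = false) => ({ type: "punctuator", value, newlineBefore });

console.log(regexAllowedAfter([name("x"), punct("=")], 1));   // true:  x = /re/
console.log(regexAllowedAfter([name("x")], 0));               // false: x / y
console.log(regexAllowedAfter([name("x"), punct("++")], 1));  // false: x++ / y
console.log(regexAllowedAfter([punct("="), punct("++")], 1)); // true:  = ++ /re/
```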
The above is based on the ECMAScript 5.1 specification (as found here) and does not include any browser-specific extensions to the language. But if you need to support those, then this should provide easy guidelines for determining which sort of context you're in.
Of course, most of the above represent very silly cases for including a regular expression literal. For example, you can't actually pre-increment a regular expression, even though it is syntactically allowed. So most tools can get away with simplifying the regular expression context checking for real-world applications. JSLint's method of checking the preceding character for (,=:[!&|?{}; is probably sufficient. But if you take such a shortcut when developing what's supposed to be a tool for lexing JS, then you should make sure to note that.
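For comparison, the single-character shortcut attributed to JSLint above could look roughly like this. The function name and the decision to treat start-of-input as a regex context are my assumptions, not JSLint's actual code.

```javascript
// Characters after which the shortcut assumes a regular expression literal.
const REGEX_PREFIX_CHARS = "(,=:[!&|?{};";

// source: the program text; slashIndex: position of the "/" being classified.
function slashLooksLikeRegex(source, slashIndex) {
  let i = slashIndex - 1;
  while (i >= 0 && /\s/.test(source[i])) i--; // skip whitespace before the slash
  return i < 0 || REGEX_PREFIX_CHARS.includes(source[i]);
}

console.log(slashLooksLikeRegex("x = /re/", 4));    // true
console.log(slashLooksLikeRegex("a / b", 2));       // false
console.log(slashLooksLikeRegex("return /re/", 7)); // false: the shortcut misses keywords
```

The last call shows the kind of case the shortcut gets wrong, which is why it should be documented if you use it.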

I am currently developing a JavaScript/ECMAScript 5.1 parser with JavaCC. RegularExpressionLiteral and Automatic Semicolon Insertion are two things which make me crazy in ECMAScript grammar. This question and its answers were invaluable for the regex question. In this answer I'd like to put my own findings together.
TL;DR In JavaCC, use lexical states and switch them from the parser.
Very important is what Thom Blake wrote:
The division operator must follow an expression, and a regular
expression literal can't follow an expression, so in all other cases
you can safely assume you're looking at a regular expression literal.
So you actually need to understand if it was an expression or not before. This is trivial in the parser but very hard in the lexer.
As Thom pointed out, in many (but, unfortunately, not all) cases you can understand if it was an expression by "looking" at the last token. You have to consider punctuators as well as keywords.
Let's start with keywords. The following keywords cannot precede a DivPunctuator (for example, you cannot have case /5), so if you see a / after these, you have a RegularExpressionLiteral:
case
delete
do
else
in
instanceof
new
return
throw
typeof
void
Next, punctuators. The following punctuators cannot precede a DivPunctuator (ex. in { /a... the symbol / can never start a division):
{ ( [
. ; , < > <=
>= == != === !==
+ - * %
<< >> >>> & | ^
! ~ && || ? :
= += -= *= %= <<=
>>= >>>= &= |= ^=
/=
So if you have one of these and see /... after this, then this can never be a DivPunctuator and therefore must be a RegularExpressionLiteral.
Next, if you have:
/
And /... after that, it also must be a RegularExpressionLiteral. If there were no space between these slashes (i.e. // ...), this must have been handled as a SingleLineComment ("maximal munch").
Next, the following punctuator may only end an expression:
]
So the following / must start a DivPunctuator.
Now we have the following remaining cases which are, unfortunately, ambiguous:
}
)
++
--
For } and ) you have to know whether they end an expression or not; for ++ and -- you have to know whether they end a PostfixExpression or start a UnaryExpression.
And I have come to the conclusion that it is very hard (if not impossible) to find out in the lexer. To give you a sense of that, a couple of examples.
In this example:
{}/a/g
/a/g is a RegularExpressionLiteral, but in this one:
+{}/a/g
/a/g is a division.
In case of ) you can have a division:
('a')/a/g
as well as a RegularExpressionLiteral:
if ('a')/a/g
So, unfortunately, it looks like you can't solve it with the lexer alone. Otherwise you would have to bring so much grammar into the lexer that it is no longer a lexer.
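You can check the four examples above against a real engine. The following is a small sketch that relies on one trick: a and g are deliberately left undefined, so if the engine reads the slashes as division, evaluating a throws a ReferenceError, whereas a regex literal evaluates without error. The helper name slashMeaning is made up, and the sketch assumes no global variables named a or g exist.

```javascript
// Runs the snippet as a function body and reports how the engine read "/a/g".
function slashMeaning(body) {
  try {
    new Function(body)();
    return "regex";
  } catch (e) {
    // Division had to evaluate the undefined variable "a".
    if (e instanceof ReferenceError) return "division";
    throw e;
  }
}

console.log(slashMeaning("{}/a/g"));       // regex:    {} is an empty block
console.log(slashMeaning("+{}/a/g"));      // division: {} is an object literal
console.log(slashMeaning("('a')/a/g"));    // division: ) ends an expression
console.log(slashMeaning("if ('a')/a/g")); // regex:    ) ends the if condition
```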
This is a problem.
Now, a possible solution, which is, in my case JavaCC-based.
I am not sure if you have similar features in other parser generators, but JavaCC has a lexical states feature which can be used to switch between "we expect a DivPunctuator" and "we expect a RegularExpressionLiteral" states. For instance, in this grammar the NOREGEXP state means "we don't expect a RegularExpressionLiteral here".
This solves part of the problem, but not the ambiguous ), }, ++ and --.
For this, you'll need to be able to switch lexical states from the parser. This is possible, see the following question in JavaCC FAQ:
Can the parser force a switch to a new lexical state?
Yes, but it is very easy to create bugs by doing so.
A lookahead parser may have already gone too far in the token stream (i.e. already read / as a DIV or vice versa).
Fortunately there seems to be a way to make switching lexical states a bit safer:
Is there a way to make SwitchTo safer?
The idea is to make a "backup" token stream and push tokens read during lookahead back again.
I think that this should work for }, ), ++, -- as they are normally found in LOOKAHEAD(1) situations, but I am not 100% sure of that. In the worst case the lexer may have already tried to parse /-starting token as a RegularExpressionLiteral and failed as it was not terminated by another /.
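To make the idea concrete outside JavaCC, here is a hand-written JavaScript sketch of a lexer routine that lets the parser choose the goal symbol. This is not JavaCC code and not the grammar linked above: the function name readSlashToken, the goal values "div" and "regexp", and the returned token shape are all assumptions for illustration. Flags are simplified to ASCII letters.

```javascript
// Reads the token starting at src[pos] === "/", using the goal the parser asked for.
function readSlashToken(src, pos, goal) {
  if (src[pos] !== "/") throw new Error("expected '/' at position " + pos);

  if (goal === "div") {
    // InputElementDiv: the slash is "/" or "/=".
    return src[pos + 1] === "="
      ? { type: "punctuator", value: "/=", end: pos + 2 }
      : { type: "punctuator", value: "/", end: pos + 1 };
  }

  // InputElementRegExp: scan to the closing "/" that is neither escaped
  // nor inside a [...] character class.
  let i = pos + 1;
  let inClass = false;
  for (; i < src.length; i++) {
    const c = src[i];
    if (c === "\n") break;                 // literals cannot span lines
    if (c === "\\") { i++; continue; }     // skip the escaped character
    if (c === "[") inClass = true;
    else if (c === "]") inClass = false;
    else if (c === "/" && !inClass) break;
  }
  if (src[i] !== "/") throw new SyntaxError("unterminated regular expression literal");

  let end = i + 1;
  while (end < src.length && /[A-Za-z]/.test(src[end])) end++; // flags (simplified)
  return { type: "regexp", value: src.slice(pos, end), end };
}

console.log(readSlashToken("a /b/g.test(x)", 2, "regexp").value); // "/b/g"
console.log(readSlashToken("a /b/g.test(x)", 2, "div").value);    // "/"
```

The failure mode described above shows up here as the SyntaxError: if the parser (or its lookahead) asks for "regexp" where a division was meant, the scan may run to the end of the line without finding a closing slash.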
In any case, I see no better way of doing that. The next best thing would probably be to drop the case altogether (as JSLint and many others did), document that, and simply not parse these types of expressions. {}/a/g does not make much sense anyway.

JSLint appears to expect a regular expression if the preceding token is one of
(,=:[!&|?{};
Rhino always returns a DIV (slash) token from the lexer.

You can only know how to interpret the / by also implementing a syntax parser. Whichever lex path arrives at a valid parse determines how to interpret the character. Apparently, this is something they had considered fixing, but didn't.
More reading here:
http://www-archive.mozilla.org/js/language/js20-2002-04/rationale/syntax.html#regular-expressions

See section 7:
There are two goal symbols for the lexical grammar. The InputElementDiv symbol is used in those syntactic grammar contexts where a leading division (/) or division-assignment (/=) operator is permitted. The InputElementRegExp symbol is used in other syntactic grammar contexts.
NOTE There are no syntactic grammar contexts where both a leading division or division-assignment, and a leading RegularExpressionLiteral are permitted. This is not affected by semicolon insertion (see 7.9); in examples such as the following:
a = b
/hi/g.exec(c).map(d);
where the first non-whitespace, non-comment character after a LineTerminator is slash (/) and the syntactic context allows division or division-assignment, no semicolon is inserted at the LineTerminator. That is, the above example is interpreted in the same way as:
a = b / hi / g.exec(c).map(d);
I agree, it's confusing and there should be one top-level grammar expression rather than two.
edit:
But there's nothing explaining when to use which.
Maybe the simple answer is staring us in the face: try one and then try the other. Since they are not both permitted, at most one will yield an error-free match.

Related

Most efficient regex for checking if a string contains at least 3 alphanumeric characters

I have this regex:
(?:.*[a-zA-Z0-9].*){3}
I use it to see if a string has at least 3 alphanumeric characters in it. It seems to work.
Examples of strings it should match:
'a3c'
'_0_c_8_'
' 9 9d '
However, I need it to work faster. Is there a better way to use regex to match the same patterns?
Edit:
I ended up using this regex for my purposes:
(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}
(no modifiers needed)
The most efficient regex approach is to use the principle of contrast, i.e. using opposite character classes side by side. Here is a regex that can be used to check if a string has 3 Latin script letters or digits:
^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}
See demo.
In case you need a full string match, you will need to append .* (or .*$ if you want to guarantee that you match all the way to the end of the string/line, although in my tests on regexhero, .* yields better performance):
^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}.*
Also, a lot depends on the engine. PCRE has auto-optimizations in place, including auto-possessification (i.e. it turns the * in [^a-zA-Z0-9]* into the possessive *+).
See more details on password validation optimizations here.
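Before swapping one pattern for the other, it is worth confirming that they agree on your inputs. A quick JavaScript sketch (the sample strings beyond the three from the question are my own additions, and none of them contain line breaks, where . behaves differently from a negated class):

```javascript
const original = /(?:.*[a-zA-Z0-9].*){3}/;               // pattern from the question
const contrast = /^(?:[^a-zA-Z0-9]*[a-zA-Z0-9]){3}/;     // "principle of contrast" version

const samples = ["a3c", "_0_c_8_", " 9 9d ", "__a__", "ab", ""];

for (const s of samples) {
  // Both columns should always show the same value.
  console.log(JSON.stringify(s), original.test(s), contrast.test(s));
}
```

This only checks correctness; timing still has to be measured in the engine you actually target.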
(?:.*?[a-zA-Z0-9]){3}.*
You can use this. It is much faster and takes far fewer steps than yours. See demo. You will probably want to use the ^ and $ anchors too, to make sure there are no partial matches.
https://regex101.com/r/nS2lT4/32
The reason is
(?:.*[a-zA-Z0-9].*){3}
^^
This actually consumes the whole string, and then the engine has to backtrack. With the other regex this is avoided.
Just consider this. Regular expressions are powerful because they're expressive and very flexible (with features such as look-ahead, greedy consumption and back-tracking). There will almost always be a cost to that, however minor.
If you want raw speed (and you're willing to give up the expressiveness), you may find that it's faster to bypass regular expressions altogether and just evaluate the string, such as with the following pseudo-code:
def hasThreeAlphaNums(str):
    alphanums = 0
    for pos = 0 to len(str) - 1:
        if str[pos] in set "[a-zA-Z0-9]":
            alphanums++
            if alphanums == 3:
                return true
    return false
It's a parser (a very simple one in this case), a tool that can be even more powerful than regular expressions. For a more concrete example, consider the following C code:
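#include <ctype.h>
int hasThreeAlphaNums (const char *str) {
    int count = 0;
    /* Re-read the current character on every iteration. */
    for (; *str != '\0'; str++)
        /* Cast to unsigned char: isalnum() is undefined for negative values. */
        if (isalnum ((unsigned char) *str))
            if (++count == 3)
                return 1;
    return 0;
}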
Now, as to whether or not that's faster for this specific case, that depends on many factors, such as whether the language is interpreted or compiled, how efficient the regex is under the covers, and so on.
That's why the mantra of optimisation is "Measure, don't guess!" You should evaluate the possibilities in your target environment.
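Since the original question is about JavaScript-style regexes, here is the same loop as a JavaScript sketch. It assumes "alphanumeric" means ASCII letters and digits only, matching the [a-zA-Z0-9] class in the question.

```javascript
function hasThreeAlphaNums(str) {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    const isAlnum =
      (c >= 48 && c <= 57) ||  // 0-9
      (c >= 65 && c <= 90) ||  // A-Z
      (c >= 97 && c <= 122);   // a-z
    // Stop as soon as the third alphanumeric character is seen.
    if (isAlnum && ++count === 3) return true;
  }
  return false;
}

console.log(hasThreeAlphaNums("_0_c_8_")); // true
console.log(hasThreeAlphaNums("__a__"));   // false
```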

How to implement a negative LOOKAHEAD check for a token in JavaCC?

I am currently implementing a JavaScript/ECMAScript 5.1 parser with JavaCC. I recently learned about LOOKAHEADs, which are handy here as the grammar is not fully LL(1).
One of the things I see in the ECMAScript grammar is "negative lookahead check", like in the following ExpressionStatement production:
ExpressionStatement :
[lookahead ∉ {{, function}] Expression ;
So I'll probably need something like LOOKAHEAD(!("{" | "function")) but it does not work in this syntax.
My question is, how could I implement this "negative LOOKAHEAD" in JavaCC?
After reading the LOOKAHEAD MiniTutorial I think that an expression like getToken(1).kind != FUNCTION may be what I need, but I am not quite sure about it.
For the example you provide, I would prefer to use syntactic look ahead, which is in a sense necessarily "positive".
The production for ExpressionStatement is not the place to tackle the problem as there is no choice.
void ExpressionStatement() : {} { Expression() ";" }
The problem will arise where there is a choice between an expression statement and a block or between an expression statement and a function declaration (or both).
E.g. in Statement you will find
void Statement() :{} {
...
|
Block()
|
ExpressionStatement()
| ...
}
This gives a warning because both choices can start with a "{". You have two options. One is to ignore the warning. The first choice will be taken and all will be well, as long as Block comes first. The second option is to suppress the warning with a lookahead specification, like this:
void Statement() :{} {
...
|
LOOKAHEAD("{") Block()
|
ExpressionStatement()
| ...
}
Syntactic look ahead is, in a sense positive -- "take this alternative if X".
If you really want a negative --i.e., "take this alternative if not X"-- look ahead it has to be semantic.
In the case of Statement you could write
void Statement() :{} {
...
|
LOOKAHEAD({ getToken(1).kind != LBRACE }) ExpressionStatement()
|
Block()
}
I made sure that these are the last two alternatives since otherwise you'd need to include more tokens in the set of tokens that block ExpressionStatement(), e.g. it should not be chosen if the next token is an "if" or a "while" or a "for", etc.
On the whole, you are better off using syntactic lookahead when you can. It is usually more straight forward and harder to mess up.
I came across this question looking for something else, and yes, I am aware that the question was posed nearly 6 years ago.
The most advanced version of JavaCC is JavaCC21, and JavaCC21 does allow negative syntactic lookahead.
In JavaCC21 you would write LOOKAHEAD(~<LBRACE>) to specify that you only enter the expansion that follows if the next token is not an LBRACE, for example. The ~ character negates the lookahead expansion and you can use it to negate more complex expansions than a single token, if you want to. For example:
LOOKAHEAD (~(<LBRACE>|<LPAREN>))
There are actually quite a few other features in JavaCC21 that are not present in the legacy JavaCC project. Here is a biggie: the longstanding bug in which nested syntactic lookahead does not work correctly has been fixed. See here.

"+" Quantifier not doing its job?

I'm a novice programmer making a simple calculator in JavaScript for a school project, and instead of using eval() to evaluate a string, I made my own function calculate(exp).
Essentially, my program uses order of operations (PEMDAS, or Parenthesis, Exponents, Multiplication/Division, Addition/Subtraction) to evaluate a string expression. One of my regex patterns is like so ("mdi" for multiplication/division):
mdi = /(-?\d+(\.\d+)?)([\*\/])(-?\d+(\.\d+)?)/g; // line 36 on JSFiddle
What this does is:
-?\d+ finds an integer number
(\.\d+)? matches the decimal if there is one
[\*\/] matches the operator used (* or / for multiplication or division)
/g matches every occurrence in the string expression.
I loop through this regular expression's matches with the following code:
while((res = mdi.exec(exp)) !== null) { // line 69 on JSFiddle
    exp = exp.replace(mdi,
        function(match,$1,$3,$4,$5) {
            if($4 == "*")
                return parseFloat($1) * parseFloat($5);
            else
                return parseFloat($1) / parseFloat($5);
        });
    exp = exp.replace(doN,""); // this gets rid of double negatives
}
However, this does not work all the time. It only works with numbers with an absolute value less than 10. I cannot do any operations on numbers like 24 and -5232000321, even though the regex should match it with the + quantifier. It works with small numbers, but crashes and uses up most of my CPU when the numbers are larger than 10.
For example, when the expression 5*.5 is inputted, 2.5 is outputted, but when you input 75*.5 and press enter, the program stops.
I'm not really sure what's happening here, because I can't locate the source of the error for some reason - nothing is showing up even though I have console.log() all over my code for debugging, but I think it is something wrong with this regex. What is happening?
The full code (so far) is here at JSFiddle.net, but please be aware that it may crash. If you have any other suggestions, please tell me as well.
Thanks for any help.
The problem is
bzp = /^.\d/;
while((res = bzp.exec(result)) !== null) {
    result = result.replace(bzp,
        function($match) {
            console.log($match + " -> 0 + " + $match);
            return "0" + $match;
        });
}
It keeps prepending zeros with no limit.
Removing that code it works well.
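To see why this only bites for results of 10 or more, here is a small sketch that runs the same loop with a round limit so it cannot hang. The helper name prependZero and the "fixed" pattern are my assumptions: I am guessing the intent was to turn a leading ".5" into "0.5", which needs an escaped dot.

```javascript
const buggy = /^.\d/;  // "." matches ANY character, so "37" and then "03" match too
const fixed = /^\.\d/; // "\." matches only a literal dot (assumed intent)

// Same loop as above, but capped at maxRounds so the buggy pattern terminates.
function prependZero(result, pattern, maxRounds = 5) {
  let rounds = 0;
  while (pattern.test(result) && rounds < maxRounds) {
    result = result.replace(pattern, (match) => "0" + match);
    rounds++;
  }
  return { result, rounds };
}

console.log(prependZero("2.5", buggy));  // { result: '2.5', rounds: 0 }
console.log(prependZero("37.5", buggy)); // { result: '0000037.5', rounds: 5 }
console.log(prependZero(".5", fixed));   // { result: '0.5', rounds: 1 }
console.log(prependZero("37.5", fixed)); // { result: '37.5', rounds: 0 }
```

"2.5" is safe only because its second character is not a digit; "37.5" matches, and after a zero is prepended, "037.5" matches again, and so on.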
I have also cleaned your code, declared variables, and made it more maintainable: Demo
If you have any other suggestions, please tell me as well.
As pointed out in the comments, parsing your input by iteratively applying regular expressions is very ad-hoc. A better approach would be to actually construct a grammar for your input language and parse based on that. Here's an example grammar that basically matches your input language:
expr ::= term ( additiveOperator term )*
term ::= factor ( multiplicativeOperator factor )*
factor ::= number | '(' expr ')'
additiveOperator ::= '+' | '-'
multiplicativeOperator ::= '*' | '/'
The syntax here is pretty similar to regular expressions, where parentheses denote groups, * denotes zero-or-more repetitions, and | denotes alternatives. The symbols enclosed in single quotes are literals, whereas everything else is symbolic. Note that this grammar doesn't handle unary operators (based on your post it sounds like you assume a single negative sign for negative numbers, which can be parsed by the number parser).
There are several parser-generator libraries for JavaScript, but I prefer combinator-style parsers where the parser is built functionally at runtime rather than having to run a separate tool to generate the code for your parser. Parsimmon is a nice combinator parser for JavaScript, and the API is pretty easy to wrap your head around.
A parser usually returns some sort of a tree data structure corresponding to the parsed syntax (i.e. an abstract syntax tree). You then traverse this data structure in order to calculate the value of the arithmetic expression.
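If you would rather not pull in a library, the grammar is small enough to hand-write as a recursive-descent parser. The sketch below is plain JavaScript, not Parsimmon, and it takes two shortcuts that are my own simplifications: it evaluates while parsing instead of building a tree, and its tokenizer silently skips characters it does not recognize. Like the grammar, it has no unary minus.

```javascript
function evaluate(input) {
  // Tokens: numbers such as 75, 7.5 or .5, the four operators, and parentheses.
  const tokens = input.match(/\d+(?:\.\d+)?|\.\d+|[-+*\/()]/g) || [];
  let pos = 0;
  const peek = () => tokens[pos];
  const next = () => tokens[pos++];

  // expr ::= term ( additiveOperator term )*
  function expr() {
    let value = term();
    while (peek() === "+" || peek() === "-") {
      const op = next();
      const rhs = term();
      value = op === "+" ? value + rhs : value - rhs;
    }
    return value;
  }

  // term ::= factor ( multiplicativeOperator factor )*
  function term() {
    let value = factor();
    while (peek() === "*" || peek() === "/") {
      const op = next();
      const rhs = factor();
      value = op === "*" ? value * rhs : value / rhs;
    }
    return value;
  }

  // factor ::= number | '(' expr ')'
  function factor() {
    const tok = next();
    if (tok === "(") {
      const value = expr();
      if (next() !== ")") throw new SyntaxError("expected ')'");
      return value;
    }
    const n = Number(tok);
    if (tok === undefined || Number.isNaN(n)) throw new SyntaxError("expected a number");
    return n;
  }

  const result = expr();
  if (pos !== tokens.length) throw new SyntaxError("unexpected token " + peek());
  return result;
}

console.log(evaluate("75*.5"));   // 37.5
console.log(evaluate("2+3*4"));   // 14
console.log(evaluate("(2+3)*4")); // 20
```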
I created a fiddle demonstrating parsing and evaluating of arithmetic expressions. I didn't integrate any of this into your existing calculator interface, but if you can understand how to use the parser
Mathematical expressions are not usually parsed and calculated with regular expressions, because of the number of permutations and combinations available. The fastest way so far is postfix notation, because other notations are not as fast as this one. As they mention on Wikipedia:
In comparison testing of reverse Polish notation with algebraic
notation, reverse Polish has been found to lead to faster
calculations, for two reasons. Because reverse Polish calculators do
not need expressions to be parenthesized, fewer operations need to be
entered to perform typical calculations. Additionally, users of
reverse Polish calculators made fewer mistakes than for other types of
calculator. Later research clarified that the increased speed
from reverse Polish notation may be attributed to the smaller number
of keystrokes needed to enter this notation, rather than to a smaller
cognitive load on its users. However, anecdotal evidence suggests
that reverse Polish notation is more difficult for users to learn than
algebraic notation.
Full article: Reverse Polish Notation
And also here you can see other notations that are still far better than regex.
Calculator Input Methods
I would therefore suggest you change your algorithm to a more efficient one; personally I would prefer postfix.
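For what it's worth, evaluating postfix input needs nothing more than a stack. A minimal JavaScript sketch, assuming the expression has already been split into tokens (the function name evalPostfix is made up):

```javascript
function evalPostfix(tokens) {
  const stack = [];
  for (const tok of tokens) {
    if (tok === "+" || tok === "-" || tok === "*" || tok === "/") {
      if (stack.length < 2) throw new SyntaxError("operator '" + tok + "' needs two operands");
      const rhs = stack.pop(); // the right operand was pushed last
      const lhs = stack.pop();
      if (tok === "+") stack.push(lhs + rhs);
      else if (tok === "-") stack.push(lhs - rhs);
      else if (tok === "*") stack.push(lhs * rhs);
      else stack.push(lhs / rhs);
    } else {
      const n = Number(tok);
      if (tok === "" || Number.isNaN(n)) throw new SyntaxError("not a number: " + tok);
      stack.push(n);
    }
  }
  if (stack.length !== 1) throw new SyntaxError("malformed expression");
  return stack[0];
}

console.log(evalPostfix("3 4 + 2 *".split(" "))); // 14
console.log(evalPostfix(["75", ".5", "*"]));      // 37.5
```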

If comments are safe, then why doesn't `x = 0; x+/*cmt*/+;` or `var f/*cmt*/oo = 'foo';` work?

This thread inspired the question. Here are the code samples again. I'm looking for an answer that tells exactly what is going on.
Both x = 0; x+/*cmt*/+; and var f/*cmt*/oo = 'foo'; produce syntax errors, which renders the answers in this question wrong.
You're interrupting a word instead of a sentence. ++ and foo are words. People assume you won't be interrupting those.
Much the same as you can't put whitespace in the middle of words even though whitespace is "safe".
Because comments are parsed at the lexical level, generally considered as whitespace.
When compiling, the first step is to lexically break the source up into individual tokens. Comments are one type of token, and operators are another. You're splitting the ++ operator token so that it's interpreted as two separate items.
From ECMAScript reference :
Comments behave like white space and are discarded except that, if a
MultiLineComment contains a line terminator character, then the entire
comment is considered to be a LineTerminator for purposes of parsing
by the syntactic grammar.
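You can verify all of this by asking the engine whether a snippet parses at all. A small sketch (the helper name parses is made up; new Function only compiles the body here, it is never called):

```javascript
// Returns true if the source is syntactically valid as a function body.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses("var x = 0; x+/*cmt*/+;"));    // false: "+ +" is not "++"
console.log(parses("var f/*cmt*/oo = 'foo';"));   // false: "f oo" is two identifiers
console.log(parses("var x = 0; x /*cmt*/ ++;"));  // true:  comment between tokens
console.log(parses("var foo /*cmt*/ = 'foo';"));  // true:  comment between tokens
console.log(parses("var x = 0; x /*\n*/ ++;"));   // false: comment acts as a line break
```

The last case is the exception from the quote: a multi-line comment containing a line terminator counts as a line break, and postfix ++ may not be separated from its operand by one.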
As many others have pointed out, lexical analysis determines how the source is split into tokens.
Let me show an example:
ax + ay - 0x01; /* hello */
^----^---------------------- Identifier (variables)
   ^----^------------------- Operator
          ^----------------- literal constant (int)
              ^------------- Statement separator
  ^-^--^-^-----^------------ Whitespace (ignored)
                [_________]- Comments (ignored)
So the resulting token list will be:
identifier("ax");
operator("+");
identifier("ay");
operator("-");
const((int)0x01);
separator();
But if you do this:
a/* hello */x + ay - 0x01;
^-----------^---^----------- Identifier (variables)
              ^----^-------- Operator
                     ^------ literal constant (int)
                         ^-- Statement separator
             ^-^--^-^------- Whitespace (ignored)
 [_________]---------------- Comments (ignored)
The resulting token list will be:
identifier("a");
identifier("x"); // Error: Unexpected identifier `x` at line whatever
operator("+");
identifier("ay");
operator("-");
const((int)0x01);
separator();
The same happens when a comment is inserted inside an operator.
So you can see that comments behave just like whitespace.
In fact, I recently just read an article on writing a simple interpreter with JavaScript. It helped me with this answer. http://www.codeproject.com/Articles/345888/How-to-write-a-simple-interpreter-in-JavaScript

Why is my RegExp construction not accepted by JavaScript?

I'm using a RegExp to validate some user input on an ASP.NET web page. It's meant to enforce the construction of a password (i.e. between 8 and 20 long, at least one upper case character, at least one lower case character, at least one number, at least one of the characters ##!$% and no use of letters L or O (upper or lower) or numbers 0 and 1). This RegExp works fine in my tester (Expresso) and in my C# code.
This is how it looks:
(?-i)^(?=.{8,20})(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])
(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]*$
(Line break added for formatting)
However, when I run the code it lives in in IE6 or IE7 (haven't tried other browsers as this is an internal app and we're a Microsoft shop), I get a runtime error saying 'Syntax error in regular expression'. That's it - no further information in the error message aside from the line number.
What is it about this that JavaScript doesn't like?
Well, there are two ways of defining a Regex in Javascript:
a. Through a Regexp object constructor:
var re = new RegExp("pattern","flags");
re.test(myTestString);
b. Using a regular expression literal:
var re = /pattern/flags;
You should also note that JS does not support some of the tenets of Regular Expressions. For a non-comprehensive list of features unsupported in JS, check out the regular-expressions.info site.
Specifically speaking, you appear to be setting some flags on the expression (for example, the case insensitive flag). I would suggest that you use the /i flag (as indicated by the syntax above) instead of using (?-i)
That would make your Regex as follows (Positive Lookahead appears to be supported):
/^(?=.{8,20})(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]*$/i;
For a very good article on the subject, check out Regular Expressions in JavaScript.
Edit (after Howard's comment)
If you are simply assigning this Regex pattern to a RegularExpressionValidator control, then you will not have the ability to set Regex options (such as ignore case). Also, you will not be able to use the Regex literal syntax supported by Javascript. Therefore, the only option that remains is to make your pattern intrinsically case insensitive. For example, [a-h] would have to be written as [A-Ha-h]. This would make your Regex quite long-winded, I'm sorry to say.
Here is a solution to this problem, though I cannot vouch for its legitimacy. Some other options that come to mind may be to turn off client-side validation altogether and validate exclusively on the server. This will give you access to the full Regex flavour implemented by the System.Text.RegularExpressions.Regex object. Alternatively, use a CustomValidator and create your own JS function which applies the Regex match using the patterns that I (and others) have suggested.
I'm not familiar with C#'s regular expression syntax, but is this (at the start)
(?-i)
meant to turn the case insensitivity pattern modifier on? If so, that's your problem. Javascript doesn't support specifying the pattern modifiers in the expression. There are two ways to do this in javascript
var re = /pattern/i
var re = new RegExp('pattern','i');
Give one of those a try, and your expression should be happy.
As Cerberus mentions, (?-i) is not supported in JavaScript regexps. So, you need to get rid of that and use /i. Something to keep in mind is that there is no standard for regular expression syntax; it is different in each language, so testing in something that uses the .NET regular expression engine is not a valid test of how it will work in JavaScript. Instead, try and look for a reference on JavaScript regular expressions, such as this one.
Your match that looks for 8-20 characters is also invalid. This will ensure that there are at least 8 characters, but it does not limit the string to 20, since the character class with the kleene-closure (* operator) at the end can match as many characters as provided. What you want instead is to replace the * at the end with the {8,20}, and eliminate it from the beginning.
var re = /^(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])(?=.*[##!$%])[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]{8,20}$/i;
On the other hand, I'm not really sure why you would want to restrict the length of passwords, unless there's a hard database limit (which there shouldn't be, since you shouldn't be storing passwords in plain text in the database, but instead hashing them down to something fixed size using a secure hash algorithm with a salt). And as mentioned, I don't see a reason to be so restrictive on the set of characters you allow. I'd recommend something more like this:
var re = /^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[##!$%])[a-zA-Z0-9##!$%]{8,}$/i;
Also, why would you forbid 1, 0, L and O from your passwords (and it looks like you're trying to forbid I as well, which you forgot to mention)? This will make it very hard for people to construct good passwords, and since you never see a password as you type it, there's no reason to worry about letters which look confusingly similar. If you want to have a more permissive regexp:
var re = /^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[##!$%]).{8,}$/i;
Are you enclosing the regexp in / / characters?
var regexp = /[]/;
return regexp.test();
(?-i)
Doesn't exist in JS Regexp. Flags can be specified as “new RegExp('pattern', 'i')”, or literal syntax “/pattern/i”.
(?=
Exists in modern implementations of JS Regexp, but is dangerously buggy in IE. Lookahead assertions should be avoided in JS for this reason.
between 8 and 20 long, at least one upper case character, at least one lower case character, at least one number, at least one of the characters ##!$% and no use of letters L or O (upper or lower) or numbers 0 and 1.
Do you have to do this in RegExp, and do you have to put all the conditions in one RegExp? Because those are easy conditions to match using multiple RegExps, or even simple string matching:
if (
    s.length<8 || s.length>20 ||
    s==s.toLowerCase() || s==s.toUpperCase() ||
    s.indexOf('0')!=-1 || s.indexOf('1')!=-1 ||
    s.toLowerCase().indexOf('l')!=-1 || s.toLowerCase().indexOf('o')!=-1 ||
    (s.indexOf('#')==-1 && s.indexOf('#')==-1 && s.indexOf('!')==-1 && s.indexOf('$')==-1 && s.indexOf('%')==-1)
)
    alert('Bad password!');
(These are really cruel and unhelpful password rules if meant for end-users BTW!)
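The same idea with several small regexes, returning what is wrong instead of a bare yes/no. This is a sketch under assumptions: the function name and messages are made up, the special-character set is copied from the question as it appears here, and unlike the original single regex it does not reject characters outside the allowed set.

```javascript
// Returns a list of rule violations; an empty list means the password is acceptable.
function passwordProblems(s) {
  const problems = [];
  if (s.length < 8 || s.length > 20) problems.push("length must be 8 to 20");
  if (!/[a-z]/.test(s)) problems.push("needs a lower case letter");
  if (!/[A-Z]/.test(s)) problems.push("needs an upper case letter");
  if (!/[2-9]/.test(s)) problems.push("needs a digit from 2 to 9");
  if (!/[#!$%]/.test(s)) problems.push("needs one of #!$%");
  if (/[01lLoO]/.test(s)) problems.push("must not contain 0, 1, L or O");
  return problems;
}

console.log(passwordProblems("Abcdefg2#"));  // []
console.log(passwordProblems("abcdefg2#"));  // [ 'needs an upper case letter' ]
console.log(passwordProblems("Hello123!!")); // [ 'must not contain 0, 1, L or O' ]
```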
I would use this regular expression:
/(?=[^2-9]*[2-9])(?=[^a-hj-km-np-z]*[a-hj-km-np-z])(?=[^A-HJ-KM-NP-Z]*[A-HJ-KM-NP-Z])(?=[^##!$%]*[##!$%])^[2-9a-hj-km-np-zA-HJ-KM-NP-Z##!$%]{8,}$/
The [^a-z]*[a-z] will make sure that the match is made as early as possible instead of expanding the .* and doing backtracking.
(?-i) is supposed to turn case-insensitivity off. Everybody seems to be assuming you're trying to turn it on, but that would be (?i). Anyway, you don't want it to be case-insensitive, since you need to ensure that there are both uppercase and lowercase letters. Since case-sensitive matching is the default, prefacing a regex with (?-i) is pointless even in those flavors (like .NET) that support inline modifiers.
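Both points are easy to check in a JavaScript console. The sketch below first shows that the (?-i) prefix is rejected outright, then compares the case-sensitive pattern with the /i variant suggested in some of the answers above. The helper name compiles is made up, and the special-character class is written as [#!$%], following the set as it appears in the question here.

```javascript
// Returns true if the pattern is accepted by the JavaScript regex engine.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(compiles("(?-i)^abc$")); // false: inline modifier prefix is a syntax error
console.log(compiles("^abc$"));      // true

const body =
  "^(?=.*[2-9])(?=.*[a-hj-km-np-z])(?=.*[A-HJ-KM-NP-Z])(?=.*[#!$%])" +
  "[2-9a-hj-km-np-zA-HJ-KM-NP-Z#!$%]{8,20}$";
const caseSensitive = new RegExp(body);
const caseInsensitive = new RegExp(body, "i");

console.log(caseSensitive.test("abcdefg2#"));   // false: no upper case letter
console.log(caseInsensitive.test("abcdefg2#")); // true:  /i defeats the upper case check
console.log(caseSensitive.test("Abcdefg2#"));   // true
```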
