Solve Catastrophic Backtracking in my regex detecting Email [duplicate] - javascript

This question already has an answer here:
Email validation Regular expression is causing catastrophic backtracking
(1 answer)
Closed 7 months ago.
I have this regex
/^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,4})+$/
for checking that an email address is valid.
It works, but GitHub's code scanner reports this error:
This part of the regular expression may cause exponential backtracking on strings starting with 'a@a' and containing many repetitions of 'a'.
I understand the error, but I'm not sure how to solve it.

A good place to start is this: How can I recognize an evil regex?
As one of the answers there says, the key is to avoid "repetition of a repetition". For instance, given (\w+)* and the input aaa, it could match as (aaa), or (a)(aa), or (aa)(a), or (a)(a)(a); and as the input gets longer, the number of possibilities goes up exponentially. If instead you just write (\w*), it will match all the same strings, but only in one way.
In your case, you have two places where you write ([.-]?\w+)* and because you've made the [.-] optional, it can match in all the ways that (\w+)* can. But text without a dot or dash is already matched by the \w+ just before, so you can have ([.-]\w+)* instead.
The string .aaa can now only match one way, because (.a)(aa) doesn't have a dot or dash at the start of the second group. Other strings like aaa or ..a can be ruled out because you need exactly one dot or dash, and at least one character matching \w (which doesn't include . or -).
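As a rough sketch of the change described above, the pattern below makes each [.-] mandatory inside the repeated groups. It is written with a literal @ between the local part and the domain, which is an assumption about the intended separator, and the sample addresses are made up for illustration.

```javascript
// The separator inside each repeated group is now required, so a run of word
// characters can only be consumed by the leading \w+, not split across groups.
const fixed = /^\w+([.-]\w+)*@\w+([.-]\w+)*(\.\w{2,4})+$/;

const results = {
  // An ordinary address with one dot in the local part.
  plain: fixed.test("john.doe@example.com"),
  // Two separators in a row are rejected, because each one must be
  // followed by at least one word character.
  doubleDot: fixed.test("john..doe@example.com"),
  // The kind of input the scanner warns about: many word characters,
  // then a character that forces the overall match to fail.
  longFailure: fixed.test("a@" + "a".repeat(50) + "!"),
};

console.log(results);
```

The last input is only run against the rewritten pattern here. Running it against the original pattern is the case the scanner flags, so it is left out of the sketch.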

Related

RegEx if then else for conditional test depending on first character [duplicate]

This question already has answers here:
How do I make part of a regex match optional?
(2 answers)
Closed 1 year ago.
I should allow 2 different input strings formats, with each their own validation.
So eg:
AA2222222222222222
and
2222222222222222
This means that if the first character is a letter, I should validate for ^[a-zA-Z]{2}\d{16}$. If the first character is numeric, I should validate for \d{16}.
I tried to write it as a conditional regex:
^(([a-zA-Z])(?([a-zA-Z])^[a-zA-Z]{2}\d{16}$|d{16})
but I get a pattern error and can't figure out what exactly is wrong.
Any insight would be appreciated.
"I tried to write it in a conditional regex"
JavaScript doesn't support regular expression conditional syntax, so (?ifthen|else) doesn't work in JavaScript.
"This means that if the first character is a letter, I should validate for ^[a-zA-Z]{2}\d{16}$. If the first character is numeric, I should validate for \d{16}."
Since the \d{16} part is the same, you can just make the [a-zA-Z]{2} part optional:
/^(?:[a-zA-Z]{2})?\d{16}$/
That uses a non-capturing group around the [a-zA-Z]{2} and makes the entire group optional via the ? after it.
If the validation were different (say, maybe the version with the letters at the start only does \d{14}), you could use an alternation:
/^(?:[a-zA-Z]{2}\d{14}|\d{16})$/
(Beware the gotcha: without the non-capturing group around the alternation, the ^ would be part of the first alternative but not the second, and the $ would be part of the second alternative but not the first.)
But in your specific case, you don't need that.
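The three variants discussed in this answer can be compared side by side. This is a minimal sketch. The variable names and test inputs are invented for illustration, and the 14-digit format is the hypothetical one from the answer, not a requirement from the question.

```javascript
const digits16 = "2".repeat(16);
const digits14 = "2".repeat(14);

// Optional two-letter prefix, same 16 digits either way.
const optionalPrefix = /^(?:[a-zA-Z]{2})?\d{16}$/;

// Different digit counts per format, grouped so ^ and $ apply to both branches.
const grouped = /^(?:[a-zA-Z]{2}\d{14}|\d{16})$/;

// The same alternation without the group: ^ binds only to the first branch
// and $ only to the second, so extra text can slip through.
const ungrouped = /^[a-zA-Z]{2}\d{14}|\d{16}$/;

const checks = {
  withLetters: optionalPrefix.test("AA" + digits16),
  digitsOnly: optionalPrefix.test(digits16),
  oneLetter: optionalPrefix.test("A" + digits16),
  groupedTrailing: grouped.test("AA" + digits14 + "xyz"),
  ungroupedTrailing: ungrouped.test("AA" + digits14 + "xyz"),
};

console.log(checks);
```

The last two entries show the gotcha: the grouped pattern rejects the input with trailing text, while the ungrouped one accepts it because its first branch has no end anchor.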

AngularJS: regex matching everything except strings with specific symbols weird behaviour [duplicate]

This question already has answers here:
AngularJS - Remove leading and trailing whitespace from input-box using regex
(2 answers)
Closed 7 years ago.
I am not very good at regular expressions, so maybe this is a simple question, but I am certainly missing something. I use regular expressions to validate specific input from the user. The input must be accepted (the regex must match) if and only if the input string contains no commas and no whitespace (in other words, the input must be a single word without commas). Other than that, it can contain any symbols and have any length. When I use this regular expression, it matches input that doesn't contain commas:
/^[^,]*$/
I wanted to add the whitespace part to it, so I made this expression
/^[^,\s]*$/
which behaves in a very weird way. It does what it should, with one exception: for some reason, it matches (and lets in) strings that end with a space. (If they end with a comma, everything is OK and it doesn't match.) I don't want it to match strings with trailing whitespace, but I don't know how to adjust the regular expression to do this. So my questions are: why is this happening, and how do I change the regular expression to do what it should?
Here is an example:
http://jsbin.com/qoyoyagilo/2/edit?html,js,output
What is even weirder, when I tried my regex on Rubular, it didn't match strings with trailing whitespace. I am starting to believe that this has something to do with JavaScript and not with my particular regex.
Angular already trims your strings before validating them and binding to model. Extra whitespace at the beginning and at the end of strings won't even be matched against your regular expression (or any other validator).
You can use ng-trim="false" if you wish to disable this behavior:
<input ng-model="yourmodelvar" ng-trim="false" ng-pattern="[^,\s]*">
Also note that you don't need the ^ and $ chars in your regexp, since validation is performed against the whole string automatically. From the documentation on ng-pattern:
Sets pattern validation error key if the ngModel $viewValue value does
not match a RegExp found by evaluating the Angular expression given in
the attribute value. If the expression evaluates to a RegExp object,
then this is used directly. If the expression evaluates to a string,
then it will be converted to a RegExp after wrapping it in ^ and $
characters. For instance, "abc" will be converted to new
RegExp('^abc$').
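The effect can be reproduced outside Angular. In this sketch, String.prototype.trim stands in for Angular's trimming step, and the ^...$ wrapping mirrors what the quoted documentation describes for string patterns. Neither is Angular's actual code, and the sample input is made up.

```javascript
const pattern = /^[^,\s]*$/;
const typed = "word ";

// Tested exactly as typed, the trailing space makes the match fail.
const rawResult = pattern.test(typed);

// Tested after trimming, the trailing space is already gone, so it matches.
const trimmedResult = pattern.test(typed.trim());

// A string pattern wrapped in ^ and $, as the documentation describes.
const wrapped = new RegExp("^" + "[^,\\s]*" + "$");
const wrappedRejectsComma = !wrapped.test("a,b");
const wrappedAcceptsWord = wrapped.test("ab");

console.log({ rawResult, trimmedResult, wrappedRejectsComma, wrappedAcceptsWord });
```

This would also explain the Rubular observation in the question: a tester that receives the raw string sees the trailing space, while a validator that receives the trimmed value never does.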
References:
https://docs.angularjs.org/api/ng/directive/input (official doc)
How to disable trimming of inputs in AngularJS?

Match all Inside Parenthesis but not Outside [duplicate]

This question already has answers here:
Recursive matching with regular expressions in Javascript
(6 answers)
Closed 8 years ago.
I'm trying to use regular expressions to match certain groups of strings which correspond to functions. Right now it looks like this:
(Spreadsheet.[^)\)]+\))
Where it finds the variable Spreadsheet which has the function as an attribute. The expression keeps going until it gets to the end parenthesis. For simple functions such as
Spreadsheet.ADD(1,2)
the regular expression will work fine.
However, if I try to do any sort of nesting, the expression does not work because it will stop at the inside parenthesis instead of going to the last parenthesis.
Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)
Thus, the ", 3)" isn't identified and ends being ignored. Of course, due to the way my code processes it, this unusual string ends up causing an error.
Does anyone with more knowledge of regular expressions know how it could be changed such that it will stop only when it is at the last parenthesis and not the first?
Thanks.
This assumes that you only want to match functions in the form that you state in the question. If you want to match any type of function (including operators, nested comments, etc.), then what you want is going to be difficult with regex. Anyway, to match up to the last bracket you can use:
(Spreadsheet\..+\))
This will match
Spreadsheet.ADD(1,2)
Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)
Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)foo
(foo not part of the match)
The reason that your regex did not match the full string is that [^)\)]+ only matches characters that are not ), so it stops at the first ). Also, as an aside, the unescaped . in Spreadsheet. matches any character, so it will also match Spreadsheeta, Spreadsheetb, Spreadsheetc. To match a literal dot you need \. instead.
In my regex, .+\) will include the last bracket because + is greedy, so it will get the longest match it can. As an aside, you would specify a non-greedy match using +?.
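The difference between the greedy and non-greedy quantifier can be seen on the nested example from the question. This is a small sketch. The variable names are invented, and the two-call input at the end is an extra case not taken from the question.

```javascript
const greedy = /(Spreadsheet\..+\))/;
const lazy = /(Spreadsheet\..+?\))/;

const nested = "Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)foo";

// Greedy: .+ runs to the end of the string, then backs up to the last ).
const greedyMatch = greedy.exec(nested)[1];
// → "Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)"

// Lazy: .+? grows one character at a time and stops at the first ).
const lazyMatch = lazy.exec(nested)[1];
// → "Spreadsheet.ADD(Spreadsheet.ADD(1, 2)"

// Caveat: with two separate calls in one string, the greedy version
// swallows everything up to the final ).
const twoCalls = "Spreadsheet.ADD(1,2) + Spreadsheet.ADD(3,4)";
const greedyTwoCalls = greedy.exec(twoCalls)[1];

console.log(greedyMatch, lazyMatch, greedyTwoCalls);
```

Neither quantifier counts brackets, so the greedy pattern only gives the intended result when there is a single top-level call in the text being searched.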

UK bank sort code javascript regular expression

I'm trying to create a regular expression in javascript for a UK bank sort code so that the user can input 6 digits, or 6 digits with a hyphen between pairs. For example "123456" or "12-34-56". Also not all of the digits can be 0.
So far I've got /(?!0{2}(-?0{2}){2})(\d{2}(-\d{2}){2})|(\d{6})/ and this jsFiddle to test.
This is my first regular expression so I'm not sure I'm doing it right. The test for 6 0-digits should fail and I thought the -? optional hyphen in the lookahead would cause it to treat it the same as 6 0-digits with hyphens, but it isn't.
I'd appreciate some help and any criticism if I'm doing it completely incorrectly!
Just to answer your question, you can validate user input with:
/^(?!(?:0{6}|00-00-00))(?:\d{6}|\d\d-\d\d-\d\d)$/.test(inputString)
It will strictly match only input in the form XX-XX-XX or XXXXXX where X are digits, and will exclude 00-00-00, 000000 along with any other cases (e.g. XX-XXXX or XXXX-XX).
However, in my opinion, and as stated in other comments, it is still better to force the user to either always enter the hyphens or never enter them. Being extra strict when dealing with anything related to money saves trouble later.
Since any of the digits can be zero, but not all at once, you should treat the one case where they are all zero as a single, special case.
You are checking for two digits (\d{2}), then an optional hyphen (-?), then another two digits (\d{2}) and another optional hyphen (-?), before another two digits (\d{2}).
Putting this together gives \d{2}-?\d{2}-?\d{2}, but you can simplify this further:
(\d{2}-?){2}\d{2}
You can then use a check like the following to match the format but not 000000 or 00-00-00:
if (/(\d{2}-?){2}\d{2}/.test(string) && !/(00-?){2}00/.test(string)) {
  // then it's a valid code; you could also use (0{2}-?){2}0{2} to check zeros
}
You may wish to add the string anchors ^ (start) and $ (end) to check the entire string.
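Putting the two checks together with the anchors gives something like the sketch below. The function name isValidSortCode is hypothetical, and the patterns are the ones from this answer with ^ and $ added.

```javascript
// Hypothetical helper: format check plus a separate all-zero check.
function isValidSortCode(input) {
  // Six digits, with an optional hyphen after the first and second pair.
  const formatOk = /^(\d{2}-?){2}\d{2}$/.test(input);
  // The one special case to exclude: every digit is zero.
  const allZero = /^(00-?){2}00$/.test(input);
  return formatOk && !allZero;
}

const sortCodeResults = {
  plain: isValidSortCode("123456"),
  hyphenated: isValidSortCode("12-34-56"),
  zeros: isValidSortCode("000000"),
  zerosHyphenated: isValidSortCode("00-00-00"),
  tooShort: isValidSortCode("12345"),
  tooLong: isValidSortCode("1234567"),
  // Because each hyphen is independently optional, mixed forms pass too.
  mixed: isValidSortCode("12-3456"),
};

console.log(sortCodeResults);
```

Note that these patterns accept partially hyphenated input such as 12-3456. If only the two exact forms should pass, the stricter pattern from the previous answer is the one to use.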

RegEx string for three letters and two numbers with pre- and post- spaces

Two quick questions:
What would be a RegEx string for three letters and two numbers with space before and after them (i.e. " LET 12 ")?
Would you happen to know any good RegEx resources/tools?
For a good resource, try this website and the program RegexBuddy. You may even be able to figure out the answer to your question yourself using these sites.
To start you off you want something like this:
/^[a-zA-Z]{3}\s+[0-9]{2}$/
But the exact details depend on your requirements. It's probably a better idea that you learn how to use regular expressions yourself and then write the regular expression instead of just copying the answers here. The small details make a big difference. Examples:
What is a "letter"? Just A-Z or also foreign letters? What about lower case?
What is a "number"? Just 0-9 or also foreign numerals? Only integers? Only positive integers? Can there be leading zeros?
Should there be a single space between the letters and numbers? Or any amount of any whitespace? Even none?
Do you want to search for this string in a larger text? Or match a line exactly?
etc..
The answers to these questions will change the regular expression. It would be much faster for you in the long run to learn how to create the regular expression than to completely specify your requirements and wait for other people to reply.
I forgot to mention that there will be a space before and after. How do I include that?
Again you need to consider the questions:
Do you mean just one space or any amount of spaces? Possibly not always a space but only sometimes?
Do you mean literally a space character or any whitespace characters?
My guess is:
/^\s+[a-zA-Z]{3}\s+[0-9]{2}\s+$/
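A quick way to see what this guess accepts is to run it against a few inputs. The inputs below are made up, apart from " LET 12 " from the question.

```javascript
const padded = /^\s+[a-zA-Z]{3}\s+[0-9]{2}\s+$/;

const paddedResults = {
  // The example from the question: one space before, between, and after.
  example: padded.test(" LET 12 "),
  // No surrounding whitespace: rejected, because \s+ needs at least one.
  noPadding: padded.test("LET 12"),
  // \s also matches tabs and newlines, and + allows more than one.
  mixedWhitespace: padded.test("  let\t12\n"),
  // Four letters instead of three.
  fourLetters: padded.test(" LETT 12 "),
};

console.log(paddedResults);
```

If only single literal spaces should be allowed, each \s+ would be replaced with a plain space character, which is one of the requirement questions raised above.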
/[a-z]{3} [0-9]{2}/i will match 3 letters followed by a space, and then 2 digits. [a-z] is a character class containing the letters a through z, and the {3} means that you want exactly 3 members of that class. The space character matches a literal space (alternatively, you could use \s, which is a "shorthand" character class that matches any whitespace character). The i at the end is a pattern modifier specifying that your pattern is case-insensitive.
If you want the entire string to only be that, you need to anchor it with ^ and $:
/^[a-z]{3} [0-9]{2}$/i
Regular expression resources:
http://www.regular-expressions.info - great tutorial with a lot of information
http://rexv.org/ - online regular expression tester that supports a variety of engines.
^([A-Za-z]{3}) ([0-9]{2})$ assuming one space between the letters/numbers, as in your example. This will capture the letters and numbers separately.
I use http://gskinner.com/RegExr/ - it allows you to build a regex and test it with your own text.
As you can probably tell from the wide variety of answers, RegEx is a complex subject with a wide variety of opinions and preferences, and often more than one way of doing things. Here's my preferred solution.
^[a-zA-Z]{3}\s*\d{2}$
I used [a-zA-Z] instead of \w because \w also matches digits and underscores.
The \s* is to allow zero or more spaces.
I try to use character classes wherever possible, which is why I went with \d.
\w{3}\s{1}\d{2}
And I like this site.
EDIT: [a-zA-Z]{3}\s{1}\d{2} - The \w supports numeric characters too.
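The difference between the two patterns in this answer shows up on input that contains digits or underscores where letters are expected. In this sketch, ^ and $ are added so the whole string is tested, and the inputs are invented.

```javascript
const withWordClass = /^\w{3}\s{1}\d{2}$/;
const lettersOnly = /^[a-zA-Z]{3}\s{1}\d{2}$/;

const comparison = {
  // Plain letters: both patterns accept.
  lettersWord: withWordClass.test("ABC 12"),
  lettersStrict: lettersOnly.test("ABC 12"),
  // \w also covers digits and the underscore, so this slips through.
  mixedWord: withWordClass.test("A_1 23"),
  mixedStrict: lettersOnly.test("A_1 23"),
};

console.log(comparison);
```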
Try this regular expression:
[^"\r\n]{3,}
