Match all Inside Parenthesis but not Outside [duplicate] - javascript

This question already has answers here:
Recursive matching with regular expressions in Javascript
(6 answers)
Closed 8 years ago.
I'm trying to use regular expressions to match certain groups of strings which correspond to functions. Right now it looks like this:
(Spreadsheet.[^)\)]+\))
Where it finds the variable Spreadsheet which has the function as an attribute. The expression keeps going until it gets to the end parenthesis. For simple functions such as
Spreadsheet.ADD(1,2)
the regular expression will work fine.
However, if I try to do any sort of nesting, the expression does not work because it will stop at the inside parenthesis instead of going to the last parenthesis.
Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)
Thus, the ", 3)" isn't identified and ends up being ignored. Of course, due to the way my code processes it, this unusual string ends up causing an error.
Does anyone with more knowledge of regular expressions know how it could be changed such that it will stop only when it is at the last parenthesis and not the first?
Thanks.

This assumes that you only want to match functions in the form you state in the question. If you want to match any type of function call (including operators, nested comments, etc.), then what you want is going to be difficult with regex alone. Anyway, to match up to the last bracket you can use:
(Spreadsheet\..+\))
This will match
Spreadsheet.ADD(1,2)
Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)
Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)foo
(foo not part of the match)
Your regex did not match the full string because [^)\)]+ only consumes characters that are not ), so the match ends at the first ) it meets. Also, as an aside, Spreadsheet. will match Spreadsheeta, Spreadsheetb, Spreadsheetc, because an unescaped . matches any character. To match a literal dot you need \.
In my regex, .+\) includes the last bracket because + is greedy, so it takes the longest match it can. As an aside, you would specify a non-greedy match using +?
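The difference between the two patterns can be checked directly. The sketch below runs both against the nested example from the question. It also shows one limitation of the greedy approach and a small depth-counting helper; matchCall is a hypothetical name and is not part of the answer above, so treat it as one possible workaround rather than the recommended fix.

```javascript
// Pattern from the question: "." is unescaped and [^)\)]+ stops at the first ")".
const originalPattern = /(Spreadsheet.[^)\)]+\))/;
// Pattern from the answer: escaped dot, greedy .+ runs to the last ")".
const greedyPattern = /(Spreadsheet\..+\))/;

const nestedCall = "Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)foo";
const originalMatch = nestedCall.match(originalPattern)[0];
const greedyMatch = nestedCall.match(greedyPattern)[0];
console.log(originalMatch); // Spreadsheet.ADD(Spreadsheet.ADD(1, 2)
console.log(greedyMatch);   // Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)

// Limitation: a greedy .+ also swallows everything between two separate calls.
const twoCalls = "Spreadsheet.ADD(1,2) + Spreadsheet.ADD(3,4)";
const twoCallsMatch = twoCalls.match(greedyPattern)[0];
console.log(twoCallsMatch); // Spreadsheet.ADD(1,2) + Spreadsheet.ADD(3,4)

// Hypothetical helper (an assumption, not from the answer): find the first
// "Spreadsheet." call and walk forward, counting parenthesis depth, until the
// parenthesis that closes the call is reached.
function matchCall(text, start = 0) {
  const begin = text.indexOf("Spreadsheet.", start);
  if (begin === -1) return null;
  const open = text.indexOf("(", begin);
  if (open === -1) return null;
  let depth = 0;
  for (let i = open; i < text.length; i++) {
    if (text[i] === "(") {
      depth++;
    } else if (text[i] === ")" && --depth === 0) {
      return text.slice(begin, i + 1);
    }
  }
  return null; // parentheses never balanced
}

console.log(matchCall(nestedCall)); // Spreadsheet.ADD(Spreadsheet.ADD(1, 2), 3)
console.log(matchCall(twoCalls));   // Spreadsheet.ADD(1,2)
```

The helper ignores parentheses that appear inside string literals, so it is only a starting point if the arguments can contain quoted text.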

Related

Solve Catastrophic Backtracking in my regex detecting Email [duplicate]

This question already has an answer here:
Email validation Regular expression is causing catastrophic backtracking
(1 answer)
Closed 7 months ago.
I have this regex
/^\w+([.-]?\w+)*@\w+([.-]?\w+)*(\.\w{2,4})+$/
for validating email addresses.
It works, but GitHub's code scanner shows this error:
This part of the regular expression may cause exponential backtracking on strings starting with 'a@a' and containing many repetitions of 'a'.
I got the error, but I'm not sure how to solve it.
A good place to start is this: How can I recognize an evil regex?
As one of the answers there says, the key is to avoid "repetition of a repetition". For instance, given (\w+)* and the input aaa, it could match as (aaa), or (a)(aa), or (aa)(a), or (a)(a)(a); and as the input gets longer, the number of possibilities goes up exponentially. If instead you just write (\w*), it will match all the same strings, but only in one way.
In your case, you have two places where you write ([.-]?\w+)* and because you've made the [.-] optional, it can match in all the ways that (\w+)* can. But text without a dot or dash is already matched by the \w+ just before, so you can have ([.-]\w+)* instead.
The string .aaa can now only match one way, because (.a)(aa) doesn't have a dot or dash at the start of the second group. Other strings like aaa or ..a can be ruled out because you need exactly one dot or dash, and at least one character matching \w (which doesn't include . or -).
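Putting that change into the full pattern gives something like the sketch below. It assumes the separator between the local part and the domain in the question's pattern is @, and the sample addresses are made up for illustration. The adversarial input follows the shape the scanner describes; it is only run against the rewritten pattern, since the original could take a very long time on it.

```javascript
// Rewritten pattern: [.-] is mandatory inside each repeated group, so a run of
// word characters can no longer be split between \w+ and the group in many ways.
const emailPattern = /^\w+([.-]\w+)*@\w+([.-]\w+)*(\.\w{2,4})+$/;

// Illustrative inputs (assumptions, not taken from the question).
const emailSamples = [
  "john.doe@example.com",   // valid: single dot between word runs
  "a-b@mail.example.org",   // valid: dash in local part, subdomain
  "john..doe@example.com",  // invalid: two separators in a row
  "no-at-sign.example.com", // invalid: no @
];
const emailResults = emailSamples.map((s) => emailPattern.test(s));
console.log(emailResults); // [ true, true, false, false ]

// Input shaped like the scanner's warning: starts with "a@a", many repetitions
// of "a", then a character that forces the overall match to fail.
const adversarialInput = "a@" + "a".repeat(40) + "!";
const adversarialResult = emailPattern.test(adversarialInput);
console.log(adversarialResult); // false
```

This removes the nested-repetition ambiguity described above; it does not make the pattern a complete email validator.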

RegEx if then else for conditional test depending on first character [duplicate]

This question already has answers here:
How do I make part of a regex match optional?
(2 answers)
Closed 1 year ago.
I need to allow two different input string formats, each with its own validation.
For example:
AA2222222222222222
and
2222222222222222
This means that if the first character is a letter, I should validate for ^[a-zA-Z]{2}\d{16}$. If the first character is numeric, I should validate for \d{16}.
I tried to write it as a conditional regex:
^(([a-zA-Z])(?([a-zA-Z])^[a-zA-Z]{2}\d{16}$|d{16})
but I get a pattern error and can't figure out what exactly is wrong.
Any insight would be appreciated.
I tried to write it as a conditional regex
JavaScript doesn't support regular expression conditional syntax, so (?(if)then|else) doesn't work in JavaScript.
This means that if the first character is a letter, I should validate for ^[a-zA-Z]{2}\d{16}$. If the first character is numeric, I should validate for \d{16}.
Since the \d{16} part is the same, you can just make the [a-zA-Z]{2} part optional:
/^(?:[a-zA-Z]{2})?\d{16}$/
That uses a non-capturing group around the [a-zA-Z]{2} and makes the entire group optional via the ? after it.
If the validation were different (say, maybe the version with the letters at the start only does \d{14}), you could use an alternation:
/^(?:[a-zA-Z]{2}\d{14}|\d{16})$/
(Beware the gotcha: without the non-capturing group around the alternation, the ^ would be part of the first alternative but not the second, and the $ would be part of the second alternative but not the first.)
But in your specific case, you don't need that.
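A quick way to check both patterns, and to see the anchoring gotcha in action, is a sketch like the following. The test strings are built with repeat so the digit counts are explicit; the ungrouped pattern is included only to show what goes wrong and is not something to use.

```javascript
// Optional two-letter prefix followed by exactly 16 digits.
const optionalPrefix = /^(?:[a-zA-Z]{2})?\d{16}$/;
const sixteenDigits = "2".repeat(16);

const optionalResults = [
  optionalPrefix.test("AA" + sixteenDigits),  // two letters + 16 digits
  optionalPrefix.test(sixteenDigits),         // 16 digits only
  optionalPrefix.test("A" + sixteenDigits),   // only one letter
  optionalPrefix.test("AA" + "2".repeat(15)), // too few digits
];
console.log(optionalResults); // [ true, true, false, false ]

// Alternation with and without the non-capturing group around it.
const groupedAlternation = /^(?:[a-zA-Z]{2}\d{14}|\d{16})$/;
const ungroupedAlternation = /^[a-zA-Z]{2}\d{14}|\d{16}$/;

// Valid prefix form followed by junk: only the grouped pattern rejects it,
// because in the ungrouped one the first alternative is not anchored by $.
const trailingJunk = "AB" + "1".repeat(14) + "!!!";
const groupedResult = groupedAlternation.test(trailingJunk);
const ungroupedResult = ungroupedAlternation.test(trailingJunk);
console.log(groupedResult, ungroupedResult); // false true
```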

JavaScript regex: why is alternation not ordered? [duplicate]

This question already has answers here:
Why order matters in this RegEx with alternation?
(3 answers)
Order of regular expression operator (..|.. ... ..|..)
(1 answer)
Closed 2 years ago.
Given this code:
const regex = /graph|photograph/;
'A photograph'.match(regex);
// Output: [ 'photograph', index: 2, input: 'A photograph', groups: undefined ]
Why is the engine not finding graph first? After looking at similar SO questions and the ECMAScript docs, I can see that
The | regular expression operator separates two alternatives. The pattern first tries to match the left Alternative (followed by the sequel of the regular expression); if it fails, it tries to match the right Disjunction (followed by the sequel of the regular expression).
Now, the above quote covers the case /photo|photograph/ where the alternatives share a common beginning, but the case where they share a common ending appears to be governed by a different rule.
I am content with the result I am getting, as in my use case I prefer to get the longest match, not the earliest one, but I would like to know why this happens, so I can be sure this isn't just a coincidence that is bound to change in the future.
The alternative graph does not match starting at the third character, but the alternative photograph does. The engine proceeds through the string from left to right.
The ordering you refer to in the question applies when alternatives match from a common starting point in the string. Otherwise, while proceeding through the "haystack" string, the alternatives are all considered. If there's a single match starting from a particular character, then the rest of the regex will proceed with that (and may of course backtrack later).
When several alternatives match from the same character in the source, the engine does not prefer the longer one. As the spec text quoted in the question says, it tries the left alternative first and only falls back to the right one if the rest of the pattern fails, so /photo|photograph/ matches photo in 'A photograph'. The longer match in your example comes from where the match starts, not from the length of the alternative.
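Both situations can be tried in a console. The sketch below uses the string from the question with three patterns: the original one, the /photo|photograph/ case mentioned in the question, and the same alternatives in reverse order. The results are consistent with the spec text quoted in the question: the leftmost starting position wins first, and at a shared starting position the left alternative is tried first.

```javascript
const haystack = "A photograph";

// Different starting positions: "photograph" starts at index 2, "graph" at 7.
// The engine scans left to right, so the earlier starting position wins.
const suffixCase = haystack.match(/graph|photograph/);
console.log(suffixCase[0], suffixCase.index); // photograph 2

// Same starting position: the left alternative is tried first and succeeds.
const prefixCase = haystack.match(/photo|photograph/);
console.log(prefixCase[0], prefixCase.index); // photo 2

// Swapping the order changes the result, which shows order is what matters.
const swappedCase = haystack.match(/photograph|photo/);
console.log(swappedCase[0], swappedCase.index); // photograph 2
```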

Why the first character after a sub is ignored somehow in the regex? [duplicate]

This question already has answers here:
Regex that can match empty string is breaking the javascript regex engine
(2 answers)
Closed 3 years ago.
Here is what I got in the console of Chrome 78.
console.log('1111'.replace(/(^|[^2])/g, '$12'))
// output "21121212"
Why isn't it replacing the first 1 with 12?
I think what's happening is that after replacing a zero-width match, it increments the position in the input string by 1 before searching for the next match. Otherwise, it would get stuck in an infinite loop, continually matching and replacing the same zero-width string.
Since ^ matches a zero-width string at the beginning, it increments the position, skipping over the first character of the string before looking for the next match.
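The skipped character is easier to see by listing the matches with their positions. The sketch below does that with matchAll and then repeats the replacement from the question. The last line is only a guess at the intent (append 2 after every character that is not 2); the question does not state the desired output, so treat it as an assumption.

```javascript
const digits = "1111";
const zeroWidthPattern = /(^|[^2])/g;

// List every match with its index. The first match is the empty string at
// index 0; the search then resumes at index 1, so the character at index 0
// is never part of any match.
const matchList = [...digits.matchAll(zeroWidthPattern)].map((m) => ({
  index: m.index,
  text: m[0],
}));
console.log(matchList);
// [ { index: 0, text: '' }, { index: 1, text: '1' },
//   { index: 2, text: '1' }, { index: 3, text: '1' } ]

// Same replacement as in the question. With only one capture group, "$12"
// is read as "$1" followed by a literal "2".
const replaced = digits.replace(zeroWidthPattern, "$12");
console.log(replaced); // 21121212

// Assumed intent (not stated in the question): put a 2 after every non-2
// character. Dropping the ^ alternative avoids the zero-width match entirely.
const appended = digits.replace(/[^2]/g, "$&2");
console.log(appended); // 12121212
```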
Method 1
My guess is that you're trying to write
(?<=^)|([^2])
though you'd want to check whether lookbehind is supported in your environment.
Method 2
This method also uses a lookbehind:
(?<=^|[^2])
If you provide some sample inputs and outputs, there might be other workarounds.
For example, a positive lookahead might be an option to look into:
(?=^|[^2]|$)
If you wish to simplify, modify, or explore the expression, regex101.com explains each part in its top right panel and shows how the expression matches against sample inputs.

AngularJS: regex matching everything except strings with specific symbols weird behaviour [duplicate]

This question already has answers here:
AngularJS - Remove leading and trailing whitespace from input-box using regex
(2 answers)
Closed 7 years ago.
I am not very good at regular expressions, so maybe this is a simple question, but I am certainly missing something. I use regular expressions to validate specific input from the user. The input must be accepted (the regex must match) if and only if the input string contains no commas and no whitespace (in other words, the input must be a single word without commas). Apart from that, it can contain any symbols and can have any length. When I use this regular expression, it matches input that doesn't contain commas:
/^[^,]*$/
I wanted to add the whitespace part to it, so I made this expression
/^[^,\s]*$/
which behaves in a very weird way. It does what it should except for one thing: for some reason, it matches (and lets in) strings that end with a space. (If they end with a comma, everything is OK and it doesn't match.) I don't want it to match strings with trailing whitespace, but I don't know how to adjust the regular expression to do this. So my questions are: why is this weird thing happening, and how do I change the regular expression to do what it should?
Here is an example:
http://jsbin.com/qoyoyagilo/2/edit?html,js,output
What is even weirder, when I tried my regex on Rubular, it didn't match strings with trailing whitespace. I am starting to believe that this has something to do with JavaScript and not with my particular regex.
Angular already trims your strings before validating them and binding to model. Extra whitespace at the beginning and at the end of strings won't even be matched against your regular expression (or any other validator).
You can use ng-trim="false" if you wish to disable this behavior:
<input ng-model="yourmodelvar" ng-trim="false" ng-pattern="[^,\s]*">
Also note that you don't need the ^ and $ chars in your regexp, since validation is performed against the whole string automatically. From the documentation on ng-pattern:
Sets pattern validation error key if the ngModel $viewValue value does
not match a RegExp found by evaluating the Angular expression given in
the attribute value. If the expression evaluates to a RegExp object,
then this is used directly. If the expression evaluates to a string,
then it will be converted to a RegExp after wrapping it in ^ and $
characters. For instance, "abc" will be converted to new
RegExp('^abc$').
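The effect of trimming can be reproduced in plain JavaScript, without Angular. The sketch below only mimics the two behaviours described above (trimming the value before validation, and wrapping a string pattern in ^ and $); it does not call any Angular API, and the variable names are made up for illustration.

```javascript
const noCommaOrSpace = /^[^,\s]*$/;
const typedValue = "hello "; // what the user typed, with a trailing space

// Tested as typed, the trailing space makes the pattern fail, as expected.
const rawResult = noCommaOrSpace.test(typedValue);
// Tested after trimming (what the answer says Angular does by default),
// the trailing space is gone before the pattern ever sees the value.
const trimmedResult = noCommaOrSpace.test(typedValue.trim());
// Whitespace in the middle survives trimming, so it is still rejected.
const innerSpaceResult = noCommaOrSpace.test(" hello world ".trim());
console.log(rawResult, trimmedResult, innerSpaceResult); // false true false

// The documentation quote says a string pattern is wrapped in ^ and $.
// The equivalent construction in plain JavaScript:
const wrappedPattern = new RegExp("^" + "abc" + "$");
console.log(wrappedPattern.source); // ^abc$
console.log(wrappedPattern.test("abc"), wrappedPattern.test("xabcx")); // true false
```

This is why the same regex behaves differently on Rubular: there the string is tested exactly as entered, with no trimming step in front of it.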
References:
https://docs.angularjs.org/api/ng/directive/input (official doc)
How to disable trimming of inputs in AngularJS?
