Regex, Get sequence if not preceded by symbols [duplicate] - javascript

This question already has answers here:
RegEx for a^b instead of pow(a,b)
(6 answers)
Closed 2 years ago.
I'm using the math.js library and I need to take the exponent of some variables. I have the following strings:
//Ok
pow(y,2)
pow(y,2+2)
pow(y,2-3)
pow(y,2.2)
pow(y,(23)/(2))+23123
pow(y,pow(2,pow(2,4)))-932
pow(y,pow(2,1*pow(2,0.5)))+23
//Error
pow(y,2)*pow(2,2)
pow(y,3)-pow(2,2)
pow(y,4)+pow(2,2)
pow(y,pow(2,1*pow(2,0.5)))+pow(1,1)
I'm having trouble implementing this search using regex. The pow(a,b) function takes two arguments: "a" is the base and "b" is the exponent.
In the last four strings of the code above, I need to capture only "2", "3", "4" and "pow(2,1*pow(2,0.5))". I don't want to take the part after "*", "+" and "-".
Since it is possible to chain the pow() function and both "a" and "b" can have arithmetic operators and functions like pow() and sqrt(), this turned out to be very complex. Is there any way to resolve this using regex?
The closest I got is in this regex: https://regex101.com/r/hB1cg4/4

As stated in the comments, matching balanced parentheses is hard in regex, though the .NET regex flavor supports this feature. Please see this answer: https://stackoverflow.com/a/35271017/8031896
Nevertheless, there is a workaround that works in the common regex flavors. However, please note that you may need to modify it according to the number of nested parenthesis levels in your mathematical notation.
((?<=^pow)\(([^()]*|\(([^()]*|\([^()]*\))*\))*\))
demo: https://regex101.com/r/hB1cg4/6
For a more detailed explanation, please see this answer: https://stackoverflow.com/a/18703444/8031896
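If a regex is not a hard requirement, a small scanner that counts parenthesis depth copes with any nesting level. The following is only a minimal sketch and not part of the answer above: powExponent is a hypothetical helper name, and it assumes the input starts with a pow(a,b) call.

```javascript
// Hypothetical helper: returns the second argument of the leading pow(...) call,
// or null if the string does not start with a well-formed pow(a,b).
function powExponent(str) {
  if (!str.startsWith('pow(')) return null;
  let depth = 0;     // current parenthesis nesting level
  let commaAt = -1;  // index of the comma separating base and exponent
  for (let i = 3; i < str.length; i++) {
    const ch = str[i];
    if (ch === '(') {
      depth++;
    } else if (ch === ')') {
      depth--;
      if (depth === 0) {
        // Closing parenthesis of the outer pow(...): everything after the comma is the exponent.
        return commaAt === -1 ? null : str.slice(commaAt + 1, i);
      }
    } else if (ch === ',' && depth === 1 && commaAt === -1) {
      commaAt = i; // first top-level comma inside the outer pow(...)
    }
  }
  return null; // unbalanced parentheses
}

console.log(powExponent('pow(y,2)*pow(2,2)'));
console.log(powExponent('pow(y,pow(2,1*pow(2,0.5)))+pow(1,1)'));
```

Anything after the closing parenthesis of the first pow(...) is ignored, which is the behaviour the question asks for.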

The following regex matches all of the "Error" strings, and one variant, but unfortunately fails to match two of the "OK" strings. Perhaps some tweaking is possible. The regex contains a single capture group that captures the information of interest.
^pow\([^,]+,(\d[^()]*|pow\(\d+,\d+(?:\)|[^()]*\([^()]*\)\)))\).*
Javascript demo
To match the "Error" strings I assumed that pow(2,1*pow(2,0.5)) in pow(y,pow(2,1*pow(2,0.5)))+23 represented the maximum nesting depth of pow calls.
The regex performs the following operations.
^ # match beginning of line
pow\( # match 'pow('
[^,]+ # match 1+ chars other than ','
, # match ','
( # begin capture group 1
\d[^()]* # match a digit, 0+ chars other than '(' and ')'
| # or
pow\(\d+,\d+ # match 'pow(', 1+ digits, ',' 1+ digits
(?: # begin non-cap grp
\) # match ')'
| # or
[^()]* # match 0+ chars other than '(' and ')'
\( # match '('
[^()]* # match 0+ chars other than '(' and ')'
\)\) # match '))'
) # end non-cap grp
) # end cap grp 1
\) # match ')'
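To see the regex above in action, here is a short sketch that runs it against the four "Error" strings from the question plus the two "OK" strings it cannot handle. The names powRe, samples and captured are illustrative only.

```javascript
const powRe = /^pow\([^,]+,(\d[^()]*|pow\(\d+,\d+(?:\)|[^()]*\([^()]*\)\)))\).*/;

const samples = [
  'pow(y,2)*pow(2,2)',
  'pow(y,3)-pow(2,2)',
  'pow(y,4)+pow(2,2)',
  'pow(y,pow(2,1*pow(2,0.5)))+pow(1,1)',
  'pow(y,(23)/(2))+23123',        // exponent starts with "(": no match
  'pow(y,pow(2,pow(2,4)))-932',   // pow nested as second argument of pow: no match
];

// Capture group 1 holds the exponent; null marks a string the regex rejects.
const captured = samples.map((s) => {
  const m = powRe.exec(s);
  return m ? m[1] : null;
});

console.log(captured);
// [ '2', '3', '4', 'pow(2,1*pow(2,0.5))', null, null ]
```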

Related

js regex not working. Exact match and or operation

The first 3 characters needs to be:
Exactly either ABC or ACD or BCD
Then followed be a hyphen -
Then followed by either a 5 or 8
Then any 4 numbers
Examples:
ABC-56789 (True)
AAA-56789 (False)
I have tried this:
/^[^ABC$|^ACD$|^BCD$][*-][5|8][0-9]{4}$/
How about use this expression?
^(ABC|ACD|BCD)-[58]\d{4}$
[] denotes a character set, so [ABC] matches a single character (A, B, or C), not the sequence ABC.
A ^ at the start of [] negates the set, so the regex you used will not work as intended.
If you want to group alternatives, use ().
You can also use \d (digit) instead of [0-9].
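The difference between a character set and a group can be checked directly. This is a small illustrative sketch (the variable names are made up for the example); it also shows that a | inside [] is a literal pipe character, which is why [5|8] from the question accepts more than intended.

```javascript
const charSet = /^[ABC]$/;          // exactly one character: A, B, or C
const group = /^(ABC|ACD|BCD)$/;    // one of three whole sequences

const checks = {
  setMatchesSingleLetter: charSet.test('A'),    // true
  setMatchesSequence: charSet.test('ABC'),      // false
  groupMatchesSequence: group.test('ABC'),      // true
  groupMatchesOther: group.test('AAA'),         // false
  pipeInsideSet: /^[5|8]$/.test('|'),           // true: "|" is literal inside []
  pipeFreeSet: /^[58]$/.test('|'),              // false
};

console.log(checks);
```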
Use this regex:
const regex = /^(?:ABC|ACD|BCD)-[58][0-9]{4}$/;
[
'ABC-56789',
'AAA-56789'
].forEach(str => {
console.log(str, '==>', regex.test(str));
})
Output:
ABC-56789 ==> true
AAA-56789 ==> false
Explanation of regex:
^ -- anchor at beginning of string
(?:ABC|ACD|BCD) -- non-capture group with OR combinations
- -- literal dash
[58] -- a 5 or 8
[0-9]{4} -- four digits
$ -- anchor at end of string
Learn more about regex: https://twiki.org/cgi-bin/view/Codev/TWikiPresentation2018x10x14Regex
Use parentheses, not square brackets, to group alternation patterns:
^(ABC|ACD|BCD)-[58][0-9]{4}$
You have to change the regex to the following:
/^(ABC|ACD|BCD)-(5|8)[0-9]{4}$/
[] matches a single character, but you want to match three characters at the beginning, so you have to use () to create a capturing group.
Using a single alternation:
^(?:ABC|[AB]CD)-[58][0-9]{4}$
Explanation
^ Start of string
(?: Non capture group for the alternatives
ABC Match literally
| Or
[AB]CD Match either ACD or BCD
) Close the non capture group
- Match literally
[58] Match either 5 or 8
[0-9]{4} Match 4 digits 0-9
$ End of string
See a regex101 demo

regex for simple arithmetic expression

I've read other stackoverflow posts about a simple arithmetic expression regex, but none of them is working with my issue:
I need to validate this kind of expression: "12+5.6-3.51-1.06",
I tried
const mathre = /(\d+(.)?\d*)([+-])?(\d+(.)?\d*)*/;
console.log("12+5.6-3.51-1.06".match(mathre));
but the result is '12+5', and I can't figure why ?
You only get 12+5 as a match because there is no /g global flag. Even with the global flag enabled you would only get partial matches, because the pattern has no ^ and $ anchors to validate the whole string.
The [+-] is only matched once; it needs to be part of a repeated group to match multiple operators.
Currently the pattern will match 1+2+3 but it will also match 1a1+2b2 as the dot is not escaped and can match any character (use \. to match it literally).
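The points above can be reproduced with the pattern from the question. This is just a quick check; the variable names first and bogus are illustrative.

```javascript
const mathre = /(\d+(.)?\d*)([+-])?(\d+(.)?\d*)*/;

// The unescaped dot consumes the "+", so the first match stops after "12+5".
const first = '12+5.6-3.51-1.06'.match(mathre)[0];

// Because "." matches any character, letters between digits are accepted too.
const bogus = '1a1+2b2'.match(mathre)[0];

console.log(first); // 12+5
console.log(bogus); // 1a1+2b2
```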
For starting with digits and optional decimal parts and repeating 1 or more times a + or -:
^\d+(?:\.\d+)?(?:[-+]\d+(?:\.\d+)?)+$
Regex demo
If the values can start with optional plus and minus and can also be decimals without leading digits:
^[+-]?\d*\.?\d+(?:[-+][+-]?\d*\.?\d+)+$
^ Start of string
[+-]? Optional + or -
\d*\.?\d+ Match 0+ digits with optional . and 1+ digits
(?: Non capture group
[-+] Match a + or -
[+-]?\d*\.?\d+ Match an optional + or - 0+ digits and optional . and 1+ digits
)+ Close the noncapture group and repeat 1+ times to match at least a single + or -
$ End of string
Regex demo
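A short sketch of how the two anchored patterns behave with test(). The names simple and signed are chosen for this example only.

```javascript
// Digits with optional decimal part, followed by 1+ repetitions of (+|-) number.
const simple = /^\d+(?:\.\d+)?(?:[-+]\d+(?:\.\d+)?)+$/;

// Variant allowing a sign on each number and decimals without leading digits.
const signed = /^[+-]?\d*\.?\d+(?:[-+][+-]?\d*\.?\d+)+$/;

const outcome = {
  expression: simple.test('12+5.6-3.51-1.06'),   // true
  lettersInside: simple.test('1a1+2b2'),         // false: the dot is escaped now
  noOperator: simple.test('12'),                 // false: at least one + or - is required
  signedSimple: simple.test('-.5+-2'),           // false: signs and bare decimals not allowed
  signedVariant: signed.test('-.5+-2'),          // true
};

console.log(outcome);
```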
You could try this solution with a PCRE-compatible regex engine:
^(?:(-?\d+(?:[\.,]{1}\d)?)[+-]?)*(?1)$
^ Start of String
(?: Non-capture group ng1
(-?\d+(?:[\.,]{1}\d)?) Pattern for a number with or without a leading "-" and with "." or "," in the middle; matches 1 or 1.1 or 1,1 (capture group 1)
[+-]? Pattern for "+" or "-"
)* Group ng1 may repeat 0 or more times
(?1) Requires a number at the end of the pattern by reusing the first subpattern
$ End of string
As JS does not support subpattern references such as (?1), you can write the group out in full instead:
/^(?:(-?\d+(?:[\.,]{1}\d)?)[+-]?)*(-?\d+(?:[\.,]{1}\d)?)$/gm

How do I combine these two regular expressions into one?

I'm writing a rudimentary lexer using regular expressions in JavaScript and I have two regular expressions (one for single quoted strings and one for double quoted strings) which I wish to combine into one. These are my two regular expressions (I added the ^ and $ characters for testing purposes):
var singleQuotedString = /^'(?:[^'\\]|\\'|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*'$/gi;
var doubleQuotedString = /^"(?:[^"\\]|\\"|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*"$/gi;
Now I tried to combine them into a single regular expression as follows:
var string = /^(["'])(?:[^\1\\]|\\\1|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*\1$/gi;
However when I test the input "Hello"World!" it returns true instead of false:
alert(string.test('"Hello"World!"')); //should return false as a double quoted string must escape double quote characters
I figured that the problem is in [^\1\\] which should match any character besides matching group \1 (which is either a single or a double quote - the delimiter of the string) and \\ (which is the backslash character).
The regular expression correctly filters out backslashes and matches the delimiters, but it doesn't filter out the delimiter within the string. Any help will be greatly appreciated. Note that I referred to Crockford's railroad diagrams to write the regular expressions.
You can't refer to a matched group inside a character class: (['"])[^\1\\]. Try something like this instead:
(['"])((?!\1|\\).|\\[bnfrt]|\\u[a-fA-F\d]{4}|\\\1)*\1
(you'll need to add some more escapes, but you get my drift...)
A quick explanation:
(['"]) # match a single or double quote and store it in group 1
( # start group 2
(?!\1|\\). # if group 1 or a backslash isn't ahead, match any non-line break char
| # OR
\\[bnfrt] # match an escape sequence
| # OR
\\u[a-fA-F\d]{4} # match a Unicode escape
| # OR
\\\1 # match an escaped quote
)* # close group 2 and repeat it zero or more times
\1 # match whatever group 1 matched
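Here is a brief sketch of that pattern applied to the question's test input, with ^ and $ added for whole-string validation as in the question. The names quoted and results are illustrative, and escapes the answer leaves out (such as an escaped backslash) are still not handled.

```javascript
const quoted = /^(['"])((?!\1|\\).|\\[bnfrt]|\\u[a-fA-F\d]{4}|\\\1)*\1$/;

const results = {
  // "Hello"World!" : unescaped delimiter inside the string
  unescapedInner: quoted.test('"Hello"World!"'),
  // "Hello\"World!" : delimiter escaped with a backslash
  escapedInner: quoted.test('"Hello\\"World!"'),
  // 'say "hi"' : the other quote type is allowed unescaped
  otherQuoteInside: quoted.test('\'say "hi"\''),
};

console.log(results);
// { unescapedInner: false, escapedInner: true, otherQuoteInside: true }
```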
This should work too (raw regex).
If speed is a factor, this is the 'unrolled' method, said to be the fastest for this kind of thing.
(['"])(?:(?!\\|\1).)*(?:\\(?:[\/bfnrt]|u[0-9A-F]{4}|\1)(?:(?!\\|\1).)*)*\1
Expanded
(['"]) # Capture a quote
(?:
(?!\\|\1). # As many non-escape and non-quote chars as possible
)*
(?:
\\ # escape plus,
(?:
[\/bfnrt] # /,b,f,n,r,t or u[0-9A-F]{4} or captured quote
| u[0-9A-F]{4}
| \1
)
(?:
(?!\\|\1). # As many non-escape and non-quote chars as possible
)*
)*
\1 # Captured quote
Well, you can always just create a larger regex by just using the alternation operator on the smaller regexes
/(?:single-quoted-regex)|(?:double-quoted-regex)/
Or explicitly:
var string = /(?:^'(?:[^'\\]|\\'|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*'$)|(?:^"(?:[^"\\]|\\"|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*"$)/gi;
Finally, if you want to avoid the code duplication, you can build up this regex dynamically, using the RegExp constructor.
var quoted_string = function(delimiter){
return ('^' + delimiter + '(?:[^' + delimiter + '\\]|\\' + delimiter + '|\\\\|\\\/|\\b|\\f|\\n|\\r|\\t|\\u[0-9A-F]{4})*' + delimiter + '$').replace(/\\/g, '\\\\');
//in the general case you could consider using a regex escaping function to avoid backslash hell.
};
var string = new RegExp( '(?:' + quoted_string("'") + ')|(?:' + quoted_string('"') + ')' , 'gi' );
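One way to follow up on the escaping-function idea is sketched below. It is an assumption-laden variant rather than the answer's own code: escapeRegExp uses a commonly seen escaping idiom, quotedStringSource and anyQuoted are hypothetical names, and String.raw is used so that backslashes in the pattern can be written once instead of doubled.

```javascript
// Escape characters that are special in a regex (a commonly used idiom).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: regex source for a string quoted with `delimiter`.
// String.raw keeps backslashes as typed, so the pattern reads like a regex literal.
function quotedStringSource(delimiter) {
  const d = escapeRegExp(delimiter);
  return String.raw`${d}(?:[^${d}\\]|\\${d}|\\\\|\\/|\\[bfnrt]|\\u[0-9A-F]{4})*${d}`;
}

// Accept either a single-quoted or a double-quoted string, nothing else.
const anyQuoted = new RegExp(
  '^(?:' + quotedStringSource("'") + '|' + quotedStringSource('"') + ')$',
  'i'
);

console.log(anyQuoted.test('"Hello"World!"'));   // false
console.log(anyQuoted.test('"Hello\\"World!"')); // true
console.log(anyQuoted.test("'it\\'s'"));         // true
```

The sketch leaves out the g flag because test() on a global regex keeps its position in lastIndex between calls, which can give surprising results when the same regex is reused.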

Regular expression negative match

I can't seem to figure out how to compose a regular expression (used in Javascript) that does the following:
Match all strings where the characters after the 4th character do not contain "GP".
Some example strings:
EDAR - match!
EDARGP - no match
EDARDTGPRI - no match
ECMRNL - match
I'd love some help here...
Use zero-width assertions:
if (subject.match(/^.{4}(?!.*GP)/)) {
// Successful match
}
Explanation:
^ # Assert position at the beginning of the string
. # Match any single character that is not a line break character
{4} # Exactly 4 times
(?! # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
. # Match any single character that is not a line break character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
GP # Match the characters “GP” literally
)
You can use what's called a negative lookahead assertion here. It looks into the string ahead of the location and matches only if the pattern contained is /not/ found. Here is an example regular expression:
/^.{4}(?!.*GP)/
This matches only if, after the first four characters, the string GP is not found.
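Applied to the example strings from the question, the lookahead behaves as follows. This is a small sketch; noGpAfterFour and verdicts are illustrative names.

```javascript
const noGpAfterFour = /^.{4}(?!.*GP)/;

const inputs = ['EDAR', 'EDARGP', 'EDARDTGPRI', 'ECMRNL'];

// Map each input to true (match) or false (no match).
const verdicts = inputs.map((s) => noGpAfterFour.test(s));

console.log(verdicts); // [ true, false, false, true ]
```

Note that a string shorter than four characters never matches, because .{4} must consume exactly four characters first.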
You could do something like this:
var str = "EDARDTGPRI";
var test = !(/GP/.test(str.slice(4)));
test will be true when "GP" does not appear after the 4th character and false otherwise.

Replace spaces but not when between parentheses

I guess I can do this with multiple regexs fairly easily, but I want to replace all the spaces in a string, but not when those spaces are between parentheses.
For example:
Here is a string (that I want to) replace spaces in.
After the regex I want the string to be
Hereisastring(that I want to)replacespacesin.
Is there an easy way to do this with lookahead or lookbehind operators?
I'm a little confused on how they work, and not real sure they would work in this situation.
Try this:
replace(/\s+(?=[^()]*(\(|$))/g, '')
A quick explanation:
\s+ # one or more white-space chars
(?= # start positive look ahead
[^()]* # zero or more chars other than '(' and ')'
( # start group 1
\( # a '('
| # OR
$ # the end of input
) # end group 1
) # end positive look ahead
In plain English: it matches one or more white space chars if either a ( or the end-of-input can be seen ahead without encountering any parenthesis in between.
An online Ideone demo: http://ideone.com/jaljw
The above will not work if:
there are nested parentheses
parentheses can be escaped
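The sketch below first runs the lookahead replacement on the question's example, then shows one possible way around the nesting limitation with a depth counter instead of a regex. stripSpacesOutsideParens is a hypothetical helper, not part of the answer, and it does not handle escaped parentheses either.

```javascript
const input = 'Here is a string (that I want to) replace spaces in.';
const viaRegex = input.replace(/\s+(?=[^()]*(\(|$))/g, '');
console.log(viaRegex); // Hereisastring(that I want to)replacespacesin.

// Hypothetical helper: drops whitespace only while outside all parentheses.
function stripSpacesOutsideParens(text) {
  let depth = 0; // how many "(" are currently open
  let out = '';
  for (const ch of text) {
    if (ch === '(') depth++;
    else if (ch === ')' && depth > 0) depth--;
    if (depth === 0 && /\s/.test(ch)) continue; // skip whitespace at top level
    out += ch;
  }
  return out;
}

const nested = stripSpacesOutsideParens('a (b (c d) e) f g');
console.log(nested); // a(b (c d) e)fg
```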
