Difference between ?:, ?! and ?= - javascript

I searched for the meaning of these expressions but couldn't understand the exact difference between them.
This is what they say:
?: Match expression but do not capture it.
?= Match a suffix but exclude it from capture.
?! Match if the suffix is absent.
I tried using these in simple RegEx and got similar results for all.
For example: the following 3 expressions give very similar results.
[a-zA-Z0-9._-]+#[a-zA-Z0-9-]+(?!\.[a-zA-Z0-9]+)*
[a-zA-Z0-9._-]+#[a-zA-Z0-9-]+(?=\.[a-zA-Z0-9]+)*
[a-zA-Z0-9._-]+#[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9]+)*

The difference between ?= and ?! is that the former requires the given expression to match and the latter requires it to not match. For example a(?=b) will match the "a" in "ab", but not the "a" in "ac". Whereas a(?!b) will match the "a" in "ac", but not the "a" in "ab".
The difference between ?: and ?= is that ?= excludes the expression from the entire match while ?: just doesn't create a capturing group. So for example a(?:b) will match the "ab" in "abc", while a(?=b) will only match the "a" in "abc". a(b) would match the "ab" in "abc" and create a capture containing the "b".

?: is for non capturing group
?= is for positive look ahead
?! is for negative look ahead
?<= is for positive look behind
?<! is for negative look behind
Please check Lookahead and Lookbehind Zero-Length Assertions for very good tutorial and examples on lookahead in regular expressions.

To better understand let's apply the three expressions plus a capturing group and analyse each behaviour.
() capturing group - the regex inside the parenthesis must be matched and the match create a capturing group
(?:) non-capturing group - the regex inside the parenthesis must be matched but does not create the capturing group
(?=) positive lookahead - asserts that the regex must be matched
(?!) negative lookahead - asserts that it is impossible to match the regex
Let's apply q(u)i to quit.q matches q and the capturing group u matches u.The match inside the capturing group is taken and a capturing group is created.So the engine continues with i.And i will match i.This last match attempt is successful.qui is matched and a capturing group with u is created.
Let's apply q(?:u)i to quit.Again, q matches q and the non-capturing group u matches u.The match from the non-capturing group is taken, but the capturing group is not created.So the engine continues with i.And i will match i.This last match attempt is successful.qui is matched.
Let's apply q(?=u)i to quit.The lookahead is positive and is followed by another token.Again, q matches q and u matches u.But the match from the lookahead must be discarded, so the engine steps back from i in the string to u.Given that the lookahead was successful the engine continues with i.But i cannot match u.So this match attempt fails.
Let's apply q(?=u)u to quit.The lookahead is positive and is followed by another token.Again, q matches q and u matches u.But the match from the lookahead must be discarded, so the engine steps back from u in the string to u.Given that the lookahead was successful the engine continues with u.And u will match u. So this match attempt is successful.qu is matched.
Let's apply q(?!i)u to quit.Even in this case lookahead is positive (because i does not match) and is followed by another token.Again, q matches q and i doesn't match u.The match from the lookahead must be discarded, so the engine steps back from u in the string to u.Given that the lookahead was successful the engine continues with u.And u will match u.So this match attempt is successful.qu is matched.
So, in conclusion, the real difference between lookahead and non-capturing groups is all about if you want just to test the existence or test and save the match.
But capturing groups are expensive so use it judiciously.

Try matching foobar against these:
/foo(?=b)(.*)/
/foo(?!b)(.*)/
The first regex will match and will return "bar" as first submatch — (?=b) matches the 'b', but does not consume it, leaving it for the following parentheses.
The second regex will NOT match, because it expects "foo" to be followed by something different from 'b'.
(?:...) has exactly the same effect as simple (...), but it does not return that portion as a submatch.

The simplest way to understand assertions is to treat them as the command inserted into a regular expression.
When the engine runs to an assertion, it will immediately check the condition described by the assertion.
If the result is true, then continue to run the regular expression.

This is the real difference:
>>> re.match('a(?=b)bc', 'abc')
<Match...>
>>> re.match('a(?:b)c', 'abc')
<Match...>
# note:
>>> re.match('a(?=b)c', 'abc')
None
If you dont care the content after "?:" or "?=", "?:" and "?=" are just the same. Both of them are ok to use.
But if you need those content for further process(not just match the whole thing. In that case you can simply use "a(b)") You have to use "?=" instead. Cause "?:"will just through it away.

Related

Replacements only in the first line with a regex

There is a transform of multiline string.
!a! b!
should become
.a. b.
And
!a! b!
c!
!d!
should become
.a. b.
c!
!d!
I approached it with a lookbehind:
str(/(?<!\n)([^\n!]*)!+/g, '$1.')
It didn't work as intended:
.a. b.
c.
!d.
Splitting a string and transforming the first line seems straightforward. But is there a reliable way to do replacements only in the first line of multiline string with a regex only?
Also would appreciate an explanation what exactly goes wrong with my approach so it fails.
The question is not limited to JS regex flavour but I'm interested in this one in the first place.
About the pattern you tried:
(?<!\n) Negative lookbehind, assert what is directly to the left is not a newline or !
([^\n!]*) Capture group 1, match 0+ times any char except a newline or !
!+ Match 1+ times ! (What you want to remove)
The pattern will match too much, as it will match all the individual parts. There is for example no rule that says match this pattern 2 times, so you will replace with group 1 for every time that pattern has a match.
Note that the quantifier in this part is 0+ times ([^\n!]*) it will also match a single ! except when preceded by a newline.
If you can make use of SKIP FAIL, you can first match what you want to avoid, which in this case is a line that optionally starts with an exclamation mark and ends with an exclamation mark with none in between.
After that match all the other exclamation marks and replace them with a dot.
^!?[^\r\n!]*!$(*SKIP)(*FAIL)|!
See a regex demo
Another option could be using 2 capturing groups.
The first group will match between the first set of exclamation marks, and the second group will match the whitespaces after followed by a char other than !.
Then match the ! at the end so it is not in the replacement
!([^\s!]+)!([^\S\r\n]+[^\s!])!
See another regex demo
In the replacement use the 2 capturing groups with the dots
.$1.$2.

How to form regex to match everything up to a "("

In javascript, how can a regular expression be formed to match everything up to and NOT including an opening parenthesis "("?
example input:
"12(pm):00"
"12(am):))"
"8(am):00"
ive found /^(.*?)\(/ to be successful with the "up to" part, but the match returned includes the "("
In regex101.com, its says the first capturing group is what im looking for, is there a way to return only the captured group?
There are three ways to deal with this. The first is to restrict the characters you match to not include the parenthesis:
let match = "12(pm):00".match(/[^(]*/);
console.log(match[0]);
The second is to only get the part of the match you are interested in, using capture groups:
let match = "12(pm):00".match(/(.*?)\(/);
console.log(match[1]);
The third is to use lookahead to explicitly exclude the parenthesis from the match:
let match = "12(pm):00".match(/.*?(?=\()/);
console.log(match[0]);
As in OP, note the non-greedy modifier in the second and third case: it is necessary to restrict the quantifier in case there is another open parenthesis further inside the string. This is not necessary in the first place, since the quantifier is explicitly forbidden to gobble up the parenthesis.
Try
^\d+
^ asserts position at start of a line
\d matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
https://regex101.com/r/C9XNT4/1

JavaScript regex replace last pattern in string?

I have a string which looks like
var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42)
.ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)
I want to put a tag around the last .bam() or .ramBam().
str.replace(/(\.(ram)?bam\(.*?\))$/i, '<span class="focus">$1</span>');
And I hope to get:
new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42).ramBam(8.1, 0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>
But somehow I keep on fighting with the non greedy parameter, it wraps everything after new Bammer with the span tags. Also tried a questionmark after before the $ to make the group non greedy.
I was hoping to do this easy, and with the bam or ramBam I thought that regex would be the easiest solution but I think I'm wrong.
Where do I go wrong?
You can use the following regex:
(?!.*\)\.)(\.(?:bam|ramBam)\(.*\))$
Demo
(?!.*\)\.) # do not match ').' later in the string
( # begin capture group 1
.\ # match '.'
(?:bam|ramBam) # match 'bam' or 'ramBam' in non-cap group
\(.*\) # match '(', 0+ chars, ')'
) # end capture group 1
$ # match end of line
For the example given in the question the negative lookahead (?!.*\)\.) moves an internal pointer to just before the substring:
.bam(8.1, (slot_height-thick)/2)
as that is the first location where there is no substring ). later in the string.
If there were no end-of-line anchor $ and the string ended:
...0).bam(8.1, (slot_height-thick)/2)abc
then the substitution would still be made, resulting in a string that ends:
...0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>abc
Including the end-of-line anchor prevents the substitution if the string does not end with the contents of the intended capture group.
Regex to use:
/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/
\. Matches period.
(?:ram)? Optionally matches ram in a non-capturing group.
[bB]am Matches bam or Bam.
\( Matches (.
[^)]* Matches 0 or more characters as long as they are not a ).
) Matches a ). Items 2. through 6. are placed in Capture Group 1.
(?!.*\.(ram)?[bB]am\() This is a negative lookahead assertion stating that the rest of the string contains no further instance of .ram( or .rambam( or .ramBam( and therefore this is the last instance.
See Regex Demo
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, 0).bam(0, -42).ramBam(8.1, 0).bam(8.1, slot_height)';
console.log(str.replace(/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/, '<span class="focus">.$1</span>'));
Update
The JavaScript regular expression engine is not powerful enough to handle nested parentheses. The only way I know of solving this is if we can make the assumption that after the final call to bam or ramBam there are no more extraneous right parentheses in the string. Then where I had been scanning the parenthesized expression with \([^)]*\), which would fail to pick up final parentheses, we must now use \(.*\) to scan everything until the final parentheses. At least I know no other way. But that also means that the way that I had been using to determine the final instance of ram or ramBam by using a negative lookahead needs a slight adjustment. I need to make sure that I have the final instance of ram or ramBam before I start doing any greedy matches:
(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))
See Regex Demo
\. Matches ..
(?:bam|ramBam) Matches bam or ramBam.
(?!.*\.(bam|ramBam)\() Asserts that Item 1. was the final instance
\( Matches (.
(.*) Greedily matches everything until ...
\) the final ).
) Items 1. through 6. are placed in Capture Group 1.
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42) .ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)';
console.log(str.replace(/(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))/, '<span class="focus">$1</span>'));
The non-greedy flag isn't quite right here, as that will just make the regex select the minimal number of characters to fit the pattern. I'd suggest that you do something with a negative lookahead like this:
str.replace(/(\.(?:ram)?[Bb]am\([^)]*\)(?!.*(ram)?[Bb]am))/i, '<span class="focus">$1</span>');
Note that this will only replace the last function name (bam OR ramBam), but not both. You'd need to take a slightly different approach to be able to replace both of them.

Why are positive lookbehinds captured as part of regex matches in javascript?

The following javascript expression yields ["myParameter=foobar", "foobar"].
'XXXXmyParameter=foobar'.match(/(?:myParameter=)([A-Za-z]+)/);
Why is "myParameter=foobar" even a match? I thought that positive lookbehinds are excluded from matches?
Is there a way to only capture the ([A-Za-z]+) portion of my regex in javascript? I could just take the second item in the list, but is there a way to explicitly exclude myParameter= from matches in the regex?
(?:myParameter=) is a non-capturing group, not a lookbehind. JavaScript does not support lookbehinds.
The first element of the result is always the complete match. The value of your capture group is the second element of the array, "foobar".
If you used a capture group, i.e. (myParameter=), the result would be:
["myParameter=foobar", "myParameter=", "foobar"]
So again, the first element is the complete match, every other element corresponds to a capture group.
You are not implementing Positive Lookbehind, (?:...) syntax is called a non-capturing group which is used to group expressions, not capture them (usually used in alternation when you have a set of different patterns).
You can simply reference the captured group for the match result.
var r = 'XXXXmyParameter=foobar'.match(/myParameter=([A-Za-z]+)/)[1];
if (r)
console.log(r); //=> "foobar"
Note: Lookbehind assertions are not supported in JavaScript ...
The myParameter is a match because you are using a non capturing group.
The non capturing group matches the text but it cannot be backreferenced.
Non Capturing Group:
(?:myParameter=)
Positive Lookahead:
(?=myParameter=)
Negative Lookahead:
(?!myParameter=)
The regex you need is:
(?!myParameter=)[A-Za-z]+$
DEMO

What does ?=^ mean in a regexp?

I want to write regexp which allows some special characters like #-. and it should contain at least one letter. I want to understand below things also:
/(?=^[A-Z0-9. '-]{1,45}$)/i
In this regexp what is the meaning of ?=^ ? What is a subexpression in regexp?
(?=) is a lookahead, it's looking ahead in the string to see if it matches without actually capturing it
^ means it matches at the BEGINNING of the input (for example with the string a test, ^test would not match as it doesn't start with "test" even though it contains it)
Overall, your expression is saying it has to ^ start and $ end with 1-45 {1,45} items that exist in your character group [A-Z0-9. '-] (case insensitive /i). The fact it is within a lookahead in this case just means it's not going to capture anything (zero-length match).
?= is a positive lookahead
Read more on regex

Categories