Why not being greedy in triple dots?

I'm not looking to force this to work with a workaround; I am interested in learning why it failed.
I am trying to match all occurrences of a comma or period NOT followed by a space.
I used this pattern: ([.,]+)(?! )
It should match only two cases in this string:
This is a test... And,another test.
It should match the , between And and another, AND it should match the final period of the sentence. HOWEVER, it is also matching the first two dots of the triple dots. Shouldn't the + make it greedy, so that it sees the triple dots are followed by a space and does not match them?

Your regex ([.,]+)(?! ) matches .. in ... because of backtracking. Backtracking happens when a regex can match part of the string in more than one way, which is the case when you combine quantifiers with lookarounds. Here, the engine first matches ... and checks whether a space follows. There is a space after ... in your string, so that attempt fails, but the regex engine knows there is another possible way to match at the current location, and backtracks. It discards the final . from the match and checks whether the second . in ... is followed by a space. It is not (a . follows it), so .. is matched.
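The backtracking described above can be observed directly. This is a minimal sketch using the sample sentence from the question; the variable names are illustrative only.

```javascript
// Sample sentence from the question.
const sentence = "This is a test... And,another test.";

// The asker's pattern: one or more dots/commas not followed by a space.
const naive = /([.,]+)(?! )/g;

// "..." is followed by a space, so the engine gives back one dot and
// succeeds on "..", whose next character is "." rather than a space.
const naiveMatches = sentence.match(naive);
console.log(naiveMatches); // [ '..', ',', '.' ]
```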
You can use an atomic group workaround here:
/(?=([.,]+))\1(?! )/g
See the regex demo
One or more dots or commas are captured inside a lookahead, and then \1 consumes that text. Since the engine cannot backtrack into the lookahead once it has matched, \1 always holds the whole run, so the negative lookahead is checked only after the last . or , and if there is a space, the match fails and the preceding . or , are not retried as a shorter match.
A better way for a JS regex engine to match what you want is to include the . and , into the negative lookahead condition (see Pavneet's suggestion):
/[.,]+(?![ .,])/g
         ^^^^^
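Both patterns from this answer can be checked against the sentence from the question. The sketch below only compares their results; the variable names are made up for illustration.

```javascript
const sampleText = "This is a test... And,another test.";

// Emulated atomic group: the lookahead captures the whole run, \1 consumes it.
const atomicLike = /(?=([.,]+))\1(?! )/g;
// Simpler variant: the run may not be followed by a space, dot, or comma.
const widenedLookahead = /[.,]+(?![ .,])/g;

// With the g flag, match() returns the full matches only.
const atomicMatches = sampleText.match(atomicLike);
const widenedMatches = sampleText.match(widenedLookahead);

console.log(atomicMatches);  // [ ',', '.' ]
console.log(widenedMatches); // [ ',', '.' ]
```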

Related

Replacements only in the first line with a regex

I need to transform a multiline string.
!a! b!
should become
.a. b.
And
!a! b!
c!
!d!
should become
.a. b.
c!
!d!
I approached it with a lookbehind:
str.replace(/(?<!\n)([^\n!]*)!+/g, '$1.')
It didn't work as intended:
.a. b.
c.
!d.
Splitting the string and transforming only the first line seems straightforward. But is there a reliable way to do replacements only in the first line of a multiline string with a regex alone?
I would also appreciate an explanation of what exactly goes wrong with my approach.
The question is not limited to the JS regex flavour, but that is the one I'm interested in first.

About the pattern you tried:
(?<!\n) Negative lookbehind, assert what is directly to the left is not a newline
([^\n!]*) Capture group 1, match 0+ times any char except a newline or !
!+ Match 1+ times ! (What you want to remove)
The pattern will match too much, because it matches every individual part that fits. There is, for example, no rule that says to match this pattern only 2 times, so the replacement with group 1 is made every time the pattern matches.
Note that because the quantifier in ([^\n!]*) is 0+ times, the pattern will also match a single ! on its own, except when it is preceded by a newline.
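To see this behaviour concretely, here is a small sketch that runs the pattern from the question on the three-line sample. It assumes an engine with lookbehind support (ES2018 or later).

```javascript
// Three-line sample from the question.
const multiline = "!a! b!\nc!\n!d!";

// The lookbehind only rejects matches that START right after a newline,
// so "c!" still matches from the "!" and "!d!" still matches from the "d".
const attempted = multiline.replace(/(?<!\n)([^\n!]*)!+/g, "$1.");

console.log(attempted);
// .a. b.
// c.
// !d.
```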
If you can make use of (*SKIP)(*FAIL) (available in PCRE), you can first match what you want to avoid, which in this case is a line that optionally starts with an exclamation mark and ends with an exclamation mark with none in between.
After that match all the other exclamation marks and replace them with a dot.
^!?[^\r\n!]*!$(*SKIP)(*FAIL)|!
See a regex demo
Another option could be using 2 capturing groups.
The first group will match between the first set of exclamation marks, and the second group will match the whitespaces after followed by a char other than !.
Then match the ! at the end so it is not in the replacement
!([^\s!]+)!([^\S\r\n]+[^\s!])!
See another regex demo
In the replacement use the 2 capturing groups with the dots
.$1.$2.
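Backtracking verbs such as (*SKIP)(*FAIL) are a PCRE feature and are not accepted by JavaScript's RegExp, so only the two-group pattern can be tried directly in JS. The sketch below runs it on the sample and, as a clearly hypothetical alternative that is not part of this answer, adds a lookbehind that ties each ! to the first line. It assumes ES2018 lookbehind support.

```javascript
const block = "!a! b!\nc!\n!d!";

// Two-group pattern from the answer; it fits input shaped like "!x! y!".
const viaGroups = block.replace(/!([^\s!]+)!([^\S\r\n]+[^\s!])!/, ".$1.$2.");

// Hypothetical alternative (an assumption, not from the answer): replace a
// "!" only if nothing but non-newline characters precedes it back to the
// start of the string. Without the m flag, ^ matches only at the very start.
const viaLookbehind = block.replace(/(?<=^[^\n]*)!/g, ".");

console.log(viaGroups);
console.log(viaLookbehind);
// Both print:
// .a. b.
// c!
// !d!
```

The lookbehind variant does not depend on the shape of the first line, which may make it the more general of the two.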

JavaScript regex replace last pattern in string?

I have a string which looks like
var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42)
.ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)
I want to put a tag around the last .bam() or .ramBam().
str.replace(/(\.(ram)?bam\(.*?\))$/i, '<span class="focus">$1</span>');
And I hope to get:
new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42).ramBam(8.1, 0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>
But somehow I keep fighting with the non-greedy quantifier: it wraps everything after new Bammer in the span tags. I also tried a question mark before the $ to make the group non-greedy.
I was hoping to do this easy, and with the bam or ramBam I thought that regex would be the easiest solution but I think I'm wrong.
Where do I go wrong?

You can use the following regex:
(?!.*\)\.)(\.(?:bam|ramBam)\(.*\))$
Demo
(?!.*\)\.) # do not match ').' later in the string
( # begin capture group 1
\. # match '.'
(?:bam|ramBam) # match 'bam' or 'ramBam' in non-cap group
\(.*\) # match '(', 0+ chars, ')'
) # end capture group 1
$ # match end of line
For the example given in the question the negative lookahead (?!.*\)\.) moves an internal pointer to just before the substring:
.bam(8.1, (slot_height-thick)/2)
as that is the first location where there is no substring ). later in the string.
If there were no end-of-line anchor $ and the string ended:
...0).bam(8.1, (slot_height-thick)/2)abc
then the substitution would still be made, resulting in a string that ends:
...0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>abc
Including the end-of-line anchor prevents the substitution if the string does not end with the contents of the intended capture group.
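A short sketch of this pattern applied in JavaScript follows. It assumes the call chain is on a single line (the string in the question contains a line break, which . would not cross).

```javascript
// Call chain from the question, joined onto one line for this sketch.
const chain =
  'new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2)' +
  '.bam(0, -42).ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)';

// The lookahead rejects every start position that still has ")." ahead of
// it, so only the final call can begin a match.
const lastCall = /(?!.*\)\.)(\.(?:bam|ramBam)\(.*\))$/;

const wrapped = chain.replace(lastCall, '<span class="focus">$1</span>');
console.log(wrapped);
```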

Regex to use:
/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/
1. \. Matches a period.
2. (?:ram)? Optionally matches ram in a non-capturing group.
3. [bB]am Matches bam or Bam.
4. \( Matches (.
5. [^)]* Matches 0 or more characters as long as they are not a ).
6. \) Matches a ). Items 2. through 6. are placed in Capture Group 1.
7. (?!.*\.(ram)?[bB]am\() This is a negative lookahead assertion stating that the rest of the string contains no further instance of .bam(, .Bam(, .rambam( or .ramBam( and therefore this is the last instance.
See Regex Demo
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, 0).bam(0, -42).ramBam(8.1, 0).bam(8.1, slot_height)';
console.log(str.replace(/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/, '<span class="focus">.$1</span>'));
Update
The JavaScript regular expression engine is not powerful enough to handle nested parentheses. The only way I know of solving this is to assume that after the final call to bam or ramBam there are no more extraneous right parentheses in the string. Where I had been scanning the parenthesized expression with \([^)]*\), which stops at the first ) and so fails to pick up the final parenthesis, we must now use \(.*\) to scan everything up to the final parenthesis. At least I know of no other way. That also means that the negative lookahead I had been using to determine the final instance of bam or ramBam needs a slight adjustment: I need to make sure that I have the final instance of bam or ramBam before I start doing any greedy matches:
(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))
See Regex Demo
1. \. Matches a literal period.
2. (?:bam|ramBam) Matches bam or ramBam.
3. (?!.*\.(bam|ramBam)\() Asserts that the call just matched was the final instance.
4. \( Matches (.
5. (.*) Greedily matches everything until ...
6. \) the final ).
7. ) Items 1. through 6. are placed in Capture Group 1.
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42) .ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)';
console.log(str.replace(/(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))/, '<span class="focus">$1</span>'));

The non-greedy quantifier isn't quite right here, as that will just make the regex select the minimal number of characters that fit the pattern from wherever the match starts. I'd suggest that you do something with a negative lookahead like this:
str.replace(/(\.(?:ram)?[Bb]am\([^)]*\)(?!.*(ram)?[Bb]am))/i, '<span class="focus">$1</span>');
Note that this will only replace the last function name (bam OR ramBam), but not both. You'd need to take a slightly different approach to be able to replace both of them.
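The reason the original attempt wraps too much can be shown with a shortened, made-up call chain: a lazy quantifier limits how far a match extends, but the engine still begins at the leftmost position where a match is possible.

```javascript
// Shortened, illustrative chain (not the exact string from the question).
const calls = 'new Bammer({mode:"deg"}).bam(0, 112).bam(8.1, (h-t)/2)';

// Pattern from the question: lazy .*? followed by \) and the $ anchor.
const lazyAttempt = calls.match(/(\.(ram)?bam\(.*?\))$/i);

// The match starts at the FIRST ".bam(" and .*? is expanded step by step
// until "\)$" can finally succeed at the end of the string.
console.log(lazyAttempt[1]); // .bam(0, 112).bam(8.1, (h-t)/2)
```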

Characters still matching even though regex says "don't match if the same character is behind/in front"

I want to match words between asterisks:
This is a *test.*
This is the regex:
\*(.*?)\*
But I don't want to match the word if it's surrounded by two or more asterisks:
This is a **test.**
So I updated the regex to reflect that:
(?<!\*)\*(.*?)\*(?!\*)
However, **test** is still being matched.
Why is this? And how to fix it?
https://regexr.com/4td9l

. matches anything (except newlines), so .*? may include *s. The matched substring is basically expanding to include the inner *s, while the outer *s are matched by the \*s, so that the lookbehind and lookahead are still fulfilled.
Change the inner group to not match *s:
(?<!\*)\*([^*]+)\*(?!\*)
         ^^^^^^^
https://regex101.com/r/J6tUoL/1
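The two patterns can be compared on a made-up sentence that contains both a single-asterisk and a double-asterisk word. In this particular sample the lazy group matches the empty string between two adjacent asterisks, which is one way the lookarounds end up satisfied. The sketch assumes lookbehind support (ES2018 or later).

```javascript
// Illustrative sample, not taken from the linked demo.
const sample = "This is a *test.* and a **test.**";

// Pattern from the question: the lazy group may be empty or contain "*".
const loose = sample.match(/(?<!\*)\*(.*?)\*(?!\*)/g);
// Pattern from this answer: the group needs at least one non-asterisk.
const strict = sample.match(/(?<!\*)\*([^*]+)\*(?!\*)/g);

console.log(loose);  // [ '*test.*', '**', '**' ]
console.log(strict); // [ '*test.*' ]
```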

If a double ** may appear between the single * delimiters, you could repeat the lookarounds on both delimiters:
(?<!\*)\*(?!\*)(.*?)(?<!\*)\*(?!\*)
Regex demo

regexp: match everything beginning from second dot including dot

I want to match everything beginning from second ., including .
Regexp: /(?<=\d\.\d+)\..*/g. Playground regex101
It does not work for strings such as 1232..233232.

Update
As @WiktorStribiżew points out, the regex below does not handle 1212.2e1.121212, so this might be a better solution:
/(?<=^[^.]*\.[^.]*)\..*/ since it also covers that case.
Old answer:
You can do this (regex101); the match begins at the second . and includes it.
Regexp: /(?<=\d?\.\d*)\..*/g
You need to use * (0 to x occurrences of the character) instead of + (1 to x occurrences of the character).
I have added a ? after your first \d to handle the case where the string starts with a . and not a digit.
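As a rough check, the sketch below compares the pattern from the question with the updated pattern on a few inputs. The third input is a made-up control with only one dot, and the code assumes an engine with variable-length lookbehind (ES2018 or later).

```javascript
const inputs = ["1232..233232", "1212.2e1.121212", "12.34"];

// Pattern from the question (requires digit, dot, digits before the dot).
const original = /(?<=\d\.\d+)\..*/;
// Updated pattern: exactly one earlier dot between the start and this dot.
const updated = /(?<=^[^.]*\.[^.]*)\..*/;

// Collect the matched text (or null) for each input and pattern.
const results = inputs.map((s) => ({
  input: s,
  original: (s.match(original) || [null])[0],
  updated: (s.match(updated) || [null])[0],
}));

console.log(results);
// original: null, null, null
// updated:  '.233232', '.121212', null
```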

Reading your question literally:
I want to match everything beginning from second ., including .
this would do the trick:
[.][^.]*([.].*)
This leaves the result in group 1. Keep in mind that [^.] also matches newline characters; if you don't want this, add \n to the negated character class.
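A brief sketch of reading group 1 in JavaScript, including the newline caveat. The helper name fromSecondDot and the inputs other than 1232..233232 are made up for illustration.

```javascript
// Hypothetical helper: returns group 1 of the pattern, or null if no match.
function fromSecondDot(text) {
  const m = /[.][^.]*([.].*)/.exec(text);
  return m ? m[1] : null;
}

console.log(fromSecondDot("1232..233232")); // .233232
console.log(fromSecondDot("a.b.c.d"));      // .c.d

// [^.]* happily crosses the line break, so the "second dot" found here
// is on the next line.
console.log(fromSecondDot("1.2\n3.4"));     // .4
```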

Regular expression match specific key words

I am trying to use regexp to match some specific key words.
For the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or postfix. The regexp I am using now is \b(IF|ELSE)\b, and it gives me all the IFs back.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks for any help in advance.
And I am using http://regexr.com/ for test.
Need to work with JS.

I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It consists of three groups:
(?:^|\s) - Matches either the beginning of the line, or a single whitespace character
(IF|ELSE) - Matches one of your keywords
(?:$|\s) - Matches either the end of the line, or a single whitespace character.
Regexr
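A minimal sketch of this pattern in JavaScript, run over the five sample lines from the question. Since the surrounding whitespace is part of each match, the keyword itself is read from group 1.

```javascript
// The five sample lines from the question.
const program = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// g finds every occurrence; m lets ^ and $ match at line boundaries.
const keywordPattern = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;

// Keep only capture group 1 from each match.
const keywords = Array.from(program.matchAll(keywordPattern), (m) => m[1]);

console.log(keywords); // [ 'IF', 'IF' ]
```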

You can do it with lookarounds (lookahead + lookbehind). This is what you really want, as it explicitly matches what you are searching for. You don't want to check for other characters like the string start or whitespace around the match, but to match exactly "IF or ELSE not surrounded by dots":
/(?<!\.)(IF|ELSE)(?!\.)/g
Explanation:
use the g flag to find all occurrences
(?<!X)Y is a negative lookbehind which matches a Y not preceded by an X
Y(?!X) is a negative lookahead which matches a Y not followed by an X
Working example: https://regex101.com/r/oS2dZ6/1
PS: lookbehind requires a JavaScript engine that supports ES2018; if you don't have to write the regex for JS, you can test with a flavour that supports it, for example PCRE on regex101.com.
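One thing to watch with the dots-only check is that it says nothing about letters around the keyword. The sketch below uses made-up input to show this and adds \b word boundaries as a possible refinement, which is an assumption on my part and not part of the answer. It needs an engine with lookbehind support.

```javascript
// Illustrative input; the last line is invented to show the edge case.
const lines = "IF A > B THEN STOP\nLOL.IF\nIF.LOL\nIF.ELSE\nLIFE ELSE";

// Pattern from the answer: only dots are excluded around the keyword.
const dotsOnly = lines.match(/(?<!\.)(IF|ELSE)(?!\.)/g);

// Assumed refinement: also require word boundaries around the keyword.
const withBoundaries = lines.match(/(?<!\.)\b(IF|ELSE)\b(?!\.)/g);

console.log(dotsOnly);       // [ 'IF', 'IF', 'ELSE' ]  (the second IF is inside LIFE)
console.log(withBoundaries); // [ 'IF', 'ELSE' ]
```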
