JavaScript regex replace last pattern in string? - javascript

I have a string which looks like
var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42)
.ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)
I want to put a tag around the last .bam() or .ramBam().
str.replace(/(\.(ram)?bam\(.*?\))$/i, '<span class="focus">$1</span>');
And I hope to get:
new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42).ramBam(8.1, 0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>
But somehow I keep on fighting with the non greedy parameter, it wraps everything after new Bammer with the span tags. Also tried a questionmark after before the $ to make the group non greedy.
I was hoping to do this easy, and with the bam or ramBam I thought that regex would be the easiest solution but I think I'm wrong.
Where do I go wrong?

You can use the following regex:
(?!.*\)\.)(\.(?:bam|ramBam)\(.*\))$
Demo
(?!.*\)\.) # do not match ').' later in the string
( # begin capture group 1
.\ # match '.'
(?:bam|ramBam) # match 'bam' or 'ramBam' in non-cap group
\(.*\) # match '(', 0+ chars, ')'
) # end capture group 1
$ # match end of line
For the example given in the question the negative lookahead (?!.*\)\.) moves an internal pointer to just before the substring:
.bam(8.1, (slot_height-thick)/2)
as that is the first location where there is no substring ). later in the string.
If there were no end-of-line anchor $ and the string ended:
...0).bam(8.1, (slot_height-thick)/2)abc
then the substitution would still be made, resulting in a string that ends:
...0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>abc
Including the end-of-line anchor prevents the substitution if the string does not end with the contents of the intended capture group.

Regex to use:
/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/
\. Matches period.
(?:ram)? Optionally matches ram in a non-capturing group.
[bB]am Matches bam or Bam.
\( Matches (.
[^)]* Matches 0 or more characters as long as they are not a ).
) Matches a ). Items 2. through 6. are placed in Capture Group 1.
(?!.*\.(ram)?[bB]am\() This is a negative lookahead assertion stating that the rest of the string contains no further instance of .ram( or .rambam( or .ramBam( and therefore this is the last instance.
See Regex Demo
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, 0).bam(0, -42).ramBam(8.1, 0).bam(8.1, slot_height)';
console.log(str.replace(/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/, '<span class="focus">.$1</span>'));
Update
The JavaScript regular expression engine is not powerful enough to handle nested parentheses. The only way I know of solving this is if we can make the assumption that after the final call to bam or ramBam there are no more extraneous right parentheses in the string. Then where I had been scanning the parenthesized expression with \([^)]*\), which would fail to pick up final parentheses, we must now use \(.*\) to scan everything until the final parentheses. At least I know no other way. But that also means that the way that I had been using to determine the final instance of ram or ramBam by using a negative lookahead needs a slight adjustment. I need to make sure that I have the final instance of ram or ramBam before I start doing any greedy matches:
(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))
See Regex Demo
\. Matches ..
(?:bam|ramBam) Matches bam or ramBam.
(?!.*\.(bam|ramBam)\() Asserts that Item 1. was the final instance
\( Matches (.
(.*) Greedily matches everything until ...
\) the final ).
) Items 1. through 6. are placed in Capture Group 1.
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42) .ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)';
console.log(str.replace(/(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))/, '<span class="focus">$1</span>'));

The non-greedy flag isn't quite right here, as that will just make the regex select the minimal number of characters to fit the pattern. I'd suggest that you do something with a negative lookahead like this:
str.replace(/(\.(?:ram)?[Bb]am\([^)]*\)(?!.*(ram)?[Bb]am))/i, '<span class="focus">$1</span>');
Note that this will only replace the last function name (bam OR ramBam), but not both. You'd need to take a slightly different approach to be able to replace both of them.

Related

How to form regex to match everything up to a "("

In javascript, how can a regular expression be formed to match everything up to and NOT including an opening parenthesis "("?
example input:
"12(pm):00"
"12(am):))"
"8(am):00"
ive found /^(.*?)\(/ to be successful with the "up to" part, but the match returned includes the "("
In regex101.com, its says the first capturing group is what im looking for, is there a way to return only the captured group?
There are three ways to deal with this. The first is to restrict the characters you match to not include the parenthesis:
let match = "12(pm):00".match(/[^(]*/);
console.log(match[0]);
The second is to only get the part of the match you are interested in, using capture groups:
let match = "12(pm):00".match(/(.*?)\(/);
console.log(match[1]);
The third is to use lookahead to explicitly exclude the parenthesis from the match:
let match = "12(pm):00".match(/.*?(?=\()/);
console.log(match[0]);
As in OP, note the non-greedy modifier in the second and third case: it is necessary to restrict the quantifier in case there is another open parenthesis further inside the string. This is not necessary in the first place, since the quantifier is explicitly forbidden to gobble up the parenthesis.
Try
^\d+
^ asserts position at start of a line
\d matches a digit (equal to [0-9])
+ Quantifier — Matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
https://regex101.com/r/C9XNT4/1

Remove Last Instance Of Character From String - Javascript - Revisited

According to the accepted answer from this question, the following is the syntax for removing the last instance of a certain character from a string (In this case I want to remove the last &):
function remove (string) {
string = string.replace(/&([^&]*)$/, '$1');
return string;
}
console.log(remove("height=74&width=12&"));
But I'm trying to fully understand why it works.
According to regex101.com,
/&([^&]*)$/
& matches the character & literally (case sensitive)
1st Capturing Group ([^&]*)
Match a single character not present in the list below [^&]*
* Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
& matches the character & literally (case sensitive)
$ asserts position at the end of the string, or before the line terminator right at the end of the string (if any)
So if we're matching the character & literally with the first &:
Then why are we also "matching a single character not present in the following list"?
Seems counter productive.
And then, "$ asserts position at the end of the string" - what does this mean? That it starts searching for matches from the back of the string first?
And finally, what is the $1 doing in the replaceValue? Why is it $1 instead of an empty string? ""
1- The solution for that problem I think is different to the solution you want:
That regex will replace the last "&" no matter where it is, in the middle or in the end of the string.
If you apply this regex to this two examples you will see that the first get "incorrectly" replaced:
height=74&width=12&test=1
height=74&width=12&test=1&
They get replaced as :
height=74&width=12test=1
height=74&width=12&test=1
So to really replace the last "&" the only thing you need to do is :
string.replace(/&$/, '');
Now, if you want to replace the last ocurrence of "&" no matter where it is, I will explain that regex :
$1 Represents a (capturing group), everything inside those ([^&]*) are captured inside that $1. This is a oversimplification.
&([^&]*)$
& Will match a literal "&" then in the following capturing group this regex will look for any amount (0 to infinite) of characters (NOT EQUAL TO "&", explained latter) until the end of the string or line (Depending on the flag you use in the regex, /m for matching lines ). Anything captured in this capturing group will go to $1 when you apply the replacement.
So, If you apply this logic in your mind you will see that it will always match the last & and replace it with anything on its right that does not contain a single "&""
&(<nothing-like-a-&>*)<until-we-reach-the-end> replaced by anything found inside (<nothing-like-a-&>*) == $1. In this case because of the use of * , it means 0 or more times, sometimes the capturing group $1 will be empty.
NOT EQUAL TO part:
The regex uses a [^], in simple terms [] represents a group of independent characters, example: [ab] or [ba] represents the same, it will always look for "a" or "b". Inside this you can also look for ranges like 0 to 9 like this [0-9ba], it will always match anything from 0 to 9, a or b.
The "^" here [^] represents a negation of the content, so, it will match anything not in this group, like [^0-9] will always match anything that is not a number. In your regex [^&] it was used for looking for anything that is not a "&"

String replace with regexp overwrites non matching character

The idea is replacing in a string all decimal numbers without a digit before the decimal point with the zero so .03 sqrt(.02) would become 0.03 sqrt(0.02).
See the code below for a sample, the problem is that the replacement overwrites the opening parenthesis when there's one preceding the decimal point. I think that the parenthesis does not pertain to the matching string, does it?
let s='.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?:^|\D)\.(\d+)/g , "0.$1");
console.log(s)
Make your initial group capturing, not non-capturing, and use it in the replacement:
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
// ^---- capturing, not non-capturing
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
console.log(s)
I think that the parenthesis does not pertain to the matching string, does it?
It does, because it matches [^\d].
Side note: As Wiktor points out, you can use \D instead of [^\d].
Side note 2: JavaScript regexes are finally getting lookbehind (in the living specification, and will be in the ES2018 spec snapshot), so an alternate way to do this with modern JavaScript environments would be a negative lookbehind:
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
// ^^^^^^^--- negative lookbehind for a digit
That means basically "If there's a digit here, don't match." (There's also positive lookbehind, (?<=...).)
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
console.log(s)
A parenthesis is a nn-digit, thus it is matched with [^\d] and removed.
The solution is to match and capture the part before a dot and then insert back using a replacement backreference.
Use
.replace(/(^|\D)\.(\d+)/g , "$10.$2")
See the regex demo.
Pattern details
(^|\D) - Capturing group 1 (later referred to with $1 from the replacement pattern): a start of string or any non-digit ([^\d] = \D)
\. - a dot
(\d+) - Capturing group 2 (later referred to with $2 from the replacement pattern): 1+ digits.
See the JS demo:
let s='.05 sqrt(.005) another(.33) thisShouldnt(a.b) neither(3.4)'
s=s.replace(/(^|\D)\.(\d+)/g , "$10.$2");
console.log(s)
Note that $10.$2 will be parsed by the RegExp engine as $1 backreference, then 0. text and then $2 backreference, since there are only 2 capturing groups in the pattern, there are no 10 capturing groups and thus $10 will not be considered as a valid token in the replacement pattern.

Why not being greedy in triple dots?

I'm not looking to brute this to work, with a workaround, I am interested in learning why it failed.
I am trying to match all occurrences of a comma or period NOT followed by a space.
I used this patt: ([.,]+)(?! )
It should match only two cases in this string:
This is a test... And,another test.
It should match the , between And and another AND it should match the final period of the sentence. HOWEVER it is also matching the first two dots of the triple dots. .... Shouldn't the + make it greedy so it see the tripple dots is followed by a space and not match it?
Screenshot:
Your regex ([.,]+)(?! ) matches .. in ... because of backtracking. It happens when a regex may match a part of the string in different ways, and it is the case when you use quantifiers and lookarounds. Here, the engine matches ... and checks if there is a space. There is a space after ... in your string, thus, the match is failed, but the regex engine knows there is another possible way to match at the current location, and backtracks. It discards the final . from the match and checks if the second . in ... is not followed with a space. It is not, there is a . after it. So, .. are matched.
You can use an atomic group workaround here:
/(?=([.,]+))\1(?! )/g
See the regex demo
One or more dots or commas are captured inside a lookahead and then \1 consumes the text. Since there is no backtracking possible into backreferences, the negative lookahead is checked after the last . or , and if there is a space, fail occurs and the preceding . or , are not checked.
A better way for a JS regex engine to match what you want is to include the . and , into the negative lookahead condition (see Pavneet's suggestion):
/[.,]+(?![ .,])/g
^^^^^

Regex needed to split a string by "."

I am in need for a regex in Javascript. I have a string:
'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'
I want to split this string by periods such that I get an array:
[
'*window',
'some1',
'some\.2', //ignore the . because it's escaped
'(a.b ? cc\.c : d.n [a.b, cc\.c])', //ignore everything inside ()
'some\.3',
'(this.o.p ? ".mike." [ff\.])',
'some5'
]
What regex will do this?
var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); //Convert RegExp match to an Array
Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:
/ Start of RegExp literal
(?: Create a group without reference (example: say, group A)
\( `(` character
(?: Create a group without reference (example: say, group B)
(['"]) ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
\) `)` character
\1 The character as matched at group 1, either `'` or `"`
| OR
[^)]+? Any non-`)` character, at least once (see below)
)+ End of group (B). Let this group occur at least once
| OR
\\\. `\.` (escaped backslash and dot, because they're special chars)
| OR
[^.]+? Any non-`.` character, at least once (see below)
)+ End of group (A). Let this group occur at least once
/g "End of RegExp, global flag"
/*Summary: Match everything which is not satisfying the split-by-dot
condition as specified by the OP*/
There's a difference between + and +?. A single plus attempts to match as much characters as possible, while a +? matches only these characters which are necessary to get the RegExp match. Example: 123 using \d+? > 1 and \d+ > 123.
The String.match method performs a global match, because of the /g, global flag. The match function with the g flag returns an array consisting of all matches subsequences.
When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:
Index 0: <Whole match>
Index 1: <Group 1>
The regex below :
result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);
Can be used to acquire the desired results. Group 1 has the results since you want to omit the .
Use this :
var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var match = myregexp.exec(subject);
while (match != null) {
for (var i = 0; i < match.length; i++) {
// matched text: match[i]
}
match = myregexp.exec(subject);
}
Explanation :
// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
//
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
// Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
// Match the character “(” literally «\(»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match a single character NOT present in the list “'"” «[^'"]»
// Match the character “)” literally «\)»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match any character that is NOT a “A \ character” «[^\\]»
// Match the regular expression below «(?:\.|$)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
// Match the character “.” literally «\.»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»
It is notoriously difficult to use a Regex to do balanced parenthesis matching, especially in Javascript.
You would be way better off creating your own parser. Here's a clever way to do this that will utilize the strength of Regex's:
Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
Add the matching text to a buffer.
If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.
This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:
Uses a Regex pattern to find the splits
Only splits if there are balanced parenthesis
Only splits if there are balanced quotes
Allows escaping of parenthesis, quotes, and splits using \
This code will work perfectly for your example.
not need regex for this work.
var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
output:
["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]
So, was working with this, and now I see that #FailedDev is rather not a failure, since that was pretty nice. :)
Anyhow, here's my solution. I'll just post the regex only.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)
Sadly this won't work in your case however, since I'm using negative lookbehind, which I don't think is supported by javascript regex engine. It should work as intended in other engines however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.

Categories