String replace with regexp overwrites non matching character

String replace with regexp overwrites non matching character - javascript

The idea is replacing in a string all decimal numbers without a digit before the decimal point with the zero so .03 sqrt(.02) would become 0.03 sqrt(0.02).
See the code below for a sample, the problem is that the replacement overwrites the opening parenthesis when there's one preceding the decimal point. I think that the parenthesis does not pertain to the matching string, does it?
let s='.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?:^|\D)\.(\d+)/g , "0.$1");
console.log(s)

Make your initial group capturing, not non-capturing, and use it in the replacement:
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
// ^---- capturing, not non-capturing
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(^|[^\d])\.(\d+)/g , "$10.$2");
console.log(s)
I think that the parenthesis does not pertain to the matching string, does it?
It does, because it matches [^\d].
Side note: As Wiktor points out, you can use \D instead of [^\d].
Side note 2: JavaScript regexes are finally getting lookbehind (in the living specification, and will be in the ES2018 spec snapshot), so an alternate way to do this with modern JavaScript environments would be a negative lookbehind:
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
// ^^^^^^^--- negative lookbehind for a digit
That means basically "If there's a digit here, don't match." (There's also positive lookbehind, (?<=...).)
Example:
let s = '.05 sqrt(.005) another(.33) thisShouldntChange(a.b) neither(3.4)'
s=s.replace(/(?<!\d)\.(\d+)/g , "0.$1");
console.log(s)

A parenthesis is a nn-digit, thus it is matched with [^\d] and removed.
The solution is to match and capture the part before a dot and then insert back using a replacement backreference.
Use
.replace(/(^|\D)\.(\d+)/g , "$10.$2")
See the regex demo.
Pattern details
(^|\D) - Capturing group 1 (later referred to with $1 from the replacement pattern): a start of string or any non-digit ([^\d] = \D)
\. - a dot
(\d+) - Capturing group 2 (later referred to with $2 from the replacement pattern): 1+ digits.
See the JS demo:
let s='.05 sqrt(.005) another(.33) thisShouldnt(a.b) neither(3.4)'
s=s.replace(/(^|\D)\.(\d+)/g , "$10.$2");
console.log(s)
Note that $10.$2 will be parsed by the RegExp engine as $1 backreference, then 0. text and then $2 backreference, since there are only 2 capturing groups in the pattern, there are no 10 capturing groups and thus $10 will not be considered as a valid token in the replacement pattern.

Related

Find two letters at the end of a link and use an IF statement with the match [duplicate]

I have a string. The end is different, such as index.php?test=1&list=UL or index.php?list=UL&more=1. The one thing I'm looking for is &list=.
How can I match it, whether it's in the middle of the string or it's at the end? So far I've got [&|\?]list=.*?([&|$]), but the ([&|$]) part doesn't actually work; I'm trying to use that to match either & or the end of the string, but the end of the string part doesn't work, so this pattern matches the second example but not the first.

Use:
/(&|\?)list=.*?(&|$)/
Note that when you use a bracket expression, every character within it (with some exceptions) is going to be interpreted literally. In other words, [&|$] matches the characters &, |, and $.

In short
Any zero-width assertions inside [...] lose their meaning of a zero-width assertion. [\b] does not match a word boundary (it matches a backspace, or, in POSIX, \ or b), [$] matches a literal $ char, [^] is either an error or, as in ECMAScript regex flavor, any char. Same with \z, \Z, \A anchors.
You may solve the problem using any of the below patterns:
[&?]list=([^&]*)
[&?]list=(.*?)(?=&|$)
[&?]list=(.*?)(?![^&])
If you need to check for the "absolute", unambiguous string end anchor, you need to remember that is various regex flavors, it is expressed with different constructs:
[&?]list=(.*?)(?=&|$) - OK for ECMA regex (JavaScript, default C++ `std::regex`)
[&?]list=(.*?)(?=&|\z) - OK for .NET, Go, Onigmo (Ruby), Perl, PCRE (PHP, base R), Boost, ICU (R `stringr`), Java/Andorid
[&?]list=(.*?)(?=&|\Z) - OK for Python
Matching between a char sequence and a single char or end of string (current scenario)
The .*?([YOUR_SINGLE_CHAR_DELIMITER(S)]|$) pattern (suggested by João Silva) is rather inefficient since the regex engine checks for the patterns that appear to the right of the lazy dot pattern first, and only if they do not match does it "expand" the lazy dot pattern.
In these cases it is recommended to use negated character class (or bracket expression in the POSIX talk):
[&?]list=([^&]*)
See demo. Details
[&?] - a positive character class matching either & or ? (note the relationships between chars/char ranges in a character class are OR relationships)
list= - a substring, char sequence
([^&]*) - Capturing group #1: zero or more (*) chars other than & ([^&]), as many as possible
Checking for the trailing single char delimiter presence without returning it or end of string
Most regex flavors (including JavaScript beginning with ECMAScript 2018) support lookarounds, constructs that only return true or false if there patterns match or not. They are crucial in case consecutive matches that may start and end with the same char are expected (see the original pattern, it may match a string starting and ending with &). Although it is not expected in a query string, it is a common scenario.
In that case, you can use two approaches:
A positive lookahead with an alternation containing positive character class: (?=[SINGLE_CHAR_DELIMITER(S)]|$)
A negative lookahead with just a negative character class: (?![^SINGLE_CHAR_DELIMITER(S)])
The negative lookahead solution is a bit more efficient because it does not contain an alternation group that adds complexity to matching procedure. The OP solution would look like
[&?]list=(.*?)(?=&|$)
or
[&?]list=(.*?)(?![^&])
See this regex demo and another one here.
Certainly, in case the trailing delimiters are multichar sequences, only a positive lookahead solution will work since [^yes] does not negate a sequence of chars, but the chars inside the class (i.e. [^yes] matches any char but y, e and s).

JavaScript regex replace last pattern in string?

I have a string which looks like
var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42)
.ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)
I want to put a tag around the last .bam() or .ramBam().
str.replace(/(\.(ram)?bam\(.*?\))$/i, '<span class="focus">$1</span>');
And I hope to get:
new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42).ramBam(8.1, 0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>
But somehow I keep on fighting with the non greedy parameter, it wraps everything after new Bammer with the span tags. Also tried a questionmark after before the $ to make the group non greedy.
I was hoping to do this easy, and with the bam or ramBam I thought that regex would be the easiest solution but I think I'm wrong.
Where do I go wrong?

You can use the following regex:
(?!.*\)\.)(\.(?:bam|ramBam)\(.*\))$
Demo
(?!.*\)\.) # do not match ').' later in the string
( # begin capture group 1
.\ # match '.'
(?:bam|ramBam) # match 'bam' or 'ramBam' in non-cap group
\(.*\) # match '(', 0+ chars, ')'
) # end capture group 1
$ # match end of line
For the example given in the question the negative lookahead (?!.*\)\.) moves an internal pointer to just before the substring:
.bam(8.1, (slot_height-thick)/2)
as that is the first location where there is no substring ). later in the string.
If there were no end-of-line anchor $ and the string ended:
...0).bam(8.1, (slot_height-thick)/2)abc
then the substitution would still be made, resulting in a string that ends:
...0)<span class="focus">.bam(8.1, (slot_height-thick)/2)</span>abc
Including the end-of-line anchor prevents the substitution if the string does not end with the contents of the intended capture group.

Regex to use:
/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/
\. Matches period.
(?:ram)? Optionally matches ram in a non-capturing group.
[bB]am Matches bam or Bam.
\( Matches (.
[^)]* Matches 0 or more characters as long as they are not a ).
) Matches a ). Items 2. through 6. are placed in Capture Group 1.
(?!.*\.(ram)?[bB]am\() This is a negative lookahead assertion stating that the rest of the string contains no further instance of .ram( or .rambam( or .ramBam( and therefore this is the last instance.
See Regex Demo
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, 0).bam(0, -42).ramBam(8.1, 0).bam(8.1, slot_height)';
console.log(str.replace(/\.((?:ram)?[bB]am\([^)]*\))(?!.*\.(ram)?[bB]am\()/, '<span class="focus">.$1</span>'));
Update
The JavaScript regular expression engine is not powerful enough to handle nested parentheses. The only way I know of solving this is if we can make the assumption that after the final call to bam or ramBam there are no more extraneous right parentheses in the string. Then where I had been scanning the parenthesized expression with \([^)]*\), which would fail to pick up final parentheses, we must now use \(.*\) to scan everything until the final parentheses. At least I know no other way. But that also means that the way that I had been using to determine the final instance of ram or ramBam by using a negative lookahead needs a slight adjustment. I need to make sure that I have the final instance of ram or ramBam before I start doing any greedy matches:
(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))
See Regex Demo
\. Matches ..
(?:bam|ramBam) Matches bam or ramBam.
(?!.*\.(bam|ramBam)\() Asserts that Item 1. was the final instance
\( Matches (.
(.*) Greedily matches everything until ...
\) the final ).
) Items 1. through 6. are placed in Capture Group 1.
let str = 'var std = new Bammer({mode:"deg"}).bam(0, 112).bam(177.58, (line-4)/2).bam(0, -42) .ramBam(8.1, 0).bam(8.1, (slot_height-thick)/2)';
console.log(str.replace(/(\.(?:bam|ramBam)(?!.*\.(bam|ramBam)\()\((.*)\))/, '<span class="focus">$1</span>'));

The non-greedy flag isn't quite right here, as that will just make the regex select the minimal number of characters to fit the pattern. I'd suggest that you do something with a negative lookahead like this:
str.replace(/(\.(?:ram)?[Bb]am\([^)]*\)(?!.*(ram)?[Bb]am))/i, '<span class="focus">$1</span>');
Note that this will only replace the last function name (bam OR ramBam), but not both. You'd need to take a slightly different approach to be able to replace both of them.

How to match all words starting with dollar sign but not slash dollar

I want to match all words which are starting with dollar sign but not slash and dollar sign.
I already try few regex.
(?:(?!\\)\$\w+)
\\(\\?\$\w+)\b
String
$10<i class="">$i01d</i>\$id
Expected result
*$10*
*$i01d*
but not this
*$id*
After find all expected matching word i want to replace this my object.

One option is to eliminate escape sequences first, and then match the cleaned-up string:
s = String.raw`$10<i class="">$i01d</i>\$id`
found = s.replace(/\\./g, '').match(/\$\w+/g)
console.log(found)

The big problem here is that you need a negative lookbehind, however, JavaScript does not support it. It's possible to emulate it crudely, but I will offer an alternative which, while not great, will work:
var input = '$10<i class="">$i01d</i>\\$id';
var regex = /\b\w+\b\$(?!\\)/g;
//sample implementation of a string reversal function. There are better implementations out there
function reverseString(string) {
return string.split("").reverse().join("");
}
var reverseInput = reverseString(input);
var matches = reverseInput
.match(regex)
.map(reverseString);
console.log(matches);
It is not elegant but it will do the job. Here is how it works:
JavaScript does support a lookahead expression ((?>)) and a negative lookahead ((?!)). Since this is the reverse of of a negative lookbehind, you can reverse the string and reverse the regex, which will match exactly what you want. Since all the matches are going to be in reverse, you need to also reverse them back to the original.
It is not elegant, as I said, since it does a lot of string manipulations but it does produce exactly what you want.
See this in action on Regex101
Regex explanation Normally, the "match x long as it's not preceded by y" will be expressed as (?<!y)x, so in your case, the regex will be
/(?<!\\)\$\b\w+\b/g
demonstration (not JavaScript)
where
(?<!\\) //do not match a preceding "\"
\$ //match literal "$"
\b //word boundary
\w+ //one or more word characters
\b //second word boundary, hence making the match a word
When the input is reversed, so do all the tokens in order to match. Furthermore, the negative lookbehind gets inverted into a negative lookahead of the form x(?!y) so the new regular expression is
/\b\w+\b\$(?!\\)/g;

This is more difficult than it appears at first blush. How like Regular Expressions!
If you have look-behind available, you can try:
/(?<!\\)\$\w+/g
This is NOT available in JS. Alternatively, you could specify a boundary that you know exists and use a capture group like:
/\s(\$\w+)/g
Unfortunately, you cannot rely on word boundaries via /b because there's no such boundary before '\'.
Also, this is a cool site for testing your regex expressions. And this explains the word boundary anchor.

If you're using a language that supports negative lookback assertions you can use something like this.
(?<!\\)\$\w+
I think this is the cleanest approach, but unfortunately it's not supported by all languages.
This is a hackier implementation that may work as well.
(?:(^\$\w+)|[^\\](\$\w+))
This matches either
A literal $ at the beginning of a line followed by multiple word characters. Or...
A literal $ this is preceded by any character except a backslash.
Here is a working example.

How to capture group with optional number of letters/numbers followed by optional comma?

I'm altering the segment of a string in javascript and got the following working (first iteration -optional comma).
var foo = "wat:a,username:x,super:man"
foo.replace(/(username\:\w+)(?:,)*/,"go:home,");
//"wat:a,go:home,super:man"
The trick now is that I might actually replace a key/value with only the key ... so I need to capture the original group with both optional value + optional comma.
var foo = "wat:a,username:,super:man"
foo.replace(/ ????? /,"go:home,");
//"wat:a,go:home,super:man"
As a bonus I'd like the most concise way to capture both optional numbers/and letters (updating my original to also support)
var foo = "wat:a,username:999,super:man"
foo.replace(/ ????? /,"go:home,");
//"wat:a,go:home,super:man"

You need to replace the + (1 or more occurrences of the preceding subpattern) quantifier with the * (0 or more occurrences of the preceding subpattern).
See Quantifier Cheatsheet at rexegg.com:
A+ One or more As, as many as possible (greedy), giving up characters if the engine needs to backtrack (docile)
A* Zero or more As, as many as possible (greedy), giving up characters if the engine needs to backtrack (docile)
Besides, you are not using any of the capturing groups defined in the pattern, so I suggest removing them.
.replace(/username:\w*,*/,"go:home,")
^
And if you have just 1 optional ,, use just the ? quantifier (1 or 0 repetition of the preceding subpattern):
.replace(/username:\w*,?/,"go:home,")
^
Note that in case you can have any characters before the end of string or comma, you can also use Fede's suggestion of using a negated character class: /username:[^,]*,*/. The [^,]* matches any character (even a newline) other than a comma.
Also, please note that you do not need to escape a colon. The characters that must be escaped outside of character class to be treated as literals are ., *, +, ?, ^, $, {, (, ), |, [, ], \. See RegExp MDN reference.

I'm not sure if I understood your question, but if you want to match username:?? you can use below regex:
(username\:\w*)
Working demo
Update: As stribizhev, pointed in his comment \w* can do the trick, however if you want to extend the regex to any characters besides letters or numbers you can use:
(username\:[^,]*)

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be founds in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-white space character, I used + to match one or more whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite; ^ and $ are actually "anchors" - they mean "start" and "end", it's actually a little more complicated, but you can consider them to mean the start and end of a line for now - look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning; if used inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings and that the definition in your head actually applies only to character range matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-numerical digits followed by 1+ non-space characters" - so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings, you're not adding constraints as you go, but rather building up bots of matchers that will match bits of corresponding strings.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexs is http://regexr.com/3b9h7 - which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to debuggerx is awesome!

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that have no digits and spaces, and the symbols you do not allow.
Mind that to match a literal -, ] or [ with a character class, you either need to escape them, or use at the start or end of the expression. To play it safe, escape them.

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace (space, newline,tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats immediately preceeding element by 0 or more times
so the regex matches any string that has
1. string that has a beginning
2. containing 0 or more number of characters that is not whitesapce, digit, and all the symbols included in the character class ( In this example !#£$%^&*()+=) i.e., characters that are not included in the character class `[...]`
3.that has ending
NOTE:
If the symbols you don't want it to have also includes - , a hyphen, don't put it in between some other characters because it is a metacharacter in character class, put it at last of character class

We Keep Coding

JavaScript is the programming language of the Web.