Javascript replace method, replace with "$1" - javascript

I'm reading SitePoint's 2007 book "Simply JavaScript" and I encountered some code I just can't understand.
It's the following code:
Core.removeClass = function(target, theClass)
{
  var pattern = new RegExp("(^| )" + theClass + "( |$)");
  target.className = target.className.replace(pattern, "$1");
  target.className = target.className.replace(/ $/, "");
};
The first call to the replace method is what puzzles me, I don't understand where the "$1" value comes from or what it means. I would think that the call should replace the found pattern with "".

Each pair of parentheses (...) whose first character is not a ? (see the note below) is a "capturing group", which places its result into $1, $2, $3, etc., which can be used in the replacement pattern.
You might also see the same thing as \1, \2, \3 in other regex engines (or indeed in the original expression sometimes, for repetition).
These are called "backreferences", because they generally refer back to (an earlier) part of the expression.
(Note: a ? after the opening parenthesis indicates various forms of special behaviour, including a non-capturing group, which is written (?:...) and simply groups without capturing.)
In your specific example, the $1 will be the group (^| ) which is "position of the start of string (zero-width), or a single space character".
So by replacing the whole expression with that, you're basically removing the variable theClass and potentially a space after it. (The closing expression ( |$) is the inverse - a space or the string end position - and since its value isn't used, could have been non-capturing with (?: |$) instead.)
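A minimal sketch of what "$1" does here, assuming a hypothetical helper removeClassName (not from the book) that applies the same two replace calls to a plain string rather than to an element's className:

```javascript
// Hypothetical helper: same regex logic as Core.removeClass,
// applied to a plain string so it can run outside a browser.
function removeClassName(className, theClass) {
  const pattern = new RegExp("(^| )" + theClass + "( |$)");
  // "$1" re-inserts whatever group 1 matched: "" at the start, or a single space.
  return className.replace(pattern, "$1").replace(/ $/, "");
}

console.log(removeClassName("a b c", "b")); // prints: a c
console.log(removeClassName("a b", "b"));   // prints: a
console.log(removeClassName("b c", "b"));   // prints: c

// Replacing with "" instead would also swallow the separating space:
const glued = "a b c".replace(/(^| )b( |$)/, "");
console.log(glued); // prints: ac
```

The last line shows why an empty replacement is not enough: the neighbouring class names would be fused together.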
Hopefully this explains everything ok - let me know if you want any more info.
Also, here's some further reading from the site regular-expressions.info:
Groups and Backreferences
Atomic Grouping (doesn't work in JS, but interesting)
Lookaround groups (partial support in JS regex)

$1 is a backreference. It will be replaced by whatever the first matching group (set of parentheses) in your regex matches.

Related

Regex matching string inside brackets [duplicate]

My regex pattern looks something like
<xxxx location="file path/level1/level2" xxxx some="xxx">
I am only interested in the part in quotes assigned to location. Shouldn't it be as easy as below without the greedy switch?
/.*location="(.*)".*/
Does not seem to work.
You need to make your regular expression lazy/non-greedy, because by default, "(.*)" will match all of "file path/level1/level2" xxx some="xxx".
Instead you can make your dot-star non-greedy, which will make it match as few characters as possible:
/location="(.*?)"/
Adding a ? on a quantifier (?, * or +) makes it non-greedy.
Note: this is only available in regex engines which implement the Perl 5 extensions (Java, Ruby, Python, etc) but not in "traditional" regex engines (including Awk, sed, grep without -P, etc.).
location="(.*)" will match from the " after location= until the " after some="xxx unless you make it non-greedy.
So you either need .*? (i.e. make it non-greedy by adding ?) or better replace .* with [^"]*.
[^"] Matches any character except for a " <quotation-mark>
More generic: [^abc] - Matches any character except for an a, b or c
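The three variants can be compared side by side. This is only a sketch in JavaScript, using the sample tag from the question as input; the variable names are made up for illustration:

```javascript
const tag = '<xxxx location="file path/level1/level2" xxxx some="xxx">';

// Greedy: (.*) runs on to the last double quote in the string.
const greedy = tag.match(/location="(.*)"/)[1];

// Lazy: (.*?) stops at the first double quote that lets the pattern match.
const lazy = tag.match(/location="(.*?)"/)[1];

// Negated class: [^"]* cannot cross a double quote at all.
const negated = tag.match(/location="([^"]*)"/)[1];

console.log(greedy);  // prints: file path/level1/level2" xxxx some="xxx
console.log(lazy);    // prints: file path/level1/level2
console.log(negated); // prints: file path/level1/level2
```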
How about
.*location="([^"]*)".*
This avoids the unlimited search with .* and will match exactly to the first quote.
Use non-greedy matching, if your engine supports it. Add the ? inside the capture.
/location="(.*?)"/
Use of Lazy quantifiers ? with no global flag is the answer.
Eg,
If you had global flag /g then, it would have matched all the lowest length matches as below.
Here's another way.
Here's the one you want. This is lazy [\s\S]*?
The first item:
[\s\S]*?(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explanation: https://regex101.com/r/ZcqcUm/2
For completeness, this gets the last one. This is greedy [\s\S]*
The last item:
[\s\S]*(?:location="([^"]*)")[\s\S]*
Replace with: $1
Explanation: https://regex101.com/r/LXSPDp/3
There's only 1 difference between these two regular expressions and that is the ?
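A rough JavaScript check of the first-versus-last behaviour, assuming a made-up input with two location attributes and a capture group around [^"]* in both patterns so that $1 has something to refer to:

```javascript
// Hypothetical input with two location attributes.
const input = '<a location="first"> <b location="second">';

// Lazy prefix [\s\S]*? stops at the first location="...".
const first = input.replace(/[\s\S]*?(?:location="([^"]*)")[\s\S]*/, "$1");

// Greedy prefix [\s\S]* backtracks from the end, so it finds the last one.
const last = input.replace(/[\s\S]*(?:location="([^"]*)")[\s\S]*/, "$1");

console.log(first); // prints: first
console.log(last);  // prints: second
```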
The other answers here fail to spell out a full solution for regex versions which don't support non-greedy matching. The non-greedy quantifiers (.*?, .+? etc.) are a Perl 5 extension which isn't supported in traditional regular expressions.
If your stopping condition is a single character, the solution is easy; instead of
a(.*?)b
you can match
a[^ab]*b
i.e. specify a character class which excludes the starting and ending delimiters.
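As a quick sanity check, the two forms can be compared in an engine that supports both. The sketch below uses JavaScript and adds a capture group to the negated-class version purely so the results can be compared; the sample string is invented:

```javascript
const sample = "xaYYbZZb"; // hypothetical input with two candidate end delimiters

// Non-greedy version (needs Perl-style extensions).
const lazy = sample.match(/a(.*?)b/)[1];

// Negated character class version (works in traditional engines too).
const negated = sample.match(/a([^ab]*)b/)[1];

console.log(lazy);    // prints: YY
console.log(negated); // prints: YY
```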
In the more general case, you can painstakingly construct an expression like
start(|[^e]|e(|[^n]|n(|[^d])))end
to capture a match between start and the first occurrence of end. Notice how the subexpression with nested parentheses spells out a number of alternatives which between them allow e only if it isn't followed by nd and so forth, and also take care to cover the empty string as one alternative which doesn't match whatever is disallowed at that particular point.
Of course, the correct approach in most cases is to use a proper parser for the format you are trying to parse, but sometimes, maybe one isn't available, or maybe the specialized tool you are using is insisting on a regular expression and nothing else.
Because you are using a quantified subpattern and, as described in the Perl documentation,
By default, a quantified subpattern is "greedy", that is, it will
match as many times as possible (given a particular starting location)
while still allowing the rest of the pattern to match. If you want it
to match the minimum number of times possible, follow the quantifier
with a "?" . Note that the meanings don't change, just the
"greediness":
*? //Match 0 or more times, not greedily (minimum matches)
+? //Match 1 or more times, not greedily
Thus, to allow your quantified pattern to make minimum match, follow it by ? :
/location="(.*?)"/
import regex  # third-party module; the standard-library re module also works here
text = 'ask her to call Mary back when she comes back'
p = r'(?i)(?s)call(.*?)back'
for match in regex.finditer(p, text):
    print(match.group(1))
Output:
Mary

Javascript, matching multiple instances of all text between two flags [duplicate]

match of javascript string for regular expressions is not tokenizing correctly

Trying to apply regular expression to the below string
Field "saveUserId" argument "idTwo" of type "String!" is required but not provided.
and have come up with a RegExp pattern such as this
var rePattern = new RegExp(/Field (.)+ argument (.)+ of type (.)+ is required but not provided./);
var arrMatches = e.message.match(rePattern);
console.log(arrMatches[0]);
console.log(arrMatches[1]);
am expecting arrMatches[0] to produce the output "saveUserId"
and arrMatches[1] to produce the output "idTwo".
However instead it is returning
arrMatches[0] = Field "saveUserId" argument "idTwo" of type "String!" is required but not provided.
arrMatches[1] = "
You've got two problems:
arrMatches[0] contains the full match; the captured groups are in arrMatches[1] through arrMatches[n], where n is the number of groups.
Your capturing groups each contain only one character: put the quantifier inside the group so that it captures everything the quantifier matches rather than only the last character. Use (.+) instead of (.)+.
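Both points can be seen in a short sketch. It assumes the error message from the question is held in a variable named message (instead of e.message), and it escapes the final dot so that it matches a literal full stop:

```javascript
const message =
  'Field "saveUserId" argument "idTwo" of type "String!" is required but not provided.';

// Quantifier outside the group: each group keeps only the last character it matched.
const perChar = message.match(
  /Field (.)+ argument (.)+ of type (.)+ is required but not provided\./
);

// Quantifier inside the group: each group keeps the whole run of characters.
const whole = message.match(
  /Field (.+) argument (.+) of type (.+) is required but not provided\./
);

console.log(perChar[1]); // prints: "
console.log(whole[0] === message); // prints: true (index 0 is the full match)
console.log(whole[1]);   // prints: "saveUserId"
console.log(whole[2]);   // prints: "idTwo"
console.log(whole[3]);   // prints: "String!"
```

Note that the captured values still include the surrounding double quotes, because the quotes are part of what .+ matches.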
As Wiktor Stribiżew mentions, using lazy quantifiers could be an optimization here, as it would reduce backtracking: without them, .+ will match as much as it can (reaching the end of the string), then backtrack until the next tokens can match, while with .+? the next token is tested after each character that . matches.
Note that this isn't an optimisation you can apply blindly; I think a good rule of thumb is to estimate whether the end of your match is closer to the end of the text, in which case backtracking will be more efficient, or to the start of the match, in which case a lazy quantifier will be more efficient. It all boils down to the number of times the next token(s) will have to be tested.
A better optimization yet if your fields are guaranteed not to contain any " (escaped or not) would be to match them using the negated character class [^"] instead of ., which will make sure not to match further than the enclosing quotes.
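A sketch of that negated-class variant, assuming (as stated above) that the fields never contain a double quote. The quotes are moved outside the groups so the captured values come back without them; the variable names are illustrative only:

```javascript
const message =
  'Field "saveUserId" argument "idTwo" of type "String!" is required but not provided.';

// [^"]+ can never run past the closing quote of the field it is matching.
const re =
  /Field "([^"]+)" argument "([^"]+)" of type "([^"]+)" is required but not provided\./;

// Skip index 0 (the full match) and name the three groups.
const [, field, argument, type] = message.match(re);

console.log(field);    // prints: saveUserId
console.log(argument); // prints: idTwo
console.log(type);     // prints: String!
```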

Need help writing a regex pattern

I am trying to find a pattern in a string that has a value that starts with ${ and ends with }. There will be a word between the curly brackets, but I won't know what word it is.
This is what I have \$\\{[a-zA-Z]\\}
${a} works, but ${aa} doesn't. It seems it's only looking for a single character.
I am unsure what I am doing wrong, or how to fix it and would appreciate any help anyone can provide.
I think this could help you
var str = "The quick brown ${fox} jumps over the lazy ${dog}";
var re = /\$\{([a-z]+)\}/gi;
var match;
while (match = re.exec(str)) {
console.log(match[1]);
}
Output:
"fox"
"dog"
Explanation
+ means match 1 or more of the previous term — in this example, match 1 or more of [a-z]
the (...) parentheses will "capture" the match so you can actually do something with it — in my example, I'm just using console.log to output it
the i modifier (at the end of the regexp) means perform a case-insensitive match
the g modifier means match all instances of this regexp in the target string
The while loop will continue running for each match that re.exec finds. Once re.exec cannot match another instance, it will return null and the loop will exit.
Additional information
Try console.log(match) using the code above. Each match comes with other useful information such as the string index where the match occurred
Gotchas
This will not work for nested ${} sets
For example, this regexp will not work on "The quick brown ${fox jumps ${over}} the lazy ${dog}."
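To see what actually happens with the nested input, the same loop can be wrapped in a small function. extractNames is a hypothetical helper name used only for this sketch:

```javascript
// Hypothetical helper: collects every captured name using the same exec loop.
function extractNames(str) {
  const re = /\$\{([a-z]+)\}/gi; // fresh regex per call, so lastIndex starts at 0
  const names = [];
  let match;
  while ((match = re.exec(str)) !== null) {
    names.push(match[1]);
  }
  return names;
}

const flat = extractNames("The quick brown ${fox} jumps over the lazy ${dog}");
const nested = extractNames("The quick brown ${fox jumps ${over}} the lazy ${dog}.");

console.log(flat);   // [ 'fox', 'dog' ]
console.log(nested); // [ 'over', 'dog' ]
```

In the nested case the outer ${fox jumps ...} is skipped entirely, because [a-z]+ cannot match the space after "fox"; only the inner ${over} and the later ${dog} are found.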
You're close!
All you need is to use a + to tell the expression that there will be one or more of whatever was just before it (in this case [a-zA-Z]) like this:
\${[a-zA-Z]+}
A good website for regex reference and testing is http://rubular.com/
It looks like you need to add a +, which tells the regex to look for one or more of a character.
Try: \${[a-zA-Z]+}
You need to use * (zero or more) or + (one or more). So this [a-zA-Z] would be [a-zA-Z]+, meaning 1 or more letters. The entire regex would look like:
\$\{[a-zA-Z]+\}
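The effect of the quantifier can be checked quickly in JavaScript. This is just a sketch; the sample strings are the ones from the question plus an empty ${}:

```javascript
const single = /\$\{[a-zA-Z]\}/;     // exactly one letter between the braces
const oneOrMore = /\$\{[a-zA-Z]+\}/; // one or more letters
const zeroOrMore = /\$\{[a-zA-Z]*\}/; // zero or more letters

console.log(single.test("${a}"));     // true
console.log(single.test("${aa}"));    // false
console.log(oneOrMore.test("${aa}")); // true
console.log(oneOrMore.test("${}"));   // false
console.log(zeroOrMore.test("${}"));  // true
```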

What does /;/ and /^ +/ denote?

I recently came across the statement :
var cookies = document.cookie.split(/;/);
and
var pair = allCookies[i].split("=", 2);
if (pair[0].replace(/^ +/, "") == "lastvisit")
In the first statement what does /;/ in the argument of split denote ?
In the second statement what does /^ +/ in the argument of replace denote ?
These are Regular Expressions.
Javascript supports them natively.
In this particular example:
.split(/;/) uses ; as the split character;
.replace(/^ +/, "") removes ("") one or more (+) leading (^) spaces ( ).
In both examples, / surround or delimit the regular expression (or "regex"), informing Javascript that you're providing a regex.
Follow the links provided above for more information; regexes are broad in scope and worth learning.
Slashes delimit a regular expression, just like quotes delimit a string.
/;/ matches a semi-colon. Specifically:
var cookies = document.cookie.split(/;/);
Means we split the document.cookie string into an array, splitting it where there are semicolons. So it would take something like "a;b;c" and turn it into ["a", "b", "c"].
pair[0].replace(/^ +/, "")
Just strips all leading spaces. It turns
" lastvisit"
into
"lastvisit"
The caret ^ means "beginning of the string" (or of a line, when the m flag is set); it's followed by a space, and the + means to repeat the space one or more times, as many as possible.
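Put together, the two calls behave roughly like this. The sketch uses a made-up cookie string in place of document.cookie so that it can run outside a browser:

```javascript
// Hypothetical stand-in for document.cookie.
const cookieString = "theme=dark; lastvisit=2007";

// Split on semicolons; the second entry keeps its leading space.
const cookies = cookieString.split(/;/);
console.log(cookies); // [ 'theme=dark', ' lastvisit=2007' ]

// Take the name part of each pair and strip the leading spaces.
const names = cookies.map((cookie) => cookie.split("=", 2)[0].replace(/^ +/, ""));
console.log(names); // [ 'theme', 'lastvisit' ]

// A plain string separator gives the same split here.
const sameSplit = cookieString.split(";");
console.log(sameSplit); // [ 'theme=dark', ' lastvisit=2007' ]
```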
The // syntax denotes a regular expression (also known as a 'regex').
Regex is a syntax for searching and replacing strings.
The first example you gave is /;/. This is a very simple regex which just searches the string for semi-colons, and then splits it into an array based on the result. Since this is not using any special regex functionality, it could just as easily have been expressed as a simple string, i.e. split(";") (as has been done with the equal sign in your other example), without making any difference to the result.
The second example is /^ +/. This is more complex and requires a bit of knowledge of how regex works. In short, what it is doing is searching for leading spaces on a string, and removing them.
To learn more about regex, I recommend this site as a good starting point: http://www.regular-expressions.info/
Hope that helps.
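/^ +/ means: one or more space characters at the start of the string. It would match one or more non-space characters only if the caret were inside a character class, as in /[^ ]+/.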
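The position of the caret changes the meaning, which can be checked directly. In this sketch the input string is invented; /^ +/ anchors at the start and matches spaces, while /[^ ]+/ matches a run of non-space characters:

```javascript
const padded = "  lastvisit"; // two leading spaces

// Caret outside a class: anchor at the start, then one or more spaces.
const stripped = padded.replace(/^ +/, "");

// Caret inside a class: one or more characters that are NOT a space.
const inverted = padded.replace(/[^ ]+/, "");

console.log(JSON.stringify(stripped)); // prints: "lastvisit"
console.log(JSON.stringify(inverted)); // prints: "  "
```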
