Understanding replacement using regex - javascript

I want to remove all trailing and leading dashes (-) and replace any repeating dashes with one dash otherwise in JavaScript. I've developed a regex to do it:
"----asdas----asd-as------q---".replace(/^-+()|()-+$|(-)+/g,'$3')
And it works:
asdas-asd-as-q
But I don't understand the $3 part (obtained through desperate experiment). Why not $1?

You can actually use this without any capturing groups:
"----asdas----asd-as------q---".replace(/^-+|-+$|-+(?=-)/g, '');
//=> "asdas-asd-as-q"
Here -+(?=-) uses a positive lookahead to match one or more hyphens only when another hyphen follows, so every hyphen in a run is removed except the last one.
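As a rough illustration of which hyphens the lookahead branch consumes, the sketch below runs -+(?=-) on its own against a short sample string. The sample string and variable names are made up for the example.

```javascript
// Hypothetical sample input, chosen only to illustrate the lookahead.
const sample = "a---b-c";

// -+(?=-) matches a run of hyphens only while another hyphen follows,
// so the last hyphen of each run is never part of the match.
const consumed = sample.match(/-+(?=-)/g);

// Removing what was matched leaves exactly one hyphen per run.
const collapsed = sample.replace(/-+(?=-)/g, "");

console.log(consumed, collapsed);
```

The single hyphen between b and c is left alone because nothing hyphen-like follows it.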

Because there are three capturing groups: two redundant empty ones and (-). $3 is replaced with the string that matched the third group.
If you remove the first two empty capturing groups, you can use $1.
"----asdas----asd-as------q---".replace(/^-+|-+$|(-)+/g, '$1')
// => "asdas-asd-as-q"
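To see why $1 comes out empty, one option is to pass a replacer function instead of a replacement string and record what each group captured per match. This is only a sketch; the seen array and the parameter names g1 to g3 are illustrative.

```javascript
const input = "----asdas----asd-as------q---";

// Each entry records one match and what every group captured for it.
const seen = [];
const result = input.replace(/^-+()|()-+$|(-)+/g, (match, g1, g2, g3) => {
  seen.push({ match, g1, g2, g3 });
  // A group in a branch that did not take part is undefined here;
  // in a replacement string, "$3" expands to "" in that case.
  return g3 === undefined ? "" : g3;
});

console.log(result);
console.log(seen);
```

Only the third branch ever captures a hyphen, so only $3 can put one back; the first two groups are empty or unset for every match.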

As other answers say, $3 indicates the third captured subpattern, i.e. the third set of parentheses.
Personally, however, I would see that as two operations, and do it as such:
Trim leading and trailing -s
Condense duplicate -s
Like so:
"----asdas----asd-as------q---".replace(/^-+|-+$/g,"").replace(/--+/g,"-");
This kind of concept may mean more code, but I believe it makes it much easier to read and understand what's going on here, because you're doing one thing at a time instead of trying to do everything at once.
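The two-step idea can be wrapped in a small helper, sketched here. The function name normalizeDashes is made up for the example.

```javascript
// Hypothetical helper: trim dashes at both ends, then collapse runs.
function normalizeDashes(text) {
  return text
    .replace(/^-+|-+$/g, "") // step 1: drop leading and trailing dashes
    .replace(/--+/g, "-");   // step 2: squeeze two or more dashes into one
}

console.log(normalizeDashes("----asdas----asd-as------q---"));
```

Keeping the steps separate means either one can be changed (or reused) without touching the other.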

$1, $2 and so on refer to the capture groups formed by the pattern.
See demo.
http://regex101.com/r/pP3pN1/25
On the right side you can see the groups being generated by ().
Replace and see. $1 is blank in your case.

Related

Optional regex string/pattern sections with and without non-capturing groups

Here's what I'm trying to do:
http://i.imgur.com/Xqrf8Wn.png
Simply take a URL with 3 groups. $1 is not so important; $2 and $3 are, but $2 is totally optional, including (obviously) the corresponding slash when present, which is all I am trying to make optional. I get that it can (or should?) be in a non-capturing group, but does it HAVE to be? I've seen enough by now that seems to indicate it does not HAVE to be. If possible, I'd really like someone to explain it so I can try to fully understand it, and not just get one possible working answer handed to me to copy, like some come here seeking.
Here are the regex strings I've tried; at best they currently match only the second URL, where the optional part is present:
^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/)(.*)\/required\/B([A-Z0-9]{9}).*
^https:\/\/([a-z]{0,2})\.?blah\.com(\/)?(.*)\/required\/B([A-Z0-9]{9}).*
^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/)?(.*)?\/required\/B([A-Z0-9]{9}).*
Here are the two URLs from which I want to capture groups 2 and 3, with 1 and 2 being optional, but $2 being the problem. I've tried all the strings above and have yet to get one to match the string when the optional part is NOT present, and I believe it must be due to the slashes?
https://blah.com/required/B7BG0Z0GU1A
https://blah.com/optional/required/B7BG0Z0GU1A
Making a part of the pattern optional is as simple as adding ?, and your last two attempts both work: https://regex101.com/r/RIKvYY/1
Your mistake is that your test is wrong - you are using ^ which matches the beginning of the string. You need to add the /m flag (multiline) to make it match the beginning of each line. This is the reason your patterns never match the second line...
Note that you're allowing two slashes (//required, for example). You can solve it by putting the first slash and the optional part inside the same optional group (of course, as long as you are using .* you can still match multiple slashes):
https:\/\/([a-z]{0,2})\.?blah\.com(?:\/(.*))?\/required\/B([A-Z0-9]{9}).*
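A quick way to check this pattern against both URLs from the question is sketched below. The helper name parseUrl and the returned property names are made up; the pattern is the one above with ^ added in front.

```javascript
// Pattern from the answer above, anchored with ^.
const urlPattern =
  /^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/(.*))?\/required\/B([A-Z0-9]{9}).*/;

// Hypothetical helper: returns the three groups, or null when nothing matches.
function parseUrl(url) {
  const m = urlPattern.exec(url);
  if (m === null) return null;
  // m[2] is undefined when the optional "/something" part is absent.
  return { prefix: m[1], optional: m[2], id: m[3] };
}

const url1 = "https://blah.com/required/B7BG0Z0GU1A";
const url2 = "https://blah.com/optional/required/B7BG0Z0GU1A";
const withoutOptional = parseUrl(url1);
const withOptional = parseUrl(url2);

// Testing both URLs in one multi-line string: ^ needs the m flag
// to match at the start of the second line.
const twoLines = url1 + "\n" + url2;
const withM = [...twoLines.matchAll(new RegExp(urlPattern.source, "gm"))];
const withoutM = [...twoLines.matchAll(new RegExp(urlPattern.source, "g"))];

console.log(withoutOptional, withOptional, withM.length, withoutM.length);
```

Note that [A-Z0-9]{9} takes only nine characters after the B, so the last character of the sample ID is swallowed by the trailing .* rather than captured.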

Capture everything between constants

I want to capture everything between every instance of User. and a space, including User.
So given a test string of
psdojfsdf User.sdoinwpoiev.spoinwelsdknonfsjfnw ldnkfwwdf sdf User.sdoinffon.ribwgg
I want it to capture User.sdoinwpoiev.spoinwelsdknonfsjfnw and User.sdoinffon.ribwgg
I've gotten this far: /(User\..*)\s/, but this captures everything until the last space.
The way I believe is best is to tell it to match everything but space rather than everything. That gives:
/(User\.\S*)/
Another alternative is to use a non-greedy match, but I think that's less clear:
/(User\..*?)\s/
Use a non-greedy quantifier:
/(User\..*?)\s/
See regular-expressions.info for details about greediness of repetition operators.
Note that this won't work if the word is at the very end of the input string, with no space after it. Coenwulf's answer may be better, as it doesn't have this problem.
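The difference between the three variants discussed so far can be checked side by side on the test string from the question. This is a small sketch; the variable names are made up.

```javascript
const text =
  "psdojfsdf User.sdoinwpoiev.spoinwelsdknonfsjfnw ldnkfwwdf sdf User.sdoinffon.ribwgg";

// Greedy .* runs to the end and backtracks only to the LAST whitespace.
const greedy = text.match(/(User\..*)\s/)[1];

// Lazy .*? stops at the first whitespace, but it needs whitespace to exist,
// so the final entry (at the very end of the string) is missed.
const lazy = Array.from(text.matchAll(/(User\..*?)\s/g), (m) => m[1]);

// \S* simply stops at whitespace or at the end of the string.
const nonSpace = text.match(/User\.\S*/g);

console.log(greedy);
console.log(lazy);
console.log(nonSpace);
```

Only the \S* version returns both entries, which is the trade-off the note above describes.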
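Match zero or more non-whitespace characters after the dot (a lazy *? at the very end of a pattern would match nothing, so the quantifier stays greedy):
/User\.[^\s]*/g
Also if you want to force it to have something between the dot and the space
/User\.[^\s]+/g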
Or if you want it to be alphanumeric
/User\.[a-zA-Z_$]+?[a-zA-Z_$0-9]*?( | |\s)/g
If you want to allow line breaks between the dot and the property identifier
/User\.[a-zA-Z_$]+?[a-zA-Z_$0-9]*?(\n| | |\s)/gm

Match simple regex pattern using JS (key: value)

I have a simple scenario where I want to match the following and capture the value:
stuff_in_string,
env: 'local', // want to match this and capture the content in quotes
more_stuff_in_string
I have never written a regex pattern before so excuse my attempt, I am well aware it is totally wrong.
This is what I am trying to say:
Match "env:"
Followed by none or more spaces
Followed by a single or double quote
Capture all until..
The next single or double quote
/env:*?\s+('|")+(.*?)+('|")/g
Thanks
PS here is a #failed fiddle: http://jsfiddle.net/DfHge/
Note: this is the regex I ended up using (not the answer below as it was overkill for my needs): /env:\s+(?:"|')(\w+)(?:"|')/
You can use this:
/\benv: (["'])([^"']*)\1/g
where \1 is a backreference to the first capturing group, thus your content is in the second. This is the simple way for simple cases.
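A short sketch of the simple pattern in use, including one input where it gives up. The variable names are illustrative, and the g flag is left off because only a single match is needed here.

```javascript
const simple = /\benv: (["'])([^"']*)\1/;

// Group 1 holds the quote character, group 2 the quoted content.
const ok = simple.exec("env: 'local', // comment");

// A different quote character inside the value stops [^"']* early,
// and \1 then fails to find the matching closing quote.
const mixed = simple.exec(`env: "abc'def"`);

console.log(ok && ok[2], mixed);
```

The second result is null, which is the kind of case the more constrained patterns are meant to handle.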
Now, other cases like:
env: "abc\"def"
env: "abc\\"
env: "abc\\\def"
env: "abc'def"
You must use a more constraining pattern:
first: avoid the different quotes problem:
/\benv: (["'])((?:[^"']+|(?!\1)["'])*)\1/g
I put all the possible content in a non-capturing group that I can repeat at will, and I use a negative lookahead (?!\1) to check that the allowed quote is not the same as the captured quote.
second: the backslash problem:
If a quote is escaped, it can't be the closing quote! Thus you must check if the quote is escaped or not and allow escaped quotes in the string.
I remove the backslashes from allowed content:
/\benv: (["'])((?:[^"'\\]+|(?!\1)["'])*)\1/g
I allow escaped characters:
/\benv: (["'])((?:[^"'\\]+|(?!\1)["']|\\[\s\S])*)\1/g
To allow a variable number of spaces before the quoted part, you can replace : by :\s*
/\benv:\s*(["'])((?:[^"'\\]+|(?!\1)["']|\\[\s\S])*)\1/g
You now have a working pattern.
third: pattern optimization
a simple alternation:
Using a capture group and a backreference can be tempting for dealing with the different types of quotes, since it allows the pattern to be written concisely. However, this approach needs to create a capture group and to test a lookahead in the part (?!\1)["'], so it is not very efficient. Writing a simple alternation increases the pattern length and needs two capture groups for the two cases, but is more efficient:
/\benv:\s*(?:"((?:[^"\\]+|\\[\s\S])*)"|'((?:[^'\\]+|\\[\s\S])*)')/g
(note: if you decide to do that, you must check which one of the two capture groups is defined.)
unrolling the loop:
To match the content inside quotes we use (?:[^"\\]+|\\[\s\S])* (for double quotes here), which works but can be improved to reduce the number of steps needed. To do that we unroll the loop, which removes the alternation:
[^"\\]*(?:\\[\s\S][^"\\]*)*
finally the whole pattern can be written like this:
/\benv:\s*(?:"([^"\\]*(?:\\[\s\S][^"\\]*)*)"|'([^'\\]*(?:\\[\s\S][^'\\]*)*)')/g
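A minimal sketch of how the final pattern might be used, including the check for which of the two capture groups is defined. The helper name extractEnv is made up, and the g flag is dropped because the helper only looks for the first occurrence.

```javascript
const envPattern =
  /\benv:\s*(?:"([^"\\]*(?:\\[\s\S][^"\\]*)*)"|'([^'\\]*(?:\\[\s\S][^'\\]*)*)')/;

// Hypothetical helper: returns the raw quoted content, or null if not found.
function extractEnv(source) {
  const m = envPattern.exec(source);
  if (m === null) return null;
  // Group 1 is set for double quotes, group 2 for single quotes.
  return m[1] !== undefined ? m[1] : m[2];
}

console.log(extractEnv("env: 'local', // comment"));
console.log(extractEnv(String.raw`env: "abc\"def"`));
console.log(extractEnv(`env: "abc'def"`));
```

The returned value still contains the backslashes exactly as written in the source text; unescaping them would be a separate step.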
env: *('|").*?\1 is what you're looking for
the * means none or more (here, of the preceding space)
('|") matches either a single or double quote, and also saves it into a group for backreferencing
.*? is a reluctant (non-greedy) match-all
\1 will reference the first group, which was either a single or double quote
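regex=/env: ?['"]([^'"]+)['"]/
answer=str.match(regex)[1]
even better, require the closing quote to be the same as the opening one (a backreference such as \1 does not work inside a character class, so a negative lookahead is used instead; the content is then in group 2):
regex=/env: ?(['"])((?:(?!\1).)*)\1/
answer=str.match(regex)[2]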

regular expression for ends with some word

I want to build a regular expression for the series
cd1_inputchk,rd_inputchk,optinputchk where inputchk is common (the ending characters).
Please guide me on this.
Very simply, it's:
/inputchk$/
On a per-word basis, test each word with /inputchk$/.test(word) ? 'matches' : 'doesn\'t match';. This works because it matches "inputchk" only when it comes at the end of a string (hence the $).
As for a list of words, it starts becoming more complicated.
Are there spaces in the list?
Are they needed?
I'm going to assume no is the answer to both questions, and also assume that the list is comma-separated.
There are then a couple of ways you could proceed. You could use list.split(',') to get an array of the words and test each to see if it ends in inputchk, or you could use a modified regular expression:
/[^,]*inputchk(?:,|$)/g
This one's much more complicated.
[^,] says to match non-, characters
* then says to match 0 or more of those non-, chars. (it will be greedy)
inputchk matches inputchk
(?:...) is a non-capturing group. It groups the alternatives without storing them as a separate capture; whatever it matches is still part of the overall match.
, matches the , character
| says match one side or the other
$ says to match the end of the string
Hopefully all of this together will select the strings that you're looking for, but it's very easy to make a mistake, so I'd suggest doing some rigorous testing to make sure there aren't any edge-conditions that are being missed.
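Both approaches can be sketched like this, assuming a comma-separated list without spaces. The sample list and variable names are made up, and the regex variant swaps the (?:,|$) group for a lookahead (?=,|$) so that the trailing comma is not included in each match.

```javascript
// Hypothetical sample list: three matching words and one that should be skipped.
const list = "cd1_inputchk,rd_inputchk,optinputchk,unrelated";

// Approach 1: split on commas, then test each word on its own.
const viaSplit = list.split(",").filter((word) => /inputchk$/.test(word));

// Approach 2: one global regex over the whole list.
// (?=,|$) only checks for a comma or the end, without consuming it.
const viaRegex = Array.from(
  list.matchAll(/[^,]*inputchk(?=,|$)/g),
  (m) => m[0]
);

console.log(viaSplit);
console.log(viaRegex);
```

The split version is usually the easier one to reason about, since each word is tested in isolation.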
This one should work (dollar sign basically means "end of string"):
/inputchk$/

Struggling with regex to match only two of a character, not three

I need to match all occurrences of // in a string in a Javascript regex
It can't match /// or /
So far I have (.*[^\/])\/{2}([^\/].*)
which is basically "something that isn't /, followed by // followed by something that isn't /"
The approach seems to work apart from when the string I want to match starts with //
This doesn't work:
//example
This does
stuff // example
How do I solve this problem?
Edit: A bit more context - I am trying to replace // with !, so I am then using:
result = result.replace(myRegex, "$1 ! $2");
Replace two slashes that either begin the string or do not follow a slash,
and are followed by anything not a slash or the end of the string.
s=s.replace(/(^|[^/])\/{2}([^/]|$)/g,'$1!$2');
It looks like it wouldn't work for example// either.
The problem is because you're matching // preceded and followed by at least one non-slash character. This can be solved by anchoring the regex, and then you can make the preceding/following text optional:
^(.*[^\/])?\/{2}([^\/].*)?$
Use negative lookahead/lookbehind assertions:
(.*)(?<!/)//(?!/)(.*)
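In a JavaScript regex literal the slashes have to be escaped, and lookbehind needs an engine that supports it (it was added in ES2018). Under that assumption, a sketch that drops the surrounding (.*) groups and replaces only the two slashes themselves could look like this; the function name replaceDoubleSlash is made up.

```javascript
// Hypothetical helper: replace "//" with "!" only when it is exactly two slashes.
function replaceDoubleSlash(text) {
  // (?<!\/) : not preceded by a slash
  // (?!\/)  : not followed by a slash
  return text.replace(/(?<!\/)\/\/(?!\/)/g, "!");
}

console.log(replaceDoubleSlash("//example"));
console.log(replaceDoubleSlash("stuff // example"));
console.log(replaceDoubleSlash("a///b"));
```

Because lookarounds consume nothing, this also handles a string that starts or ends with //, and two occurrences separated by a single character.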
Use this:
/([^/]*)(\/{2})([^/]*)/g
e.g.
alert("///exam//ple".replace(/([^/]*)(\/{2})([^/]*)/g, "$1$3"));
EDIT: Updated the expression as per the comment.
/[/]{2}/
e.g:
alert("//example".replace(/[/]{2}/, ""));
This does not answer the OP's question about using regex. However, some of the original comments suggested using .replaceAll, not everyone who reads the question in the future will want to use regex, people might mistakenly assume that regex is the only alternative, and these details don't fit in a comment, so here's a poor man's non-regex approach:
Temporarily replace the three contiguous characters with something that would never naturally occur — really important when dealing with user-entered values.
Replace the remaining two contiguous characters using .replaceAll().
Return the original three contiguous characters.
For instance, let's say you wanted to remove all instances of ".." without affecting occurrences of "...".
var cleansedText = $(this).text().toString()
    .replaceAll("...", "☰☸☧")
    .replaceAll("..", "")
    .replaceAll("☰☸☧", "...");
$(this).text(cleansedText);
Perhaps not as fast as regex for longer strings, but works great for short ones.
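The same three steps can be written without jQuery as a plain function, sketched below. The function name and the placeholder constant are made up, and String.prototype.replaceAll is assumed to be available (it was added in ES2021).

```javascript
// Hypothetical placeholder; it must never occur in the real input.
const PLACEHOLDER = "☰☸☧";

function removeDoubleDots(text) {
  return text
    .replaceAll("...", PLACEHOLDER)  // 1. hide the three-character runs
    .replaceAll("..", "")            // 2. remove the remaining two-character runs
    .replaceAll(PLACEHOLDER, "..."); // 3. put the three-character runs back
}

console.log(removeDoubleDots("a..b...c"));
```

If the placeholder could ever appear in user input, the result would be wrong, so it has to be chosen with that in mind.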
