Javascript regex: Ignore closing bracket when enclosed in parentheses - javascript

I have a regex with a couple of (optional) capture groups. I'm trying to add a new feature that lets a user put content into one of the capture groups that contains the closing bracket of the regex. I'm struggling to make the regex ignore that bracket.
The current regex is /\[(.+?)=(.+?)(?:\|(.+?))?(?:\:(.+?))?\]/g
This allows a user to target data according to:
[key=value|filter:filtervalue]
where filter and filtervalue are optional.
The problem is that for the value it should now be possible to target indexes in an array. For example:
[data=example(products[0].id)]
However, the regex matches only up to .id so the second capture group is example(products[0. I would like it to be example(products[0].id). I think I should be fine if I can ignore the closing bracket when it is wrapped by parentheses, but I've been unable to figure out how.
Examples that should be matched:
[data=example(products[0].id)]
[data=example(products[index].id)]
[data=regular]
[data=test|number:2]
I created a regex101. Any help is appreciated.

You may use
/\[([^=\]]+)=((?:\[[^\][]*]|[^\]])+?)(?:\|([^:\]]+):((?:\[[^\][]*]|[^\]])+?))?]/g
See the regex demo
Note that the lazy dot matching patterns are replaced with more restrictive negated character classes to make sure there is no overflow from one part of the bracketed string to another. To allow matching 1st-level nested [...] substrings, a \[[^\][]*]|[^\]] pattern is used to match either a [...] substring or any char other than ].
Details:
\[ - a literal [
([^=\]]+) - Group 1 capturing 1+ chars other than = and ]
= - a = symbol
((?:\[[^\][]*]|[^\]])+?) - Group 2 capturing 1+ occurrences (as few as possible) of:
\[[^\][]*] - [, then 0+ chars other than ] and [, and then ]
| - or
[^\]] - any char other than ]
(?:\|([^:\]]+):((?:\[[^\][]*]|[^\]])+?))? - an optional group matching:
\| - a | char
([^:\]]+) - Group 3 capturing 1+ chars other than : and ]
: - a colon
((?:\[[^\][]*]|[^\]])+?) - Group 4 using the same pattern as Group 2.
] - a literal ] char.
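As a quick check, here is a minimal sketch that runs the pattern above against the four examples from the question. The helper name parseTag and the returned property names are only illustrative assumptions; the g flag is dropped so every call starts from the beginning of its input.

```javascript
// Pattern from the answer above, without the g flag so each call is independent.
const tagPattern = /\[([^=\]]+)=((?:\[[^\][]*]|[^\]])+?)(?:\|([^:\]]+):((?:\[[^\][]*]|[^\]])+?))?]/;

// parseTag is a hypothetical helper name, not part of the original code.
function parseTag(input) {
  const m = tagPattern.exec(input);
  if (m === null) return null;
  // Groups 3 and 4 are undefined when the optional "|filter:value" part is absent.
  const [, key, value, filter, filterValue] = m;
  return { key, value, filter, filterValue };
}

const tagSamples = [
  "[data=example(products[0].id)]",
  "[data=example(products[index].id)]",
  "[data=regular]",
  "[data=test|number:2]",
];

for (const sample of tagSamples) {
  console.log(sample, "->", parseTag(sample));
}
```

Only one level of nested [...] is handled, as the explanation above says; deeper nesting would need a different approach.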

I would probably break it down to two separate regular expressions and "or" them.
Below I have used your expression to match the first kind of string:
\[(?:(.+?)=(.+)(?:\|(.+?)?\:(.+?)))\]
[key=value|filter:filtervalue]
And another for the second one:
\[(.+?)=(.+)\]
[data=example(products[0].id)]
Then concatenating them to:
\[(?:(.+?)=(.+)(?:\|(.+?)?\:(.+?))|(.+?)=(.+))\]
Where it first tries to match the tricky part and if that fails, resorts to the more general one.
https://regex101.com/r/d6LwEt/1

Related

JS regex match zero or more optional start with key words

Given string like keydown.capture.once.prevent.shift.control[arrowdown,arrowup]:silent I want to match shift.control[arrowdown,arrowup]:silent
And if the same keydown.capture.once.prevent.shift[arrowdown,arrowup]:silent then it should match shift[arrowdown,arrowup]:silent
Here the keywords shift | control are optional, e.g. for keydown.capture.once.prevent.[arrowdown,arrowup]:silent the matching string is [arrowdown,arrowup]:silent
I wrote the below regex, it can only capture zero or one keyword, but the expected result is zero or all keywords matching into separate group
(shift|control)?\[(.*?)\]:silent, how can we capture all the keywords if they exist?
Additional notes: Order of keywords doesn't matter, ex it can be control.shift[]:silent
You may use any one of these regex solutions:
(?:(?:shift|control)\.?)*\[([^\]]*)\]:silent
This matches shift or control, optionally followed by a dot, and repeats this non-capture group 0 or more times.
RegEx Demo
(?:shift(?:\.control)?|control(?:\.shift)?)?\[([^\]]*)\]:silent
This one matches shift.control OR control.shift in an optional non-capture group with the parts after dot optional in each alternation.
RegEx Demo 2
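Neither pattern exposes the keywords individually, because a repeated group only keeps one value. One possible way around that, sketched here under the assumption that an extra capture group around the repeated part is acceptable, is to capture the whole keyword prefix and split it on the dot afterwards. The name parseModifiers and the returned property names are hypothetical.

```javascript
// Variant of the first pattern with an extra capture group around the
// repeated keyword part (this extra group is an assumption, not in the answer).
const modifierPattern = /((?:(?:shift|control)\.?)*)\[([^\]]*)\]:silent/;

// parseModifiers is a hypothetical helper name.
function parseModifiers(input) {
  const m = modifierPattern.exec(input);
  if (m === null) return null;
  return {
    match: m[0],
    // "shift.control" -> ["shift", "control"]; an empty prefix -> []
    keywords: m[1].split(".").filter(Boolean),
    // "arrowdown,arrowup" -> ["arrowdown", "arrowup"]
    keys: m[2].split(",").filter(Boolean),
  };
}

console.log(parseModifiers("keydown.capture.once.prevent.shift.control[arrowdown,arrowup]:silent"));
console.log(parseModifiers("keydown.capture.once.prevent.shift[arrowdown,arrowup]:silent"));
console.log(parseModifiers("keydown.capture.once.prevent.[arrowdown,arrowup]:silent"));
console.log(parseModifiers("control.shift[]:silent"));
```

Splitting after the match keeps the regex simple and works for any keyword order.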
You could try this:
(?:(shift|control|shift\.control|control\.shift))?(\[.*\]):silent
Test and output: https://regex101.com/r/rMSd2l/1

Replace second last occurrence of a char(dot) in an email string using regex

Please, I would love to replace the second-last occurrence of a char in a string. The length of the strings can vary, but the delimiter is always the same. I will give some examples below, along with what I have tried.
Input 1: james.sam.uri.stackoverflow.com
Output 1: james.sam.uri#stackoverflow.com
Input 2: noman.stackoverflow.com
Output 2: noman#stackoverflow.com
Input 3: queen.elizabeth.empire.co.uk
Output 3: queen.elizabeth#empire.co.uk
My solution
//This works, but I don't want it because it's not a regex solution
const e = "noman.stackoverflow.com"
var index = e.lastIndexOf(".", e.lastIndexOf(".")-1)
return `${e.substring(0,index)}#${e.substring(index+1)}`
Regex
e.replace(/\.(\.*)/, "#$1")
//This works for Input 2 but not Input 1. I need a regex that works for both; this one only matches the first dot
The issue in the example data for the second last dot, is that the last example ends on .co.uk
One option for these specific examples could be using a pattern to exclude that specific part.
(\S+)\.(?!co\.uk$)(\S*?\.[^\s.]+)$
(\S+) Capture group 1, match 1+ non whitespace chars
\.(?!co\.uk$) Match a . followed by a negative lookahead asserting directly to the right is not co.uk
( Capture group 2
\S*?\. Match 0+ times a non whitespace char non greedy and then a .
[^\s.]+ Match 1+ times a non whitespace char except a .
) Close group 2
$ End of string
See a regex demo.
[
"james.sam.uri.stackoverflow.com",
"noman.stackoverflow.com",
"queen.elizabeth.empire.co.uk"
].forEach(s =>
console.log(s.replace(/(\S+)\.(?!co\.uk$)(\S*?\.[^\s.]+)$/, "$1#$2"))
);
Here's another approach:
(\S+)\.(\S+\.\S{3,}?)$
( )$ At the end of the string, capture by
\S{3,}? lazily matching 3+ non-whitespace characters
\S+\. and any non-whitespace characters with period in front.
(\S+)\. Also capture anything before the separating period.
Notably, it would fail for an email like test.stackoverflow.co.net. If that format is a requirement, I'd recommend a different approach.
[
"james.sam.uri.stackoverflow.com",
"noman.stackoverflow.com",
"queen.elizabeth.empire.co.uk",
"test.stackoverflow.co.net"
].forEach(s =>
console.log(s.replace(/(\S+)\.(\S+\.\S{3,}?)$/, "$1#$2"))
);
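For the "different approach" mentioned above, one option is to skip the regex and work on the dot-separated labels, with the multi-part endings supplied explicitly. This is only a sketch under that assumption: the function name markDomainBoundary and the suffix list are hypothetical, and the caller has to provide every multi-part ending it cares about.

```javascript
// markDomainBoundary is a hypothetical helper; multiPartSuffixes is an
// assumed, caller-supplied list of endings that count as one unit.
function markDomainBoundary(input, multiPartSuffixes = ["co.uk"]) {
  const labels = input.split(".");
  // Find a listed ending such as "co.uk"; otherwise the ending is one label.
  const suffix = multiPartSuffixes.find((s) => input.endsWith("." + s));
  const suffixLabels = suffix ? suffix.split(".").length : 1;
  // Index of the label right before the ending (e.g. "stackoverflow").
  const domainStart = labels.length - suffixLabels - 1;
  // Nothing in front of the domain: leave the input unchanged.
  if (domainStart < 1) return input;
  return labels.slice(0, domainStart).join(".") + "#" + labels.slice(domainStart).join(".");
}

[
  "james.sam.uri.stackoverflow.com",
  "noman.stackoverflow.com",
  "queen.elizabeth.empire.co.uk",
].forEach((s) => console.log(markDomainBoundary(s)));

// With "co.net" added to the list, the problematic input is handled too.
console.log(markDomainBoundary("test.stackoverflow.co.net", ["co.uk", "co.net"]));
```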

Replace last word with asterisk, or last two words

I need to hide the surnames of persons. For persons with three words in their name, just hide the last word, e.g.:
Laura Torres Bermudez
should be
Laura Torres ********
and for
Maria Fernanda Gonzales Lopez
should be
Maria Fernanda ******** *****
I think they are two regex because based on the number of words, regex will be applied.
I know \w+ replaces all word by a single asterisk, and with (?!\s). I can replace chars except spaces. I hope you can help me. Thanks.
This is my example:
https://regex101.com/r/yW4aZ3/942
Try this:
(?<=\w+\s+\w+\s+.*)[^\s]
Explanation:
?<= is a positive lookbehind - match only occurrences preceded by the specified pattern
[^\s] - match any character except whitespace (what you used, (?!\s)., is actually an odd use of a lookahead: "look at the next character; if it is not whitespace, then match any character")
summary: replace any non-whitespace character preceded by at least two sequences of word characters (\w) and whitespace (\s).
Just note that it won't hide anything for persons with only two words in their name (which is common in many countries).
Also, the regex has to be slightly modified for that testing tool to match one name per line - see https://regex101.com/r/yW4aZ3/943 (^ was added to match from start of each line and a "multi line" flag was set).
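For reference, a minimal sketch of how this pattern could be applied in JavaScript, assuming an engine with lookbehind support (ECMAScript 2018 or later) and one name per string. The function name maskSurnames is made up for the example.

```javascript
// Pattern from the answer above; the g flag replaces every matching character.
const afterTwoWords = /(?<=\w+\s+\w+\s+.*)[^\s]/g;

// maskSurnames is a hypothetical helper name.
function maskSurnames(name) {
  return name.replace(afterTwoWords, "*");
}

console.log(maskSurnames("Laura Torres Bermudez"));
console.log(maskSurnames("Maria Fernanda Gonzales Lopez"));
console.log(maskSurnames("Laura Torres")); // two words only: nothing is hidden
```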
A JavaScript solution that does not rely on the ECMAScript 2018 extended regex features is
s = s.replace(/^(\S+\s+\S+)([\s\S]*)/, function($0, $1, $2) {return $1 + $2.replace(/\S/g, '*');})
Details:
^ - start of string
(\S+\s+\S+) - Group 1: one or more non-whitespaces, 1 or more whitespaces and then 1 or more non-whitespaces
([\s\S]*) - Group 2: any 0 or more chars.
The replacement is Group 1 contents and the contents of Group 2 with each non-whitespace char replaced with an asterisk.
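A small sketch of the callback version wrapped in a function and run on a few inputs; the function and parameter names are only illustrative.

```javascript
// maskAfterSecondWord is a hypothetical wrapper around the replace call above.
function maskAfterSecondWord(s) {
  return s.replace(/^(\S+\s+\S+)([\s\S]*)/, function (match, firstTwo, rest) {
    // Keep the first two words, mask every non-whitespace char in the rest.
    return firstTwo + rest.replace(/\S/g, "*");
  });
}

["Foo", "Foo Bar B", "Laura Torres Bermudez", "Maria Fernanda Gonzales Lopez"]
  .forEach((s) => console.log(maskAfterSecondWord(s)));
```

A single word does not match the pattern at all, so it is returned unchanged.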
Java solution:
s = s.replaceAll("(\\G(?!^)\\s*|^\\S+\\s+\\S+\\s+)\\S", "$1*");
See the regex demo
Details
(\G(?!^)\s*|^\S+\s+\S+\s+) - Group 1: either the end of the previous match (\G(?!^)) and 0 or more whitespaces or (|) 1+ non-whitespaces, 1+ whitespaces and again 1+ non-whitespaces, 1+ whitespaces at the start of the string
\S - a non-whitespace char.
Interested if this can be done in JavaScript without a callback, I came up with
str = str.replace(/^(\w+\W+\w+\W+\b)\w?|(?!^)(\W*)\w/gy, '$1$2*');
See this demo at regex101
The idea might look a bit confusing, but it seems to work fine. It should fail on one or two words, but start matching as soon as a word character appears after the first two words. It is important to use the sticky flag y, which is similar to the \G anchor (continue at the end of the last match), except that every match has to begin exactly at lastIndex, which starts at the beginning of the string.
To not add an additional asterisk, the ...\b)\w?... part after the first two words is essential. The word boundary will force a third word to start but the first capturing group is closed after \b and the first character of the third word will be consumed but not captured to correctly match the asterisk count.
The second capturing group on the right side of the alternation will capture any optional non word characters appearing between any words after the third one.
var strs = ['Foo', 'Foo Bar B', 'Laura Torres Bermudez', 'Maria Fernanda Gonzales Lopez'];
strs = strs.map(str => str.replace(/^(\w+\W+\w+\W+\b)\w?|(?!^)(\W*)\w/gy, '$1$2*'));
console.log(strs);

Match group before nth character and after that

I want to match everything before the nth character (except the first character) and everything after it. So for the following string
/firstname/lastname/some/cool/name
I want to match
Group 1: firstname/lastname
Group 2: some/cool/name
With the following regex, I'm nearly there, but I can't find the correct regex to also correctly match the first group and ignore the first /:
([^\/]*\/){3}([^.]*)
Note that I always want to match the 3rd forward slash. Everything after that can be any character that is valid in an URL.
Your regex groups are not giving the proper result because with ([^\/]*\/){3} you're repeating a capture group, which overwrites the previously captured value on each iteration.
You can use
^.([^/]+\/[^/]+)\/(.*)$
let str = `/firstname/lastname/some/cool/name`
let op = str.match(/^.([^/]+\/[^/]+)\/(.*)$/)
console.log(op)
Ignoring the first /, then capturing the first two words, then capturing the rest of the phrase after the /.
^(?:\/)([^\/]+\/[^\/]+)\/(.+)
See example
The quantifier {3} repeats 3 times the capturing group, which will have the value of the last iteration.
The first iteration will match /, the second firstname/ and the third (the last iteration) lastname/ which will be the value of the group.
The second group uses [^.]*, which matches 0+ chars other than a literal dot and does not take the structure of the data into account.
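The overwriting behaviour described above can be seen with a short sketch. The variable names are made up for the example, and the unrolled pattern is only there to show what each iteration matched.

```javascript
const samplePath = "/firstname/lastname/some/cool/name";

// Repeated capture group: only the last of the 3 iterations is kept.
const repeated = samplePath.match(/([^\/]*\/){3}([^.]*)/);
console.log(repeated[1]); // "lastname/"
console.log(repeated[2]); // "some/cool/name"

// Writing the three repetitions out as separate groups keeps each part.
const unrolled = samplePath.match(/([^\/]*\/)([^\/]*\/)([^\/]*\/)([^.]*)/);
console.log(unrolled.slice(1, 4)); // ["/", "firstname/", "lastname/"]
```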
If you want to match the full pattern, you could use:
^\/([^\/]+\/[^\/]+)\/([^\/]+(?:\/[^\/]+)+)$
Explanation
^ Start of string
( Capture group 1
[^\/]+\/[^\/]+ Match 1+ chars other than / using a negated character class, then a /, then again 1+ chars other than /
) Close group
\/ Match /
( Capture group 2
[^\/]+ Match 1+ times not /
(?:\/[^\/]+)+ Repeat 1+ times matching / and 1+ times not / to match the pattern of the rest of the string.
) Close group
$ End of string
Regex demo
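A minimal sketch applying the full pattern, assuming the input is a single path string. The helper name splitAtThirdSlash and the property names head and tail are hypothetical.

```javascript
// Full pattern from the answer above.
const fullPathPattern = /^\/([^\/]+\/[^\/]+)\/([^\/]+(?:\/[^\/]+)+)$/;

// splitAtThirdSlash is a hypothetical helper name.
function splitAtThirdSlash(path) {
  const m = fullPathPattern.exec(path);
  return m === null ? null : { head: m[1], tail: m[2] };
}

console.log(splitAtThirdSlash("/firstname/lastname/some/cool/name"));
console.log(splitAtThirdSlash("/a/b/c/d"));
console.log(splitAtThirdSlash("/a/b/c")); // null
```

Because of the (?:\/[^\/]+)+ part, the second group needs at least two segments after the third slash, so a path like /a/b/c does not match. If a single trailing segment should be accepted, that quantifier could be changed to *.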

Grab full regex word if pattern inside it matches

How do I retrieve an entire word that has a specific portion of it that matches a regex?
For example, I have the below text.
Using ^.[\.\?\!:;,]{2,} , I match the first 3, but not the last. The last should be matched as well, but $ doesn't seem to produce anything.
a!!!!!!
n.......
c..,;,;,,
huhuhu..
I want to get all strings that have an occurrence of certain characters equal to or more than twice. I produced the aforementioned regex, but on Rubular it only matches the characters themselves, not the entire string. Using ^ and $
I've read a few stackoverflow posts similar, but not quite what I'm looking for.
Change your regex to:
/^.*[.?!:;,]{2,}/gm
i.e. match 0 or more characters before 2 or more of those special characters.
RegEx Demo
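A small JavaScript sketch comparing the question's pattern with the corrected one on the sample lines. The extra line hello. is an assumption added to show a non-match, and the variable names are made up.

```javascript
const sampleText = ["a!!!!!!", "n.......", "c..,;,;,,", "huhuhu..", "hello."].join("\n");

// Pattern from the question: exactly one char, then 2+ punctuation chars.
const questionMatches = sampleText.match(/^.[\.\?\!:;,]{2,}/gm);
console.log(questionMatches); // the first three lines only

// Corrected pattern: any number of chars before the punctuation run.
const fixedMatches = sampleText.match(/^.*[.?!:;,]{2,}/gm);
console.log(fixedMatches); // all four original lines, but not "hello."
```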
If I understand well you are trying to match an entire string that contains at least the same punctuation character two times:
^.*?([.?!:;,])\1.*
Note: if your string has newline characters, change .* to [\s\S]*
The trick is here:
([.?!:;,]) # captures the punct character in group 1
\1 # refers to the character captured in group 1
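To illustrate the difference between the two answers, here is a short sketch. The input what?! is made up for the example: it contains two different punctuation characters in a row, so the character-class version matches it while the backreference version does not.

```javascript
const anyTwoPunct = /^.*[.?!:;,]{2,}/;        // any 2+ chars from the set
const samePunctTwice = /^.*?([.?!:;,])\1.*/;  // the same char twice in a row

const sameResult = "huhuhu..".match(samePunctTwice);
console.log(sameResult[0]); // "huhuhu.."
console.log(sameResult[1]); // "."

console.log(anyTwoPunct.test("what?!"));    // true
console.log(samePunctTwice.test("what?!")); // false
```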
