Match group before nth character and after that - javascript

I want to match everything before the nth character (except the first character) and everything after it. So for the following string
/firstname/lastname/some/cool/name
I want to match
Group 1: firstname/lastname
Group 2: some/cool/name
With the following regex, I'm nearly there, but I can't find the correct regex to also correctly match the first group and ignore the first /:
([^\/]*\/){3}([^.]*)
Note that I always want to match the 3rd forward slash. Everything after that can be any character that is valid in a URL.

Your regex groups are not giving the proper result because in ([^\/]*\/){3} you are repeating a capturing group, and each repetition overwrites the value captured by the previous one.
You can use
^.([^/]+\/[^/]+)\/(.*)$
let str = `/firstname/lastname/some/cool/name`
let op = str.match(/^.([^/]+\/[^/]+)\/(.*)$/)
console.log(op)

Ignoring the first /, then capturing the first two words, then capturing the rest of the phrase after the /.
^(?:\/)([^\/]+\/[^\/]+)\/(.+)
See example

The quantifier {3} repeats 3 times the capturing group, which will have the value of the last iteration.
The first iteration will match /, the second firstname/ and the third (the last iteration) lastname/ which will be the value of the group.
The second group, [^.]*, matches 0+ characters other than a literal dot, which does not take the structure of the data into account.
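A minimal sketch of the behaviour described above, using the pattern from the question. The variable name is only for illustration.

```javascript
// The question's pattern: the {3} quantifier repeats capture group 1,
// so the group keeps only the text from its last iteration.
const repeated = "/firstname/lastname/some/cool/name".match(/([^\/]*\/){3}([^.]*)/);

console.log(repeated[0]); // the whole match
console.log(repeated[1]); // "lastname/" (third iteration only)
console.log(repeated[2]); // "some/cool/name"
```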
If you want to match the full pattern, you could use:
^\/([^\/]+\/[^\/]+)\/([^\/]+(?:\/[^\/]+)+)$
Explanation
^ Start of string
( Capture group 1
[^\/]+\/[^\/]+ Match 1+ chars other than / using a negated character class, then a /, then again 1+ chars other than /
) Close group
\/ Match /
( Capture group 2
[^\/]+ Match 1+ times not /
(?:\/[^\/]+)+ Repeat 1+ times matching / and 1+ times not / to match the pattern of the rest of the string.
) Close group
$ End of string
Regex demo
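As a rough usage sketch, the full pattern can be wrapped in a small helper. The names splitPath, head and tail are made up for this example. Note that this pattern requires at least two segments after the third slash, so shorter paths return null.

```javascript
const pathPattern = /^\/([^\/]+\/[^\/]+)\/([^\/]+(?:\/[^\/]+)+)$/;

// Returns the two captured parts, or null when the path does not fit the pattern.
function splitPath(path) {
  const m = pathPattern.exec(path);
  return m ? { head: m[1], tail: m[2] } : null;
}

console.log(splitPath("/firstname/lastname/some/cool/name"));
// { head: "firstname/lastname", tail: "some/cool/name" }
console.log(splitPath("/a/b/c")); // null, only one segment after the third slash
```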

Related

Replace second last occurrence of a char(dot) in an email string using regex

I would like to replace the second-to-last occurrence of a character in a string. The length of the strings can vary, but the delimiter is always the same. Below are some examples and what I have tried.
Input 1: james.sam.uri.stackoverflow.com
Output 1: james.sam.uri#stackoverflow.com
Input 2: noman.stackoverflow.com
Output 2: noman#stackoverflow.com
Input 3: queen.elizabeth.empire.co.uk
Output 3: queen.elizabeth#empire.co.uk
My solution
// This works, but I don't want it because it's not a regex solution
const e = "noman.stackoverflow.com"
// Find the last dot, then search backwards from just before it
var index = e.lastIndexOf(".", e.lastIndexOf(".") - 1)
return `${e.substring(0, index)}#${e.substring(index + 1)}`
Regex
e.replace(/\.(\.*)/, "#$1")
// This works for Input 2 but not Input 1, because it only matches the first dot. I need a regex that works for both.
The complication in the example data is that the last example ends in .co.uk, so the dot to replace there is not the second-to-last one.
One option for these specific examples is a pattern that excludes that specific part.
(\S+)\.(?!co\.uk$)(\S*?\.[^\s.]+)$
(\S+) Capture group 1, match 1+ non whitespace chars
\.(?!co\.uk$) Match a . and assert with a negative lookahead that what is directly to the right is not co.uk at the end of the string
( Capture group 2
\S*?\. Match 0+ times a non-whitespace char, non-greedy, and then a .
[^\s.]+ Match 1+ times a non-whitespace char except a .
) Close group 2
$ End of string
See a regex demo.
[
"james.sam.uri.stackoverflow.com",
"noman.stackoverflow.com",
"queen.elizabeth.empire.co.uk"
].forEach(s =>
console.log(s.replace(/(\S+)\.(?!co\.uk$)(\S*?\.[^\s.]+)$/, "$1#$2"))
);
Here's another approach:
(\S+)\.(\S+\.\S{3,}?)$
( )$ At the end of the string, capture by
\S{3,}? lazily matching 3+ non-whitespace characters
\S+\. and any non-whitespace characters with period in front.
(\S+)\. Also capture anything before the separating period.
Notably, it would fail for an email like test.stackoverflow.co.net. If that format is a requirement, I'd recommend a different approach.
[
"james.sam.uri.stackoverflow.com",
"noman.stackoverflow.com",
"queen.elizabeth.empire.co.uk",
"test.stackoverflow.co.net"
].forEach(s =>
console.log(s.replace(/(\S+)\.(\S+\.\S{3,}?)$/, "$1#$2"))
);
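One possible "different approach", sketched without a regex: keep an explicit list of multi-label suffixes and replace the last dot before the suffix. The list multiLabelSuffixes and the function name markDomainBoundary are assumptions made for this example, not part of either answer, and the list would have to be filled in for real data.

```javascript
// Hypothetical list of multi-label suffixes; extend it for your own data.
const multiLabelSuffixes = ["co.uk", "co.net"];

function markDomainBoundary(s) {
  // Position of the dot that starts the suffix: a known multi-label suffix
  // if one matches, otherwise just the last label (for example "com").
  const known = multiLabelSuffixes.find(suffix => s.endsWith("." + suffix));
  const suffixStart = known ? s.length - known.length - 1 : s.lastIndexOf(".");
  if (suffixStart <= 0) return s;
  // The dot to replace is the last one before the suffix.
  const index = s.lastIndexOf(".", suffixStart - 1);
  if (index < 0) return s;
  return s.substring(0, index) + "#" + s.substring(index + 1);
}

[
  "james.sam.uri.stackoverflow.com",
  "noman.stackoverflow.com",
  "queen.elizabeth.empire.co.uk",
  "test.stackoverflow.co.net"
].forEach(s => console.log(markDomainBoundary(s)));
```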

JS regex: one correct match out of three and one false match

This JS regex error is killing me - one correct match out of three and one false match.
If it makes a difference I am writing my script in Google Apps Script.
I have a string (xml formatted) I want to match three date nodes as follows:
<dateCreated>1619155581543</dateCreated>
<dispatchDate>1619478000000</dispatchDate>
<deliveryDate>1619564400000</deliveryDate>
I don't care about the tags so much - I just need enough to reliably replace them. I am using this regular expression:
var regex = new RegExp('[dD]ate(.{1,})?>[0-9]{13,}</');
These are the matches:
dateCreated>1619155581543</
Created
Obviously I understand number 1 - I wanted that. But I do not understand how 2 was matched. Also why were dispatchDate and deliveryDate not matched? All three targets are matched if I use the above regex in BBEdit and on https://ihateregex.io/playground and neither of those match "Created".
I've also tried this regular expression without success:
var regex = new RegExp('[dD]ate.{0,}>[0-9]{13,}</');
If you can't answer why my regex fails but you can offer a working solution I'd still be happy with that.
The first pattern that you tried [dD]ate(.{1,})?>[0-9]{13,}</ matches:
[dD]ate Match date or Date
(.{1,})? Optional capture group, match 1+ times any char (This group will capture Created)
> Match literally
[0-9]{13,} Match 13 or more digits 0-9
</ Match literally
What you will get are partial matches from date till </ and the first capture group will contain Created
The second pattern is almost the same, except for {0,} which matches 0 or more times, and there is no capture group.
Still this will give you partial matches.
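The question does not show how the regex is applied, so the following is an inference: if it is used with match() and no g flag, JavaScript returns only the first match followed by its capture groups, which would explain seeing exactly one match plus "Created". A small sketch, with the three nodes assumed to be on separate lines:

```javascript
const xml = [
  "<dateCreated>1619155581543</dateCreated>",
  "<dispatchDate>1619478000000</dispatchDate>",
  "<deliveryDate>1619564400000</deliveryDate>",
].join("\n");

const source = "[dD]ate(.{1,})?>[0-9]{13,}</";

// Without the g flag: the first match, followed by its capture groups.
const single = xml.match(new RegExp(source));
console.log(single[0]); // "dateCreated>1619155581543</"
console.log(single[1]); // "Created"

// With the g flag: every full match, and no capture groups.
const all = xml.match(new RegExp(source, "g"));
console.log(all);
```

Even with the g flag the matches stay partial (they start at date or Date and stop at </).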
To match the whole element, you could either harness the power of an XML parser (which would be the recommended way) or use a pattern that assumes only digits between the tags and no < or > chars inside the opening and closing tags.
Note that this is a brittle solution.
<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>
< Match literally
( Capture group 1 (this group is used for the backreference \1 at the end of the pattern)
[^<>]* Match 0+ times any character except < or >
[dD]ate[^<>]* Match either date or Date followed by 0+ times any char except < or >
) Close group 1
> Match literally
\d{13} Match 13 digits (or use \d{13,} for 13 or more)
<\/\1> Match </ then a backreference to the exact text that is captured in group 1 (to match the name of the closing tag) and then match >
Regex demo
A slightly more restrictive pattern would allow only word characters (\w) around date or Date
<(\w*[dD]ate\w*)>\d{13}<\/\1>
Regex demo
const regex = /<([^<>]*[dD]ate[^<>]*)>\d{13}<\/\1>/;
[
"<dateCreated>1619155581543</dateCreated>",
"<dispatchDate>1619478000000</dispatchDate>",
"<deliveryDate>1619564400000</deliveryDate>",
"<thirteendigits>1619564400000</thirteendigits>",
].forEach(str => {
const match = str.match(regex);
console.log(match ? `Match --> ${str}` : `No match --> ${str}`)
});
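Since the goal in the question is to replace the date nodes, here is a hedged sketch of how the \w-based pattern might be used with replace(). The second capture group around the digits, the g flag and the helper name replaceDateValues are additions for this example.

```javascript
// The \w-based pattern, plus a second capture group for the digits
// and the g flag so that every matching node is visited.
const dateNode = /<(\w*[dD]ate\w*)>(\d{13})<\/\1>/g;

// Rebuilds each matching node with its value passed through convert().
function replaceDateValues(xml, convert) {
  return xml.replace(dateNode, (whole, tag, digits) => `<${tag}>${convert(digits)}</${tag}>`);
}

const input = "<dateCreated>1619155581543</dateCreated><other>1619155581543</other>";
// Example conversion: millisecond timestamp to an ISO 8601 string.
console.log(replaceDateValues(input, ts => new Date(Number(ts)).toISOString()));
```

Nodes whose tag name does not contain date or Date, such as <other> here, are left untouched.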

Replace last word with asterisk, or last two words

I need to hide the surnames of persons. For persons with three words in their name, just hide the last word, e.g.:
Laura Torres Bermudez
should be
Laura Torres ********
and for
Maria Fernanda Gonzales Lopez
should be
Maria Fernanda ******** *****
I think two regexes are needed, because which one applies depends on the number of words.
I know \w+ replaces each whole word with a single asterisk, and with (?!\s). I can replace every character except spaces. I hope you can help me. Thanks.
This is my example:
https://regex101.com/r/yW4aZ3/942
Try this:
(?<=\w+\s+\w+\s+.*)[^\s]
Explanation:
(?<=...) is a positive lookbehind: match only occurrences preceded by the specified pattern
[^\s] matches any character except whitespace (what you used, (?!\s)., is a rather unusual use of a lookahead: "look at the next character; if it is not whitespace, match any character")
summary: replace any non-whitespace character preceded by at least two sequences of word characters (\w), each followed by whitespace (\s).
Just note that it won't hide anything for persons with only two words in their name (which is common in many countries).
Also, the regex has to be slightly modified for that testing tool to match one name per line - see https://regex101.com/r/yW4aZ3/943 (^ was added to match from start of each line and a "multi line" flag was set).
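A short sketch of how the lookbehind pattern could be applied in JavaScript. It assumes an engine with ES2018 lookbehind support and adds the g flag so that every matching character is replaced. The function name maskAfterTwoWords is made up for this example.

```javascript
// Replace every non-whitespace character that has two complete words before it.
const maskAfterTwoWords = name => name.replace(/(?<=\w+\s+\w+\s+.*)[^\s]/g, "*");

console.log(maskAfterTwoWords("Laura Torres Bermudez"));         // "Laura Torres ********"
console.log(maskAfterTwoWords("Maria Fernanda Gonzales Lopez")); // "Maria Fernanda ******** *****"
console.log(maskAfterTwoWords("Laura Torres"));                  // unchanged
```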
A JavaScript solution that does not rely on the ECMAScript 2018 extended regex features is
s = s.replace(/^(\S+\s+\S+)([\s\S]*)/, function($0, $1, $2) {return $1 + $2.replace(/\S/g, '*');})
Details:
^ - start of string
(\S+\s+\S+) - Group 1: one or more non-whitespaces, 1 or more whitespaces and then 1 or more non-whitespaces
([\s\S]*) - Group 2: any 0 or more chars.
The replacement is Group 1 contents and the contents of Group 2 with each non-whitespace char replaced with an asterisk.
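For illustration, the same replacement wrapped in a function and run on the sample names. The function and parameter names are chosen for this example only.

```javascript
function maskSurnames(s) {
  // Group 1 (first two words) is kept; every non-whitespace char in Group 2 becomes "*".
  return s.replace(/^(\S+\s+\S+)([\s\S]*)/, function (whole, firstTwo, rest) {
    return firstTwo + rest.replace(/\S/g, "*");
  });
}

console.log(maskSurnames("Laura Torres Bermudez"));         // "Laura Torres ********"
console.log(maskSurnames("Maria Fernanda Gonzales Lopez")); // "Maria Fernanda ******** *****"
console.log(maskSurnames("Foo"));                           // "Foo" (no match, unchanged)
```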
Java solution:
s = s.replaceAll("(\\G(?!^)\\s*|^\\S+\\s+\\S+\\s+)\\S", "$1*");
See the regex demo
Details
(\G(?!^)\s*|^\S+\s+\S+\s+) - Group 1: either the end of the previous match (\G(?!^)) and 0 or more whitespaces or (|) 1+ non-whitespaces, 1+ whitespaces and again 1+ non-whitespaces, 1+ whitespaces at the start of the string
\S - a non-whitespace char.
Interested if this can be done in JavaScript without a callback, I came up with
str = str.replace(/^(\w+\W+\w+\W+\b)\w?|(?!^)(\W*)\w/gy, '$1$2*');
See this demo at regex101
The idea might look a bit confusing, but it seems to work fine. It fails on one or two words and only starts matching once a word character appears after the first two words. It is important to use the sticky flag y, which is similar to the \G anchor (continue at the end of the last match): every match attempt is anchored at the current position, which is initially the start of the string.
To not add an additional asterisk, the ...\b)\w?... part after the first two words is essential. The word boundary will force a third word to start but the first capturing group is closed after \b and the first character of the third word will be consumed but not captured to correctly match the asterisk count.
The second capturing group on the right side of the alternation will capture any optional non word characters appearing between any words after the third one.
var strs = ['Foo', 'Foo Bar B', 'Laura Torres Bermudez', 'Maria Fernanda Gonzales Lopez'];
strs = strs.map(str => str.replace(/^(\w+\W+\w+\W+\b)\w?|(?!^)(\W*)\w/gy, '$1$2*'));
console.log(strs);

why do these characters belong to the first group in this JS regex match?

I am trying to write a regex to find two meaningful groups within a substring that's part of a text I'm working with.
The text and my attempt are here:
https://regex101.com/r/6Sc3aM/1
The complete regex:
Artikelnummer(?:(?:&&&))(.*)(?:\s*.*)\W?(?:Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite)(&&&([^(&&&)]+)&&&([^(&&&)]+)&&&(\d+))+
The test string:
%5B"Deckblatt: Anlagendokumentation&&&Produktdaten&&&KKS-Nummer&&&Hersteller&&&Typ&&&Artikelnummer&&&MA-KF1&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF11&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF12&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF13&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF14&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF15&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF16&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF17&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF18&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF19&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF20&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF21&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF22&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF23&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF24&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF25&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF26&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF27&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF28&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF29&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF30&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF31&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF32&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF33&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF34&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF35&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF36&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF37&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF38&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF39&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF40&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&MA-KF41&&&Beckhoff&&&EK1100&&&BECK%2EEK1100&&&Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite&&&all&&&Vorwort&&&6&&&all&&&Produktübersicht&&&7&&&all&&&Grundlagen&&&8&&&all&&&Montage und Verdrahtung&&&9&&&all&&&Inbetriebnahme%2FAnwendungshinweise&&&10&&&all&&&Fehlerbehandlung und Diagnose&&&11&&&all&&&Anhang 1&&&12&&&all&&&Anhang 2&&&13&&&all&&&Anhang 3&&&14&&&all&&&Anhang 4&&&15&&&all&&&Anhang 5&&&16&&&all&&&Anhang 6&&&17&&&all&&&Anhang 7&&&18&&&all&&&Anhang 
8&&&19&&&all&&&Anhang 9&&&20&&&all&&&Anhang 10&&&21&&&all&&&Anhang 11&&&22&&&all&&&Anhang 12&&&23&&&all&&&Anhang 13&&&24&&&all&&&Anhang 14&&&25&&&all&&&Anhang 15&&&26&&&all&&&Anhang 16&&&27&&&all&&&Anhang 17&&&28&&&all&&&Anhang 18&&&29&&&all&&&Anhang 19&&&30&&&all&&&Anhang 20&&&31&&&all&&&Anhang 21&&&32&&&all&&&Anhang 22&&&33&&&all&&&Anhang 23&&&34&&&all&&&Anhang 24&&&35&&&all&&&Anhang 25&&&36&&&all&&&Anhang 26&&&37&&&all&&&Anhang 27&&&38&&&all&&&Anhang 28&&&39&&&all&&&Anhang 29&&&40&&&all&&&Anhang 30&&&41&&&all&&&Anhang 31&&&42&&&all&&&Anhang 32&&&43&&&all&&&Anhang 33&&&44&&&all&&&Anhang 34&&&45&&&all&&&Anhang 35&&&46&&&all&&&Anhang 36&&&47&&&all&&&Anhang 37&&&48&&&all&&&Anhang 38&&&49&&&all&&&Anhang 39&&&50&&&all&&&Anhang 40&&&51&&&all&&&Anhang 41&&&52&&&all&&&Anhang 42&&&53"%5D
The regex I wrote should get a first group, which appears after /Artikelnummer/ and before /Dokumentation&&&/ (etc), as well as a second group, which is what I'm having trouble with:
It should consist of repetitions of this pattern: (&&&([^(&&&)]+)&&&([^(&&&)]+)&&&(\d+))+
By my reckoning, that should capture the entire substring:
&&&all&&&Vorwort&&&6&&&all&&&Produktübersicht&&&7&&&all&&&Grundlagen&&&8&&&all&&&Montage und Verdrahtung&&&9&&&all&&&Inbetriebnahme%2FAnwendungshinweise&&&10&&&all&&&Fehlerbehandlung und Diagnose&&&11&&&all&&&Anhang 1&&&12&&&all&&&Anhang 2&&&13&&&all&&&Anhang 3&&&14&&&all&&&Anhang 4&&&15&&&all&&&Anhang 5&&&16&&&all&&&Anhang 6&&&17&&&all&&&Anhang 7&&&18&&&all&&&Anhang 8&&&19&&&all&&&Anhang 9&&&20&&&all&&&Anhang 10&&&21&&&all&&&Anhang 11&&&22&&&all&&&Anhang 12&&&23&&&all&&&Anhang 13&&&24&&&all&&&Anhang 14&&&25&&&all&&&Anhang 15&&&26&&&all&&&Anhang 16&&&27&&&all&&&Anhang 17&&&28&&&all&&&Anhang 18&&&29&&&all&&&Anhang 19&&&30&&&all&&&Anhang 20&&&31&&&all&&&Anhang 21&&&32&&&all&&&Anhang 22&&&33&&&all&&&Anhang 23&&&34&&&all&&&Anhang 24&&&35&&&all&&&Anhang 25&&&36&&&all&&&Anhang 26&&&37&&&all&&&Anhang 27&&&38&&&all&&&Anhang 28&&&39&&&all&&&Anhang 29&&&40&&&all&&&Anhang 30&&&41&&&all&&&Anhang 31&&&42&&&all&&&Anhang 32&&&43&&&all&&&Anhang 33&&&44&&&all&&&Anhang 34&&&45&&&all&&&Anhang 35&&&46&&&all&&&Anhang 36&&&47&&&all&&&Anhang 37&&&48&&&all&&&Anhang 38&&&49&&&all&&&Anhang 39&&&50&&&all&&&Anhang 40&&&51&&&all&&&Anhang 41&&&52&&&all&&&Anhang 42&&&53
But, for some reason, the only string in group 2 is:
&&&Anhang 42&&&53
Why is this happening?
You get &&&all&&&Anhang 42&&&53 in Group 2 because the (pattern)+ is a repeated capturing group that stores only the value captured at the last iteration.
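A reduced illustration of the difference, on a made-up sample string with a simplified pattern: repeating a capturing group keeps only its last iteration, while repeating a non-capturing group inside one outer capturing group keeps all of them.

```javascript
const sample = "&&&a&&&b&&&1&&&c&&&d&&&2";

// Repeated capturing group: group 1 holds only the last iteration.
const lastOnly = sample.match(/(&&&[^&]+&&&[^&]+&&&\d+)+/);
console.log(lastOnly[1]); // "&&&c&&&d&&&2"

// Repeated non-capturing group inside an outer capturing group:
// group 1 holds the text of all iterations.
const everything = sample.match(/((?:&&&[^&]+&&&[^&]+&&&\d+)+)/);
console.log(everything[1]); // "&&&a&&&b&&&1&&&c&&&d&&&2"
```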
It seems you need
/Artikelnummer&&&([\s\S]*?)&&&Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite((?:(?:&&&[^&]*(?:&&?[^&]+)*){2}&&&\d+)+)/g
See the regex demo
The first capturing group just matches any 0+ chars from Artikelnummer&&& till the first occurrence of &&&Dokumentation..., and the second one grabs 1+ occurrences of &&&...&&&...&&& + digit(s).
Details
Artikelnummer&&& - a literal substring
([\s\S]*?) - Group 1 matching any 0+ chars, as few as possible up to the
&&&Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite - literal substring
((?:(?:&&&[^&]*(?:&&?[^&]+)*){2}&&&\d+)+) - Group 2 matching 1+ occurrences of:
(?:&&&[^&]*(?:&&?[^&]+)*){2} - two occurrences of:
&&& - a literal substring
[^&]*(?:&&?[^&]+)* - any 0+ chars other than & and then 0+ sequences of & or && followed with 1+ chars other than &
&&& - a literal substring
\d+ - 1+ digits.
Notes on performance: the first capturing group pattern should be made more specific if you need better performance. Right now, the lazy [\s\S]*? pattern is slow, and if the substring between the first and second delimiter grows, there might be performance issues.
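A sketch of how the two groups might be consumed, run on a shortened version of the test string. The g flag is dropped because only one match is expected here. The second step, which splits Group 2 into records with a simpler pattern, is an addition for this example and assumes that the individual fields contain no & characters.

```javascript
const text = "Typ&&&Artikelnummer&&&MA-KF1&&&Beckhoff&&&EK1100&&&BECK%2EEK1100" +
  "&&&Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite" +
  "&&&all&&&Vorwort&&&6&&&all&&&Anhang 1&&&12";

const outer = /Artikelnummer&&&([\s\S]*?)&&&Dokumentation&&&KKS-Nummer&&&Beschreibung&&&Seite((?:(?:&&&[^&]*(?:&&?[^&]+)*){2}&&&\d+)+)/;
const found = outer.exec(text);

const products = found[1]; // everything between the two header rows
const docsBlock = found[2]; // all repetitions, not just the last one

// Split the documentation block into [KKS number, description, page] records.
const records = [...docsBlock.matchAll(/&&&([^&]*)&&&([^&]*)&&&(\d+)/g)]
  .map(([, kks, description, page]) => [kks, description, page]);

console.log(products);
console.log(records);
```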

Javascript regex: Ignore closing bracket when enclosed in parentheses

I have a regex with a couple of (optional) capture groups. I'm trying to add a new feature that allows a user to add content to one of the capture groups that matches the closing bracket of the regex. I'm struggling to ignore this match
The current regex is /\[(.+?)=(.+?)(?:\|(.+?))?(?:\:(.+?))?\]/g
This allows a user to target data according to:
[key=value|filter:filtervalue]
where filter and filtervalue are optional.
The problem is that for the value it should now be possible to target indexes in an array. For example:
[data=example(products[0].id)]
However, the regex matches only up to .id so the second capture group is example(products[0. I would like it to be example(products[0].id). I think I should be fine if I can ignore the closing bracket when it is wrapped by parentheses, but I've been unable to figure out how.
Examples that should be matched:
[data=example(products[0].id)]
[data=example(products[index].id)]
[data=regular]
[data=test|number:2]
I created a regex101. Any help is appreciated.
You may use
/\[([^=\]]+)=((?:\[[^\][]*]|[^\]])+?)(?:\|([^:\]]+):((?:\[[^\][]*]|[^\]])+?))?]/g
See the regex demo
Note that lazy dot matching patterns are replaced with more restrictive negated character classes to make sure there is no overflow from one part of the bracketed string to another. To allow matching 1st-level nested [...] substrings, a \[[^\][]*]|[^\]] pattern is used to match either [...] substrings or chars other than ].
Details:
\[ - a literal [
([^=\]]+) - Group 1 capturing 1+ chars other than = and ]
= - a = symbol
((?:\[[^\][]*]|[^\]])+?) - Group 2 capturing 1+ occurrences (as few as possible) of:
\[[^\][]*] - [, then 0+ chars other than ] and [, and then ]
| - or
[^\]] - any char other than ]
(?:\|([^:\]]+):((?:\[[^\][]*]|[^\]])+?))? - an optional group matching:
\| - a | char
([^:\]]+) - Group 3 capturing 1+ chars other than : and ]
: - a colon
((?:\[[^\][]*]|[^\]])+?) - Group 4 capturing with the same pattern as Group 2.
] - a literal ] char.
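The pattern can be tried against the four examples from the question with a sketch like the following. The helper parseTags and the property names are invented for this example.

```javascript
const tagPattern = /\[([^=\]]+)=((?:\[[^\][]*]|[^\]])+?)(?:\|([^:\]]+):((?:\[[^\][]*]|[^\]])+?))?]/g;

// Collects every tag in the text; filter and filterValue stay undefined when absent.
function parseTags(text) {
  return [...text.matchAll(tagPattern)].map(
    ([, key, value, filter, filterValue]) => ({ key, value, filter, filterValue })
  );
}

const tags = parseTags(
  "[data=example(products[0].id)] [data=example(products[index].id)] [data=regular] [data=test|number:2]"
);
tags.forEach(tag => console.log(tag));
```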
I would probably break it down to two separate regular expressions and "or" them.
Below I have used your expression to match the first kind of string:
\[(?:(.+?)=(.+)(?:\|(.+?)?\:(.+?)))\]
[key=value|filter:filtervalue]
And another for the second one:
\[(.+?)=(.+)\]
[data=example(products[0].id)]
Then concatenating them to:
\[(?:(.+?)=(.+)(?:\|(.+?)?\:(.+?))|(.+?)=(.+))\]
Where it first tries to match the tricky part and if that fails, resorts to the more general one.
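With the concatenated pattern, the key and value land in different groups depending on which alternative matched (1 to 4 for the first, 5 and 6 for the second), so a caller would probably want to normalise them. A sketch, assuming one tag per input string, since the greedy .+ can run across several tags on the same line. The helper name parseSingleTag is made up.

```javascript
const combined = /\[(?:(.+?)=(.+)(?:\|(.+?)?\:(.+?))|(.+?)=(.+))\]/;

function parseSingleTag(tag) {
  const m = combined.exec(tag);
  if (!m) return null;
  // Groups of the alternative that did not match are undefined.
  return { key: m[1] ?? m[5], value: m[2] ?? m[6], filter: m[3], filterValue: m[4] };
}

console.log(parseSingleTag("[data=test|number:2]"));
console.log(parseSingleTag("[data=example(products[0].id)]"));
```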
https://regex101.com/r/d6LwEt/1
