Capturing after the nth occurrence of a string using regex


My test string:
/custom-heads/food-drinks/51374-easter-bunny-cake
I am trying to capture the number in the string. The constant in that string is that the number is always preceded by three /'s and followed by a -.
I am a regex noob and am struggling with this. I cobbled together (\/)(.*?)(-) and then figured I could get the last one programmatically, but I would really like to understand regex better and would love it if someone could show me the regex to get the last occurrence of numbers between / and -.

Avoid regexes where possible; I recommend reading this blog post: https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/
For your question, it's easier, faster and more bulletproof to get it using split():
const articleName = "/custom-heads/food-drinks/51374-easter-bunny-cake".split("/")[3]
// '51374-easter-bunny-cake'
const articleId = articleName.split("-")[0]
// '51374'
Hope it helps.
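The same split-based idea can be wrapped in a small helper that returns undefined instead of throwing when the path does not have the expected shape. This is a minimal sketch: the name getArticleId is hypothetical, and it assumes the ID is always the part of the fourth /-separated piece (index 3, because of the leading /) that comes before the first -.

```javascript
// Hypothetical helper: extract the leading ID from paths shaped like
// "/category/subcategory/<id>-<slug>". Returns undefined if the shape doesn't fit.
function getArticleId(path) {
  // "/a/b/c".split("/") gives ["", "a", "b", "c"], so the article piece is index 3
  const articleName = path.split("/")[3];
  if (!articleName) return undefined;
  // Everything before the first "-" is the ID
  return articleName.split("-")[0];
}

console.log(getArticleId("/custom-heads/food-drinks/51374-easter-bunny-cake")); // 51374
console.log(getArticleId("/custom-heads")); // undefined
```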

You may use this regex with a capture group:
^(?:[^\/]*\/){3}([^-]+)
Or in modern browsers you can use lookbehind assertion:
/(?<=^(?:[^\/]*\/){3})[^-]+/
RegEx Details:
^: Start
(?:[^\/]*\/){3}: Match 0 or more non-/ characters followed by a /. Repeat this group 3 times
([^-]+): Match 1+ of non-hyphen characters
Code:
const s = `/custom-heads/food-drinks/51374-easter-bunny-cake`;
const re = /^(?:[^\/]*\/){3}([^-]+)/;
console.log (s.match(re)[1]);
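For comparison, here is a sketch that runs both patterns from this answer on the same string. With the capture group the ID is in index 1; with the lookbehind the whole match is the ID, so it is in index 0. The lookbehind form needs an engine with ES2018 lookbehind support.

```javascript
const s = `/custom-heads/food-drinks/51374-easter-bunny-cake`;

// Capture-group version: the ID is in group 1
const withGroup = s.match(/^(?:[^\/]*\/){3}([^-]+)/);
console.log(withGroup?.[1]); // 51374

// Lookbehind version: the lookbehind consumes nothing, so the match itself is the ID
const withLookbehind = s.match(/(?<=^(?:[^\/]*\/){3})[^-]+/);
console.log(withLookbehind?.[0]); // 51374
```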

Use
const str = `/custom-heads/food-drinks/51374-easter-bunny-cake`
const p = /(?:\/[^\/]*){2}\/(\d+)-/
console.log(str.match(p)?.[1])
EXPLANATION
Non-capturing group (?:\/[^\/]*){2}
{2} matches the previous token exactly 2 times
\/ matches the character / (character code 47 decimal, 2F hex, 57 octal) literally
Match a single character not present in the list [^\/], whose only member is:
\/ the character / (character code 47)
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\/ (after the group) matches the character / (character code 47) literally
1st Capturing Group (\d+)
\d matches a digit (equivalent to [0-9])
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
- matches the character - (character code 45 decimal, 2D hex, 55 octal) literally
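A short sketch of how this pattern behaves on a few inputs. The second and third paths are made-up examples, not from the question. Because (\d+) requires digits, a non-numeric segment yields no match (and ?.[1] gives undefined instead of throwing). Because the pattern is not anchored with ^, on a deeper path the match can start at a later /.

```javascript
const p = /(?:\/[^\/]*){2}\/(\d+)-/;

// Digits right after the third "/" and followed by "-": group 1 is the ID
const ok = "/custom-heads/food-drinks/51374-easter-bunny-cake".match(p)?.[1];
// No digits anywhere, so match() returns null and ?.[1] yields undefined
const noDigits = "/custom-heads/food-drinks/easter-bunny-cake".match(p)?.[1];
// Not anchored: the match starts at the second "/" here, so it finds the 4th segment
const deeper = "/a/b/c/123-x".match(p)?.[1];

console.log(ok, noDigits, deeper); // 51374 undefined 123
```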

Related

Javascript: Regex to exclude whitespace and special characters

I need a regex to validate a string with these rules:
1. It should be 18 characters long.
2. The first 5 characters should be either xyz34 or xyz12, i.e. (xyz34|xyz12).
3. The remaining 13 characters should be alphanumeric: only letters and numbers, with no whitespace or special characters allowed.
I have a pattern like this: '/^(xyz34|xyz12)((?=.*[a-zA-Z])(?=.*[0-9])){13}/g'
But it allows whitespace and special characters (such as $, % etc.), which violates rule #3.
Any suggestion to exclude this whitespace and special characters and to strictly check that it must be letters and numbers?

You should not quantify lookarounds. They are non-consuming patterns, i.e. consecutive positive lookaheads check for the presence of their patterns but do not advance the regex index; they check the text at the same position. It makes no sense to repeat them 13 times. ^(xyz34|xyz12)((?=.*[a-zA-Z])(?=.*[0-9])){13} is equivalent to ^(xyz34|xyz12)(?=.*[a-zA-Z])(?=.*[0-9]), and means the string must start with xyz34 or xyz12 and then contain at least 1 letter and at least 1 digit somewhere after that prefix.
You may consider fixing the issue by using a consuming pattern like this:
If you do not care if the last 13 chars contain only digits or only letters, use the patterns suggested by other users, like /^(?:xyz34|xyz12)[a-zA-Z\d]{13}$/ or /^xyz(?:34|12)[a-zA-Z0-9]{13}$/
If there must be at least 1 digit and at least 1 letter among those 13 alphanumeric chars, use /^xyz(?:34|12)(?=[a-zA-Z]*\d)(?=\d*[a-zA-Z])[a-zA-Z\d]{13}$/.
NOTE: these are regex literals, do not use them inside single- or double quotes!
Details
^ - start of string
xyz - a common prefix
(?:34|12) - a non-capturing group matching 34 or 12
(?=[a-zA-Z]*\d) - there must be at least 1 digit after any 0+ letters to the right of the current location
(?=\d*[a-zA-Z]) - there must be at least 1 letter after any 0+ digits to the right of the current location
[a-zA-Z\d]{13} - 13 letters or digits
$ - end of string.
JS demo:
var strs = ['xyz34abcdefghijkl1','xyz341bcdefghijklm','xyz34abcdefghijklm','xyz341234567890123','xyz14a234567890123'];
var rx = /^xyz(?:34|12)(?=[a-zA-Z]*\d)(?=\d*[a-zA-Z])[a-zA-Z\d]{13}$/;
for (var s of strs) {
console.log(s, "=>", rx.test(s));
}
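To make the point about quantified lookarounds concrete, this sketch runs the question's pattern (without its g flag), the same pattern with {13} removed, and the fixed pattern against one made-up string that has a space and special characters. The first two behave the same because the repeated lookaheads consume nothing.

```javascript
// The question's pattern (g flag dropped) and the equivalent unquantified form
const quantified = /^(xyz34|xyz12)((?=.*[a-zA-Z])(?=.*[0-9])){13}/;
const unquantified = /^(xyz34|xyz12)(?=.*[a-zA-Z])(?=.*[0-9])/;
const fixed = /^xyz(?:34|12)(?=[a-zA-Z]*\d)(?=\d*[a-zA-Z])[a-zA-Z\d]{13}$/;

// Has a space and special characters, and is not 18 characters long
const bad = "xyz34 $%a1";

console.log(quantified.test(bad));   // true: {13} repeats zero-width checks, adding nothing
console.log(unquantified.test(bad)); // true
console.log(fixed.test(bad));        // false
console.log(fixed.test("xyz34abcdefghijkl1")); // true
```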

.* matches any string; for your requirement you can use this (without the g flag, which makes test() stateful when the regex is reused):
/^xyz(34|12)[a-zA-Z0-9]{13}$/
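A small sketch of the g-flag pitfall: a global regex used with test() stores the end of the last match in lastIndex and starts the next search there, so validating the same valid string twice in a row can give different answers.

```javascript
const withG = /^xyz(34|12)[a-zA-Z0-9]{13}$/g;
const withoutG = /^xyz(34|12)[a-zA-Z0-9]{13}$/;
const s = "xyz34abcdefghijklm"; // 18 characters, valid

console.log(withG.test(s)); // true (lastIndex is now 18)
console.log(withG.test(s)); // false (search starts at 18, ^ cannot match there; lastIndex resets to 0)
console.log(withG.test(s)); // true

// Without g, every call starts at index 0
console.log(withoutG.test(s)); // true
console.log(withoutG.test(s)); // true
```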

/^(xyz34|xyz12)[a-zA-Z0-9]{13}$/
This should work.
^ asserts position at the start of the string
1st Capturing Group (xyz34|xyz12)
1st Alternative xyz34 matches the characters xyz34 literally (case sensitive)
2nd Alternative xyz12 matches the characters xyz12 literally (case sensitive)
[a-zA-Z0-9] matches a single character in the range a-z, A-Z or 0-9
{13} Quantifier: matches exactly 13 times
$ asserts position at the end of the string

Regex related confusion

I would like to extract mkghj.bmg and pp.kp from the following string using a JavaScript regex:
avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp
Everything enclosed within ,,,, needs to be ignored. There could be multiple instances of ,,,, but they always occur an even number of times in the string (they may also not occur at all).
Also, avb#gh.lk has a # sign, therefore it needs to be ignored.
I guess the rule I am looking for is this: if there is a dot (.), look ahead and look behind:
If the dot is enclosed inside ,,,, then ignore it
If the dot has an # before it with no space between the dot and #, ignore it
In all other cases, capture an unbroken set of characters (until a space is encountered) on either side of the dot
I came up with this regex, but it is not helpful
[^\, ]+([^# \,]+\w+)[^\, ]+

Generally speaking that is (mind the capturing group):
not_this | neither_this_nor_this | (but_this_interesting_stuff)
For your specific example, this could be
,,,,.*?,,,,|\S+#\S+|(\S+)
You need to check for the existence of group 1; see a demo on regex101.com.
In JS this would be:
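var myString = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
var myRegexp = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;
// Declare match: assigning to an undeclared variable throws in strict mode
var match = myRegexp.exec(myString);
while (match !== null) {
  // Group 1 is only set when the third alternative (a wanted token) matched
  if (match[1] !== undefined) {
    console.log(match[1]);
  }
  match = myRegexp.exec(myString);
}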
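In engines that support String.prototype.matchAll (ES2020), the same "match what you don't want, capture what you do" pattern can be consumed without a manual exec() loop. This is a sketch of an alternative to the loop, using the same regex; matchAll requires the g flag.

```javascript
const myString = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
const myRegexp = /,,,,.*?,,,,|\S+#\S+|(\S+)/g;

// Keep only the matches where group 1 participated (the third alternative)
const wanted = [...myString.matchAll(myRegexp)]
  .map((m) => m[1])
  .filter((g) => g !== undefined);

console.log(wanted); // [ 'mkghj.bmg', 'pp.kp' ]
```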

Regex
^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)
Description
^ asserts position at the start of the string
Match a single character not present in the list [^ ]+ (the list contains only the space character)
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
A literal space matches the space character
1st Capturing Group ([^ ]+)
Match a single character not present in the list [^ ]+ (any character except a space)
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
A literal space matches the space character
,,,, matches the characters ,,,, literally
.* matches any character (except for line terminators)
* Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
,,,, matches the characters ,,,, literally
\s+ matches any whitespace character (including [\r\n\t\f\v ])
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
2nd Capturing Group (.*)
.* matches any character (except for line terminators)
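A quick sketch of this regex on the sample string. Note that it is tied to the exact layout of the sample (one token, then the wanted token, then a single ,,,, block, then the rest), so a differently shaped input such as the made-up "a.b c.d" simply does not match.

```javascript
const str = "avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp";
const re = /^[^ ]+ ([^ ]+) ,,,,.*,,,,\s+(.*)/;

const m = str.match(re);
// Group 1: the second space-separated token; group 2: everything after the last ,,,,
console.log(m?.[1]); // mkghj.bmg
console.log(m?.[2]); // pp.kp

// No ,,,, block here, so match() returns null
console.log("a.b c.d".match(re)); // null
```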

You can replace the text between the ,,,, markers with '', then split what remains on whitespace and filter out the items that contain a # sign.
let str = `avb#gh.lk mkghj.bmg ,,,,fsdsdf.fdfd pllk.kp sdfsdf.bb,,,, pp.kp`
// Remove every ,,,,...,,,, block, split on runs of whitespace, keep non-empty tokens without '#'
let op = str.replace(/,,,,.*?,,,,/g, '').split(/\s+/).filter(e => e && !e.includes('#'));
console.log(op)

Javascript in regexp not matching something

I want to match everything except the one with the string '1AB' in it. How do I do that? When I tried it, nothing matched.
var text = "match1ABmatch match2ABmatch match3ABmatch";
var matches = text.match(/match(?!1AB)match/g);
console.log(matches[0]+"..."+matches[1]);

Lookarounds do not consume the text, i.e. the regex index does not move when their patterns are matched. See Lookarounds Stand their Ground for more details. You still must match the text with a consuming pattern, here, the digits.
Add \w+ word matching pattern after the lookahead. NOTE: You may also use \S+ if there can be any one or more non-whitespace chars. If there can be any chars, use .+ (to match 1 or more chars other than line break chars) or [^]+ (matches even line breaks).
var text = "match100match match200match match300match";
var matches = text.match(/match(?!100(?!\d))\w+match/g);
console.log(matches);
Pattern details
match - a literal substring
(?!100(?!\d)) - a negative lookahead that fails the match if, immediately to the right of the current location, there is a 100 substring not followed by a digit (if you want to fail the matches where the number starts with 100, remove the (?!\d) lookahead)
\w+ - 1 or more word chars (letters, digits or _)
match - a literal substring
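Applied to the question's original 1AB string, the same idea (a negative lookahead followed by a consuming \w+) looks like this. It is a sketch adapted from the answer's 100 example, not code from the answer itself.

```javascript
const text = "match1ABmatch match2ABmatch match3ABmatch";
// (?!1AB) rejects "1AB" right after "match"; \w+ then consumes the middle part
const matches = text.match(/match(?!1AB)\w+match/g);
console.log(matches); // [ 'match2ABmatch', 'match3ABmatch' ]
```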

javascript - url regexp not proper

I am trying to validate a URL with JS.
function validateURL(url) {
var urlregex = new RegExp("^(http:\/\/www.|https:\/\/www.|ftp:\/\/www.|www.|https:\/\/|http:\/\/){1}([0-9A-Za-z]+\.)");
return urlregex.test(url);
}
But I want google.com to pass as well, and it does not pass this regexp.
What is wrong with the regexp?
I want these URLs to pass the regexp:
http://www.google.com
http://google.com
https://www.google.com
https://google.com
google.com
www.google.com

Try this instead:
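function validateURL(url) {
  // Use a regex literal: inside a string passed to new RegExp, "\." collapses to "."
  // and would match any character instead of a literal dot
  var urlregex = /^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$/;
  return urlregex.test(url);
}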
DEMO
http://regex101.com/r/rG8wP9
OR:
function validateURL(url) {
  // test() already returns a boolean, so no if/else is needed
  return /^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$/im.test(url);
}
EXPLANATION:
^ assert position at start of a line
1st Capturing group ((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)
(?:https?:\/\/)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
http matches the characters http literally (case sensitive)
s? matches the character s literally (case sensitive)
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
: matches the character : literally
\/ matches the character / literally
\/ matches the character / literally
(?:www\.)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
www matches the characters www literally (case sensitive)
\. matches the character . literally
[0-9A-Za-z]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
\. matches the character . literally
[0-9A-Za-z]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
0-9 a single character in the range between 0 and 9
A-Z a single character in the range between A and Z (case sensitive)
a-z a single character in the range between a and z (case sensitive)
$ asserts position at the end of the string (or at the end of a line, with the m flag)
i modifier (second version): case-insensitive match
m modifier (second version): multi-line. Causes ^ and $ to match at the start/end of each line (not only at the start/end of the string)
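A quick check of this pattern (as a regex literal, with no flags) against the six URLs from the question, plus two made-up inputs that show its limits: only a single dot is allowed after the optional www., so multi-part domains such as google.co.uk are rejected.

```javascript
const urlregex = /^((?:https?:\/\/)?(?:www\.)?[0-9A-Za-z]+\.[0-9A-Za-z]+)$/;

const shouldPass = [
  "http://www.google.com",
  "http://google.com",
  "https://www.google.com",
  "https://google.com",
  "google.com",
  "www.google.com",
];
for (const url of shouldPass) {
  console.log(url, "=>", urlregex.test(url)); // true for every entry
}

console.log(urlregex.test("google.co.uk")); // false: more than one dot after the optional www.
console.log(urlregex.test("google"));       // false: a dot is required
```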

what is wrong with regexp?
This part of the regexp:
([0-9A-Za-z]+\.)
only covers the "google." portion of "google.com" in your example: nothing after the dot is checked, so it also matches malformed domain addresses like google. Something like this:
(?:[0-9A-Za-z]+\.)+[A-Za-z]{2,}
is more correct and flexible.
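One way to drop this host part into a full anchored pattern is sketched below. The combination with an optional http(s):// scheme is an assumption for illustration, not something the answer spelled out. Because the repeated group already allows www., no separate www. group is needed, and multi-part domains are accepted.

```javascript
// Hypothetical combination: optional scheme + the host pattern suggested above
const hostRegex = /^(?:https?:\/\/)?(?:[0-9A-Za-z]+\.)+[A-Za-z]{2,}$/;

console.log(hostRegex.test("google.com"));             // true
console.log(hostRegex.test("https://www.google.com")); // true
console.log(hostRegex.test("google.co.uk"));           // true
console.log(hostRegex.test("google."));                // false: the last part needs 2+ letters
```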

Several things are not working in this regexp. For instance, nothing is matched after the period at the end. Also, you never need {1}, as every expression is matched once by default. The main problem is that the group of prefixes is not optional, so a URL without one of them, like google.com, can never match.
This is your regexp cleaned up, with the shared parts pulled into optional groups and the | kept inside ():
const isMatch = /^(?:(?:https?|ftp):\/\/)?(?:www\.)?[0-9a-z]+\.[a-z]+/im.test(url);

Regex - Not containing a string and not ending with a /

How do I create a regular expression that doesn't contain the string "umbraco" and doesn't end with a /?
This is what I have so far, but I'm unable to get it fully working; any help would be appreciated.
(?!umbraco)(?![/]$)
Test strings would be:
http://www.domain.com/umbraco/login.aspx - shouldn't match
http://www.domain.com/pages/1/ - shouldn't match
http://www.domain.com/pages/1 - should match

It should be this regex:
^(?!.*?umbraco).*?[^\/]$
Online Demo: http://regex101.com/r/lM0cS9
Explanation:
^ assert position at start of a line
(?!.*?umbraco) Negative Lookahead - Assert that it is impossible to match the regex below
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed
umbraco matches the characters umbraco literally (case sensitive)
.*? matches any character (except newline)
Quantifier: Between zero and unlimited times, as few times as possible, expanding as needed
[^\/] match a single character not present in the list below
\/ matches the character / literally
$ assert position at end of a line
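A short sketch running this regex against the three test strings from the question:

```javascript
const re = /^(?!.*?umbraco).*?[^\/]$/;

const tests = [
  "http://www.domain.com/umbraco/login.aspx", // contains "umbraco"
  "http://www.domain.com/pages/1/",           // ends with "/"
  "http://www.domain.com/pages/1",            // should match
];
for (const t of tests) {
  console.log(t, "=>", re.test(t)); // false, false, true
}
```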

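This should be the regex:
^(?!.*?umbraco).*?[^\/\s*\n*]$
Inside a character class * is a literal character, so [^\/\s*\n*] rejects strings whose last character is /, any whitespace, or *.
Demo: http://rubular.com/r/tEhY7JFjXK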
