I am looping through all the links on a page and matching their href values against the following pattern:
([^/]+)/([0-9]+)/([^/]+)
The problem is that there are two types of link formats on the page:
1. /video/123/slug
2. /video/123
Format 1 gets captured fine with the above regex, but format 2 fails. I want to make the third piece of the regex (the slug) optional so that both link formats return true when matched against the regex. How do I do this?
Put the last part in a non-capturing group and add a ? after it:
([^/]+)/([0-9]+)(?:/([^/]+))?
Use the ? quantifier, which makes the preceding part of your pattern optional: it matches either 0 or 1 occurrence of it.
You also need to put the last slash together with the last part of your regex in a non-capturing group, so that the slash becomes optional too.
([^/]+)/([0-9]+)(?:/([^/]+))?
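A quick check of the pattern from the answers above against both link formats. This is a sketch that assumes the hrefs are plain strings; the `hrefs` array is illustrative sample data taken from the question, and the slashes are escaped only for the JavaScript regex literal.

```javascript
// Pattern from the answers: the "/slug" part sits in an optional
// non-capturing group, so group 3 is undefined when the slug is absent.
const linkPattern = /([^\/]+)\/([0-9]+)(?:\/([^\/]+))?/;

// Illustrative sample hrefs in the two formats from the question.
const hrefs = ["/video/123/slug", "/video/123"];

for (const href of hrefs) {
  const m = href.match(linkPattern);
  // m[1] = section, m[2] = numeric id, m[3] = slug (or undefined)
  console.log(href, m !== null ? [m[1], m[2], m[3]] : null);
}
```

Both hrefs match. For /video/123 the third capture is simply undefined, so code that reads the slug should handle that case.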
Adding a ? should make the last part of the pattern optional; note that the slash before it must be inside the optional part too, or /video/123 will still fail to match.
I have the following regex: (?:\/us)?\/[a-z]{2}[_-][a-z]{2}(?:\/?$|(?=\/))|\/[a-z]{2}(?:\/?$|(?=\/))^([a-z]{2}\/retail)
As you can see, it's not particularly easy on the eyes. You can see it in action here: https://regex101.com/r/4AZwuP/1 (enable substitution to see the desired result: the removal of matches)
Here are a few entries it's supposed to match:
/us/en_us/retail/en (matches /us/ and /us/en_us/)
/us/en_us/retail (matches /us/ and /en_us/)
/gb/en_gb/retail/en-uk (matches /en_gb and /en-uk)
Note that these are just prefixes and the full URL might look something like:
/de/de_de/retail/de_de/products/catalog
The goal is to run the regex and delete the matches so that this line becomes:
de/retail/products/catalog
The above regex accomplishes this with one exception: in the first example, I need it to match not only /us/en_us but also /en (or /de, or /mx; in other words, there's an additional country code there), which it unfortunately does not.
What I do know for a fact is that if those two characters are present, it'll be one of these two:
.../retail/en
.../retail/en/something/or/other
In either case it's always two characters either alone or followed by a forward slash.
How can I modify the original regex to deal with this annoying edge case?
Bonus: how does the original work?
If lookbehind is supported, you might use:
(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|(?<=\/retail)\/[a-z]{2}\b
(?:\/[a-z]{2})? Optionally match / and 2 chars a-z
\/[a-z]{2}[-_][a-z]{2}\b Match /, 2 chars a-z, then either - or _, then 2 chars a-z, followed by a word boundary
| Or
(?<=\/retail)\/[a-z]{2}\b Match / and 2 chars a-z, asserting that /retail is directly to the left, followed by a word boundary
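A minimal sketch of the lookbehind version in JavaScript, assuming a runtime that supports lookbehind (ES2018 or later, e.g. current Node) and that matches are deleted by replacing them with an empty string. The sample paths come from the question; `stripLocales` is a hypothetical helper name.

```javascript
// Lookbehind pattern from the answer, with the g flag so every match is removed.
const localePattern =
  /(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|(?<=\/retail)\/[a-z]{2}\b/g;

// Hypothetical helper: delete every match from the path.
function stripLocales(path) {
  return path.replace(localePattern, "");
}

for (const p of [
  "/us/en_us/retail/en",
  "/gb/en_gb/retail/en-uk",
  "/de/de_de/retail/de_de/products/catalog",
]) {
  console.log(p, "->", stripLocales(p));
}
```

With these inputs the sketch leaves /retail, /retail and /retail/products/catalog; the leading slash of /retail survives because only the matched segments are deleted.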
Alternatively, use a capture group, and in the replace callback check whether group 1 exists. If it does, use it in the replacement to keep it.
(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|\/(retail)\/[a-z]{2}\b
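A sketch of the capture-group variant, for engines without lookbehind. It assumes the replacement should put back "/retail" whenever group 1 took part in the match; `stripLocalesNoLookbehind` is a hypothetical helper name.

```javascript
// Capture-group version from the answer: no lookbehind needed.
const localeOrRetail =
  /(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|\/(retail)\/[a-z]{2}\b/g;

// Hypothetical helper: when group 1 ("retail") participated in the match,
// restore "/retail"; otherwise delete the match entirely.
function stripLocalesNoLookbehind(path) {
  return path.replace(localeOrRetail, (_match, retail) =>
    retail !== undefined ? "/" + retail : ""
  );
}

console.log(stripLocalesNoLookbehind("/us/en_us/retail/en"));
console.log(stripLocalesNoLookbehind("/de/de_de/retail/de_de/products/catalog"));
```

One difference from the lookbehind version is worth checking: for /gb/en_gb/retail/en-uk this variant returns /retail-uk, because \b also matches between n and -, so the /retail/en alternative matches before /en-uk is tried.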
I suppose you want to remove the country codes; in that case the leading /gb is a country code too.
My regex is:
(\/\w{2}(?=\/|$))|(\/\w{2}(-|_)\w{2}(?=\/|$))
Let's break it into two parts:
(\/\w{2}(?=\/|$)) matches / and two word characters (\w: letters, digits or _), followed by either / or the end of the string
(\/\w{2}(-|_)\w{2}(?=\/|$)) matches / and two word characters, then - or _, then two more word characters, followed by either / or the end of the string
It matches all the examples in your regex101 demo, but it will also remove any other two-letter segments in your URL.
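A small sketch of that regex used for deletion, assuming matches are replaced with an empty string. `removeCountryCodes` is a hypothetical helper, and the /tv segment in the second input is a made-up path segment that illustrates the caveat about other two-letter segments.

```javascript
// Regex from the answer: a two-character segment, or a two+separator+two
// segment, each followed by "/" or the end of the string.
const countryCodePattern = /(\/\w{2}(?=\/|$))|(\/\w{2}(-|_)\w{2}(?=\/|$))/g;

// Hypothetical helper: delete every match.
function removeCountryCodes(path) {
  return path.replace(countryCodePattern, "");
}

console.log(removeCountryCodes("/us/en_us/retail/en")); // "/retail"
// "/tv" is not a country code, but it is removed too:
console.log(removeCountryCodes("/us/en_us/retail/en/tv/catalog")); // "/retail/catalog"
```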
I need to write a little RegEx matcher which will match any occurrence of strings in the form of
[a-zA-Z]+(_[a-zA-Z0-9]+)?
If I use the regex above, it matches the sections I need, but it also matches the abc part of 4_abc, which is not intended. I tried to exclude it with:
(?:[^a-zA-Z0-9_]|^)([a-zA-Z]+(_[a-zA-Z0-9]+)?)(?:[^a-zA-Z0-9_]|$)
The problem is that the 'not' matches at the beginning and end are not working as I hoped. If I use them on the example
a_d Dd_da 4_d d_4
they block matching the second item, Dd_da, because the space before it was already consumed by the first match. Sadly, I can't use lookarounds because I am using JS.
So the input:
a_d Dd_da 4_d d_4
should match: a_d, Dd_da and d_4
but matches: a_d (there is a space at the end)
Is there another way to match the needed sections, or to not consume the 'anchor' matches?
I really appreciate your help.
You can make use of \b:
\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b
\b matches the (zero-width) point where either the preceding character or the following character is a letter, digit or underscore, but not both. It also matches at the start/end of the string if the first/last character is a letter, digit or underscore.
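Applied to the example input from the question, with the g flag added so that `String.prototype.match` returns every match. The 4_abc string is the case the asker wanted excluded.

```javascript
// Word-boundary version from the answer; g flag to collect every match.
const identifierPattern = /\b[a-zA-Z]+(_[a-zA-Z0-9]+)?\b/g;

const input = "a_d Dd_da 4_d d_4";
console.log(input.match(identifierPattern)); // ["a_d", "Dd_da", "d_4"]

// No boundary exists between "_" and "a", so "abc" is not matched here.
console.log("4_abc".match(identifierPattern)); // null
```

Because \b is zero-width, nothing is consumed around each match, which is why the space between a_d and Dd_da no longer blocks the second match.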
Here's what I'm trying to do:
http://i.imgur.com/Xqrf8Wn.png
Simply take a URL with 3 groups: $1 is not so important, $2 and $3 are, but $2 is totally optional, including (obviously) the corresponding slash when present, which is all I am trying to make optional. I get that it can (should?) be in a non-capturing group, but does it HAVE to be? What I've seen so far seems to indicate it does not HAVE to be. If possible, I'd really like someone to explain it so I can fully understand it, rather than just being handed one possible working answer to copy, as some people come here seeking.
Here are the regex strings I've tried; at best, they currently match only the second URL, the one where the optional part is present:
^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/)(.*)\/required\/B([A-Z0-9]{9}).*
^https:\/\/([a-z]{0,2})\.?blah\.com(\/)?(.*)\/required\/B([A-Z0-9]{9}).*
^https:\/\/([a-z]{0,2})\.?blah\.com(?:\/)?(.*)?\/required\/B([A-Z0-9]{9}).*
Here are the two URLs from which I want to capture groups 2 and 3 (groups 1 and 2 are optional, and $2 is the problem). I've tried all the patterns above and have yet to get one to match the URL when the optional part is NOT present; I believe it must be due to the backslashes?
https://blah.com/required/B7BG0Z0GU1A
https://blah.com/optional/required/B7BG0Z0GU1A
Making a part of the pattern optional is as simple as adding ?, and your last two attempts both work: https://regex101.com/r/RIKvYY/1
Your mistake is that your test is wrong: you are using ^, which matches the beginning of the string. You need to add the m (multiline) flag to make it match the beginning of each line. This is the reason your patterns never match the second line...
Note that you're allowing two slashes (//required, for example). You can solve this by putting the first slash and the optional part in the same optional group (of course, as long as you are using .* you can still match multiple slashes):
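To see the effect of the flag, here is a small sketch that assumes both test URLs sit on separate lines of one string, as in a regex101 test box. It runs the asker's second pattern with and without m and counts the matches.

```javascript
// Asker's second pattern, with and without the m (multiline) flag.
const withoutM = /^https:\/\/([a-z]{0,2})\.?blah\.com(\/)?(.*)\/required\/B([A-Z0-9]{9}).*/g;
const withM = /^https:\/\/([a-z]{0,2})\.?blah\.com(\/)?(.*)\/required\/B([A-Z0-9]{9}).*/gm;

// Both URLs from the question, one per line.
const testText = [
  "https://blah.com/required/B7BG0Z0GU1A",
  "https://blah.com/optional/required/B7BG0Z0GU1A",
].join("\n");

// Without m, ^ only matches at the very start of the text, so only line 1 can match.
const countWithoutM = [...testText.matchAll(withoutM)].length;
// With m, ^ also matches right after each newline.
const countWithM = [...testText.matchAll(withM)].length;

console.log(countWithoutM, countWithM); // 1 2
```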
https:\/\/([a-z]{0,2})\.?blah\.com(?:\/(.*))?\/required\/B([A-Z0-9]{9}).*
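A sketch that applies the final pattern to each URL separately and reads the groups. The object shape in the log line is only for display.

```javascript
// Pattern from the answer: the slash and the optional segment share one
// optional non-capturing group, so group 2 is undefined when it is absent.
const urlPattern =
  /https:\/\/([a-z]{0,2})\.?blah\.com(?:\/(.*))?\/required\/B([A-Z0-9]{9}).*/;

for (const url of [
  "https://blah.com/required/B7BG0Z0GU1A",
  "https://blah.com/optional/required/B7BG0Z0GU1A",
]) {
  const m = url.match(urlPattern);
  console.log(url, m && { optional: m[2], id: m[3] });
}
```

Note that [A-Z0-9]{9} captures exactly nine characters, so group 3 is 7BG0Z0GU1 and the final A is consumed by the trailing .*.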
I need some help with a regex. It may be basic stuff, but I cannot find the correct way to do it. Please help!
So, here's my question:
I have a list of URLs that are invalid because of double slashes, like this:
http://website.com//wp-content/folder/file.jpg. To fix it, I need to remove all double slashes except the first one, which follows the colon (http://), so the fixed URL is: http://website.com/wp-content/folder/file.jpg.
I need to do it with RegExp.
Variant 1
url.replace(/\/\//g,'/'); // => http:/website.com/wp-content/folder/file.jpg
replaces all double slashes (//), including the first one, which is not correct.
Example here:
https://regex101.com/r/NhCVMz/2
You may use
url = url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2")
Note: a ^ anchor at the beginning of the pattern might be used if the strings are entire URLs.
This pattern matches and captures http:// or https:// and restores it in the resulting string with the $1 backreference. All other runs of 2 or more / are matched by (\/){2,}, and only one / is put back into the resulting string (via $2), since the capturing group does not include the quantifier.
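The answer's one-liner wrapped in a function so it can be tried on the URL from the question. `fixDoubleSlashes` is a hypothetical name; the second test input is made-up data with several runs of slashes.

```javascript
// Answer's pattern: keep the protocol via $1, collapse other runs of "/" via $2.
// In a JS replacement string, a group that did not participate expands to "".
function fixDoubleSlashes(url) {
  return url.replace(/(https?:\/\/)|(\/){2,}/g, "$1$2");
}

console.log(fixDoubleSlashes("http://website.com//wp-content/folder/file.jpg"));
// http://website.com/wp-content/folder/file.jpg
```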
Find (^|[^:])/{2,}
Replace $1/
delimited: /(^|[^:])\/{2,}/
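A sketch of the find/replace answer in JavaScript. The g flag is an addition here (an assumption about the intended use), so that every run of slashes is collapsed rather than only the first one; `collapseSlashes` is a hypothetical helper name.

```javascript
// The second answer's pattern as a JS literal, plus the g flag.
const doubleSlash = /(^|[^:])\/{2,}/g;

function collapseSlashes(url) {
  // $1 restores the character before the slashes (or nothing at the start),
  // so "://" is left alone because [^:] cannot match the colon.
  return url.replace(doubleSlash, "$1/");
}

console.log(collapseSlashes("http://website.com//wp-content/folder/file.jpg"));
```

Unlike the first answer, this pattern does not special-case http or https; it simply refuses to collapse slashes that directly follow a colon.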
I want to match all valid prefixes of substitute followed by other characters, so that
sub/abc/def matches the sub part.
substitute/abc/def matches the substitute part.
subt/abc/def either doesn't match or only matches the sub part, not the t.
My current regex is /^s(u(b(s(t(i(t(u(te?)?)?)?)?)?)?)?)?/, which works; however, it seems a bit verbose.
Is there any better (as in, less verbose) way to do this?
This does the same as the pattern in your question:
^s(?:ubstitute|ubstitut|ubstitu|ubstit|ubsti|ubst|ubs|ub|u)?
The regex above always tries to match the longest possible word first: it checks for substitute, and if that matches, it stops; otherwise it moves on to the next alternative, substitut, and so on down to u.
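Running the alternation from the answer on the three examples in the question. The loop and log format are illustrative only.

```javascript
// Alternation from the answer, longest alternative first.
const prefixPattern = /^s(?:ubstitute|ubstitut|ubstitu|ubstit|ubsti|ubst|ubs|ub|u)?/;

for (const s of ["sub/abc/def", "substitute/abc/def", "subt/abc/def"]) {
  const m = s.match(prefixPattern);
  // m[0] is the matched prefix: "sub", "substitute", "sub"
  console.log(s, "->", m && m[0]);
}
```

For subt/abc/def only sub is matched, because ubst fails on the t and the engine falls back to the shorter alternative ub.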
You could use a two-step approach:
1. Find the first word of the subject using this simple pattern: ^(\w+)
2. Use the word extracted in step 1 as your regex pattern, e.g. ^subs, and test it against the word substitute.
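A sketch of the two-step approach. `isPrefixOfSubstitute` is a hypothetical helper that returns a boolean rather than the matched text; it assumes the first word only contains \w characters, so no regex escaping is needed.

```javascript
// Hypothetical helper sketching the two-step approach.
function isPrefixOfSubstitute(subject) {
  // Step 1: extract the first word of the subject.
  const first = /^(\w+)/.exec(subject);
  if (first === null) return false;
  // Step 2: use that word as a pattern anchored at the start of "substitute".
  // \w+ only yields letters, digits and _, so no metacharacters need escaping.
  return new RegExp("^" + first[1]).test("substitute");
}

console.log(isPrefixOfSubstitute("sub/abc/def"));
console.log(isPrefixOfSubstitute("subt/abc/def"));
```

Because step 1 takes the whole first word, subt/abc/def simply does not match, which is one of the two outcomes the question allows.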