Adding an additional letter matching group to an existing regex - javascript

I have the following regex: (?:\/us)?\/[a-z]{2}[_-][a-z]{2}(?:\/?$|(?=\/))|\/[a-z]{2}(?:\/?$|(?=\/))^([a-z]{2}\/retail)
As you can see, it's not particularly easy on the eyes. You can see it in action here: https://regex101.com/r/4AZwuP/1 (enable substitutions to see the desired result - the removal of matches)
Here are a few entries it's supposed to match:
/us/en_us/retail/en (matches /us/ and /us/en_us/)
/us/en_us/retail (matches /us/ and /en_us/)
/gb/en_gb/retail/en-uk (matches /en_gb and /en-uk)
Note that these are just prefixes; the full URL might look something like:
/de/de_de/retail/de_de/products/catalog
The goal is to run the regex and delete the matches so that this line becomes:
de/retail/products/catalog
The above regex accomplishes this with one exception: in the first example, I need it to match not only /us/en_us but also /en (or /de or /mx; in other words, there's an additional country code there). Unfortunately, it does not.
What I do know for a fact is that if those two characters are present, it'll be one of these two:
.../retail/en
.../retail/en/something/or/other
In either case it's always two characters either alone or followed by a forward slash.
How can I modify the original regex to deal with this annoying edge case?
Bonus: how does the original work?

If a lookbehind is supported you might use:
(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|(?<=\/retail)\/[a-z]{2}\b
(?:\/[a-z]{2})? Optionally match / and 2 chars a-z
\/[a-z]{2}[-_][a-z]{2}\b Match / and 2 chars a-z, then either - or _ and 2 chars a-z, followed by a word boundary
| Or
(?<=\/retail)\/[a-z]{2}\b Match / and 2 chars a-z, asserting /retail directly to the left
Or use a capture group, and in the callback of replace check if group 1 exists. If it does, use it in the replacement to keep it.
(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|\/(retail)\/[a-z]{2}\b
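As a rough sketch of how the two patterns above could be applied with String.prototype.replace: the helper names stripWithLookbehind and stripWithCallback are made up for illustration, and the lookbehind variant assumes an engine with ES2018 lookbehind support.

```javascript
// Variant 1: lookbehind. Every match is simply deleted.
const lookbehindRe =
  /(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|(?<=\/retail)\/[a-z]{2}\b/g;

function stripWithLookbehind(path) {
  return path.replace(lookbehindRe, "");
}

// Variant 2: capture group. When group 1 ("retail") took part in the
// match, it is put back; otherwise the match is deleted.
const captureRe =
  /(?:\/[a-z]{2})?\/[a-z]{2}[-_][a-z]{2}\b|\/(retail)\/[a-z]{2}\b/g;

function stripWithCallback(path) {
  return path.replace(captureRe, (match, g1) => (g1 ? "/" + g1 : ""));
}

console.log(stripWithLookbehind("/us/en_us/retail/en"));
// "/retail"
console.log(stripWithCallback("/us/en_us/retail/en/something/or/other"));
// "/retail/something/or/other"
```

Both patterns treat any leading two-letter segment as part of the match, so /de/de_de/retail/de_de/products/catalog comes out as /retail/products/catalog. With the capture-group variant, a hyphenated code directly after /retail (as in /retail/en-uk) is only partly consumed, because \b also matches before the hyphen; the lookbehind variant does not have that problem.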

I assume you want to remove the country codes, in which case the leading /gb is a country code as well.
My regex is this: (\/\w{2}(?=\/|$))|(\/\w{2}(-|_)\w{2}(?=\/|$))
Let's break it into two parts:
(\/\w{2}(?=\/|$)) matches / and two word characters that are followed by / or the end of the string
(\/\w{2}(-|_)\w{2}(?=\/|$)) matches / and two word characters, then _ or -, then two more word characters, again followed by / or the end of the string
It matches all the examples in your regex101 demo, but it will give wrong results if there are other two-letter segments in your URL, because they will be matched too.
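A small sketch of this pattern in use, including the caveat about other two-letter segments. The helper name stripCodes is hypothetical, and the second input is built from the question's ".../retail/en/something/or/other" shape.

```javascript
// Either a bare two-character segment or an xx_yy / xx-yy segment,
// each followed by "/" or the end of the string.
const countryRe = /(\/\w{2}(?=\/|$))|(\/\w{2}(-|_)\w{2}(?=\/|$))/g;

const stripCodes = (path) => path.replace(countryRe, "");

console.log(stripCodes("/de/de_de/retail/de_de/products/catalog"));
// "/retail/products/catalog"

// Caveat: the unrelated segment "/or" is two letters long, so it is removed too.
console.log(stripCodes("/us/en_us/retail/en/something/or/other"));
// "/retail/something/other"
```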

Related

Javascript regular expression: Want to exclude function words in all caps

I have the following regular expression to parse google script formula to get precedents
([A-z]{2,}!)?:?\$?[A-Z]\$?[A-Z]?(\$?[1-9]\$?[0-9]?)?
I needed to make the numbers optional to cater for ranges that are entire columns. Because the numbers are optional, I am also matching functions (all-caps words), which I want to exclude. I suppose I could filter them out after the fact, but I would like to modify the regex to exclude them. How do I do that?
Example:
=IFERROR(VLOOKUP($AA16,Account_List_S!$AA:$AC,3,0),0)
IFERROR(IF(AD3=1,INDEX(CapEx!$AB$15:$AE$15,1,YEAR(AD$13)-
YEAR($Z$13)-1)*IF(Import_CapEx!AD$15>=0,Import_CapEx!AD$15,0),0),0)";
The words I want to match refer to cells with an optional sheet name, and optional $ before the row or column identifier. They can be ranges or single cells.
Examples of words I want to match:
$AA16
$AB$15
AD$15
$Z$13
Account_List_S!$AA:$AC
CapEx!$AB$15:$AE$15
Import_CapEx!AD$15
The words I want to exclude are the functions:
IFERROR
VLOOKUP
IF
YEAR
Try this regex:
/[\(,+\-\*/><=]((\w+!)?\$?[A-Z]{1,2}(\$?[\d]{0,3})?(:\$?[A-Z]{1,2}(\$?\d{0,3})?)?(?=[\),+\-\*/><=]))/g
While a little long, this has the advantage that it will reject these when found in the formula:
Anything that has [A-Z] and [0-9] but not a column, e.g. ZIP50210
Anything that has [A-Z] and [0-9] but in the wrong order, e.g. 25E
Any variables like "AR" or 'JOHN'
Any constants in the formula like TRUE, FALSE or other argument values
Explanation:
[\(,+\-\*/><=] look for a starting literal ( or , or an operator like +, -, /, *, >, <, =. We expect column identifiers to be preceded by one of these characters.
( now we start our matching group
(\w+!)? allow for optional sheet names like 'Account_List_S!'
\$?[A-Z]{1,2}(\$?[\d]{0,3})? will match columns like A or $B1 or $AB$12 or AB123
(:\$?[A-Z]{1,2}(\$?\d{0,3})?)? adds an optional match for a range of columns, e.g. trailing :DD or :$C1 or :AC$1 or :AC123 or some such
(?=[\),+\-\*/><=]) lookahead for an ending literal ) or , or an operator like +, -, /, *, >, <, =. We expect column identifiers to be followed by one of these characters.
) close matching group
g global match (more than one instance)
Demo:
let regex = /[\(,+\-\*/><=]((\w+!)?\$?[A-Z]{1,2}(\$?[\d]{0,3})?(:\$?[A-Z]{1,2}(\$?\d{0,3})?)?(?=[\),+\-\*/><=]))/g;
let str = '=IFERROR(VLOOKUP($AA16,Account_List_S!$AA:$AC,3,0),0)IFERROR(IF(AD3=1,INDEX(CapEx!$AB$15:$AE$15,1,YEAR(AD$13)-YEAR($Z$13)-1)*IF(Import_CapEx!AD$15>=0,Import_CapEx!AD$15,0),0),0)";';
let arr = [];
let match; // declared explicitly so the loop also works in strict mode
while ((match = regex.exec(str)) !== null) {
  arr.push(match[1]); // we only want the first capturing group
}
console.log(arr);
/*
[ '$AA16',
'Account_List_S!$AA:$AC',
'AD3',
'CapEx!$AB$15:$AE$15',
'AD$13',
'$Z$13',
'Import_CapEx!AD$15',
'Import_CapEx!AD$15' ] */
This feels like a bad fit for a regular expression, but I can't pass up a good regex challenge.
My solution involves a lot of conditional checks:
(\w+\!)?\$?[A-Z]{1,}(?:\d+)?(\:?\$\w+)*(?!\()\b
Breakdown
(
\w+\! Words followed by an !
)? which might exist.
\$? A $ which might exist
[A-Z]{1,} At least 1 capitalized letter maybe more
(?:
\d+ A non capturing group of digits after our letters
)? but they might not exist
(
\:? A : which might exist
\$\w+ A $ followed by characters
)* With none or many of them
(?!\() All of this, ONLY IF we DONT have a ( after it
\b All of this, ONLY IF we have a word break
The magic really happens at the end with the two conditional checks; without them you capture a lot of other stuff.
Sample
let text = `=IFERROR(VLOOKUP($AA165,Account_List_S!$AA:$AC,3,0),0)
IFERROR(IF(AD3=1,INDEX(CapEx!$AB$15:$AE$15,1,YEAR(AD$13)-
YEAR($Z$13)-1)*IF(Import_CapEx!AD$15>=0,Import_CapEx!AD$15,0),0),0)";`
let exp = /(\w+\!)?\$?[A-Z]{1,}(?:\d+)?(\:?\$\w+)*(?!\()\b/gm
let match;
while((match=exp.exec(text))) {
console.log(match[0]);
}
Output:
$AA165
Account_List_S!$AA:$AC
AD3
CapEx!$AB$15:$AE$15
AD$13
$Z$13
Import_CapEx!AD$15
Import_CapEx!AD$15
A simple change to the expression, making the $ after the : optional, makes it work for your added use case:
(\w+\!)?\$?[A-Z]{1,}(?:\d+)?(\:?\$?\w+)*(?!\()\b
let text = `$X74,Calc_Named_HC!AE$32:AE$103)-Calc_General_HC!AE74";`
let exp = /(\w+\!)?\$?[A-Z]{1,}(?:\d+)?(\:?\$?\w+)*(?!\()\b/gm
let match;
while((match=exp.exec(text))) {
console.log(match[0]);
}
First shot: filter out full upper case words
This answer is not perfect yet, but using a negative look-ahead at the beginning of the expression can allow you to filter out IF and any sequence of 3+ upper case letters:
(?!\b[A-Z]{3,}\b|\bIF\b)(\b[A-z]{2,}!)?:?\$?\b[A-Z]\$?[A-Z]?(\$?[1-9]\$?[0-9]?)?\b
The \b in several places is to make sure the positive and negative matches go from the beginning of the letter sequence to the end.
The problem that remains is that it matches Account_List_S!$AA:$AC in two matches, Account_List_S!$AA and :$AC. So...
Second shot: fix the positive matching part of the regex
Here is a more complicated version that matches the ranges correctly:
EDIT: fixed to handle the examples given by OP in the comments.
(?!\b[A-Z]{3,}\b|\bIF\b)(\b[A-z]{2,}!)?\$?\b[A-Z]{1,3}(\$?\d{1,3})?(:\$?[A-Z]{1,3}(\$?\d{1,3})?)?\b
With this version, Account_List_S!$AA:$AC is matched as a whole, as I believe you want, and so is Calc_Named_HC!AE$32:AE$103 added in the comments below.
Third shot: accepts some spurious patterns but easier to read
If you are willing to accept matching a superfluous : before the first address, this simpler expression would work:
EDIT: fixed to handle the examples given in the comments.
(?!\b[A-Z]{3,}\b|\bIF\b)(\b[A-z]{2,}!)?(:?\$?\b[A-Z]{1,3}(\$?\d{1,3})?){1,2}\b
Note that I kept your [A-z] range as is, but [A-Za-z_] might be more appropriate, as pointed out by @sp00m in his comment.
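A minimal sketch to try the third-shot pattern in JavaScript. The helper name refsIn is hypothetical, and the SUM(...) formula is made up around the Calc_Named_HC!AE$32:AE$103 range mentioned above.

```javascript
// Third-shot pattern: the negative lookahead rejects IF and any run of
// 3+ capitals; the {1,2} group matches one address or a two-address range.
const cellRef =
  /(?!\b[A-Z]{3,}\b|\bIF\b)(\b[A-z]{2,}!)?(:?\$?\b[A-Z]{1,3}(\$?\d{1,3})?){1,2}\b/g;

function refsIn(formula) {
  return [...formula.matchAll(cellRef)].map((m) => m[0]);
}

console.log(refsIn("=IFERROR(VLOOKUP($AA16,Account_List_S!$AA:$AC,3,0),0)"));
// [ '$AA16', 'Account_List_S!$AA:$AC' ]

console.log(refsIn("=SUM(Calc_Named_HC!AE$32:AE$103)"));
// [ 'Calc_Named_HC!AE$32:AE$103' ]
```

Function names made of one or two capitals other than IF (for example OR) are not rejected by the lookahead, so they would still need separate handling.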

JavaScript regex to match links with uppercase

I have this regex that I want to match any link preceded by -
This is my regex:
/-(\s+)?[-a-zA-Z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/
It already matches these links:
- www.demo.com
- http://foo.co.uk/
But it doesn't match these
- WWW.TELEGRAM.COM
- WWW.c.COM
- t.mE/rrbot
You can check it at this link: http://regexr.com/3gnb1
There are two possible ways to go about it. Your regex currently excludes capital letters in the top-level domain, so you'd have to either swap \.[a-z]{2,4} for \.[a-zA-Z]{2,4} or make the whole regex case insensitive. In the latter case, you can remove A-Z from the other character classes as well, resulting in:
/-(\s+)?[-a-z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-z0-9@:%_\+.~#?&//=]*)?/i
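A quick way to compare the original and the case-insensitive version against the sample links from the question. The patterns are written here with @ in the character classes, which is an assumption about the intended pattern, and the variable names are made up for illustration.

```javascript
// Pattern from the question: the TLD part only accepts lowercase letters.
const original =
  /-(\s+)?[-a-zA-Z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-zA-Z0-9@:%_\+.~#?&//=]*)?/;

// Same pattern with the i flag; A-Z is no longer needed in the classes.
const insensitive =
  /-(\s+)?[-a-z0-9@:%_\+.~#?&//=]{1,256}\.[a-z]{2,4}\b(\/[-a-z0-9@:%_\+.~#?&//=]*)?/i;

const samples = [
  "- www.demo.com",
  "- http://foo.co.uk/",
  "- WWW.TELEGRAM.COM",
  "- WWW.c.COM",
  "- t.mE/rrbot",
];

for (const s of samples) {
  console.log(s, original.test(s), insensitive.test(s));
}
// The first two samples match both patterns; the last three match only
// the case-insensitive one.
```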
Why are you limiting the TLD to 4 characters? There are many valid TLDs that are longer than that, such as .finance, .movie, .academy, etc.
You can use my answer from a previous post and make some minor adjustments.
(?(DEFINE)
(?<scheme>[a-z][a-z0-9+.-]*)
(?<userpass>([^:@\/](:[^:@\/])?@))
(?<domain>[a-z0-9]+(-[a-z0-9]+)*(\.[a-z0-9]+(-[a-z0-9]+)*)+)
(?<ip>(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])))
(?<host>((?&domain)|(?&ip)))
(?<port>(:[\d]{1,5}))
(?<path>([^?;\#\s]*))
(?<query>(\?[^\#;\s]*))
(?<anchor>(\#\S*))
)
(?:^)?-\ +((?:(?&scheme):\/\/)?(?&userpass)?(?&host)(?&port)?\/?(?&path)?(?&query)?(?&anchor)?)(?:$|\s+)
You can see this regex in use here. This should catch all valid URLs (albeit the scheme is considered optional in your case, so I've made the scheme optional in the regex)

Regex - Allow capital letters just at the beginning of words

I have to check that capital letters appear only at the beginning of words.
My regex now looks like this:
/^([A-ZÁÉÚŐÓÜÖÍ]([a-záéúőóüöí]*\s?))+$/
It works well at the beginning of words, but it fails if the problem is not at the beginning of a word.
For example, John JohnJ gets validated.
What should I change in my regex to make it work?
In your regex pattern the space is optional, allowing combinations like JJohn or JohnJ - the key is to make it required between words. There are two ways to do this:
Roll out your pattern:
/^[A-ZÁÉÚŐÓÜÖÍ][a-záéúőóüöí]*(?:\s[A-ZÁÉÚŐÓÜÖÍ][a-záéúőóüöí]*)*$/
Or make the space in your pattern required, but alternatively allow it to be the end of line (this allows a trailing space though).
/^(?:[A-ZÁÉÚŐÓÜÖÍ][a-záéúőóüöí]*(?:\s|$))+$/
In both patterns I have removed some superfluous groups of your original and turned all groups into non-capturing ones.
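The difference between the original and the two suggested patterns can be checked with a few inputs. This is only a sketch; the variable names and the sample names are illustrative.

```javascript
const original = /^([A-ZÁÉÚŐÓÜÖÍ]([a-záéúőóüöí]*\s?))+$/;
const unrolled = /^[A-ZÁÉÚŐÓÜÖÍ][a-záéúőóüöí]*(?:\s[A-ZÁÉÚŐÓÜÖÍ][a-záéúőóüöí]*)*$/;
const spaceOrEnd = /^(?:[A-ZÁÉÚŐÓÜÖÍ][a-záéúőóüöí]*(?:\s|$))+$/;

// The optional space in the original lets a capital follow a lowercase letter.
console.log(original.test("John JohnJ"));   // true
console.log(unrolled.test("John JohnJ"));   // false
console.log(spaceOrEnd.test("John JohnJ")); // false

// Valid input passes both fixed patterns.
console.log(unrolled.test("Ádám Nagy"));    // true
console.log(spaceOrEnd.test("Ádám Nagy"));  // true

// Only the second pattern tolerates a trailing space.
console.log(unrolled.test("John Smith "));   // false
console.log(spaceOrEnd.test("John Smith ")); // true
```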
You can do this: /^([A-ZÁÉÚŐÓÜÖÍ]{0,1}([a-záéúőóüöí]*\s?))+$/
With {a,b}, a is the minimum number of repetitions it will match, whereas b is the maximum.
If there is ALWAYS going to be a capital letter at the beginning, you can simply use: /^([A-ZÁÉÚŐÓÜÖÍ]{1}([a-záéúőóüöí]*\s?))+$/
In this case, {c} means that c is the exact number of repetitions it will match.
Here is a resource with good information.

Regex match word after negated set

I'm currently trying to match the following cases with Regex.
Current regex
\.\/[^/]\satoms\s\/[^/]+\/index\.js
Cases
// Should match
./atoms/someComponent/index.js
./molecules/someComponent/index.js
./organisms/someComponent/index.js
// Should not match
./atomsdsd/someComponent/index.js
./atosdfms/someComponent/index.js
./atomssss/someComponent/index.js
However none of the cases are matching, what am I doing wrong?
Hope this helps. You have added some extra characters that cause your regex to fail.
Regex: \.\/(atoms|molecules|organisms)\/[^\/]+\/index\.js
1. \.\/ This will match ./
2. (atoms|molecules|organisms) This will match either atoms or molecules or organisms
3. \/[^\/]+\/ This will match /, then one or more characters up to the next /, then that /
4. index\.js This will match index.js
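The pattern above can be checked against the cases from the question with a short script. The variable names are illustrative only.

```javascript
const componentIndex = /\.\/(atoms|molecules|organisms)\/[^\/]+\/index\.js/;

const shouldMatch = [
  "./atoms/someComponent/index.js",
  "./molecules/someComponent/index.js",
  "./organisms/someComponent/index.js",
];
const shouldNotMatch = [
  "./atomsdsd/someComponent/index.js",
  "./atosdfms/someComponent/index.js",
  "./atomssss/someComponent/index.js",
];

console.log(shouldMatch.every((p) => componentIndex.test(p)));     // true
console.log(shouldNotMatch.some((p) => componentIndex.test(p)));   // false
```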
Why not just this simpler pattern?
\.\/(atoms|molecules|organisms)\/.*?index\.js
Try the following:
\.\/(atoms|molecules|organisms)\/[a-zA-Z]*\/index\.js
Forward slashes (and other special characters) should be escaped with a backslash \.
\.\/(atoms|molecules|organisms)\/ matches './atoms/', './molecules/' or './organisms/' strictly. Without the parentheses it will match partial strings. The | is an alternation operator that matches either everything to its left or everything to its right.
[a-zA-Z]* will match a string of any length with letters in any case. a-z accounts for lower case while A-Z accounts for upper case. * indicates zero or more characters. Depending on what characters may be in someComponent, you may need to account for digits using [a-zA-Z\d]*.
\/index\.js will match '/index.js'

Regular expression match specific key words

I am trying to use regexp to match some specific key words.
In the code below, I'd like to match only the IFs on the first and second lines, which have no prefix or postfix. The regexp I am using now is \b(IF|ELSE)\b, and it gives me all the IFs back.
IF A > B THEN STOP
IF B < C THEN STOP
LOL.IF
IF.LOL
IF.ELSE
Thanks for any help in advance.
I am using http://regexr.com/ for testing.
It needs to work with JS.
I'm guessing this is what you're looking for, assuming you've added the m flag for multiline:
(?:^|\s)(IF|ELSE)(?:$|\s)
It consists of three groups:
(?:^|\s) - Matches either the beginning of the line or a single whitespace character
(IF|ELSE) - Matches one of your keywords
(?:$|\s) - Matches either the end of the line or a single whitespace character
You can do it with lookarounds (lookahead + lookbehind). This is what you really want, as it explicitly matches what you are searching for. You don't want to check for other characters, like the start of the string or whitespace around the match, but to match exactly "IF or ELSE not surrounded by dots":
/(?<!\.)(IF|ELSE)(?!\.)/g
Explanation:
Use the g flag to find all occurrences
(?<!X)Y is a negative lookbehind which matches a Y not preceded by an X
Y(?!X) is a negative lookahead which matches a Y not followed by an X
working example: https://regex101.com/r/oS2dZ6/1
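Both approaches above can be compared in JavaScript on the sample from the question. This is only a sketch: the names code, anchored, lookaround and keywordsOf are made up, and the lookbehind pattern assumes an engine with ES2018 lookbehind support.

```javascript
const code = [
  "IF A > B THEN STOP",
  "IF B < C THEN STOP",
  "LOL.IF",
  "IF.LOL",
  "IF.ELSE",
].join("\n");

// Line start/end or whitespace around the keyword (needs the m flag).
const anchored = /(?:^|\s)(IF|ELSE)(?:$|\s)/gm;

// Keyword not preceded and not followed by a dot.
const lookaround = /(?<!\.)(IF|ELSE)(?!\.)/g;

// Collect the keyword captured in group 1 for every match.
const keywordsOf = (re) => [...code.matchAll(re)].map((m) => m[1]);

console.log(keywordsOf(anchored));   // [ 'IF', 'IF' ]
console.log(keywordsOf(lookaround)); // [ 'IF', 'IF' ]
```

The two patterns agree on this input, but they are not equivalent in general: the lookaround version has no word boundaries, so it would also find IF inside a longer word, while the anchored version consumes the surrounding whitespace as part of the match.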
PS: If you don't have to write the regex for JS, it is better to use a tool that supports PCRE (which includes lookbehind), like regex101.com.
