RegExp for an identifier

RegExp for an identifier - javascript

I am trying to write a regular expression for an ID which comes in the following formats:
7_b4718152-d9ed-4724-b3fe-e8dc9f12458a
b4718152-d9ed-4724-b3fe-e8dc9f12458a
[a_][b]-[c]-[d]-[e]-[f]
a - optional 0-3 digits followed by an underscore if there's at least
a digit (if there is underscore is required)
b - 8 alphanumeric characters
c - 4 alphanumeric characters
d - 4 alphanumeric characters
e - 4 alphanumeric characters
f - 12 alphanumeric characters
I have came up with this regexp but I would appreciate any guidance and/or corrections. I am also not too sure how to handle the optional underscore in the first segment if there are no digits up front.
/([a-zA-Z0-9]{0,3}_[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12})+/g

Your regex looks good. To optionally match the first 3 digits with an underscore, you can wrap that group with ()?. Also you can force the presence of a digit before the underscore by using {1,3} instead of {0,3}.
Unless you expect that multiple identifiers are following each other without space and should be matched as one, you can drop the last + (for multiple matches on the same line, you already have the g option).
The final regex is ([a-zA-Z0-9]{1,3}_)?[a-zA-Z0-9]{8}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{4}-[a-zA-Z0-9]{12}
See here for a complete example.
If you also do not need to capture the individual 4-alphanumeric groups, you can simplify your regex into:
([a-zA-Z0-9]{1,3}_)?[a-zA-Z0-9]{8}-([a-zA-Z0-9]{4}-){3}[a-zA-Z0-9]{12}
See here for an example.

Related

Javascript: Regex to exclude whitespace and special characters

I need a regex to validate,
Should be of length 18
First 5 characters should be either (xyz34|xyz12)
Remaining 13 characters should be alphanumeric only letters and numbers, no whitespace or special characters is allowed.
I have a pattern like here, '/^(xyz34|xyz12)((?=.*[a-zA-Z])(?=.*[0-9])){13}/g'
But this is allowing whitespace and special characters like ($,% and etc) which is violating the rule #3.
Any suggestion to exclude this whitespace and special characters and to strictly check that it must be letters and numbers?

You should not quantify lookarounds. They are non-consuming patterns, i.e. the consecutive positive lookaheads check the presence of their patterns but do not advance the regex index, they check the text at the same position. It makes no sense repeating them 13 times. ^(xyz34|xyz12)((?=.*[a-zA-Z])(?=.*[0-9])){13} is equal to ^(xyz34|xyz12)(?=.*[a-zA-Z])(?=.*[0-9]), and means the string can start with xyz34 or xyz12 and then should have at least 1 letter and at least 1 digits.
You may consider fixing the issue by using a consuming pattern like this:
If you do not care if the last 13 chars contain only digits or only letters, use the patterns suggested by other users, like /^(?:xyz34|xyz12)[a-zA-Z\d]{13}$/ or /^xyz(?:34|12)[a-zA-Z0-9]{13}$/
If there must be at least 1 digit and at least 1 letter among those 13 alphanumeric chars, use /^xyz(?:34|12)(?=[a-zA-Z]*\d)(?=\d*[a-zA-Z])[a-zA-Z\d]{13}$/.
See the regex demo #1 and the regex demo #2.
NOTE: these are regex literals, do not use them inside single- or double quotes!
Details
^ - start of string
xyz - a common prefix
(?:34|12) - a non-capturing group matching 34 or 12
(?=[a-zA-Z]*\d) - there must be at least 1 digit after any 0+ letters to the right of the current location
(?=\d*[a-zA-Z]) - there must be at least 1 letter after any 0+ digtis to the right of the current location
[a-zA-Z\d]{13} - 13 letters or digits
$ - end of string.
JS demo:
var strs = ['xyz34abcdefghijkl1','xyz341bcdefghijklm','xyz34abcdefghijklm','xyz341234567890123','xyz14a234567890123'];
var rx = /^xyz(?:34|12)(?=[a-zA-Z]*\d)(?=\d*[a-zA-Z])[a-zA-Z\d]{13}$/;
for (var s of strs) {
console.log(s, "=>", rx.test(s));
}

.* will match any string, for your requirment you can use this:
/^xyz(34|12)[a-zA-Z0-9]{13}$/g
regex fiddle

/^(xyz34|xyz12)[a-zA-Z0-9]{13}$/
This should work,
^ asserts position at the start of a line
1st Capturing Group (xyz34|xyz12)
1st Alternative xyz34 matches the characters xyz34 literally (case sensitive)
2nd Alternative xyz12 matches the characters xyz12 literally (case sensitive)
Match a single character present in the list below [a-zA-Z0-9]{13}
{13} Quantifier — Matches exactly 13 times

How to create Regex to filter out results with few complex conditions regarding length, case and classes of characters

I have the following filtered:
2 digits (?=..*\d)
2 uppercase characters (?=..*[a-z])
2 lowercase characters (?=..*[A-Z])
10 to 63 characters .{10,63}$
Which translates to:
(?=.{2,}\d)(?=..*[a-z])(?=..*[A-Z]).{10,63}
Then I want to exclude a word starting with the letter u, and ending with three to six digits:
([uU][0-9]{3,6})
However, how can I merge these two patterns to do the following:
It should not allow the following because it respectively:
# does not have the required combination of characters
aaaaaaaaaaaaaaa
# is too long
asadsfdfs12BDFsdfsdfdsfsdfsdfdsfdsfdfsdfsdfsdfsdsfdfsdfsdfssdfdfsdfssdfdfsdfssdfdfsdfsdfsdfsdfsfdsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfsdfs
# contains the pattern that shouldn't be allowed
U0000ABcd567890
ABcd56U00007890
D4gF3U432234
D4gF3u432234
ABcd567890U123456
should allow the following:
# it has the required combination of characters
ABcd5678990
ABcd567890
# does contain a part of the disallowed pattern (`([uU][0-9]{3,6})`), but does not fit that pattern entirely
ABcd567890U12
ABcd5U12abcdf
s3dU00sDdfgdg
ABcd56U007890
Created and example here: https://regex101.com/r/4b2Hu9/3

In your pattern you make use of a lookahead (?=..*\d) which has a different meaning than you assume.
It means if what is directly on the right is 2 or more times any char except a newline followed by a single digit and the same for the upper and lowercase variants.
You could update your pattern to:
^(?!.*[uU]\d{3,6})(?=(?:\D*\d){2})(?=(?:[^a-z]*[a-z]){2})(?=(?:[^A-Z]*[A-Z]){2}).{10,63}$
In parts
^ Start of string
(?!.*[uU]\d{3,6}) Negative lookahead, assert not u or U followed by 3-6 digits
(?=(?:\D*\d){2}) Assert 2 digits
(?=(?:[^a-z]*[a-z]){2}) Assert 2 lowercase chars
(?=(?:[^A-Z]*[A-Z]){2}) Assert 2 uppercase chars
.{10,63} Match any char except a newline 10-63 times
$ End of string
Regex demo

First, the way to ensure that the string contains, for example, two digits would be to use a positive lookahead:
(?=.*\d.*\d)
You can generalize this to your other filters.
To make sure the string contains 10 - 63 characters:
.{10,63}
You say you do not want the string to begin with u or U followed by 3 to 6 digits (presumbaly 7 digits is okay), use a negative lookahead:
(?![uU]\d{3,6}\D)
The \D is required to make sure that if there is a 7th digit, then the string will be accepted.
Putting it all together:
r'^(?=.*\d.*\d)(?=.*[a-z].*[a-z])(?=.*[A-Z].*[A-Z])(?![uU]\d{3,6}\D).{10,63}$'

Regex to match only when expression match is no more than 12 characters long

I am trying to create a regular expression (Java/JavaScript) that matches the following regex, but only when there are fewer than 13 characters total (and a minimum of 4).
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?) ← originally posted
(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ [A-Z]+)?)
These values should (and do) match:
MED-123
COTA-1224
MED4
COTB-892K777
MED-33 DDD
MED-234J5678
This value matches, but I don't want it to (I want to only match if there are fewer than 12 characters total):
COT-1111J11111111111111
See http://regexr.com/3bs7b http://regexr.com/3bsfv
I have tried grouping my expression and putting {4,12} at the end, but that just makes it look for 4 to 12 instances of the whole expression matching.
I feel like I am missing something simple...thanks in advance for your help!

You can use negative look-ahead:
(?!.{13,})(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?)
Since your expression already make sure that a match starts with COT or MED and there is at least one digit after that, it already guarantees that there are at least 4 characters

I have tried grouping my expression and putting {4,12} at the end, but
that just makes it look for 4 to 12 instances of the whole expression
matching.
This looks for 4 to 12 instances of the whole expression because you didn't add a word boundary \b. Your regex works fine, just add a word boundary and your desired outcome would be achieved. Take a look at this DEMO.
Your regex seems to be very clumsy and looks a little bit hard to read. It is also very limited to certain characters example JK except if you want it to be that way. For a more general pattern, you can check this out
(COT|MED)[AB]?-?[\dJK]{1,8}(\s+D{1,3})?\b
(COT|MED): matches either COT or MED
[AB]?: matches A or B which is optional because of the presence of ?
-?: matches - which is also optional
[\dJK]{1,8}: This matches a number,or J or K with a length of at least one character and a maximum of eight characters.
(\s+D{1,3})?: matches a space or a D at least one time and a maximum of 3 times and this is optional
\b: with respect to your question this seems to be the most important and it creates a boundary for the words that have already been matched. This means that anything exceeding the matched pattern would not be captured.
See the demo here DEMO2

The answer you are looking for is
(?!\S{13})(?:COT|MED)[ABCD]?-?\d{1,4}(?:[JK]+\d*|(?: [A-Z]+)?)
See regex demo
Note that it is almost impossible to check the length of a phrase that is not a whole string or that has spaces inside since boundaries are a bit "blurred". Thus, (?!\S{13}) is a kind of a workaround that just makes sure you do not have a string without whitespace that is 13 characters long or longer.
The regex breakdown:
(?!\S{13}) - Check if the substring that follows does not consist of 13 non-whitespace characters
(?:COT|MED) - Any of the values in the alternation (COTorMED`)
[ABCD]?-? - Optional A, B, C, D and then an optional -
\d{1,4} - 1 to 4 digits
(?:[JK]+\d*|(?: [A-Z]+)?) - a group of 2 alternatives:
[JK]+\d* - J or K, 1 or more times, and then 0 or more digits
(?: [A-Z]+)? - optional space and 1 or more Latin uppercase letters

As this answer suggests, you could solve this this way:
(?=(COT|MED)[ABCD]?-?[0-9]{1,4}(([JK]+[0-9]*)|(\ DDD)?))(?={4 , 12})

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be founds in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-white space character, I used + to match one or more whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite; ^ and $ are actually "anchors" - they mean "start" and "end", it's actually a little more complicated, but you can consider them to mean the start and end of a line for now - look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning; if used inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings and that the definition in your head actually applies only to character range matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-numerical digits followed by 1+ non-space characters" - so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings, you're not adding constraints as you go, but rather building up bots of matchers that will match bits of corresponding strings.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexs is http://regexr.com/3b9h7 - which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to debuggerx is awesome!

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that have no digits and spaces, and the symbols you do not allow.
Mind that to match a literal -, ] or [ with a character class, you either need to escape them, or use at the start or end of the expression. To play it safe, escape them.

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace (space, newline,tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats immediately preceeding element by 0 or more times
so the regex matches any string that has
1. string that has a beginning
2. containing 0 or more number of characters that is not whitesapce, digit, and all the symbols included in the character class ( In this example !#£$%^&*()+=) i.e., characters that are not included in the character class `[...]`
3.that has ending
NOTE:
If the symbols you don't want it to have also includes - , a hyphen, don't put it in between some other characters because it is a metacharacter in character class, put it at last of character class

regex for password

I'm trying to get regex for minimum requirements of a password to be minimum of 6 characters; 1 uppercase, 1 lowercase, and 1 number. Seems easy enough? I have not had any experience in regex's that "look ahead", so I would just do:
if(!pwStr.match(/[A-Z]+/) || !pwStr.match(/[a-z]+/) || !pwStr.match(/[0-9]+/) ||
pwStr.length < 6)
//was not successful
But I'd like to optimize this to one regex and level up my regex skillz in the process.

^.*(?=.{6,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!&$%&? "]).*$
^.*
Start of Regex
(?=.{6,})
Passwords will contain at least 6 characters in length
(?=.*[a-zA-Z])
Passwords will contain at least 1 upper and 1 lower case letter
(?=.*\d)
Passwords will contain at least 1 number
(?=.*[!#$%&? "])
Passwords will contain at least given special characters
.*$
End of Regex
here is the website that you can check this regex - http://rubular.com/

Assuming that a password may consist of any characters, have a minimum length of at least six characters and must contain at least one upper case letter and one lower case letter and one decimal digit, here's the one I'd recommend: (commented version using python syntax)
re_pwd_valid = re.compile("""
# Validate password 6 char min with one upper, lower and number.
^ # Anchor to start of string.
(?=[^A-Z]*[A-Z]) # Assert at least one upper case letter.
(?=[^a-z]*[a-z]) # Assert at least one lower case letter.
(?=[^0-9]*[0-9]) # Assert at least one decimal digit.
.{6,} # Match password with at least 6 chars
$ # Anchor to end of string.
""", re.VERBOSE)
Here it is in JavaScript:
re_pwd_valid = /^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9]*[0-9]).{6,}$/;
Additional: If you ever need to require more than one of the required chars, take a look at my answer to a similar password validation question
Edit: Changed the lazy dot star to greedy char classes. Thanks Erik Reppen - nice optimization!

My experience is if you can separate out Regexes, the better the code will read. You could combine the regexes with positive lookaheads (which I see was just done), but... why?
Edit:
Ok, ok, so if you have some configuration file where you could pass string to compile into a regex (which I've seen done and have done before) I guess it is worth the hassle. But otherwise, Even if the answers provided are corrected to match what you need, I'd still advise against it unless you intend to create such a thing. Separate regexes are just so much nicer to deal with.

I haven't tested thoroughly but here's a more efficient version of Amit's. I think his also allowed unspecified characters into the mix (which wasn't technically listed as a rule). This one won't go berserk on you if you accidentally target a large hunk of text, it will fail sooner on strings that are too long and it only allows the characters in the final class.
'.' should be used sparingly. Think of the looping it has to do to determine a match with all the characters it can represent. It's much more efficient to use negating classes.
`^(?=[^0-9]{0,9}[0-9])(?=[^a-z]{0,9}[a-z])(?=[^A-Z]{0,9}[A-Z])(?=[^##$%]{0,9}[##$%])[0-9a-zA-Z##$%]{6,10`}$
There's nothing wrong with trying to find the ideal regEx. But split it up when you need to.
RegEx tends to be explained poorly. I'll add a breakdown:
a - a single 'a' character
ab - a single 'a' character followed by a single b character
a* - 0 or more 'a' characters
a+ - one or more 'a' characters
a+b - one or any number of a characters followed by a single b character.
a{6,} - at least 6 'a' characters (would match more)
a{6,10} - 6-10 'a' characters
a{10} - exactly 10 'a' characters iirc - not very useful
^ - beginning of a string - so ^a+ would not math 'baaaa'
$ - end of a string - b$ would not find a match 'aaaba'
[] signifies a character class. You can put a variety of characters inside it and every character will be checked. By itself only whatever string character you happen to be on is matched against. It can be modified by + and * as above.
[ab]+c - one or any number of a or b characters followed by a single c character
[a-zA-Z0-9] - any letter, any number - there are a bunch of \<some key> characters representing sets like \d for 'digits' I'm guessing. \w iirc is basically [a-zA-Z_]
note: '\' is the escape key for character classes. [a\-z] for 'a' or '-' or 'z' rather than anything from a to z which is what [a-z] means
[^<stuff>] a character class with the caret in front means everything but the characters or <stuff> listed - this is critical to performance in regEx matches hitting large strings.
. - wildcard character representing most characters (exceptions are a handful of really old-school whitespace characters). Not a big deal in very small sets of characters but avoid using it.
(?=<regex stuff>) - a lookahead. Doesn't move the parser further down the string if it matches. If a lookahead fails, the whole match fails. If it succeeds, you go back to the same character before it. That's why we can string a bunch together to search if there's at least one of a given character.
So:
^ - at the beginning followed by whatever is next
(?=[^0-9]{0,9}[0-9]) - look for a digit from 0-9 preceded by up to 9 or 0 instances of anything that isn't 0-9 - next lookahead starts at the same place
etc. on the lookaheads
[0-9a-zA-Z##$%]{6,10} - 6-10 of any letter, number, or ##$% characters
No '$' is needed because I've limited everything to 10 characters anyway

We Keep Coding

JavaScript is the programming language of the Web.