I have the following specifications for a regex:
-> The string starts with a string of three numbers
-> It is followed by a '-'
-> That is followed by three uppercase vowels
-> That is followed by a '-'
-> That is followed by three numbers
-> That is followed by a final '-'
-> That is followed by the last three uppercase vowels.
-> Second set of numbers can not equal the first.
-> The second group of letters can not equal the first.
-> The groups of numbers may not contain zero.
A passable string is:
368-IOU-789-AIO.
An invalid string is:
368-AEO-368-AEI
354-AOU-431-AOU
Currently, I have something like this:
([0-9]+[0-9]+[0-9]+[/AEIOU/]+[0-9]+[0-9]+[0-9])
What you have won't work since + means "one or more of". For example, the sequence [0-9]+[0-9]+[0-9]+ will match anywhere between three and an infinite number of digits.
In addition, your current attempt:
allows for one to an infinite number of vowels (and possibly / character);
doesn't require a vowel set at the end;
doesn't require the - separators; and
may allow for arbitrary content before and after the match.
You should be able to use the {count} specifier to get an exact quantity. All but one of your requirements can be handled by any basic regex engine, with something like:
^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$
The ^ and $ anchors mean start and end of string, [1-9]{3} gives you exactly three non-zero digits, [AEIOU]{3} gives you exactly three uppercase vowels, and - gives you the literal separator character.
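As a quick check of that pattern, here is a minimal JavaScript sketch. The helper name matchesFormat is made up for illustration, and it only tests the format; the "groups must differ" rule is deliberately not covered.

```javascript
// Format-only check: three non-zero digits, three uppercase vowels, twice over.
const FORMAT = /^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$/;

// matchesFormat is a hypothetical helper name, not something from the question.
function matchesFormat(s) {
  return FORMAT.test(s);
}

console.log(matchesFormat("368-IOU-789-AIO")); // true
console.log(matchesFormat("368-AEO-368-AEI")); // true (repeated digit group is not checked here)
console.log(matchesFormat("308-IOU-789-AIO")); // false (contains a zero)
console.log(matchesFormat("368-IOU-789-AI"));  // false (last group too short)
```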
The "groups cannot be identical" rule is a little more problematic. I would just post-process for that to ensure it's not violated. The following Python sketch is what I mean:
import re

def is_valid(s):
    # Check the overall format first, then compare the groups by position.
    if not re.fullmatch(r"^[1-9]{3}-[AEIOU]{3}-[1-9]{3}-[AEIOU]{3}$", s):
        return False
    return s[0:3] != s[8:11] and s[4:7] != s[12:15]
The alternative will be a rather complex regex that future developers will probably curse you for inflicting on them :-)
Note that your "The groups of numbers may not contain zero" is a little ambiguous in that it may mean no zeros are allowed or just that 000 is not allowed. I've assumed the former, but it's easily adjusted to cater for the latter:
import re

def is_valid(s):
    # Zeros are allowed here; only the all-zero group "000" is rejected below.
    if not re.fullmatch(r"^[0-9]{3}-[AEIOU]{3}-[0-9]{3}-[AEIOU]{3}$", s):
        return False
    return (s[0:3] != s[8:11]
            and s[4:7] != s[12:15]
            and s[0:3] != "000"
            and s[8:11] != "000")
You can use capture groups and backreferences:
^([0-9]{3})-([AEIOU]{3})-(?:(?!\1)[0-9]){3}-(?:(?!\2)[AEIOU]){3}$
If "The groups of numbers may not contain zero" means that only the digits 1 to 9 are allowed, you can replace [0-9] with [1-9].
If you only want to rule out 000, you can add a negative lookahead such as ^(?!.*000) at the start of the pattern.
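To see the backreference approach against the strings from the question, here is a small JavaScript sketch. It uses the [1-9] variant mentioned above; the names DISTINCT_GROUPS and isValidCode are only illustrative.

```javascript
// \1 and \2 refer back to the first digit group and the first vowel group.
// Each negative lookahead rejects the position where the same group would repeat.
const DISTINCT_GROUPS =
  /^([1-9]{3})-([AEIOU]{3})-(?:(?!\1)[1-9]){3}-(?:(?!\2)[AEIOU]){3}$/;

// isValidCode is a hypothetical helper name.
function isValidCode(s) {
  return DISTINCT_GROUPS.test(s);
}

console.log(isValidCode("368-IOU-789-AIO")); // true
console.log(isValidCode("368-AEO-368-AEI")); // false (digit groups are equal)
console.log(isValidCode("354-AOU-431-AOU")); // false (vowel groups are equal)
```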
What is the difference between:
(.+?)
and
(.*?)
when I use it in my PHP preg_match regex?
They are called quantifiers.
* 0 or more of the preceding expression
+ 1 or more of the preceding expression
By default a quantifier is greedy, which means it matches as many characters as possible.
The ? after a quantifier makes that quantifier "ungreedy" (lazy), meaning it will match as few characters as possible.
Example greedy/ungreedy
For example on the string "abab"
a.*b will match "abab" (preg_match_all will return one match, the "abab")
while a.*?b will match only the starting "ab" (preg_match_all will return two matches, "ab")
You can test your regexes online, e.g. on Regexr.
The first (+) is one or more characters. The second (*) is zero or more characters. Both are non-greedy (?) and match anything (.).
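The same behaviour can be reproduced outside PHP. Below is a minimal JavaScript sketch (allMatches is a made-up helper name) that shows greedy versus lazy matching, and then the difference between .*? and .+? when there is nothing between the delimiters.

```javascript
// Returns every full match of a global pattern, or an empty array.
function allMatches(pattern, text) {
  return text.match(pattern) || [];
}

console.log(allMatches(/a.*b/g, "abab"));  // [ 'abab' ]     greedy: one long match
console.log(allMatches(/a.*?b/g, "abab")); // [ 'ab', 'ab' ] lazy: two short matches

// .*? may match nothing at all, .+? needs at least one character.
console.log(allMatches(/<(.*?)>/g, "<>")); // [ '<>' ]
console.log(allMatches(/<(.+?)>/g, "<>")); // []
```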
In RegEx, {i,f} means "between i and f matches". Let's take a look at the following examples:
{3,7} means between 3 and 7 matches
{0,10} means up to 10 matches with no lower limit (i.e. the lower limit is 0)
{3,} means at least 3 matches with no upper limit (i.e. the upper limit is infinity)
{0,} means no upper limit or lower limit for the number of matches (i.e. the lower limit is 0 and the upper limit is infinity)
{5} means exactly 5
Some engines also accept an omitted lower bound, as in {,10}, but PCRE (which PHP's preg functions use) and JavaScript have traditionally treated that form as literal text, so writing the 0 explicitly is safer.
Most good languages contain abbreviations, and so does RegEx:
+ is the shorthand for {1,}
* is the shorthand for {0,}
? is the shorthand for {0,1}
This means + requires at least 1 match, while * accepts any number of matches or no matches at all, and ? accepts either zero matches or exactly 1 match.
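A short JavaScript sketch can confirm that the shorthands behave like their brace forms. The lower bound is written explicitly as 0 here, which is a choice made for portability; the names pairs, inputs and sameBehaviour are illustrative only.

```javascript
// Each pair holds a shorthand quantifier and its explicit brace form.
const pairs = [
  [/^a+$/, /^a{1,}$/],
  [/^a*$/, /^a{0,}$/],
  [/^a?$/, /^a{0,1}$/],
];
const inputs = ["", "a", "aa", "aaa"];

// True when both patterns agree on every sample input.
function sameBehaviour([shorthand, explicit]) {
  return inputs.every((s) => shorthand.test(s) === explicit.test(s));
}

console.log(pairs.every(sameBehaviour)); // true
console.log(/^a{3,7}$/.test("aaaa"));    // true  (4 is between 3 and 7)
console.log(/^a{5}$/.test("aaaaa"));     // true  (exactly 5)
console.log(/^a{5}$/.test("aaaa"));      // false (only 4)
```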
Credit: Codecademy.com
+ matches at least one character
* matches any number (including 0) of characters
The ? indicates a lazy expression, so it will match as few characters as possible.
A + matches one or more instances of the preceding pattern. A * matches zero or more instances of the preceding pattern.
So basically, if you use a + there must be at least one instance of the pattern, if you use * it will still match if there are no instances of it.
Consider the string below:
ab
The pattern (ab.*) will match, and the capture group will contain ab.
The pattern (ab.+), however, will not match and returns nothing.
But if you change the string to the following, the pattern (ab.+) will match and return aba:
aba
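The example above can be checked with a few lines of JavaScript. This is only a sketch, and firstGroup is a hypothetical helper name.

```javascript
// Returns the contents of capture group 1, or null when the pattern does not match.
function firstGroup(pattern, text) {
  const m = text.match(pattern);
  return m ? m[1] : null;
}

console.log(firstGroup(/(ab.*)/, "ab"));  // "ab"
console.log(firstGroup(/(ab.+)/, "ab"));  // null
console.log(firstGroup(/(ab.+)/, "aba")); // "aba"
```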
+ is minimal one, * can be zero as well.
A star is very similar to a plus, the only difference is that while the plus matches 1 or more of the preceding character/group, the star matches 0 or more.
I think the previous answers fail to highlight a simple example.
For example, suppose we have an array of strings:
numbers = ["", "5", "15"]
The regex ^[0-9]+ matches "5" and "15", but not the empty string.
However, ^[0-9]* matches all three. The difference is that the + operator requires at least one occurrence of the preceding expression, while * also accepts zero occurrences.
This is from an exercise on FCC beta, and I cannot understand how the following code means two consecutive numbers, seeing that \D* means zero or more non-digits and \d means a digit. How does this add up to two numbers in a regexp?
let checkPass = /(?=\w{5,})(?=\D*\d)/;
This does not match two numbers. It doesn't consume anything: it only matches an empty string, as the pattern consists solely of lookaheads.
If you want to match two digits, you can do something like this:
(\d)(\d)
Or if you really want to use a positive lookahead with the (?=\D*\d) section, you will have to do something like this:
\d(?=\D*\d)
This will match any digit that is followed by zero or more non-digits and then another digit. A few examples (matched numbers highlighted):
2 hhebuehi3
^
245673
^^^^^
2v jugn45
^      ^
To also capture the second digit, you will have to put brackets around both numbers. Ie:
(\d)(?=\D*(\d))
Here it is in action.
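For a runnable version of the capturing pattern, here is a minimal JavaScript sketch. The helper digitPairs is an invented name; it lists each matched digit together with the digit found by the lookahead.

```javascript
// For every digit that is followed (after optional non-digits) by another digit,
// return the pair [matched digit, digit seen by the lookahead].
function digitPairs(s) {
  return [...s.matchAll(/(\d)(?=\D*(\d))/g)].map((m) => [m[1], m[2]]);
}

console.log(digitPairs("2 hhebuehi3")); // [ [ '2', '3' ] ]
console.log(digitPairs("2v jugn45"));   // [ [ '2', '4' ], [ '4', '5' ] ]
console.log(digitPairs("7"));           // []
```

The lookahead does not consume the second digit, which is why "4" can appear both as a looked-ahead digit and as a match of its own.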
In order to do what your original example wants, ie:
number
5+ \w characters
a non-number character
a number
... you will need to precede your original example with a \d character. This means that your lookaheads will actually match something which isn't just an empty string:
\d(?=\w{5,})(?=\D*\d)
IMPORTANT EDIT
After playing around a bit more with a JavaScript online console, I have worked out the problem with your original Regex.
This matches a string with 5 or more word characters, including at least 1 digit. This can match two digits, but it can also match 1 digit, 3 digits, 12 digits, etc. In order to require two consecutive digits in a string of 5 or more word characters, you should specify the number of digits you want in the second lookahead:
let regex = /(?=\w{5,})(?=\D*\d{2})/;
let string1 = "abcd2";
let regex1 = /(?=\w{5,})(?=\D*\d)/;
console.log("string 1 & regex 1: " + regex1.test(string1));
let regex2 = /(?=\w{5,})(?=\D*\d{2})/;
console.log("string 1 & regex 2: " + regex2.test(string1));
let string2 = "abcd23";
console.log("string 2 & regex 2: " + regex2.test(string2));
My original answer was about Regex in a vacuum and I glossed over the fact that you were using Regex in conjunction with JavaScript, which works a little differently when comparing Regex to a string. I still don't know why your original regex was supposed to match two numbers, but I hope this is a bit more helpful.
(?= Positive lookahead
\w{5,} matches any word character (equal to [a-zA-Z0-9_])
{5,} matches between 5 and unlimited times
\D* matches any character that's not a digit (equal to [^0-9])
* matches between zero and unlimited times
\d matches a digit (equal to [0-9])
This expression is global - so tries to match all
You can always check your expression using regex101
I recently needed to create a regular expression to check input in JavaScript. The input could be 5 or 6 characters long and had to contain exactly 5 numbers and one optional space, which could be anywhere in the string. I am not regex-savvy at all and even though I tried looking for a better way, I ended up with this:
(^\d{5}$)|(^ \d{5}$)|(^\d{5} $)|(^\d{1} \d{4}$)|(^\d{2} \d{3}$)|(^\d{3} \d{2}$)|(^\d{4} \d{1}$)
This does what I need, so the allowed inputs are (if 0 is any number)
'00000'
' 00000'
'0 0000'
'00 000'
'000 00'
'0000 0'
'00000 '
I doubt that this is the only way to achieve such matching with regex, but I haven't found a way to do it in a cleaner way. So my question is, how can this be written better?
Thank you.
Edit:
So, it is possible! Tom Lord's answer does what I needed with regular expressions, so I marked it as a correct answer to my question.
However, soon after I posted this question, I realized that I wasn't thinking right, since every other input in the project was easily 'validatable' with regex, I was immediately assuming I could validate this one with it as well.
Turns out I could just do this:
const validate = function(value) {
  const v = value.replace(/\s/g, '');
  const regex = new RegExp('^\\d{5}$');
  return regex.test(v);
};
Thank you all for the cool answers and ideas! :)
Edit2: I forgot to mention a possibly quite important detail, which is that the input is limited, so the user can only enter up to 6 characters. My apologies.
Note: Using a regular expression to solve this problem might not be
the best answer. As answered
below, it may be
easier to just count the digits and spaces with a simple function!
However, since the question was asking for a regex answer, and in some
scenarios you may be forced to solve this with a regex (e.g. if
you're tied down to a certain library's implementation), the following
answer may be helpful:
This regex matches lines containing exactly 5 digits:
^(?=(\D*\d){5}\D*$)
This regex matches lines containing one optional space:
^(?=[^ ]* ?[^ ]*$)
If we put them together, and also ensure that the string contains only digits and spaces ([\d ]*$), we get:
^(?=(\D*\d){5}\D*$)(?=[^ ]* ?[^ ]*$)[\d ]*$
You could also use [\d ]{5,6} instead of [\d ]* on the end of that pattern, to the same effect.
Explanation:
This regular expression is using lookaheads. These are zero-width pattern matchers, which means both parts of the pattern are "anchored" to the start of the string.
\d means "any digit", and \D means "any non-digit".
" " means "space", and [^ ] means "any non-space".
The \D*\d is being repeated 5 times, to ensure exactly 5 digits are in the string.
Note that if you actually wanted the "optional space" to include things like tabs, then you could instead use \s and \S.
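The combined pattern can be exercised with a short JavaScript sketch. The function name hasFiveDigitsAndOptionalSpace is invented for this illustration, and the sample inputs are my own.

```javascript
// Lookahead 1: exactly 5 digits. Lookahead 2: at most one space.
// The final part allows only digits and spaces.
const combined = /^(?=(\D*\d){5}\D*$)(?=[^ ]* ?[^ ]*$)[\d ]*$/;

function hasFiveDigitsAndOptionalSpace(s) {
  return combined.test(s);
}

console.log(hasFiveDigitsAndOptionalSpace("00000"));   // true
console.log(hasFiveDigitsAndOptionalSpace(" 00000"));  // true
console.log(hasFiveDigitsAndOptionalSpace("00 000"));  // true
console.log(hasFiveDigitsAndOptionalSpace("0000"));    // false (4 digits)
console.log(hasFiveDigitsAndOptionalSpace("000000"));  // false (6 digits)
console.log(hasFiveDigitsAndOptionalSpace("00  000")); // false (two spaces)
console.log(hasFiveDigitsAndOptionalSpace("0000x0"));  // false (other character)
```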
Update: Since this question appears to have gotten quite a bit of traction, I wanted to clarify something about this answer.
There are several "simpler" variant solutions to my answer above, such as:
// Only look for digits and spaces, not "non-digits" and "non-spaces":
^(?=( ?\d){5} *$)(?=\d* ?\d*$)
// Like above, but also simplifying the second lookahead:
^(?=( ?\d){5} *$)\d* ?\d*
// Or even splitting it into two, simpler, problems with an "or" operator:
^(?:\d{5}|(?=\d* \d*$).{6})$
Or even, if we can assume that the string is no more than 6 characters then even just this is sufficient:
^(?:\d{5}|\d* \d*)$
So with that in mind, why might you want to use the original solution, for similar problems? Because it's generic. Look again at my original answer, re-written with free-spacing:
^
(?=(\D*\d){5}\D*$) # Must contain exactly 5 digits
(?=[^ ]* ?[^ ]*$) # Must contain 0 or 1 spaces
[\d ]*$ # Must contain ONLY digits and spaces
This pattern of using successive look-aheads can be used in various scenarios, to write patterns that are highly structured and (perhaps surprisingly) easy to extend.
For example, suppose the rules changed and you now wanted to match 2-3 spaces, 1 . and any number of hyphens. It's actually very easy to update the regex:
^
(?=(\D*\d){5}\D*$) # Must contain exactly 5 digits
(?=([^ ]* ){2,3}[^ ]*$) # Must contain 2 or 3 spaces
(?=[^.]*\.[^.]*$) # Must contain 1 period
[\d .-]*$ # Must contain ONLY digits, spaces, periods and hyphens
...So in summary, there are "simpler" regex solutions, and quite possibly a better non-regex solution to OP's specific problem. But what I have provided is a generic, extensible design pattern for matching patterns of this nature.
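Standard JavaScript regex literals have no free-spacing mode, so one way to keep the same commented, one-rule-per-line layout is to assemble the pattern from strings. The following is only a sketch of that idea using the extended rules above; the names rules, extended and matchesExtendedRules are made up.

```javascript
// Each rule is a lookahead kept on its own line so it can carry a comment.
const rules = [
  "(?=(\\D*\\d){5}\\D*$)",   // must contain exactly 5 digits
  "(?=([^ ]* ){2,3}[^ ]*$)", // must contain 2 or 3 spaces
  "(?=[^.]*\\.[^.]*$)",      // must contain exactly 1 period
];

// Only digits, spaces, periods and hyphens are allowed overall.
const extended = new RegExp("^" + rules.join("") + "[\\d .-]*$");

function matchesExtendedRules(s) {
  return extended.test(s);
}

console.log(matchesExtendedRules("12 3.4 5"));   // true
console.log(matchesExtendedRules("1-2 3.4 5-")); // true
console.log(matchesExtendedRules("12 3.45"));    // false (only one space)
console.log(matchesExtendedRules("12 3 4 5"));   // false (no period)
```

Adding or removing a requirement then means adding or removing one entry of the array.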
I suggest to first check for exactly five numbers ^\d{5}$ OR look ahead for a single space between numbers ^(?=\d* \d*$) among six characters .{6}$.
Combining those partial expressions yields ^\d{5}$|^(?=\d* \d*$).{6}$:
let regex = /^\d{5}$|^(?=\d* \d*$).{6}$/;
console.log(regex.test('00000')); // true
console.log(regex.test(' 00000')); // true
console.log(regex.test('00000 ')); // true
console.log(regex.test('00 000')); // true
console.log(regex.test('  00000')); // false
console.log(regex.test('00000  ')); // false
console.log(regex.test('00  000')); // false
console.log(regex.test('00 0 00')); // false
console.log(regex.test('000 000')); // false
console.log(regex.test('0000')); // false
console.log(regex.test('000000')); // false
console.log(regex.test('000 0')); // false
console.log(regex.test('000 0x')); // false
console.log(regex.test('0000x0')); // false
console.log(regex.test('x00000')); // false
Alternatively match the partial expressions separately via e.g.:
/^\d{5}$/.test(input) || input.length == 6 && /^\d* \d*$/.test(input)
This seems more intuitive to me and is O(n)
function isInputValid(input) {
  const length = input.length;
  if (length != 5 && length != 6) {
    return false;
  }
  let spaceSeen = false;
  let digitsSeen = 0;
  for (let character of input) {
    if (character === ' ') {
      if (spaceSeen) {
        return false;
      }
      spaceSeen = true;
    }
    else if (/^\d$/.test(character)) {
      digitsSeen++;
    }
    else {
      return false;
    }
  }
  return digitsSeen == 5;
}
You can split it in half:
var input = '0000 ';
// At most one space, and exactly five digits once that space is removed.
if (/^[^ ]* ?[^ ]*$/.test(input) && /^\d{5}$/.test(input.replace(/ /, '')))
  console.log('Match');
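Here's a simple regex that comes close (note that it also accepts six digits with no space, so check the digit count separately if that matters):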
^(?=[\d ]{5,6}$)\d*\s?\d*$
Explanation:
^ asserts position at start of the string
Positive Lookahead (?=[\d ]{5,6}$)
Assert that the Regex below matches
Match a single character present in the list below [\d ]{5,6}
{5,6} Quantifier — Matches between 5 and 6 times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equal to [0-9])
(space) matches the space character literally (case sensitive)
$ asserts position at the end of the string
\d* matches a digit (equal to [0-9])
Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\d* matches a digit (equal to [0-9])
Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string
var string = "12345 ";
var stripped = string.replace(" ", ""); // removes at most one space
if (string.length <= 6 && stripped.length === 5 && !/\D/.test(stripped)) {
  alert("valid");
}
You could simply check the length and whether what remains is a valid number...
This is how I would do it without regex:
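string => /^[01],5,0$/.test([...string].reduce(
  // Count spaces, digits and any other characters; allow 0 or 1 spaces.
  ([spaces, digits, others], char) =>
    [spaces + (char == ' '), digits + /\d/.test(char), others + !/[\d ]/.test(char)],
  [0, 0, 0]
).join(","));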
I'm trying to get regex for minimum requirements of a password to be minimum of 6 characters; 1 uppercase, 1 lowercase, and 1 number. Seems easy enough? I have not had any experience in regex's that "look ahead", so I would just do:
if(!pwStr.match(/[A-Z]+/) || !pwStr.match(/[a-z]+/) || !pwStr.match(/[0-9]+/) ||
pwStr.length < 6)
//was not successful
But I'd like to optimize this to one regex and level up my regex skillz in the process.
^.*(?=.{6,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!#$%&? "]).*$
^.*
Start of Regex
(?=.{6,})
Passwords will contain at least 6 characters in length
(?=.*[a-zA-Z])
Passwords will contain at least 1 letter (upper or lower case; this class does not enforce one of each)
(?=.*\d)
Passwords will contain at least 1 number
(?=.*[!#$%&? "])
Passwords will contain at least 1 of the given special characters
.*$
End of Regex
You can check this regex at http://rubular.com/
Assuming that a password may consist of any characters, have a minimum length of at least six characters and must contain at least one upper case letter and one lower case letter and one decimal digit, here's the one I'd recommend: (commented version using python syntax)
import re

re_pwd_valid = re.compile(r"""
    # Validate password 6 char min with one upper, lower and number.
    ^                 # Anchor to start of string.
    (?=[^A-Z]*[A-Z])  # Assert at least one upper case letter.
    (?=[^a-z]*[a-z])  # Assert at least one lower case letter.
    (?=[^0-9]*[0-9])  # Assert at least one decimal digit.
    .{6,}             # Match password with at least 6 chars
    $                 # Anchor to end of string.
    """, re.VERBOSE)
Here it is in JavaScript:
const re_pwd_valid = /^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9]*[0-9]).{6,}$/;
Additional: If you ever need to require more than one of the required chars, take a look at my answer to a similar password validation question
Edit: Changed the lazy dot star to greedy char classes. Thanks Erik Reppen - nice optimization!
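For reference, here is how the JavaScript version behaves on a few sample passwords. This is just a usage sketch; the passwords and the isPasswordValid wrapper are invented for the example.

```javascript
// Same pattern as above: one upper, one lower, one digit, at least 6 characters.
const pwdValid = /^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9]*[0-9]).{6,}$/;

// isPasswordValid is a hypothetical wrapper name.
function isPasswordValid(pw) {
  return pwdValid.test(pw);
}

console.log(isPasswordValid("Passw0rd"));  // true
console.log(isPasswordValid("password1")); // false (no upper case letter)
console.log(isPasswordValid("PASSWORD1")); // false (no lower case letter)
console.log(isPasswordValid("Passwd"));    // false (no digit)
console.log(isPasswordValid("Pw0"));       // false (too short)
```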
My experience is if you can separate out Regexes, the better the code will read. You could combine the regexes with positive lookaheads (which I see was just done), but... why?
Edit:
Ok, ok, so if you have some configuration file where you could pass a string to compile into a regex (which I've seen done and have done before), I guess it is worth the hassle. But otherwise, even if the answers provided are corrected to match what you need, I'd still advise against it unless you intend to create such a thing. Separate regexes are just so much nicer to deal with.
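One way the "separate regexes" approach might look is sketched below. The rules table and the failedRules helper are hypothetical names; the point is only that each requirement stays readable and can be reported on its own.

```javascript
// One small check per requirement instead of one combined pattern.
const passwordRules = [
  { name: "at least 6 characters", test: (pw) => pw.length >= 6 },
  { name: "an upper case letter", test: (pw) => /[A-Z]/.test(pw) },
  { name: "a lower case letter", test: (pw) => /[a-z]/.test(pw) },
  { name: "a digit", test: (pw) => /[0-9]/.test(pw) },
];

// Returns the names of the rules the password does not satisfy.
function failedRules(pw) {
  return passwordRules.filter((rule) => !rule.test(pw)).map((rule) => rule.name);
}

console.log(failedRules("Passw0rd")); // []
console.log(failedRules("password")); // [ 'an upper case letter', 'a digit' ]
console.log(failedRules("Ab1"));      // [ 'at least 6 characters' ]
```

A side benefit of this layout is that the user can be told which requirement is missing, which a single combined regex cannot report.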
I haven't tested thoroughly but here's a more efficient version of Amit's. I think his also allowed unspecified characters into the mix (which wasn't technically listed as a rule). This one won't go berserk on you if you accidentally target a large hunk of text, it will fail sooner on strings that are too long and it only allows the characters in the final class.
'.' should be used sparingly. Think of the looping it has to do to determine a match with all the characters it can represent. It's much more efficient to use negating classes.
^(?=[^0-9]{0,9}[0-9])(?=[^a-z]{0,9}[a-z])(?=[^A-Z]{0,9}[A-Z])(?=[^##$%]{0,9}[##$%])[0-9a-zA-Z##$%]{6,10}$
There's nothing wrong with trying to find the ideal regEx. But split it up when you need to.
RegEx tends to be explained poorly. I'll add a breakdown:
a - a single 'a' character
ab - a single 'a' character followed by a single b character
a* - 0 or more 'a' characters
a+ - one or more 'a' characters
a+b - one or any number of a characters followed by a single b character.
a{6,} - at least 6 'a' characters (would match more)
a{6,10} - 6-10 'a' characters
a{10} - exactly 10 'a' characters iirc - not very useful
^ - beginning of a string - so ^a+ would not match 'baaaa'
$ - end of a string - b$ would not find a match 'aaaba'
[] signifies a character class. You can put a variety of characters inside it and every character will be checked. By itself only whatever string character you happen to be on is matched against. It can be modified by + and * as above.
[ab]+c - one or any number of a or b characters followed by a single c character
[a-zA-Z0-9] - any letter, any digit - there are a bunch of \<some key> sequences representing sets, like \d for digits. \w is basically [a-zA-Z0-9_]
note: '\' is the escape character inside character classes: [a\-z] matches 'a' or '-' or 'z', rather than anything from a to z, which is what [a-z] means
[^<stuff>] a character class with the caret in front means everything but the characters in <stuff> - this is critical to performance in regEx matches hitting large strings.
. - wildcard character representing most characters (by default it does not match line terminators such as \n). Not a big deal in very small sets of characters, but avoid using it.
(?=<regex stuff>) - a lookahead. Doesn't move the parser further down the string if it matches. If a lookahead fails, the whole match fails. If it succeeds, you go back to the same character before it. That's why we can string a bunch together to search if there's at least one of a given character.
So:
^ - at the beginning followed by whatever is next
(?=[^0-9]{0,9}[0-9]) - look for a digit from 0-9 preceded by 0 to 9 characters that aren't 0-9 - the next lookahead starts at the same place
etc. on the lookaheads
[0-9a-zA-Z##$%]{6,10} - 6-10 of any letter, number, or ##$% characters
The trailing '$' is still needed: {6,10} only caps how much is consumed, so without '$' an 11-character string would still match on its first 10 characters
I'm using this /[-\+,\.0-9]+/ to match numbers in strings like +4400,00 % or -3500,00 % or 0.00 %.
The match I want is +4400,00, and I correctly get it.
What if I wanted the same results for a string like +4.400,00 % (dot for thousands) ?
EDIT
How do I have to modify my RegEx for matching numbers in strings like <font color="red">+44.500 %</font>?
/[\-\+]?\s*[0-9]{1,3}(\.[0-9]{3})*,[0-9]+/
That should cover strings that
may start with a + or -, and then perhaps some whitespaces
then have between one and three numbers
then have groups of three numbers, prefixed with a period
then have a comma and at least one number behind the comma
Regarding your additional question (matching numbers inside strings), you should look into the manual of whatever regex API you're using. Most APIs have separate search and match methods; match wants the whole string to be part of your regular expression's language, while search will also match substrings.
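In JavaScript, where there is no separate whole-string match method, the same distinction can be made by anchoring the pattern. The sketch below assumes that approach; NUMBER, findNumber and isNumber are illustrative names, and the sample strings are my own.

```javascript
// The number pattern from above, kept as a string so it can be reused.
const NUMBER = "[-+]?\\s*[0-9]{1,3}(?:\\.[0-9]{3})*,[0-9]+";

const searchRe = new RegExp(NUMBER);                // finds the number anywhere
const wholeRe = new RegExp("^(?:" + NUMBER + ")$"); // whole string must be the number

// Returns the first number found inside the text, or null.
function findNumber(text) {
  const m = text.match(searchRe);
  return m ? m[0] : null;
}

// True only when the text is a number and nothing else.
function isNumber(text) {
  return wholeRe.test(text);
}

console.log(findNumber('<font color="red">+4.400,00 %</font>')); // "+4.400,00"
console.log(isNumber("+4.400,00"));   // true
console.log(isNumber("+4.400,00 %")); // false
```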
[\+-]? - an optional plus or minus
\d{1,3} - one to three digits
(\.\d{3})* - any number of groups of 3 digits, each with a point before it
,\d{2} - a comma and 2 more digits
And so we get:
/[+-]?\d{1,3}(\.\d{3})*,\d{2}/
Your regex will already match ".". But it sounds like you also want to strip "." out? If that's the case, you need a substitution. In Perl,
if ($input =~ /(-|\+)[0-9][,\.0-9]+/) {
    $input =~ s/\.//g;    # /g removes every ".", not just the first one
} else {
    die;
}
I've also changed the regex so it will only match - and + at the start, and so it requires an initial digit
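A rough JavaScript counterpart of the same idea is sketched below. It is not a line-for-line translation: it extracts the matched number first and strips the dots from that, and stripThousands is a made-up function name.

```javascript
// Finds a signed number and removes the thousands separators from it.
// Throws when no signed number is present, loosely mirroring the Perl "die".
function stripThousands(input) {
  const m = input.match(/[-+][0-9][,.0-9]+/);
  if (!m) {
    throw new Error("no signed number found");
  }
  return m[0].replace(/\./g, "");
}

console.log(stripThousands("+4.400,00 %")); // "+4400,00"
console.log(stripThousands("-3500,00 %"));  // "-3500,00"
```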