Looking for another regex explanation - javascript

In my regex, I was trying to match a password between 8 and 16 characters long, with at least 2 of each of the following: lowercase letters, capital letters, and digits.
In my expression I have:
^((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,16})$
But I don't understand why it wouldn't work like this:
^((?=\d)(?=[a-z])(?=[A-Z])(?=\d)(?=[a-z])(?=[A-Z]){8,16})$
Doesn't ".*" just mean "zero or more of any character"? So why would I need that if I'm just checking for specific conditions?
And why did I need the period before the curly braces defining the limit of the password?
And one more thing, I don't understand what it means to "not consume any of the string" in reference to "?=".

Your last two questions are related. The ?= (which is called a lookahead, by the way) doesn't consume any of the string, meaning that it tests a condition of the string but is itself zero characters long. If the lookahead is true, then the matching continues, but the next part of the expression starts from where you were before you checked the lookahead.
Because all your stuff is made up of lookaheads, they all add up to zero characters in length. So, for {8,16} to match something, you need to supply the . first. .{8,16} means "8 to 16 characters, I don't care what those characters are." {8,16} without anything before it isn't a valid expression (or at least won't mean what .{8,16} means).
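To make the "zero characters long" point concrete, here is a minimal sketch (the variable names are just for illustration). It shows a lookahead succeeding while matching nothing, and the same lookahead followed by .{8,16}, which is the part that actually consumes the password.

```javascript
// A lookahead tests a condition without advancing the match position.
const onlyLookahead = /^(?=.*\d)/.exec('abc1');
console.log(onlyLookahead[0].length); // 0: the test passed, but nothing was consumed

// Adding .{8,16} is what actually consumes the characters.
const withBody = /^(?=.*\d).{8,16}$/.exec('abcdefg1');
console.log(withBody[0]); // 'abcdefg1'
```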
Regarding your first question, you need .* in each of your lookaheads because a lookahead tests the text starting at the current position, and with ^ in front, that position is the very beginning of the string. Without .*, (?=\d) only asks whether the very first character is a digit. Since you want to find a digit anywhere in the string, .* lets the lookahead skip over any number of characters first.
Lastly, I'm afraid your regexp doesn't work. Because the lookaheads are zero-length, putting the same lookahead in twice as you have done will match the same thing twice. So this expression only checks if you have a single instance of each of the types of characters that you want to enforce there being two instances of. The expression you want is more like this:
^((?=.*\d.*\d)(?=.*[a-z].*[a-z])(?=.*[A-Z].*[A-Z]).{8,16})$
And that expression is equivalent to the more elegant:
^((?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16})$
(And, giving credit where it's due, Dennis beat me to that last expression. Well done, sir.)
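A quick way to see the difference is to run both patterns against a password that has only one character of each type. This is a small sketch; the test strings and variable names are made up for the example.

```javascript
// The asker's pattern: repeating a lookahead re-checks the same character.
const duplicated = /^((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,16})$/;
// The corrected pattern: each lookahead must find its character type twice.
const counted = /^((?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16})$/;

const oneOfEach = 'aB1!!!!!'; // one lowercase, one uppercase, one digit
const twoOfEach = 'abCD12!!'; // two of each

console.log(duplicated.test(oneOfEach)); // true
console.log(counted.test(oneOfEach));    // false
console.log(counted.test(twoOfEach));    // true
```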

The problem is that this character ^ means something like 'Right on start'. It means that these specific characters SHOULD BE strictly at the start of text you're searching in, which is not what you want.

Your expression will not work as you want it to.
Because of the lookaheads, both instances of (?=.*\d) will actually match the same digit, thus validating passwords with only one digit.
This should work:
^(?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16}$

The difference between (?=.*\d) and (?=\d) is that, while they are both zero-width lookaheads, the former will match if there is a digit anywhere in the string (after the current location), but the latter will match only if that digit is immediately after the current location. So, that first regex looks for 8-16 characters, including one digit, one lowercase, and one uppercase letter. The second regex requires the first character to be a digit, and a lowercase letter, and an uppercase letter all at once, which is impossible. If you want to match two digits, then instead of (?=.*\d)(?=.*\d), do (?=.*\d.*\d).
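Here is a short sketch of that difference, using made-up test strings. It also shows why stacking (?=\d), (?=[a-z]) and (?=[A-Z]) at the same position can never succeed.

```javascript
const anywhere = /^(?=.*\d)/;   // a digit somewhere after the start
const immediately = /^(?=\d)/;  // a digit as the very first character

console.log(anywhere.test('abc1'));    // true
console.log(immediately.test('abc1')); // false
console.log(immediately.test('1abc')); // true

// One character cannot be a digit, a lowercase and an uppercase letter at once.
const impossible = /^(?=\d)(?=[a-z])(?=[A-Z])/;
console.log(impossible.test('1aA')); // false
```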

Related

Regex - must contain number and must not contain special character

I want to check by regex if:
String contains number
String does not contain special characters (!<>?=+#{}_$%)
Now it looks like:
^[^!<>?=+#{}_$%]+$
How should I edit this regex to check if there is number anywhere in the string (it must contain it)?

You can add [0-9]+ or \d+ to your regex, like this:
^[^!<>?=+#{}_$%]*[0-9]+[^!<>?=+#{}_$%]*$
or
^[^!<>?=+#{}_$%]*\d+[^!<>?=+#{}_$%]*$
For the difference between [0-9] and \d, see here.
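A small sketch of how this behaves, with made-up inputs. Note that the negated class also matches digits, so the digits do not all have to sit in one place.

```javascript
const mustHaveDigit = /^[^!<>?=+#{}_$%]*\d+[^!<>?=+#{}_$%]*$/;

console.log(mustHaveDigit.test('bob'));   // false: no digit
console.log(mustHaveDigit.test('bob1'));  // true
console.log(mustHaveDigit.test('1bob2')); // true: digits may appear in several places
console.log(mustHaveDigit.test('bob1#')); // false: # is forbidden
```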

Just look ahead for the digit:
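var re = /^(?=.*\d)[^!<>?=+#{}_$%]+$/;
console.log(re.test('bob'));  // false: no digit
console.log(re.test('bob1')); // true: has a digit and no forbidden characters
console.log(re.test('bob#')); // false: no digit, and # is forbidden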
The (?=.*\d) part is the lookahead for a single digit somewhere in the input.

You only needed to add the number check, is that right? You can do it like so:
/^(?=.*\d)[^!<>?=+#{}_$%]+$/
We do a lookahead (like peeking at the following characters without moving where we are in the string) to check to see if there is at least one number anywhere in the string. Then we do our normal check to see if none of the characters are those symbols, moving through the string as we go.
Just as a note: If you want to match newlines (a.k.a. line breaks), then you can change the dot . into [\W\w]. This matches any character whatsoever. You can do this in a number of ways, but they're all pretty much as clunky as each other, so it's up to you.
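To illustrate the newline note, here is a sketch comparing the two variants on an input where the digit comes after a line break. The input string is made up for the example.

```javascript
const dotVersion = /^(?=.*\d)[^!<>?=+#{}_$%]+$/;
const anyCharVersion = /^(?=[\W\w]*\d)[^!<>?=+#{}_$%]+$/;

const input = 'bob\n1'; // the only digit is on the second line

// The dot stops at the line break, so the lookahead never reaches the digit.
console.log(dotVersion.test(input));     // false
// [\W\w] matches any character, including the line break.
console.log(anyCharVersion.test(input)); // true
```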

Lookahead (?=pattern) without preceding pattern [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I learned that the lookahead regex is like this x(?=y) and means
Matches x only if x is followed by y.
according to the MDN. However I find this code on w3school:
<p>A form with a password field that must contain 8 or more characters that are of at least one number, and one uppercase and lowercase letter:</p>
<form action="demo_form.asp">
Password: <input type="password" name="pw" pattern="(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}" title="Must contain at least one number and one uppercase and lowercase letter, and at least 8 or more characters">
<input type="submit">
</form>
Why does (?=.*\d) indicate "at least one number appears in the string"? And why doesn't it matter where the three pairs of parentheses match? As I read it, the pattern should require one or more digits, followed by one or more lowercase letters, then one or more uppercase letters, and then 8 or more characters. What is wrong?
After a little search, it seems regex is different in various languages, is that what this is about?
edit:
I don't think you guys got my question. I meant that a lookahead looks like x(?=y), but (?=.*\d) isn't preceded by anything, so what does it match? And the second question: the three parentheses come in a specific order, but the match doesn't have to be in the same order, whereas /abc/ matches "abcdd" but not "cbdda". Why doesn't the order matter?
update:
OK, probably I have a misunderstanding of lookahead, and thanks to whoever changed my title for this problem. So here's my final update if there's no more need after:
My problem is as the title says: a lookahead (?=pattern) can omit the preceding pattern, so what does it mean when there is nothing before the parentheses? I searched for 'lookahead', and almost every explanation comes with a preceding pattern.
And I tried something on regex tester:
/(?=\d)/ will create an infinite match if the string contains a digit, like "a2", but it will show "no match" if the string has no digit, like "a"
Interestingly /(?=\d)./ will match for any digit, now it seems equals to \d
I have no idea what's going on right now, I'll go and learn the lookahead again but any further answers are welcomed, thanks

The (?=pattern) is a regex lookahead. It's a zero-width, "true or false" part of the pattern that doesn't actually "eat" any characters, but must match (be true) for the expression to succeed. So,
(?=.*\d)
means "lookahead to see .*\d, which is 'anything' (any number of times, greedy), followed by a digit". Since the .* will by default eat up all characters until the end of the string, the \d wouldn't have anything left to eat for itself. So the .* backtracks, or gives up, a character at a time until the \d can match. Since * means 'zero or more', .* will give up everything it has matched, if necessary, to let the \d match. Thus, at least one digit somewhere in the string is enough to let the pattern match.
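The backtracking can be observed by putting a capture group around .* inside the lookahead. This is only a sketch with a made-up input; it also replays the two experiments from the question, /(?=\d)/ and /(?=\d)./.

```javascript
// Capture what .* ends up holding once \d has been allowed to match.
const backtracked = /^(?=(.*)\d)/.exec('ab1cd2ef');
console.log(backtracked[1]);        // 'ab1cd': .* gave back '2ef' so \d could take '2'
console.log(backtracked[0].length); // 0: the lookahead itself consumed nothing

// A bare lookahead matches an empty string at the position just before the digit.
const bare = /(?=\d)/.exec('a2');
console.log(bare.index, bare[0].length); // 1 0

// Followed by a dot, the dot consumes the character the lookahead approved.
const withDot = /(?=\d)./.exec('a2');
console.log(withDot[0]); // '2'
```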

Regex returns with incorrect value

I'm trying to create a function with a regex that can decide if my string value is correct or not. It should be true, if the string begins with lower or uppercase alphabetical characters or underscore. If it begins with any others, the function must return false.
My test input is something like this: ".dasfh"
The expressions I tried, [_a-zA-Z]... and [:alpha:]..., both returned true.
I tried a bit easier task also:
"Hadfg" where the expression is [a-z]...: returns true
BUT
"hadfg" where the expression is [A-Z]...: returns false
Could anybody help me to understand this behaviour?

You're trying to match the first character in the string to be something in particular, this means you have to tell regex that it has to be the first character in the string.
The regex engine just tries to find any match in the entire string.
All you're telling it with [a-z] is "find me a lowercase character anywhere in the string". This means that:
"Hadfg" will equal true because it can find a, d, f or g as a match.
"HADFG" will equal false because there are no lowercase letters.
The same will happen for "hADFG" when matched with [A-Z], for instance: it will be able to find an A, D, F or G as a match, whereas "hadfg" will return false because there is no uppercase character.
What you are looking for here is ^ in your regex. It is an anchor that matches at the start of the string (or at the start of each line when the m flag is set).
So when you apply this to your regex it will look like this: /^[a-z]/.
The regex on the previous line basically says "starting at the beginning of the string, is the first character a lowercase a-z?"
Try it out and you'll see.
For your solution you'd need /^[_a-zA-Z]/ to check if the first character is an _, a-z or A-Z character.
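As a rough sketch, the check could be wrapped like this. The function name startsWithLetterOrUnderscore is made up for the example and is not from the question.

```javascript
// Hypothetical helper: true only if the first character is _, a-z or A-Z.
function startsWithLetterOrUnderscore(value) {
  return /^[_a-zA-Z]/.test(value);
}

console.log(/[a-z]/.test('Hadfg'));  // true: finds a lowercase letter somewhere
console.log(/^[a-z]/.test('Hadfg')); // false: the first character is uppercase

console.log(startsWithLetterOrUnderscore('.dasfh')); // false
console.log(startsWithLetterOrUnderscore('_dasfh')); // true
console.log(startsWithLetterOrUnderscore('Hadfg'));  // true
```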
For reference, you can find cheatsheets within these tools (and test your regexes with them, of course!)
Regexr - My personal favorite (Uses your browser's JS regex engine)
Rubular - A Ruby regex tester
Regex101 - A Python / PCRE / PHP / JavaScript regex tester
And for a reference or tutorial (I'd recommend reading from start to finish if you want to start understanding regexp and how they work) there's regular-expressions.info.
Regex is never easy and be careful with what you do with it, it's a powerful but sometimes ugly beast to deal with :)
PS
I see you tagged your question as email-validation so I'll add a little bonus regex that validates the minimum requirements for an email address to be absolutely correct, I use this one personally:
.+@.+\..{2,}
which when broken up, looks like this:
.+ - one or more of any character
@ - followed by a literal @ character
.+ - one or more of any character
\. - followed by a literal . character
.{2,} - two or more of any character
Optionally you could replace {2,} with a + to make it one or more but this would allow a TLD with 1 character.
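Here is a small sketch of how loose this check is, assuming @ as the separator. The addresses are made up, and the pattern is unanchored, as in the answer.

```javascript
const looseEmail = /.+@.+\..{2,}/;

console.log(looseEmail.test('user@example.com'));  // true
console.log(looseEmail.test('user@example.c'));    // false: TLD shorter than two characters
console.log(looseEmail.test('user.example.com'));  // false: no @
// It only checks the overall shape, so odd input still passes.
console.log(looseEmail.test('not an email @ all.really')); // true
```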
To see an RFC email-regex at work check this link.
When I look at that regex I basically just want to cry in a corner somewhere, there are definitely things you cannot do in an email address that my regex doesn't address but at least it makes sure it's something that looks like it's e-mailable anyways, if a new user decides to fill in some bull that's not my problem anymore and I wouldn't want to force them to change that 1 character just because the huge regex doesn't agree with it either.

need a regex for password validation that allows all special characters [duplicate]

This question already has an answer here:
Password validation (regex?)
(1 answer)
Closed 8 years ago.
The password requirements are:
at least two letters
at least two numbers
at least one special character (any special character)
at least 8 characters
This one is close but isn't working:
/^(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W]).{8,}$/
What am I doing wrong?

This regex meets your requirements:
/^(?=(?:[^a-z]*[a-z]){2})(?=(?:[^0-9]*[0-9]){2})(?=.*[!-\/:-@\[-`{-~]).{8,}$/i
Play with the demo to see what matches and doesn't match.
Explanation
This is a classic password validation technique with lookarounds as explained in this article
The i flag at the end makes it case-insensitive so we don't have to say a-zA-Z
The ^ anchor asserts that we are at the beginning of the string
The first lookahead (?=(?:[^a-z]*[a-z]){2}) asserts that what follows at this position (the beginning of the string) is any characters that are not a letter, followed by one letter... twice, ensuring there are at least two letters
The second lookahead (?=(?:[^0-9]*[0-9]){2}) asserts that what follows at this position (still the beginning of the string) is any characters that are not a digit, followed by one digit... twice, ensuring there are at least two digits
The third lookahead (?=.*[!-\/:-@\[-`{-~]) asserts that what follows at this position (still the beginning of the string) is any characters, followed by one special character
The $ anchor asserts that we are at the end of the string
Note about special characters
The regex [!-\/:-@\[-`{-~] specifically picks out all printable chars that are neither digits nor letters from the ASCII table. If this includes chars you don't want, make it more restrictive.
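One way to check which characters a class like this picks out is to run it over the printable ASCII range. This sketch assumes the class is [!-\/:-@\[-`{-~], with the third range running from the colon through the at sign; the variable names are made up.

```javascript
const special = /[!-\/:-@\[-`{-~]/;

// Collect every printable ASCII character (space through tilde) the class matches.
let matched = '';
for (let code = 0x20; code <= 0x7e; code++) {
  const ch = String.fromCharCode(code);
  if (special.test(ch)) matched += ch;
}

console.log(matched.length);               // 32
console.log(/[A-Za-z0-9 ]/.test(matched)); // false: no letters, digits or spaces
console.log(matched);
```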

A regex is probably inappropriate for this; it's hard to glance at the regex you've got and immediately have any idea what the requirements are, let alone how to modify them. You might want to just count the number of characters in each group directly, then check that those counts all pass the appropriate threshold.
That said: consider that this would enforce really awkward passwords, yet disallow xkcd-style passwords. I strongly encourage you to take a more heuristic approach, where a longer password loosens the other restrictions. There are other considerations to enforcing a strong password, too, like similarity to dictionary words and number of unique characters.
Honestly you might be best off just requiring passphrases :)
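For what it's worth, the "count each group directly" idea could look roughly like this. It is only a sketch: checkPassword is a made-up name, the thresholds are the ones from the question, and treating every non-letter, non-digit character as "special" is an assumption.

```javascript
// Hypothetical helper: count characters per group, then compare with thresholds.
function checkPassword(password) {
  const counts = { letters: 0, digits: 0, special: 0 };
  for (const ch of password) {
    if (/[a-zA-Z]/.test(ch)) counts.letters++;
    else if (/[0-9]/.test(ch)) counts.digits++;
    else counts.special++; // assumption: anything else counts as "special"
  }
  return (
    password.length >= 8 &&
    counts.letters >= 2 &&
    counts.digits >= 2 &&
    counts.special >= 1
  );
}

console.log(checkPassword('ab12!xyz'));  // true
console.log(checkPassword('abcdefg1!')); // false: only one digit
console.log(checkPassword('ab12!'));     // false: too short
```

Each requirement is a separate, named comparison, so changing a threshold does not mean rereading a long pattern.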

I'd say:
/^(?=.*\d.*\d)(?=.*[a-zA-Z].*[a-zA-Z])(?=.*[\W]).{8,}$/
Your regex was missing the 2 digits and 2 letters requirements.

How about:
/^(?=.{2,}\d)(?=.{2,}[a-zA-Z])(?=.*[\W]).{8,}$/
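Be aware, though, that (?=.{2,}\d) only requires a digit preceded by at least two other characters; it does not count digits. A password such as abcdefg1! passes with a single digit, so this does not enforce the "at least two" rules.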

Depends on what you consider to be a "special character". If a special character is anything that is not a digit or a letter, and if spaces are not allowed in the password, then:
^(?=(?:\S*\d){2})(?=(?:\S*[A-Za-z]){2})(?=\S*[^A-Za-z0-9])\S{8,}
or, with the "escapes":
"^(?=(?:\\S*\\d){2})(?=(?:\\S*[A-Za-z]){2})(?=\\S*[^A-Za-z0-9])\\S{8,}"
If you choose to allow spaces, replace \S with a dot .
If you want to define "special characters" as only including certain characters, or as excluding other characters in addition to letters and digits, edit the character class in the final lookahead.
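A short sketch of how the \S version behaves, with made-up inputs. One thing to watch: the pattern has no trailing $, so with test() a space after the first eight valid characters still passes. The second pattern below adds $ as an assumption about the intent.

```javascript
const noSpaces = /^(?=(?:\S*\d){2})(?=(?:\S*[A-Za-z]){2})(?=\S*[^A-Za-z0-9])\S{8,}/;
// Assumed variant: same pattern with $ so the whole input must be space-free.
const noSpacesAnchored = /^(?=(?:\S*\d){2})(?=(?:\S*[A-Za-z]){2})(?=\S*[^A-Za-z0-9])\S{8,}$/;

console.log(noSpaces.test('ab12!xyz'));             // true
console.log(noSpaces.test('ab12 !xyz'));            // false: fewer than 8 non-space characters in a row
console.log(noSpaces.test('ab12!xyz abc'));         // true: nothing checks past the first 8
console.log(noSpacesAnchored.test('ab12!xyz abc')); // false
```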

What does ?=^ mean in a regexp?

I want to write regexp which allows some special characters like #-. and it should contain at least one letter. I want to understand below things also:
/(?=^[A-Z0-9. '-]{1,45}$)/i
In this regexp what is the meaning of ?=^ ? What is a subexpression in regexp?

(?=) is a lookahead: it looks ahead in the string to see if the pattern matches, without actually consuming any of it
^ means it matches at the BEGINNING of the input (for example with the string a test, ^test would not match as it doesn't start with "test" even though it contains it)
Overall, your expression is saying the input has to ^ start and $ end with 1-45 {1,45} items that exist in your character group [A-Z0-9. '-] (case insensitive /i). The fact it is within a lookahead in this case just means it's not going to consume anything (zero-length match).
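Here is a sketch of what that zero-length match looks like in practice. The sample names are made up, and the last pattern is only one assumed way to add the "at least one letter" requirement from the question.

```javascript
const lookaheadOnly = /(?=^[A-Z0-9. '-]{1,45}$)/i;

const result = lookaheadOnly.exec("O'Neil-Smith 3");
console.log(result.index, result[0].length);  // 0 0: it succeeds but consumes nothing
console.log(lookaheadOnly.test('bad@name'));  // false: @ is not in the character group

// Assumed variant: a lookahead for a letter, then the same class consuming the input.
const needsLetter = /^(?=.*[A-Z])[A-Z0-9. '-]{1,45}$/i;
console.log(needsLetter.test("O'Neil")); // true
console.log(needsLetter.test('12345'));  // false: no letter
```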

?= is a positive lookahead
Read more on regex
