Lookahead (?=pattern) without preceding pattern [duplicate] - javascript

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
I learned that the lookahead regex is like this x(?=y) and means
Matches x only if x is followed by y.
according to MDN. However, I found this code on W3Schools:
<p>A form with a password field that must contain 8 or more characters that are of at least one number, and one uppercase and lowercase letter:</p>
<form action="demo_form.asp">
Password: <input type="password" name="pw" pattern="(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}" title="Must contain at least one number and one uppercase and lowercase letter, and at least 8 or more characters">
<input type="submit">
</form>
Why does (?=.*\d) indicate "at least one number appears in the string"? And why doesn't the order of the three pairs of parentheses matter? As I read it, the pattern should require one or more digits, followed by one or more lowercase letters, then one or more uppercase letters, and then 8 or more characters. What am I getting wrong?
After a little search, it seems regex is different in various languages, is that what this is about?
edit:
I don't think you guys got my question. I meant that a lookahead looks like x(?=y), but (?=.*\d) isn't preceded by anything, so what does it match? My second question: the three parenthesized groups come in a specific order, but the match doesn't have to follow that order. Since /abc/ matches "abcdd" but not "cbdda", why doesn't the order matter here?
update:
OK, probably I have a misunderstanding of lookahead, and thanks to whoever changed my title for this problem. So here's my final update if there's no more need after:
My problem is what the title says: a lookahead (?=pattern) can omit the preceding pattern, so what does it mean when there is nothing before the parentheses? I searched for 'lookahead', and almost every explanation comes with a preceding pattern.
And I tried something on regex tester:
/(?=\d)/ will create an infinite match if the string contains a digit, like "a2", but it will show "no match" if the string has no digit, like "a"
Interestingly, /(?=\d)./ matches any digit, so it seems to be equivalent to \d
I have no idea what's going on right now. I'll go and study lookaheads again, but any further answers are welcome. Thanks.

The (?=pattern) is a regex lookahead. It's a zero-width, "true or false" part of the pattern that doesn't actually "eat" any characters, but must match (be true) for the expression to succeed. So,
(?=.*\d)
means "lookahead to see .*\d, which is 'anything' (any number of times, greedy), followed by a number". Since the .* will by default eat up all characters until the end of the string, obviously the \d wouldn't have anything left to eat for itself. The .* backtracks, or gives up, a character at a time until the \d can match. Since * means 'zero or more', .* will give up everything it has matched, if necessary, to let the \d match. Thus, at least one digit somewhere in the string is enough to let the pattern match.
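To make this concrete, here is a small sketch that can be run in Node or a browser console. The sample strings and variable names are made up for illustration; the patterns are the ones from the question. One assumption to note: the HTML pattern attribute has to match the whole value, so the JavaScript version of the password pattern below adds ^ and $ explicitly.

```javascript
// A bare lookahead matches the empty string at the first position where it succeeds.
const bare = /(?=\d)/.exec("a2");
console.log(bare.index, JSON.stringify(bare[0])); // 1 ""

// With no digit in the string, no position satisfies the lookahead.
const none = /(?=\d)/.exec("a");
console.log(none); // null

// Adding "." after the lookahead consumes the character that was only peeked at.
const consumed = /(?=\d)./.exec("a2");
console.log(consumed[0]); // 2

// The password pattern from the question, anchored for use outside HTML.
const passwordPattern = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/;
console.log(passwordPattern.test("abcdefG1")); // true
console.log(passwordPattern.test("1Gabcdef")); // true: every lookahead starts at position 0, so order is irrelevant
console.log(passwordPattern.test("abcdefgh")); // false: no digit, no uppercase letter
```

This is also why the tester reported a match for /(?=\d)/ on "a2": the match is real, but it has zero length.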

Related

Can somebody tell me how can this regular expression match anything? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 6 years ago.
This is the JavaScript regular expression I'm confused about. I know that (?=) is a positive lookahead, but isn't there supposed to be a main expression before it?
/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])\w{8,}$/
The answer says it matches a password which is:
at least one number, one lowercase and one uppercase letter and at least 8 characters that are letters, numbers or the underscore
But I don't see why. Can somebody explain a little?
Let's break it down:
/^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])\w{8,}$/
^ // Match the start of the string
(?=.*\d) // Make sure the string contains at least one digit
(?=.*[a-z]) // Make sure the string contains at least one lowercase letter
(?=.*[A-Z]) // Make sure the string contains at least one uppercase letter
\w{8,} // Match at least eight word characters (alphanumeric or underscore)
$ // Match the end of the string
(?=.*PATTERN) is a common way to ensure that a match string contains PATTERN.
It works because .* matches anything (except newline characters); the lookahead literally means "This regex should only match if you find PATTERN after something."
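One way to convince yourself of the breakdown above is to compare the combined regex with the same four conditions written as separate checks. This is only a sketch; the helper name checkSeparately and the sample strings are invented for illustration.

```javascript
const combined = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])\w{8,}$/;

// The same conditions, one test per requirement (hypothetical helper).
function checkSeparately(s) {
  return /\d/.test(s)        // contains a digit
    && /[a-z]/.test(s)       // contains a lowercase letter
    && /[A-Z]/.test(s)       // contains an uppercase letter
    && /^\w{8,}$/.test(s);   // consists of 8 or more word characters
}

for (const sample of ["Passw0rd", "password1", "Pass w0rd", "Pw0"]) {
  console.log(sample, combined.test(sample), checkSeparately(sample));
}
```

Both columns should agree for every sample. The lookaheads do the "contains" checks without moving the match position, and \w{8,} is the only part that actually consumes the string.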

need a regex for password validation that allows all special characters [duplicate]

This question already has an answer here:
Password validation (regex?)
(1 answer)
Closed 8 years ago.
The password requirements are:
at least two letters
at least two numbers
at least one special character (any special character)
at least 8 characters
This one is close but isn't working:
/^(?=.*\d)(?=.*[a-zA-Z])(?=.*[\W]).{8,}$/
What am I doing wrong?
This regex meets your requirements:
/^(?=(?:[^a-z]*[a-z]){2})(?=(?:[^0-9]*[0-9]){2})(?=.*[!-\/:-@\[-`{-~]).{8,}$/i
Play with the demo to see what matches and doesn't match.
Explanation
This is a classic password validation technique with lookarounds as explained in this article
The i flag at the end makes it case-insensitive so we don't have to say a-zA-Z
The ^ anchor asserts that we are at the beginning of the string
The first lookahead (?=(?:[^a-z]*[a-z]){2}) asserts that what follows at this position (the beginning of the string) is any characters that are not a letter, followed by one letter... twice, ensuring there are at least two letters
The second lookahead (?=(?:[^0-9]*[0-9]){2}) asserts that what follows at this position (still the beginning of the string) is any characters that are not a digit, followed by one digit... twice, ensuring there are at least two digits
The third lookahead (?=.*[!-\/:-@\[-`{-~]) asserts that what follows at this position (still the beginning of the string) is any characters, followed by one special character
The $ anchor asserts that we are at the end of the string
Note about special characters
The regex [!-\/:-@\[-`{-~] specifically picks out all printable chars that are neither digits nor letters from the ASCII table. If this includes chars you don't want, make it more restrictive.
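Here is a quick sketch for trying the pattern out. It assumes the special-character class is built from the four ASCII punctuation ranges !-/, :-@, [-` and {-~; the variable name and the sample passwords are invented for illustration.

```javascript
// Two letters, two digits, one ASCII punctuation character, at least 8 characters.
const policy = /^(?=(?:[^a-z]*[a-z]){2})(?=(?:[^0-9]*[0-9]){2})(?=.*[!-\/:-@\[-`{-~]).{8,}$/i;

console.log(policy.test("ab12!xyz"));  // true
console.log(policy.test("AB12!xyz"));  // true: the i flag covers uppercase letters
console.log(policy.test("a1234567!")); // false: only one letter
console.log(policy.test("ab1cdefg!")); // false: only one digit
console.log(policy.test("ab12cdef"));  // false: no special character
console.log(policy.test("ab12!"));     // false: shorter than 8 characters
```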
A regex is probably inappropriate for this; it's hard to glance at the regex you've got and immediately have any idea what the requirements are, let alone how to modify them. You might want to just count the number of characters in each group directly, then check that those counts all pass the appropriate threshold.
That said: consider that this would enforce really awkward passwords, yet disallow xkcd-style passwords. I strongly encourage you to take a more heuristic approach, where a longer password loosens the other restrictions. There are other considerations to enforcing a strong password, too, like similarity to dictionary words and number of unique characters.
Honestly you might be best off just requiring passphrases :)
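The counting approach suggested above might look something like the following sketch. The function names are hypothetical, and it assumes that a "special character" is anything that is not an ASCII letter or digit (so spaces count too); adjust the character classes if your definition differs.

```javascript
// Count how many characters of the string match a global regex.
function countMatches(s, re) {
  return (s.match(re) || []).length;
}

// Hypothetical policy check: each requirement is a separate, readable line.
function meetsPolicy(password) {
  const letters = countMatches(password, /[a-z]/gi);
  const digits = countMatches(password, /\d/g);
  const specials = countMatches(password, /[^a-z\d]/gi); // assumption: anything else is "special"
  return password.length >= 8 && letters >= 2 && digits >= 2 && specials >= 1;
}

console.log(meetsPolicy("ab12!xyz"));  // true
console.log(meetsPolicy("a1234567!")); // false: only one letter
console.log(meetsPolicy("ab12cdef"));  // false: no special character
```

Changing a threshold here means changing one number, which is the maintainability point the answer is making.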
I'd say:
/^(?=.*\d.*\d)(?=.*[a-zA-Z].*[a-zA-Z])(?=.*[\W]).{8,}$/
Your regex was missing the 2 digits and 2 letters requirements.
How about:
/^(?=.{2,}\d)(?=.{2,}[a-zA-Z])(?=.*[\W]).{8,}$/
It should meet your requirement.
Depends on what you consider to be a "special character". If a special character is anything that is not a digit or a letter, and if Spaces are not allowed in the password, then:
^(?=(?:\S*\d){2})(?=(?:\S*[A-Za-z]){2})(?=\S*[^A-Za-z0-9])\S{8,}
or, with the "escapes":
"^(?=(?:\\S*\\d){2})(?=(?:\\S*[A-Za-z]){2})(?=\\S*[^A-Za-z0-9])\\S{8,}"
If you choose to allow spaces, replace \S with a dot .
If you want to define "special characters" as only including certain characters, or as excluding other characters in addition to letters and digits, edit the character class in the final lookahead.
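In JavaScript, the escaped form is what you would pass to the RegExp constructor, while the unescaped form works as a regex literal. A small sketch, with invented variable names and sample strings, showing that the two behave the same:

```javascript
// Regex literal: backslashes are written once.
const literal = /^(?=(?:\S*\d){2})(?=(?:\S*[A-Za-z]){2})(?=\S*[^A-Za-z0-9])\S{8,}/;

// String form: every backslash is doubled so it survives string parsing.
const fromString = new RegExp("^(?=(?:\\S*\\d){2})(?=(?:\\S*[A-Za-z]){2})(?=\\S*[^A-Za-z0-9])\\S{8,}");

console.log(literal.source === fromString.source); // true
console.log(literal.test("ab12!xyz"), fromString.test("ab12!xyz"));   // true true
console.log(literal.test("ab 12!xyz"), fromString.test("ab 12!xyz")); // false false: the space blocks \S*
```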

only letterNumber but not all numbers

I know how to write a regex that validates a string containing only letters and numbers, with no whitespace:
/^[0-9a-zA-Z]+$/
but how do I extend this regex so that the string cannot consist of numbers only? For example, this should not be valid:
08128912382
Any ideas?
"Must contain only letters and numbers and at least one letter" is equivalent to "must contain a letter surrounded by numbers or letters":
/^[0-9a-zA-Z]*[a-zA-Z][0-9a-zA-Z]*$/
I would like to add that this answer shows a way you can think about the problem so writing the regexp is simpler. It is not meant to be the best solution to the problem. I just took what you had and gave it a nudge in the right direction.
With several more nudges, you end up with other different answers (posted by ZER0, Tomalak and OGHaza respectively) :
Notice that the middle part already guarantees a letter, so you don't need to allow letters in both the first and the last group. You can restrict one of them to digits (but not both!):
/^[0-9]*[a-zA-Z][0-9a-zA-Z]*$/ - some numbers, followed by a letter, followed by some more numbers and letters
/^[0-9a-zA-Z]*[a-zA-Z][0-9]*$/ - equivalent if you read from the end
Knowing about lookaheads you can assert that there is at least one letter in the string:
/^(?=.*[a-z])/ - matches the start of any string that contains at least one lowercase letter (add the i flag to accept uppercase letters too)
Or the other way around, as you expressed it, assert that there aren't only numbers in the string:
/^(?!\d+$)/ - matches the start of any string which doesn't contain just digits
The 2nd and 3rd solutions should also be combined with your original regexp that validates that the string contains only the characters you want it to (letters and numbers)
I for one am particularly fond of the 2nd solution, which is, I believe, the fastest of all those attempted so far.
A look-ahead can do it:
/^(?=.*[a-z])[0-9a-z]+$/i
I think the most elegant solution is a negative lookahead to check it's not only numbers
/^(?!\d+$)[0-9a-zA-Z]+$/
So basically you need at least one letter to be in the string. In that case you can just check for the presence of one letter, optionally preceded by numbers, and optionally followed by letters and numbers:
/^[0-9]*[a-z][0-9a-z]*$/i
Notice that it will return true if you test against a string like "A", for instance, because in this case all the numbers are considered optional.
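The answers above take different routes to the same result. As a rough sanity check, the sketch below runs four of the proposed patterns against the same inputs; the object keys and sample strings are invented for illustration.

```javascript
const candidates = {
  surrounded: /^[0-9a-zA-Z]*[a-zA-Z][0-9a-zA-Z]*$/,
  positiveLookahead: /^(?=.*[a-z])[0-9a-z]+$/i,
  negativeLookahead: /^(?!\d+$)[0-9a-zA-Z]+$/,
  digitsFirst: /^[0-9]*[a-z][0-9a-z]*$/i,
};

const samples = ["08128912382", "abc123", "123a", "A", "", "12 ab"];

for (const sample of samples) {
  const results = Object.values(candidates).map((re) => re.test(sample));
  console.log(JSON.stringify(sample), results.join(" "));
}
```

All four should agree on every sample: the digits-only string, the empty string and the string with a space are rejected, and the rest are accepted.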

Javascript regular expression

I am making a form and there is only one more thing which I can't figure out :(
I need a regular expression for a password which must be at least 7 characters long. It can contain lowercase and uppercase letters and must contain at least one number.
I tried
[0-9]+[a-zA-Z]){7}$
You can use lookahead:
^(?=.*\d)[a-zA-Z\d]{7,}$
(?=.*\d) is a lookahead which checks for a digit in the string. Basically, .* matches the whole string and then backtracks 1 by 1 to match a digit. If it matches a digit, the regex engine comes back to its position before match. So, it just checks for a pattern.
{7,} is a quantifier which matches the previous pattern 7 or more times
^ is the beginning of a string
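A minimal sketch of the pattern in use, with invented sample inputs:

```javascript
const sevenPlus = /^(?=.*\d)[a-zA-Z\d]{7,}$/;

console.log(sevenPlus.test("abcdef1"));  // true: 7 characters, one digit
console.log(sevenPlus.test("abcdefg"));  // false: the lookahead finds no digit
console.log(sevenPlus.test("abc1"));     // false: shorter than 7 characters
console.log(sevenPlus.test("abcdef1!")); // false: "!" is outside [a-zA-Z\d]
```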

Looking for another regex explanation

In my regex expression, I was trying to match a password between 8 and 16 characters, with at least 2 of each of the following: lowercase letters, capital letters, and digits.
In my expression I have:
^((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,16})$
But I don't understand why it wouldn't work like this:
^((?=\d)(?=[a-z])(?=[A-Z])(?=\d)(?=[a-z])(?=[A-Z]){8,16})$
Doesn't ".*" just mean "zero or more of any character"? So why would I need that if I'm just checking for specific conditions?
And why did I need the period before the curly braces defining the limit of the password?
And one more thing, I don't understand what it means to "not consume any of the string" in reference to "?=".
Your last two questions are related. The ?= (which is called a lookahead, by the way) doesn't consume any of the string, meaning that it tests a condition of the string but itself is zero-characters long. If the lookahead is true, then the matching continues, but the next part of the expression starts from where you were before you checked the lookahead.
Because all your stuff is made up of lookaheads, they all add up to zero characters in length. So, for {8,16} to match something, you need to supply the . first. .{8,16} means "8 to 16 characters, I don't care what those characters are." {8,16} without anything before it isn't a valid expression (or at least won't mean what .{8,16} means).
Regarding your first question, you need .* in each of your lookaheads because your expression starts with ^. That means "starting at the very beginning of the string" rather than "matching anywhere within the string". Since you're not trying to match only at the beginning of the string, .* allows you to have the lookaheads affect anywhere in the string.
Lastly, I'm afraid your regexp doesn't work. Because the lookaheads are zero-length, putting the same lookahead in twice as you have done will match the same thing twice. So this expression only checks if you have a single instance of each of the types of characters that you want to enforce there being two instances of. The expression you want is more like this:
^((?=.*\d.*\d)(?=.*[a-z].*[a-z])(?=.*[A-Z].*[A-Z]).{8,16})$
And that expression is equivalent to the more elegant:
^((?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16})$
(And, giving credit where it's due, Dennis beat me to that last expression. Well done, sir.)
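To see the difference, the sketch below runs the original expression and the {2} version against a password that has only one digit and one uppercase letter. The variable names and sample passwords are invented for illustration.

```javascript
// Repeating a lookahead re-checks the same condition from the same position.
const duplicated = /^((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,16})$/;

// Requiring the group twice inside one lookahead forces two distinct characters.
const twoOfEach = /^((?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16})$/;

console.log(duplicated.test("Abcdefg1")); // true, even though there is only one digit
console.log(twoOfEach.test("Abcdefg1"));  // false
console.log(twoOfEach.test("ABcdef12"));  // true
console.log(twoOfEach.test("ABcdef12345678901")); // false: 17 characters exceeds {8,16}
```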
The problem is that this character ^ means something like 'Right on start'. It means that these specific characters SHOULD BE strictly at the start of text you're searching in, which is not what you want.
Your expression will not work as you want it to.
Because of the lookaheads, both instances of (?=.*\d) will actually match the same digit, thus validating passwords with only one digit.
This should work:
^(?=(.*\d){2})(?=(.*[a-z]){2})(?=(.*[A-Z]){2}).{8,16}$
The difference between (?=.*\d) and (?=\d) is that, while they are both zero-width lookaheads, the former will match if there is a digit anywhere in the string (after the current location), but the latter will match only if that digit is immediately after the current location. So, that first regex looks for 8-16 characters, including one digit, lowercase, and uppercase each. The second regex requires the first character to be a digit, and a lowercase, and an uppercase, which is absurd. If you want to match two digits, then instead of (?=.*\d)(?=.*\d), do (?=.*\d.*\d).
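A short sketch of that difference, using invented variable names and sample strings:

```javascript
const anywhere = /^(?=.*\d)/;   // a digit somewhere after the start
const immediately = /^(?=\d)/;  // a digit as the very first character

console.log(anywhere.test("abc1"));    // true
console.log(immediately.test("abc1")); // false
console.log(immediately.test("1abc")); // true

// Without .*, all three lookaheads inspect the same first character,
// which cannot be a digit, a lowercase and an uppercase letter at once.
const impossible = /^(?=\d)(?=[a-z])(?=[A-Z])/;
console.log(["1aA", "a1A", "A1a"].some((s) => impossible.test(s))); // false
```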
