regex lookahead confusion [duplicate] - javascript

I want a regex which returns true when there are at least 5 characters and 2 digits. For that, I use lookaheads (i.e. (?=...)).
// this one works
let pwRegex = /(?=.{5,})(?=\D*\d{2})/;
let result = pwRegex.test("bana12");
console.log("result", result) // true
// this one won't
pwRegex = /(?=.{5,})(?=\d{2})/;
result = pwRegex.test("bana12");
console.log("result", result) // false
Why do we need to add \D* to make it work?
To me, \d{2} looks looser than \D*\d{2}, so shouldn't it accept everything the first pattern accepts?

Your lookaheads only test from the current match position, and both must succeed at the same position. Since you don't consume anything, the first candidate position is the start of the string. Since bana12 doesn't start with two digits, \d{2} fails there; at the later position where 12 does begin, fewer than 5 characters remain, so .{5,} fails. It's as simple as that ;)
Also, note that having \d{2} means your digits have to be adjacent. Is that your intention?
To simply require 2 digits that don't need to be adjacent, try
/(?=.{5,})(?=\D*\d\D*\d)/
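To see the difference between the adjacent and non-adjacent variants, here is a small sketch. The sample strings and variable names are made up for illustration and are not from the question.

```javascript
// Two variants from the answer above; sample inputs are hypothetical.
const adjacentDigits = /(?=.{5,})(?=\D*\d{2})/;  // two digits side by side
const anyTwoDigits = /(?=.{5,})(?=\D*\d\D*\d)/;  // two digits, not necessarily adjacent

const samples = ["bana12", "b1ana2", "bana1", "ab12"];
const report = samples.map((s) => ({
  input: s,
  adjacent: adjacentDigits.test(s),
  anywhere: anyTwoDigits.test(s),
}));

// "bana12": both true; "b1ana2": only anywhere is true;
// "bana1" (one digit) and "ab12" (too short): both false.
console.log(report);
```

Only "b1ana2" separates the two patterns, because its digits are not next to each other.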

Note that lookaheads are zero-width assertions and when their patterns are matched the regex index stays at the same place where it has been before. The lookaheads in the patterns above are executed at the same locations.
The /(?=.{5,})(?=\d{2})/ pattern will match a location that has any 5 chars other than line break chars immediately to the right of the current location and the first 2 chars in this 5 char substring are digits.
You need to add \D* to allow other types of chars before the 2 digits.
See more about that behavior at Lookarounds Stand their Ground.
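A quick way to observe the zero-width behaviour is to look at what exec returns. This is only a sketch; the variable names and the "12bana" input are my own choices.

```javascript
const loose = /(?=.{5,})(?=\D*\d{2})/;
const strict = /(?=.{5,})(?=\d{2})/;

// The match is empty and sits at index 0: nothing was consumed.
const m = loose.exec("bana12");
console.log(m.index, JSON.stringify(m[0])); // 0 ""

// Both lookaheads start at the same position, so the strict pattern
// only succeeds where two digits begin AND 5+ characters follow.
console.log(strict.test("12bana")); // true
console.log(strict.test("bana12")); // false ("12" starts where only 2 chars remain)
```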

Related

Regex with lowercase, uppercase, alphanumeric, special characters and no more than 2 identical characters in a row with a minimum length of 8 chars [duplicate]

I'm trying to create a regex that allows the 4 main character types (lowercase, uppercase, alphanumeric, and special chars) with a minimum length of 8 and no more than 2 identical characters in a row.
I've tried searching for a potential solution and piecing together different regexes but no such luck! I was able to find this one on Owasp.org
^(?:(?=.*\d)(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[^A-Za-z0-9])(?=.*[a-z])|(?=.*[^A-Za-z0-9])(?=.*[A-Z])(?=.*[a-z])|(?=.*\d)(?=.*[A-Z])(?=.*[^A-Za-z0-9]))(?!.*(.)\1{2,})[A-Za-z0-9!~<>,;:_=?*+#."&§%°()\|\[\]\-\$\^\#\/]{8,32}$
but it uses at least 3 out of the 4 different characters when I need all 4. I tried modifying it to require all 4 but I wasn't getting anywhere. If someone could please help me out I would greatly appreciate it!
Can you try the following?
var strongRegex = new RegExp("^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!##\$%\^&\*])(?=.{8,})");
Explanations
RegEx Description
(?=.*[a-z]) The string must contain at least 1 lowercase alphabetical character
(?=.*[A-Z]) The string must contain at least 1 uppercase alphabetical character
(?=.*[0-9]) The string must contain at least 1 numeric character
(?=.*[!##\$%\^&\*]) The string must contain at least one special character, but we are escaping reserved RegEx characters to avoid conflict
(?=.{8,}) The string must be eight characters or longer
or try with
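(?=.{8,100}$)(([a-z0-9])(?!\2))+$ This part limits the length to 8-100 characters and uses a negative lookahead to reject any character that is immediately followed by an identical one. As written, the consuming class [a-z0-9] excludes the uppercase and special characters that the earlier lookaheads require, so it has to be widened to the full allowed set before the combined pattern can match anything.
// In a string passed to new RegExp, the backreference must be written \\2; "\2" would not reach the regex engine as a backreference.
var strongerRegex = new RegExp("^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!##\$%\^&\*])(?=.{8,100}$)(([a-z0-9])(?!\\2))+$");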
reference
I think this might work for you (note: the approach was inspired by the solution to this SO question).
/^(?:([a-z0-9!~<>,;:_=?*+#."&§%°()|[\]$^#/-])(?!\1)){8,32}$/i
The regex basically breaks down like this:
// start the pattern at the beginning of the string
/^
// create a "non-capturing group" to run the check in groups of two
// characters
(?:
// start capturing the first character in the pair
(
// Make sure that it is *ONLY* one of the following:
// - a letter
// - a number
// - one of the following special characters:
// !~<>,;:_=?*+#."&§%°()|[\]$^#/-
[a-z0-9!~<>,;:_=?*+#."&§%°()|[\]$^#/-]
// end the capture of the first character in the pair
)
// start a negative lookahead to be sure that the next character
// does not match whatever was captured by the first capture
// group
(?!\1)
// end the non-capturing group
)
// make sure that there are between 8 and 32 valid characters in the value
{8,32}
// end the pattern at the end of the string and make it case-insensitive
// with the "i" flag
$/i
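To check what this pattern accepts in practice, here is a small sketch with made-up inputs. Note two things it suggests: the pattern rejects any two identical neighbours (not just three in a row), and it does not by itself require all four character types.

```javascript
// The pattern from the answer above, unchanged.
const noAdjacentRepeat = /^(?:([a-z0-9!~<>,;:_=?*+#."&§%°()|[\]$^#/-])(?!\1)){8,32}$/i;

// Hypothetical sample inputs, chosen for illustration.
const checks = {
  abcdefgh: noAdjacentRepeat.test("abcdefgh"),    // true: 8 allowed chars, no repeats
  aabcdefg: noAdjacentRepeat.test("aabcdefg"),    // false: "aa"
  aAbcdefg: noAdjacentRepeat.test("aAbcdefg"),    // false: the i flag makes \1 match "A" after "a"
  abcdefg: noAdjacentRepeat.test("abcdefg"),      // false: only 7 chars
  withSpace: noAdjacentRepeat.test("abc defgh"),  // false: space is not in the class
};
console.log(checks);
```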
You could use positive lookaheads based on the principle of contrast: a negated character class matches 0+ characters that are not in the listed set, and then a character from the listed set is matched.
To match no more than 2 identical characters in a row, you could also use a negative lookahead with a capturing group and a backreference \1 to make sure there are not 3 of the same characters in a row.
^(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=[^0-9]*[0-9])(?=[^!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-]*[!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-])(?![a-zA-Z0-9!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-]*([a-zA-Z0-9!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-])\1\1)[a-zA-Z0-9!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-]{8,}$
^ Start of string
(?=[^a-z]*[a-z]) Assert a-z
(?=[^A-Z]*[A-Z]) Assert A-Z
(?=[^0-9]*[0-9]) Assert 0-9
(?= Assert a char that you would consider special
[^!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-]*
[!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-]
)
(?! Assert not 3 times an identical char from the character class in a row
[a-zA-Z0-9!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-]*
([a-zA-Z0-9!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-])\1\1
)
[a-zA-Z0-9!~<>,;:_=?*+#."&§%°()|\[\]$^#\/-]{8,} Match any of the listed 8 or more times
$ End of string
Regex demo
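If a single large pattern is hard to maintain, the same rules can be sketched as several small tests. This is an alternative to the answers above, not a rewrite of them; the function name is hypothetical, and it assumes that "special" means any character that is not a letter or digit, which is looser than the explicit whitelist used above.

```javascript
// Hypothetical helper: all four character types, at least 8 characters,
// and no character repeated three times in a row.
function isStrongPassword(pw) {
  const required = [
    /[a-z]/,        // lowercase
    /[A-Z]/,        // uppercase
    /[0-9]/,        // digit
    /[^A-Za-z0-9]/, // assumption: anything else counts as "special"
    /^.{8,}$/,      // minimum length
  ];
  const tripleRepeat = /(.)\1\1/; // same character three times in a row
  return required.every((re) => re.test(pw)) && !tripleRepeat.test(pw);
}

console.log(isStrongPassword("Abcdef1!"));   // true
console.log(isStrongPassword("Abcdeff1!"));  // true (two in a row is allowed)
console.log(isStrongPassword("Abcdefff1!")); // false (three in a row)
console.log(isStrongPassword("abcdefg1!"));  // false (no uppercase)
```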

How does the following code mean two consecutive numbers?

This is from an exercise on FCC beta, and I cannot understand how the following code means two consecutive numbers, seeing how \D* means zero or more non-digits and \d means a digit. So how does this add up to two numbers in a regexp?
let checkPass = /(?=\w{5,})(?=\D*\d)/;
This does not match two numbers. It doesn't really match anything except an empty string, as the pattern consists only of lookaheads and nothing is consumed.
If you want to match two digits, you can do something like this:
(\d)(\d)
Or if you really want to use a positive lookahead with the (?=\D*\d) section, you will have to do something like this:
\d(?=\D*\d)
This will match any digit that is followed by zero or more non-digits and then another digit. A few examples (matched digits marked with ^):
2 hhebuehi3
^
245673
^^^^^
2v jugn45
^      ^
To also capture the second digit, you will have to put brackets around both numbers. Ie:
(\d)(?=\D*(\d))
Here it is in action.
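As a sketch of how the two capture groups can be read out in JavaScript, matchAll can be used with the g flag. The variable names are my own; the sample string is one of the examples above.

```javascript
// Group 1 is the consumed digit, group 2 the next digit seen by the lookahead.
const digitPair = /(\d)(?=\D*(\d))/g;

const pairs = [..."2v jugn45".matchAll(digitPair)].map((m) => [m.index, m[1], m[2]]);
console.log(pairs); // [[0, "2", "4"], [7, "4", "5"]]
```

The last digit ("5") produces no match because no further digit follows it.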
In order to do what your original example wants, ie:
number
5+ \w characters
a non-number character
a number
... you will need to precede your original example with a \d character. This means that your lookaheads will actually match something which isn't just an empty string:
\d(?=\w{5,})(?=\D*\d)
IMPORTANT EDIT
After playing around a bit more with a JavaScript online console, I have worked out the problem with your original Regex.
This matches a string with 5 or more word characters, including at least 1 digit. This can match two digits, but it can also match 1 digit, 3 digits, 12 digits, etc. In order to require two consecutive digits in a string of 5 or more characters, you should specify the number of digits you want in the second lookahead:
let regex = /(?=\w{5,})(?=\D*\d{2})/;
let string1 = "abcd2";
let regex1 = /(?=\w{5,})(?=\D*\d)/;
console.log("string 1 & regex 1: " + regex1.test(string1)); // true
let regex2 = /(?=\w{5,})(?=\D*\d{2})/;
console.log("string 1 & regex 2: " + regex2.test(string1)); // false
let string2 = "abcd23";
console.log("string 2 & regex 2: " + regex2.test(string2)); // true
My original answer was about Regex in a vacuum and I glossed over the fact that you were using Regex in conjunction with JavaScript, which works a little differently when comparing Regex to a string. I still don't know why your original regex was supposed to match two numbers, but I hope this is a bit more helpful.
(?=...) Positive lookahead
\w matches any word character (equal to [a-zA-Z0-9_])
{5,} matches between 5 and unlimited times
\D matches any character that's not a digit (equal to [^0-9])
* matches between zero and unlimited times
\d matches a digit (equal to [0-9])
The pattern as written has no g flag, so test() only reports whether a match exists somewhere in the string
You can always check your expression using regex101

Regex to match exactly 5 numbers and one optional space

I recently needed to create a regular expression to check input in JavaScript. The input could be 5 or 6 characters long and had to contain exactly 5 numbers and one optional space, which could be anywhere in the string. I am not regex-savvy at all and even though I tried looking for a better way, I ended up with this:
(^\d{5}$)|(^ \d{5}$)|(^\d{5} $)|(^\d{1} \d{4}$)|(^\d{2} \d{3}$)|(^\d{3} \d{2}$)|(^\d{4} \d{1}$)
This does what I need, so the allowed inputs are (if 0 is any number)
'00000'
' 00000'
'0 0000'
'00 000'
'000 00'
'0000 0'
'00000 '
I doubt that this is the only way to achieve such matching with regex, but I haven't found a way to do it in a cleaner way. So my question is, how can this be written better?
Thank you.
Edit:
So, it is possible! Tom Lord's answer does what I needed with regular expressions, so I marked it as a correct answer to my question.
However, soon after I posted this question, I realized that I wasn't thinking right, since every other input in the project was easily 'validatable' with regex, I was immediately assuming I could validate this one with it as well.
Turns out I could just do this:
const validate = function(value) {
  const v = value.replace(/\s/g, '')
  const regex = new RegExp('^\\d{5}$');
  return regex.test(v);
}
Thank you all for the cool answers and ideas! :)
Edit2: I forgot to mention a possibly quite important detail, which is that the input is limited, so the user can only enter up to 6 characters. My apologies.
Note: Using a regular expression to solve this problem might not be the best answer. As answered below, it may be easier to just count the digits and spaces with a simple function!
However, since the question was asking for a regex answer, and in some scenarios you may be forced to solve this with a regex (e.g. if you're tied down to a certain library's implementation), the following answer may be helpful:
This regex matches lines containing exactly 5 digits:
^(?=(\D*\d){5}\D*$)
This regex matches lines containing one optional space:
^(?=[^ ]* ?[^ ]*$)
If we put them together, and also ensure that the string contains only digits and spaces ([\d ]*$), we get:
^(?=(\D*\d){5}\D*$)(?=[^ ]* ?[^ ]*$)[\d ]*$
You could also use [\d ]{5,6} instead of [\d ]* on the end of that pattern, to the same effect.
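The combined pattern can be checked against the inputs listed in the question. The rejected strings below are my own additions for illustration.

```javascript
const fiveDigitsOneSpace = /^(?=(\D*\d){5}\D*$)(?=[^ ]* ?[^ ]*$)[\d ]*$/;

// Allowed inputs from the question.
const accepted = ["00000", " 00000", "0 0000", "00 000", "000 00", "0000 0", "00000 "];
// Hypothetical inputs that should be refused.
const rejected = ["0000", "000000", "00  000", "0000x0", ""];

console.log(accepted.every((s) => fiveDigitsOneSpace.test(s)));  // true
console.log(rejected.some((s) => fiveDigitsOneSpace.test(s)));   // false
```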
Demo
Explanation:
This regular expression is using lookaheads. These are zero-width pattern matchers, which means both parts of the pattern are "anchored" to the start of the string.
\d means "any digit", and \D means "any non-digit".
A literal space character means "space", and [^ ] means "any non-space".
The \D*\d is being repeated 5 times, to ensure exactly 5 digits are in the string.
Note that if you actually wanted the "optional space" to include things like tabs, then you could instead use \s and \S.
Update: Since this question appears to have gotten quite a bit of traction, I wanted to clarify something about this answer.
There are several "simpler" variant solutions to my answer above, such as:
// Only look for digits and spaces, not "non-digits" and "non-spaces":
^(?=( ?\d){5} *$)(?=\d* ?\d*$)
// Like above, but also simplifying the second lookahead:
^(?=( ?\d){5} *$)\d* ?\d*
// Or even splitting it into two, simpler, problems with an "or" operator:
^(?:\d{5}|(?=\d* \d*$).{6})$
Or even, if we can assume that the string is no more than 6 characters then even just this is sufficient:
^(?:\d{5}|\d* \d*)$
So with that in mind, why might you want to use the original solution, for similar problems? Because it's generic. Look again at my original answer, re-written with free-spacing:
^
(?=(\D*\d){5}\D*$) # Must contain exactly 5 digits
(?=[^ ]* ?[^ ]*$) # Must contain 0 or 1 spaces
[\d ]*$ # Must contain ONLY digits and spaces
This pattern of using successive look-aheads can be used in various scenarios, to write patterns that are highly structured and (perhaps surprisingly) easy to extend.
For example, suppose the rules changed and you now wanted to match 2-3 spaces, 1 . and any number of hyphens. It's actually very easy to update the regex:
^
(?=(\D*\d){5}\D*$) # Must contain exactly 5 digits
(?=([^ ]* ){2,3}[^ ]*$) # Must contain 2 or 3 spaces
(?=[^.]*\.[^.]*$) # Must contain 1 period
[\d .-]*$ # Must contain ONLY digits, spaces, periods and hyphens
...So in summary, there are "simpler" regex solutions, and quite possibly a better non-regex solution to OP's specific problem. But what I have provided is a generic, extensible design pattern for matching patterns of this nature.
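JavaScript has no free-spacing mode, but the same "one lookahead per rule" idea can be sketched by assembling the pattern from strings. The buildValidator helper is hypothetical and only illustrates the design; the sample inputs are made up.

```javascript
// Hypothetical helper: each constraint becomes a lookahead anchored at the start,
// followed by the class of characters the whole string may consist of.
function buildValidator(constraints, allowedChars) {
  const lookaheads = constraints.map((c) => `(?=${c})`).join("");
  return new RegExp(`^${lookaheads}${allowedChars}$`);
}

const original = buildValidator(
  [
    String.raw`(?:\D*\d){5}\D*$`, // exactly 5 digits
    String.raw`[^ ]* ?[^ ]*$`,    // 0 or 1 spaces
  ],
  String.raw`[\d ]*`              // only digits and spaces
);

const extended = buildValidator(
  [
    String.raw`(?:\D*\d){5}\D*$`,      // exactly 5 digits
    String.raw`(?:[^ ]* ){2,3}[^ ]*$`, // 2 or 3 spaces
    String.raw`[^.]*\.[^.]*$`,         // exactly 1 period
  ],
  String.raw`[\d .-]*`                 // digits, spaces, periods, hyphens
);

console.log(original.test("00 000"));     // true
console.log(extended.test("1 2 3.45"));   // true
console.log(extended.test("1 2 3 4 5.")); // false (4 spaces)
```

Adding or changing a rule then means editing one array entry rather than rewriting the whole expression.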
I suggest first checking for exactly five digits with ^\d{5}$, OR looking ahead for a single space between digits with ^(?=\d* \d*$) among six characters .{6}$.
Combining those partial expressions yields ^\d{5}$|^(?=\d* \d*$).{6}$:
let regex = /^\d{5}$|^(?=\d* \d*$).{6}$/;
console.log(regex.test('00000')); // true
console.log(regex.test(' 00000')); // true
console.log(regex.test('00000 ')); // true
console.log(regex.test('00 000')); // true
console.log(regex.test('  00000')); // false
console.log(regex.test('00000  ')); // false
console.log(regex.test('00  000')); // false
console.log(regex.test('00 0 00')); // false
console.log(regex.test('000 000')); // false
console.log(regex.test('0000')); // false
console.log(regex.test('000000')); // false
console.log(regex.test('000 0')); // false
console.log(regex.test('000 0x')); // false
console.log(regex.test('0000x0')); // false
console.log(regex.test('x00000')); // false
Alternatively match the partial expressions separately via e.g.:
/^\d{5}$/.test(input) || input.length == 6 && /^\d* \d*$/.test(input)
This seems more intuitive to me and is O(n)
function isInputValid(input) {
  const length = input.length;
  if (length != 5 && length != 6) {
    return false;
  }
  let spaceSeen = false;
  let digitsSeen = 0;
  for (let character of input) {
    if (character === ' ') {
      if (spaceSeen) {
        return false;
      }
      spaceSeen = true;
    }
    else if (/^\d$/.test(character)) {
      digitsSeen++;
    }
    else {
      return false;
    }
  }
  return digitsSeen == 5;
}
You can split it in half:
var input = '0000 0';
// At most one space, and exactly five digits once that space is removed
if (/^[^ ]* ?[^ ]*$/.test(input) && /^\d{5}$/.test(input.replace(/ /, '')))
  console.log('Match');
Here's a simple regex to do the job:
^(?=[\d ]{5,6}$)\d*\s?\d*$
Explanation:
^ asserts position at start of the string
Positive Lookahead (?=[\d ]{5,6}$)
Assert that the Regex below matches
Match a single character present in the list below [\d ]{5,6}
{5,6} Quantifier — Matches between 5 and 6 times, as many times as possible, giving back as needed (greedy)
\d matches a digit (equal to [0-9])
(space) matches the space character literally (case sensitive)
$ asserts position at the end of the string
\d* matches a digit (equal to [0-9])
Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equal to [\r\n\t\f\v ])
\d* matches a digit (equal to [0-9])
Quantifier — Matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
$ asserts position at the end of the string
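One thing worth testing with this pattern: the lookahead only limits the length and the character set, so it does not count the digits. The sketch below shows two inputs that slip through, plus a stricter variant of my own (not from the answer) that adds a second lookahead requiring exactly five digits.

```javascript
const simple = /^(?=[\d ]{5,6}$)\d*\s?\d*$/;

// Assumption: a stricter variant with an extra lookahead that counts the digits.
const stricter = /^(?=[\d ]{5,6}$)(?=\D*(?:\d\D*){5}$)\d* ?\d*$/;

console.log(simple.test("00 000"));   // true
console.log(simple.test("000000"));   // true, although it has six digits
console.log(simple.test("0000 "));    // true, although it has only four digits

console.log(stricter.test("00 000")); // true
console.log(stricter.test("000000")); // false
console.log(stricter.test("0000 "));  // false
```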
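string="12345 ";
// Caveat: parseInt("00000", 10) is 0 (falsy), and parseInt stops at the first non-digit, so "12ab" also passes
if(string.length<=6 && string.replace(/\s/g, '').length<=5 && parseInt(string,10)){
  alert("valid");
}
You could simply check the length and whether it's a valid number...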
This is how I would do it without regex:
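// Note: this requires exactly one space; also compare against "0,5" to accept five digits with no space
string => [...string].reduce(
  ([spaces,digits], char) =>
    [spaces += char == ' ', digits += /\d/.test(char)],
  [0,0]
).join(",") == "1,5";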

javascript regular expression for -999x999

I need a regular expression for:
-[n digits]x[n digits]
I already tried this:
var s = "path/path/name-799x1024.jpg";
s.replace(/\d/g, "");
But this removes only the digits.
Here is a small jsfiddle: http://jsfiddle.net/aq6dp49n/
The outcome I am trying to get is:
path/path/name.jpg
How do I add the - and the small x between the two numbers?
The regular expression that would match that is /-\d+x\d+/. Hence:
s.replace(/-\d+x\d+/, "")
Should work.
Here's what the regex means: the first - tells it that it should look for a - character. Then you have \d+ which means "one or more of \d", where \d is short-hand for the character class [0-9], i.e., all digits. After that you have x, which means it will look for the character x, and finally you have \d+ again, which is the same as before.
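Here is a short sketch of that replacement wrapped in a function. The stripSize name and the extra lookahead, which restricts the match to a size suffix sitting directly before the file extension, are my own assumptions and not part of the answer.

```javascript
// Plain version from the answer: removes the first "-<digits>x<digits>".
const s = "path/path/name-799x1024.jpg";
console.log(s.replace(/-\d+x\d+/, "")); // "path/path/name.jpg"

// Hypothetical variant: only strip a size that is directly followed by the extension.
function stripSize(path) {
  return path.replace(/-\d+x\d+(?=\.[^./]+$)/, "");
}

console.log(stripSize("path/path/name-799x1024.jpg"));       // "path/path/name.jpg"
console.log(stripSize("path/path/name-10x12-799x1024.jpg")); // "path/path/name-10x12.jpg"
console.log(stripSize("path/path/name.jpg"));                // unchanged
```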
To match
-[n digits]x[n digits]
You would want
match(/-[0-9]{n}x[0-9]{n}\b/)
Though if you want an arbitrary (one or more) number of digits, use + in place of {n}. In the case of your example, you'd want 3 and 4 for your values of n.
Here's a step-by-step explanation of what this does:
/-[0-9]{3}x[0-9]{4}\b/
- matches the character - literally
[0-9]{3} match a single character present in the list below
Quantifier: {3} Exactly 3 times
0-9 a single character in the range between 0 and 9
x matches the character x literally (case sensitive)
[0-9]{4} match a single character present in the list below
Quantifier: {4} Exactly 4 times
0-9 a single character in the range between 0 and 9
\b assert position at a word boundary (^\w|\w$|\W\w|\w\W)
To remove the last size-like part of a string, this should do:
"path/path/name-799x1024.jpg".replace(/(.*)-[0-9]+x[0-9]+/, "$1");
// "path/path/name.jpg"
"path/path/name-10x12-799x1024.jpg".replace(/(.*)-[0-9]+x[0-9]+/, "$1");
// "path/path/name-10x12.jpg"
This takes advantage of the fact that quantifiers are greedy by default, so the (.*) absorbs (and saves) as much preceding text as possible, and only the last size-like part is removed.
(I prefer to use [0-9] in place of \d because it's more explicit. In some regex flavors \d also matches non-Latin numerals, although in JavaScript \d is equivalent to [0-9], so here it doesn't matter.)
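To make the effect of the greedy (.*) visible, it can be compared with a lazy (.*?) on the same input. This is just an illustration; the variable names are mine.

```javascript
const name = "path/path/name-10x12-799x1024.jpg";

// Greedy: (.*) grabs as much as it can, so the LAST size-like part is removed.
const greedy = name.replace(/(.*)-[0-9]+x[0-9]+/, "$1");
console.log(greedy); // "path/path/name-10x12.jpg"

// Lazy: (.*?) grabs as little as it can, so the FIRST size-like part is removed.
const lazy = name.replace(/(.*?)-[0-9]+x[0-9]+/, "$1");
console.log(lazy); // "path/path/name-799x1024.jpg"
```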

Regex improvement match

I need some help to improve a regex!
In JavaScript I have a regular expression which looks for pairs of numbers in a filename
var nums = str.match(/[\d]{1,}[\d]{1,}/gi);
This will match
DV_Banner_1200x627.jpg
DV_Banner_1200y627.jpg
DV_Banner_1200 x 627.jpg
DV_Banner_1200 x627.jpg
DV_Banner_1200 627.jpg
with (1200,627)
I have tried to improve the regex, just in case there are more than two pairs of numbers, to look for the following
number(1 digit or more) + whitespace(1 or more) + x (zero or once) + whitespace(1 or more) + number(1 digit or more)
Which should fail on the second example (using a 'y' instead of an 'x'), which I thought would be:
[\d]{1,}[\s]?[x]?[\s]?[\d]{1,}
but it grabs all the digits in
DV_Banner_1200 x 627 01.jpg
with (1200,627,01) whereas I only want the first two numbers. I've written the code to deal only with the first two, but I was wondering where I was going wrong. Only a level 17 regex wizard can save me now! Thanks
I used \d+\s?x?\s?\d+ as my regex (same thing, just replacing {1,} with + and removing the unnecessary []). You can see the outcome of it here.
The reason it's matching the 01 is because of all the ?. So it's matching the first \d+ (1 digit: 0), and then 0 of \s, 0 of x, and 0 of \s followed by \d+ (another 1 digit: 1)
The regex
(\d+)(?:\s?x\s?|\s)(\d+)
should do the trick. Test it here
(?:...) is a non-capture group. So it allows alternation while not assigning a back reference to it. This part matches the characters in between the two numbers (either has an x or a <space>).
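A small sketch of how this pattern behaves on the filenames from the question. The firstPair helper is hypothetical; it reads the two capture groups of the first match only.

```javascript
const dimensions = /(\d+)(?:\s?x\s?|\s)(\d+)/;

// Hypothetical helper: returns the first width/height pair, or null if none is found.
function firstPair(filename) {
  const m = filename.match(dimensions);
  return m ? [Number(m[1]), Number(m[2])] : null;
}

console.log(firstPair("DV_Banner_1200x627.jpg"));      // [1200, 627]
console.log(firstPair("DV_Banner_1200 x627.jpg"));     // [1200, 627]
console.log(firstPair("DV_Banner_1200 627.jpg"));      // [1200, 627]
console.log(firstPair("DV_Banner_1200 x 627 01.jpg")); // [1200, 627]
console.log(firstPair("DV_Banner_1200y627.jpg"));      // null
```

Without the g flag, match stops at the first pair, so the trailing 01 is never looked at.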
Just try with following regex:
(\d+)(?:(?: ?x ?)| )(\d+)
demo
You say you want "one or more" whitespace characters around the "x", but you have used the ? quantifier which means "zero or one". Thus, because you've also marked the "x" as optional, it will match any number with two or more digits: your first [\d]{1,} will match against 0, then your second one will match on 1.
Note that you do not need to enclose single atoms in a character class: [\d] can be more simply written as \d. Also {1,} -- meaning "one or more" -- is more easily encoded as +.
As you want "one or more" whitespace character on either side of the "x", I would go with:
\d+(?:(?:\s+x\s+)|\s+)\d+
Note that (?: ... ) is a "non-capture group", so these bits won't form part of your match array. However, I don't think you want "one or more" whitespace character, as that won't match your first example. Instead, try this:
\d+(?:(?:\s*x\s*)|\s+)\d+
Where the * quantifier means "zero-or-more".
