Regexp methodology for password expression

I have been using this regexp to verify specific password requirements are met:
$scope.userObj.user_password.match(/^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#\$%\^&\*])(?=.{8,})/)
It's been working fine, but I just encountered not one but two users in the same day who tried to set passwords with a period (.) in them. It never occurred to me that users would do that; the funny thing about users is that they always find ways to do things you never thought of. Anyway, I also thought the period was defined as a normal character and not as a special character, so why isn't the above validating as a good password if a period is used?
Second, obviously the above isn't working, so to make it work I need to modify the special symbols part to (I think) the following:
(?=.*[!@#\$%\^&\*\.\,\(\)\-\+\=])
In my DB, I encrypt the password with PHP SHA512 and then save that into a standard mysql schar(128) column.
A: Will this suffice for my regexp to properly include periods? The use of periods also makes me wonder if I need to include other standard keyboard symbols like , ( ) - + = etc. (also included in the new regexp).
B: And then, how far down the rabbit hole do you go? Are ~ and ` and [, ], {, }, \, | characters that should be considered too? Is there a better way of defining them all without having to list them individually?
C: Considering how I store the password and that I would be allowing all these extra special characters, are there any specific issues or security problems I need to be aware of, or things I should avoid?

Answer to A: If you test your very first regex (https://www.phpliveregex.com/) you will see that it already accepts periods, because the .* inside every positive lookahead (?=...) matches any character and nothing else in the pattern restricts what the password may contain. A period just does not count as the required special character, so a password is rejected only when the period is the only symbol in it.
Your regex does ensure that the input has AT LEAST one lowercase letter AND one uppercase letter AND one digit AND one special character AND a minimum of eight characters, but it also accepts everything else.
In other words, your regex is a good 'white list' of what must be present, but you have no 'black list' of what must not be. You should consider adding another test that accepts just the characters you want.
Like: inputValue = inputValue.replace(/[^a-zA-Z0-9\!\.\#\$\^]/g, "");
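To make the point above concrete, here is a minimal sketch comparing the lookahead-only pattern from the question with a stricter variant. The names lookaheadsOnly and allowedOnly, and the exact set of permitted symbols in the second pattern, are assumptions made for illustration only.

```javascript
// Pattern from the question: only checks what must be PRESENT.
const lookaheadsOnly =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*])(?=.{8,})/;

// Hypothetical stricter variant: the same kind of lookaheads (with the
// period added to the symbol class), plus an anchored character class
// that restricts which characters may appear at all.
const allowedOnly =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*.])[A-Za-z0-9!@#$%^&*.]{8,}$/;

// A period is accepted as long as another listed symbol is present...
console.log(lookaheadsOnly.test("Passw.rd1!")); // true
// ...but it does not count as the required symbol on its own.
console.log(lookaheadsOnly.test("Passw.rd1"));  // false
// The stricter variant counts the period and rejects unlisted characters.
console.log(allowedOnly.test("Passw.rd1"));     // true
console.log(allowedOnly.test("Passw rd1!"));    // false (space not allowed)
```

Rejecting a password with a clear message is usually friendlier than silently stripping characters from it, which is why the sketch tests the input instead of rewriting it.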
Answer to B: As for which characters you accept in a password, there is no rule. You are the admin and you can accept whatever you want, as long as you properly sanitize/escape your inputs before parsing and embedding them.
If you have a big list of special characters, you will end up with big regexes. But you don't need a big list; it can be as simple as ! . # $ ^
Answer to C: Security issues with user input happen when you execute SQL commands, or PHP echo/print, with user input that has not been sanitized or properly escaped.
Here is a really good answer about the subject https://stackoverflow.com/a/130323/10677269
Finally, you should consider the comments above. SHA512 is not encryption (it's a digest algorithm) and should not be used to store passwords in your database unless you 'salt' them (even then, a dedicated password-hashing function such as bcrypt or Argon2, which PHP exposes through password_hash(), is a better option).
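As a rough illustration of salted password hashing, here is a minimal Node.js sketch using the built-in crypto module's scrypt function. The question's backend is PHP, where password_hash() and password_verify() fill the same role; the function names hashPassword and verifyPassword and the "salt:hash" storage format are assumptions made for this example.

```javascript
const crypto = require('crypto');

// Hash a password with a fresh random salt using scrypt, a deliberately
// slow key-derivation function. Returns "salt:hash" in hex for storage.
function hashPassword(password) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, 64);
  return salt.toString('hex') + ':' + hash.toString('hex');
}

// Recompute the hash with the stored salt and compare in constant time.
function verifyPassword(password, stored) {
  const [saltHex, hashHex] = stored.split(':');
  const expected = Buffer.from(hashHex, 'hex');
  const actual = crypto.scryptSync(
    password, Buffer.from(saltHex, 'hex'), expected.length);
  return crypto.timingSafeEqual(actual, expected);
}

const stored = hashPassword('Passw.rd1!');
console.log(verifyPassword('Passw.rd1!', stored)); // true
console.log(verifyPassword('wrong-guess', stored)); // false
```

Because the password is hashed rather than interpolated into SQL or HTML, the set of special characters it contains does not matter to the storage step.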

Related

Regex: How to validate that the domain part of an email is not all numeric

I am struggling with one issue: I need to verify that the domain part of an email address is not all numeric.
For example:
abc@123.com -> invalid
abc@1abc.com -> valid
Regex:
^(?=(.{1,64}@.{1,255}))((?!.*?[._]{2})[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9}]{1,64}(\.[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9]{0,}(?<!\.)){0,})@((\[(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\])|((?!-)(?=.*[a-zA-Z])[a-zA-Z0-9-]{1,63}(?<!-)(\.(?!-)[a-zA-Z0-9-]{1,63}(?<!-)){1,}))$
The above regex needs modification rather than replacement, because it already handles several other validations correctly. The only thing still pending is to validate that the domain part is not all numeric.
Updated:
After some research on the above regex, I am able to segregate emails into different groups. Now, for group 10, I need to add validation for whether all characters in the group 10 string are alphanumeric.
Regex:
^(?=(.{1,64}@.{1,255}))((?!.*?[._]{2})[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9}]{1,64}(\.[!#$%&'*+\-\/=?\^_`{|}~a-zA-Z0-9]{0,}(?<!\.)){0,})@((\[(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\])|((?!-)(?=.*[a-zA-Z])((?:.*[a-zA-Z0-9]))[a-zA-Z0-9-]{0,63}(?<!-)(\.(?!-)[a-zA-Z0-9-]{1,63}(?<!-)){1,}))$
Explore the regex on: https://regex101.com/
TIA

There's no point in doing this. The fact that an email fulfills the requirements set forth in RFC5322 does not mean it's a valid email address: the only way to know that is to send an email to it, and have the user reply to it, follow a link inside it, or copy a code/token inside it.
Given that you have to do that anyway, that step will also pick up any issues with invalid email addresses. Thus, the correct validation for email is:
Pattern.compile("^.+@.+\\..+$")
(Assuming you don't want single
and this does what you want, which is, filter out obvious incorrect entries, and that's all you need.
If you insist on continuing your mistake, there's always emailregex.com, which has the regex and explains how it works.
NB: Note that you're just wrong. 12345@678.cde can easily be valid. The com registry may not allow you to register a domain that consists solely of digits, but that's not an inherent limitation of the DNS system: domain parts can be all numbers. The top-level domain cannot be, at least for now, but any other part of it can be. Thus, rejecting foo@123.com is only possible if you program in, on a per-TLD basis, the exact rules. That also means you need to sign up to the mailing list of every TLD operator to check for any changes they make. You'll be updating that regex every other week. Told you it's a silly thing to want to do!
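For readers working in JavaScript rather than Java, the loose check suggested in this answer can be sketched as follows. The function name looksLikeEmail is an assumption; the pattern is a direct transliteration of the one above and is only meant to filter out obviously wrong entries before a confirmation email is sent.

```javascript
// Something, an @ sign, something, a dot, something. Nothing more.
const plausibleEmail = /^.+@.+\..+$/;

// Returns true when the value is worth sending a confirmation email to.
function looksLikeEmail(value) {
  return plausibleEmail.test(value);
}

console.log(looksLikeEmail("abc@123.com"));   // true
console.log(looksLikeEmail("12345@678.cde")); // true
console.log(looksLikeEmail("abc"));           // false
console.log(looksLikeEmail("abc@com"));       // false
```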

You can use this to detect the invalid ones:
^\w+([-+.']\w+)*+@\d+\.com
Just change the .com to whichever suffix you like.

Custom - Javascript regular expression to validate custom email addresses

I want to correct my JavaScript regex pattern so that it validates the email address scenarios below:
msekhar@yahoo.com
msekhar@cs.aau.edu
ms.sekhar@yahoo.com
ms_sekhar@yahoo.com
msekhar@cs2.aau.edu
msekhar@autobots.ai
msekhar@interior.homeland1.myanmar.mm
msekhar1922@yahoo.com
msekhar#21@autobots.com
\u001\u002@autobots.com
I have tried the following regex pattern, but it does not validate all of the above scenarios:
/^[_a-z0-9]+(\.[_a-z0-9]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/
Could anyone please help me see what I am doing wrong?

The following regex should do:
^(([^<>()\[\]\.,;:\s@\"]+(\.[^<>()\[\]\.,;:\s@\"]+)*)|(\".+\"))@(([^<>()[\]\.,;:\s@\"]+\.)+[^<>()[\]\.,;:\s@\"]{2,})$
Test it: https://regex101.com/r/7gH0BR/2
EDIT: I have added all your test cases

I have always used this one, but note that it doesn't trigger on escaped Unicode:
^([\w\d._\-#])+@([\w\d._\-#]+[.][\w\d._\-#]+)+$
You can see how it works here: https://regex101.com/r/caa7b2/4

First off, [_a-z0-9]+ is going to match the username field for the majority of those test cases. Any further testing of the username field's content will result in a mismatch. If you write a pattern that expects two .-delimited fields, it'll match when you provide two .-delimited fields and only then, not anything else. Make a mental note of that. I think you probably meant to put the . in the first character set, and omit this part here: (\.[_a-z0-9]+)...
As for the domain part of the email address, it's a similar story: if you're trying to match domains containing two labels (yahoo and com) against a pattern that expects three, it's going to fail because there's one label fewer, right? There are also domain names containing only one label that you might want to recognise in email addresses, like localhost...
You know, there is a point to where you can dig yourself down a very deep rabbit hole trying to parse email addresses, much to the effect of this question and answer sequence. If you're making this complex using regular expressions... I think maybe a better tool is a proper parser generator... otherwise, write the following:
A pattern that matches anything up until an @ character
A pattern that matches the @ character (this will help you learn how to avoid your .-related error)
A pattern that matches everything (this will help you understand your .-related error)
Combine the three above in the order presented.
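The three steps above can be sketched in JavaScript as follows. The variable names are assumptions, and the result is intentionally permissive: it only checks the overall shape of an address and is not a full validator.

```javascript
// 1. Anything up until an @ character (one or more non-@ characters).
const localPart = '[^@]+';
// 2. The @ character itself.
const atSign = '@';
// 3. Everything after it (one or more of any character).
const domainPart = '.+';

// 4. Combine the three in the order presented, anchored at both ends.
const emailShape = new RegExp('^' + localPart + atSign + domainPart + '$');

console.log(emailShape.test('msekhar@yahoo.com')); // true
console.log(emailShape.test('msekhar@localhost')); // true
console.log(emailShape.test('msekhar'));           // false
console.log(emailShape.test('@yahoo.com'));        // false
```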

How to make sure that the user can only submit specific pattern while inserting a value into an input? [duplicate]

This question already has an answer here:
html password regular expression validation
(1 answer)
Closed 5 years ago.
I was wondering: if I have a form containing inputs for which I want the user to be able to submit only a certain type of value (for example, making sure the password contains at least a capital letter, a number, a symbol, and at least 8 characters), how can I enforce that even if JavaScript is disabled by the user?

Brief
You'll want to minimize the checking on the client side. Any checking done at this point is pretty useless where security and/or validation is concerned. I would suggest doing a simple validation (such as minimum length) but nothing else, as any method you try client-side can easily be circumvented.
Doing all your validation server-side prevents users from editing client-side code or disabling JavaScript to prevent validation. As an added bonus, if you do everything server-side (and use minimal validation client-side) it increases maintainability since you're only defining your patterns once and you don't have to worry about compatibility across multiple regex engines (which is a pain).
For example, Unicode property classes (such as \p{L}) allow you to specify groups of Unicode characters. These are fantastic when you're talking about ensuring your program works well with multiple languages (e.g. French and the inclusion of characters such as é), but they were not available in HTML or JavaScript when this was written (JavaScript only gained them in ES2018, under the u flag)!
You should:
Define the pattern once (coders don't like duplication)
Do the validation server-side (forget about true validation client-side, anything you implement at this step can easily be bypassed). KISS
When you're talking about password validation don't limit the characters to specific ranges (as your pattern would client-side using something like [A-Z]). You may think this increases password strength, but it may actually do exactly the opposite. Instead, allow users to use special characters as well (it's simple but using Ä is more secure than A).
Code
Client-Side
(?=.*[A-Z])(?=.*\d)(?=.*[^\w_]).{8,}
Although, honestly, I'd suggest simply using .{8,} and doing the checks solely on the server-side.
<form action="">
<input type="text" pattern="(?=.*[A-Z])(?=.*\d)(?=.*[^\w_]).{8,}" title="Must contain at least one uppercase letter, number and symbol, and at least 8 or more characters"/>
<input type="submit"/>
</form>
Server-Side
See regex in use here
^(?=.*\p{Lu})(?=.*\p{N})(?=.*[^\p{L}\p{N}\p{C}]).{8,}$
Usage
Where $str in the code below is the submitted password
// PCRE patterns need delimiters; the u modifier enables UTF-8 matching for \p{...}
$re = '/^(?=.*\p{Lu})(?=.*\p{N})(?=.*[^\p{L}\p{N}\p{C}]).{8,}$/u';
if (preg_match($re, $str)) {
    // Valid password
} else {
    // Invalid password - provide user feedback and allow them to try again
}
Explanation
The HTML regex is just a simpler variation of the regex below (without using Unicode classes). I would, once again, suggest using .{8,} for the pattern in HTML and let PHP do the actual password validation.
^ Assert position at the start of the line
(?=.*\p{Lu}) Positive lookahead ensuring at least one uppercase Unicode character exists
(?=.*\p{N}) Positive lookahead ensuring at least one Unicode number exists
(?=.*[^\p{L}\p{N}\p{C}]) Positive lookahead ensuring at least one character that isn't a letter, number, or control character exists (includes punctuation, symbols, separators, marks)
.{8,} Match any character 8 or more times
$ Assert position at the end of the line
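For server-side code written in JavaScript (for example under Node.js), the same pattern can be sketched with ES2018 Unicode property escapes, which require the u flag. The function name isStrongPassword is an assumption; the pattern itself is the one explained above.

```javascript
// Same rules as the PHP pattern: one uppercase letter, one number, one
// character that is not a letter, number, or control character, and a
// minimum of 8 characters. The u flag is required for \p{...}.
const strongPassword =
  /^(?=.*\p{Lu})(?=.*\p{N})(?=.*[^\p{L}\p{N}\p{C}]).{8,}$/u;

function isStrongPassword(password) {
  return strongPassword.test(password);
}

console.log(isStrongPassword("Ärger123!")); // true (Ä counts as uppercase)
console.log(isStrongPassword("ärger123!")); // false (no uppercase letter)
console.log(isStrongPassword("Ärger1234")); // false (no symbol)
console.log(isStrongPassword("Ab1!"));      // false (too short)
```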

This is not simple to answer as it is written but here is the idea.
First, check client-side using JavaScript: match the value against the desired pattern before allowing the submit. There are a handful of libraries out there if you don't want to puzzle it out yourself.
Second, and to satisfy the no javascript issue, check server-side. The user may have gotten past your form with faulty data but a server-side check will ensure that it matches what you like before you actually make a change to your database.

How to approach string length constraints when localization is brought into the equation?

Once there was a search input.
It was responsible for filtering data in a table based on user input.
But this search input was special: it would not do anything unless a minimum of 3 characters was entered.
Not because it was lazy, but because it didn't make sense otherwise.
Everything was good until a new and strange (compared to English) language came to town.
It was Japanese and now the minimum string length of 3 was stupid and useless.
I lost the last few pages of that story. Does anyone remember how it ends?

In order to fix the issue, you obviously need to determine whether the user's input belongs to certain script(s). The most obvious way to do this is to use Unicode regular expressions:
var regexPattern = "[\\p{Katakana}\\p{Hiragana}\\p{Han}]+";
The only issue is that JavaScript engines older than ES2018 do not support this kind of regular expression out of the box. Anyway, you are lucky: there is a JS library called XRegExp, and its Scripts add-on seems to be exactly what you need. Now, the question is whether you want to require at least three characters for non-Japanese or non-Chinese users, or do it the other way round and require at least three characters for certain scripts (Latin, Common, Cyrillic, Greek and Hebrew) while allowing any other script to be searched on one character. I'd suggest the second solution:
if (XRegExp('[\\p{Latin}\\p{Common}\\p{Cyrillic}\\p{Greek}\\p{Hebrew}]+').test(input)) {
// test for string length and call AJAX if the string is long enough
} else {
// call AJAX search method
}
You might want to pre-compile the regular expression for better performance, but that's basically it.
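In engines that support ES2018, a similar check can be sketched without a library by using native script property escapes. This sketch takes the opposite route from the XRegExp snippet above: it lowers the minimum to one character when the input contains Han, Hiragana, or Katakana. The function names and the thresholds 1 and 3 are assumptions for illustration.

```javascript
// Pre-compiled once; the u flag is required for \p{Script=...}.
const cjkScripts =
  /[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}]/u;

// Minimum number of characters before a search is worth running.
function minSearchLength(input) {
  return cjkScripts.test(input) ? 1 : 3;
}

// Count code points (not UTF-16 units) so astral characters count once.
function shouldSearch(input) {
  const text = input.trim();
  return Array.from(text).length >= minSearchLength(text);
}

console.log(shouldSearch("ab"));  // false
console.log(shouldSearch("abc")); // true
console.log(shouldSearch("猫"));  // true
console.log(shouldSearch(""));    // false
```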

I guess it mainly depends on where you get that min length variable from. If it's hardcoded, you'd probably better use a dynamic internationalization module:
int.getMinStringLength(int.getCurrentLanguage())
Either you have a dynamic bindings framework such as AngularJS, or you update that module when the user changes the language.
Now maybe you'd want to sort your supported languages by using grouping attributes such as "verbose" and "condensed".
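A minimal sketch of such a lookup is shown below. The table contents, the default of 3, and the function signature are hypothetical; the int module named above is the answer's own placeholder, and a real application would load these values from its localization data.

```javascript
// Hypothetical per-language settings; the values are illustrative only.
const MIN_SEARCH_LENGTH = { ja: 1, zh: 1 };
const DEFAULT_MIN_SEARCH_LENGTH = 3;

// Accepts a language tag such as "ja" or "en-US" and looks up the
// minimum by its primary subtag, falling back to the default.
function getMinStringLength(languageTag) {
  const primary = String(languageTag).toLowerCase().split('-')[0];
  return Object.prototype.hasOwnProperty.call(MIN_SEARCH_LENGTH, primary)
    ? MIN_SEARCH_LENGTH[primary]
    : DEFAULT_MIN_SEARCH_LENGTH;
}

console.log(getMinStringLength("ja-JP")); // 1
console.log(getMinStringLength("en-US")); // 3
```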

How can I remove escaping from a RegExp pattern?

I'm trying to simplify input for a particular regex for my users. A simple example of the regex might be
\b(C|C\+\+|Java)\b
I'm now giving the user the option of appending another branch at the end of the regex by entering the raw string into an <input type="text"> field. The branch will be interpreted literally, so I need to escape it. I've used https://stackoverflow.com/a/2593661/785663 to get RegExp.quote to do this. I then store the complete regex in a database.
Now, when I retrieve the regex from the database, split it back up, and display the branches to the user, I need to remove all the escape characters again. Is there some pre-made function for this or do I need to roll my own?
Yes, I know I ought to replace this with a list of strings to search for. But this is only a part of a larger (regex based) picture.

The optimal solution is to change your design: store the unescaped regex, then only escape it when you actually use it. That way you don't have to worry about this messy business of converting it back and forth all the time.
If you use this regex a lot and are worried about the overhead of having to escape it all the time, then store both the unescaped and escaped versions. Update both whenever the user makes a change.
p.s. Allowing user-entered regexes may make your site vulnerable to attack. (Update: Though in this case it is less likely to be a problem, since you are only allowing literal strings)
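The design suggested in this answer can be sketched as follows: keep the raw, unescaped branches as the stored data and escape them only when the regex is built. The names escapeRegExp and buildPattern are assumptions, and the array stands in for whatever the database returns.

```javascript
// Escape every character that has a special meaning in a regex.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build the alternation from raw branches at the moment of use.
function buildPattern(branches) {
  return new RegExp('\\b(' + branches.map(escapeRegExp).join('|') + ')\\b');
}

// Stored form: plain strings, exactly as the user typed them.
const branches = ['C', 'C++', 'Java'];
const pattern = buildPattern(branches);

console.log(pattern.source);             // \b(C|C\+\+|Java)\b
console.log(pattern.test('I use Java')); // true
console.log(pattern.test('JavaScript')); // false
```

Displaying the branches back to the user then needs no unescaping at all, because the stored strings were never escaped.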
