Need to write regular expression in Javascript


I need to write a regular expression in JavaScript for a field with the following constraint:
"The name can be up to 80 characters long. It must begin with a word character, and it must end with a word character or with an apostrophe ('). The name may contain word characters, '.', '-' or an apostrophe."
Example -
Allowed strings -
abc.'
abc-'.'
ab-.''-a
Not allowed strings -
rish a
rish.-
What I have tried so far:
!/^[A-Za-z.-'']{1,80}$/.test(Name)

I guess, you're looking for something like this:
^(?=[A-Za-z])[A-Za-z\.\-']{0,79}[A-Za-z']$
To explain:
^(?=[A-Za-z]): Checks that the string starts with a word character (here: a letter). This is a look-ahead assertion, so it does NOT consume any characters. The rest of the pattern must still account for at least 1 and at most 80 characters.
[A-Za-z\.\-']{0,79}: First and middle characters, therefore max 79 chars. Minimum of one is enforced with the last character.
[A-Za-z']$: Ends with a letter or apostrophe.
Testable here: https://regex101.com/r/AOQojT/1
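A minimal sketch of how this pattern could be wired into a check, run against the strings from the question. The helper name isValidName is made up for illustration, and "word character" is read as a letter, as in the answer.

```javascript
// Hypothetical helper; the pattern itself is the one from the answer above.
const NAME_PATTERN = /^(?=[A-Za-z])[A-Za-z\.\-']{0,79}[A-Za-z']$/;

function isValidName(name) {
  return NAME_PATTERN.test(name);
}

// Strings the question lists as allowed
console.log(isValidName("abc.'"));
console.log(isValidName("abc-'.'"));
console.log(isValidName("ab-.''-a"));

// Strings the question lists as not allowed
console.log(isValidName("rish a")); // contains a space
console.log(isValidName("rish.-")); // ends with '-'

// Length limit: 80 characters pass, 81 do not
console.log(isValidName("a".repeat(80)));
console.log(isValidName("a".repeat(81)));
```

The first three calls and the 80-character string should print true, the others false.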

Using a look-ahead assertion is a very clever way of solving this.
Another way would be using OR operator:
^[a-zA-Z]$|^[a-zA-Z][a-zA-Z.\-']{0,78}[a-zA-Z']$
It simply checks whether:
^[a-zA-Z]$ - there is only one word character
Or |
^[a-zA-Z] - one word character at the very beginning of the given string
[a-zA-Z.\-']{0,78} - from zero to seventy-eight characters. . (dot) does not have to be escaped, since it has no special meaning inside a character set.
[a-zA-Z']$ - one word character or apostrophe at the very end
Thus the second alternative validates strings longer than 1 character.
https://regex101.com/r/CB1uOw/1
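As a quick sanity check, here is a small sketch that runs the look-ahead version and the alternation version over the same inputs and reports whether they agree. The sample list is my own, not from the question.

```javascript
const withLookahead = /^(?=[A-Za-z])[A-Za-z\.\-']{0,79}[A-Za-z']$/;
const withAlternation = /^[a-zA-Z]$|^[a-zA-Z][a-zA-Z.\-']{0,78}[a-zA-Z']$/;

// Illustrative inputs, including the length boundaries.
const samples = ["a", "'", "abc.'", "rish a", "rish.-", "a".repeat(80), "a".repeat(81)];

const results = samples.map((s) => ({
  length: s.length,
  lookahead: withLookahead.test(s),
  alternation: withAlternation.test(s),
}));

for (const r of results) {
  console.log(r.length, r.lookahead, r.alternation,
    r.lookahead === r.alternation ? "agree" : "DIFFER");
}
```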

Related

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now because this will only match a single character that is not a digit, I needed to use + to specify that 1 or more digits should not be found in the entire string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace character and [^\s] will match the first non-whitespace character. So, from my understanding I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-whitespace character, I used + to match one or more non-whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite; ^ and $ are actually "anchors" - they mean "start" and "end". It's actually a little more complicated, but you can consider them to mean the start and end of a line for now - look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning: when it is the first character inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings, and that the definition in your head actually applies only to character classes!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-digit characters followed by 1+ non-whitespace characters" - so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings: you're not adding constraints as you go, but rather building up bits of matchers that will match the corresponding bits of strings.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexes is http://regexr.com/3b9h7 - which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to Debuggex is awesome!
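To make the difference concrete, here is a small sketch comparing the pattern from the question with the suggested one. The variable names are just for illustration.

```javascript
const attempt = /^\D+[^\s]+$/;               // pattern from the question
const suggested = /^[^\d\s!#£$%^&*()+=]+$/;  // single negated character class

// \D+ happily consumes "James Smith" (a space is a non-digit),
// then [^\s]+ consumes "25" (digits are non-whitespace).
console.log(attempt.test("James Smith25")); // true
console.log(attempt.test("James1"));        // true

// With one class, every character has to pass every exclusion at once.
console.log(suggested.test("James"));         // true
console.log(suggested.test("James1"));        // false
console.log(suggested.test("James Smith25")); // false
console.log(suggested.test("James!"));        // false
```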

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/ can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that contain no digits, no whitespace, and none of the symbols you do not allow.
Mind that to match a literal - or ] inside a character class, you need to escape it (a - may instead be placed at the start or end of the class). A literal [ usually works unescaped in JavaScript, but to play it safe, escape all three.
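A short sketch of how the hyphen and the brackets behave inside a character class in JavaScript. The patterns and names are purely illustrative.

```javascript
const range = /^[a-z]+$/;          // a-z is a range of letters
const escapedHyphen = /^[a\-z]+$/; // only 'a', '-' and 'z'
const trailingHyphen = /^[az-]+$/; // '-' at the end is taken literally
const brackets = /^[\[\]]+$/;      // escaped brackets are literal

console.log(range.test("m"));            // true
console.log(escapedHyphen.test("m"));    // false
console.log(escapedHyphen.test("a-z"));  // true
console.log(trailingHyphen.test("a-z")); // true
console.log(brackets.test("[]"));        // true
```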

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace character (space, newline, tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats the immediately preceding element 0 or more times
so the regex matches any string that
1. starts at the beginning of the string
2. contains 0 or more characters that are not whitespace, digits, or any of the symbols included in the character class (in this example !#£$%^&*()+=), i.e., only characters that are not listed inside `[^...]`
3. and then reaches the end of the string
NOTE:
If the symbols you don't want to allow also include a hyphen (-), don't put it between other characters, because it is a metacharacter inside a character class; put it last in the character class.
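One consequence of using * is worth seeing in action: the pattern also accepts the empty string. A small sketch, where the + variant is shown only for comparison:

```javascript
const zeroOrMore = /^[^\s\d!#£$%^&*()+=]*$/; // pattern from this answer
const oneOrMore = /^[^\s\d!#£$%^&*()+=]+$/;  // same class, but at least one character

console.log(zeroOrMore.test(""));            // true: zero characters is allowed
console.log(oneOrMore.test(""));             // false
console.log(zeroOrMore.test("James"));       // true
console.log(zeroOrMore.test("James Smith")); // false: contains a space
```

Which quantifier is right depends on whether an empty input should count as valid.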

Regular expression X characters long, alphanumeric but not _ and periods, but not at beginning or end

As the subject indicates, I am in need of a JavaScript Regular expression X characters long, that accepts alphanumeric characters, but not the underscore character, and also accepts periods, but not at beginning or end. Periods cannot be consecutive either.
I have been able to almost get to where I want to be searching and reading other people's questions and the answers here on Stack Overflow (such as here).
However, in my case, I need a string that has to be exactly X characters long (say 6), and can contain letters and numbers (case insensitive) and may also include periods.
Said periods cannot be consecutive and also, cannot start, or end the string.
Jd.1.4 is valid, but Jdf1.4f is not (7 characters).
/^(?:[a-z\d]+(?:\.(?!$))?)+$/i
is what I have been able to construct using examples from others, but I cannot get it to only accept strings that match the set length.
/^((?:[a-z\d]+(?:\.(?!$))?)+){6}$/i
works in that it now accepts nothing less than 6 characters, but it also happily accepts anything longer as well...
I am obviously missing something, but I do not know what it is.
Can anyone help?

This should work:
/^(?!.*?\.\.)[a-z\d][a-z\d.]{4}[a-z\d]$/i
Explanation:
^ // matches the beginning of the string
(?!.*?\.\.) // negative lookahead, only matches if there are no
// consecutive periods (.)
[a-z\d] // matches one letter (a-z) or digit
[a-z\d.]{4} // matches exactly 4 letters, digits or periods
[a-z\d] // matches one letter (a-z) or digit
$ // matches the end of the string
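A quick check of this pattern against the examples from the question, plus a few edge cases of my own:

```javascript
const sixChars = /^(?!.*?\.\.)[a-z\d][a-z\d.]{4}[a-z\d]$/i;

console.log(sixChars.test("Jd.1.4"));  // true: 6 characters, single periods inside
console.log(sixChars.test("Jdf1.4f")); // false: 7 characters
console.log(sixChars.test("J..1a4"));  // false: consecutive periods
console.log(sixChars.test(".d1a44"));  // false: starts with a period
console.log(sixChars.test("Jd1a4."));  // false: ends with a period
console.log(sixChars.test("abc_ef"));  // false: underscore is not allowed
```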

Another way to do that:
/(?=.{6}$)^[a-z\d]+(?:\.[a-z\d]+)*$/i
explanation:
(?=.{6}$) this lookahead imposes the number of characters before
the end of the string
^[a-z\d]+ 1 or more alphanumeric characters at the beginning
of the string
(?:\.[a-z\d]+)* 0 or more groups containing a dot followed by 1 or
more alphanumerics
$ end of the string
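Since the question asks for "X characters long", the length can be made a parameter. This is only a sketch: fixedLengthPattern is a made-up helper that builds the same pattern with the RegExp constructor.

```javascript
// Hypothetical helper: builds the pattern above for any exact length.
function fixedLengthPattern(length) {
  if (!Number.isInteger(length) || length < 1) {
    throw new RangeError("length must be a positive integer");
  }
  // Backslashes are doubled because the pattern is written as a string.
  return new RegExp("(?=.{" + length + "}$)^[a-z\\d]+(?:\\.[a-z\\d]+)*$", "i");
}

const six = fixedLengthPattern(6);
const seven = fixedLengthPattern(7);

console.log(six.test("Jd.1.4"));    // true
console.log(six.test("Jdf1.4f"));   // false: 7 characters
console.log(seven.test("Jdf1.4f")); // true
console.log(six.test("a..b12"));    // false: consecutive periods
console.log(six.test("abcde."));    // false: ends with a period
```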

How to create a single JavaScript regex to match both length and multiple constraints?

I need to build a JavaScript regular expression with the following constraints:
The input string needs to be at least 6 characters long
The input string needs to contain at least 1 alphabetical character
The input string needs to contain at least 1 non-alphabetical character
I'm seriously lacking a lookback feature in JavaScript. The thing I came up with:
((([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z]))....)|
(.(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z]))...)|
(..(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z]))..)|
(...(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z])).)|
(....(([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z])))
This looks pretty long. Is there a better way?
How I came to this:
Regex for alphabetical character is [a-zA-Z]
Regex for non-alphabetical character is [^a-zA-Z]
So I need to look for a [a-zA-Z][^a-zA-Z] or [^a-zA-Z][a-zA-Z] so (([a-zA-Z][^a-zA-Z])|([^a-zA-Z][a-zA-Z])).
I need to check for n preceding characters and 6-n succeeding characters.

/^(?=.{6})(?=.*[a-zA-Z])(?=.*[^a-zA-Z])/
This means:
^ - start of the string
(?= ... ) - followed by (i.e. an independent submatch; it won't move the current match position)
.{6} - six characters ("start of string followed by six characters" implements the "must be at least six characters long" rule)
.* - 0 or more of any character (except newline - may need to fix this?)
[a-zA-Z] - a letter (.*[a-zA-Z] therefore finds any string with a letter anywhere in it (technically it finds the last letter in it))
[^a-zA-Z] - a non-letter character
In summary: Starting from the beginning of the string, we try to match each of the following in turn:
6 characters (if we find those, the string must be 6 characters long (or more))
an arbitrary string followed by a letter
an arbitrary string followed by a non-letter
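A small sketch of the pattern in use, including the newline caveat mentioned above. The [\s\S] variant is my assumption about one way to "fix this"; it is not part of the original answer.

```javascript
const withDot = /^(?=.{6})(?=.*[a-zA-Z])(?=.*[^a-zA-Z])/;

// Assumed variant: [\s\S] matches any character, including newlines.
const withAnyChar = /^(?=[\s\S]{6})(?=[\s\S]*[a-zA-Z])(?=[\s\S]*[^a-zA-Z])/;

console.log(withDot.test("abc123"));   // true
console.log(withDot.test("abcdef"));   // false: no non-letter
console.log(withDot.test("a1"));       // false: too short
console.log(withDot.test("ab\n1cde")); // false: '.' stops at the newline

console.log(withAnyChar.test("ab\n1cde")); // true
```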

Use this regex...
/^(?=.{6,})(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).*$/
  -------- ------------- --------------
      ^          ^              ^
      |          |              |->checks for a single non-alphabet
      |          |->checks for a single alphabet
      |->checks for 6 to many characters
(?=) is a zero-width lookahead which checks for a match. It doesn't consume characters. This is the reason why we can use multiple lookaheads back to back.

This is similar to the other answers, so it doesn't need much explanation. I think the best way is to do
/^(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).{6,}$/
This starts at the beginning of the string, looks ahead for an alphabetical character, looks ahead for a non-alphabetical character and, in the end, matches a string of 6+ chars. I think there's no need for a lookahead about length.
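For what it's worth, a quick run of this version on a few inputs (the inputs are my own):

```javascript
const combined = /^(?=.*[a-zA-Z])(?=.*[^a-zA-Z]).{6,}$/;

console.log(combined.test("abc123"));      // true
console.log(combined.test("hello world")); // true: the space counts as a non-letter
console.log(combined.test("abcdef"));      // false: no non-alphabetical character
console.log(combined.test("123456"));      // false: no alphabetical character
console.log(combined.test("ab1"));         // false: shorter than 6
```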

regex for password

I'm trying to get regex for minimum requirements of a password to be minimum of 6 characters; 1 uppercase, 1 lowercase, and 1 number. Seems easy enough? I have not had any experience in regex's that "look ahead", so I would just do:
if(!pwStr.match(/[A-Z]+/) || !pwStr.match(/[a-z]+/) || !pwStr.match(/[0-9]+/) ||
pwStr.length < 6)
//was not successful
But I'd like to optimize this to one regex and level up my regex skillz in the process.

^.*(?=.{6,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!#$%&? "]).*$
^.*
Start of Regex
(?=.{6,})
Passwords will contain at least 6 characters in length
(?=.*[a-zA-Z])
Passwords will contain at least 1 letter (upper or lower case; this lookahead alone does not require one of each)
(?=.*\d)
Passwords will contain at least 1 number
(?=.*[!#$%&? "])
Passwords will contain at least 1 of the given special characters
.*$
End of Regex
Here is a website where you can check this regex - http://rubular.com/
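A short sketch of what this pattern accepts in JavaScript, with the special-character class written as in the breakdown. Note that [a-zA-Z] is satisfied by a single letter of either case, and that a special character is required even though the question did not ask for one.

```javascript
const pattern = /^.*(?=.{6,})(?=.*[a-zA-Z])(?=.*\d)(?=.*[!#$%&? "]).*$/;

console.log(pattern.test("Abcde1!")); // true
console.log(pattern.test("abcde1!")); // true: no upper case letter, still accepted
console.log(pattern.test("Abcdef1")); // false: no special character
console.log(pattern.test("Ab1!"));    // false: shorter than 6
```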

Assuming that a password may consist of any characters, have a minimum length of at least six characters and must contain at least one upper case letter and one lower case letter and one decimal digit, here's the one I'd recommend: (commented version using python syntax)
re_pwd_valid = re.compile("""
# Validate password 6 char min with one upper, lower and number.
^ # Anchor to start of string.
(?=[^A-Z]*[A-Z]) # Assert at least one upper case letter.
(?=[^a-z]*[a-z]) # Assert at least one lower case letter.
(?=[^0-9]*[0-9]) # Assert at least one decimal digit.
.{6,} # Match password with at least 6 chars
$ # Anchor to end of string.
""", re.VERBOSE)
Here it is in JavaScript:
re_pwd_valid = /^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9]*[0-9]).{6,}$/;
Additional: If you ever need to require more than one of the required chars, take a look at my answer to a similar password validation question
Edit: Changed the lazy dot star to greedy char classes. Thanks Erik Reppen - nice optimization!
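A few sample calls against the JavaScript version, as a sketch (the passwords are made up):

```javascript
const re_pwd_valid = /^(?=[^A-Z]*[A-Z])(?=[^a-z]*[a-z])(?=[^0-9]*[0-9]).{6,}$/;

console.log(re_pwd_valid.test("Passw0rd"));  // true
console.log(re_pwd_valid.test("password1")); // false: no upper case letter
console.log(re_pwd_valid.test("PASSWORD1")); // false: no lower case letter
console.log(re_pwd_valid.test("Password"));  // false: no digit
console.log(re_pwd_valid.test("Pa1"));       // false: shorter than 6
```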

My experience is that the more you can separate out regexes, the better the code will read. You could combine the regexes with positive lookaheads (which I see was just done), but... why?
Edit:
Ok, ok, so if you have some configuration file where you could pass a string to compile into a regex (which I've seen done and have done before) I guess it is worth the hassle. But otherwise, even if the answers provided are corrected to match what you need, I'd still advise against it unless you intend to create such a thing. Separate regexes are just so much nicer to deal with.
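To illustrate the point, here is a sketch of the separate-checks approach using the rules from the question. The function name passwordProblems and the message texts are made up; one side benefit is that each failed rule can be reported individually.

```javascript
// Hypothetical helper: one simple check per rule from the question.
function passwordProblems(pw) {
  const problems = [];
  if (pw.length < 6) problems.push("needs at least 6 characters");
  if (!/[A-Z]/.test(pw)) problems.push("needs an upper case letter");
  if (!/[a-z]/.test(pw)) problems.push("needs a lower case letter");
  if (!/[0-9]/.test(pw)) problems.push("needs a digit");
  return problems;
}

console.log(passwordProblems("Passw0rd")); // no problems reported
console.log(passwordProblems("abc"));      // too short, no upper case letter, no digit
```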

I haven't tested thoroughly but here's a more efficient version of Amit's. I think his also allowed unspecified characters into the mix (which wasn't technically listed as a rule). This one won't go berserk on you if you accidentally target a large hunk of text, it will fail sooner on strings that are too long and it only allows the characters in the final class.
'.' should be used sparingly. Think of the looping it has to do to determine a match with all the characters it can represent. It's much more efficient to use negating classes.
`^(?=[^0-9]{0,9}[0-9])(?=[^a-z]{0,9}[a-z])(?=[^A-Z]{0,9}[A-Z])(?=[^##$%]{0,9}[##$%])[0-9a-zA-Z##$%]{6,10}$`
There's nothing wrong with trying to find the ideal regEx. But split it up when you need to.
RegEx tends to be explained poorly. I'll add a breakdown:
a - a single 'a' character
ab - a single 'a' character followed by a single b character
a* - 0 or more 'a' characters
a+ - one or more 'a' characters
a+b - one or any number of a characters followed by a single b character.
a{6,} - at least 6 'a' characters (would match more)
a{6,10} - 6-10 'a' characters
a{10} - exactly 10 'a' characters iirc - not very useful
^ - beginning of a string - so ^a+ would not match 'baaaa'
$ - end of a string - b$ would not find a match in 'aaaba'
[] signifies a character class. You can put a variety of characters inside it and every character will be checked. By itself only whatever string character you happen to be on is matched against. It can be modified by + and * as above.
[ab]+c - one or any number of a or b characters followed by a single c character
[a-zA-Z0-9] - any letter, any number - there are a bunch of \<some key> characters representing sets, like \d for digits. \w is basically [a-zA-Z0-9_]
note: '\' is the escape key for character classes. [a\-z] for 'a' or '-' or 'z' rather than anything from a to z which is what [a-z] means
[^<stuff>] a character class with the caret in front means everything but the characters of <stuff> listed - this is critical to performance in regEx matches hitting large strings.
. - wildcard character representing most characters (in JavaScript the exceptions are the line terminators such as \n and \r, unless the s flag is set). Not a big deal in very small sets of characters but avoid using it.
(?=<regex stuff>) - a lookahead. Doesn't move the parser further down the string if it matches. If a lookahead fails, the whole match fails. If it succeeds, you go back to the same character before it. That's why we can string a bunch together to search if there's at least one of a given character.
So:
^ - at the beginning followed by whatever is next
(?=[^0-9]{0,9}[0-9]) - look for a digit from 0-9 preceded by 0 to 9 characters that aren't 0-9 - the next lookahead starts at the same place
etc. on the lookaheads
[0-9a-zA-Z##$%]{6,10} - 6-10 of any letter, number, or ##$% characters
The final '$' matters even though everything is limited to 10 characters: without it, a longer string would still match on its first 6-10 characters
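A quick check of the pattern in JavaScript, including what happens when the trailing $ is dropped. The character classes are copied as written above; the variable names and inputs are my own.

```javascript
const anchored = /^(?=[^0-9]{0,9}[0-9])(?=[^a-z]{0,9}[a-z])(?=[^A-Z]{0,9}[A-Z])(?=[^##$%]{0,9}[##$%])[0-9a-zA-Z##$%]{6,10}$/;
const unanchored = /^(?=[^0-9]{0,9}[0-9])(?=[^a-z]{0,9}[a-z])(?=[^A-Z]{0,9}[A-Z])(?=[^##$%]{0,9}[##$%])[0-9a-zA-Z##$%]{6,10}/;

const tooLong = "Aa1#" + "b".repeat(15); // 19 characters

console.log(anchored.test("Aa1#bcd"));  // true
console.log(anchored.test("Aa1 bcd#")); // false: space is not in the final class
console.log(anchored.test(tooLong));    // false: more than 10 characters
console.log(unanchored.test(tooLong));  // true: the first 10 characters are enough
```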

UK Currency Regular Expression for javascript

I'm after a regular expression that matches a UK Currency (ie. £13.00, £9,999.99 and £12,333,333.02), but does not allow negative (-£2.17) or zero values (£0.00 or 0).
I've tried to create one myself, but I've got in a right muddle!
Any help gratefully received.
Thanks!

This'll do it (well mostly...)
/^£?[1-9]\d{0,2}(,\d{3})*(\.\d{2})?$/
Leverages the ^ and $ to make sure no negative or other character is in the string, and assumes that commas will be used. The pound symbol and pence are optional.
edit: realised you said non-zero, so the leading digit is now [1-9] (followed by up to two more digits, so that £10 and £100 still match)
Update: it's been pointed out the above won't match £0.01. The below improvement will but now there's a level of complexity where it may quite possibly be better to test /[1-9]/ first and then the above - haven't benchmarked it.
/^£?(([1-9]\d{0,2}(,\d{3})*(\.\d{2})?)|(0\.[1-9]\d)|(0\.0[1-9]))$/
Brief explanation:
Match beginning of string followed by optional "£"
Then match either:
a >=£1 amount with potential for comma separated groupings and optional pence
OR a <£1 >=£0.10 amount
OR a <=£0.09 amount
Then match end of string
The more fractions of a penny (none in the above) you need the regex to support, the less efficient it becomes.
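Here is a sketch that runs the combined pattern over the amounts from the question. The leading group is written as [1-9]\d{0,2} here so that amounts such as £10.00 are accepted; treat that as an assumption about the intent.

```javascript
const ukAmount = /^£?(([1-9]\d{0,2}(,\d{3})*(\.\d{2})?)|(0\.[1-9]\d)|(0\.0[1-9]))$/;

const shouldMatch = ["£13.00", "£9,999.99", "£12,333,333.02", "£0.01", "£0.10", "£10.00"];
const shouldFail = ["-£2.17", "£0.00", "0"];

for (const amount of shouldMatch) {
  console.log(amount, ukAmount.test(amount)); // expected: true
}
for (const amount of shouldFail) {
  console.log(amount, ukAmount.test(amount)); // expected: false
}
```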

Under Unix/Linux, it's not always possible to type in the '£' sign in a JavaScript file, so I tend to use its hexadecimal representation, thus:
/^\xA3?\d{1,3}?([,]\d{3}|\d)*?([.]\d{1,2})?$/
This seems to take care of all combinations of UK currency amounts representation that I have come across.

/^\xA3?\d{1,}(?:\,?\d+)*(?:\.\d{1,2})?$/;
Explanation:
^ Matches the beginning of the string, or the beginning of a line.
\xA3 Matches a "£" character (char code 163)
? Quantifier for match between 0 and 1 of the preceding token.
\d Matches any digit character (0-9).
{1,} Match 1 or more of the preceding token.
(?: Groups multiple tokens together without creating a capture group.
\, Matches a "," character (char code 44).
{1,2} Match between 1 and 2 of the preceding token.
$ Matches the end of the string, or the end of a line if the multiline flag (m) is enabled.

You could just make two passes:
/^£\d{1,3}(,\d{3})*(\.\d{2})?$/
to validate the format, and
/[1-9]/
to ensure that at least one digit is non-zero.
This is less efficient than doing it in one pass, of course (thanks, annakata, for the benchmark information), but for a first implementation, just "saying what you want" can significantly reduce development time.
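The two passes could be wrapped up roughly like this. The function name isPositiveUkAmount is made up for illustration; note that the format pattern above requires the £ sign.

```javascript
// Hypothetical wrapper around the two passes described above.
function isPositiveUkAmount(text) {
  const wellFormed = /^£\d{1,3}(,\d{3})*(\.\d{2})?$/.test(text); // pass 1: format
  const nonZero = /[1-9]/.test(text);                            // pass 2: some non-zero digit
  return wellFormed && nonZero;
}

console.log(isPositiveUkAmount("£13.00")); // true
console.log(isPositiveUkAmount("£0.01"));  // true
console.log(isPositiveUkAmount("£100"));   // true
console.log(isPositiveUkAmount("£0.00"));  // false: all digits are zero
console.log(isPositiveUkAmount("-£2.17")); // false: format does not allow a sign
console.log(isPositiveUkAmount("13.00"));  // false: £ is required by this pattern
```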
