JavaScript RegEx to allow only AlphaNumeric Characters and some special characters - javascript

In a textarea field, I want to allow only alphanumeric characters, - (hyphen), / (forward slash), . (dot) and space.
I have gone through similar questions, but one way or another my exact requirements seem to differ. So below is the regex I've come up with after reading the answers given by members:
/^[a-z0-9\-\/. ]+$/i
I've tested the regex and so far it seems to work, but I want to double-check. Please verify whether the above regex fulfills my requirements.

You do too much escaping
/^[a-z0-9/. -]+$/i
In a character class, only [, ], \, - and ^ have special meaning, the ^ only when it is the first character and the - only when it sits between two characters.
To match a literal ^, put it in any position but the first. To match a literal -, don't put it between characters (i.e., put it at the start or at the end).
Escaping things like the /, . or $ inside a character class is never necessary.
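A quick way to convince yourself that the two spellings are equivalent is to run both against the same inputs. This is only a sketch: the sample strings and the variable names are made up for illustration and do not come from the question.

```javascript
// Both patterns accept letters, digits, hyphen, slash, dot and space.
const escapedPattern = /^[a-z0-9\-\/. ]+$/i; // version from the question
const minimalPattern = /^[a-z0-9/. -]+$/i;   // version without the extra escapes

// Hypothetical sample inputs, chosen only for illustration.
const sampleInputs = ["Flat 4-B/2 Main St.", "abc_def", "price: 10", ""];

const comparison = sampleInputs.map((text) => ({
  text,
  escaped: escapedPattern.test(text),
  minimal: minimalPattern.test(text),
}));

// Each entry shows the same verdict for both patterns.
console.log(comparison);
```

Only the first sample passes; the underscore, the colon and the empty string are rejected by both patterns.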

Related

validate username with regex in javascript

I am a newbie to regex and would like to create a regular expression to check usernames. These are the conditions:
username must have between 4 and 20 characters
username must not contain anything but letters a-z, digits 0-9 and special characters -._
the special characters -._ must not be used successively in order to avoid confusion
the username must not contain whitespaces
Examples
any.user.13 => valid
any..user13 => invalid (two dots successively)
anyuser => valid
any => invalid (too short)
anyuserthathasasupersuperlonglongname => invalid (too many characters)
any username => invalid because of the whitespace
I've tried to create my own regex and only got to the point where I specify the allowed characters:
[a-z0-9.-_]{4,20}
Unfortunately, it still matches a string if there's a whitespace in between and it's possible to have two special chars .-_ successively.
If anybody would be able to provide me with help on this issue, I would be extremely grateful. Please keep in mind that I'm a newbie on regex and still learning it. Therefore, an explanation of your regex would be great.
Thanks in advance :)
Sometimes writing a regular expression can be almost as challenging as finding a user name. But here you were quite close to making it work. I can point out three reasons why your attempt fails.
First of all, we need to match all of the input string, not just a part of it, because we don't want to ignore things like white space and other characters that appear in the input. For that, one typically uses the anchors ^ (match start) and $ (match end).
Another point is that we need to prevent two special characters from appearing next to each other. This is best done with a negative lookahead.
Finally, I can see that the tool you are using to test your regex is adding the flags gmi, which is not what we want. In particular, the i flag makes the regex case-insensitive, so it would match capital letters as well as lowercase ones. Remove that flag.
The final regex looks like this:
/^([a-z0-9]|[-._](?![-._])){4,20}$/
There is nothing really cryptic here, except maybe for the group [-._](?![-._]) which means any of -._ not followed by any of -._.
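To check the final regex against the examples from the question, something along these lines should do. The helper name isValidUsername is just an illustrative choice, not something from the question.

```javascript
// Final pattern from the answer: 4 to 20 characters, each one either a
// lowercase letter/digit or a special character that is not directly
// followed by another special character.
const usernamePattern = /^([a-z0-9]|[-._](?![-._])){4,20}$/;

// Hypothetical helper name, used only for this sketch.
function isValidUsername(name) {
  return usernamePattern.test(name);
}

// The examples listed in the question.
const usernameExamples = [
  "any.user.13",                           // valid
  "any..user13",                           // invalid: two dots in a row
  "anyuser",                               // valid
  "any",                                   // invalid: too short
  "anyuserthathasasupersuperlonglongname", // invalid: too long
  "any username",                          // invalid: whitespace
];

for (const name of usernameExamples) {
  console.log(name, "=>", isValidUsername(name) ? "valid" : "invalid");
}
```

Note that the lookahead only forbids a special character directly after another one, so a name that ends in a single special character still passes. If that is not wanted, the pattern would need an extra condition.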

My javascript regular expression does not work sometimes [duplicate]

NB. I only want to know if it's a valid use of an unescaped hyphen in the regex definition. It's not a question about matching email, the meaning of the hyphen or backslash, quantifiers or anything else. Also, please note that the linked answer doesn't really discuss the validity issue of escaped versus unescaped hyphens.
Usually I declare the regex for matching email addresses like this.
var emailPattern = /^[a-z.\-_]+#[a-z]+[.]{1}[a-z]{2,3}$/;
emailPattern.test('ss.a_a-#ass.com');
Now, by mistake, a colleague of mine forgot the escape character and still made it work, which surprised me because of the range meaning of the hyphen. It looks like this.
var weirdPattern = /^[a-z._-]+#[a-z]+[.]{1}[a-z]{2,3}$/;
weirdPattern.test('ss.a_a-#ass.com');
Apparently, it works because the hyphen is the last character in the brackets. My question is if this is just a happy coincidence or if it's a valid syntax? Have I been regexing wrong my whole life?
Hyphens inside a character class denote a range. However, when a hyphen is placed at the beginning or at the end of the character class, there is no need to escape it.
Note that, in some browsers, hyphens at any position in the character class are still treated as range metacharacters, so it is best practice to always escape them.
Quoting from regular-expressions.info:
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen. Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.
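As a rough illustration of the positions discussed above, here is what an engine that follows the ECMAScript grammar does with the two patterns from the question, plus a class where the hyphen sits between two characters. The variable names are made up for the example, and the # is kept exactly as the patterns in the question are written.

```javascript
// Escaped hyphen (the usual spelling from the question).
const escapedHyphen = /^[a-z.\-_]+#[a-z]+[.]{1}[a-z]{2,3}$/;
// Unescaped hyphen as the last character of the class (the colleague's version).
const trailingHyphen = /^[a-z._-]+#[a-z]+[.]{1}[a-z]{2,3}$/;

const address = "ss.a_a-#ass.com";
console.log(escapedHyphen.test(address));  // true
console.log(trailingHyphen.test(address)); // true

// For contrast: a hyphen between two characters forms a range.
// [.-_] covers the code points from "." (0x2E) to "_" (0x5F),
// which includes digits and uppercase letters but not "-" (0x2D).
const accidentalRange = /^[.-_]$/;
console.log(accidentalRange.test("-")); // false
console.log(accidentalRange.test("A")); // true
```

So the trailing hyphen is read as a literal here, while the same three characters in a different order silently change what the class matches.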

Regex returns with incorrect value

I'm trying to create a function with a regex that can decide if my string value is correct or not. It should return true if the string begins with a lowercase or uppercase letter or an underscore. If it begins with anything else, the function must return false.
My test input is something like this: ".dasfh"
The expressions I tried to use were [_a-zA-Z]... and [:alpha:]..., but both of them returned true.
I tried a bit easier task also:
"Hadfg" where the expression is [a-z]...: returns true
BUT
"hadfg" where the expression is [A-Z]...: returns false
Could anybody help me to understand this behaviour?
You're trying to check that the first character in the string is something in particular, which means you have to tell the regex that it has to be the first character in the string.
Without that, the regex engine just tries to find a match anywhere in the string.
All you're telling it with [a-z] is "find me a lowercase character anywhere in the string". This means that:
"Hadfg" will return true because it can find a, d, f or g as a match.
"HADFG" will return false because there are no lowercase letters.
The same will happen for "hADFG" when matched with [A-Z], for instance: it will be able to find an A, D, F or G as a match, whereas "hadfg" will return false because there is no uppercase character.
What you are looking for here is ^ in your regex. It is an anchor that means "start of the input" (or the start of each line when the m flag is set).
So when you apply this to your regex it will look like this: /^[a-z]/.
The regex on the previous line basically says "from the start of the string, is the first character following up a lowercase a-z?"
Try it out and you'll see.
For your solution you'd need /^[_a-zA-Z]/ to check if the first character is an _, a-z or A-Z character.
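To see the difference the anchor makes, you could run something like the following. The variable names are only for illustration.

```javascript
// Unanchored: succeeds if a matching character occurs anywhere in the string.
const lowercaseAnywhere = /[a-z]/;
const uppercaseAnywhere = /[A-Z]/;

console.log(lowercaseAnywhere.test("Hadfg")); // true: finds "a"
console.log(uppercaseAnywhere.test("hadfg")); // false: no uppercase letter at all

// Anchored: only the first character of the input is considered.
const validFirstChar = /^[_a-zA-Z]/;

console.log(validFirstChar.test(".dasfh")); // false: starts with "."
console.log(validFirstChar.test("_dasfh")); // true
console.log(validFirstChar.test("Hadfg"));  // true
```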
For reference, you can find cheatsheets within these tools (and test your regexes with them, of course!):
Regexr - My personal favorite (uses your browser's JS regex engine)
Rubular - A Ruby regex tester
Regex101 - A Python / PCRE / PHP / JavaScript regex tester
And for a reference or tutorial (I'd recommend reading it from start to finish if you want to start understanding regexes and how they work), there's regular-expressions.info.
Regex is never easy and be careful with what you do with it, it's a powerful but sometimes ugly beast to deal with :)
PS
I see you tagged your question as email-validation, so I'll add a little bonus regex that checks only the minimum requirements for something to look like an email address. I use this one personally:
.+#.+\..{2,}
which when broken up, looks like this:
.+ - one or more of any character
# - followed by a literal # character
.+ - one or more of any character
\. - followed by a literal . character
.{2,} - two or more of any character
Optionally you could replace {2,} with a + to make it one or more but this would allow a TLD with 1 character.
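A small sketch of how that loose pattern behaves, using the # separator exactly as the pattern is written here. The sample strings and the variable name are made up for illustration.

```javascript
// Loose shape check: something, "#", something, ".", at least two characters.
const looseEmail = /.+#.+\..{2,}/;

console.log(looseEmail.test("user#example.com")); // true
console.log(looseEmail.test("user#example.c"));   // false: only one character after the dot
console.log(looseEmail.test("#example.com"));     // false: nothing before the #
console.log(looseEmail.test("user#examplecom"));  // false: no dot after the #

// The pattern is not anchored, so it also succeeds when the address-like
// text is embedded in a longer string.
console.log(looseEmail.test("mail me at user#example.com please")); // true
```

If the whole input has to be the address, wrapping the pattern in ^ and $ would be the usual adjustment.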
To see a RFC email-regex at work check this link.
When I look at that regex I basically just want to cry in a corner somewhere. There are definitely things you cannot do in an email address that my regex doesn't address, but at least it makes sure the input looks like something e-mailable. If a new user decides to fill in some bull, that's not my problem anymore, and I wouldn't want to force them to change that one character just because the huge regex doesn't agree with it either.

Matching variable-term equations

I am trying to develop a regular expression to match the following equations:
(Price+10%+100+200)
(Price+20%+200)
(Price+30%)
(Price+100)
(Price-10%-100-200)
(Price-20%-200)
(Price-30%)
(Price-100)
My regex so far is...
/([(])+([P])+([r])+([i])+([c])+([e])+([+]|[-]){1}([\d])+([+]|[-])?([\d])+([%])?([)])/g
..., but it only matches the following equations:
(Price+100+10%)
(Price+100+100)
(Price+200)
(Price-100-10%)
(Price-100-100)
(Price-200)
Can someone help me understand how to make my pattern match the full set of equations provided?
Note: Parentheses and 'Price' are musts in the equations that the pattern must match.
Try this, which matches all the input strings provided in the question:
/\(Price([+-]\d+%?){1,3}\)/g
You can test it in a regex fiddle.
Things to note:
Only use parentheses where you want to group. Parentheses around single-possibility, fixed-quantity matches (e.g. ([P])) provide no value.
Use character classes (opened with [ and closed with ]) for multiple characters that can match at a position in the pattern (e.g. [+-]). Single-possibility character classes (e.g. [P]) similarly provide no value.
Yes, character classes (generally) implicitly escape regex special characters within them (e.g. ( in [(] vs. equivalent \( outside a character class), but to just escape regex special characters (i.e. to match them literally), you are better off not using a character class and just escaping them (e.g. \() – unless multiple characters should match at a position in the pattern (per the previous point to note).
The quantifier {1} is (almost) always useless: drop it.
The quantifier + means "one or more" as you probably know. However, in a series of cases where you used it (i.e. ([(])+([P])+([r])+([i])+([c])+([e])+), it would match many values that I doubt you expect (e.g. ((((((PPPrriiiicccceeeeee): basically, don't overuse it. Stop to consider whether you really want to match one or more of the character (class) or group to which + applies in the pattern.
To match a literal string without any regex special characters like Price, just use the literal string at the appropriate position in the pattern – e.g. Price in \(Price.
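Putting those notes together, the suggested pattern can be checked against all eight equations from the question roughly like this. One deliberate difference, which is my own choice for the sketch: the g flag is left off, because test() on a global regex remembers lastIndex between calls and would give misleading results when the same regex object is reused on different strings.

```javascript
// Same pattern as suggested above, without the g flag (see note above).
const equationPattern = /\(Price([+-]\d+%?){1,3}\)/;

// The equations listed in the question.
const equations = [
  "(Price+10%+100+200)",
  "(Price+20%+200)",
  "(Price+30%)",
  "(Price+100)",
  "(Price-10%-100-200)",
  "(Price-20%-200)",
  "(Price-30%)",
  "(Price-100)",
];

const allEquationsMatch = equations.every((eq) => equationPattern.test(eq));
console.log(allEquationsMatch); // true

// Hypothetical counter-examples: no term at all, and an operator without digits.
console.log(equationPattern.test("(Price)"));  // false
console.log(equationPattern.test("(Price+)")); // false
```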
/\(Price[+-](\d)+(%)?([+-]\d+%?)?([+-]\d+%?)?\)/g
works on http://www.regexr.com/
/^[(Price]+\d+\d+([%]|[)])&/i
try at your own risk!

Email verification regex failing on hyphens

I'm attempting to verify email addresses using this regex: ^.*(?=.{8,})[\w.]+#[\w.]+[.][a-zA-Z0-9]+$
It's accepting emails like a-bc#def.com but rejecting emails like abc#de-f.com (I'm using the tool at http://tools.netshiftmedia.com/regexlibrary/ for testing).
Can anybody explain why?
Here is the explanation:
In your regular expression, the part that has to match the text after the # in a-bc#def.com and abc#de-f.com is [\w.]+[.][a-zA-Z0-9]+$
It means:
There should be one or more word characters (letters, digits, and underscores) or '.'. See the reference for '\w'.
It is followed by a '.',
Then it is followed by one or more characters from the set a-zA-Z0-9.
So the - in de-f.com does not match the [\w.]+ part described in the first rule.
The modified solution
You could adjust this part to [\w.-]+[.][a-zA-Z0-9]+$ to allow a - in the part of the string after the #.
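Here is a short sketch comparing the regex from the question with the adjusted character class, using the two addresses from the question (with # as written there). The variable names are illustrative only.

```javascript
// Regex from the question: no hyphen allowed after the #.
const originalEmailPattern = /^.*(?=.{8,})[\w.]+#[\w.]+[.][a-zA-Z0-9]+$/;
// Same regex with - added to the character class after the #.
const adjustedEmailPattern = /^.*(?=.{8,})[\w.]+#[\w.-]+[.][a-zA-Z0-9]+$/;

// The leading .* swallows the "a-" in the first address, which is why a
// hyphen before the # was accepted even though [\w.] does not contain it.
console.log(originalEmailPattern.test("a-bc#def.com")); // true
console.log(originalEmailPattern.test("abc#de-f.com")); // false

console.log(adjustedEmailPattern.test("a-bc#def.com")); // true
console.log(adjustedEmailPattern.test("abc#de-f.com")); // true
```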
Because after the # you're looking for letters, numbers, _, or ., then a period, then alphanumeric. You don't allow for a - anywhere after the #.
You'd need to add the - to one of the character classes (except for the single literal period one, which I would have written \.) to allow hyphens.
\w is letters, numbers, and underscores.
A . inside a character class, indicated by [], is just a period, not any character.
In your first expression, you don't limit to \w, you use .*, which is 0+ occurrences of any character (which may not actually be what you want).
Use this Regex:
var emailRegex = /^[^#]+#[^#]+\.[^#\.]{2,}$/;
It will accept a-bc#def.com as well as emails like abc#de-f.com.
You may also refer to a similar question on SO:
Why won't this accept email addresses with a hyphen after the #?
Hope this helps.
Instead you can use a regex like this to allow any email address.
^[a-zA-Z][\w\.-]*[a-zA-Z0-9]#[a-zA-Z][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$
Following regex works:
([A-Za-z0-9]+[-.-_])*[A-Za-z0-9]+#[-A-Za-z0-9-]+(\.[-A-Z|a-z]{2,})+
