I need a little help. I want to create a regex pattern to validate names. A name should contain only letters (of any kind, non-European included), apostrophes, periods, dashes and whitespace. To put it another way, the regex should reject digits, [], {}, <> and so on. Is there a way to do that?
Thank you in advance.
/(\w|\s|[\.\'-])+/
But that's not enough, I guess. Surely we must also consider that an apostrophe cannot come at the beginning, that several dashes cannot follow one another in a row, etc.
You need a more precise definition of the name.
The Regex you pasted is flawed, it should be
^([a-zA-Z]|\s)*$
Notice the extra parenthesis
Also, you were on the right track, but you can simply put all allowed characters in one character class []:
^([-\w'.\s])*$
a-zA-Z was replaced by the shorthand character class for word characters, \w. Note that \w also matches digits and the underscore, so it accepts more than letters.
Add allowed characters as needed.
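One way to act on the points raised above (letters from any script, no leading apostrophe, no doubled dashes) is a Unicode-aware pattern. What follows is only a minimal JavaScript sketch, not a definitive rule for what counts as a name. The names `namePattern` and `isValidName` are made up for illustration, and the sketch assumes an engine that supports `\p{...}` property escapes with the `u` flag.

```javascript
// Hypothetical helper: a name is one or more runs of letters, joined by a
// single space, apostrophe or hyphen, or by a period with an optional space.
// \p{L} = any letter, \p{M} = combining marks (needed by some scripts).
const namePattern = /^\p{L}[\p{L}\p{M}]*(?:(?:\. ?|[ '-])\p{L}[\p{L}\p{M}]*)*\.?$/u;

function isValidName(name) {
  return namePattern.test(name);
}

console.log(isValidName("Björk Guðmundsdóttir")); // true
console.log(isValidName("J. R. R. Tolkien"));     // true
console.log(isValidName("O'Brien-Smith"));        // true
console.log(isValidName("'Brien"));               // false: leading apostrophe
console.log(isValidName("Anne--Marie"));          // false: doubled hyphen
console.log(isValidName("John3 <b>"));            // false: digit and brackets
```

Whether a trailing period, a typographic apostrophe or other punctuation should be accepted depends on the "more precise definition of the name" mentioned above.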
Related
How can I rewrite the [a-zA-Z0-9!$* \t\r\n] pattern to match a hyphen along with the existing characters?
The hyphen is usually a normal character in regular expressions. Only if it’s in a character class and between two other characters does it take a special meaning.
Thus:
[-] matches a hyphen.
[abc-] matches a, b, c or a hyphen.
[-abc] matches a, b, c or a hyphen.
[ab-d] matches a, b, c or d (only here the hyphen denotes a character range).
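The four cases listed above can be checked directly. This is a small JavaScript sketch; the variable names are invented for the example, and other regex flavors may differ in edge cases.

```javascript
// A hyphen first or last in a class is literal; between two characters it is a range.
const hyphenLast = /^[abc-]+$/;
const hyphenFirst = /^[-abc]+$/;
const realRange = /^[ab-d]+$/;

console.log(hyphenLast.test("a-b-c"));  // true
console.log(hyphenFirst.test("a-b-c")); // true
console.log(realRange.test("abcd"));    // true: b-d expands to b, c, d
console.log(realRange.test("a-d"));     // false: no literal hyphen in the class
```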
Escape the hyphen.
[a-zA-Z0-9!$* \t\r\n\-]
UPDATE:
Never mind this answer - you can add the hyphen to the group but you don't have to escape it. See Konrad Rudolph's answer instead which does a much better job of answering and explains why.
It’s less confusing to always use an escaped hyphen, so that it doesn't have to be positionally dependent. That’s a \- inside the bracketed character class.
But there’s something else to consider. Some of those enumerated characters should possibly be written differently. In some circumstances, they definitely should.
This comparison of regex flavors says that C♯ can use some of the simpler Unicode properties. If you’re dealing with Unicode, you should probably use the general category \p{L} for all possible letters, and maybe \p{Nd} for decimal numbers. Also, if you want to accommodate all dash punctuation, not just HYPHEN-MINUS, you should use the \p{Pd} property. You might also want to write that sequence of whitespace characters simply as \s, assuming that’s not too general for you.
All together, that works out to a pattern of [\p{L}\p{Nd}\p{Pd}\s!$*] to match any one character from that set.
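The answer above is about C#, but the same property classes can be tried in JavaScript. This is a sketch under two assumptions: the engine supports `\p{...}` with the `u` flag, and `\s` is added to the class to stand in for the space, \t, \r and \n of the original pattern. The name `allowedChars` is made up for the example.

```javascript
// Letters, decimal digits, any dash punctuation, whitespace, and ! $ *
const allowedChars = /^[\p{L}\p{Nd}\p{Pd}\s!$*]+$/u;

console.log(allowedChars.test("naïve café 42!")); // true
console.log(allowedChars.test("2010\u20132020")); // true: U+2013 EN DASH is in \p{Pd}
console.log(allowedChars.test("snake_case"));     // false: underscore is not in the set
```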
I’d likely use that anyway, even if I didn’t plan on dealing with the full Unicode set, because it’s a good habit to get into, and because these things often grow beyond their original parameters. Now when you lift it to use in other code, it will still work correctly. If you hard‐code all the characters, it won’t.
[-a-z0-9]+, [a-z0-9-]+ and [a-z-0-9]+ are all equivalent. A hyphen that sits between two ranges is treated as a literal symbol. Likewise, the regex [a-z0-9-+()]+ also allows a hyphen.
Use "\p{Pd}" (without the quotes) to match any type of hyphen. The '-' character is just one type of hyphen, which also happens to be a special character in regex.
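To illustrate the difference between `\p{Pd}` and a plain `-`, here is a short JavaScript sketch (assuming `u`-flag support). The sample string is invented and uses escapes so that each dash is unambiguous.

```javascript
// \p{Pd} (dash punctuation) covers more than the ASCII hyphen-minus.
const anyDash = /\p{Pd}/gu;
const sample = "a-b\u2010c\u2013d\u2014e"; // hyphen-minus, HYPHEN, EN DASH, EM DASH

console.log(sample.match(anyDash).length); // 4
console.log(sample.match(/-/g).length);    // 1
console.log(sample.replace(anyDash, "-")); // "a-b-c-d-e"
```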
Is this what you are after?
MatchCollection matches = Regex.Matches(mystring, "-");
How do I write a regular expression that allows a name with one space and special (accented) letters?
I tried [a-zA-Z]+(?:(?:\. |[' ])[a-zA-Z]+)* but it is not working for me.
Example string: Björk Guðmundsdóttir
You may try something along these lines:
^(?!.*[ ].*[ ])[ A-Za-zÀ-ÖØ-öø-ÿ]+$
The negative lookahead asserts that the name does not contain two spaces, so at most one space (or none at all) is present. Then we match one or more letters, with most accented Latin letters included. Spaces are also part of the character class, but the lookahead has already ensured that at most one of them can appear.
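A quick JavaScript check of the lookahead pattern, including two edge cases worth knowing about. The variable name is made up for the example.

```javascript
const atMostOneSpace = /^(?!.*[ ].*[ ])[ A-Za-zÀ-ÖØ-öø-ÿ]+$/;

console.log(atMostOneSpace.test("Björk Guðmundsdóttir")); // true
console.log(atMostOneSpace.test("Björk"));                // true: zero spaces
console.log(atMostOneSpace.test("Ana Maria Silva"));      // false: two spaces
console.log(atMostOneSpace.test(" Björk"));               // true: a leading space is not rejected
console.log(atMostOneSpace.test("Łukasz"));               // false: Ł lies outside À-ÿ
```

The last two lines show the limits of this approach: the space may appear anywhere, and letters beyond the Latin-1 range are not covered.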
Use this one:
[a-zA-Z\u00C0-\u00ff]*[ ]{1}[a-zA-Z\u00C0-\u00ff]*
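Note that the pattern above has no anchors, so when used for validation it succeeds as soon as any part of the input matches. The JavaScript sketch below compares it with an anchored variant; adding `^` and `$` is an assumption about the intent, not part of the original answer.

```javascript
const unanchored = /[a-zA-Z\u00C0-\u00ff]*[ ]{1}[a-zA-Z\u00C0-\u00ff]*/;
const anchored = /^[a-zA-Z\u00C0-\u00ff]*[ ]{1}[a-zA-Z\u00C0-\u00ff]*$/;

console.log(unanchored.test("Björk 123 !!"));       // true: the substring "Björk " is enough
console.log(anchored.test("Björk 123 !!"));         // false
console.log(anchored.test("Björk Guðmundsdóttir")); // true
console.log(anchored.test("Björk"));                // false: exactly one space is required
```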
Answer from other question
I am trying to handle Arabic strings.
I want to handle multiple spaces between two strings (i.e. first name, last name).
But the RegEx that I am using is valid only for 1 spacing between the first name and last name.
RegEx used:
/^[\u0600-\u06FF]+([ ][\u0600-\u06FF]+)?$/
Please suggest.
As suggested by Simone Chelo, you need to add "+" to the regex. It means "one or more".
You also don't need to wrap the space with brackets.
This should work for you:
/^[\u0600-\u06FF]+( +[\u0600-\u06FF]+)?$/
If you want any kind of white space, you can use \s instead of [ ]
/^[\u0600-\u06FF]+(\s+[\u0600-\u06FF]+)?$/
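The effect of the added `+` can be seen in a short JavaScript sketch. The two sample words are written as `\u` escapes purely to keep the example ASCII-only; the variable names are made up.

```javascript
// Two sample words made of Arabic letters (U+0600 to U+06FF).
const first = "\u0645\u062D\u0645\u062F";
const last = "\u0639\u0644\u064A";

const singleSpace = /^[\u0600-\u06FF]+([ ][\u0600-\u06FF]+)?$/;
const manySpaces = /^[\u0600-\u06FF]+(\s+[\u0600-\u06FF]+)?$/;

console.log(singleSpace.test(first + " " + last));   // true
console.log(singleSpace.test(first + "   " + last)); // false: only one space allowed
console.log(manySpaces.test(first + "   " + last));  // true
console.log(manySpaces.test(first));                 // true: the second part is optional
```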
Here is a great resource for regex.
NB. I only want to know whether this is a valid use of an unescaped hyphen in a regex definition. It's not a question about matching email, the meaning of the hyphen or backslash, quantifiers or anything else. Also, please note that the linked answer doesn't really discuss the validity issue between an escaped and an unescaped hyphen.
Usually I declare the regex for matching email addresses like this.
var emailPattern = /^[a-z.\-_]+#[a-z]+[.]{1}[a-z]{2,3}$/;
emailPattern.test('ss.a_a-#ass.com');
Now, by mistake, a colleague of mine forgot the escape character and still made it work, which surprised me because of the range meaning of the hyphen. It looks like this.
var weirdPattern = /^[a-z._-]+#[a-z]+[.]{1}[a-z]{2,3}$/;
weirdPattern.test('ss.a_a-#ass.com');
Apparently, it works because the hyphen is the last character in the brackets. My question is if this is just a happy coincidence or if it's a valid syntax? Have I been regexing wrong my whole life?
Inside a character class, a hyphen is used to denote a range. However, when it is placed at the beginning or at the end of the character class, there is no need to escape it.
Note that some engines may still treat a hyphen in other positions of the character class as a range metacharacter, so the safest practice is to always escape it.
Quoting from regular-expressions.info
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen. Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.
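To make the position dependence concrete, the JavaScript sketch below compares the escaped form, the trailing form, and a variant where the hyphen lands between two characters. The variable names and test strings are invented for the example.

```javascript
const escaped = /^[a-z.\-_]+$/;
const trailingHyphen = /^[a-z._-]+$/;
const hyphenMiddle = /^[a-z.-_]+$/; // ".-_" is a range from "." (U+002E) to "_" (U+005F)

console.log(escaped.test("ss.a_a-"));        // true
console.log(trailingHyphen.test("ss.a_a-")); // true: same behaviour as the escaped form
console.log(hyphenMiddle.test("ss.a_a-"));   // false: "-" (U+002D) is outside the range
console.log(hyphenMiddle.test("ABC123"));    // true: the range covers digits and uppercase
console.log(escaped.test("ABC123"));         // false
```

So the unescaped trailing hyphen is valid here, but moving it by one position silently changes what the class accepts.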
Two quick questions:
What would be a RegEx string for three letters and two numbers with space before and after them (i.e. " LET 12 ")?
Would you happen to know any good RegEx resources/tools?
For a good resource, try this website and the program RegexBuddy. You may even be able to figure out the answer to your question yourself using these sites.
To start you off you want something like this:
/^[a-zA-Z]{3}\s+[0-9]{2}$/
But the exact details depend on your requirements. It's probably a better idea that you learn how to use regular expressions yourself and then write the regular expression instead of just copying the answers here. The small details make a big difference. Examples:
What is a "letter"? Just A-Z or also foreign letters? What about lower case?
What is a "number"? Just 0-9 or also foreign numerals? Only integers? Only positive integers? Can there be leading zeros?
Should there be a single space between the letters and numbers? Or any amount of any whitespace? Even none?
Do you want to search for this string in a larger text? Or match a line exactly?
etc.
The answers to these questions will change the regular expression. It would be much faster for you in the long run to learn how to create the regular expression than to completely specify your requirements and wait for other people to reply.
I forgot to mention that there will be a space before and after. How do I include that?
Again you need to consider the questions:
Do you mean just one space or any amount of spaces? Possibly not always a space but only sometimes?
Do you mean literally a space character or any whitespace characters?
My guess is:
/^\s+[a-zA-Z]{3}\s+[0-9]{2}\s+$/
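How the guess above behaves on a few inputs, as a JavaScript sketch. The `inText` variant for searching inside a longer string is an assumption about one possible requirement, not part of the answer.

```javascript
const padded = /^\s+[a-zA-Z]{3}\s+[0-9]{2}\s+$/;

console.log(padded.test(" LET 12 "));      // true
console.log(padded.test("LET 12"));        // false: surrounding whitespace is required
console.log(padded.test("\tLET   12\n"));  // true: \s+ accepts any run of whitespace

// Hypothetical variant without anchors, for finding the token inside a longer text.
const inText = /\s[a-zA-Z]{3}\s+[0-9]{2}\s/;
console.log(inText.test("code: LET 12 is here")); // true
```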
/[a-z]{3} [0-9]{2}/i will match 3 letters followed by a space, and then 2 digits. [a-z] is a character class containing the letters a through z, and the {3} means that you want exactly 3 members of that class. The space character matches a literal space (alternatively, you could use \s, which is a "shorthand" character class that matches any whitespace character). The i at the end is a pattern modifier specifying that your pattern is case-insensitive.
If you want the entire string to only be that, you need to anchor it with ^ and $:
/^[a-z]{3} [0-9]{2}$/i
Regular expression resources:
http://www.regular-expressions.info - great tutorial with a lot of information
http://rexv.org/ - online regular expression tester that supports a variety of engines.
^([A-Za-z]{3}) ([0-9]{2})$ assuming one space between the letters/numbers, as in your example. This will capture the letters and numbers separately.
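Reading the two captured groups could look like this in JavaScript; the variable names are made up for the sketch.

```javascript
const captured = /^([A-Za-z]{3}) ([0-9]{2})$/;

const match = "LET 12".match(captured);
if (match) {
  // match[0] is the whole match; groups start at index 1.
  const [, letters, digits] = match;
  console.log(letters, digits); // LET 12
}

console.log("LET  12".match(captured)); // null: two spaces do not match
```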
I use http://gskinner.com/RegExr/ - it allows you to build a regex and test it with your own text.
As you can probably tell from the wide variety of answers, RegEx is a complex subject with a wide variety of opinions and preferences, and often more than one way of doing things. Here's my preferred solution.
^[a-zA-Z]{3}\s*\d{2}$
I used [a-zA-Z] instead of \w because \w also matches digits and underscores.
The \s* is to allow zero or more whitespace characters.
I try to use shorthand character classes wherever possible, which is why I went with \d.
\w{3}\s{1}\d{2}
And I like this site.
EDIT: [a-zA-Z]{3}\s{1}\d{2} - The \w supports numeric characters too.
Try this regular expression:
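The reason for that edit is easy to demonstrate. In the JavaScript sketch below, anchors are added (an assumption) so that each test covers the whole string.

```javascript
const withWordClass = /^\w{3}\s{1}\d{2}$/;
const lettersOnly = /^[a-zA-Z]{3}\s{1}\d{2}$/;

console.log(withWordClass.test("a_1 12")); // true: \w accepts "_" and digits
console.log(lettersOnly.test("a_1 12"));   // false
console.log(lettersOnly.test("LET 12"));   // true
```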
[^"\r\n]{3,}