NB. I only want to know whether this is a valid use of an unescaped hyphen in a regex definition. It's not a question about matching email, the meaning of the hyphen or the backslash, quantifiers or anything else. Also, please note that the linked answer doesn't really discuss the validity issue between an escaped and an unescaped hyphen.
Usually I declare the regex for matching email addresses like this.
var emailPattern = /^[a-z.\-_]+@[a-z]+[.]{1}[a-z]{2,3}$/;
emailPattern.test('ss.a_a-@ass.com');
Now, by mistake, a colleague of mine forgot the escape character and it *still* worked, which surprised me because of the range (interval) meaning of the hyphen. It looks like this.
var weirdPattern = /^[a-z._-]+@[a-z]+[.]{1}[a-z]{2,3}$/;
weirdPattern.test('ss.a_a-@ass.com');
Apparently, it works because the hyphen is the last character in the brackets. My question is if this is just a happy coincidence or if it's a valid syntax? Have I been regexing wrong my whole life?
Hyphens inside a character class denote a range. However, when the hyphen is placed at the beginning or at the end of the character class, there is no need to escape it.
Note that, reportedly, some browsers still treat a hyphen at any position in the character class as a range metacharacter, so the safest habit is to always escape it.
Quoting from regular-expressions.info:
The hyphen can be included right after the opening bracket, or right before the closing bracket, or right after the negating caret. Both [-x] and [x-] match an x or a hyphen. [^-x] and [^x-] match any character that is not an x or a hyphen. Hyphens at other positions in character classes where they can't form a range may be interpreted as literals or as errors. Regex flavors are quite inconsistent about this.
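The positional rule in the quote can be checked directly in JavaScript. This is a minimal sketch; the variable names are illustrative only, and the results describe JavaScript's regex engine, not necessarily other flavors.

```javascript
// Three ways to include a literal hyphen in a character class.
var hyphenLast = /^[a-z._-]+$/;     // hyphen right before the closing bracket
var hyphenFirst = /^[-a-z._]+$/;    // hyphen right after the opening bracket
var hyphenEscaped = /^[a-z.\-_]+$/; // hyphen escaped in the middle

var sample = "ss.a_a-";
var allMatch = [hyphenLast, hyphenFirst, hyphenEscaped].every(function (re) {
  return re.test(sample);
});
console.log(allMatch); // true

// Between two characters, an unescaped hyphen forms a range instead:
// [.-_] means "every character from '.' (U+002E) to '_' (U+005F)".
var accidentalRange = /^[.-_]+$/;
console.log(accidentalRange.test("A")); // true: 'A' (U+0041) is inside the range
console.log(accidentalRange.test("-")); // false: '-' (U+002D) is outside it
```

The last two lines show why the position matters: the same three characters in a different order silently accept capital letters and reject the hyphen.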
I found this regex which accomplishes the following:
^(\w+\s)*(\w+$)
no space at beginning
no space at end
no double or more consecutive spaces in between
But I also need to allow any character and currently it only accepts alphanumeric values.
How do I write this?
Replace \w (which matches [a-zA-Z0-9_]) with \S (any non-whitespace character, as mentioned in the comments; it is equivalent to [^\s], but when a shorthand exists it is better to use it), giving ^(\S+\s)*(\S+$).
Note that this matches everything that is not matched by \s, including any unusual Unicode symbols and the like.
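A quick way to confirm that the \S version still enforces the three spacing rules is to run it against a few sample strings. The samples below are made up for illustration.

```javascript
// Pattern from the answer: runs of non-whitespace separated by single
// whitespace characters, with no whitespace at either end.
var anyCharPattern = /^(\S+\s)*(\S+$)/;

var samples = [
  "hello world!",     // punctuation is now accepted
  "héllo, wörld #1",  // so are accented letters and symbols
  " leading",         // rejected: whitespace at the beginning
  "trailing ",        // rejected: whitespace at the end
  "double  space"     // rejected: two consecutive spaces
];

samples.forEach(function (text) {
  console.log(JSON.stringify(text), anyCharPattern.test(text));
});
```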
This is a token answer as there seem to be no answers after my comment and OP noted that marking as resolved cannot be done on comments.
I tried to search for /\,$/ online, but couldn't find anything.
I have:
coords = coords.replace(/\,$/, "");
I'm guessing it returns an index into the coords string. What should I search for online so I can learn more?
/\,$/ finds the comma character (,) at the end of a string (the end is denoted by the $), and the replace call substitutes an empty string ("") for it. You sometimes see this in regex code aiming to clean up excerpts of text.
It's a regular expression to remove a trailing comma.
That thing is a Regular Expression, also known as regex or regexp. It is a way to "match" strings using some rules. If you want to learn how to use it in JavaScript, read the Mozilla Developer Network page about RegExp.
By the way, regular expressions are also available in most languages and in some tools. They are a very useful thing to learn.
That's a regular expression that finds a comma at the end of a string. That code removes the comma.
// defines a JavaScript regular expression, used to match a pattern within a string.
\,$ is the pattern
In this case \, translates to ,. A backslash is used to escape special characters, but in this case it's not necessary. An example where it would be necessary is removing trailing periods. If you tried to do that with /.$/, the period would have a different meaning: it is a wildcard that matches [almost] any character (everything except line terminators). So to match a literal "." (period character) you have to escape the wildcard (/\.$/).
When $ is placed at the end of the pattern, it means only look at the end of the string. This means that you can't mistakenly find a comma anywhere in the middle of the string (e.g., not after help in help, me,), only at the end (trailing). It may also speed up the regular expression search. If you wanted to match characters only at the beginning of the string, you would start the pattern with a caret (^); for instance /^,/ would find a comma at the start of a string if one existed.
It's also important to note that you're only removing one comma, whereas if you use the plus (+) after the comma, you'd be replacing one or more: /,+$/.
Without the +: "trailing commas,," becomes "trailing commas,"
With the +: "no trailing comma,," becomes "no trailing comma"
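The cases described above can be reproduced with String.prototype.replace. This is only a sketch; the input strings and variable names are invented for illustration.

```javascript
// One trailing comma is removed; commas elsewhere are left alone.
var oneComma = "10,20,30,".replace(/,$/, "");
var middleOnly = "help, me".replace(/,$/, "");

// Without + only the last comma goes; with + the whole trailing run goes.
var withoutPlus = "a,b,,".replace(/,$/, "");
var withPlus = "a,b,,".replace(/,+$/, "");

// An unescaped dot is a wildcard, so /.$/ removes whatever character is last.
var escapedDot = "end.".replace(/\.$/, "");
var wildcardDot = "end".replace(/.$/, "");

// ^ anchors to the start of the string instead of the end.
var leadingComma = ",start".replace(/^,/, "");

console.log(oneComma, middleOnly, withoutPlus, withPlus);
console.log(escapedDot, wildcardDot, leadingComma);
```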
I need a JS regular expression that allows only words made up of alphanumeric characters, dots and hyphens.
Let me know if this is correct.
var regex = /^[a-zA-Z_0-9/.-]+$/;
Almost. That will also allow underscores and slashes. Remove those from your character class:
var regex = /^[a-zA-Z0-9.-]+$/;
This will also not match the empty string. That may be what you want, but it also may not be what you want. If it's not what you want, change + to *.
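The difference between the two patterns, and between + and *, can be seen with a few inputs. In this sketch the original pattern is built with the RegExp constructor purely to sidestep any question about the slash inside a regex literal; the variable names are illustrative.

```javascript
// Pattern from the question: also lets '_' and '/' through.
var originalWord = new RegExp("^[a-zA-Z_0-9/.-]+$");
// Corrected pattern: letters, digits, dot and hyphen only.
var strictWord = /^[a-zA-Z0-9.-]+$/;
// Same character class, but * also accepts the empty string.
var strictWordOrEmpty = /^[a-zA-Z0-9.-]*$/;

console.log(originalWord.test("a_b/c"));    // true
console.log(strictWord.test("a_b/c"));      // false
console.log(strictWord.test("foo-bar.v2")); // true
console.log(strictWord.test(""));           // false
console.log(strictWordOrEmpty.test(""));    // true
```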
The first simplifications I'd make are to use the "word character" shorthand '\w' and the "digit character" shorthand '\d'. In JavaScript, '\w' is equivalent to '[A-Za-z0-9_]', so it is shorter than 'a-zA-Z' but also matches digits and the underscore. It does not match accented letters in JavaScript, although it does in some other regex flavors.
Also, although dot is special in most places in regular expressions, it's not special inside square brackets, and doesn't need to be escaped there. (Besides, the escape character is the backslash, not the forward slash. That forward slash of yours inside the brackets is the same character that begins and ends the RE. Modern JavaScript engines accept an unescaped slash inside a character class, but some tools and older parsers may treat it as the end of the RE, and in any case it adds a literal slash to the set of allowed characters.) Since we're completely throwing it away, it no longer matters whether it should be forward slash or backslash, escaped or bare.
And as you've noticed, hyphen has a special meaning of "range" inside brackets (ex: a-z), so if you want a literal hyphen you have to do something a little different. By convention that something is to put the literal hyphen first inside the brackets.
So my result would be var regex = /^[-.\w\d]+$/;
(As you've probably noticed, there's almost always more than one way to express a regular expression so it works, and RE weenies spend as much time on a) economy of expression and b) run-time performance as they do on getting it "correct". In other words, you can ignore much of what I've just said, as it doesn't really matter to you. I think all that really matters is a) getting rid of that extraneous forward-slash and b) moving the literal hyphen to be the very first character inside the square brackets.)
(Another thought: very frequently when accepting alphabetic characters and hyphens, underscore is acceptable too ...so did you really mean to have that underscore after all?)
(Yet another thought: sometimes the very first character of an identifier must be an alpha, in which case what you probably want is var regex = /^[a-zA-Z][-.\w\d]*$/; since \w on its own would also accept a digit or an underscore in first position. You may want a different rule for the very first character in any case, as the naive recipe above would allow "-" and "." as legitimate words of length one.)
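A small check of the points made in this answer. The stricter pattern below assumes that "alpha" means an ASCII letter and spells that out as [a-zA-Z]; that reading, like the variable names, is an assumption made for this sketch.

```javascript
// Pattern from the answer: hyphen first, dot unescaped, \w and \d shorthands.
var looseWord = /^[-.\w\d]+$/;
// Assumed variant: first character must be an ASCII letter.
var letterFirstWord = /^[a-zA-Z][-.\w]*$/;

console.log(looseWord.test("-"));            // true: a lone hyphen is accepted
console.log(looseWord.test("foo_bar"));      // true: \w includes the underscore
console.log(looseWord.test("foo/bar"));      // false: the slash is gone
console.log(letterFirstWord.test("a-b.c"));  // true
console.log(letterFirstWord.test("-"));      // false
console.log(letterFirstWord.test("9lives")); // false: starts with a digit
```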
I need a little help. I want to create a regex pattern to validate names. It should accept only letters (any type of letters, non-European included), apostrophes, periods, dashes and whitespace. Or, to put it another way, the regex should not accept any numbers, [], {}, <> etc. Is there a way to do that?
Thank you in advance.
/(\w|\s|[\.\'-])+/
But that's not enough, I guess. Surely we must consider that an apostrophe can not be in the beginning, that several dashes can not follow in a row, etc.
You need a more precise definition of the name.
The Regex you pasted is flawed, it should be
^([a-zA-Z]|\s)*$
Notice the extra parentheses.
Also, you were on the right track, but just put all allowed characters in the character class []:
^([-\w'.\s])*$
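a-zA-Z was replaced by the shorthand character class for word characters, \w. Note that \w also matches digits and the underscore, so keep a-zA-Z if those must be rejected.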
Add allowed characters as needed
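Since the question asks for letters from any script, one option that none of the answers above use is a Unicode property escape. The sketch below assumes an engine that supports \p{L} with the u flag (ES2018 or later), and the name namePattern is made up for illustration. It only restricts the character set; it does not enforce rules such as "no leading apostrophe".

```javascript
// \p{L} matches any Unicode letter; the u flag is required for it to work.
// Assumption: input is in composed form, since combining marks are not \p{L}.
var namePattern = /^[\p{L}\s'.-]+$/u;

console.log(namePattern.test("Jean-Luc O'Brien Jr.")); // true
console.log(namePattern.test("Οδυσσέας"));             // true
console.log(namePattern.test("李小龍"));                 // true
console.log(namePattern.test("R2-D2"));                // false: digits
console.log(namePattern.test("<script>"));             // false: angle brackets
```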
I am trying to highlight a set of keywords using JavaScript and regex, and I am facing one problem: my keywords may contain both literal and special characters, as in #text, #number etc. I am using a word boundary to match and replace the whole word and not a partial word (contained within another word).
var pattern = new RegExp('\\b(' + keyword + ')\\b', 'gi');
This expression matches whole keywords and highlights them; however, a keyword like "number:" does not get highlighted.
I am aware that \bword\b matches at a word boundary, and that special characters are non-alphanumeric characters and hence are not matched by the above expression.
Can you let me know what regex expression I can use to accomplish the above.
==Update==
For the above I tried Tim Pietzcker's suggestion for the below regex,
expr: (?:^|\\b|\\s)(" + keyword + ")(?:$|\\b|\\s)
The above seems to be working for getting me a match for the whole word with alphanumeric and non alphanumeric characters, however whenever a keyword has consecutive html tag before or after the keyword without a space, it does not highlight that keyword (e.g. social security *number:< br >*)
I tried the following regex, but it replaces the html tag preceding the keyword
expr: (?:^|\b|\s|<[^>]+>)number:(?:$|\b|\s|<[^>]+>)
Here for the keyword number: which has < br > (space added intentionally for br tag to avoid browser interpreting the tag) coming next without space in between gets highlighted with the keyword.
Can you suggest an expression which would ignore the consecutive html tag for the whole word with both alphanumeric and non alphanumeric characters.
2021 update: JS now supports lookbehind so this answer is a little outdated.
OK, so you have two problems: JavaScript doesn't support lookbehind, and \b only finds boundaries between alphanumeric and non-alphanumeric characters.
The first question: What exactly does constitute a word boundary for your keywords? My guess is that it must be either a \b boundary or whitespace. If that is the case, you could search for
"(?:^|\\b|\\s)(" + keyword + ")(?:$|\\b|\\s)"
Of course the whitespace characters around keywords like #number# would also become part of the match, but perhaps highlighting those isn't such a problem. In other cases, i.e. if there is an actual word boundary that can match, the spaces won't be part of the match, so it should work fine in the majority of cases.
The actual word you're interested in will be in backreference #1, so if you can highlight that separately, even better.
EDIT:
If other characters than space may occur after/before a keyword, then I think the only thing you can do (if you're stuck with JavaScript) is:
Check if your keyword starts with an alnum character.
If so, prepend \b to your regex.
Check if your keyword ends with an alnum character.
If so, append \b to your regex.
So, for keyword, use \bkeyword\b; for number:, use \bnumber:; for #twitter, use #twitter\b.
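The four steps above can be sketched as a small helper. The function names buildKeywordPattern and escapeForRegExp are hypothetical, the escaping step is an addition not mentioned in the answer, and \w is used as the "alnum" test, which also counts the underscore.

```javascript
// Escape characters that are special in a regex so the keyword is literal.
function escapeForRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Add \b only on the sides where the keyword starts/ends with a word character.
function buildKeywordPattern(keyword) {
  var source = escapeForRegExp(keyword);
  if (/^\w/.test(keyword)) {
    source = "\\b" + source;
  }
  if (/\w$/.test(keyword)) {
    source = source + "\\b";
  }
  return new RegExp(source, "gi");
}

var highlighted = "social security number:<br>".replace(
  buildKeywordPattern("number:"),
  "<mark>$&</mark>"
);
console.log(highlighted);
console.log(buildKeywordPattern("#twitter").source);
```

Because the tag that follows "number:" is never part of the pattern, it is left untouched by the replacement.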
We need to look for a substring that has a whitespace character on both sides. If JavaScript supported lookbehind, this would look like:
var re = new RegExp('(?<!\\S)' + keyword + '(?!\\S)', 'gi');
That won't work though (but would in Perl and other scripting languages). Instead, we need to include the leading whitespace character (or beginning of string) as the beginning part of the match (and optionally capture what we are really looking for into $1):
var re = new RegExp('(?:^|\\s)(' + keyword + ')(?!\\S)', 'gi');
Just consider that the real place where any match starts will be one character after what is returned by the .index property returned by re.exec(string), and that if you are accessing the matched string, you either need to remove the first character with .slice(1) or simply access what is captured.
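One way to avoid the off-by-one bookkeeping is to capture the leading whitespace as well and put it back in the replacement. In this sketch the function name and the <mark> wrapper are hypothetical, the leading group is made capturing (unlike the pattern above), and the keyword is escaped first.

```javascript
function highlightWhitespaceDelimited(text, keyword) {
  // Make the keyword literal before embedding it in the pattern.
  var escaped = keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Group 1: start of string or one whitespace character. Group 2: the keyword.
  var re = new RegExp("(^|\\s)(" + escaped + ")(?!\\S)", "gi");
  return text.replace(re, function (match, lead, word) {
    // Re-emit the leading whitespace outside the highlight.
    return lead + "<mark>" + word + "</mark>";
  });
}

console.log(highlightWhitespaceDelimited("id number: 42", "number:"));
console.log(highlightWhitespaceDelimited("phonenumber: 42", "number:"));
```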
Maybe what you're trying to do is
'\\b\\W*(' + keyword + ')\\W*\\b'
Lookahead and lookbehind are your answer: "(?<=^|\\s)" + keyword + "(?=\\s|$)". The bits in parentheses aren't included in the match, so put whatever characters aren't permitted in the keywords in there. (Lookbehind needs an engine that supports it.)
As Tim correctly points out, \b are tricky things that work differently than the way people often think they work. Read this answer for more details about this matter, and what you can do about it.
In brief, this is a boundary to the left:
(?(?=\w)(?<!\w)|(?<!\W))
and this is a boundary to the right:
(?(?<=\w)(?!\w)|(?!\W))
People always think there are spaces involved, but there aren't. However, now that you know the real definitions, it's easy to build that into them. One could swap out \w and \W in exchange for \s and \S in the two patterns above. Or one could add whitespace awareness to the else blocks.
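The conditional syntax above is not available in JavaScript, but the behaviour of \b can still be observed there. The sketch below also shows a whitespace-delimited alternative built from lookarounds; it assumes an engine with lookbehind support (ES2018 or later), and the variable names are illustrative.

```javascript
// \b needs a word character on exactly one side, so it fails between ':' and ' '.
var bothBoundaries = /\bnumber:\b/;
var leftBoundaryOnly = /\bnumber:/;
console.log(bothBoundaries.test("number: 5"));   // false
console.log(bothBoundaries.test("number:5"));    // true: ':' then a word character
console.log(leftBoundaryOnly.test("number: 5")); // true

// Whitespace-delimited "boundaries": no non-space before, no non-space after.
var spaceDelimited = /(?<!\S)#twitter(?!\S)/;
console.log(/\b#twitter\b/.test("see #twitter now"));  // false: no \b before '#'
console.log(spaceDelimited.test("see #twitter now"));  // true
console.log(spaceDelimited.test("see #twitterfeed"));  // false
```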
Try this; it should work once the keyword's special characters are escaped:
var pattern = new RegExp("\\b" + keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") + "\\b", "gi");