My regular expression isn't matching correctly - javascript

I'm using the following regular expression to match one or more special characters for a password strength test.
if (password.match(/\W+/)) points++;
This doesn't seem to match the underscore '_' as a special character. Why is this and how can I fix it?

It is because \W is the same as [^\w], while \w contains a-z, A-Z, 0-9, and _ as well.
In order to fix it just add _ character separately:
if (password.match(/[\W_]+/)) points++;
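A minimal sketch of the difference, assuming a hypothetical helper named hasSpecialChar (not part of the original code) and treating "special" as anything other than an ASCII letter or digit:

```javascript
// Hypothetical helper: true when the password contains at least one
// character that is not an ASCII letter or digit (underscore included).
function hasSpecialChar(password) {
  return /[\W_]/.test(password);
}

console.log(/\W+/.test("abc_123"));      // false: "_" counts as a word character
console.log(hasSpecialChar("abc_123"));  // true
console.log(hasSpecialChar("abc123"));   // false
console.log(hasSpecialChar("abc!123"));  // true
```

The + quantifier is not needed for a yes/no test, since one matching character is enough.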

\W (uppercase) means not \w, so anything except word characters.
Word characters (\w) includes letters, digits, and underscore.
Perhaps you should use /[^a-z0-9]+/i to match anything that is not a letter or digit.

Are you sure you don't want the \w? The \W is the negation of \w.
\w matches letters, digits, and underscores, so \W does NOT match any of them. See here: http://www.regular-expressions.info/reference.html

The match fails because underscore is treated as a word character. From the MDN documentation for \W:
Matches any non-word character. Equivalent to [^A-Za-z0-9_]
You can fix this by putting underscore and \W in the same character class:
if (password.match(/[\W_]+/)) points++;
A regex tool such as Javascript Regex Tester can be especially helpful for debugging this sort of thing.

Related

Regex for word not surrounded by alphanumeric characters

I want a regex matching a specific word that is not surrounded by any alphanumeric character. My thought was to include a negation before and after:
[^a-zA-Z\d]myspecificword[^a-zA-Z\d]
So it would match:
myspecificword
_myspecificword_
-myspecificword
And not match:
notmyspecificword
myspecificword123
But this simple regex won't match the word by itself unless it is preceded by whitespace:
myspecificword // no match
myspecificword // match
Using the flags "gmi" and testing with JavaScript. What am I doing wrong? Shouldn't it be as simple as that?
https://regex101.com/r/BCkbVQ/3
Try using:
(?<![^\s_-])myspecificword(?![^\s_-])
This says to match myspecificword when it is surrounded, on both sides, by the start/end of the input, whitespace, an underscore, or a dash.
It is not whitespace that is required, but any character that matches [^a-zA-Z\d].
You should use:
(?:^|[^a-zA-Z\d])myspecificword(?:[^a-zA-Z\d]|$)
The main benefit is support across all Regexp parsers.
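As a rough check, the question's pattern and the alternation-based pattern can be run side by side on the question's samples. The variable names strictPattern and anchoredPattern are made up for this sketch:

```javascript
// The question's pattern: requires an actual character on both sides.
const strictPattern = /[^a-zA-Z\d]myspecificword[^a-zA-Z\d]/i;
// The alternation-based pattern: also accepts start/end of input.
const anchoredPattern = /(?:^|[^a-zA-Z\d])myspecificword(?:[^a-zA-Z\d]|$)/i;

const samples = [
  "myspecificword",
  "_myspecificword_",
  "-myspecificword",
  "notmyspecificword",
  "myspecificword123",
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), strictPattern.test(sample), anchoredPattern.test(sample));
}
// "myspecificword" false true
// "_myspecificword_" true true
// "-myspecificword" false true
// "notmyspecificword" false false
// "myspecificword123" false false
```

Note that with this approach the neighbouring characters are part of the match itself, which matters if the match is used for replacement rather than a plain test.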
If you truly mean "not surrounded by alphanumerics", with _ not counted as alphanumeric (and in your attempted regex you seem to be willing to match anything that isn't a letter or digit), then any of the following should be acceptable:
'myspecificword'
'_myspecificword_'
' myspecificword '
'-myspecificword-'
'(myspecificword)'
And the regex should be:
(?<![^_\W])myspecificword(?![^_\W])
let tests = ['myspecificword',
'_myspecificword_',
' myspecificword ',
'-myspecificword-',
'(myspecificword)',
'amyspecificword',
'1myspecificword'
];
let regex = /(?<![^_\W])myspecificword(?![^_\W])/;
for (let test of tests) {
console.log(regex.test(test));
}
The "accepted" answer will not match (myspecificword), for example.
The title of this question is
Regex for word not surrounded by alphanumeric characters
The other answers have all addressed a different question (which may well be the one intended):
Regex for word neither preceded nor followed by alphanumeric characters
I will refer to these statements as #1 and #2 respectively.
If the specified word were 'cat' and the string were '9cat', 'cat' is not surrounded by alphanumeric characters in the string, so there is a match with #1, but not with #2.
For #1, one could use the regex:
/cat(?!\p{Alnum})|(?<!\p{Alnum})cat/
("match 'cat' not followed by a Unicode alphanumeric character or 'cat' not preceded by a Unicode alphanumeric character"), though it's easier to test for the negation:
/(?<=\p{Alnum})cat(?=\p{Alnum})/
The test passes if the string does not match this regex.
With interpretation #2, the regex is:
/(?<!\p{Alnum})cat(?!\p{Alnum})/
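JavaScript does not recognize \p{Alnum} as a property escape, so a JavaScript version has to spell the class out. The sketch below assumes that "alphanumeric" can be approximated by [\p{L}\p{N}] (any Unicode letter or number) with the u flag; the variable names are invented for illustration:

```javascript
// Interpretation #2: "cat" neither preceded nor followed by a letter or digit.
const notAdjacentToAlnum = /(?<![\p{L}\p{N}])cat(?![\p{L}\p{N}])/u;
// Negation of interpretation #1: "cat" with a letter or digit on BOTH sides.
const surroundedByAlnum = /(?<=[\p{L}\p{N}])cat(?=[\p{L}\p{N}])/u;

console.log(notAdjacentToAlnum.test("cat"));        // true
console.log(notAdjacentToAlnum.test("(cat)"));      // true
console.log(notAdjacentToAlnum.test("9cat"));       // false
console.log(notAdjacentToAlnum.test("\u00e9cat"));  // false: é is a letter

// For interpretation #1 the test passes when surroundedByAlnum does NOT match.
console.log(!surroundedByAlnum.test("9cat"));       // true
console.log(!surroundedByAlnum.test("9cats"));      // false
```

Lookbehind and Unicode property escapes need a reasonably recent JavaScript engine.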
I think this will work:
/[^a-z0-9]?myspecificword[^a-z0-9]?/i

How to interpret this regular expression /[\W_]/g

My code is:
var result2 = result.replace(/[\W_]/g,"").replace(",","").replace(".","");
The code works and I get what I need done, but I don't understand how the regular expression /[\W_]/g works, and I can't find any documentation that I understand.
/ ... /g It's a global regex. So it'll operate on multiple matches in the string.
[ ... ] This creates a character set. Basically it'll match any single character within the listed set of characters.
\W_ This matches the inverse of "word characters" and underscores. Any non-word character.
Then you have a few one off replacements for comma and period. Honestly, if that's the complete code, /[\W_,.]/g, omitting the two other replaces, would work just as well.
[ and ] are the start and end of a character set.
\W means "non-word character", as opposed to \w, which matches a word character.
_ is the "_" character.
/ mark the beginning and end of a regular expression.
g means it's a global search.
From MDN
\W Matches any non-word character. Equivalent to [^A-Za-z0-9_].
For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."
the underscore (_) matches a literal underscore
The brackets define a character class meaning that the regexp will match if any non word or an underscore character is present
\W means "any non word character"
[\W_] means "any non word character or a _"
/[\W_]/g find globally any non word character or _
replace finds all occurrences of the regexp and replaces them with another string.
So your expression replaces every non-word character, _, comma, or period with an empty string (i.e., removes it).
It can be simplified to:
result.replace(/[\W_,.]/g,"")
Okay, let's break it down. replace(/[\W_]/g, "") means replace every non-word character and underscore with an empty string. So in the string $1.00, it would come out as 100 ($ and . are non-word characters).
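Then .replace(",","") would remove the first comma, and .replace(".","") the first period.
By that point none are left, though, since both are non-word characters and were already removed.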
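A short sketch of what the chained calls do, using made-up input strings:

```javascript
// /[\W_]/g strips every character that is not an ASCII letter or digit.
const cleaned = "Hello, World_2.0!".replace(/[\W_]/g, "");
console.log(cleaned);                          // "HelloWorld20"
console.log("$1.00".replace(/[\W_]/g, ""));    // "100"

// With a string pattern, replace only touches the first occurrence.
console.log("a,b,c".replace(",", ""));         // "ab,c"

// Commas and periods are already non-word characters, so the extra
// .replace(",", "") and .replace(".", "") calls have nothing left to do.
const chained = "1,234.50".replace(/[\W_]/g, "").replace(",", "").replace(".", "");
console.log(chained === "1,234.50".replace(/[\W_]/g, "")); // true
```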

How to make the below regex to accept any special character

/^[^ ]([\w- \.\\\/&#]+)[^ ]$/,
I have the above regex. I want to make sure it accepts all special characters, but I don't want to spell out the entire list of special characters, such as [\w- \.\\\/&#!##$&]. How can I make the above regex accept all special characters?
[^\w\s] matches any non-alphanumeric and non-whitespace character.
\S matches any non-whitespace character.
. matches any character except newlines.
[\S\s] matches any character in a JavaScript regex.
Since you've got \w and a space in there already, you must want all of the ASCII characters except control characters. That would be:
[ -~]
...or any character whose code point is in the range U+0020 (space) to U+007E (tilde). But it looks like you want to make sure the first and last characters are not whitespace. In fact, looking at your previous question, I'll assume you want only letters or digits in those positions. This would work:
/^[A-Za-z0-9][ -~]*[A-Za-z0-9]$/
...but that requires the string to be at least two characters long. To allow for a single-character string, change it to this:
/^[A-Za-z0-9](?:[ -~]*[A-Za-z0-9])?$/
In other words, if there's only one character, it must be a letter or digit. If there are two or more characters, the first and last must be letters or digits, while the rest can be any printing character--i.e., a letter, a digit, a "special" (punctuation) character, or a space.
Note that this only matches ASCII characters, not accented Latin letters like ë, or symbols from other alphabets or writing systems.
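To see how the final pattern behaves, here is a small sketch with invented sample strings (the name printableAscii is an assumption for this example):

```javascript
// First and last character must be an ASCII letter or digit;
// anything in between may be any printable ASCII character (space to tilde).
const printableAscii = /^[A-Za-z0-9](?:[ -~]*[A-Za-z0-9])?$/;

console.log(printableAscii.test("a"));                  // true: single letter
console.log(printableAscii.test("a b-c/d&e#f"));        // true
console.log(printableAscii.test(" abc"));               // false: leading space
console.log(printableAscii.test("abc "));               // false: trailing space
console.log(printableAscii.test("a!"));                 // false: ends in punctuation
console.log(printableAscii.test(""));                   // false: empty string
console.log(printableAscii.test("caf\u00e9 au lait"));  // false: é is outside ASCII
```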
. matches any character except for newline.

Does the Javascript regex pattern \W include spaces?

I am using this expression: /\W+/g to match all characters that are not numbers, letters and spaces. It seems to be including spaces. How would I build a regex that did not include spaces?
/[^a-z0-9\s]+/ig
Explanation:
[^ Character class which matches characters NOT in the following class
a-z All lowercase letters of the alphabet
0-9 All numbers
\s Whitespace characters
] End of the character class
i Case-insensitivity to match uppercase letters
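A quick comparison of the two patterns on a made-up string, as a sketch:

```javascript
const text = "Hello, World! 123_abc";

// \W+ treats spaces as non-word characters, so they end up in the matches.
console.log(text.match(/\W+/g));            // [ ', ', '! ' ]

// Excluding \s keeps whitespace out; note that "_" is now matched,
// because it is neither a letter, a digit, nor whitespace.
console.log(text.match(/[^a-z0-9\s]+/gi));  // [ ',', '!', '_' ]
```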
A more accurate wording for \W is any Non-Alphanumeric character.
\s is for Any Whitespace.
So, it would be something like this:
[^\s]
\W means "non-word characters", the inverse of \w, so it will match spaces as well. I'm a bit surprised it doesn't match numbers, though.

JavaScript special characters/regular expressions

I'm trying to learn from reading Mozilla documentation for regular expressions, but there's one thing I don't get. For the special character \s it gives the following example
/\s\w*/ matches ' bar' in "foo bar."
I understand that \s is the special character for white space, but why is there a w* in the example?
Doesn't /\s/ also match ' bar' in "foo bar."?
What's with the w*?
/\s\w*/ is whitespace character followed by 0 or more word characters.
/\s/ would only find the whitespace in the example.
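The difference is easy to check directly. A minimal sketch using the string from the documentation example:

```javascript
const sentence = "foo bar.";

// Whitespace followed by as many word characters as possible.
console.log(JSON.stringify(sentence.match(/\s\w*/)[0]));  // " bar"

// Whitespace only: the match is the single space.
console.log(JSON.stringify(sentence.match(/\s/)[0]));     // " "
```

The period is not included in the first match because it is not a word character.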
\w matches any alphanumerical character (word characters) including underscore (short for [a-zA-Z0-9_]).
It's a character escape.
\w is all word characters (letters, digits, and underscores)
Check this link for more documentation on such shorthand
