var str = "The stor-+)_y is someth12ing that tr##ee3 de124scrib%^&ing becom5es life7 4 difficult";
console.log(str.replace(/\W/g,''));
Guys, I have written this regex to match all non-alphanumeric characters, but it doesn't match the underscore (_).
As far as I know, \d is for all digits and
\s is for whitespace... so which letter stands for the underscore?
To match all non-alphanumeric characters, \W is not enough, since it matches the same text as [^a-zA-Z0-9_] does and therefore skips _. To match _ as well, put both _ and \W in a character class:
var str = "The stor-+)_y is someth12ing that tr##ee3 de124scrib%^&ing becom5es life7 4 difficult";
console.log(str.replace(/[\W_]+/g,''));
Since you are removing characters, it is advisable to quantify with + (to remove 1+ consecutive occurrences in one go).
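Running both patterns on the same input makes the difference visible. This is a small sketch; the helper name `stripNonAlphanumeric` is hypothetical and only there for illustration. `\W` leaves the underscore in place, while `[\W_]+` removes it along with everything else that is not a letter or digit.

```javascript
// The sample string from the question, on a single line.
const str = "The stor-+)_y is someth12ing that tr##ee3 de124scrib%^&ing becom5es life7 4 difficult";

// \W alone is [^a-zA-Z0-9_], so the underscore survives.
const withUnderscore = str.replace(/\W/g, "");

// Hypothetical helper: [\W_]+ removes each run of non-alphanumeric
// characters (underscore included) in a single replacement.
function stripNonAlphanumeric(text) {
  return text.replace(/[\W_]+/g, "");
}

const withoutUnderscore = stripNonAlphanumeric(str);

console.log(withUnderscore);    // still contains "stor_y"
console.log(withoutUnderscore); // has "story" instead of "stor_y"
```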
The input is this:
*Word. Word.* Word word. *…*
"…" Word word. "…"
"…" word. "…"
The following regex matches the space on the right side of a sentence:
(?<=["*]*[A-Z].+?\.["*]*)\s
If I want to match the empty space on the left side, I have to do this:
\s(?=["*]*[A-Z].+?\.["*]*)
The output should be this (the [] symbolize the matches):
*Word.[]Word.*[]Word word.[]*…*
"…"[]Word word.[]"…"
"…" word.[]"…"
How can I modify this regex so that it matches the spaces on both sides of a sentence at the same time?
https://regexr.com/5tddc
For the examples shown, you may be able to use this regex with lookarounds to match the spaces:
(?<=\.\*?) |(?<!\w) (?=[A-Z])
RegEx Details:
(?<=\.\*?) : Match a space if it is preceded by a dot and an optional *
|: OR
(?<!\w) (?=[A-Z]): Match a space that must be followed by an uppercase letter and must not be preceded by a word character
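To check the pattern against all three sample lines at once, here is a small sketch that replaces each matched space with the `[]` marker used in the question. It assumes an engine with lookbehind support (ES2018 or later, e.g. a current Node.js or browser); `lines` and `marked` are just illustrative names.

```javascript
// Sample lines from the question (the "…" is part of the input text).
const lines = [
  '*Word. Word.* Word word. *…*',
  '"…" Word word. "…"',
  '"…" word. "…"',
];

// Alternative 1: a space preceded by "." plus an optional "*".
// Alternative 2: a space not preceded by a word character and followed by A-Z.
const re = /(?<=\.\*?) |(?<!\w) (?=[A-Z])/g;

// Replace every matched space with "[]" to visualise the matches.
const marked = lines.map((line) => line.replace(re, "[]"));
marked.forEach((line) => console.log(line));
```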
Perhaps you can match a non-word boundary followed by a space, and assert that what is to the right is either an uppercase char A-Z or one of " and *.
\B[ ](?=[A-Z"*])
The pattern matches:
\B A position where \b does not match
[ ] Match a space (The brackets are for clarity only)
(?= Positive lookahead, assert that what is to the right is
[A-Z"*] Match one of A-Z or " or *
) Close lookahead
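The same check for the `\B` version, as a sketch on the question's three sample lines. One practical point: `\B` and lookahead have been in JavaScript since long before lookbehind, so this pattern should also work in engines that predate ES2018.

```javascript
const lines = [
  '*Word. Word.* Word word. *…*',
  '"…" Word word. "…"',
  '"…" word. "…"',
];

// \B before the space: since a space is a non-word character, the
// preceding character (if any) must also be a non-word character.
// (?=[A-Z"*]): the space must be followed by A-Z, " or *.
const re = /\B[ ](?=[A-Z"*])/g;

const marked = lines.map((line) => line.replace(re, "[]"));
marked.forEach((line) => console.log(line));
```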
I have spent the last couple of hours trying to figure out how to match all whitespace (\s) unless followed by AND\s or preceded by \sAND.
I have this so far
\s(?!AND\s)
but then it matches the space after \sAND, which I don't want.
Any help would be appreciated.
Often, when you want to split by a single character that appears in a specific context, you can replace the splitting approach with a matching one.
I suggest first matching sequences of non-whitespace characters joined by an AND that is enclosed in whitespace, and then matching any other non-whitespace sequences. This way, we get an array of the necessary substrings:
\S+\sAND\s\S+|\S+
I assume the \sAND\s pattern appears between some non-whitespace characters.
var re = /\S+\sAND\s\S+|\S+/g;
var str = 'split this but don\'t split this AND this';
var res = str.match(re);
console.log(JSON.stringify(res));
As Alan Moore suggests, the alternation can be unrolled into \S+(?:\sAND\s\S+)*:
\S+ - 1 or more non-whitespace characters
(?:\sAND\s\S+)* - 0 or more (thus, it is optional) sequences of...
\s - one whitespace (add + to match 1 or more)
AND - literal AND character sequence
\s - one whitespace (add + to match 1 or more)
\S+ - one or more non-whitespace symbols.
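Here is the unrolled pattern next to the original alternation, as a sketch with a made-up sample sentence. One input where they differ is a chain such as `this AND this AND that`: the unrolled form keeps the whole chain together, while `\S+\sAND\s\S+|\S+` stops after the first `AND` pair.

```javascript
const alternation = /\S+\sAND\s\S+|\S+/g;
const unrolled = /\S+(?:\sAND\s\S+)*/g;

// Hypothetical input with two chained ANDs.
const str = "split this but don't split this AND this AND that";

const viaAlternation = str.match(alternation);
const viaUnrolled = str.match(unrolled);

console.log(JSON.stringify(viaAlternation)); // "...this AND this", "AND", "that"
console.log(JSON.stringify(viaUnrolled));    // "...this AND this AND that"
```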
Older JavaScript engines (before ES2018) don't support lookbehinds, so there you can use the following trick:
Match (\sAND\s)|\s
Throw away any match where $1 has a value
Here's a short example which replaces the spaces you want with an underscore:
var str = "split this but don't split this AND this";
str = str.replace(/(\sAND\s)|\s/g, function(m, a) {
return a ? m : "_";
});
console.log(str);
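Lookbehind has been part of JavaScript since ES2018, so in current engines the condition from the question can be written directly: a whitespace character that is not preceded by `\sAND` and not followed by `AND\s`. A minimal sketch, assuming a lookbehind-capable engine; the same regex works with `split` as well as `replace`.

```javascript
// \s that is not preceded by "\sAND" and not followed by "AND\s".
const re = /(?<!\sAND)\s(?!AND\s)/g;

const str = "split this but don't split this AND this";

// Same result as the callback-based replacement.
const underscored = str.replace(re, "_");

// Splitting works too, because the spaces around AND are never matched.
const parts = str.split(re);

console.log(underscored);
console.log(JSON.stringify(parts));
```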
I am confused about /\w\b\w/. I think it should match "e w" in "we we", since:
\w is a word character, which is "e"
\b is a word boundary, which is " " (space)
\w is another word character, which is "w"
So the match is "e w" in "we we".
But...
/\w\b\w/ will never match anything, because a word character can never be followed by both a non-word and a word character.
I got this one from MDN:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions?redirectlocale=en-US&redirectslug=JavaScript%2FGuide%2FRegular_Expressions
I can't understand their explanation. Can you explain it in baby steps? Thank you!
Nick
The space character isn't the word boundary. A word boundary isn't a character itself; it's the position "in between characters" where a word character transitions to a non-word character (or vice versa).
So "e w".match(/\w\b/) only matches "e", not "e ".
/\w\b\w/ never matches anything because it would require that a word character be immediately followed by a non-word character and also by a word character, which is of course not possible.
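You can see that `\b` is a position rather than a character by listing where it matches. A short sketch using `String.prototype.matchAll` (ES2020): every match is the empty string, and only the indices differ.

```javascript
const text = "we we";

// Each \b match has length 0; collect the index where it occurs.
const boundaries = [...text.matchAll(/\b/g)].map((m) => m.index);

// Confirm that no \b match contains any characters.
const allEmpty = [...text.matchAll(/\b/g)].every((m) => m[0] === "");

console.log(boundaries); // [ 0, 2, 3, 5 ]
console.log(allEmpty);   // true
```

There is no boundary at index 1 or 4 because those positions sit between two word characters. Index 2 lies between "e" and the space and index 3 between the space and "w", so the space itself is never part of a `\b` match.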
The key is what \b means. \b matches a word boundary: a position where a word character is not followed or preceded by another word character. Note that a matched word boundary is not included in the match. In other words, the length of a matched word boundary is zero.
So \b itself doesn't consume any characters; it's just a condition, like ^ and $. Just as /^\w/ means "starts with a word character", /\w\b/ means "a word character not followed by a word character".
In "e w", /\w\b/ only matches "e", a word character that is not followed by a word character (here it is followed by a space); it does not match "e ".
/\w\W/ does match "e " in "e w", because \W consumes the space. \b is just a condition and doesn't match any character.
/\w\b\w/ means a word character followed by both a non-word character and a word character, which is contradictory, so it will never match anything.
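The three patterns discussed above, side by side. This is a sketch; `firstMatch` is a hypothetical helper that returns the matched text or `null`.

```javascript
// Hypothetical helper: the text of the first match, or null if none.
function firstMatch(input, re) {
  const result = input.match(re);
  return result ? result[0] : null;
}

const a = firstMatch("e w", /\w\b/);     // "e": \b is checked but consumes nothing
const b = firstMatch("e w", /\w\W/);     // "e ": \W consumes the space
const c = firstMatch("we we", /\w\b\w/); // null: the pattern is contradictory

console.log(JSON.stringify([a, b, c]));
```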
\w\b\w means match:
an alphanumeric character (\w); followed by
a transition between alphanumeric and non-alphanumeric characters (\b), which is a position, not an actual character; followed by
an alphanumeric character (\w).
The key point is that \b doesn't consume any characters; it only checks which characters are adjacent to the tested position. So \w\b\w would consume just two characters, and both must be alphanumeric (\w). At the same time, the imaginary point between them must have an alphanumeric character on one side and a non-alphanumeric one on the other, which is impossible, so the pattern can never match.
Hope this helps.
Your regular expression would fail for the input "we we" because a word boundary in most dialects is a position between \w and a non-word character (\W), or at the beginning or end of a string if it begins or ends with a word character.
Your regular expression is doing this:
\w word characters (a-z, A-Z, 0-9, _)
\b the boundary between a word char (\w) and not a word char
\w word characters (a-z, A-Z, 0-9, _)
Therefore, it's saying: look for a word character immediately after the word-boundary position. There can't be one, because the boundary after a word character requires a non-word character there. If you remove the trailing \w, it matches the e in your input.
console.log("we we".match(/\w\b/));
// => [ 'e', index: 1, input: 'we we', groups: undefined ]
I had the same question. Reading this post, I finally figured it out. The difficulty may be that we imagine the \b in \w\b\w as a space symbol. But here, as everywhere, \b only asserts that the character before or after the position is a non-word character; it does not represent (or consume) a non-word character. Given that assertion, in the case of \w\b\w the last \w says "No! Here is a word character." So the last \w contradicts the \b. Keep in mind that \b is a position marker, not a character class. As an exercise, prove that all of this holds for the first \w in \w\b\w as well :)
Use \w\s\w to match what you need. Note that \s and \b are different: \s consumes a whitespace character, while \b only tests a position.
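A quick sketch of the asker's original goal: to get "e w" out of "we we", something like `\s` has to consume the space. `\b` can still be placed on either side of it, since it only checks positions.

```javascript
const input = "we we";

// \s consumes the space between the two word characters.
const plain = input.match(/\w\s\w/)[0];

// Zero-width \b checks on both sides of the space do not change the match.
const withBoundaries = input.match(/\w\b\s\b\w/)[0];

console.log(plain, "|", withBoundaries); // e w | e w
```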
I'm using the following regular expression to match one or more special characters for a password strength test.
if (password.match(/\W+/)) points++;
This doesn't seem to match the underscore '_' as a special character. Why is this and how can I fix it?
It is because \W is the same as [^\w], while \w contains a-z, A-Z, 0-9, and _ as well.
To fix it, just add the _ character to the class separately:
if (password.match(/[\W_]+/)) points++;
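A sketch of the difference inside a password check. `scoreSpecialChars` and its one-point scoring are hypothetical, modelled on the `points++` line in the question. `RegExp.prototype.test` is used because only a yes/no answer is needed, and for the same reason the `+` quantifier can be dropped.

```javascript
// Hypothetical scoring helper: one point if the password contains at
// least one character that is neither a letter nor a digit.
function scoreSpecialChars(password) {
  let points = 0;
  if (/[\W_]/.test(password)) points++;
  return points;
}

console.log(/\W/.test("abc_123"));         // false: _ is a word character
console.log(scoreSpecialChars("abc_123")); // 1
console.log(scoreSpecialChars("abc123"));  // 0
```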
\W (uppercase) means not \w, so anything except word characters.
Word characters (\w) include letters, digits, and the underscore.
Perhaps you should use /[^a-z0-9]+/i to match anything that is not a letter or a digit.
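For ASCII input, `/[^a-z0-9]/i` and `/[\W_]/` should accept exactly the same characters. A brute-force sketch over code points 0 to 127 checks this; it says nothing about characters outside that range.

```javascript
const nonAlnumI = /[^a-z0-9]/i;
const nonWordOrUnderscore = /[\W_]/;

// Collect every ASCII character on which the two patterns disagree.
// (No g flag, so test() keeps no lastIndex state between calls.)
const mismatches = [];
for (let code = 0; code < 128; code++) {
  const ch = String.fromCharCode(code);
  if (nonAlnumI.test(ch) !== nonWordOrUnderscore.test(ch)) {
    mismatches.push(ch);
  }
}

console.log(mismatches.length); // 0
```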
Are you sure you don't want the \w? The \W is the negation of \w.
\w matches letters, digits, and underscores, so \W matches none of them. See here: http://www.regular-expressions.info/reference.html
The match fails because underscore is treated as a word character. From the MDN documentation for \W:
Matches any non-word character. Equivalent to [^A-Za-z0-9_]
You can fix this by putting the underscore and \W together in a character class:
if (password.match(/[\W_]+/)) points++;
A regex tool such as Javascript Regex Tester can be especially helpful for debugging this sort of thing.
I am using this expression: /\W+/g to match all characters that are not numbers, letters or spaces. It seems to be matching spaces too. How would I build a regex that does not match spaces?
/[^a-z0-9\s]+/ig
Explanation:
[^ Character class which matches characters NOT in the following class
a-z All lowercase letters of the alphabet
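0-9 All digits
\s Whitespace characters
] End of the character class
+ One or more characters from the class
i Case-insensitive, so uppercase letters are excluded as well
g Global: find all matches, not just the first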
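Applied to a made-up sample string, the pattern picks out punctuation while skipping letters, digits and spaces. A sketch; note that, unlike a `\w`-based class, it also matches `_`, whereas `[^\w\s]` leaves the underscore alone.

```javascript
const sample = "Hello, world! 42_x";

// Runs of characters that are not letters, digits or whitespace.
const special = sample.match(/[^a-z0-9\s]+/ig);

// \w-based variant: the underscore counts as a word character here.
const specialKeepUnderscore = sample.match(/[^\w\s]+/g);

// Removing the matches leaves only letters, digits and whitespace.
const cleaned = sample.replace(/[^a-z0-9\s]+/ig, "");

console.log(special, specialKeepUnderscore, cleaned);
```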
A more accurate wording for \W is "any non-alphanumeric character" (strictly, \W does not match the underscore either).
\s matches any whitespace character.
So, it would be something like this:
[^\w\s]
\W means "non-word characters", the inverse of \w, so it will match spaces as well. It doesn't match digits, though, because digits count as word characters.