How to interpret this regular expression /[\W_]/g


My code is:
var result2 = result.replace(/[\W_]/g,"").replace(",","").replace(".","");
The code works and I get what I need done, but I don't understand how the regular expression /[\W_]/g works, and I can't find any documentation that I understand.

/ ... /g It's a global regex, so it operates on every match in the string rather than only the first.
[ ... ] This creates a character set. It matches any single character from the listed set.
\W_ Inside the set, \W matches any non-word character (the inverse of \w, which covers letters, digits and the underscore), and _ adds the underscore. Together they match any character that is not a letter or a digit.
Then you have two one-off replacements for comma and period. Honestly, if that's the complete code, they can be omitted: comma and period are already non-word characters, so /[\W_]/g on its own removes them. (/[\W_,.]/g would also work, but the extra characters are redundant.)
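A quick way to check the point about the extra replacements is to compare both forms on a sample string. This is only a sketch; the input value is made up for illustration and does not come from the question.

```javascript
// Hypothetical sample input containing a comma, a period and an underscore.
const input = "Hello, World_1.5!";

// Original form: regex replace followed by two string replaces.
const chained = input.replace(/[\W_]/g, "").replace(",", "").replace(".", "");

// The regex alone: , and . are non-word characters, so \W already covers them.
const single = input.replace(/[\W_]/g, "");

console.log(chained);            // HelloWorld15
console.log(single === chained); // true
```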

[ and ] are the start and end of a character set.
\W means "non-word character", as opposed to \w, which matches a word character (a letter, digit or underscore).
_ is the "_" character.
/ mark the beginning and end of a regular expression.
g means it's a global search.

From MDN
\W Matches any non-word character. Equivalent to [^A-Za-z0-9_].
For example, /\W/ or /[^A-Za-z0-9_]/ matches '%' in "50%."
The underscore (_) matches a literal underscore.
The brackets define a character class, meaning that the regexp will match if any non-word character or an underscore is present.

\W means "any non-word character"
[\W_] means "any non-word character or a _"
/[\W_]/g finds, globally, any non-word character or _
replace, given a regexp with the g flag, finds all occurrences of the regexp and replaces them with another string.
So your expression replaces any non-word character, or _, or , or . with an empty string (i.e., removes it).
Since , and . are already non-word characters, it can be simplified to:
result.replace(/[\W_]/g,"")

Okay, let's break it down. replace(/[\W_]/g, "") means replace every non-word character and underscore with an empty string. So in the string $1.00, it would come out as 100 ($ and . are non-word characters).
Then .replace(",","") removes commas.
And .replace(".","") removes periods.
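One detail worth knowing about the last two calls: when the first argument to replace is a string rather than a regex, only the first occurrence is replaced. A small sketch with made-up inputs (only "$1.00" comes from the answer above):

```javascript
// A string pattern replaces the first occurrence only.
const firstOnly = "1,000,000".replace(",", "");

// A regex with the g flag replaces every occurrence.
const allCommas = "1,000,000".replace(/,/g, "");

// The example from the answer: $ and . are non-word characters.
const price = "$1.00".replace(/[\W_]/g, "");

console.log(firstOnly); // 1000,000
console.log(allCommas); // 1000000
console.log(price);     // 100
```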

Related

How to remove char from regex

I'm not good with regex yet. I've been racking my brain trying to get this to work.
I need a regex that allows the user to enter
Any alphabetical char (a-z)
Any number
For special char only "-" and "_".
"#" is not allowed.
I got this but no dice. [^a-zA-Z0-9]
Thanks

^[\w-]+$
will match a string following the rules you describe. \w matches letters, digits, or underscore, then it adds - to that set. Anchoring with ^ and $ requires all the characters in the string to match this pattern.
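As a rough illustration, the pattern can be wrapped in a small validation helper. The function name and the sample inputs are hypothetical; note that \w also accepts uppercase letters.

```javascript
// Hypothetical helper; the pattern is the one from the answer above.
function isAllowed(text) {
  // ^ and $ force the whole input to consist of word characters or hyphens.
  return /^[\w-]+$/.test(text);
}

console.log(isAllowed("user_name-42")); // true
console.log(isAllowed("bad#name"));     // false
console.log(isAllowed(""));             // false (+ requires at least one character)
```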

Remove the ^ character inside the square brackets, because there it negates the set, and add \- and \_ inside the square brackets to allow the '-' and '_' characters:
[a-zA-Z0-9\-\_]+
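Note that without anchors, test succeeds as soon as any part of the input matches, so input containing # would still pass. A sketch, assuming the goal is to validate the whole input (the variable names are illustrative; the escapes are dropped here because _ needs none and - is placed last in the class):

```javascript
// Unanchored: matches if ANY run of allowed characters exists in the input.
const unanchored = /[a-zA-Z0-9_-]+/;

// Anchored: the whole input must consist of allowed characters.
const anchored = /^[a-zA-Z0-9_-]+$/;

console.log(unanchored.test("abc#def")); // true  ("abc" matches)
console.log(anchored.test("abc#def"));   // false
console.log(anchored.test("abc-def_1")); // true
```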

Regex for any character except quote after comma

I want to match every word separated by comma, but it must not include a quote like ' or ".
I was using this regex:
^[a-zA-Z0-9][\!\[\#\\\:\;a-zA-Z0-9`_\s,]+[a-zA-Z0-9]$
However, it only matches a character and number and not a symbol.
The output should be:
example,example //true
exaplle,examp#3 //true, with symbol or number
example, //false, because there is no word after comma
,example //false, because there is no word before comma
##example&$123,&example& //true, with all character and symbol except quote

You can match 1+ times what is present in the character class, then repeat 1+ times a non-capturing group (?:...) that contains a comma followed by the same character class.
^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$
Note that you don't have to escape \!, \#, \: and \; in the character class, and that \s might also possibly match a newline.
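To see how this pattern behaves, it can be run against the five examples from the question. The variable names are illustrative only.

```javascript
// Pattern from the answer above: a word, then one or more ",word" groups.
const listPattern = /^[!\[#\\:;a-zA-Z0-9`_ &$#]+(?:,[!\[#\\:;a-zA-Z0-9`_ &$#]+)+$/;

// The sample inputs listed in the question.
const samples = [
  "example,example",
  "exaplle,examp#3",
  "example,",
  ",example",
  "##example&$123,&example&",
];

const results = samples.map((s) => listPattern.test(s));
console.log(results); // [ true, true, false, false, true ]
```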

I'm assuming you want the whole string to match perfectly with your conditions and return true then and then only.
These are the conditions-
Each word should be separated by a comma, said comma should have 2 valid words on each side
Words can contain anything except the 2 kinds of quotes (' and ") and whitespace characters (spaces and newlines).
The regex you would use is this- ^(?:[^,'"\s]+,[^,'"\s]+)+$, with the global flag (g) on.
Check out the demo here
Edit: As per request of being able to match only a single word.
This is the regex you would use for that- ^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$
This will match words separated by a , as well as match just a single word.
The conditions for what qualifies as a word remains the same as aforementioned.
Quick explanation:-
^[^,'"\s]+,[^,'"\s]+$
This part matches 2 words separated by a comma, [^,'"\s]+ denotes a word
Wrapping that whole thing in ^(?:[^,'"\s]+,[^,'"\s]+)+$ simply makes it repeat, so it'll match N number of words separated by a comma, not just 2
Then adding another alternative using | and wrapping the whole thing in a group (non-capturing), we get ^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$
This simply just adds the alternative [^,'"\s]+ - which matches a singular word.
Check out the updated demo here
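It may be worth checking how the repeated pair behaves with three or more words. Every repetition of the group consumes a word, a comma and another word, so adjacent repetitions have to split the middle word between them, which is impossible when that word is a single character. The sketch below compares the pattern from the answer with an alternative shape (one word, then zero or more comma-plus-word groups). The alternative is an assumption of this example, not part of the original answer, and the sample inputs are made up.

```javascript
// Pattern from the answer above.
const answerPattern = /^(?:(?:[^,'"\s]+,[^,'"\s]+)+|[^,'"\s]+)$/;

// Alternative shape (assumption): word, then any number of ",word" groups.
const simplerPattern = /^[^,'"\s]+(?:,[^,'"\s]+)*$/;

const inputs = ["example,example", "example", "example,", ",example", "it's,fine", "ab,cd,ef", "a,b,c"];

const answerResults = inputs.map((s) => answerPattern.test(s));
const simplerResults = inputs.map((s) => simplerPattern.test(s));

console.log(answerResults);  // [ true, true, false, false, false, true, false ]
console.log(simplerResults); // [ true, true, false, false, false, true, true ]
```

The two patterns differ only on the last input, where the middle word is a single character.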

Simple regex with repeated unordered matches

I have this regex
/^[a-z]{1,}( (?=[a-z])){0,}(_(?=[a-z])){0,}[a-z]{0,}$/
I want to match
ag_b_cf_ajk
or
zva b c de
or
hh_b opxop a_b
so any character tokens separated by a single space or underscore.
(In the regex above, we have a literal space, which is legal, and we have look-aheads that ensure that a space or underscore is followed by a character).
The problem is, my above regex is only matching the first space or underscore, like so:
axz_be
axz be
but these fail
axz_be_j
axz be j
I believe I'm missing some concept with regexes in order to solve this, as I have been trying for the last few hours!

It seems you can just use
^[a-z]+(?:[_ ][a-z]+)*$
See the regex demo
The regex matches
^ - start of string
[a-z]+ - one or more lowercase ASCII letters
(?:[_ ][a-z]+)* - zero or more sequences of:
[_ ] - a space or an underscore
[a-z]+ - one or more lowercase ASCII letters
$ - end of string
If the space or underscore must appear at least once, use the + quantifier instead of *:
^[a-z]+(?:[_ ][a-z]+)+$
To add a multicharacter alternative to the underscore and space (here \[] matches the literal text []), you need to introduce another non-capturing group:
^[a-z]+(?:(?:[_ ]|\[])[a-z]+)+$
See another regex demo
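The first pattern can be checked against the inputs from the question. This is a minimal sketch; the variable names and the failing inputs are made up for illustration.

```javascript
// Tokens of lowercase letters separated by exactly one space or underscore.
const tokens = /^[a-z]+(?:[_ ][a-z]+)*$/;

// Inputs from the question, plus a single token.
const shouldMatch = ["ag_b_cf_ajk", "zva b c de", "hh_b opxop a_b", "axz_be_j", "axz be j", "abc"];

// Hypothetical inputs that break the rules: doubled, leading or trailing
// separators, and an uppercase letter.
const shouldFail = ["axz__be", "axz_ be", "_axz", "axz_", "Axz"];

console.log(shouldMatch.every((s) => tokens.test(s))); // true
console.log(shouldFail.some((s) => tokens.test(s)));   // false
```

The group (?:[_ ][a-z]+)* is what allows the separator to repeat: each repetition consumes one separator together with the token that follows it.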

JavaScript regular expressions to match no digits, whitespace and selected symbols

Thanks for taking a look.
My goal is to come up with a regexp that will match input that contains no digits, whitespace or the symbols !#£$%^&*()+= or any other symbol I may choose.
I am however struggling to grasp precisely how regular expressions work.
I started out with the simple pattern /\D/, which from my understanding will match the first non-digit character it can find. This would match the string 'James' which is correct but also 'James1' which I don't want.
So, my understanding is that if I want to ensure that a pattern is not found anywhere in a given string, I use the ^ and $ characters, as in /^\D$/. Now, because this will only match a single character that is not a digit, I needed to use + to specify that no digits should be found anywhere in the string, giving me the expression /^\D+$/. Brilliant, it no longer matches 'James1'.
Question 1
Is my reasoning up to this point correct?
The next requirement was to ensure no whitespace is in the given string. \s will match a single whitespace character and [^\s] will match the first non-whitespace character. So, from my understanding, I just had to add this to what I have already to match strings that contain no digits and no whitespace. Again, because [^\s] will only match a single non-whitespace character, I used + to match one or more non-whitespace characters, giving the new regexp of /^\D+[^\s]+$/.
This is where I got lost, as the expression now matches 'James1' or even 'James Smith25'. What? Massively confused at this point.
Question 2
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Question 3
How would I go about writing the regular expression I'm trying to solve?
While I am keen to solve the problem I am more interested in figuring where my understanding of regular expressions is lacking, so any explanations would be helpful.

Not quite; ^ and $ are actually "anchors": they mean "start" and "end". It's a little more complicated than that, but for now you can consider them to mean the start and end of the string (or of a line, with the m flag); look up the various modifiers on regular expressions if you're interested in learning more about this. Unfortunately ^ has an overloaded meaning: used as the first character inside square brackets it means "not", which is the meaning you are already acquainted with. It's very important that you understand the difference between these two meanings, and that the definition in your head actually applies only to character class matching!
Contributing further to your confusion is that \d means "a numerical digit" and \D means "not a numerical digit". Similarly \s means "a whitespace (space/tab/newline/etc.) character" and \S means "not a whitespace character."
It's worth noting that \d is effectively a shortcut for [0-9] (note that - has a special meaning inside square brackets), and \D is a shortcut for [^0-9].
The reason it's matching strings that contain spaces is that you've asked for "1+ non-digit characters followed by 1+ non-space characters", and a space is a non-digit character, so it'll match lots of strings! I think that perhaps you don't understand that regular expressions match bits of strings: you're not adding constraints as you go, but rather building up bits of matchers that each match the corresponding bits of the string.
/^[^\d\s!#£$%^&*()+=]+$/ is the answer you're looking for - I'd look at it like this:
i. [] - match a range of characters
ii. []+ - match one or more of that range of characters
iii. [^\d\s]+ - match one or more characters that do not match \d (numerical digit) or \s (whitespace)
iv. [^\d\s!#£$%^&*()+=]+ - here's a bunch of other characters I don't want you to match
v. ^[^\d\s!#£$%^&*()+=]+$ - now there are anchors applied, so this matcher has to apply to the whole line otherwise it fails to match
A useful website to explore regexes is http://regexr.com/3b9h7, which I supply with my suggested solution as an example. Edit: Pruthvi Raj's link to Debuggex is awesome!
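The difference between the two patterns can be seen by running them side by side. This is just a sketch; the variable names are illustrative and the sample strings are the ones from the question plus a few made-up ones.

```javascript
// The attempt from the question: \D+ happily consumes letters AND spaces,
// then [^\s]+ consumes the trailing digits, so both parts are satisfied.
const attempt = /^\D+[^\s]+$/;
console.log(attempt.test("James Smith25")); // true
console.log(attempt.test("James1"));        // true

// A single negated class: every character must avoid ALL excluded characters.
const allowed = /^[^\d\s!#£$%^&*()+=]+$/;
console.log(allowed.test("James"));       // true
console.log(allowed.test("James1"));      // false (digit)
console.log(allowed.test("James Smith")); // false (space)
console.log(allowed.test("James!"));      // false (excluded symbol)
```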

Is my reasoning up to this point correct?
Almost. /\D/ matches any character other than a digit, but not just the first one (if you use g option).
and [^\s] will match the first non-whitespace character
Almost, [^\s] will match any non-whitespace character, not just the first one (if you use g option).
/^\D+[^\s]+$/ matching strings that contain spaces?
Yes, it does, because \D matches a space (space is not a digit).
Why is /^\D+[^\s]+$/ matching strings that contain spaces?
Because \D+ in /^\D+[^\s]+$/ can match spaces.
Conclusion:
Use
^[^\d\s!#£$%^&*()+=]+$
It will match strings that contain no digits, no whitespace, and none of the symbols you do not allow.
Mind that -, ] and [ need care inside a character class. A hyphen is literal only when it is escaped or placed at the start or end of the class; in JavaScript a literal ] must be escaped, while [ is usually accepted as-is. To play it safe, escape all three.

Just insert every character you don't want to include in a negated character class as follows:
^[^\s\d!#£$%^&*()+=]*$
DEMO
Debuggex Demo
^ - start of the string
[^...] - matches one character that is not in `...`
\s - matches a whitespace (space, newline,tab)
\d - matches a digit from 0 to 9
* - a quantifier that repeats the immediately preceding element 0 or more times
So the regex matches any string that:
1. has a beginning
2. contains 0 or more characters that are not whitespace, digits, or any of the symbols included in the character class (in this example !#£$%^&*()+=)
3. has an ending
Because of *, the empty string also matches; use + instead to require at least one character.
NOTE:
If the symbols you don't want also include - (a hyphen), don't put it between two other characters, because there it is a metacharacter that forms a range. Put it last in the character class.
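The effect of hyphen placement can be demonstrated with a small example. The classes below are made up for illustration and are not taken from the question.

```javascript
// Hyphen between two characters: a range from "!" (U+0021) to "+" (U+002B).
const asRange = /^[!-+]$/;
console.log(asRange.test("%")); // true  ("%" lies inside the range)
console.log(asRange.test("-")); // false (the hyphen itself is not in the set)

// Hyphen placed last: a literal hyphen.
const atEnd = /^[!+-]$/;
console.log(atEnd.test("-")); // true
console.log(atEnd.test("%")); // false

// Escaped hyphen: literal regardless of position.
const escaped = /^[!\-+]$/;
console.log(escaped.test("-")); // true
console.log(escaped.test("%")); // false
```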

regex to match alphanumeric and hyphen only, strip everything else in javascript

I want to strip everything except alphanumerics and hyphens.
So far I've got this, but it's not working:
String = String.replace(/^[a-zA-Z0-9-_]+$/ig,'');
Any help appreciated.

If you want to remove everything except alphanumerics, hyphen and underscore, then negate the character class, like this
String = String.replace(/[^a-zA-Z0-9-_]+/ig,'');
Also, ^ and $ anchors should not be there.
Apart from that, you have already covered both uppercase and lowercase characters in the character class itself, so i flag is not needed. So, RegEx becomes
String = String.replace(/[^a-zA-Z0-9-_]+/g,'');
There is a special character class, which matches a-zA-Z0-9_, \w. You can make use of it like this
String = String.replace(/[^\w-]+/g,'');
Since \w doesn't cover -, we included that separately.
Quoting from MDN RegExp documentation,
\w
Matches any alphanumeric character from the basic Latin alphabet, including the underscore. Equivalent to [A-Za-z0-9_].
For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."
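The following sketch contrasts the anchored attempt from the question with the negated classes from this answer. The input strings and variable names are made up; the last variant, which also strips underscores, is an assumption based on the question asking for alphanumerics and hyphens only.

```javascript
// Hypothetical sample input.
const raw = "Hello, World! foo-bar_baz 42";

// Anchored attempt: it only matches when the WHOLE string is valid,
// so here nothing is replaced...
const anchoredAttempt = raw.replace(/^[a-zA-Z0-9-_]+$/g, "");
// ...and a fully valid string is removed entirely.
const anchoredOnValid = "foo-bar".replace(/^[a-zA-Z0-9-_]+$/g, "");

// Negated class: remove every run of characters that is NOT allowed.
const keepWordAndHyphen = raw.replace(/[^\w-]+/g, "");

// Variant that drops underscores as well (assumption, see lead-in).
const keepAlnumAndHyphen = raw.replace(/[^a-zA-Z0-9-]+/g, "");

console.log(anchoredAttempt === raw); // true
console.log(anchoredOnValid);         // (empty string)
console.log(keepWordAndHyphen);       // HelloWorldfoo-bar_baz42
console.log(keepAlnumAndHyphen);      // HelloWorldfoo-barbaz42
```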
