JavaScript Regex to create array of whole words only. - javascript

I'm trying to extract the whole words from a string, without whitespace or special characters.
So from
'(votes + downvotes) / views'
I'd like to create an array like the following
['votes', 'downvotes', 'views']
I have tried the following, but it also catches the parentheses and some whitespace.
https://regex101.com/r/yX9iW8/1

You could use /\w+/g as regular expression in combination with String#match
var array = '(votes + downvotes) / views'.match(/\w+/g);
console.log(array);

You can use \W+ to split on all non-word characters
'(votes + downvotes) / views'.split(/\W+/g).filter(x => x !== '');
// ["votes", "downvotes", "views"]
Or \w+ to match on all word characters
'(votes + downvotes) / views'.match(/\w+/g);
// ["votes", "downvotes", "views"]
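A minimal sketch of why the split variant needs the filter step while the match variant does not. The variable names are illustrative only. The `|| []` fallback is an addition here, because `match` with the `g` flag returns null when nothing matches.

```javascript
const expression = '(votes + downvotes) / views';

// Splitting on non-word runs leaves an empty string where the input
// starts (or ends) with a separator, here the leading "(".
const rawParts = expression.split(/\W+/);
console.log(rawParts); // ["", "votes", "downvotes", "views"]

// Dropping the empty strings gives the wanted array.
const viaSplit = rawParts.filter(part => part !== '');

// Matching word runs directly never produces empty strings.
// match() returns null when there is no match, hence the fallback.
const viaMatch = expression.match(/\w+/g) || [];

console.log(viaSplit); // ["votes", "downvotes", "views"]
console.log(viaMatch); // ["votes", "downvotes", "views"]
```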

It's very simple...
var matches = '(votes + downvotes) / views'.match(/[a-z]+/ig);
You must decide for your project which characters make up a word, and the minimum length of a word. For example, it can be letters, digits and dashes with a minimum length of 3 characters:
[a-z0-9-]{3,}
Good luck!
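As a rough sketch of that idea, the character set and the minimum length can be wrapped in a small helper. The name extractWords and its minLength parameter are hypothetical, not part of any answer above, and the character class [a-z0-9-] is just the example set mentioned here.

```javascript
// Hypothetical helper: returns all runs of letters, digits and dashes
// that are at least minLength characters long.
function extractWords(text, minLength = 1) {
  // The quantifier {n,} means "n or more repetitions".
  const pattern = new RegExp(`[a-z0-9-]{${minLength},}`, 'gi');
  // match() returns null when nothing matches, so fall back to [].
  return text.match(pattern) || [];
}

console.log(extractWords('(votes + downvotes) / views'));
// ["votes", "downvotes", "views"]
console.log(extractWords('(votes + downvotes) / views', 6));
// ["downvotes"]
console.log(extractWords('+ / ()'));
// []
```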

Related

Javascript - Regex - how to filter characters that are not part of regex

I want to accept words and some special characters, so if my regex
does not fully match, let's say I display an error,
var re = /^[[:alnum:]\-_.&\s]+$/;
var string = 'this contains invalid chars like ##';
var valid = re.test(string);
but now I want to "filter" a phrase, removing all characters that do not match the regex.
Usually one uses replace, but how do I list all the characters that do not match the regex?
var validString = string.filter(re); // something similar to this
How do I do this?
regards
Wiktor Stribiżew's solution works fine:
regex=/[^a-zA-Z\-_.&\s]+/g;
let s='some bloody-test #rfdsfds';
s = s.replace(/[^\w\s.&-]+/g, '');
console.log(s);
Rajesh's solution:
regex=/^[a-zA-Z\-_.&\s]+$/;
let s='some -test #rfdsfds';
s=s.split(' ').filter(x=> regex.test(x));
console.log(s);
The JS regex engine does not support POSIX character classes like [:alnum:]. You may use [A-Za-z0-9] instead, but only to match ASCII letters and digits.
Your current regex matches a whole string that consists only of allowed chars, so it cannot be used to return the chars that are not allowed. Those are matched with the negated class [^a-zA-Z0-9_.&\s-].
You may remove the unwanted chars with
var s = 'this contains invalid chars like ##';
var res = s.replace(/[^\w\s.&-]+/g, '');
var notallowedchars = s.match(/[^\w\s.&-]+/g);
console.log(res);
console.log(notallowedchars);
The /[^\w\s.&-]+/g pattern matches multiple occurrences (due to /g) of any one or more (due to +) chars other than word chars (digits, letters, _, matched with \w), whitespace (\s), ., & and -.
To match all characters that are not alphanumeric, whitespace, or one of -_.&, move the ^ inside the character class []:
var str = 'asd.=!_#$%^&*()564';
console.log(
str.match(/[^a-z0-9\-_.&\s]/gi),
str.replace(/[^a-z0-9\-_.&\s]/gi, '')
);
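Putting the pieces of this thread together, here is a small sketch that validates, cleans, and lists the rejected characters in one call. The helper name sanitize and the shape of its return value are hypothetical. It reuses the /[^\w\s.&-]+/g pattern from the answer above and builds the regexes inside the function, so no lastIndex state is shared between calls.

```javascript
// Hypothetical helper built around the negated class [^\w\s.&-].
function sanitize(input) {
  const disallowedRuns = /[^\w\s.&-]+/g; // runs of disallowed characters
  const anyDisallowed = /[^\w\s.&-]/;    // non-global, safe for test()
  return {
    valid: !anyDisallowed.test(input),
    cleaned: input.replace(disallowedRuns, ''),
    // match() returns null when nothing matches, so fall back to [].
    rejected: input.match(disallowedRuns) || [],
  };
}

const bad = sanitize('this contains invalid chars like ##');
console.log(bad.valid);    // false
console.log(bad.cleaned);  // "this contains invalid chars like "
console.log(bad.rejected); // ["##"]

const good = sanitize('R&D_team-1.0');
console.log(good.valid);   // true
```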

Conditional regex replace in nodejs

I have an address like this:
117042,ABC DEF,HIJ KLMNOP,9,170
and want to have
117042,ABC DEF,HIJ KLMNOP 9 170
I tried it with this replace Regex
address = address.replace(/,[\d]/g, " ");
but this results in
117042,ABC DEF,HIJ KLMNOP 70
I do not want to replace the digit but still need to check if the digit comes after the comma to not match the other commas.
I am not very good with regex, that's why I am asking for help.
You can replace only those commas that are followed by a number and then by nothing but further comma-separated numbers up to the end of the string:
var s = "117042,ABC DEF,HIJ KLMNOP,9,170";
var res = s.replace(/,(\d+)(?=(?:,\d+)*$)/g, " $1");
console.log(res);
The ,(\d+)(?=(?:,\d+)*$) regex matches:
, - a comma
(\d+) - (Group 1, referred to via $1 from the replacement pattern) one or more digits
(?=(?:,\d+)*$) - a positive lookahead that requires 0+ sequences of , + one or more digits at the end of the string.
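To see how the lookahead limits the replacement to the trailing numbers, here is a short sketch. The function name joinTrailingNumbers and the second input are made up for illustration.

```javascript
// Hypothetical wrapper around the regex explained above.
function joinTrailingNumbers(address) {
  // A comma plus digits is replaced only when everything after it,
  // up to the end of the string, is zero or more ",digits" groups.
  return address.replace(/,(\d+)(?=(?:,\d+)*$)/g, ' $1');
}

console.log(joinTrailingNumbers('117042,ABC DEF,HIJ KLMNOP,9,170'));
// "117042,ABC DEF,HIJ KLMNOP 9 170"

// A number in the middle is left alone, because text follows it.
console.log(joinTrailingNumbers('A,1,B,2'));
// "A,1,B 2"
```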

Regexp match spaces not followed by a specific word

I have spent the last couple of hours trying to figure out how to match all whitespace (\s) unless followed by AND\s or preceded by \sAND.
I have this so far
\s(?!AND\s)
but it is then matching the space after \sAND, but I don't want that.
Any help would be appreciated.
Often, when you want to split by a single character that appears in specific context, you can replace the approach with a matching one.
I suggest first matching sequences of non-whitespace characters that are joined by AND surrounded by whitespace, and then matching any other non-whitespace sequences. That way, we get an array of the required substrings:
\S+\sAND\s\S+|\S+
I assume the \sAND\s pattern appears between some non-whitespace characters.
var re = /\S+\sAND\s\S+|\S+/g;
var str = 'split this but don\'t split this AND this';
var res = str.match(re);
document.write(JSON.stringify(res));
As Alan Moore suggests, the alternation can be unrolled into \S+(?:\sAND\s\S+)*:
\S+ - 1 or more non-whitespace characters
(?:\sAND\s\S+)* - 0 or more (thus, it is optional) sequences of...
\s - one whitespace (add + to match 1 or more)
AND - literal AND character sequence
\s - one whitespace (add + to match 1 or more)
\S+ - one or more non-whitespace symbols.
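A brief sketch comparing the two patterns on a chained input. The second sample string is invented for illustration. On a single AND both patterns agree, but with several ANDs in a row only the unrolled form keeps the whole chain together.

```javascript
const alternation = /\S+\sAND\s\S+|\S+/g;
const unrolled = /\S+(?:\sAND\s\S+)*/g;

const single = "split this but don't split this AND this";
console.log(single.match(unrolled));
// ["split", "this", "but", "don't", "split", "this AND this"]

const chained = 'a AND b AND c d';
console.log(chained.match(alternation));
// ["a AND b", "AND", "c", "d"]
console.log(chained.match(unrolled));
// ["a AND b AND c", "d"]
```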
If your JS engine doesn't support lookbehinds (they were only added in ES2018), you can use the following trick:
Match (\sAND\s)|\s
Throw away any match where $1 has a value
Here's a short example which replaces the spaces you want with an underscore:
var str = "split this but don't split this AND this";
str = str.replace(/(\sAND\s)|\s/g, function(m, a) {
return a ? m : "_";
});
document.write(str);
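For comparison, in an engine that implements ES2018 lookbehind assertions the condition from the question can be written directly. This is only a sketch under that assumption, and the function names are hypothetical. It checks that the direct pattern and the capture-group trick give the same result on the sample string.

```javascript
// Assumes an engine with lookbehind support (ES2018 or later).
function replaceWithLookbehind(text) {
  // Whitespace that is neither preceded by "\sAND" nor followed by "AND\s".
  return text.replace(/(?<!\sAND)\s(?!AND\s)/g, '_');
}

// The capture-group trick from above, for engines without lookbehind.
function replaceWithCallback(text) {
  return text.replace(/(\sAND\s)|\s/g, (match, andGroup) =>
    andGroup ? match : '_');
}

const sample = "split this but don't split this AND this";
console.log(replaceWithLookbehind(sample));
// "split_this_but_don't_split_this AND this"
console.log(replaceWithCallback(sample));
// "split_this_but_don't_split_this AND this"
```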

Regex replace punctuation with whitespace

I have a word counter function but it doesn't account for people using poor punctuation, for example:
"hello.world"
That would only count it as 1 word.
Instead it should count that as 2 words.
So instead I need a regex to replace commas, full stops and any run of one or more whitespace characters with a single space.
Here's what I have so far:
proWords = proWords.replace(/[,\s]/, '\s');
negWords = negWords.replace(/[,\s]/, '\s');
The replacement is just an ordinary string, it shouldn't contain regular expression escape sequences like \s.
proWords = proWords.replace(/[,.\s]+/g, ' ');
The + quantifier makes it replace any sequence of those characters at once, and you need the g modifier to replace every occurrence instead of only the first one.
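A minimal sketch of a word counter built on that replacement. The function name countWords is hypothetical, and trimming plus the empty-string check are additions here so that leading separators and blank input do not inflate the count.

```javascript
// Hypothetical word counter using the /[,.\s]+/g normalisation.
function countWords(text) {
  // Collapse every run of commas, full stops and whitespace to one space,
  // then trim so separators at the edges do not create empty words.
  const normalized = text.replace(/[,.\s]+/g, ' ').trim();
  return normalized === '' ? 0 : normalized.split(' ').length;
}

console.log(countWords('hello.world'));               // 2
console.log(countWords('Hi,  there.  How are you?')); // 5
console.log(countWords('   '));                       // 0
```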
Change
proWords = proWords.replace(/[,\s]/, '\s');
negWords = negWords.replace(/[,\s]/, '\s');
to
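proWords = proWords.replace(/[,\.\s]+/g, ' ');
negWords = negWords.replace(/[,\.\s]+/g, ' ');
This should work. The + and the g flag are needed so that every run of those characters is replaced, not only the first character found.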

regex string replace

I am trying to do a basic string replace using a regular expression, but the answers I have found do not seem to help - they directly answer each person's unique requirement with little or no explanation.
I am using str = str.replace(/[^a-z0-9+]/g, ''); at the moment. But what I would like to do is allow all alphanumeric characters (a-z and 0-9) and also the '-' character.
Could you please answer this and explain how you concatenate expressions.
This should work :
str = str.replace(/[^a-z0-9-]/g, '');
Everything between the / delimiters indicates what you are looking for
/ is here to delimit your pattern, so you have one at the start and one at the end
[] indicates the set of characters you are looking for at one specific position
^ indicates that you want every character NOT corresponding to what follows
a-z matches any character between 'a' and 'z' inclusive
0-9 matches any character between '0' and '9' inclusive (meaning any digit)
- the '-' character
g at the end is a flag saying that you do not want your regex to stop at the first character matching your pattern but to continue through the whole string
Then your expression is delimited by / before and after.
So here you say "every character not being a letter, a digit or a '-' will be removed from the string".
Just change + to -:
str = str.replace(/[^a-z0-9-]/g, "");
You can read it as:
[^ ]: match NOT from the set
[^a-z0-9-]: match if not a-z, 0-9 or -
/ /g: do global match
More information:
https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions
Your character class (the part in the square brackets) says that you want to match anything except 0-9, a-z and +. You aren't explicit about how many characters you want to match, but I assume the + was meant as a quantifier, so that runs of one or more disallowed characters are replaced at once. It should read instead:
str = str.replace(/[^-a-z0-9]+/g, "");
Also, if you need to match upper-case letters along with lower case, you should use:
str = str.replace(/[^-a-zA-Z0-9]+/g, "");
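str = str.replace(/\W/g, "");
This is a shorter form, but note that it is not equivalent: \W stands for [^A-Za-z0-9_], so it keeps underscores and upper-case letters and also removes the '-' character.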
We can use /[a-zA-Z]/g to select the lower-case and upper-case letters in a word or sentence and replace them.
var str = 'MM-DD-yyyy'
var modifiedStr = str.replace(/[a-zA-Z]/g, '_')
console.log(modifiedStr)
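To make the differences between the character classes in these answers concrete, here is a small sketch that runs three of them on one made-up input. The sample string and variable names are illustrative only.

```javascript
const input = 'Foo_Bar-baz 42+';

// Lower-case letters, digits and '-' are kept; everything else is removed.
const lowerOnly = input.replace(/[^a-z0-9-]/g, '');

// The i flag makes a-z case-insensitive, so upper-case letters survive too.
const anyCase = input.replace(/[^a-z0-9-]/gi, '');

// \W is [^A-Za-z0-9_]: it keeps '_' but removes '-'.
const nonWord = input.replace(/\W/g, '');

console.log(lowerOnly); // "ooar-baz42"
console.log(anyCase);   // "FooBar-baz42"
console.log(nonWord);   // "Foo_Barbaz42"
```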
