Capturing consecutive non-alphanumeric characters in a string in a single group - javascript

I want to replace all special characters in a string with dashes. I use the following regex to replace the characters.
var x = "Querty(&)keypad";
alert(x.replace(/[^A-Za-z0-9]/g, "-"));
However, this causes each character to be replaced by a dash, rather than replacing consecutive characters with a single dash. This examples gives me the output Querty---keypad. My desired output is Querty-keypad.
You can see the issue in this jsfiddle.

Use + to match 1 or more repetitions:
> "Querty(&)keypad".replace(/[^A-Za-z0-9]+/g, "-")
"Querty-keypad"

Related

How to remove characters in a string that matches a RegEx while preserving spaces?

I am trying to remove characters from a string so that it will match this RegEx: ^[-a-zA-Z0-9._:,]+$. For example:
const input = "test // hello". The expected output would be "test hello". I tried the following:
input.replace(/^[-a-zA-Z0-9._:,]+$/g, "")
But this does not seem to work
The example output "hello world" that you give does not match your regex, because the regex does not allow spaces. Assuming you want to keep spaces, use
input.replace(/[^-a-zA-Z0-9._:, ]/g, "")
The negation character ^ must be inside the [...]. The + is not needed, because /g already ensures that all matching characters are replaced (that is, removed).
If you also want to condense consecutive spaces into a single space (as implied by your example), use
input.replace(/[^-a-zA-Z0-9._:, ]/g, "").replace(/\s+/g, " ")
I like to use the following canonical approach:
var input = "test // hello";
var output = input.replace(/\s*[^-a-zA-Z0-9._:, ]+\s*/g, " ").trim()
console.log(output);
The logic here is to target all unwanted characters and their surrounding whitespace. We replace with just a single space. Then we do a trim at the end in case there might be an extra leading/trailing space.

Regex remove all leading and trailing special characters?

Let's say I have the following string in javascript:
&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&
I want to remove all the leading and trailing special characters (anything which is not alphanumeric or alphabet in another language) from all the words.
So the string should look like
a.b.c a.b.c a.b.c a.b.c a.b&.c a.b.&&dc ê.b..c
Notice how the special characters in between the alphanumeric is left behind. The last ê is also left behind.
This regex should do what you want. It looks for
start of line, or some spaces (^| +) captured in group 1
some number of symbol characters [!-\/:-#\[-``\{-~]*
a minimal number of non-space characters ([^ ]*?) captured in group 2
some number of symbol characters [!-\/:-#\[-``\{-~]*
followed by a space or end-of-line (using a positive lookahead) (?=\s|$)
Matches are replaced with just groups 1 and 2 (the spacing and the characters between the symbols).
let str = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
str = str.replace(/(^| +)[!-\/:-#\[-`\{-~]*([^ ]*?)[!-\/:-#\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
Note that if you want to preserve a string of punctuation characters on their own (e.g. as in Apple & Sauce), you should change the second capture group to insist on there being one or more non-space characters (([^ ]+?)) instead of none and add a lookahead after the initial match of punctuation characters to assert that the next character is not punctuation:
let str = 'Apple &&& Sauce; -This + !That!';
str = str.replace(/(^| +)[!-\/:-#\[-`\{-~]*(?![!-\/:-#\[-`\{-~])([^ ]+?)[!-\/:-#\[-`\{-~]*(?=\s|$)/gi, '$1$2');
console.log(str);
a-zA-Z\u00C0-\u017F is used to capture all valid characters, including diacritics.
The following is a single regular expression to capture each individual word. The logic is that it will look for the first valid character as the beginning of the capture group, and then the last sequence of invalid characters before a space character or string terminator as the end of the capture group.
const myRegEx = /[^a-zA-Z\u00C0-\u017F]*([a-zA-Z\u00C0-\u017F].*?[a-zA-Z\u00C0-\u017F]*)[^a-zA-Z\u00C0-\u017F]*?(\s|$)/g;
let myString = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&'.replace(myRegEx, '$1$2');
console.log(myString);
Something like this might help:
const string = '&a.b.c. &a.b.c& .&a.b.c.&. *;a.b.c&*. a.b&.c& .&a.b.&&dc.& &ê.b..c&';
const result = string.split(' ').map(s => /^[^a-zA-Z0-9ê]*([\w\W]*?)[^a-zA-Z0-9ê]*$/g.exec(s)[1]).join(' ');
console.log(result);
Note that this is not one single regex, but uses JS help code.
Rough explanation: We first split the string into an array of strings, divided by spaces. We then transform each of the substrings by stripping
the leading and trailing special characters. We do this by capturing all special characters with [^a-zA-Z0-9ê]*, because of the leading ^ character it matches all characters except those listed, so all special characters. Between these two groups we capture all relevant characters with ([\w\W]*?). \w catches words, \W catches non-words, so \w\W catches all possible characters. By appending the ? after the *, we make the quantifier * lazy, so that the group stops catching as soon as the next group, which catches trailing special characters, catches something. We also start the regex with a ^ symbol and end it with an $ symbol to capture the entire string (they respectively set anchors to the start end the end of the string). With .exec(s)[1] we then execute the regex on the substring and return the first capturing group result in our transform function. Note that this might be null if a substring does not include proper characters. At the end we join the substrings with spaces.

Regex keeps finding character I want matched along with previous character

I have the following regex in javascript for a split operation since I can't do a negative look behind to find any delimiters , in a string that is not proceeded by one or more escape characters of \.
[^\\],
The regex works fine for finding where the commas not proceeded by \ are, but also finds the character that proceeds the comma as a match and thus splits the string incorrectly.
For example if I had the string
hello\,there,are
The result would be that e, matches my regex and not just ,. Making the split string array read
[hello\,ther] [are]
Why does the regex I am using keep finding the comma and the proceeding character instead of only matching the comma?
You cannot use split here because you'd need a lookbehind that JS regex does not support. Use a match with appropriate regex. Like the one below:
/(?:[^\\,]|\\.)+/g
See the regex demo.
The pattern matches 1 or more (+) sequences of any char other than , and \ ([^\\,]) or (|) any escaped character (excluding linebreak chars) with \\.
JS demo:
var regex = /(?:[^\\,]|\\.)+/g;
var str = "hello\\,there,are";
var res = str.match(regex);
console.log(res);

Regex to match # followed by square brackets containing a number

I want to parse a pattern similar to this using javascript:
#[10] or #[15]
With all my efforts, I came up with this:
#\\[(.*?)\\]
This pattern works fine but the problem is it matches anything b/w those square brackets. I want it to match only numbers. I tried these too:
#\\[(0-9)+\\]
and
#\\[([(0-9)+])\\]
But these match nothing.
Also, I want to match only pattern which are complete words and not part of a word in the string. i.e. should contain spaces both side if its not starting or ending the script. That means it should not match phrase like this:
abxdcs#[13]fsfs
Thanks in advance.
Use the regex:
/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
It will match if the pattern (#[number]) is not a part of a word. Should contain spaces both sides if its not starting or ending the string.
It uses groups, so if need the digits, use the group 1.
Testing code (click here for demo):
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("#[10]")); // true
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("#[15]")); // true
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("abxdcs#[13]fsfs")); // false
console.log(/(?:^|\s)#\[([0-9]+)\](?=$|\s)/g.test("abxdcs #[13] fsfs")); // true
var r1 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match = r1.exec("#[10]");
console.log(match[1]); // 10
var r2 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match2 = r2.exec("abxdcs #[13] fsfs");
console.log(match2[1]); // 13
var r3 = /(?:^|\s)#\[([0-9]+)\](?=$|\s)/g
var match3;
while (match3 = r3.exec("#[111] #[222]")) {
console.log(match3[1]);
}
// while's output:
// 111
// 222
You were close, but you need to use square brackets:
#\[[0-9]+\]
Or, a shorter version:
#\[\d+\]
The reason you need those slashes is to "escape" the square bracket. Usually they are used for denoting a "character class".
[0-9] creates a character class which matches exactly one digit in the range of 0 to 9. Adding the + changes the meaning to "one or more". \d is just shorthand for [0-9].
Of course, the backslash character is also used to escape characters inside of a javascript string, which is why you must escape them. So:
javascript
"#\\[\\d+\\]"
turns into:
regex
#\[\d+\]
which is used to match:
# a literal "#" symbol
\[ a literal "[" symbol
\d+ one or more digits (nearly identical to [0-9]+)
\] a literal "]" symbol
I say that \d is nearly identical to [0-9] because, in some regex flavors (including .NET), \d will actually match numeric digits from other cultures in addition to 0-9.
You don't need so many characters inside the character class. More importantly, you put the + in the wrong place. Try this: #\\[([0-9]+)\\].

regex string replace

I am trying to do a basic string replace using a regex expression, but the answers I have found do not seem to help - they are directly answering each persons unique requirement with little or no explanation.
I am using str = str.replace(/[^a-z0-9+]/g, ''); at the moment. But what I would like to do is allow all alphanumeric characters (a-z and 0-9) and also the '-' character.
Could you please answer this and explain how you concatenate expressions.
This should work :
str = str.replace(/[^a-z0-9-]/g, '');
Everything between the indicates what your are looking for
/ is here to delimit your pattern so you have one to start and one to end
[] indicates the pattern your are looking for on one specific character
^ indicates that you want every character NOT corresponding to what follows
a-z matches any character between 'a' and 'z' included
0-9 matches any digit between '0' and '9' included (meaning any digit)
- the '-' character
g at the end is a special parameter saying that you do not want you regex to stop on the first character matching your pattern but to continue on the whole string
Then your expression is delimited by / before and after.
So here you say "every character not being a letter, a digit or a '-' will be removed from the string".
Just change + to -:
str = str.replace(/[^a-z0-9-]/g, "");
You can read it as:
[^ ]: match NOT from the set
[^a-z0-9-]: match if not a-z, 0-9 or -
/ /g: do global match
More information:
https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Regular_Expressions
Your character class (the part in the square brackets) is saying that you want to match anything except 0-9 and a-z and +. You aren't explicit about how many a-z or 0-9 you want to match, but I assume the + means you want to replace strings of at least one alphanumeric character. It should read instead:
str = str.replace(/[^-a-z0-9]+/g, "");
Also, if you need to match upper-case letters along with lower case, you should use:
str = str.replace(/[^-a-zA-Z0-9]+/g, "");
str = str.replace(/\W/g, "");
This will be a shorter form
We can use /[a-zA-Z]/g to select small letter and caps letter sting in the word or sentence and replace.
var str = 'MM-DD-yyyy'
var modifiedStr = str.replace(/[a-zA-Z]/g, '_')
console.log(modifiedStr)

Categories