Regex to replace string with word and characters - javascript

I've got three working regexp's,
string.replace(/catalogue/g, "") // remove the word catalogue
string.replace(/[/:]/g, "") // remove the characters / and :
string.replace(/20%/g, "") // remove '20%'
Instead of replacing the string three times, I want to combine my regexp's.
Wanted result = 'removethewordnow';
var string = 'rem:ove20%the/word:catalogue20%now';
My latest try was:
string.replace(/\catalogue\b|[/20%:]/g, ""); // works, but catalogue is unaffected and 20% isn't treated as a word

Off the top of my head:
string.replace(/(catalogue|[\/:]|20%)/g,"");

Just use an alternative, i.e. separate each of the regular expressions you had before by the alternation operator |:
catalogue|20%|[/:]
Also note that you cannot just combine character classes and literal strings the way you have done there. The naïve alternation above works. Anything beyond that would be optimisation (not that it can be optimised further in this case), and an optimisation is only valid if it doesn't change the language described by the regex.

You seem to have a typo there (\c). Also, you don't want 20% inside the character class, and you should escape the slash. You also need to remove the word boundary if you want catalogue20% to match: there is no boundary between catalogue and 20, so the \b fails:
string.replace(/catalogue|20%|[\/:]/g, "");
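A quick check of the combined pattern against the string from the question, and of what happens when the \b is left in. The variable names are only for illustration:

```javascript
const input = 'rem:ove20%the/word:catalogue20%now';

// Alternation: the word, the literal "20%", or one of the characters / and :
const combined = /catalogue|20%|[\/:]/g;
const cleaned = input.replace(combined, '');
console.log(cleaned); // "removethewordnow"

// With \b after "catalogue" the word is never matched in this input,
// because "e" and "2" are both word characters, so there is no boundary
// between them.
const withBoundary = /catalogue\b|20%|[\/:]/g;
const partlyCleaned = input.replace(withBoundary, '');
console.log(partlyCleaned); // "removethewordcataloguenow"
```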

var string = 'rem:ove20%the/word:catalogue20%now';
string.replace(/([:/]|20%|catalogue)/g, '');

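\b refers to a word boundary, but your word catalogue is mixed with other words, so drop the \b. Also move 20% out of the character class (inside [...] it matches the single characters 2, 0 and %), so your regex should be:
string.replace(/catalogue|20%|[\/:]/g, "");
Also do escape the / with \/.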

string.replace(/catalogue|20%|[/:]/g, '')


Regex replace not removing characters properly

I have the regular expression:
const regex = /^\d*\.?\d{0,2}$/
and its inverse (I believe) of
const inverse = /^(?!\d*\.?\d{0,2}$)/
The first regex is validating the string fits any positive number, allowing a decimal and two decimal digits (e.g. 150, 14., 7.4, 12.68). The second regex is the inverse of the first, and doing some testing I'm fairly confident it's giving the expected result, as it only validates when the string is anything but a number that may have a decimal and two digits after (e.g. 12..05, a5, 54.357).
My goal is to remove any characters from the string that do not fit the first regex. I thought I could do that this way:
let myString = '123M.45';
let fixed = myString.replace(inverse, '');
But this does not work as intended. To debug, I tried having the replace character changed to something I would be able to see:
let fixed = myString.replace(inverse, 'ZZZ');
When I do this, fixed becomes: ZZZ123M.45
Any help would be greatly appreciated.
I think I understand your logic here trying to find a regex that is the inverse of the regex that matches your valid string, in the hopes that it will allow you to remove any characters that make your string invalid and leave only the valid string. However, I don't think replace() will allow you to solve your problem in this way. From the MDN docs:
The replace() method returns a new string with some or all matches of a pattern replaced by a replacement.
In your inverse pattern you are using a negative lookahead. If we take a simple example of X(?!Y), we can think of this as "match X if not followed by Y". In your pattern your "X" is ^ and your "Y" is \d*\.?\d{0,2}$. From my understanding, the reason you are getting ZZZ123M.45 is that it finds the first ^ (i.e., the start of the string) that is not followed by your pattern \d*\.?\d{0,2}$. Since 123M.45 doesn't match your "Y" pattern, the negative lookahead is satisfied, and the zero-width position at the beginning of your string is matched and "replaced" with ZZZ.
That (I think) is an explanation of what you are seeing.
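The zero-width match can be made visible with match(). This is a small sketch using the regex and string from the question; only the variable m is new:

```javascript
const inverse = /^(?!\d*\.?\d{0,2}$)/;
const myString = '123M.45';

const m = myString.match(inverse);
console.log(m.index);     // 0
console.log(m[0].length); // 0, the match is the empty string at the start

// Replacing an empty match inserts text; it removes nothing.
console.log(myString.replace(inverse, 'ZZZ')); // "ZZZ123M.45"

// For a valid string the lookahead fails, so there is no match at all.
console.log('12.68'.match(inverse)); // null
```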
I would propose an alternative solution to your problem that better fits with how I understand the .replace() method. Instead of your inverse pattern, try this one:
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g
const myString = '123M..456444';
const fixed = myString.replace(invalidChars, '');
Here I am using a pattern that I think will match the individual characters that you want to remove. Let's break down what this one is doing:
[^\d\.]: match characters that are neither digits nor a . character
\.(?=\.): match . character if it is followed by another . character.
(?<=\.\d\d)\d*: match digits that are preceded by a decimal and 2 digits
Then I join all these with ORs (|) so it will match any one of the above patterns, and I use the g flag so that it will replace all the matches, not just the first one.
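Running the pattern on a few inputs shows both what it handles and where it stops. This is only a sketch: the sample inputs are made up, and the lookbehind (?<=...) needs an engine with ES2018 regex support.

```javascript
const invalidChars = /[^\d\.]|\.(?=\.)|(?<=\.\d\d)\d*/g;

// Letter removed, doubled dot collapsed, extra decimals trimmed.
const a = '123M..456444'.replace(invalidChars, '');
console.log(a); // "123.45"

// Already valid input is left alone.
const b = '12.68'.replace(invalidChars, '');
console.log(b); // "12.68"

// Dots that are not adjacent are not caught by this pattern, so the
// result can still fail the original validation regex.
const c = '1.2.3'.replace(invalidChars, '');
console.log(c); // "1.2.3"
```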
I am not sure if this will cover all your use cases, but I thought I would give it a shot. An online regex tester may give a more helpful breakdown than mine, and you can use one to tweak the pattern if necessary.
I don't think you can do this:
remove any characters from the string that do not fit the first regex
Your first regex is meant to match the entire string, whereas replace() replaces just a PART of that string. So the regex passed to replace() must match only the unwanted characters; it cannot simply be the inverted validation regex.
What you could do is to validate the string with your original regex, then if it's not valid, replace and validate again.
//if (notValid), replace unwanted character
// replace everything that's not a dot or digit
const replaceRegex = /[^\d.]/g; // notice g flag here to match every occurrence
const myString = '123M.45';
const fixed = myString.replace(replaceRegex, '');
console.log(fixed)
// validate again
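Put together, the validate, replace, validate-again idea could look roughly like this. The helper name cleanNumber and the choice to return null when the input cannot be repaired are assumptions for the sake of the example, not something from the question:

```javascript
const valid = /^\d*\.?\d{0,2}$/; // the validating regex from the question
const unwanted = /[^\d.]/g;      // anything that is not a digit or a dot

// Hypothetical helper: returns the cleaned string, or null if stripping
// characters was not enough to make the input valid.
function cleanNumber(input) {
  if (valid.test(input)) return input;
  const stripped = input.replace(unwanted, '');
  return valid.test(stripped) ? stripped : null;
}

console.log(cleanNumber('123M.45'));  // "123.45"
console.log(cleanNumber('12.68'));    // "12.68"
console.log(cleanNumber('1a2..345')); // null, "12..345" is still invalid
```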

Javascript Regex remove specialchars except - and æøå

I got this:
var stringToReplace = 'æøasdasd\89-asdasd sse';
var desired = stringToReplace.replace(/[^\w\s]/gi, '');
alert(desired);
I found the replace rule from another SO question.
This works fine, it gives output:
asdasd89asdasd sse
Although I would like to set up additional rules:
Keep æøå characters
Keep - character
Turn whitespace/space into a - character
So the output would be:
æøåasdasd89-asdasd-sse
I know I can run an extra line:
stringToReplace.replace(' ', '-');
to accomplish my 3) goal, but I don't know what to do with 1) and 2), since I am not very familiar with regular expressions.
This should work:
str = str.replace(/[^æøå\w -]+/g, '').replace(/ +/g, '-');
Live Demo: http://ideone.com/d60qrX
You can just add the special characters to the exclusion list.
/[^\w\sæøå-]/gi
And as you said - you can use another replace to replace spaces with dashes
Your original regex [^\w\s] targets any character which isn't a word or whitespace character (-, for example). To include other characters in this regex's 'whitelist', simply add them to that character group:
stringToReplace.replace(/[^\w\sæøå-]/gi, '');
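As for replacing spaces with hyphens, you cannot do both with a single regex and a fixed replacement string. You can, however, chain a second replacement afterwards. Use a regex with the g flag for it, because a plain string pattern replaces only the first occurrence:
stringToReplace.replace(/[^\w\sæøå-]/gi, '').replace(/\s/g, '-');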
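As a sketch, the two steps can be chained so that all three rules from the question are applied. The sample input here is made up so that it actually contains æøå, a special character and a run of spaces:

```javascript
const stringToReplace = 'æøå asdasd#89-asdasd  sse';

const desired = stringToReplace
  .replace(/[^\wæøå\s-]/gi, '') // drop everything outside the whitelist
  .replace(/\s+/g, '-');        // turn each run of whitespace into one hyphen

console.log(desired); // "æøå-asdasd89-asdasd-sse"
```

Using \s+ rather than a single space means consecutive spaces do not produce consecutive hyphens; whether that is wanted depends on the use case.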

Creating javascript regex to replace characters using whitelist

I'm trying to create a regex which will remove all the characters which are not in the specified whitelist (letters, digits, whitespace, brackets, question mark and exclamation mark)
This is the code :
var regEx = /^[^(\s|\w|\d|()|?|!|<br>)]*?$/;
qstr += tempStr.replace(regEx, '');
What is wrong with it ?
Thank you
The anchors are wrong - they only allow the regex to match the entire string
The lazy quantifier is wrong - you wouldn't want the regex to match 0 characters (if you have removed the anchors)
The parentheses and pipe characters are wrong - you don't need them in a character class.
The <br> is wrong - you can't match specific substrings in a character class.
The \d is superfluous since it's already contained in \w (thanks Alex K.!)
You're missing the global modifier to make sure you can do more than one replace.
You should be using + instead of * in order not to replace lots of empty strings with themselves.
Try
var regEx = /[^\s\w()?!]+/g;
and handle the <br>s independently (before that regex is applied, or their angle brackets will be removed).
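One possible way to handle the <br>s independently is to split on the tag, clean each piece, and join the pieces again. This is only a sketch; the helper name keepWhitelisted and the sample text are assumptions:

```javascript
const notAllowed = /[^\s\w()?!]+/g;

// Hypothetical helper: clean every piece between <br> tags, then put the
// tags back, so their angle brackets are never seen by the regex.
function keepWhitelisted(text) {
  return text
    .split('<br>')
    .map(part => part.replace(notAllowed, ''))
    .join('<br>');
}

const result = keepWhitelisted('Ready? (yes/no) #1!<br>50% done; really?');
console.log(result); // "Ready? (yesno) 1!<br>50 done really?"
```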
You'll want to use the g (global) modifier:
var regEx = /^[^(\s|\w|\d|()|?|!|<br>)]*?$/g; // <-- `g` goes there
qstr += tempStr.replace(regEx, '');
This allows your expression to match multiple times.

regex for comma followed by space or just comma

Is it possible to make a regex with multiple delimiters? For example, I want to split a string which can come in two forms: 1. "string1, string2, string3" or 2. "string1,string2,string3". I've been trying to do this in javascript but with no success so far.
Just use a regex split():
var string = "part1,part2, part3, part4, part5",
components = string.split(/,\s*/);
The reason I've used * rather than ? is simply that it allows for no white-space or many white-spaces, whereas ? matches zero-or-one white-space (which is exactly what you asked for, but even so).
Incidentally, if there might possibly be white-spaces preceding the comma, then it might be worth amending the split() regex to:
var string = "part1,part2 , part3, part4, part5",
components = string.split(/\s*,\s*/);
console.log(components);
Which splits the supplied string on zero-or-more whitespace followed by a comma followed by zero-or-more white-space. This may, of course, be entirely unnecessary.
Yes, make the whitespace (\s) optional using ?:
var s = "string1,string2,string3";
s.split(/,\s?/);
In addition to silva's answer:
just in case there might be more than one space (or no space) after the comma, use * instead:
var s = "string1, string2, string3";
s.split(/,\s*/);
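The difference between the three variants shows up on input with irregular spacing. A small comparison, with made-up sample strings:

```javascript
const spaced = 'string1, string2, string3';
const tight = 'string1,string2,string3';
const messy = 'part1,part2 ,   part3';

// Both forms from the question give the same three parts.
const fromSpaced = spaced.split(/,\s*/);
const fromTight = tight.split(/,\s*/);
console.log(fromSpaced); // [ 'string1', 'string2', 'string3' ]
console.log(fromTight);  // [ 'string1', 'string2', 'string3' ]

// \s? consumes at most one space after the comma and none before it.
const optionalSpace = messy.split(/,\s?/);
console.log(optionalSpace); // [ 'part1', 'part2 ', '  part3' ]

// \s*,\s* also swallows whitespace on both sides of the comma.
const trimmed = messy.split(/\s*,\s*/);
console.log(trimmed); // [ 'part1', 'part2', 'part3' ]
```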

Javascript string replace with regex to strip off illegal characters

Need a function to strip off a set of illegal character in javascript: |&;$%#"<>()+,
This is a classic problem to be solved with regexes, which means now I have 2 problems.
This is what I've got so far:
var cleanString = dirtyString.replace(/\|&;\$%#"<>\(\)\+,/g, "");
I am escaping the regex special chars with a backslash but I am having a hard time trying to understand what's going on.
If I try with single literals in isolation most of them seem to work, but once I put them together in the same regex depending on the order the replace is broken.
i.e. this won't work --> dirtyString.replace(/\|<>/g, ""):
Help appreciated!
What you need are character classes. In that, you've only to worry about the ], \ and - characters (and ^ if you're placing it straight after the beginning of the character class "[" ).
Syntax: [characters] where characters is a list with characters.
Example:
var cleanString = dirtyString.replace(/[|&;$%#"<>()+,]/g, "");
I tend to look at it from the inverse perspective which may be what you intended:
What characters do I want to allow?
This is because there could be lots of characters that make it into a string somehow and blow stuff up in ways you wouldn't expect.
For example, this one only allows letters and numbers, replacing each group of invalid characters with a hyphen:
"This¢£«±Ÿ÷could&*()\/<>be!##$%^bad".replace(/([^a-z0-9]+)/gi, '-');
//Result: "This-could-be-bad"
You need to wrap them all in a character class. The current version means replace this sequence of characters with an empty string. When wrapped in square brackets it means replace any of these characters with an empty string.
var cleanString = dirtyString.replace(/[\|&;\$%#"<>\(\)\+,]/g, "");
Put them in brackets []:
var cleanString = dirtyString.replace(/[\|&;\$%#"<>\(\)\+,]/g, "");
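The difference between the sequence and the character class can be seen side by side. The sample dirtyString is made up for this sketch:

```javascript
const dirtyString = 'a|b&c;d$e%f#g"h<i>j(k)l+m,n';

// Without brackets the pattern is one literal 13-character sequence,
// which does not occur in the input, so nothing is replaced.
const asSequence = dirtyString.replace(/\|&;\$%#"<>\(\)\+,/g, '');
console.log(asSequence === dirtyString); // true

// Inside [...] each character is matched on its own.
const asClass = dirtyString.replace(/[|&;$%#"<>()+,]/g, '');
console.log(asClass); // "abcdefghijklmn"
```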
