Javascript string replace with regex to strip off illegal characters - javascript

Need a function to strip off a set of illegal character in javascript: |&;$%#"<>()+,
This is a classic problem to be solved with regexes, which means now I have 2 problems.
This is what I've got so far:
var cleanString = dirtyString.replace(/\|&;\$%#"<>\(\)\+,/g, "");
I am escaping the regex special chars with a backslash but I am having a hard time trying to understand what's going on.
If I try with single literals in isolation most of them seem to work, but once I put them together in the same regex depending on the order the replace is broken.
i.e. this won't work --> dirtyString.replace(/\|<>/g, ""):
Help appreciated!

What you need are character classes. In that, you've only to worry about the ], \ and - characters (and ^ if you're placing it straight after the beginning of the character class "[" ).
Syntax: [characters] where characters is a list with characters.
Example:
var cleanString = dirtyString.replace(/[|&;$%#"<>()+,]/g, "");

I tend to look at it from the inverse perspective which may be what you intended:
What characters do I want to allow?
This is because there could be lots of characters that make in into a string somehow that blow stuff up that you wouldn't expect.
For example this one only allows for letters and numbers removing groups of invalid characters replacing them with a hypen:
"This¢£«±Ÿ÷could&*()\/<>be!##$%^bad".replace(/([^a-z0-9]+)/gi, '-');
//Result: "This-could-be-bad"

You need to wrap them all in a character class. The current version means replace this sequence of characters with an empty string. When wrapped in square brackets it means replace any of these characters with an empty string.
var cleanString = dirtyString.replace(/[\|&;\$%#"<>\(\)\+,]/g, "");

Put them in brackets []:
var cleanString = dirtyString.replace(/[\|&;\$%#"<>\(\)\+,]/g, "");

Related

Backslash bug in JavaScript

I have a string that involves tricky \\ characters.
Below is the initial code, and what I am literally trying to achieve but it is not working. I have to replace the \" characters but I think that is where the bug is.
var current = csvArray[0][i].Replace("\"", "");
I have tried the variation below but it is still not working.
var current = csvArray[0][i].Replace('\"', '');
It is currently throwing an Uncaught TypeError: csvArray[0][i].Replace is not a function
Is there a way for Javascript to take my string ("\"") literally like in C#? Kindly help me investigate. Thanks!
If the sequence you want to match is a single backslash character followed by a quotation mark, then you need to escape the backslash itself because backslashes have special meaning in string literals. You then need to separately escape the quotation mark with its own backslash:
.replace("\\\"", "")
I believe that would also be true in C#.
Or you can simplify it by using single quotes around the string so that only the backslash needs to be escaped:
.replace('\\"', '')
If the first argument to .replace() is a string, however, it will only replace the first occurrence. To do a global replace you have to use a regular expression with the g flag, noting that backslashes need to be escaped in regular expressions too:
.replace(/\\"/g, '')
I'm not going to setup a demo array to exactly match your code, but here's a simple demo where you can see that a lone backslash or quote in the input string are not replaced, but all backslash-quote combinations are replaced:
var input = 'Some\\ test" \\" text \\" for demo \\"'
var output = input.replace(/\\"/g, '')
console.log(input)
console.log(output)

Regex with numerics and dashes only

I am facing an issue in JavaScript form validation. I have to store number in this format 1-74347064527
I have tried these regular expressions but not worked properly:
var srNo =/^[-0-9]*$/;
var srNo = /^[0-9]+(-[0-9]+)+$/;
var srNo=/^([0-9]+-)*[0-9]+$/;
Suggest some regex for this.
Kind regards.
This should work unless you have additional constraints:
var srNo = /^\d+-\d+$/;
If you prefer the [0-9] syntax:
var srNo = /^[0-9]+-[0-9]+$/;
When in a character class ([ ]), dash ( - ) has a special meaning in regular expressions - it means "range", eg. a-z means from 'a' to 'z'. You're not escaping it, so your RegExps are not even correct (at least not in every language).
Update: It appears, that this syntax is correct when dash is not surrounded by other characters (when it's placed at the beginning or end of the character class). Sorry for confusion.
Try this instead:
/^\d\-\d+$/
It matches strings that begin with one digit, followed by a dash, and then by one or more digits.
var regex = /^\d{1}-?\d{11}$/g
window.alert(regex.test('1-74347064527'));

Javascript Regex remove specialchars except - and æøå

I got this:
var stringToReplace = 'æøasdasd\89-asdasd sse';
var desired = stringToReplace.replace(/[^\w\s]/gi, '');
alert(desired);
I found the replace rule from another SO question.
This works fine, it gives output:
asdasd89asdasd sse
Although I would like to set up additional rules:
Keep æøå characters
Keep - character
Turn whitespace/space into a - character
So the output would be:
æøåasdasd89-asdasd-sse
I know I can run an extra line:
stringtoReplace.replace(' ', '-');
to accomplish my 3) goal - but I dont know what to do with the 1 and 2), since I am not into regex expressions ?
This should work:
str = str.replace(/[^æøå\w -]+/g, '').replace(/ +/g, '-');
Live Demo: http://ideone.com/d60qrX
You can just add the special characters to the exclusion list.
/[^\w\sæøå-]/gi
Fiddle with example here.
And as you said - you can use another replace to replace spaces with dashes
Your original regex [^\w\s] targets any character which isn't a word or whitespace character (-, for example). To include other characters in this regex's 'whitelist', simply add them to that character group:
stringToReplace.replace(/[^\w\sæøå-]/gi, '');
As for replacing spaces with hyphens, you cannot do both in a single regex. You can, however, use a string replacement afterwards to solve that.
stringToReplace.replace(" ","-");

Regex to replace string with word and characters

I've got three working regexp's,
string.replace(\catalogue\g, "") // replace a the word catalogue
string.replace(/[/:]/g, "") // replace the characters /, :
string.replace(\20%\g, "") // replace '20%'
Instead of replacing the string three times, I want to combine my regexp's.
Wanted result = 'removethewordnow';
var string = 'rem:ove20%the/word:catalogue20%now';
My latest try was:
string.replace(/\catalogue\b|[/20%:]/g, ""); // works, but catalouge is unaffected and 20% isn't combined as a word
Off the top of my head:
string.replace(/(catalogue|[\/:]|20%)/g,"");
Just use an alternative, i.e. separate each of the regular expressions you had before by the alternation operator |:
catalogue|20%|[/:]
Also note that you cannot just combine character classes and literal strings in the way you have done there. Above naïve combination works and everything beyond that might be optimisation (even if it can't be optimised further in this case) – but that only works if you don't change the language described by the regex.
You seem to be having a typo there (\c), also you don't want 20% inside the character class (and you should escape the slash). You also need to remove the word boundaries if you want to allow catalogue20% to match - there is no boundary between catalogue and 20, therefore the \b fails:
string.replace(/catalogue|20%|[\/:]/g, "");
var string = 'rem:ove20%the/word:catalogue20%now';
string.replace(/([:/]|20%|catalogue)/g, '');
\b refers to a word boundary, but your word catalogue is mixed with other words. So your regex should be:
string.replace(/catalogue|[\/20%:]/g, "");
Also do escape the / with \/.
string.replace(/catalogue|20%|[/:]/g, '')

regex and javascript

using http://www.regular-expressions.info/javascriptexample.html I tested the following regex
^\\{1}([0-9])+
this is designed to match a backslash and then a number.
It works there
If I then try this directly in code
var reg = /^\\{1}([0-9])+/;
reg.exec("/123")
I get no matches!
What am I doing wrong?
Update:
Regarding the update of your question. Then the regex has to be:
var reg = /^\/(\d+)/;
You have to escape the slash inside the regex with \/.
The backslash needs to be escaped in the string too:
reg.exec("\\123")
Otherwise \1 will be treated as special character.
Btw, the regular expression can be simplified:
var reg = /^\\(\d+)/;
Note that I moved the quantifier + inside the capture group, otherwise it will only capture a single digit (namely 3) and not the whole number 123.
You need to escape the backslash in your string:
"\\123"
Also, for various implementation bugs, you may want to set reg.lastIndex = 0;.
In addition, {1} is completely redundant, you can simplify your regex to /^\\(\d)+/.
One last note: (\d)+ will only capture the last digit, you may want (\d+).

Categories