Creating a JavaScript regex to replace characters using a whitelist - javascript

I'm trying to create a regex which will replace all characters that are not in the specified whitelist (letters, digits, whitespace, brackets, question mark and exclamation mark).
This is the code:
var regEx = /^[^(\s|\w|\d|()|?|!|<br>)]*?$/;
qstr += tempStr.replace(regEx, '');
What is wrong with it ?
Thank you

The anchors are wrong - they only allow the regex to match the entire string
The lazy quantifier is wrong - you wouldn't want the regex to match 0 characters (if you have removed the anchors)
The parentheses and pipe characters are wrong - you don't need them in a character class.
The <br> is wrong - you can't match specific substrings in a character class.
The \d is superfluous since it's already contained in \w (thanks Alex K.!)
You're missing the global modifier to make sure you can do more than one replace.
You should be using + instead of * in order not to replace lots of empty strings with themselves.
Try
var regEx = /[^\s\w()?!]+/g;
and handle the <br>s separately (before that regex is applied, or their angle brackets will be removed).
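One way to handle the <br> tags separately is to split the input on them first, so the tags never pass through the filter. This is only a minimal sketch: it assumes the tags are always written exactly as <br>, and the helper name stripNonWhitelisted is made up for illustration.

```javascript
// Hypothetical helper, not part of the original question or answer.
function stripNonWhitelisted(input) {
  // Splitting on the literal "<br>" keeps the tags away from the filter.
  return input
    .split('<br>')
    .map(function (part) {
      // Remove every run of characters outside the whitelist.
      return part.replace(/[^\s\w()?!]+/g, '');
    })
    .join('<br>');
}

var cleaned = stripNonWhitelisted('Hi, there!<br>Is #this$ ok?');
console.log(cleaned); // Hi there!<br>Is this ok?
```

Variants such as <br/> or <BR> would need a regex passed to split instead of the plain string.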

You'll want to use the g (global) modifier:
var regEx = /^[^(\s|\w|\d|()|?|!|<br>)]*?$/g; // <-- `g` goes there
qstr += tempStr.replace(regEx, '');
This allows your expression to match multiple times.

Related

javascript find protocol, domain, plus first slash with regexp from a src tag, replace with empty string

I tried to construct a regex for this task but I'm afraid I am still failing to have an intuitive understanding of regexp.
The problem is the regex matches until the last slash in a string. I want it to stop at the first match of the string.
My pathetic attempt at regex:
/^http(s?):\/\/.+\/{1}/
Test subject:
http://foo.com/bar/test/foo.jpeg
The goal is to obtain bar/test/foo.jpeg, so that I may then split the string, pop the last element and then join the remainder, resulting in having the path to the JavaScript file.
Example
var str = 'http://foo.com/bar/test/foo.jpeg';
str.replace(regexp,'');
While the other answer shows how to match a part of a string, I think a replace solution is more appropriate for the current task.
The issue you have is that .+ matches one or more characters other than a newline greedily, that is, all the string is grabbed first in one go, and then the regex engine starts backtracking (moving backwards along the input string looking for a / to accommodate in the match). Thus, you get the match from http until the last /.
To restrict the match from http to the first / use a negated character class [^/]+ instead of .+.
^https?:\/\/[^\/]+\/
            ^^^^^^
Note that you do not need to place s into a capturing group to make it optional; an unescaped ? is a quantifier that makes the preceding character match one or zero times. Also, {1} is redundant, since matching exactly once is the default behavior: c matches exactly one c, and (?:something) matches exactly one something.
var re = /^https?:\/\/[^\/]+\//;
var str = 'http://foo.com/bar/test/foo.jpeg';
var result = str.replace(re, '');
document.getElementById("r").innerHTML = result;
<div id="r"/>
Note that you will need to assign the replace result to some variable, since in JS, strings are immutable.
Regex explanation:
^ - start of string
https? - either http or https substring
:\/\/ - a literal sequence of ://
[^\/]+ - 1 or more characters other than a /
\/ - a literal / symbol
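The difference between the greedy .+ and the negated character class can be checked side by side. The sketch below also performs the split/pop/join step the question mentions; the variable names are illustrative only.

```javascript
var url = 'http://foo.com/bar/test/foo.jpeg';

// Greedy .+ grabs everything, then backtracks only as far as the LAST slash.
var greedyResult = url.replace(/^https?:\/\/.+\//, '');

// [^\/]+ cannot cross a slash, so the match ends at the FIRST one.
var negatedResult = url.replace(/^https?:\/\/[^\/]+\//, '');

// The question's follow-up step: drop the file name, keep the directory.
var segments = negatedResult.split('/');
segments.pop();
var directory = segments.join('/');

console.log(greedyResult);  // foo.jpeg
console.log(negatedResult); // bar/test/foo.jpeg
console.log(directory);     // bar/test
```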
Use capturing group based regex.
> var s = "http://foo.com/bar/test/foo.jpeg"
> s.match(/^https?:\/\/[^\/]+((?:\/[^\/]*)*)/)[1]
'/bar/test/foo.jpeg'

Regex-Groups in Javascript

I have a problem using a Javascript-Regexp.
This is a very simplified regexp, which demonstrates my Problem:
(?:\s(\+\d\w*))|(\w+)
This regex should only match strings that don't contain forbidden characters (everything that is not a word character).
The only exception is the Symbol +
A match is allowed to start with this symbol, if [0-9] is trailing.
And a + must not appear within words (44+44 is not a valid match, but +4ad is)
In order to allow the + only at the beginning, I said that there must be a whitespace preceding. However, I don't want the whitespace to be part of the match.
I tested my regex with this tool: http://regex101.com/#javascript and the resulting matches look fine.
There are 2 Issues with that regexp:
If I use it in my JS-Code, the space is always part of the match
If +42 appears at the beginning of a line, it won't be matched
My Questions:
How should the regex look like?
Why does this regex add the space to the matches?
Here's my JS-Code:
var input = "+5ad6 +5ad6 sd asd+as +we";
var regexp = /(?:\s(\+\d\w*))|(\w+)/g;
var tokens = input.match(regexp);
console.log(tokens);
How should the regex look like?
You've got multiple choices to reach your goal:
It's fine as you have it. You might allow the string beginning in place of the whitespace as well, though. Just read the capturing groups (group 1 and group 2) out of each match, for example by looping with regexp.exec, since match with the g flag returns only the full matches; the groups will not include the whitespace.
A lookbehind such as (?<=\s) could help. Unfortunately, JavaScript engines older than ES2018 do not support it.
Require a non-word-boundary before the +, which would make every \w character before the + prevent the match:
/\B\+\d\w+|\w+/
Why does this regex add the space to the matches?
Because the regex does match the whitespace: the \s sits inside the non-capturing group (?:...), so it is part of the overall match but not of any captured group.
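The first and third options can be sketched as follows. The (?:^|\s) prefix is one possible way to "allow the string beginning", which is an assumption about what the answer intends, and the variable names are illustrative.

```javascript
var input = "+5ad6 +5ad6 sd asd+as +we";

// Option 1: keep the capturing groups, also accept the string start,
// and read the groups from each match instead of the full match.
var grouped = /(?:^|\s)(\+\d\w*)|(\w+)/g;
var tokens = [];
var m;
while ((m = grouped.exec(input)) !== null) {
  // Exactly one of the two groups participates in any given match.
  tokens.push(m[1] !== undefined ? m[1] : m[2]);
}

// Option 3: the \B variant needs no groups, so match() is enough.
var viaBoundary = input.match(/\B\+\d\w+|\w+/g);

console.log(tokens);
console.log(viaBoundary);
```

For this input both approaches yield the same six tokens, none of which contains a space.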

Javascript Regex remove specialchars except - and æøå

I got this:
var stringToReplace = 'æøasdasd\89-asdasd sse';
var desired = stringToReplace.replace(/[^\w\s]/gi, '');
alert(desired);
I found the replace rule from another SO question.
This works fine, it gives output:
asdasd89asdasd sse
Although I would like to set up additional rules:
Keep æøå characters
Keep - character
Turn whitespace/space into a - character
So the output would be:
æøåasdasd89-asdasd-sse
I know I can run an extra line:
stringToReplace.replace(' ', '-');
to accomplish my 3rd goal, but I don't know what to do about 1) and 2), since I am not familiar with regular expressions.
This should work:
str = str.replace(/[^æøå\w -]+/g, '').replace(/ +/g, '-');
Live Demo: http://ideone.com/d60qrX
You can just add the special characters to the exclusion list.
/[^\w\sæøå-]/gi
And as you said - you can use another replace to replace spaces with dashes
Your original regex [^\w\s] targets any character which isn't a word or whitespace character (-, for example). To include other characters in this regex's 'whitelist', simply add them to that character group:
stringToReplace.replace(/[^\w\sæøå-]/gi, '');
As for replacing spaces with hyphens, you cannot do both in a single regex. You can, however, chain a second replacement afterwards; use a regex with the g flag, because a plain string pattern replaces only the first space.
stringToReplace.replace(/ /g, "-");
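Putting the two steps together might look like the sketch below. The helper name toSlug is hypothetical, and collapsing each run of whitespace into a single hyphen is an assumption about the desired output, not something the question specifies.

```javascript
// Hypothetical helper; the whitelist regex is the one from the answers above.
function toSlug(text) {
  return text
    .replace(/[^\w\sæøå-]/gi, '') // drop everything outside the whitelist
    .replace(/\s+/g, '-');        // each run of whitespace becomes one hyphen
}

console.log(toSlug('æøå asd!#89-asdasd sse')); // æøå-asd89-asdasd-sse

// A string pattern replaces only the first occurrence:
console.log('a b c'.replace(' ', '-')); // a-b c
```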

Javascript Regex I'm struggling with

I am admittedly NOT a regex person but usually I can figure my way around something. This one has me stumped...
I need to match and replace a double greater than str (>>) that has an optional leading space. I know this doesn't work but something along the lines of...
/\s\s+[>>]/
But that's obviously no good.
Would appreciate any help. This site has been an amazing resource for me over the years and I can't believe I'm only getting around to posting something now, so it goes to show even a knucklehead like me has been able to benefit without bothering people... until now:) Thanks in advance.
For >> both inside a string and with leading whitespace, try:
/(\s*)(>>){1}/
If you want the space to be optional, then you can simply do this :
/>>/
And you may use it as a replacement pattern with the g modifier :
str = str.replace(/>>/g, 'something')
If you want to check that a string is >> with maybe some space before, then use
/^\s?>>$/
Breaking down your example:
\s will match any white-space character
\s+ will match one or more white-space characters
[>>] will match one > (see more below on this)
So your expression will match a > preceded by at least two white-space characters.
If you want to match zero or more, you will have to use *, e.g. \s*.
Square brackets denote sets of characters and match any one character in the set; e.g. [abc] will match a, b or c, but only one character at a time.
Single characters in a regular expression match themselves; e.g. > will match one greater-than sign.
Putting it together we get the following regular expression for your case:
/\s*>>/
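The breakdown can be verified with a few quick checks. The sample strings and the replacement text are made up for illustration.

```javascript
var broken = /\s\s+[>>]/; // two or more whitespace characters, then ONE ">"
var fixed = /\s*>>/g;     // optional whitespace, then the literal ">>"

console.log(broken.test('a >> b')); // false: only one space before ">"
console.log(broken.test('a  > b')); // true: two spaces and a single ">"

// Both occurrences are replaced, with or without leading whitespace.
console.log('a >> b>>c'.replace(fixed, ' |')); // a | b |c
```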
This should work with the optional space.
/\s{0,}>>/g
If you want it to match an unlimited amount of leading space characters:
/ *>>/
If you want it to match 0 or 1 leading space character:
/ ?>>/
use this :
str.replace(/\s+>>/g, 'whatever');
This regex should work.
/\s*[>]{2}/
This is cleaner
/\s*>>/
Tested:
var pattern = /\s*>>/;
var s1 = " >>";
var s2 = ">>";
var s3 = ">> surrounding text";
var s4 = "surrounding >> text";
var s5 = "surrounding>>text";
s1.match(pattern);
[" >>"]
s2.match(pattern);
[">>"]
s3.match(pattern);
[">>"]
s4.match(pattern);
[" >>"]
s5.match(pattern);
[">>"]
Replacement example
var pattern = /\s*>>/;
var s6 = " >> surrounding text";
s6.replace(pattern, ">");
"> surrounding text"

Regex to replace string with word and characters

I've got three working regexp's,
string.replace(/catalogue/g, "") // replace the word catalogue
string.replace(/[/:]/g, "") // replace the characters /, :
string.replace(/20%/g, "") // replace '20%'
Instead of replacing the string three times, I want to combine my regexp's.
Wanted result = 'removethewordnow';
var string = 'rem:ove20%the/word:catalogue20%now';
My latest try was:
string.replace(/\catalogue\b|[/20%:]/g, ""); // works, but catalogue is unaffected and 20% isn't treated as a unit
Off the top of my head:
string.replace(/(catalogue|[\/:]|20%)/g,"");
Just use an alternative, i.e. separate each of the regular expressions you had before by the alternation operator |:
catalogue|20%|[/:]
Also note that you cannot mix literal strings into a character class the way you have done there. The naïve combination above works; anything beyond it would be optimisation (not that this case can be optimised further), and an optimisation is only valid if it doesn't change the language described by the regex.
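You seem to have a typo there: \ca is parsed as a control-character escape (U+0001), not as the letters c and a. You also don't want 20% inside the character class, where it matches the single characters 2, 0 and % rather than the sequence, and you should escape the slash. You also need to remove the word boundary if you want to allow catalogue20% to match: there is no boundary between catalogue and 20, therefore the \b fails: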
string.replace(/catalogue|20%|[\/:]/g, "");
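Each of those points can be checked in isolation. The snippet below is a small sketch using the question's input; the last sample string is made up.

```javascript
var string = 'rem:ove20%the/word:catalogue20%now';

// Alternation: the whole word, the whole "20%" token, or one of / and :
var result = string.replace(/catalogue|20%|[\/:]/g, '');
console.log(result); // removethewordnow

// \ca is a control escape (U+0001), so this never sees the letters "ca".
console.log(/\catalogue/.test('catalogue')); // false

// No word boundary exists between "catalogue" and "20".
console.log(/catalogue\b/.test('catalogue20%')); // false

// Inside a class, 2, 0 and % are matched individually.
console.log('2 for 10'.replace(/[\/20%:]/g, '')); // " for 1"
```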
var string = 'rem:ove20%the/word:catalogue20%now';
string.replace(/([:/]|20%|catalogue)/g, '');
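\b refers to a word boundary, but your word catalogue is directly followed by other word characters. Also, 20% has to be its own alternative rather than part of the character class. So your regex should be:
string.replace(/catalogue|20%|[\/:]/g, "");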
Also do escape the / with \/.
string.replace(/catalogue|20%|[/:]/g, '')
