RegEx for replacing punctuation excluding negative numbers


Currently, to remove punctuation from a string, I use:
export function scrubPunctuation(text) {
let reg = /\b[-.,()&$#![\]{}"']+\B|\B[-.,()&$#![\]{}"']+\b/g;
return text.replace(reg, "");
}
but this also removes -1, where - is not so much "punctuation" as part of a numerical value.
How do I solve this problem?
Example use case:
I take a string from a user that might look like this:
const userStr = " I want something, sort of, that has at least one property < -1.02 ? "
Currently, my approach is to first trim the string to remove the leading / trailing white space.
Then I "scrub" punctuation from the string.
From the example of userStr above, I might eventually parse out (via some logic unrelated to regex):
const relevant = ["something", "at least one", "<", "-1.02"]
In general, non-numeric punctuation is irrelevant.

Split your first character set. Remove the hyphen from the first set and add a Negative lookahead for the hyphen:
[-]+(?![0-9]) // a hyphen not followed by a digit
And the full expression:
\b[-]+(?![0-9])|[-.,()&$#![\]{}"']+\B|\B[.,()&$#![\]{}"']+\b
Here is a working example
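A minimal sketch of the lookahead idea, not the exact expression above: it drops a hyphen or a dot only when no digit follows it, and removes the other listed punctuation unconditionally. The name scrubPunctuationSketch and the simplified pattern are assumptions made for illustration.

```javascript
// Hypothetical helper: the simplified pattern is assumed for illustration only.
function scrubPunctuationSketch(text) {
  // -(?!\d)   a hyphen that is NOT followed by a digit
  // \.(?!\d)  a dot that is NOT followed by a digit
  // [...]     the remaining punctuation from the question's character set
  const pattern = /-(?!\d)|\.(?!\d)|[,()&$#!\[\]{}"']/g;
  return text.replace(pattern, "");
}

const userStr = " I want something, sort of, that has at least one property < -1.02 ? ";
console.log(scrubPunctuationSketch(userStr.trim()));
// "I want something sort of that has at least one property < -1.02 ?"
```

Hyphens inside words (as in well-known) are still removed, because no digit follows them.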

If you don't want the minus sign or the dot or comma removed from the digits, one option might be to capture what you want to keep (in this case a digit with an optional decimal part) and match what you want to remove.
(-?\d+(?:[.,]\d+)*)|[-.,()&$#![\]{}"']+
Regex demo
let pattern = /(-?\d+(?:[.,]\d+)*)|[-.,()&$#![\]{}"']+/g;
let str = "This is -4, -55 or -4,00.00 (test) 5,00";
let res = str.replace(pattern, "$1");
console.log(res);

something like /[,?!.']/g could do the job and you add whatever you want
const text = "bar........,foo,????!-1'poo!!!?'";
const res = text.replace(/[,?!.']/g, "")
console.log(res)

I would split it into two.
First I would remove everything but alphanumeric and -.
/[^a-z0-9\-\s\n]/gi
It is a little more readable than your method and should give the same result unless there is some character you want to keep (like whitespace \s and newline \n).
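To get rid of a "-" that is punctuation rather than a minus sign, I would match only hyphens that are not followed by a digit:
/-(?!\d)/g
So altogether:
export function scrubPunctuation(text) {
// Remove everything except letters, digits, hyphens and whitespace.
let reg = /[^a-z0-9\-\s\n]/gi;
// Remove hyphens that are not followed by a digit.
let reg2 = /-(?!\d)/g;
text = text.replace(reg, "");
return text.replace(reg2, "");
}
Note that the first pattern also strips "." and "<", so -1.02 becomes -102; add those characters to the negated set if they must survive.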

Related

Regex to match string in a sentence

I am trying to find a strictly declared string in a sentence, the thread says:
Find the position of the string "ten" within a sentence, without using the exact string directly (this can be avoided in many ways using just a bit of RegEx). Print as many spaces as there were characters in the original sentence before the aforementioned string appeared, and then the string itself in lowercase.
I've gotten this far:
let words = 'A ton of tunas weighs more than ten kilograms.'
function findTheNumber(){
let regex=/t[a-z]*en/gi;
let output = words.match(regex)
console.log(words)
console.log(output)
}
console.log(findTheNumber())
The result should be:
input = A ton of tunas weighs more than ten kilograms.
output = ten(ENTER)

You could try a regex replacement approach, with the help of a callback function:
var input = "A ton of tunas weighs more than ten kilograms.";
var output = input.replace(/\w+/g, function(match)
{
if (!match.match(/^[t][e][n]$/)) {
return match.replace(/\w/g, " ");
}
else {
return match;
}
});
console.log(input);
console.log(output);
The above logic matches every word in the input sentence, and then selectively replaces every word which is not ten with an equal number of spaces.

You can use
let text = 'A ton of tunas weighs more than ten kilograms.'
function findTheNumber(words){
console.log( words.replace(/\b(t[e]n)\b|[^.]/g, (x,y) => y ?? " ") )
}
findTheNumber(text)
The \b(t[e]n)\b part is basically a whole-word search pattern for ten.
The \b(t[e]n)\b|[^.] regex will match and capture ten into Group 1 and will match any char but . (as you need to keep it at the end). If Group 1 matches, it is kept (ten remains in the output), else the char matched is replaced with a space.
Depending on what chars you want to keep, you may adjust the [^.] pattern. For example, if you want to keep all non-word chars, you may use \w.
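Both answers above blank out the other words. The task as quoted can also be read more literally: print as many spaces as there are characters before the match, then the match in lowercase. A minimal sketch of that reading follows; the function name indentToMatch is hypothetical, and the pattern /\bt[e]n\b/i is one assumed way of avoiding the exact string.

```javascript
// Hypothetical helper: returns spaces up to the match position, then the match.
function indentToMatch(sentence) {
  // t[e]n spells the word without writing it as one literal; \b keeps it a whole word.
  const match = /\bt[e]n\b/i.exec(sentence);
  if (match === null) {
    return ""; // nothing to print when the word is absent
  }
  // match.index is the number of characters that precede the match.
  return " ".repeat(match.index) + match[0].toLowerCase();
}

const sentence = "A ton of tunas weighs more than ten kilograms.";
console.log(sentence);
console.log(indentToMatch(sentence));
```

Printed on consecutive lines, the second line's word sits directly under its position in the first.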

Javascript regex for hashtag with number only

I want to match these kind of hashtag pattern #1, #4321, #1000 and not:
#01 (with leading zero)
#1aa (has alphabetical char)
But special character like comma, period, colon after the number is fine, such as #1.. Think of it as hashtag at the end of the sentence or phrase. Basically treat these as whitespace.
Basically just # and a number.
My code below doesn't meet the requirement because it accepts a leading zero and leaves an ugly space at the end. I can always trim the result, but that doesn't feel like the right way to do it.
reg = new RegExp(/#[0-9]+ /g);
var result;
while((result = reg.exec("hyha #12 gfdg #01 aa #2e #1. #101")) !== null) {
alert("\"" + result + "\"");
}
http://jsfiddle.net/qhoc/d3TpJ/
That string there should just match #12, #1 and #101
Please help to suggest better RegEx string than I had. Thanks.

You could use a regex like:
#[1-9]\d*\b
Code example:
var re = /#[1-9]\d*\b/g;
var str = "#1 hyha #12 #0123 #5 gfdg #2e ";
var matches = str.match(re); // = ["#1", "#12", "#5"]

This should work
reg = /(#[1-9]\d*)(?: |$)/g;
Notice the capturing group (...) for the hash and number, and the non-capturing group (?: ...) that matches the number only if it is followed by a space or the end of the string (JavaScript has no \z anchor, so $ is used). Use this if you don't want to catch strings like the #1 in #1. (a hashtag followed by punctuation). Otherwise the other answer is better.
Then you have to get the captured group from each match by iterating, something like this:
myString = 'hyha #12 gfdg #01 aa #2e #1. #101';
while ((match = reg.exec(myString)) !== null) {
alert(match[1]); // the hash and number, without the trailing space
}
EDIT
Whenever you are working with regexps, you should use some kind of tool. On the desktop, for instance, you can use The Regex Coach, and online you can try regex101.
For instance: http://regex101.com/r/zY0bQ8
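To make the difference between the two answers concrete, here is a small sketch run against the string from the question. The lookahead (?=\s|$) is my substitution for the non-capturing group, chosen so that the trailing space is not part of the match; the variable names are illustrative.

```javascript
const sample = "hyha #12 gfdg #01 aa #2e #1. #101";

// Word boundary: accepts a hashtag followed by punctuation such as "#1."
const beforeBoundary = sample.match(/#[1-9]\d*\b/g) ?? [];

// Lookahead: accepts a hashtag only before whitespace or the end of the string
const beforeSpaceOrEnd = sample.match(/#[1-9]\d*(?=\s|$)/g) ?? [];

console.log(beforeBoundary);   // [ '#12', '#1', '#101' ]
console.log(beforeSpaceOrEnd); // [ '#12', '#101' ]
```

The question asks for #12, #1 and #101, so the word-boundary version is the one that meets it.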

JavaScript: How can I remove any words containing (or directly preceding) capital letters, numbers, or commas, from a string?

I'm trying to write the code so it removes the "bad" words from the string (the text).
The word is "bad" if it has comma or any special sign thereafter. The word is not "bad" if it contains only a to z (small letters).
So, the result I'm trying to achieve is:
<script>
String.prototype.azwords = function() {
return this.replace(/[^a-z]+/g, "0");
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.".azwords();//should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(res);//should alert "good gooood"
</script>

Try this:
return this.replace(/(^|\s+)[a-z]*[^a-z\s]\S*(?!\S)/g, "");
It tries to match a word (that is surrounded by whitespaces / string ends) and contains any (non-whitespace) character but at least one that is not a-z. However, this is quite complicated and unmaintainable. Maybe you should try a more functional approach:
return this.split(/\s+/).filter(function(word) {
return word && !/[^a-z]/.test(word);
}).join(" ");

Okay, first off, you probably want to use the word boundary escape \b in your regex. Also, it's a bit tricky to match the bad words, because a bad word might contain lowercase chars, so your current regex will exclude anything that does have lowercase letters.
I'd be tempted to pick out the good words and put them in a new string. It's a much easier regex.
/\b[a-z]+\b/g
NB: I'm not totally sure that it'll work for the first and last words in the string so you might need to account for that as well. http://www.regextester.com/ is exceptionally useful.
EDIT: as you want punctuation after the word to be 'bad', this will actually do what I was suggesting
(^|\s)[a-z]+(\s|$)
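One caveat with the pattern above is that it consumes the surrounding whitespace, so two good words separated by a single space cannot both match in a global search. A sketch of the same idea with lookarounds, which check the neighbours without consuming them, is below. It assumes an engine with lookbehind support (ES2018 or later); azWordsSketch is a hypothetical name.

```javascript
// Hypothetical helper: keeps only whitespace-delimited words made of a-z.
function azWordsSketch(text) {
  // (?<!\S)  not preceded by a non-space character (start of string or a space)
  // (?!\S)   not followed by a non-space character (end of string or a space)
  const goodWords = text.match(/(?<!\S)[a-z]+(?!\S)/g) ?? [];
  return goodWords.join(" ");
}

console.log(azWordsSketch("good Remove remove1 remove, ### rem0ve? RemoVE gooood remove."));
// "good gooood"
```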

Firstly, I wouldn't recommend changing the prototype of String (or of any native object) if you can avoid it, because you leave yourself open to conflicts with other code that might define the same property in different ways. Much better to put custom methods like this on a namespaced object, though I'm sure some will disagree.
Second, is there any need to use RegEx completely? (Genuine question; not trying to be facetious.)
Here is an example of the function with plain old JS using a little bit of RegEx here and there. Easier to comment, debug, and reuse.
Here is the code:
var azwords = function(str) {
var arr = str.split(/\s+/),
len = arr.length,
i = 0,
res = "";
for (i; i < len; i += 1) {
if (!(arr[i].match(/[^a-z]/))) {
res += (!res) ? arr[i] : " " + arr[i];
}
}
return res;
}
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove."; //should be "good gooood"
//Remove has a capital letter
//remove1 has 1
//remove, has comma
//### has three #
//rem0ve? has 0 and ?
//RemoVE has R and V and E
//remove. has .
alert(azwords(res));//should alert "good gooood";

Try this one:
var res = "good Remove remove1 remove, ### rem0ve? RemoVE gooood remove.";
var new_one = res.replace(/\s*\w*[#A-Z0-9,.?\xA1-\xFF]\w*/g,'');
//Output `good gooood`
Description:
\s* # zero-or-more spaces
\w* # zero-or-more alphanumeric characters
[#A-Z0-9,.?\xA1-\xFF] # one character from this list (a single backslash in \xA1-\xFF; a doubled one would add "x" and "F" to the list)
\w* # zero-or-more alphanumeric characters
/g - global (run over all string)

This will find all the words you want /^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g so you could use match.
this.match(/^[a-z]+\s|\s[a-z]+$|\s[a-z]+\s/g).join(" "); should return the list of valid words.
Note that this took some time to run in a JSFiddle, so it may be more efficient to split the string and iterate over the words.

Regex in javascript complex

string str contains somewhere within it http://www.example.com/ followed by 2 digits and 7 random characters (upper or lower case). One possibility is http://www.example.com/45kaFkeLd or http://www.example.com/64kAleoFr. So the only certain aspect is that it always starts with 2 digits.
I want to retrieve "64kAleoFr".
var url = str.match([regex here]);

The regex you’re looking for is /[0-9]{2}[a-zA-Z]{7}/.
var string = 'http://www.example.com/64kAleoFr',
match = (string.match(/[0-9]{2}[a-zA-Z]{7}/) || [''])[0];
console.log(match); // '64kAleoFr'
Note that on the second line, I use the good old .match() trick to make sure no TypeError is thrown when no match is found. Once this snippet has executed, match will either be the empty string ('') or the value you were after.
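In newer JavaScript (ES2020 and later) the same guard can be written with optional chaining and nullish coalescing. This is a sketch of an equivalent form, not a change to the answer's approach; extractCode is a hypothetical name.

```javascript
// Hypothetical helper: returns the code, or '' when nothing matches.
function extractCode(text) {
  // match() returns null on failure; ?.[0] then yields undefined, and ?? supplies ''.
  return text.match(/[0-9]{2}[a-zA-Z]{7}/)?.[0] ?? "";
}

console.log(extractCode("http://www.example.com/64kAleoFr")); // '64kAleoFr'
console.log(extractCode("no code in here"));                  // ''
```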

you could use
var url = str.match(/\d{2}.{7}$/)[0];
where:
\d{2} //two digits
.{7} //seven characters
$ //end of the string
if you don't know whether it will be at the end, you could drop the $ and anchor on the slash instead:
var url = str.match(/\/\d{2}.{7}/)[0].slice(1); //grab the "/" at the beginning and slice it out

what about using split ?
alert("http://www.example.com/64kAleoFr".split("/")[3]);

var url = "http://www.example.com/",
re = new RegExp(url.replace(/\./g,"\\.") + "(\\d{2}[A-Za-z]{7})");
var str = "This is a string with a url: http://www.example.com/45kaFkeLd in the middle.";
var code = str.match(re);
if (code != null) {
// we have a match
alert(code[1]); // "45kaFkeLd"
}
The url needs to be part of the regex if you want to avoid matching other strings of characters elsewhere in the input. The above assumes that the url should be configurable, so it constructs a regex from the url variable (noting that "." has special meaning in a regex so it needs to be escaped). The bit with the two numbers and seven letter is then in parentheses so it can be captured.
Demo: http://jsfiddle.net/nnnnnn/NzELc/
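The answer escapes only "." because that is the only special character in this particular url. If the prefix is truly configurable, a more general escape may be safer. The sketch below uses a commonly seen escaping pattern; escapeRegExp and extractCodeAfter are hypothetical names introduced for illustration.

```javascript
// Hypothetical helper: escapes every character that is special in a regex.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the 2 digits + 7 letters after baseUrl, or null.
function extractCodeAfter(baseUrl, text) {
  const re = new RegExp(escapeRegExp(baseUrl) + "(\\d{2}[A-Za-z]{7})");
  const found = text.match(re);
  return found === null ? null : found[1];
}

const text = "This is a string with a url: http://www.example.com/45kaFkeLd in the middle.";
console.log(extractCodeAfter("http://www.example.com/", text)); // "45kaFkeLd"
```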

http://www\\.example\\.com/([0-9]{2}\\w{7}) this is your pattern. You'll get your 2 digits and 7 random characters in group 1.

Both of your example strings have the digits and random characters right after the last slash (/). If that pattern is fixed, I would suggest splitting the string on the slash and taking the last element of the resulting array.
Here is how:
var string = "http://www.example.com/64kAleoFr";
var ar = string.split("/");
var code = ar[ar.length - 1]; // "64kAleoFr"
Hope it helps
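If the input is a complete URL on its own (an assumption; the question says the URL sits somewhere inside a longer string), the built-in URL class can do the splitting and also ignores any query string or fragment. A sketch, with lastPathSegment as a hypothetical name:

```javascript
// Hypothetical helper: last non-empty path segment of a complete URL.
// new URL() throws a TypeError when the input is not a valid absolute URL.
function lastPathSegment(urlString) {
  const { pathname } = new URL(urlString);
  const segments = pathname.split("/").filter(Boolean);
  return segments.length > 0 ? segments[segments.length - 1] : "";
}

console.log(lastPathSegment("http://www.example.com/64kAleoFr")); // "64kAleoFr"
```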

remove unwanted commas in JavaScript

I want to remove all unnecessary commas from the start/end of the string.
e.g. google, yahoo,, , should become google, yahoo.
If possible ,google,, , yahoo,, , should become google,yahoo.
I've tried the below code as a starting point, but it seems to be not working as desired.
trimCommas = function(s) {
s = s.replace(/,*$/, "");
s = s.replace(/^\,*/, "");
return s;
}

In your example you also want to trim the commas if there are spaces between them at the start or at the end, so use something like this:
str.replace(/^[,\s]+|[,\s]+$/g, '').replace(/,[,\s]*,/g, ',');
Note the use of the 'g' modifier for global replace.

You need this:
s = s.replace(/[,\s]{2,}/g,","); //Collapses each run of two or more commas / spaces into a single comma
s = s.replace(/^,*/,""); //Removes all commas from the beginning
s = s.replace(/,*$/,""); //Removes all commas from the end
EDIT: Made all the changes - should work now.

My take:
var cleanStr = str.replace(/^[\s,]+/,"")
.replace(/[\s,]+$/,"")
.replace(/\s*,+\s*(,+\s*)*/g,",")
This one will work with opera, internet explorer, whatever
Actually tested this last one, and it works!

What you need to do is replace all groups of "space and comma" with a single comma and then remove commas from the start and end:
trimCommas = function(str) {
str = str.replace(/[,\s]*,[,\s]*/g, ",");
str = str.replace(/^,/, "");
str = str.replace(/,$/, "");
return str;
}
The first one replaces every sequence of white space and commas with a single comma, provided there's at least one comma in there. This handles the edge case left in the comments for "Internet Explorer".
The second and third get rid of the comma at the start and end of string where necessary.
You can also add (to the end):
str = str.replace(/[\s]+/g, " ");
to collapse multi-spaces down to one space and
str = str.replace(/,/g, ", ");
if you want them to be formatted nicely (space after each comma).
A more generalized solution would be to pass parameters to indicate behaviour:
Passing true for collapse will collapse the spaces within a section (a section being defined as the characters between commas).
Passing true for addspace will use ", " to separate sections rather than just "," on its own.
That code follows. It may not be necessary for your particular case but it might be better for others in terms of code re-use.
trimCommas = function(str,collapse,addspace) {
str = str.replace(/[,\s]*,[,\s]*/g, ",").replace(/^,/, "").replace(/,$/, "");
if (collapse) {
str = str.replace(/[\s]+/g, " ");
}
if (addspace) {
str = str.replace(/,/g, ", ");
}
return str;
}

First ping on Google for "Javascript Trim": http://www.somacon.com/p355.php. You seem to have implemented this using commas, and I don't see why it would be a problem (though you escaped in the second one and not in the first).

Not quite as sophisticated, but simple with:
',google,, , yahoo,, ,'.replace(/\s/g, '').replace(/,+/g, ',').replace(/^,|,$/g, '');

You should be able to use only one replace call:
/^( *, *)+|(, *(?=,|$))+/g
Test:
'google, yahoo,, ,'.replace(/^( *, *)+|(, *(?=,|$))+/g, '');
"google, yahoo"
',google,, , yahoo,, ,'.replace(/^( *, *)+|(, *(?=,|$))+/g, '');
"google, yahoo"
Breakdown:
/
^( *, *)+ # Match start of string followed by zero or more spaces
# followed by , followed by zero or more spaces.
# Repeat one or more times
| # regex or
(, *(?=,|$))+ # Match , followed by zero or more spaces, where a comma
# or the end of the string comes next. Repeat one or more times
/g # `g` modifier will run on until there are no more matches
(?=...) is a lookahead, which will not move the position of the match but only verifies that the given characters come after it. In our case we look for , or the end of the string.
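A small sketch of why it matters that a lookahead does not consume characters. A pattern that consumes both commas skips over pairs, while the lookahead version tests every comma against its neighbour. The input string is made up for illustration.

```javascript
const input = "a,,,b";

// Consuming: ",," is matched as a unit, so the third comma is left unpaired.
const consuming = input.replace(/,,/g, ",");

// Lookahead: each comma that is directly followed by another comma is removed.
const lookahead = input.replace(/,(?=,)/g, "");

console.log(consuming); // "a,,b"
console.log(lookahead); // "a,b"
```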

match() is much better tool for this than replace()
str = " aa, bb,, cc , dd,,,";
newStr = str.match(/[^\s,]+/g).join(",")
alert("[" + newStr + "]")
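The match() version treats spaces as separators too, so an item such as "new york" would be split in two. A sketch that splits on commas only and trims each piece, which keeps inner spaces, is below; trimCommasSketch is a hypothetical name.

```javascript
// Hypothetical helper: split on commas, trim each piece, drop the empty ones.
function trimCommasSketch(text) {
  return text
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .join(",");
}

console.log(trimCommasSketch(",google,, , yahoo,, ,")); // "google,yahoo"
console.log(trimCommasSketch("new york,, boston,"));    // "new york,boston"
```

Joining with ", " instead of "," gives the spaced form from the question's first example.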

When you want to replace runs such as ",,", ",,,", ",,,," and ",,,,," with a single ",", the code below will do it.
var abc = new String("46590,26.91667,75.81667,,,45346,27.18333,78.01667,,,45630,12.97194,77.59369,,,47413,19.07283,72.88261,,,45981,13.08784,80.27847,,");
var pqr = abc.replace(/,{2,}/g, ','); // any run of two or more commas becomes one
alert(pqr);
