Regex to remove numbers and other characters - javascript

I would like to remove some numbers and characters from my TypeScript string by using a regex. I think I'm close but I'm missing something.
Here is the kind of strings I have:
[15620584560] - product name (type)
[1256025] - product name (test+1)
[12560255544220] - product name
What I would like:
Product name
Here is the regex I'm using:
product_name = product_name.replace(/\[[0-9]+\]/,'');

You may use
.replace(/^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g, '')
See the regex demo
The regex matches two alternatives (separated with |):
^\s*\[[0-9]+]\s*-\s*:
^ - start of string
\s* - 0+ whitespaces
\[ - a [
[0-9]+ - 1+ digits
] - a ] char
\s*-\s* - a - char enclosed with 0+ whitespaces
| - or
\s*\([^()]*\)\s*$:
\s* - 0+ whitespaces
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
\s* - 0+ whitespaces
$ - end of string.
JS demo:
var strs = ['[15620584560] - product name (type)','[1256025] - product name (test+1)','[12560255544220] - product name'];
var reg = /^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g;
for (var s of strs) {
console.log(s, '=>', s.replace(reg, ''));
}
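Since the desired result in the question is capitalized ("Product name"), the replacement could be wrapped in a small helper. This is only a sketch: the name cleanProductName and the capitalization step are assumptions, not part of the answer above.

```javascript
// Hypothetical helper: strips the leading "[digits] - " prefix and a trailing "(...)" group.
const PREFIX_OR_SUFFIX = /^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g;

function cleanProductName(raw) {
  const name = raw.replace(PREFIX_OR_SUFFIX, '');
  // Assumption: the first letter should be upper-cased, as in the expected "Product name".
  return name.charAt(0).toUpperCase() + name.slice(1);
}

console.log(cleanProductName('[15620584560] - product name (type)')); // "Product name"
console.log(cleanProductName('[12560255544220] - product name'));     // "Product name"
```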

One approach which might work would be to split the input string on dash, and then use a simple regex to remove all terms in parentheses:
var input = '[15620584560] - product name (type)';
var fields = input.split(/\]\s*-/);
var result = fields[1].replace(/\s*\(.*?\)\s*/g, '').trim();
console.log(result);
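The split-based snippet reads fields[1] directly, which is undefined when the input has no "] -" separator. A slightly more defensive sketch follows; the function name nameAfterDash and the fallback to the whole input are assumptions.

```javascript
// Hypothetical helper around the split-based approach.
function nameAfterDash(input) {
  const fields = input.split(/\]\s*-/);
  // Assumption: if the "] -" separator is missing, fall back to the whole input.
  const rest = fields.length > 1 ? fields[1] : input;
  // Removes every parenthesized group, wherever it appears in the remaining text.
  return rest.replace(/\s*\(.*?\)\s*/g, '').trim();
}

console.log(nameAfterDash('[1256025] - product name (test+1)')); // "product name"
console.log(nameAfterDash('product name (type)'));               // "product name"
```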


Replace not numbers or words to underscore but leave dash and remove spaces around it

So I got this string
'word word - word word 24/03/21'
And I would like to convert it to
'word_word-word_word_24_03_21'
I have tried this
replace(/[^aA-zZ0-9]/g, '_')
But I get this instead
word_word___word_word_24_03_21
You can use 2 .replace() calls:
const s = 'word word - word word 24/03/21'
var r = s.replace(/\s*-\s*/g, '-').replace(/[^-\w]+/g, '_')
console.log(r)
//=> "word_word-word_word_24_03_21"
Explanation:
.replace(/\s*-\s*/g, '-'): Remove surrounding spaces of a hyphen
.replace(/[^-\w]+/g, '_'): Replace each run of characters that are neither a hyphen nor a word character with a single underscore
You can use
console.log(
'word word - word word 24/03/21'.replace(/\s*(-)\s*|[^\w-]+/g, (x,y) => y || "_")
)
Here,
/\s*(-)\s*|[^\w-]+/g - matches and captures into Group 1 a - enclosed with zero or more whitespaces, and just matches any non-word char excluding -
(x,y) => y || "_") - replaces with Group 1 if it was matched, and if not, replacement is a _ char.
With a function for replace and an alternation in the pattern, you could also match:
(\s*-\s*) Match a - between optional whitespace chars
| Or
[^a-zA-Z0-9-]+ Match 1+ times any char other than the listed ranges and -
In the callback, check if group 1 exists. If it does, return only a -, else return _
Note that the notation [^aA-zZ0-9] is not the same as [^a-zA-Z0-9]; see what [A-z] matches.
let s = "word word - word word 24/03/21";
s = s.replace(/(\s*-\s*)|[^a-zA-Z0-9-]+/g, (_, g1) => g1 ? "-" : "_");
console.log(s);
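The note above about [A-z] can be checked directly: the range runs from code point 65 to 122, so it also covers the six punctuation characters that sit between "Z" and "a". A small illustrative check (the variable name between is arbitrary):

```javascript
// Characters with code points 91..96, i.e. between "Z" (90) and "a" (97).
const between = ['[', '\\', ']', '^', '_', '`'];

for (const ch of between) {
  // Prints the character, then true for [A-z] and false for [a-zA-Z].
  console.log(JSON.stringify(ch), /[A-z]/.test(ch), /[a-zA-Z]/.test(ch));
}
```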
You can use the + quantifier to replace 1 or more consecutive matches at once.
let s = 'word word - word word 24/03/21';
let r = s
  // [a-zA-Z0-9] instead of [aA-zZ0-9]: the A-z range would also cover [ \ ] ^ _ and `
  .replace(/[^a-zA-Z0-9]*-[^a-zA-Z0-9]*/g, '-')
  .replace(/[^a-zA-Z0-9-]+/g, '_');
console.log(r);
// 'word_word-word_word_24_03_21'

I am writing a regex to handle the following but I am stuck in between

Requirements:
There can be no capital letters in the string.
The string cannot contain any of the characters '^$.?*+()'
If '[' is present in the string it must be followed by zero or more characters other than '[' and ']', which must be followed by ']'. For example, [test] is valid, whereas [test not valid, test] and [[test]] are not valid. [test][test] is valid.
'|' can be used only if there are word characters after it. For example, ||| is not valid, |test is valid, and word| is not valid.
function charPos(str, char) {
return str
.split("")
.map(function (c, i) { if (c == char) return i; })
.filter(function (v) { return v >= 0; });
}
function testString(urlPattern)
{
let regex = /[ $^*()+\[\]\\|.\/?]/g;
if(regex.test(urlPattern)){
let secondRegex = true;
let thirdRegex = true;
let regexData = /[ $^*()+\\\\.\/?]/g;
let regexDataNew = /[ $^*()+\\\.\/?]/g;
if(urlPattern.indexOf("[") < urlPattern.indexOf("]") && !regexData.test(urlPattern)){
secondRegex = false;
}
if(urlPattern.indexOf("[") == -1 && urlPattern.indexOf("]") == -1){
secondRegex = false;
}
if(urlPattern.indexOf("[[") != -1 || urlPattern.indexOf("]]") != -1){
secondRegex = true;
}
let pos = charPos(urlPattern,'|');
let largest = pos.sort((a,b)=>a-b)[pos.length - 1];
if(largest+1 < urlPattern.length && !regexDataNew.test(urlPattern)){
thirdRegex = false;
}
if(urlPattern.indexOf("|") == -1 ){
thirdRegex = false;
}
if(secondRegex || thirdRegex){
return 'Not Valid';
}
else {
return 'Valid';
}
}
else {
return 'Valid1';
}
}
// test cases
testString('testttthhh##|sss'); // working
testString('testttthhh##|'); // working
testString('testttthhh##|[]'); // working, but this one needs to be reported as invalid
If anyone has a solution or has faced the same kind of problem, please help me sort it out.
Thanks
You could match any char except the chars that you don't want to allow.
When you reach a pipe, you require that it is followed by at least one allowed char, for example a-z, digits or an underscore.
When you encounter an opening square bracket, you match up to the closing one and assert that it is not followed by another closing bracket.
If an empty string should not be matched, you can start the pattern with a negative lookahead ^(?!$)
^[^\s\[\]^$.|?*+()A-Z\\]*(?:(?:\[[^\s\[\]\\]+](?!\])|\|[^\s\[\]^$.|?*+()A-Z\\]+)[^\s\[\]^$.|?*+()A-Z\\]*)*$
Explanation
^ Start of string
[^\s\[\]^$.|?*+()A-Z\\]* Match 0+ times any char except the listed
(?: Non capture group
(?: Non capture group
\[[^\s\[\]\\]+](?!\]) Match [...] with 1+ chars inside, not followed by ]
| Or
\|[^\s\[\]^$.|?*+()A-Z\\]+ Match a pipe and match 1+ times any char except the listed
) Close non capture group
[^\s\[\]^$.|?*+()A-Z\\]* Match 0+ times any char except the listed
)* Close non capturing group and repeat 0+ times as there does not have to be a | or [] present
$ End of string
Regex demo
let pattern = /^[^\s\[\]^$.|?*+()A-Z\\]*(?:(?:\[[^\s\[\]\\]+](?!\])|\|[^\s\[\]^$.|?*+()A-Z\\]+)[^\s\[\]^$.|?*+()A-Z\\]*)*$/;
[
"testttthhh##|sss",
"[]",
"test test",
"test\\test",
"word|#pro",
"word|%pro%",
"testttthhh##|",
"testttthhh##|[]",
"[test]",
"[test",
"test]",
"[[test]]",
"[test][test]",
"|||",
"|test",
"word|",
"Atest"
].forEach(s => {
console.log(pattern.test(s) + " --> " + s);
});
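The pattern as written also accepts the empty string, because every part of it is optional. The ^(?!$) lookahead mentioned above could be added as in the following sketch; building the second regex from pattern.source is just a convenience chosen here to avoid retyping the pattern, and the name nonEmpty is arbitrary.

```javascript
const pattern = /^[^\s\[\]^$.|?*+()A-Z\\]*(?:(?:\[[^\s\[\]\\]+](?!\])|\|[^\s\[\]^$.|?*+()A-Z\\]+)[^\s\[\]^$.|?*+()A-Z\\]*)*$/;

// Same pattern with ^(?!$) in place of the leading ^.
const nonEmpty = new RegExp('^(?!$)' + pattern.source.slice(1));

console.log(pattern.test(''));        // true
console.log(nonEmpty.test(''));       // false
console.log(nonEmpty.test('[test]')); // true
console.log(nonEmpty.test('word|'));  // false
```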
You can test the string with the following regular expression.
/^(?!.*\|\S*\|)(?!.*[$^.?*+()A-Z])[^\[\]\n]*(?:\[[^\[\]\n]*\][^\[\]\n]*)*[^\[\]\n]*$/
Start your engine!
JavaScript's regex engine performs the following operations.
^ : match beginning of string
(?! : begin negative lookahead
.*\|\S*\| : match 0+ chars followed by '|' followed by 0+
non-whitespace chars followed by '|'
) : end negative lookahead
(?! : begin negative lookahead
.* : match 0+ chars
[$^.?*+()A-Z] : match a char in the char class
) : end negative lookahead
[^\[\]\n]* : match 0+ chars other than those in char class
(?: : begin a non-capture group
\[ : match '['
[^\[\]\n]* : match 0+ chars other than those in char class
\] : match ']'
[^\[\]\n]* : match 0+ chars other than those in char class
) : end non-capture group
* : execute non-capture group 0+ times
[^\[\]\n]* : match 0+ chars other than those in char class
$ : match end of string
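The steps above can be tried against some of the strings from the question. This is only an illustrative run; the sample list is chosen here. Note that the first lookahead only rejects two pipes with no whitespace between them, so a single trailing pipe such as 'word|' passes this pattern.

```javascript
const re = /^(?!.*\|\S*\|)(?!.*[$^.?*+()A-Z])[^\[\]\n]*(?:\[[^\[\]\n]*\][^\[\]\n]*)*[^\[\]\n]*$/;

const samples = ['[test]', '[test][test]', '[test', 'test]', '[[test]]', '|||', 'Atest', 'word|'];
const results = samples.map(s => re.test(s));

samples.forEach((s, i) => console.log(results[i], '-->', s));
// true, true, false, false, false, false, false, true (in sample order)
```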

Filter version number from string in javascript?

I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of: "v" or "v." followed by a digit, followed by digits, letters or dots until either the end of the string, or until whitespace occurs, or until a dot (.) occurs with no digit after it.
As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
Regex demo
That will match:
^ Assert the start of the string
.*? Match any character 0+ times, non-greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If a letter such as the A in V.3.4A can only appear at the end of the version, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
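The two patterns above differ only in where a letter may appear. The sketch below compares them on a made-up input ('v1.2a.3' is not from the question) and returns null instead of throwing when nothing matches; the names anySegment, lastOnly and firstVersion are assumptions.

```javascript
const anySegment = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i; // a letter may follow any dotted part
const lastOnly = /^.*?v\.?(\d+(?:\.\d+)*[a-z]?)/i;   // a letter may only end the version

// Hypothetical helper: s.match(pattern)[1] throws on no match, this returns null instead.
function firstVersion(s, pattern) {
  const m = s.match(pattern);
  return m ? m[1] : null;
}

console.log(firstVersion('v1.2a.3', anySegment));       // "1.2a.3"
console.log(firstVersion('v1.2a.3', lastOnly));         // "1.2a"
console.log(firstVersion('no version here', lastOnly)); // null
```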
Details:
v - character "v"
(?:\.?) - matches 0 or 1 occurrence of "."
Version capturing group
[0-9a-z\.]* - Matches alphanumeric and "." character
[0-9a-z] - ensures that the version number doesn't end with "."
You can use the RegExp.exec() method to extract the match from each string.
// No g flag: a global regex keeps lastIndex between exec() calls,
// so the search in the next string would not start at index 0.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/i;
const str = [
  "Title_v1_1.00.mov filters 1",
  "v.1.0.1-Title.mp3 filters 1.0.1",
  "Title V.3.4A. filters 3.4A",
  "V3.0.4b mix v2 filters 3.0.4b"
];
const versions = [];
for (const s of str) {
  // v[1] is the first capturing group, i.e., our version number
  const v = regex.exec(s);
  if (v !== null) versions.push(v[1]); // appends the version number to the array
}
console.log(versions);
var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive
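Because group 1 in this pattern starts with [0-9a-z]+, any "v" followed by letters also matches. The question asks for a "v" followed by a digit, so a variant that requires a leading digit may be closer to the intent. The sketch below is an assumed modification, not the answerer's pattern, and 'movie v2' is a made-up input.

```javascript
const loose = /v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i;
// Assumed variant: the captured version must begin with a digit.
const digitFirst = /v\.?(\d[0-9a-z]*(?:\.[0-9a-z]+)*)/i;

const sample = 'movie v2';
console.log(sample.match(loose)[1]);      // "ie" (the "v" inside "movie" matches first)
console.log(sample.match(digitFirst)[1]); // "2"

// The variant still returns the expected values for the strings from the question.
console.log('Title V.3.4A. filters 3.4A'.match(digitFirst)[1]); // "3.4A"
```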

Retrieve BSR and category from string with RegExp

When I parse Amazon products I get this kind of string:
"#19 in Home Improvements (See top 100)"
I figured out how to retrieve the BSR number, which is /#\d*/
But I have no idea how to retrieve the category, which comes after "in" and ends before the parenthesized part (See top 100).
I suggest
#(\d+)\s+in\s+([^(]+?)\s*\(
See the regex demo
var re = /#(\d+)\s+in\s+([^(]+?)\s*\(/;
var str = '#19 in Home Improvements (See top 100)';
var m = re.exec(str);
if (m) {
console.log(m[1]);
console.log(m[2]);
}
Pattern details:
# - a hash
(\d+) - Group 1 capturing 1 or more digits
\s+in\s+ - in enclosed with 1 or more whitespaces
([^(]+?) - Group 2 capturing 1 or more chars other than ( as few as possible before the first...
\s*\( - 0+ whitespaces and a literal (.
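The two groups described above could be returned together from a small function. This is a sketch under assumptions: the name parseBsr and the { rank, category } shape are made up here. Since Group 1 is \d+, a rank written with a thousands separator (for example #1,234) would not match and would need a different pattern.

```javascript
// Hypothetical helper: returns { rank, category } or null when the text does not match.
function parseBsr(text) {
  const m = /#(\d+)\s+in\s+([^(]+?)\s*\(/.exec(text);
  if (!m) return null;
  return { rank: Number(m[1]), category: m[2] };
}

console.log(parseBsr('#19 in Home Improvements (See top 100)'));
// { rank: 19, category: 'Home Improvements' }
console.log(parseBsr('#19 in Home Improvements')); // null, the pattern requires a "("
```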

RegExp match word till space or character

I'm trying to match all the words starting with # and words between 2 # (see example)
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Extra: would be nice if the result would be without the #.
Does anyone have a solution for this?
You can use
/#\w+(?:(?: +\w+)*#)?/g
See the demo here
The regex matches:
# - a hash symbol
\w+ - one or more alphanumeric and underscore characters
(?:(?: +\w+)*#)? - one or zero occurrence of:
(?: +\w+)* - zero or more occurrences of one or more spaces followed with one or more word characters followed with
# - a hash symbol
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
See another demo
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
// Using ES6 Arrow functions
m = m.map(s => s.replace(/#$/g, ''));
// ES5 Equivalent
/*m = m.map(function(s) {
return s.replace(/#$/g, '');
});*/ // getting rid of the trailing #
document.body.innerHTML = "<pre>" + JSON.stringify(m, 0, 4) + "</pre>";
}
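The snippet above only removes the trailing #. For the "Extra" in the question (results without any #), both the leading and the trailing hash could be stripped, as in this sketch that runs outside the browser; the variable name tags is arbitrary.

```javascript
const re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
const str = '#The test# rain in #SPAIN stays mainly in the #plain';

// match() returns null when nothing matches, so fall back to an empty array.
const tags = (str.match(re) || []).map(s => s.replace(/^#|#$/g, ''));

console.log(tags); // [ 'The test', 'SPAIN', 'plain' ]
```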
You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non capture group for alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
See demo at regex101 (use with g global flag)
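A quick run of this pattern on the string from the question is shown below. The second input is made up here to show a difference from the previous answer: when a single hashtag is later followed by a #...# pair, the first alternative spans from the first # to the closing one.

```javascript
const re = /#(?:\b[\s\S]*?\b#|\w+)/g;

const first = '#The test# rain in #SPAIN stays mainly in the #plain'.match(re);
console.log(first); // [ '#The test#', '#SPAIN', '#plain' ]

const second = '#SPAIN has #the test#'.match(re);
console.log(second); // [ '#SPAIN has #the test#' ]
```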
