Retrieve BSR and category from string with RegExp

When I parse Amazon products I get this kind of string:
"#19 in Home Improvements (See top 100)"
I figured out how to retrieve the BSR number, which is /#\d*/
But I have no idea how to retrieve the category, which comes after "in" and ends before the bracketed part (See top 100).

I suggest
#(\d+)\s+in\s+([^(]+?)\s*\(
See the regex demo
var re = /#(\d+)\s+in\s+([^(]+?)\s*\(/;
var str = '#19 in Home Improvements (See top 100)';
var m = re.exec(str);
if (m) {
console.log(m[1]);
console.log(m[2]);
}
Pattern details:
# - a hash
(\d+) - Group 1 capturing 1 or more digits
\s+in\s+ - in enclosed with 1 or more whitespaces
([^(]+?) - Group 2 capturing 1 or more chars other than ( as few as possible before the first...
\s*\( - 0+ whitespaces and a literal (.
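To reuse the pattern above, it can be wrapped in a small function that returns both values or null. This is only a sketch: the name parseBsr and the shape of the returned object are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper; the name parseBsr is made up for this example.
function parseBsr(text) {
  const m = /#(\d+)\s+in\s+([^(]+?)\s*\(/.exec(text);
  if (!m) return null; // no "#<digits> in <category> (" found
  return { rank: Number(m[1]), category: m[2] };
}

const bsr = parseBsr('#19 in Home Improvements (See top 100)');
console.log(bsr.rank);     // 19
console.log(bsr.category); // Home Improvements
```

Returning null on a failed match lets the caller decide what to do with lines that carry no rank.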

Related

Regex between last two characters

I have a question about a simple regex. I need to get the text between these characters: - and ~
My string: Champions tour - To Win1 - To Win2 ~JIM FURYK
When I use \-([^)]+\~) it matches this:
To Win1 - To Win2 ~
But I need this:
To Win2 ~JIM FURYK
Is it possible to do this?
My regex is here: https://regex101.com/r/fJBLXb/1/

Just add a dash to the negated character class so that it is not matched: \-([^-)]+\~)

Your \-([^)]+\~) regex matches the leftmost - that is directly followed with one or more chars other than ) (so it matches -, a, §, etc.) and then a ~ char. It does not stop at - chars and thus can match any amount of hyphens.
To match the value after last hyphen you can use
[^\s-][^-]*$
See the regex demo and the regex graph. Details:
[^\s-] - a char other than whitespace and -
[^-]* - zero or more chars other than -
$ - end of string.
See the JavaScript demo:
const text = 'Champions tour - To Win1 - To Win2 ~JIM FURYK';
const match = text.match(/[^\s-][^-]*$/);
if (match) {
console.log(match[0]);
}
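For comparison, the same "everything after the last hyphen" idea can be sketched without a regex, using lastIndexOf and slice. This is a different technique from the one in the answer, and the function name afterLastHyphen is an assumption. It trims whitespace on both ends and returns an empty string when the input ends with a hyphen, so it is not a drop-in replacement for the regex.

```javascript
// Hypothetical helper; not part of the original answer.
function afterLastHyphen(text) {
  // lastIndexOf returns -1 when there is no hyphen, so slice(0) keeps the whole string
  return text.slice(text.lastIndexOf('-') + 1).trim();
}

console.log(afterLastHyphen('Champions tour - To Win1 - To Win2 ~JIM FURYK'));
// To Win2 ~JIM FURYK
```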

You could use match as follows:
var input = "Champions tour - To Win1 - To Win2 ~JIM FURYK";
var output = input.match(/- ([^-]+~.*)$/)[1];
console.log(output);
The regex pattern used above says to match:
- a hyphen
[ ] a single space
( capture what follows
[^-]+ match all content WITHOUT crossing another hyphen
~ a literal ~
.* all remaining content
) stop capture
$ end of the string
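Note that indexing the result of match() directly throws a TypeError when the pattern does not match, because match() returns null. A minimal null-safe sketch of the same pattern follows; the function name lastSegmentWithTilde is an assumption made for illustration.

```javascript
// Hypothetical wrapper around the pattern described above.
function lastSegmentWithTilde(input) {
  const m = input.match(/- ([^-]+~.*)$/);
  return m ? m[1] : null; // null when no "- ... ~ ..." tail exists
}

console.log(lastSegmentWithTilde('Champions tour - To Win1 - To Win2 ~JIM FURYK'));
// To Win2 ~JIM FURYK
```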

javascript multiple regex matches

Given the string below
[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor: Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error # 2))
I try to get multiple matches to extract the following values:
var system = "PROD";
var ip = "10.10.100.100";
var location = "EFA-B-3";
var device = "Brocade FC-Switch";
var sensor = "Sensor: Power Supply #1";
var sensorArt = "SNMP Custom Table";
var sensorState = "Down";
var errorMsg = "No Such Name (SNMP error # 2)";
Since I am a beginner with regex I tried to define some "rules":
1. Extract the first value within the first round brackets, e.g. PROD
2. Extract the value between the first closing square bracket and the second opening round bracket, e.g. 10.10.100.100
3. Extract the value within the second round brackets, e.g. EFA-B-3
4. Extract the value within the second square brackets, e.g. Brocade FC-Switch
5. Extract the value between the second closing square bracket and the third opening round bracket, e.g. Sensor: Power Supply #1
6. Extract the value given within the third round brackets, e.g. SNMP Custom Table
7. Extract the value between the third closing round bracket and the fourth opening round bracket, e.g. Down
8. Extract the value within the fourth round brackets, e.g. No Such Name (SNMP error # 2)
Using the webpage https://scriptular.com/ I tried to achieve my goal.
So far I managed to build the regex
(?=(([^)]+)))
which gives me my first match (rule 1). Somehow I fail to declare the regex to look between the brackets. What am I missing?

Since there is no way to define separators, the only way is to match the parts and capture them separately.
/\(([^()]+)\)]\s*(.*?)\s*\(([^()]*)\)\s*\[([^\][]*)]\s*(.*?)\s*\(([^()]+)\)\s*(.*?)\s*\((.*)\)/
See the regex demo.
Details
\( - a ( char
([^()]+) - Group 1: 1 or more chars other than ( and )
\)]\s* - )] and 0+ whitespaces
(.*?) - Group 2: any 0+ chars other than line break chars, as few as possible
\s*\( - 0+ whitespaces, (
([^()]*) - Group 3: 0 or more chars other than ( and )
\)\s*\[ - ), 0+ whitespaces, [
([^\][]*) - Group 4: 0 or more chars other than [ and ]
]\s* - ] and 0+ whitespaces
(.*?) - Group 5: any 0+ chars other than line break chars, as few as possible
\s*\( - 0+ whitespaces, (
([^()]+) - Group 6: 1 or more chars other than ( and )
\)\s* - ) and 0+ whitespaces
(.*?) - Group 7: any 0+ chars other than line break chars, as few as possible
\s*\( - 0+ whitespaces and (
(.*) - Group 8: any 0+ chars other than line break chars, as many as possible
\) - ) char.
ES6+ code snippet:
var s = "[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor: Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error # 2))";
let [_, system, ip, location1, device, sensor, sensorArt, sensorState, errorMsg] = s.match(/\(([^()]+)\)]\s*(.*?)\s*\(([^()]*)\)\s*\[([^\][]*)]\s*(.*?)\s*\(([^()]+)\)\s*(.*?)\s*\((.*)\)/);
console.log(`System=${system}\nIP=${ip}\nLocation=${location1}\nDevice=${device}\nSensor=${sensor}\nSensorArt=${sensorArt}\nSensorState=${sensorState}\nErrorMsg=${errorMsg}`);
ES5:
var s = "[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor: Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error # 2))";
var system, ip, location1, device, sensor, sensorArt, sensorState, errorMsg;
var rx = /\(([^()]+)\)]\s*(.*?)\s*\(([^()]*)\)\s*\[([^\][]*)]\s*(.*?)\s*\(([^()]+)\)\s*(.*?)\s*\((.*)\)/;
var m = s.match(rx); // declare m locally instead of creating an implicit global
if (m) {
system = m[1];
ip = m[2];
location1=m[3];
device=m[4];
sensor=m[5];
sensorArt=m[6];
sensorState=m[7];
errorMsg=m[8];
}
console.log("System="+system+"\nIP="+ip+"\nLocation="+location1+"\nDevice="+device+"\nSensor="+sensor+"\nSensorArt="+sensorArt+"\nSensorState="+sensorState+"\nErrorMsg="+errorMsg);
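As a variation on the snippets above, the same pattern can be written with named capture groups (ES2018), so the fields are read by name rather than by position. This is a sketch under assumptions: the names alertPattern and parseAlert are made up, the literal ] characters are escaped for portability, and each value is trimmed because the device group otherwise keeps the trailing space from "[Brocade FC-Switch ]".

```javascript
// Same structure as the regex above, with a name on every group.
const alertPattern = /\((?<system>[^()]+)\)\]\s*(?<ip>.*?)\s*\((?<location>[^()]*)\)\s*\[(?<device>[^\][]*)\]\s*(?<sensor>.*?)\s*\((?<sensorArt>[^()]+)\)\s*(?<sensorState>.*?)\s*\((?<errorMsg>.*)\)/;

// Hypothetical helper: returns an object of trimmed fields, or null on no match.
function parseAlert(line) {
  const m = alertPattern.exec(line);
  if (!m) return null;
  const fields = {};
  for (const [name, value] of Object.entries(m.groups)) {
    fields[name] = value.trim();
  }
  return fields;
}

const alertLine = "[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor: Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error # 2))";
console.log(parseAlert(alertLine));
```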

Filter version number from string in javascript?

I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of "v" or "v." followed by a digit, followed by digits, letters or dots until either the end of the string, a whitespace character, or a dot (.) with no digit after it.

As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
Regex demo
That will match:
^ Assert the start of the string
.*? Match 0+ any character non greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If the character like A in V.3.4A can only be in the last part, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
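The snippet above assumes every string contains a version; when one does not, match() returns null and the [1] access throws. A minimal null-safe sketch of the first pattern follows, where the function name extractVersion is an assumption made for illustration.

```javascript
// Hypothetical helper wrapping the first pattern from this answer.
function extractVersion(name) {
  const m = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i.exec(name);
  return m ? m[1] : null; // null when no "v<digit>" is present
}

console.log(extractVersion('Title V.3.4A.')); // 3.4A
console.log(extractVersion('Title.mov'));     // null
```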

Details:
v - the character "v"
(?:\.?) - matches 1 or 0 occurrences of "."
([0-9a-z\.]*[0-9a-z]) - the capturing group for the version number:
[0-9a-z\.]* - matches alphanumeric characters and "."
[0-9a-z] - ensures that the version number doesn't end with "."
You can use RegExp.exec() method to extract matches from string one by one.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/i; // no g flag, so exec() always starts at index 0
let str = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let versions = [];
for (const s of str) {
// Run the regex on each string; the first capturing group holds the version number
const v = regex.exec(s);
if (v !== null) {
versions.push(v[1]); // append the version number to the array
}
}
console.log(versions);

var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive
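One caveat with this pattern: it does not require a digit right after the "v", so with the i flag any earlier "v" followed by letters is matched first. The sketch below illustrates this with a made-up input, "Review v2", and a hypothetical variant (digitFirst) that forces the captured part to start with a digit. Neither the input nor the variant comes from the original answer.

```javascript
const looseVersion = /v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i;  // pattern from the answer above
const digitFirst = /v\.?(\d[0-9a-z]*(?:\.[0-9a-z]+)*)/i;  // hypothetical variant: first char must be a digit

const sample = 'Review v2'; // made-up input for illustration
console.log(sample.match(looseVersion)[1]); // iew
console.log(sample.match(digitFirst)[1]);   // 2
```

For the four strings in the question both patterns capture the same values, because none of them contains a "v" followed by a letter before the real version marker.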

Regex to remove numbers and others characters

I would like to remove some numbers and characters from my TypeScript string by using a regex. I think I'm close but I'm missing something.
Here is the kind of strings I have:
[15620584560] - product name (type)
[1256025] - product name (test+1)
[12560255544220] - product name
What I would like:
Product name
Here is the regex I'm using:
product_name = product_name.replace(/\[[0-9]+\]/,'');

You may use
.replace(/^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g, '')
See the regex demo
The regex matches two alternatives (separated with |):
^\s*\[[0-9]+]\s*-\s*:
^ - start of string
\s* - 0+ whitespaces
\[ - a [
[0-9]+ - 1+ digits
] - a ] char
\s*-\s* - a - char enclosed with 0+ whitespaces
| - or
\s*\([^()]*\)\s*$:
\s* - 0+ whitespaces
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
\s* - 0+ whitespaces
$ - end of string.
JS demo:
var strs = ['[15620584560] - product name (type)','[1256025] - product name (test+1)','[12560255544220] - product name'];
var reg = /^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g;
for (var s of strs) {
console.log(s, '=>', s.replace(reg, ''));
}
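The same replacement can be packaged as a reusable function. This is a sketch only: the name cleanProductName is an assumption, and the closing ] is escaped here purely for readability.

```javascript
// Hypothetical helper: strips a leading "[digits] - " prefix and a trailing "(...)" suffix.
function cleanProductName(raw) {
  return raw.replace(/^\s*\[[0-9]+\]\s*-\s*|\s*\([^()]*\)\s*$/g, '');
}

console.log(cleanProductName('[1256025] - product name (test+1)')); // product name
```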

One approach which might work would be to split the input string on dash, and then use a simple regex to remove all terms in parentheses:
var input = '[15620584560] - product name (type)';
var fields = input.split(/\]\s*-/);
var result = fields[1].replace(/\s*\(.*?\)\s*/g, '').trim();
console.log(result);

What will be the regular expression for below requirement in javascript

Criteria:
Any word that starts with a and ends with b, with a digit as the middle character. The word must not be on a line that starts with the character '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
Regex: ? I need it for JavaScript.
So far I have tried the following:
var text_content = above_mention_content; // the string shown above
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
I get the following output:
[ 'a1b', ' a7b' ]

Your /^[^#]?a[0-9]b/gmi will match multiple occurrences of the pattern matching the start of a line, then 1 or 0 chars other than #, then a, a digit and b. It neither checks for whole words nor matches words anywhere other than at the beginning of a line.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
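The same pattern also works with String.prototype.matchAll (ES2020), which avoids calling replace() only for its side effect. This is a sketch, and the function name collectWords is an assumption made for illustration.

```javascript
// Hypothetical helper: same regex as above, iterated with matchAll instead of replace.
function collectWords(text) {
  const words = [];
  for (const m of text.matchAll(/^[^\S\r\n]*#.*|\b(a\db)\b/gm)) {
    if (m[1]) words.push(m[1]); // Group 1 is undefined when a "#" line was consumed
  }
  return words;
}

console.log(collectWords("a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b"));
```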

I would suggest using two regexes.
The first regex fetches the non-hashed lines:
^[^#][a\db\s]+
and then another regex fetches the individual words (from each line):
^a\db\s
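The two-step idea can be sketched as follows. The patterns here are not the two shown above: as an assumption, this sketch splits the text into lines, drops lines whose first non-blank character is #, and then collects whole words with \ba\db\b. The function name wordsOutsideHashLines is made up.

```javascript
// Hypothetical two-step filter: first drop "#" lines, then match words per line.
function wordsOutsideHashLines(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => !/^[^\S\r\n]*#/.test(line)) // step 1: keep non-hashed lines
    .flatMap((line) => line.match(/\ba\db\b/g) || []); // step 2: words on each kept line
}

console.log(wordsOutsideHashLines("a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b"));
```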
