Regex to match many times

Regex to match many times - javascript

I'm trying to match a type definition
def euro : t1 -> t2 -> t3 (and this pattern may repeat further in other examples)
I came up with this regex
^def ([^\s]*)\s:\s([^\s]*)(\s->\s[^\s]*)*
But while it matches euro and t1 it
then matches -> t2 rather than t2
fails to match anything with t3
I can't see what I am doing wrong, and my goal is to capture
euro t1 t2 t3
as four separate items, and what I currently get is
0: "def euro : t1 -> t2 -> t3"
1: "euro"
2: "t1"
3: " -> t3"

In a JS regex, a repeated capturing group keeps only its last iteration: the captured value is overwritten on each subsequent iteration, so all earlier values are dropped.
When creating a regular expression that needs a capturing group to grab part of the text matched, a common mistake is to repeat the capturing group instead of capturing a repeated group. The difference is that the repeated capturing group will capture only the last iteration, while a group capturing another group that's repeated will capture all iterations.
The way out can be capturing the whole substring and then split it. Here is an example:
var s = "def euro : t1 -> t2 -> t3";
var rx = /^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/;
var res = [];
var m = s.match(rx);
if (m) {
res = [m[1], m[2]];
for (var part of m[3].split(" -> ").filter(Boolean)) { // "part" avoids overwriting the input string s
res.push(part);
}
}
console.log(res);
Pattern details
^ - start of string
def - a literal substring
(\S*) - Capturing group 1: 0+ non-whitespace chars
\s:\s - a : enclosed with single whitespaces
(\S*) - Capturing group 2: 0+ non-whitespace chars
((?:\s->\s\S*)*) - Capturing group 3: 0+ repetitions of the following pattern sequences:
\s->\s - whitespace, ->, whitespace
\S* - 0+ non-whitespace chars
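
The difference between the two group styles can be seen by running both patterns on the sample line. This is only a minimal sketch, and the variable names are illustrative:

```javascript
const input = "def euro : t1 -> t2 -> t3";

// Repeated capturing group: group 3 is overwritten on every iteration,
// so only the last " -> tN" chunk survives.
const repeated = input.match(/^def (\S*)\s:\s(\S*)(\s->\s\S*)*/);

// Capturing a repeated (non-capturing) group: group 3 spans all iterations.
const wrapped = input.match(/^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/);

console.log(repeated[3]); // " -> t3"
console.log(wrapped[3]);  // " -> t2 -> t3"

// Splitting the wrapped capture recovers the individual types.
const rest = wrapped[3].split(" -> ").filter(Boolean);
console.log([wrapped[1], wrapped[2], ...rest]); // [ 'euro', 't1', 't2', 't3' ]
```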

Details:
?: - creates a non-capturing group
$1 - receives the result of the first capturing group, i.e. \w+
\s[\:\-\>]+\s - matches " : " or " -> "
\w+ - matches repeating alphanumeric pattern
let str = 'def euro : t1 -> t2 -> t3';
let regex = /(?:def\s|\s[\:\-\>]+\s)(\w+)/g;
let match = str.replace(regex, '$1\n').trim().split('\n');
console.log(match);
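
As a possible variation on the same pattern, group 1 can be collected directly with String.prototype.matchAll (ES2020) instead of going through replace() and split(). This is a sketch under that assumption; typeLine, tokenRx and tokens are made-up names:

```javascript
const typeLine = 'def euro : t1 -> t2 -> t3';

// Same pattern as in the answer; the g flag is required by matchAll().
const tokenRx = /(?:def\s|\s[\:\-\>]+\s)(\w+)/g;

// Each match object exposes group 1 at index 1.
const tokens = Array.from(typeLine.matchAll(tokenRx), (m) => m[1]);

console.log(tokens); // [ 'euro', 't1', 't2', 't3' ]
```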

Related

Filter version number from string in javascript?

I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of: "v" or "v." followed by a digit, followed by digits, letters or dots until either the end of the string or until a whitespace occurs or until a dot (.) occurs with no digit after it.

As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
That will match:
^ Assert the start of the string
.*? Match 0+ any character non greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If the character like A in V.3.4A can only be in the last part, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
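
The snippet above throws if a string contains no version, because match() returns null. A hedged sketch of a null-safe wrapper follows, together with a comparison of the two pattern variants; extractVersion and the sample inputs "v1.2a.3" and "no version here" are illustrative assumptions, not part of the question:

```javascript
// Variant 1: a letter may follow any dotted part.
const anyPart = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
// Variant 2: a letter may only follow the last part.
const lastPartOnly = /^.*?v\.?(\d+(?:\.\d+)*[a-z]?)/i;

// Hypothetical helper: returns the captured version or null when nothing matches.
function extractVersion(text, pattern = anyPart) {
  const m = pattern.exec(text);
  return m ? m[1] : null;
}

console.log(extractVersion("Title V.3.4A."));         // "3.4A"
console.log(extractVersion("no version here"));       // null
console.log(extractVersion("v1.2a.3"));               // "1.2a.3"
console.log(extractVersion("v1.2a.3", lastPartOnly)); // "1.2a"
```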

Details:
v - character "v"
(?:\.)? - matches 1 or 0 repetition of "."
Version capturing group
[0-9a-z\.]* - Matches alphanumeric and "." character
[0-9a-z] - ensures that the version number doesn't end with "."
You can use RegExp.exec() method to extract matches from string one by one.
// No g flag here: a global regex remembers lastIndex between exec() calls,
// which gives wrong results when the same regex is reused on different strings.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/i;
const str = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
const versions = [];
for (let i = 0; i < str.length; i++) {
// Executes a check on str[i]; the first capturing group holds our version number
const v = regex.exec(str[i]); // null when there is no match
if (v !== null)
versions.push(v[1]); // appends the version number to the array
}
console.log(versions);
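
When exec() is called on several different strings, the g flag matters: a global regex stores its position in lastIndex and resumes from there on the next call. The following sketch, with made-up sample strings, shows the effect:

```javascript
const globalRx = /v(\d+)/g;

// First call matches "v1" and leaves lastIndex just after it.
const first = globalRx.exec("v1 and v2");
console.log(first[1], globalRx.lastIndex); // 1 2

// Second call, on a different string, starts searching at index 2,
// finds nothing, returns null and resets lastIndex to 0.
const second = globalRx.exec("v3");
console.log(second, globalRx.lastIndex); // null 0

// Without the g flag every call starts at index 0.
const plainRx = /v(\d+)/;
const a = plainRx.exec("v1 and v2")[1];
const b = plainRx.exec("v3")[1];
console.log(a, b); // 1 3
```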

var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive

Regex to remove numbers and others characters

I would like to remove some numbers and characters from my typescript string by using regex. I think I'm close but I'm missing something.
Here is the kind of strings I have:
[15620584560] - product name (type)
[1256025] - product name (test+1)
[12560255544220] - product name
What I would like :
Product name
Here is the regex I'm using.
product_name = product_name.replace(/\[[0-9]+\]/,'');

You may use
.replace(/^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g, '')
The regex matches two alternatives (separated with |):
^\s*\[[0-9]+]\s*-\s*:
^ - start of string
\s* - 0+ whitespaces
\[ - a [
[0-9]+ - 1+ digits
] - a ] char
\s*-\s* - a - char enclosed with 0+ whitespaces
| - or
\s*\([^()]*\)\s*$:
\s* - 0+ whitespaces
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
\s* - 0+ whitespaces
$ - end of string.
JS demo:
var strs = ['[15620584560] - product name (type)','[1256025] - product name (test+1)','[12560255544220] - product name'];
var reg = /^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g;
for (var s of strs) {
console.log(s, '=>', s.replace(reg, ''));
}
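
The expected result in the question is capitalized ("Product name"), which the replacement alone does not produce. One way to get there, sketched with a hypothetical helper name cleanProductName, is to upper-case the first character after stripping:

```javascript
// Hypothetical helper: strips the leading "[digits] - " and a trailing
// "(...)" group, then capitalizes the first character of what is left.
function cleanProductName(raw) {
  const stripped = raw.replace(/^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g, "");
  return stripped.charAt(0).toUpperCase() + stripped.slice(1);
}

const samples = [
  "[15620584560] - product name (type)",
  "[1256025] - product name (test+1)",
  "[12560255544220] - product name",
];
const cleaned = samples.map(cleanProductName);
console.log(cleaned); // [ 'Product name', 'Product name', 'Product name' ]
```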

One approach which might work would be to split the input string on dash, and then use a simple regex to remove all terms in parentheses:
var input = '[15620584560] - product name (type)';
var fields = input.split(/\]\s*-/);
var result = fields[1].replace(/\s*\(.*?\)\s*/g, '').trim();
console.log(result);

What will be the regular expression for below requirement in javascript

Criteria:
Any word that starts with a and ends with b, with a digit as the middle character. The word should not be on a line that starts with the character '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
Regex: ? I need it for JavaScript.
So far tried below thing:
var text_content = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b"; // the given string from above
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
Getting below output:
[ 'a1b', ' a7b' ]

Your /^[^#]?a[0-9]b/gmi matches, at the start of each line, 1 or 0 chars other than #, then a, a digit and b. It does not check for a whole word, and it cannot match words that appear anywhere other than at the beginning of a line.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
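
The same pattern can also be driven with String.prototype.matchAll (ES2020), which avoids calling replace() only for its side effects. This is a sketch of that alternative, not the answer's own code; text, skipOrWord and found are illustrative names:

```javascript
const text = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";

// Alternative 1 consumes whole "#" lines; alternative 2 captures a word in group 1.
const skipOrWord = /^[^\S\r\n]*#.*|\b(a\db)\b/gm;

const found = [];
for (const m of text.matchAll(skipOrWord)) {
  // Group 1 is undefined when the "#" line alternative matched.
  if (m[1] !== undefined) found.push(m[1]);
}

console.log(found); // [ 'a1b', 'a2b', 'a3b', 'a7b', 'a8b', 'a9b' ]
```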

I would suggest using two regexes.
The first regex fetches the non-hashed lines (with the m flag):
^[^#][a\db\s]+
The second regex then fetches the individual words from each of those lines (with the g flag):
\ba\db\b
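
A rough sketch of this two-pass idea follows. It makes two assumptions that differ from the patterns listed in the answer: the first pass filters lines with a plain string check instead of a regex, and the second pass uses \ba\db\b with the g flag so that every word on a line is collected. All names are illustrative:

```javascript
const content = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";

const wordsFound = content
  .split(/\r?\n/)
  // Pass 1: drop lines whose first non-blank character is "#".
  .filter((line) => !line.trimStart().startsWith("#"))
  // Pass 2: collect every a<digit>b word on the remaining lines.
  .flatMap((line) => line.match(/\ba\db\b/g) || []);

console.log(wordsFound); // [ 'a1b', 'a2b', 'a3b', 'a7b', 'a8b', 'a9b' ]
```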

Retrieve BSR and category from string with RegExp

When I parse Amazon products I get this kind of string.
"#19 in Home Improvements (See top 100)"
I figured out how to retrieve the BSR number, which is /#\d*/
But I have no idea how to retrieve the category, which comes after "in" and ends before the bracketed part (See top 100).

I suggest
#(\d+)\s+in\s+([^(]+?)\s*\(
var re = /#(\d+)\s+in\s+([^(]+?)\s*\(/;
var str = '#19 in Home Improvements (See top 100)';
var m = re.exec(str);
if (m) {
console.log(m[1]);
console.log(m[2]);
}
Pattern details:
# - a hash
(\d+) - Group 1 capturing 1 or more digits
\s+in\s+ - in enclosed with 1 or more whitespaces
([^(]+?) - Group 2 capturing 1 or more chars other than ( as few as possible before the first...
\s*\( - 0+ whitespaces and a literal (.
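
If both values are needed together, the pattern can be wrapped in a small function. This is a sketch only: parseBsr and the rank/category property names are assumptions introduced for illustration.

```javascript
// Hypothetical helper: returns { rank, category } or null when the text
// does not look like "#<digits> in <category> (".
function parseBsr(text) {
  const m = /#(\d+)\s+in\s+([^(]+?)\s*\(/.exec(text);
  if (!m) return null;
  return { rank: Number(m[1]), category: m[2] };
}

const parsed = parseBsr("#19 in Home Improvements (See top 100)");
console.log(parsed); // { rank: 19, category: 'Home Improvements' }
console.log(parseBsr("no rank here")); // null
```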

Match two equal character in word and replace with one RE

I want to convert two equal adjacent characters into a single one, e.g. bannana should become banana (replace "nn" with a single "n"). The exception is "aa", which should be left as is; everything else should be converted as above.
i/p : khuddar >> o/p : khudar
i/p : maanas >> o/p : maanas
i/p : hello >> o/p : helo
i/p : apple >> o/p : aple
Need regular expression to do these type of work.

Use capturing group and backreference.
Here's a javascript example:
"khuddar".replace(/([^a])\1/g, "$1")
// => "khudar"
"maanas".replace(/([^a])\1/g, "$1")
// => "maanas"
[^a] - matches a character that is not a.
(...) - matches the regular expression and saves it to group 1 (2, 3, .. if there are more parentheses after it).
\1 - backreference to group 1. If the matched part was b, \1 also refers to b.
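
Applied to all the words from the question, the replacement can be sketched as a small function (collapseDoubles is a made-up name). Note that the pattern handles pairs only: a run of three equal characters is reduced to two. Using \1+ instead of \1 is one possible way to collapse longer runs, shown here as an assumption about what might be wanted:

```javascript
// Hypothetical helper: replaces each doubled character (except "a") with one.
function collapseDoubles(word) {
  return word.replace(/([^a])\1/g, "$1");
}

console.log(collapseDoubles("khuddar")); // "khudar"
console.log(collapseDoubles("maanas"));  // "maanas"
console.log(collapseDoubles("hello"));   // "helo"
console.log(collapseDoubles("apple"));   // "aple"
console.log(collapseDoubles("bannana")); // "banana"

// Pairs only: "xxx" becomes "xx"; with \1+ the whole run collapses to "x".
console.log(collapseDoubles("xxx"));                // "xx"
console.log("xxx".replace(/([^a])\1+/g, "$1"));     // "x"
```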

If you need to only match any letters but a, you can use
.replace(/([b-z])\1/ig, "$1")
Regex explanation:
([b-z]) - Capture group 1 capturing any ASCII letter from b to z (and B to Z, because the /i modifier makes the pattern case-insensitive)
\1 - an inline backreference that matches the text value captured with the group above (thus, the pattern matches 2 identical ASCII letters)
In the replacement pattern, $1 numbered replacement backreference is used that replaces the 2 identical ASCII letters with 1 occurrence of this letter.
var re = /([b-z])\1/gi;
var str = 'khuddar<br/>maanas<br/>hello<br/>apple<br/>F11';
var subst = '$1';
var result = str.replace(re, subst);
document.body.innerHTML = result;
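
The F11 entry in the sample string shows where a letters-only class differs from [^a]: doubled digits are left alone. A short sketch that runs outside the browser (the variable names are illustrative):

```javascript
const anyButA = /([^a])\1/g;      // any doubled character except "a"
const lettersOnly = /([b-z])\1/gi; // doubled ASCII letters b-z, either case

const withAnyButA = "F11".replace(anyButA, "$1");
const withLettersOnly = "F11".replace(lettersOnly, "$1");

console.log(withAnyButA);     // "F1"  (the doubled digit is collapsed)
console.log(withLettersOnly); // "F11" (digits are not in the class)

// Both patterns agree on plain lowercase words.
console.log("hello".replace(anyButA, "$1"), "hello".replace(lettersOnly, "$1")); // helo helo
```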
