I'm trying to match a type definition
def euro : t1 -> t2 -> t3 (and this pattern may repeat further in other examples)
I came up with this regex
^def ([^\s]*)\s:\s([^\s]*)(\s->\s[^\s]*)*
But while it matches euro and t1 it
then matches -> t2 rather than t2
fails to match anything with t3
I can't see what I am doing wrong, and my goal is to capture
euro t1 t2 t3
as four separate items, and what I currently get is
0: "def euro : t1 -> t2 -> t3"
1: "euro"
2: "t1"
3: " -> t3"
A repeated capturing group in a JS regex keeps only its last value: each iteration overwrites the previous capture, so all values but the last are dropped.
When creating a regular expression that needs a capturing group to grab part of the text matched, a common mistake is to repeat the capturing group instead of capturing a repeated group. The difference is that the repeated capturing group will capture only the last iteration, while a group capturing another group that's repeated will capture all iterations.
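To see the difference concretely, here is a minimal sketch that runs both variants against the sample line from the question. The variable names are made up for illustration.

```javascript
const sample = "def euro : t1 -> t2 -> t3";

// Repeated capturing group: (...)* re-captures on every iteration.
const repeatedGroup = /^def (\S*)\s:\s(\S*)(\s->\s\S*)*/;
// Capturing a repeated group: the inner (?:...)* repeats, the outer (...) captures once.
const capturedRepetition = /^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/;

const lastIterationOnly = sample.match(repeatedGroup)[3];
const allIterations = sample.match(capturedRepetition)[3];

console.log(JSON.stringify(lastIterationOnly)); // " -> t3"
console.log(JSON.stringify(allIterations));     // " -> t2 -> t3"
```

The second form still yields one string, not a list, so a further split is needed to get the individual items.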
A way out is to capture the whole repeated substring and then split it. Here is an example:
var s = "def euro : t1 -> t2 -> t3";
var rx = /^def (\S*)\s:\s(\S*)((?:\s->\s\S*)*)/;
var res = [];
var m = s.match(rx);
if (m) {
res = [m[1], m[2]];
// "part" avoids overwriting the input string s declared above
for (var part of m[3].split(" -> ").filter(Boolean)) {
res.push(part);
}
}
console.log(res);
Pattern details
^ - start of string
def - a literal substring (followed by a single space)
(\S*) - Capturing group 1: 0+ non-whitespace chars
\s:\s - a : enclosed with single whitespaces
(\S*) - Capturing group 2: 0+ non-whitespace chars
((?:\s->\s\S*)*) - Capturing group 3: 0+ repetitions of the following pattern sequences:
\s->\s - whitespace, ->, whitespace
\S* - 0+ non-whitespace chars
Details:
?: - creates a non-capturing group
$1 - receives the result of the first capturing group, i.e. \w+
\s[\:\-\>]+\s - matches " : " or " -> "
\w+ - matches 1+ word chars (letters, digits or underscore)
let str = 'def euro : t1 -> t2 -> t3';
let regex = /(?:def\s|\s[\:\-\>]+\s)(\w+)/g;
let match = str.replace(regex, '$1\n').trim().split('\n');
console.log(match);
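An alternative to the replace-and-split trick is to collect the captures directly with String.prototype.matchAll. This is a minimal sketch, not part of the answers above; the function name typeNames and the slightly tightened pattern (only ":" or "->" as separators) are assumptions made for illustration.

```javascript
// Returns the name and every type from a "def name : t1 -> t2 ..." line.
function typeNames(line) {
  // Each match is either the leading "def " or a " : " / " -> " separator,
  // followed by one captured word.
  const token = /(?:^def\s+|\s(?::|->)\s)(\w+)/g;
  return Array.from(line.matchAll(token), (m) => m[1]);
}

const names = typeNames("def euro : t1 -> t2 -> t3");
console.log(names); // [ 'euro', 't1', 't2', 't3' ]
```

Because every item is its own match, the number of "-> tN" parts does not matter.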
I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of: "v" or "v." followed by a digit, followed by digits, letters or dots until either the end of the string, or until a whitespace occurs, or until a dot (.) occurs with no digit after it.
As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
That will match:
^ Assert the start of the string
.*? Match 0+ any character non greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If the character like A in V.3.4A can only be in the last part, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
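Note that s.match(pattern)[1] throws a TypeError when a string contains no version, because match() returns null. Here is a minimal sketch of a null-safe helper that uses the second pattern (a letter allowed only at the very end); the name extractVersion and the sample file names are hypothetical.

```javascript
// Returns the first version number in the string, or null if there is none.
function extractVersion(text) {
  const pattern = /^.*?v\.?(\d+(?:\.\d+)*[a-z]?)/i;
  const m = text.match(pattern);
  return m ? m[1] : null;
}

const found = [
  "Title_v1_1.00.mov",
  "v.1.0.1-Title.mp3",
  "Title V.3.4A.",
  "V3.0.4b mix v2",
  "no version here",
].map(extractVersion);

console.log(found); // [ '1', '1.0.1', '3.4A', '3.0.4b', null ]
```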
Details:
v - character "v"
(?:\.)? - matches 1 or 0 repetition of "."
Version capturing group
[0-9a-z\.]* - matches alphanumeric and "." characters
[0-9a-z] - ensures that the version number doesn't end with "."
You can use the RegExp.exec() method to extract the match from each string. The regex has no g flag here: a global regex keeps its lastIndex between exec() calls, so reusing it on another string could start the search in the middle of that string.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/i;
let str = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let versions = [];
for (let i = 0; i < str.length; i++) {
// v[1] holds the first capturing group, i.e. our version number
let v = regex.exec(str[i]);
if (v !== null)
versions.push(v[1]); // appends the version number to the array
}
console.log(versions);
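A regex with the g flag is stateful: exec() resumes from the regex's lastIndex property, which is why reusing one global regex across different strings can miss matches. The sketch below shows that behavior with made-up inputs and variable names.

```javascript
const globalRx = /v(\d+)/g;

// First call matches "v1" and leaves lastIndex just past the match.
const first = globalRx.exec("v1 release");
const indexAfterFirst = globalRx.lastIndex;

// Second call, on a different string, starts searching at that stale index,
// finds nothing, returns null and resets lastIndex to 0.
const second = globalRx.exec("v2");
const indexAfterSecond = globalRx.lastIndex;

// Third call starts from 0 again and now matches.
const third = globalRx.exec("v2");

console.log(first[1], indexAfterFirst);  // 1 2
console.log(second, indexAfterSecond);   // null 0
console.log(third[1]);                   // 2
```

Dropping the g flag, or resetting lastIndex to 0 before each new string, avoids the problem.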
var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive
I would like to remove some numbers and characters from my typescript string by using regex. I think I'm close but I'm missing something.
Here is the kind of strings I have:
[15620584560] - product name (type)
[1256025] - product name (test+1)
[12560255544220] - product name
What I would like :
Product name
Here is the regex I'm using:
product_name = product_name.replace(/\[[0-9]+\]/,'');
You may use
.replace(/^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g, '')
The regex matches two alternatives (separated with |):
^\s*\[[0-9]+]\s*-\s*:
^ - start of string
\s* - 0+ whitespaces
\[ - a [
[0-9]+ - 1+ digits
] - a ] char
\s*-\s* - a - char enclosed with 0+ whitespaces
| - or
\s*\([^()]*\)\s*$:
\s* - 0+ whitespaces
\( - a (
[^()]* - 0+ chars other than ( and )
\) - a )
\s* - 0+ whitespaces
$ - end of string.
JS demo:
var strs = ['[15620584560] - product name (type)','[1256025] - product name (test+1)','[12560255544220] - product name'];
var reg = /^\s*\[[0-9]+]\s*-\s*|\s*\([^()]*\)\s*$/g;
for (var s of strs) {
console.log(s, '=>', s.replace(reg, ''));
}
One approach which might work would be to split the input string on dash, and then use a simple regex to remove all terms in parentheses:
var input = '[15620584560] - product name (type)';
var fields = input.split(/\]\s*-/);
var result = fields[1].replace(/\s*\(.*?\)\s*/g, '').trim();
console.log(result);
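A third option is to capture the name instead of deleting what surrounds it. This is a minimal sketch under the assumption that every input looks like "[digits] - name" with an optional trailing "(...)"; the function name extractProductName is hypothetical, and inputs that do not fit are returned unchanged.

```javascript
// Captures the text between the "[id] - " prefix and an optional "(...)" suffix.
function extractProductName(label) {
  const m = /^\s*\[\d+\]\s*-\s*(.*?)\s*(?:\([^()]*\))?\s*$/.exec(label);
  return m ? m[1] : label;
}

const cleaned = [
  "[15620584560] - product name (type)",
  "[1256025] - product name (test+1)",
  "[12560255544220] - product name",
].map(extractProductName);

console.log(cleaned); // [ 'product name', 'product name', 'product name' ]
```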
Criteria:
Any word that starts with a and ends with b, with a digit as the middle char. The word should not be on a line that starts with the char '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
Regex: ? I need it for JavaScript.
So far I have tried the following:
var text_content =above_mention_content
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
Getting below output:
[ 'a1b', ' a7b' ]
Your /^[^#]?a[0-9]b/gmi matches, at the start of each line, an optional char other than #, then a, a digit and b. It does not check for whole words, and it never matches words anywhere other than at the beginning of a line.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
I would suggest using two regexes:
The first regex fetches the non-hashed lines:
^[^#][a\db\s]+
and then another regex fetches the individual words (from each line, applied globally):
\ba\db\b
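Here is a minimal sketch of this two-step idea, assuming a line counts as excluded when its first non-whitespace char is #. The function name and the exact regexes are illustrative and not taken from the answers above.

```javascript
// Step 1: drop lines that start with "#". Step 2: collect a<digit>b words from the rest.
function wordsOutsideHashedLines(text) {
  return text
    .split(/\r?\n/)
    .filter((line) => !/^\s*#/.test(line))
    .flatMap((line) => line.match(/\ba\db\b/gi) || []);
}

const words = wordsOutsideHashedLines("a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b");
console.log(words); // [ 'a1b', 'a2b', 'a3b', 'a7b', 'a8b', 'a9b' ]
```

Splitting into lines first keeps each regex trivial, at the cost of a second pass over the text.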
When I parse Amazon products I get this kind of string.
"#19 in Home Improvements (See top 100)"
I figured out how to retrieve the BSR number, which is /#\d*/
But I have no idea how to retrieve the Category, which comes after "in" and ends before the brackets (See top 100).
I suggest
#(\d+)\s+in\s+([^(]+?)\s*\(
var re = /#(\d+)\s+in\s+([^(]+?)\s*\(/;
var str = '#19 in Home Improvements (See top 100)';
var m = re.exec(str);
if (m) {
console.log(m[1]);
console.log(m[2]);
}
Pattern details:
# - a hash
(\d+) - Group 1 capturing 1 or more digits
\s+in\s+ - in enclosed with 1 or more whitespaces
([^(]+?) - Group 2 capturing 1 or more chars other than ( as few as possible before the first...
\s*\( - 0+ whitespaces and a literal (.
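The same pattern can be wrapped in a small helper that returns both values at once. This is a sketch only: the function name parseBestSellerRank and the use of named groups are my own choices, and it assumes the input always has the "#N in Category (" shape shown in the question.

```javascript
// Returns { rank, category } or null when the text does not match.
function parseBestSellerRank(text) {
  const m = /#(?<rank>\d+)\s+in\s+(?<category>[^(]+?)\s*\(/.exec(text);
  return m ? { rank: Number(m.groups.rank), category: m.groups.category } : null;
}

const parsed = parseBestSellerRank("#19 in Home Improvements (See top 100)");
console.log(parsed.rank);     // 19
console.log(parsed.category); // Home Improvements
```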
I want to convert two equal adjacent characters into a single one, e.g. bannana should become banana (replace "nn" with a single "n"). Exception: "aa" stays as it is; every other pair should be converted as above.
i/p : khuddar >> o/p : khudar
i/p : maanas >> o/p : maanas
i/p : hello >> o/p : helo
i/p : apple >> o/p : aple
I need a regular expression to do this type of work.
Use capturing group and backreference.
Here's a javascript example:
"khuddar".replace(/([^a])\1/g, "$1")
// => "khudar"
"maanas".replace(/([^a])\1/g, "$1")
// => "maanas"
[^a] - matches a character that is not a.
(...) - matches the enclosed pattern and saves it to group 1 (2, 3, ... if there are more parentheses after it).
\1 - backreference to group 1. If the group matched b, \1 also matches b.
If you need to only match any letters but a, you can use
.replace(/([b-z])\1/ig, "$1")
Regex explanation:
([b-z]) - Capture group 1 capturing any ASCII letter from b to z (and B to Z, because the /i modifier makes the pattern case-insensitive)
\1 - an inline backreference that matches the text value captured with the group above (thus, the pattern matches 2 identical ASCII letters)
In the replacement pattern, $1 numbered replacement backreference is used that replaces the 2 identical ASCII letters with 1 occurrence of this letter.
var re = /([b-z])\1/gi;
var str = 'khuddar<br/>maanas<br/>hello<br/>apple<br/>F11';
var subst = '$1';
var result = str.replace(re, subst);
document.body.innerHTML = result;
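For a quick check outside the browser, here is a minimal sketch that runs the letter-only pattern over the inputs from the question and compares it with the [^a] variant on "F11". The function name collapseDoubledLetters is hypothetical.

```javascript
// Collapses any doubled ASCII letter except "a" into a single letter.
function collapseDoubledLetters(word) {
  return word.replace(/([b-z])\1/gi, "$1");
}

const collapsed = ["khuddar", "maanas", "hello", "apple", "F11"].map(collapseDoubledLetters);
console.log(collapsed); // [ 'khudar', 'maanas', 'helo', 'aple', 'F11' ]

// The [^a] variant also collapses doubled non-letters such as digits.
const withNegatedClass = "F11".replace(/([^a])\1/g, "$1");
console.log(withNegatedClass); // F1
```

Which pattern to pick depends on whether doubled digits and punctuation should be left alone.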