Regex Conditional Lookahead JavaScript - javascript

I just used regex101 to create the following regex:
([^,]*?)=(.*?)(?(?=, )(?:, )|(?:$))(?(?=[^,]*?=)(?:(?=[^,]*?=))|(?:$))
It seems to work perfectly for my use case of getting keys and values that are comma separated while still preserving commas in the values.
Problem is, I want to use this Regex in Node.js (JavaScript), but while writing this entire Regex in regex101, I had it set to PCRE (PHP).
It looks like JavaScript doesn't support Conditional Lookaheads ((?(?=...)()|()).
Is there a way to get this working in JavaScript?
Examples:
2 matches
group 1: id, group 2: 1
group 1: name, group 2: bob
id=1, name=bob
3 matches
group 1: id, group 2: 2
group 1: type, group 2: store
group 1: description, group 2: Hardwood Store
id=2, type=store, description=Hardwood Store
4 matches
group 1: id, group 2: 4
group 1: type, group 2: road
group 1: name, group 2: The longest road name, in the entire world, and universe, forever
group 1: built, group 2: 20190714
id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714
3 matches
group 1: id, group 2: 3
group 1: type, group 2: building
group 1: builder, group 2: Random Name, and Other Person, with help from Final Person
id=3, type=building, builder=Random Name, and Other Person, with help from Final Person

You may use
/([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g
See the regex demo.
Details
([^,=\s][^,=]*) - Group 1:
[^,=\s] - a char other than ,, = and whitespace
[^,=]* - zero or more chars other than , and =
= - a = char
(.*?) - Group 2: any zero or more chars other than line break chars, as few as possible
(?=(?:,\s*)?[^,=]*=|$) - a positive lookahead that requires an optional sequence of , and 0+ whitespaces and then 0+ chars other than , and = and then a = or end of string immediately to the right of the current location
JS demo:
var strs = ['id=1, name=bob','id=2, type=store, description=Hardwood Store', 'id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714','id=3, type=building, builder=Random Name, and Other Person, with help from Final Person']
var rx = /([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g;
for (var s of strs) {
console.log("STRING:", s);
var m;
while (m=rx.exec(s)) {
console.log(m[1], m[2])
}
}
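If the goal is to end up with the keys and values in one structure rather than logging them, the same pattern can feed a small helper. This is only a sketch: `parsePairs` is a hypothetical name that does not appear in the answer above, and it assumes keys are unique within a line and that `String.prototype.matchAll` is available (Node.js 12 or later).

```javascript
// Sketch: collect key/value pairs into a plain object.
// parsePairs is a hypothetical helper, not part of the original answer.
function parsePairs(input) {
  // Same pattern as above; the g flag is required by matchAll().
  const rx = /([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g;
  const result = {};
  for (const m of input.matchAll(rx)) {
    // m[1] is the key, m[2] is the value (commas inside the value are kept).
    result[m[1]] = m[2];
  }
  return result;
}

console.log(parsePairs("id=2, type=store, description=Hardwood Store"));
```

Because the regex is created inside the function, no `lastIndex` state is shared between calls.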

Maybe, these expressions would be somewhat close to what you might want to design:
([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)
or
\s*([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)
not sure though.
const regex = /\s*([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)/gm;
const str = `id=3, type=building, builder=Random Name, and Other Person, with help from Final Person
id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714
id=2, type=store, description=Hardwood Store
id=1, name=bob
`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}

Yet another way to do it
\s*([^,=]*?)\s*=\s*((?:(?![^,=]*=)[\S\s])*)(?=[=,]|$)
https://regex101.com/r/J6SSGr/1
Readable version
\s*
( [^,=]*? ) # (1), Key
\s* = \s* # =
( # (2 start), Value
(?:
(?! [^,=]* = )
[\S\s]
)*
) # (2 end)
(?= [=,] | $ )
Ultimate PCRE version
\s*([^,=]*?)\s*=\s*((?:(?!\s*[^,=]*=)[\S\s])*(?<![,\s]))\s*(?=[=,\s]|$)
https://regex101.com/r/slfMR1/1
\s* # Wsp trim
( [^,=]*? ) # (1), Key
\s* = \s* # Wsp trim = Wsp trim
( # (2 start), Value
(?:
(?! \s* [^,=]* = )
[\S\s]
)*
(?<! [,\s] ) # Wsp trim
) # (2 end)
\s* # Wsp trim
(?= [=,\s] | $ ) # Field separator

Related

Javascript regex to find two characters between two delimitators

EDITED
I need to find two characters between '[' ']' and '/' '/' using Javascript.
I am using this regex:
([^.][/[string]]|\/string\/)|(\[(string))|(\/(string))| ((string)\])|((string)\/)
that matches two characters, but it also matches a single character.
The question is, how can I do to get just two characters?
I also want to match the two characters when they appear inside a longer string, not only when the whole string is an exact match.
Eg.
User input: dz
It must find only strings that contain "dz" exactly, e.g. "dzone" but not "dazone". Currently I am getting matches for both strings, "dzone" and "dazone".
Demo: https://regex101.com/r/FEs6ib/1
You could optionally repeat any char except the delimiters between the delimiters themselves, and capture in a group what you want to keep.
If you want multiple matches for /dzone/dzone/ you could assert the last delimiter to the right instead of matching it.
The matches are in group 1 or group 2 where you can check for if they exist.
\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])
The pattern matches:
\/ Match /
[^\/]*(dz)[^\/]* Capture dz in group 1 between optional chars other than /
(?=\/) Positive lookahead, assert / to the right
| Or
\[ Match [
[^\][]*(dz)[^\][]* Capture dz in group 2 between optional chars other than [ and ]
(?=]) Positive lookahead, assert ] to the right
Regex demo
This will match 1 occurrence of dz in the word. If you want to match the whole word, the capture group can be broadened to before and after the negated character class like:
\/([^\/]*dz[^\/]*)(?=\/)|\[([^\][]*dz[^\][]*)(?=])
Regex demo
const regex = /\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])/g;
[
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s =>
console.log(
`${s} --> ${Array.from(s.matchAll(regex), m => m[2] ? m[2] : m[1])}`
)
);
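Since the question mentions that "dz" comes from user input, the pattern probably has to be built at runtime. The following is a minimal sketch under that assumption; `escapeRegExp` and `findBetweenDelimiters` are hypothetical helper names, and the whole-word variant of the pattern is used so that the surrounding word is returned.

```javascript
// Hypothetical helper: escape regex metacharacters so user input is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: return every word between /.../ or [...] that contains `needle`.
function findBetweenDelimiters(input, needle) {
  const n = escapeRegExp(needle);
  // Same structure as the whole-word pattern above, with the needle interpolated.
  const regex = new RegExp(
    `\\/([^\\/]*${n}[^\\/]*)(?=\\/)|\\[([^\\][]*${n}[^\\][]*)(?=])`,
    "g"
  );
  // Group 1 is set for /.../ matches, group 2 for [...] matches.
  return Array.from(input.matchAll(regex), m => (m[1] !== undefined ? m[1] : m[2]));
}

console.log(findBetweenDelimiters("/dzone/dzone/", "dz"));
console.log(findBetweenDelimiters("[dazone]", "dz"));
```

Escaping the input matters here: without it, a needle such as `a.b` would also match `axb`.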
If supported, you might also match all occurrences of dz between the delimiters using lookarounds, with an unbounded quantifier inside the lookbehind:
(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])
Regex demo
const regex = /(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])/g;
[
"[adzadzone]",
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(`${s} --> ${m}`);
}
});

javascript multiple regex matches

Given the string below
[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor:
Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error #
2))
I try to get multiple matches to extract the following values:
var system = "PROD";
var ip = "10.10.100.100";
var location = "EFA-B-3";
var device = "Brocade FC-Switch";
var sensor = "Sensor: Power Supply #1";
var sensorArt = "SNMP Custom Table";
var sensorState = "Down";
var errorMsg = "No Such Name (SNMP error # 2)";
Since I am a beginner with regex I tried to define some "rules":
Extract first value within the first round brackets e.g PROD
Extract the value between the first closing square bracket and
second opening round bracket e.g. 10.10.100.100
Extract the value within the second round brackets e.g EFA-B-3
Extract the value within the second square brackets e.g. Brocade
FC-Switch
Extract the value between the second closing square bracket and the
third opening round bracket e.g. Sensor: Power Supply #1
Extract the value given within the third round brackets e.g. SNMP
Custom Table
Extract the value between the third closing round bracket and the
fourth opening round bracket e.g. Down
Extract the value within the fourth round brackets e.g. No Such Name
(SNMP error # 2)
Using the webpage https://scriptular.com/ I tried to achieve my goal.
So far I managed to build the regex
(?=(([^)]+)))
which gives me my first match (rule 1). Somehow I fail to declare the regex to look between the brackets. What am I missing?
Since there is no way to define separators, the only way is to match the parts and capture them separately.
/\(([^()]+)\)]\s*(.*?)\s*\(([^()]*)\)\s*\[([^\][]*)]\s*(.*?)\s*\(([^()]+)\)\s*(.*?)\s*\((.*)\)/
See the regex demo.
Details
\( - a ( char
([^()]+) - Group 1: 1 or more chars other than ( and )
\)]\s* - )] and 0+ whitespaces
(.*?) - Group 2: any 0+ chars other than line break chars, as few as possible
\s*\( - 0+ whitespaces, (
([^()]*) - Group 3: 0 or more chars other than ( and )
\)\s*\[ - ), 0+ whitespaces, [
([^\][]*) - Group 4: 0 or more chars other than [ and ]
]\s* - ] and 0+ whitespaces
(.*?) - Group 5: any 0+ chars other than line break chars, as few as possible
\s*\( - 0+ whitespaces, (
([^()]+) - Group 6: 1 or more chars other than ( and )
\)\s* - ) and 0+ whitespaces
(.*?) - Group 7: any 0+ chars other than line break chars, as few as possible
\s*\( - 0+ whitespaces and (
(.*) - Group 8: any 0+ chars other than line break chars, as many as possible
\) - ) char.
ES6+ code snippet:
var s = "[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor: Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error # 2))";
let [_, system, ip, location1, device, sensor, sensorArt, sensorState, errorMsg] = s.match(/\(([^()]+)\)]\s*(.*?)\s*\(([^()]*)\)\s*\[([^\][]*)]\s*(.*?)\s*\(([^()]+)\)\s*(.*?)\s*\((.*)\)/);
console.log(`System=${system}\nIP=${ip}\nLocation=${location1}\nDevice=${device}\nSensor=${sensor}\nSensorArt=${sensorArt}\nSensorState=${sensorState}\nErrorMsg=${errorMsg}`);
ES5:
var s = "[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor: Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error # 2))";
var system, ip, location1, device, sensor, sensorArt, sensorState, errorMsg, m; // m is declared here to avoid an implicit global
var rx = /\(([^()]+)\)]\s*(.*?)\s*\(([^()]*)\)\s*\[([^\][]*)]\s*(.*?)\s*\(([^()]+)\)\s*(.*?)\s*\((.*)\)/;
if (m = s.match(rx)) {
system = m[1];
ip = m[2];
location1=m[3];
device=m[4];
sensor=m[5];
sensorArt=m[6];
sensorState=m[7];
errorMsg=m[8];
}
console.log("System="+system+"\nIP="+ip+"\nLocation="+location1+"\nDevice="+device+"\nSensor="+sensor+"\nSensorArt="+sensorArt+"\nSensorState="+sensorState+"\nErrorMsg="+errorMsg);
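With eight positional groups it is easy to mix up indices, so named capture groups may read more clearly where they are supported (ES2018 and later). The sketch below keeps the same pattern and only adds names; `parseSensorLine` is a hypothetical helper, and trimming the values is an added assumption to drop the trailing space inside `[Brocade FC-Switch ]`.

```javascript
// Same pattern as above, with each capturing group given a name.
const sensorRx = /\((?<system>[^()]+)\)]\s*(?<ip>.*?)\s*\((?<location>[^()]*)\)\s*\[(?<device>[^\][]*)]\s*(?<sensor>.*?)\s*\((?<sensorArt>[^()]+)\)\s*(?<sensorState>.*?)\s*\((?<errorMsg>.*)\)/;

// Hypothetical helper: returns an object of trimmed fields, or null if the line does not match.
function parseSensorLine(line) {
  const m = line.match(sensorRx);
  if (m === null) {
    return null;
  }
  const fields = {};
  for (const [name, value] of Object.entries(m.groups)) {
    fields[name] = value.trim();
  }
  return fields;
}

const line = "[NeMo (PROD)] 10.10.100.100 (EFA-B-3) [Brocade FC-Switch ] Sensor: Power Supply #1 (SNMP Custom Table) Down (No Such Name (SNMP error # 2))";
console.log(parseSensorLine(line));
```

Returning null on a failed match avoids the TypeError that destructuring the result of `match` would throw for a line in a different format.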

How can I program a kind of escape character myself in this regular expression?

I want to implement a function that outputs the respective strings as an array from an input string like "str1|str2#str3":
function myFunc(string) { ... }
For the input string, however, it is only necessary that str1 is present. str2 and str3 (with their delimiters) are both optional. For that I have already written a regular expression that performs a kind of split. I can not do a (normal) split because the delimiters are different characters and also the order of str1, str2, and str3 is important. This works kinda with my regex pattern. Now, I'm struggling how to extend this pattern so that you can escape the two delimiters by using \| or \#.
How exactly can I solve this best?
var strings = [
'meaning',
'meaning|description',
'meaning#id',
'meaning|description#id',
'|description',
'|description#id',
'#id',
'meaning#id|description',
'sub1\\|sub2',
'mea\\|ning|descri\\#ption',
'mea\\#ning#id',
'meaning|description#identific\\|\\#ation'
];
var pattern = /^(\w+)(?:\|(\w*))?(?:\#(\w*))?$/ // works without escaping
console.log(pattern.exec(strings[3]));
According to the problem definition, strings 0-3 and 8-11 should be valid and the rest should not. myFunc(strings[3]) should return ['meaning','description','id'] and myFunc(strings[8]) should return ['sub1\|sub2', null, null].
You need to allow \\[|#] alongside \w in the pattern, i.e. replace each \w with (?:\\[#|]|\w):
var strings = [
'meaning',
'meaning|description',
'meaning#id',
'meaning|description#id',
'|description',
'|description#id',
'#id',
'meaning#id|description',
'sub1\\|sub2',
'mea\\|ning|descri\\#ption',
'mea\\#ning#id',
'meaning|description#identific\\|\\#ation'
];
var pattern = /^((?:\\[#|]|\w)+)(?:\|((?:\\[#|]|\w)*))?(?:#((?:\\[#|]|\w)*))?$/;
for (var s of strings) {
if (pattern.test(s)) {
console.log(s, "=> MATCHES");
} else {
console.log(s, "=> FAIL");
}
}
Pattern details
^ - string start
((?:\\[#|]|\w)+) - Group 1: 1 or more repetitions of \ followed with # or | or a word char
(?:\|((?:\\[#|]|\w)*))? - an optional group matching 1 or 0 occurrences of
\| - a | char
((?:\\[#|]|\w)*) - Group 2: 0 or more repetitions of \ followed with # or | or a word char
(?:#((?:\\[#|]|\w)*))? - an optional group matching 1 or 0 occurrences of
# - a # char
((?:\\[#|]|\w)*) - Group 3: 0 or more repetitions of \ followed with # or | or a word char
$ - end of string.
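To get from the validation above to the array the question asks for, the three groups can be read off the match. This is a sketch rather than a definitive implementation: it assumes that an invalid input should yield null (the question does not say what to return in that case) and that the escape backslashes stay in the output, as in the expected result for strings[8].

```javascript
// Same pattern as in the answer above.
const escPattern = /^((?:\\[#|]|\w)+)(?:\|((?:\\[#|]|\w)*))?(?:#((?:\\[#|]|\w)*))?$/;

// Returns [str1, str2, str3] with null for absent optional parts,
// or null when the input is not valid (an assumption, see above).
function myFunc(input) {
  const m = escPattern.exec(input);
  if (m === null) {
    return null;
  }
  // Optional groups that did not participate in the match are undefined.
  return [m[1], m[2] === undefined ? null : m[2], m[3] === undefined ? null : m[3]];
}

console.log(myFunc("meaning|description#id"));
console.log(myFunc("sub1\\|sub2"));
console.log(myFunc("|description"));
```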
My guess is that you wish to split all your strings, for which we'd be adding those delimiters in a char class maybe, similar to:
([|#\\]+)?([\w]+)
If the goal is validation rather than splitting, a single expression can become very complicated, because the number of combinations it has to cover grows quickly.
const regex = /([|#\\]+)?([\w]+)/gm;
const str = `meaning
meaning|description
meaning#id
meaning|description#id
|description
|description#id
#id
meaning#id|description
sub1\\|sub2
mea\\|ning|descri\\#ption
mea\\#ning#id
meaning|description#identific\\|\\#ation`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Demo
Seems like what you're looking for may be this?
((?:\\#|\\\||[^\|#])*)*
Explanation:
Matches all sets that include "\#", "\|", or any character except "#" and "|".
https://regexr.com/4fr68

Filter version number from string in javascript?

I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of "v" or "v." followed by a digit, followed by digits, letters or dots, until either the end of the string, a whitespace character, or a dot (.) with no digit after it.
As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
Regex demo
That will match:
^ Assert the start of the string
.*? Match any character 0+ times, non-greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If the character like A in V.3.4A can only be in the last part, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
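Note that `s.match(pattern)[1]` throws a TypeError when a string contains no version at all. A slightly more defensive variant could look like the following sketch; `extractVersion` is a hypothetical name, and returning null for "no version found" is an assumption about the desired behaviour.

```javascript
// Same pattern as above, case-insensitive.
const versionPattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;

// Hypothetical helper: returns the first version number, or null if there is none.
function extractVersion(text) {
  const m = text.match(versionPattern);
  return m === null ? null : m[1];
}

console.log(extractVersion("Title V.3.4A."));
console.log(extractVersion("no version here"));
```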
Details:
v - character "v"
(?:\.?) - matches 1 or 0 repetitions of "."
Version capturing group
[0-9a-z\.]* - matches alphanumeric and "." characters
[0-9a-z] - ensures that the version number doesn't end with "."
You can use RegExp.exec() method to extract matches from string one by one.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/i; // no g flag, so exec() keeps no lastIndex state between strings
const str = [
  "Title_v1_1.00.mov filters 1",
  "v.1.0.1-Title.mp3 filters 1.0.1",
  "Title V.3.4A. filters 3.4A",
  "V3.0.4b mix v2 filters 3.0.4b"
];
const versions = [];
for (const s of str) {
  // exec() returns null when the string contains no version number
  const v = regex.exec(s);
  if (v !== null) {
    versions.push(v[1]); // first capturing group, i.e. the version number
  }
}
console.log(versions);
var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive

Regex match the prefix and suffix of strings ending with numbers

Using a regex I would like to find out the prefix and suffix of string like these:
T12231 should match ['T', '12231']
Acw2123 should match ['Acw', '2123']
121Ab should match ['121Ab', null]
1213 should match [null, '1213']
Matching only the numbers at the end of the string is easily done with this regex /([0-9]+)$/g.
I did not manage to match everything from the beginning of the string up to that point. The closest I got was for the 1st group to match everything but the last digit, with /^(.*)([0-9]+)$/g.
You can make the first capture group lazy (.*?) so that it matches as little as possible, which lets the second capture group match as much as possible:
var s = ["T12231", "Acw2123", "121Ab", "1213"];
console.log(
s.map(x => x.replace(/^(.*?)([0-9]*)$/, "$1 $2"))
);
Push the split result into an array:
var s = ["T12231", "Acw2123", "121Ab", "1213"];
var arr = [];
s.forEach(x => x.replace(/^(.*?)([0-9]*)$/, (string, $1, $2) => arr.push([$1, $2])));
console.log(arr);
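The question asks for null where a part is missing, while the lazy pattern yields an empty string in that case. A small wrapper can map one to the other. This is a sketch; `splitPrefixSuffix` is a hypothetical name.

```javascript
// Hypothetical helper: split into [prefix, trailingDigits], using null for an empty part.
function splitPrefixSuffix(text) {
  // Lazy prefix, then as many trailing digits as possible.
  // The pattern matches any string without line breaks, so m is never null for such input.
  const m = /^(.*?)([0-9]*)$/.exec(text);
  if (m === null) {
    return [null, null];
  }
  return [m[1] === "" ? null : m[1], m[2] === "" ? null : m[2]];
}

console.log(["T12231", "Acw2123", "121Ab", "1213"].map(splitPrefixSuffix));
```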
You are almost right. Try using this:
var re = /^(.*?)(\d+.*)$/g;
var groups = re.exec(your_string)
Satisfies all cases
^(?=\d|.*\d$)((?:(?!\d+$).)*)(\d*)$
https://regex101.com/r/BWwsIA/1
^ # BOS
(?= \d | .* \d $ ) # Must begin or end with digit
( # (1 start)
(?: # Cluster begin
(?! \d+ $ ) # Not digits then end
. # Any char
)* # Cluster end, 0 to many times
) # (1 end)
( \d* ) # (2)
$ # EOS
