I just used regex101 to create the following regex:
([^,]*?)=(.*?)(?(?=, )(?:, )|(?:$))(?(?=[^,]*?=)(?:(?=[^,]*?=))|(?:$))
It seems to work perfectly for my use case of getting keys and values that are comma separated while still preserving commas in the values.
The problem is that I want to use this regex in Node.js (JavaScript), but I wrote it in regex101 with the flavor set to PCRE (PHP).
It looks like JavaScript doesn't support Conditional Lookaheads ((?(?=...)()|()).
Is there a way to get this working in JavaScript?
Examples:
2 matches
group 1: id, group 2: 1
group 1: name, group 2: bob
id=1, name=bob
3 matches
group 1: id, group 2: 2
group 1: type, group 2: store
group 1: description, group 2: Hardwood Store
id=2, type=store, description=Hardwood Store
4 matches
group 1: id, group 2: 4
group 1: type, group 2: road
group 1: name, group 2: The longest road name, in the entire world, and universe, forever
group 1: built, group 2: 20190714
id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714
3 matches
group 1: id, group 2: 3
group 1: type, group 2: building
group 1: builder, group 2: Random Name, and Other Person, with help from Final Person
id=3, type=building, builder=Random Name, and Other Person, with help from Final Person
You may use
/([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g
See the regex demo.
Details
([^,=\s][^,=]*) - Group 1:
[^,=\s] - a char other than ,, = and whitespace
[^,=]* - zero or more chars other than , and =
= - a = char
(.*?) - Group 2: any zero or more chars other than line break chars, as few as possible
(?=(?:,\s*)?[^,=]*=|$) - a positive lookahead that requires an optional sequence of , and 0+ whitespaces and then 0+ chars other than , and = and then a = or end of string immediately to the right of the current location
JS demo:
var strs = ['id=1, name=bob','id=2, type=store, description=Hardwood Store', 'id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714','id=3, type=building, builder=Random Name, and Other Person, with help from Final Person']
var rx = /([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g;
for (var s of strs) {
console.log("STRING:", s);
var m;
while (m=rx.exec(s)) {
console.log(m[1], m[2])
}
}
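If the pairs are needed as an object rather than as console output, the same pattern can be combined with String.prototype.matchAll. This is only a sketch: the helper name parsePairs is made up for illustration, and matchAll assumes a reasonably recent Node.js (12 or later).

```javascript
// Hypothetical helper (not part of the answer above): parse one line
// of comma-separated key=value pairs into a plain object.
function parsePairs(line) {
  // Same pattern as in the answer; the g flag is required by matchAll.
  const rx = /([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g;
  const result = {};
  for (const m of line.matchAll(rx)) {
    result[m[1]] = m[2]; // m[1] is the key, m[2] is the value
  }
  return result;
}

console.log(parsePairs('id=2, type=store, description=Hardwood Store'));
```

Because the regex is created inside the function, no lastIndex state is shared between calls.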
Maybe, these expressions would be somewhat close to what you might want to design:
([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)
or
\s*([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)
not sure though.
DEMO
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
const regex = /\s*([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)/gm;
const str = `id=3, type=building, builder=Random Name, and Other Person, with help from Final Person
id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714
id=2, type=store, description=Hardwood Store
id=1, name=bob
`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
Yet another way to do it
\s*([^,=]*?)\s*=\s*((?:(?![^,=]*=)[\S\s])*)(?=[=,]|$)
https://regex101.com/r/J6SSGr/1
Readable version
\s*
( [^,=]*? ) # (1), Key
\s* = \s* # =
( # (2 start), Value
(?:
(?! [^,=]* = )
[\S\s]
)*
) # (2 end)
(?= [=,] | $ )
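JavaScript has no free-spacing (x) flag, so the readable layout above cannot be pasted into a regex literal as is. One possible workaround, sketched here with made-up variable names, is to join commented string pieces and pass them to the RegExp constructor:

```javascript
// Sketch: emulate the commented layout by concatenating raw string pieces.
// String.raw keeps the backslashes intact, so no double escaping is needed.
const pairRegex = new RegExp([
  String.raw`\s*`,
  String.raw`([^,=]*?)`,                 // (1) key
  String.raw`\s*=\s*`,                   // =
  String.raw`((?:(?![^,=]*=)[\S\s])*)`,  // (2) value
  String.raw`(?=[=,]|$)`,
].join(''), 'g');

const input = 'id=3, type=building, builder=Random Name, and Other Person, with help from Final Person';
const pairs = [...input.matchAll(pairRegex)].map(m => [m[1], m[2]]);
console.log(pairs);
```

The joined source is identical to the one-line pattern, so behavior should not change; only the way it is written differs.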
Ultimate PCRE version
\s*([^,=]*?)\s*=\s*((?:(?!\s*[^,=]*=)[\S\s])*(?<![,\s]))\s*(?=[=,\s]|$)
https://regex101.com/r/slfMR1/1
\s* # Wsp trim
( [^,=]*? ) # (1), Key
\s* = \s* # Wsp trim = Wsp trim
( # (2 start), Value
(?:
(?! \s* [^,=]* = )
[\S\s]
)*
(?<! [,\s] ) # Wsp trim
) # (2 end)
\s* # Wsp trim
(?= [=,\s] | $ ) # Field separator
I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of "v" or "v." followed by a digit, followed by digits, letters or dots, until either the end of the string, a whitespace character, or a dot (.) with no digit after it.
As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
Regex demo
That will match:
^ Assert the start of the string
.*? Match 0+ any character non greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If the character like A in V.3.4A can only be in the last part, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
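Note that s.match(pattern)[1] throws a TypeError when a string contains no version at all, because match returns null. A small wrapper can guard against that; the function name extractVersion is an assumption made for this sketch.

```javascript
// Hypothetical wrapper: returns the first version number, or null if none is found.
function extractVersion(s) {
  const m = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i.exec(s);
  return m ? m[1] : null;
}

console.log(extractVersion("Title V.3.4A."));
console.log(extractVersion("README.txt"));
```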
Details:
v - character "v"
(?:\.)? - matches 0 or 1 occurrence of "."
Version capturing group
[0-9a-z\.]* - matches alphanumeric characters and "."
[0-9a-z] - ensures that the version number doesn't end with "."
You can use RegExp.exec() method to extract matches from string one by one.
// No g flag here: a global regex remembers lastIndex between exec() calls,
// so reusing it on the next string could start the search mid-string.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/i;
const str = [
  "Title_v1_1.00.mov filters 1",
  "v.1.0.1-Title.mp3 filters 1.0.1",
  "Title V.3.4A. filters 3.4A",
  "V3.0.4b mix v2 filters 3.0.4b"
];
const versions = [];
for (const s of str) {
  // exec() returns null when nothing matches; otherwise v[1] holds the
  // first capturing group, i.e. our version number
  const v = regex.exec(s);
  if (v !== null) versions.push(v[1]); // appends the version number to the array
}
console.log(versions);
var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive
How do I search a string in JavaScript for all instances of [SM_g]randomword[SM_h]. and replace them all with [SM_g]randomword.[SM_h]?
I'd also like to do this with commas. For example - I need to be able to turn [SM_g]randomword[SM_h], into [SM_g]randomword,[SM_h].
Here is my code so far:
const periodRegex = /\[SM_g](.*?)\[SM_h](?:[.])/g;
string_afterregex = string_initial.replace(periodRegex, /* I don't know what goes in here */)
Capture the patterns and then reorder with back reference:
const periodRegex = /(\[SM_g].*?)(\[SM_h])([.,])/g;
var string_initial = '[SM_g]randomword[SM_h].'
var string_afterregex = string_initial.replace(periodRegex, "$1$3$2");
console.log(string_afterregex);
var string_initial = '[SM_g]randomword[SM_h],'
console.log(string_initial.replace(periodRegex, "$1$3$2"))
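The same reordering can also be written with a replacer function instead of the "$1$3$2" string, which some find easier to read when there are several groups. A minimal sketch, with a made-up function name:

```javascript
// Hypothetical helper: moves a trailing "." or "," inside the closing [SM_h] tag.
function movePunctuationInside(text) {
  const rx = /(\[SM_g].*?)(\[SM_h])([.,])/g;
  // The callback receives the whole match, then the three captured groups.
  return text.replace(rx, (match, body, close, punct) => body + punct + close);
}

console.log(movePunctuationInside('[SM_g]one[SM_h]. and [SM_g]two[SM_h], end'));
```

Occurrences that are not followed by a period or comma are left unchanged.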
You don't have to use multiple regexes to achieve what you want if you benefit from lookaheads:
const re = /\[SM_g](?=(((?:(?!\[SM_h]).)*)\[SM_h])([.,]))\1./g;
const str = '[SM_g]$2$3[SM_h]';
console.log("[SM_g]WORD[SM_h], [SM_g]WORD[SM_h]. [SM_g]WORD[SM_h]".replace(re, str));
Breakdown:
\[SM_g] Match [SM_g] literally
(?= Start of a positive lookahead
( Start of capturing group #1
( Start of CG #2
(?: Start of NCG #1, tempered dot
(?!\[SM_h]). Match next character without skipping over a [SM_h]
)* End of NCG #1, repeat as much as possible
) End of CG #2
\[SM_h] Match [SM_h] literally
) End of CG #1
( Start of CG #3
[.,] Match a comma or a period
) End of CG #3
) End of positive lookahead
\1. Match what's matched by CG #1 then a character
Criteria:
Any word that starts with a, ends with b, and has a single digit in the middle. The word must not be on a line that starts with the character '#'.
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
Regex: ? I need it for JavaScript.
So far I have tried the following:
var text_content = above_mention_content; // the string given above
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
I am getting the following output:
[ 'a1b', ' a7b' ]
Your /^[^#]?a[0-9]b/gmi matches, at the start of each line, an optional char other than #, followed by a, a digit and b. It does not check for whole words, and it cannot match words that appear anywhere other than at the beginning of a line.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
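If using replace() purely for its side effect feels awkward, the same pattern can be fed to String.prototype.matchAll instead. This is a sketch only; collectWords is a made-up name, and matchAll assumes a modern runtime.

```javascript
// Hypothetical alternative: same pattern, but matches are collected with matchAll.
function collectWords(text) {
  const rx = /^[^\S\r\n]*#.*|\b(a\db)\b/gm;
  return [...text.matchAll(rx)]
    .filter(m => m[1] !== undefined) // skip matches that consumed a "#" line
    .map(m => m[1]);                 // keep only Group 1
}

console.log(collectWords("a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b"));
```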
I would suggest using two regexes:
The first regex fetches the lines that do not start with a hash:
^[^#][a\db\s]+
and then another regex fetches the individual words (from each line):
^a\db\s
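The two-step idea might be sketched as follows. Note that this sketch does not use the two patterns above verbatim: it filters lines with startsWith and matches words with \ba\db\b, and it assumes that a line whose "#" is preceded only by whitespace also counts as excluded. The function name is made up.

```javascript
// Sketch of a two-step approach: drop "#" lines first, then pick the words.
function wordsOutsideComments(text) {
  return text
    .split(/\r?\n/)
    // Step 1: keep only lines that do not start with "#" (leading whitespace ignored)
    .filter(line => !line.trimStart().startsWith('#'))
    // Step 2: collect every a<digit>b word on the remaining lines
    .flatMap(line => line.match(/\ba\db\b/gi) || []);
}

console.log(wordsOutsideComments("a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b"));
```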
I'm trying to match all the words starting with # and words between 2 # (see example)
var str = "#The test# rain in #SPAIN stays mainly in the #plain";
var res = str.match(/(#)[^\s]+/gi);
The result will be ["#The", "#SPAIN", "#plain"] but it should be ["#The test#", "#SPAIN", "#plain"]
Extra: would be nice if the result would be without the #.
Does anyone have a solution for this?
You can use
/#\w+(?:(?: +\w+)*#)?/g
See the demo here
The regex matches:
# - a hash symbol
\w+ - one or more alphanumeric and underscore characters
(?:(?: +\w+)*#)? - one or zero occurrence of:
(?: +\w+)* - zero or more occurrences of one or more spaces followed with one or more word characters followed with
# - a hash symbol
NOTE: If there can be characters other than word characters (those in the [A-Za-z0-9_] range), you can replace \w with [^ #]:
/#[^ #]+(?:(?: +[^ #]+)*#)?/g
See another demo
var re = /#[^ #]+(?:(?: +[^ #]+)*#)?/g;
var str = '#The test-mode# rain in #SPAIN stays mainly in the #plain #SPAIN has #the test# and more #here';
var m = str.match(re);
if (m) {
// Using ES6 Arrow functions
m = m.map(s => s.replace(/#$/g, ''));
// ES5 Equivalent
/*m = m.map(function(s) {
return s.replace(/#$/g, '');
});*/ // getting rid of the trailing #
document.body.innerHTML = "<pre>" + JSON.stringify(m, 0, 4) + "</pre>";
}
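The snippet above removes only the trailing #. For the "extra" part of the question (results without any #), one option is to strip the hash from both ends. A minimal sketch using the \w-based pattern; extractTags is a made-up name:

```javascript
// Hypothetical helper: returns the tagged words/phrases without the "#" characters.
function extractTags(text) {
  const found = text.match(/#\w+(?:(?: +\w+)*#)?/g) || []; // match() returns null if nothing is found
  return found.map(s => s.replace(/^#|#$/g, ''));          // strip leading and trailing "#"
}

console.log(extractTags("#The test# rain in #SPAIN stays mainly in the #plain"));
```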
You can also try this regex.
#(?:\b[\s\S]*?\b#|\w+)
(?: opens a non capture group for alternation
\b matches a word boundary
\w matches a word character
[\s\S] matches any character
See demo at regex101 (use with g global flag)