EDITED
I need to find two characters between '[' ']' and '/' '/' using Javascript.
I am using this regex:
([^.][/[string]]|\/string\/)|(\[(string))|(\/(string))| ((string)\])|((string)\/)
that gets two charactes but gets too one character.
The question is, how can I do to get just two characters?
Also I want to get exactly the two characters inside the string, I mean not just only the exact match.
Eg.
User input: dz
It must to find just exact matches that contains "dz", e.g. --> "dzone" but not "dazone". Currently I am getting matches with both strings, "dzone" and "dazone".
Demo: https://regex101.com/r/FEs6ib/1
You could optionally repeat any char except the delimiters between the delimiters them selves, and capture in a group what you want to keep.
If you want multiple matches for /dzone/dzone/ you could assert the last delimiter to the right instead of matching it.
The matches are in group 1 or group 2 where you can check for if they exist.
\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])
The pattern matches:
\/ Match /
[^\/]*(dz)[^\/]* Capture dz in group 1 between optional chars other than /
(?=\/) Positive lookahead, assert / to the right
| Or
\[ Match [
[^\][]*(dz)[^\][]* Capture dz in group 2 between optional chars other than [ and ]
-(?=]) Positive lookahead, assert ] to the right
Regex demo
This will match 1 occurrence of dz in the word. If you want to match the whole word, the capture group can be broadened to before and after the negated character class like:
\/([^\/]*dz[^\/]*)(?=\/)|\[([^\][]*dz[^\][]*)(?=])
Regex demo
const regex = /\/[^\/]*(dz)[^\/]*(?=\/)|\[[^\][]*(dz)[^\][]*(?=])/g;
[
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s =>
console.log(
`${s} --> ${Array.from(s.matchAll(regex), m => m[2] ? m[2] : m[1])}`
)
);
If supported, you might also match all occurrences of dz between the delimiters using lookarounds with an infinite quantifier:
(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])
Regex demo
const regex = /(?<=\/[^\/]*)dz(?=[^\/]*\/)|(?<=\[[^\][]*)dz(?=[^\][]*])/g;
[
"[adzadzone]",
"[dzone]",
"/dzone/",
"/dzone/dzone/",
"/testdztest/",
"[dazone]",
"/dazone/",
"dzone",
"dazone"
].forEach(s => {
const m = s.match(regex);
if (m) {
console.log(`${s} --> ${s.match(regex)}`);
}
});
I just used regex101, to create the following regex.
([^,]*?)=(.*?)(?(?=, )(?:, )|(?:$))(?(?=[^,]*?=)(?:(?=[^,]*?=))|(?:$))
It seems to work perfectly for my use case of getting keys and values that are comma separated while still preserving commas in the values.
Problem is, I want to use this Regex in Node.js (JavaScript), but while writing this entire Regex in regex101, I had it set to PCRE (PHP).
It looks like JavaScript doesn't support Conditional Lookaheads ((?(?=...)()|()).
Is there a way to get this working in JavaScript?
Examples:
2 matches
group 1: id, group 2: 1
group 1: name, group 2: bob
id=1, name=bob
3 matches
group 1: id, group 2: 2
group 1: type, group 2: store
group 1: description, group 2: Hardwood Store
id=2, type=store, description=Hardwood Store
4 matches
group 1: id, group 2: 4
group 1: type, group 2: road
group 1: name, group 2: The longest road name, in the entire world, and universe, forever
group 1: built, group 2: 20190714
id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714
3 matches
group 1: id, group 2: 3
group 1: type, group 2: building
group 1: builder, group 2: Random Name, and Other Person, with help from Final Person
id=3, type=building, builder=Random Name, and Other Person, with help from Final Person
You may use
/([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g
See the regex demo.
Details
([^,=\s][^,=]*) - Group 1:
[^,=\s] - a char other than ,, = and whitespace
[^,=]* - zero or more chars other than , and =
= - a = char
(.*?) - Group 2: any zero or more chars other than line break chars, as few as possible
(?=(?:,\s*)?[^,=]*=|$) - a positive lookahead that requires an optional sequence of , and 0+ whitespaces and then 0+ chars other than , and = and then a = or end of string immediately to the right of the current location
JS demo:
var strs = ['id=1, name=bob','id=2, type=store, description=Hardwood Store', 'id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714','id=3, type=building, builder=Random Name, and Other Person, with help from Final Person']
var rx = /([^,=\s][^,=]*)=(.*?)(?=(?:,\s*)?[^,=]*=|$)/g;
for (var s of strs) {
console.log("STRING:", s);
var m;
while (m=rx.exec(s)) {
console.log(m[1], m[2])
}
}
Maybe, these expressions would be somewhat close to what you might want to design:
([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)
or
\s*([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)
not sure though.
DEMO
The expression is explained on the top right panel of this demo if you wish to explore/simplify/modify it.
const regex = /\s*([^=\n\r]*)=\s*([^=\n\r]*)\s*(?:,|$)/gm;
const str = `id=3, type=building, builder=Random Name, and Other Person, with help from Final Person
id=4, type=road, name=The longest road name, in the entire world, and universe, forever, built=20190714
id=2, type=store, description=Hardwood Store
id=1, name=bob
`;
let m;
while ((m = regex.exec(str)) !== null) {
// This is necessary to avoid infinite loops with zero-width matches
if (m.index === regex.lastIndex) {
regex.lastIndex++;
}
// The result can be accessed through the `m`-variable.
m.forEach((match, groupIndex) => {
console.log(`Found match, group ${groupIndex}: ${match}`);
});
}
RegEx Circuit
jex.im visualizes regular expressions:
Yet another way to do it
\s*([^,=]*?)\s*=\s*((?:(?![^,=]*=)[\S\s])*)(?=[=,]|$)
https://regex101.com/r/J6SSGr/1
Readable version
\s*
( [^,=]*? ) # (1), Key
\s* = \s* # =
( # (2 start), Value
(?:
(?! [^,=]* = )
[\S\s]
)*
) # (2 end)
(?= [=,] | $ )
Ultimate PCRE version
\s*([^,=]*?)\s*=\s*((?:(?!\s*[^,=]*=)[\S\s])*(?<![,\s]))\s*(?=[=,\s]|$)
https://regex101.com/r/slfMR1/1
\s* # Wsp trim
( [^,=]*? ) # (1), Key
\s* = \s* # Wsp trim = Wsp trim
( # (2 start), Value
(?:
(?! \s* [^,=]* = )
[\S\s]
)*
(?<! [,\s] ) # Wsp trim
) # (2 end)
\s* # Wsp trim
(?= [=,\s] | $ ) # Field seperator
I found some threads about extracting version number from a string on here but none that does exactly what I want.
How can I filter out the following version numbers from a string with javascript/regex?
Title_v1_1.00.mov filters 1
v.1.0.1-Title.mp3 filters 1.0.1
Title V.3.4A. filters 3.4A
V3.0.4b mix v2 filters 3.0.4b
So look for the first occurrence of: "v" or "v." followed by a digit, followed by digits, letters or dots until either the end of the string or until a whitepace occurs or until a dot (.) occurs with no digit after it.
As per the comments, to match the first version number in the string you could use a capturing group:
^.*?v\.?(\d+(?:\.\d+[a-z]?)*)
Regex demo
That will match:
^ Assert the start of the string
.*? Match 0+ any character non greedy
v\.? Match v followed by an optional dot
( Capturing group
\d+ Match 1+ digits
(?: Non capturing group
\.\d+[a-z]? Match a dot, 1+ digits followed by an optional character a-z
)* Close non capturing group and repeat 0+ times
) Close capturing group
If the character like A in V.3.4A can only be in the last part, you could use:
^.*?v\.?(\d+(?:\.\d+)*[a-z]?)
const strings = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let pattern = /^.*?v\.?(\d+(?:\.\d+[a-z]?)*)/i;
strings.forEach((s) => {
console.log(s.match(pattern)[1]);
});
Details:
v - character "v"
(?:\.)? - matches 1 or 0 repetition of "."
Version capturing group
[0-9a-z\.]* - Matches alphanumeric and "." character
[0-9a-z] - ensures that version number don't ends with "."
You can use RegExp.exec() method to extract matches from string one by one.
const regex = /v(?:\.?)([0-9a-z\.]*[0-9a-z]).*/gi;
let str = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b"
];
let versions = [];
let v; // variable to store match
for(let i = 0; i < str.length; i++) {
// Executes a check on str[i] to get the result of first capturing group i.e., our version number
if( (v = regex.exec(str[i])) !== null)
versions.push(v[1]); // appends the version number to the array
// If not found, then it checks again if there is a match present or not
else if(str[i].match(regex) !== null)
i--; // if match found then it loops over the same string again
}
console.log(versions);
var test = [
"Title_v1_1.00.mov filters 1",
"v.1.0.1-Title.mp3 filters 1.0.1",
"Title V.3.4A. filters 3.4A",
"V3.0.4b mix v2 filters 3.0.4b",
];
console.log(test.map(function (a) {
return a.match(/v\.?([0-9a-z]+(?:\.[0-9a-z]+)*)/i)[1];
}));
Explanation:
/ # regex delimiter
v # letter v
\.? # optional dot
( # start group 1, it will contain the version number
[0-9a-z]+ # 1 or more alphanumeric
(?: # start non capture group
\. # a dot
[0-9a-z]+ # 1 or more alphanumeric
)* # end group, may appear 0 or more times
) # end group 1
/i # regex delimiter and flag case insensitive
Criteria:
any word that start with a and end with b having middle char digit. this word should not be on the line which start with char '#'
Given string:
a1b a2b a3b
#a4b a5b a6b
a7b a8b a9b
Expected output:
a1b
a2b
a3b
a7b
a8b
a9b
regex: ?i need it for javascipt.
So far tried below thing:
var text_content =above_mention_content
var reg_exp = /^[^#]?a[0-9]b/gmi;
var matched_text = text_content.match(reg_exp);
console.log(matched_text);
Getting below output:
[ 'a1b', ' a7b' ]
Your /^[^#]?a[0-9]b/gmi will match multiple occurrences of the pattern matching the start of line, then 1 or 0 chars other than #, then a, digit and b. No checking for a whole word, nor actually matching words farther than at the beginning of a string.
You may use a regex that will match lines starting with # and match and capture the words you need in other contexts:
var s = "a1b a2b a3b\n#a4b a5b a6b\n a7b a8b a9b";
var res = [];
s.replace(/^[^\S\r\n]*#.*|\b(a\db)\b/gm, function($0,$1) {
if ($1) res.push($1);
});
console.log(res);
Pattern details:
^ - start of a line (as m multiline modifier makes ^ match the line start)
[^\S\r\n]* - 0+ horizontal whitespaces
#.* - a # and any 0+ chars up to the end of a line
| - or
\b - a leading word boundary
(a\db) - Group 1 capturing a, a digit, a b
\b - a trailing word boundary.
Inside the replace() method, a callback is used where the res array is populated with the contents of Group 1 only.
I would suggest to use 2 reg ex:
First Reg ex fetches the non-hashed lines:
^[^#][a\db\s]+
and then another reg ex for fetching individual words(from each line):
^a\db\s
When I parse Amazon products I get this such of string.
"#19 in Home Improvements (See top 100)"
I figured how to retrieve BSR number which is /#\d*/
But have no idea how to retrieve Category which is going after in and end until brackets (See top 100).
I suggest
#(\d+)\s+in\s+([^(]+?)\s*\(
See the regex demo
var re = /#(\d+)\s+in\s+([^(]+?)\s*\(/;
var str = '#19 in Home Improvements (See top 100)';
var m = re.exec(str);
if (m) {
console.log(m[1]);
console.log(m[2]);
}
Pattern details:
# - a hash
(\d+) - Group 1 capturing 1 or more digits
\s+in\s+ - in enclosed with 1 or more whitespaces
([^(]+?) - Group 2 capturing 1 or more chars other than ( as few as possible before th first...
\s*\( - 0+ whitespaces and a literal (.