Javascript RegExp non-capturing groups

I am writing a set of RegExps to translate a CSS selector into arrays of ids and classes.
For example, I would like '#foo#bar' to return ['foo', 'bar'].
I have been trying to achieve this with
"#foo#bar".match(/((?:#)[a-zA-Z0-9\-_]*)/g)
but it returns ['#foo', '#bar'], when the non-capturing prefix ?: should ignore the # character.
Is there a better solution than slicing each one of the returned strings?

You could use .replace() or .exec() in a loop to build an Array.
With .replace():
var arr = [];
"#foo#bar".replace(/#([a-zA-Z0-9\-_]*)/g, function(s, g1) {
arr.push(g1);
});
With .exec():
var arr = [],
s = "#foo#bar",
re = /#([a-zA-Z0-9\-_]*)/g,
item;
while (item = re.exec(s))
arr.push(item[1]);
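The exec() loop works because the regex has the g flag. A global regex remembers re.lastIndex between calls, so each exec() resumes after the previous match, and it returns null once no more matches remain. A reusable version might look like the sketch below. collectGroup is a hypothetical helper name, and the zero-length-match guard is an extra precaution that only matters for patterns that can match an empty string.

```javascript
// Hypothetical helper (not from the answer above): collect one capture
// group from every match of a global regex using the exec() loop.
function collectGroup(str, re, group = 1) {
  if (!re.global) {
    // Without /g, exec() always starts at index 0 and the loop never ends.
    throw new TypeError("collectGroup needs a regex with the g flag");
  }
  const out = [];
  re.lastIndex = 0; // start from the beginning even if re was used before
  let m;
  while ((m = re.exec(str)) !== null) {
    out.push(m[group]);
    // A zero-length match would not advance lastIndex, so step past it.
    if (m[0] === "") re.lastIndex++;
  }
  return out;
}

console.log(collectGroup("#foo#bar", /#([a-zA-Z0-9\-_]*)/g)); // [ 'foo', 'bar' ]
```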

It matches #foo and #bar because the outer group (#1) is capturing. The inner group (#2) is not, but that's probably not what you are checking.
If you were not using global matching mode, an immediate fix would be to use /(?:#)([a-zA-Z0-9\-_]*)/ instead and read the captured name from index 1 of the result.
With global matching mode the result cannot be had in just one line, because match() then returns only the whole matches and drops the capture groups. Using regular expressions only (i.e. no string operations), you would need to do it this way:
var re = /(?:#)([a-zA-Z0-9\-_]*)/g;
var matches = [], match;
while (match = re.exec("#foo#bar")) {
matches.push(match[1]);
}

I'm not sure if you can do that using match(), but you can do it by using the RegExp's exec() method:
var pattern = new RegExp('#([a-zA-Z0-9\\-_]+)', 'g'); // '\\-' in the string reaches the regex as \-
var matches, ids = [];
while (matches = pattern.exec('#foo#bar')) {
ids.push( matches[1] ); // -> 'foo' and then 'bar'
}

Unfortunately there is no lookbehind assertion in Javascript RegExp, otherwise you could do this:
/(?<=#)[a-zA-Z0-9\-_]*/g
Other than it being added to some new version of Javascript, I think using the split post processing is your best bet.

You can use a negative lookahead assertion (strictly speaking it is redundant here, because the character class already excludes #):
"#foo#bar".match(/(?!#)[a-zA-Z0-9\-_]+/g); // ["foo", "bar"]

The lookbehind assertion mentioned some years ago by mVChr was added in ECMAScript 2018. This allows you to do this:
'#foo#bar'.match(/(?<=#)[a-zA-Z0-9\-_]*/g) (returns ["foo", "bar"])
(A negative lookbehind is also possible: (?<!#) asserts that the preceding character is not #, without consuming or capturing anything.)
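Applied to the original goal of turning a selector into arrays of ids and classes, the lookbehind approach might look like the sketch below. It assumes an ES2018+ engine and simple selectors made only of a tag name, #ids and .classes; parseSimpleSelector is a hypothetical name. [\w-] is the same character set as [a-zA-Z0-9\-_].

```javascript
// Sketch assuming lookbehind support (ES2018+) and simple selectors only.
function parseSimpleSelector(selector) {
  return {
    // (?<=#) requires a preceding # without including it in the match.
    ids: selector.match(/(?<=#)[\w-]+/g) || [],
    // (?<=\.) does the same for a preceding dot.
    classes: selector.match(/(?<=\.)[\w-]+/g) || [],
  };
}

console.log(parseSimpleSelector("div#foo.bar.baz#qux"));
// { ids: [ 'foo', 'qux' ], classes: [ 'bar', 'baz' ] }
```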

MDN does document that "Capture groups are ignored when using match() with the global /g flag", and recommends using matchAll(). At the time of writing, matchAll() wasn't available on Edge or Safari iOS, and you still need to skip the complete match (including the #).
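Where matchAll() is available (ES2020+), a version might look like this sketch. Each m is a full match array, so m[1] is the capture group without the #.

```javascript
// Sketch for ES2020+ engines: matchAll() returns an iterator of full match
// arrays, so the capture group is still available as m[1] for every match.
// The regex must have the g flag; matchAll() throws a TypeError otherwise.
const ids = Array.from("#foo#bar".matchAll(/#([a-zA-Z0-9\-_]+)/g), m => m[1]);
console.log(ids); // [ 'foo', 'bar' ]
```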
A simpler solution is to slice off the leading prefix, if you know its length - here, 1 for #.
const results = ('#foo#bar'.match(/#\w+/g) || []).map(s => s.slice(1));
console.log(results);
The || [] part is necessary in case there is no match: match then returns null, and null.map would throw a TypeError.
const results = ('nothing matches'.match(/#\w+/g) || []).map(s => s.slice(1));
console.log(results);

Related

Regexp group not excluding dots

Let's say I have the following string: div.classOneA.classOneB#idOne
Trying to write a regexp which extracts the classes (classOneA, classOneB) from it. I was able to do this but with Lookbehind assertion only.
It looks like this:
'div.classOneA.classOneB#idOne'.match(/(?<=\.)([^.#]+)/g)
> (2) ["classOneA", "classOneB"]
Now I would like to achieve this without the lookbehind approach, and I don't really understand why my solution isn't working.
'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
> (2) [".classOneA", ".classOneB"]
I thought the grouping would solve my problem, but every matched item contains the dot as well.
There isn't a good way in JavaScript (before matchAll() in ES2020) to both match multiple times (/g option) and pick up capture groups (in the parens). Try this:
var input = "div.classOneA.classOneB#idOne";
var regex = /\.([^.#]+)/g;
var matches, output = [];
while (matches = regex.exec(input)) {
output.push(matches[1]);
}
This is because with the g modifier you get all matching substrings but not their capture groups (that is, as if (...) pairs worked just like (?:...) ones).
You can see it here. Without the g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/)
[ '.classOneA',
'classOneA',
index: 3,
input: 'div.classOneA.classOneB#idOne',
groups: undefined ]
With the g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
[ '.classOneA', '.classOneB' ]
In other words: you obtain all matches, but only the whole match (item 0) for each one.
There are many solutions:
1. Use lookbehind assertions, as you pointed out yourself.
2. Fix each result afterwards by adding .map(x=>x.replace(/^\./, ""))
3. Or, if your input structure won't be much more complicated than the example you provide, simply use a cheaper approach:
> 'div.classOneA.classOneB#idOne'.replace(/#.*/, "").split(".").slice(1)
[ 'classOneA', 'classOneB' ]
4. Use .replace() with a callback instead of .match(), in order to be able to access the capture groups of every match:
const str = 'div.classOneA.classOneB#idOne';
const matches = [];
str.replace(/\.([^.#]+)/g, (...args)=>matches.push(args[1]))
console.log(matches); // [ 'classOneA', 'classOneB' ]
I would recommend the third one (unless other possible inputs could eventually break it), because it is much more efficient: a regular expression is used only once, to trim the '#idOne' part.
If you want to keep your regex, you can simply map over the results and replace the . with an empty string:
let op = 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
.map(e=> e.replace(/\./g,''))
console.log(op)
If you know you are searching for a text containing class, then you can use something like
'div.classOneA.classOneB#idOne'.match(/class[^.#]+/g)
If the only thing you know is that the text is preceded by a dot, then you need lookbehind (or post-processing of the matches, as in the other answers).
This regex will work without lookbehind assertion:
'div.classOneA.classOneB#idOne'.match(/\.[^\.#]+/g).map(item => item.substring(1));
Lookbehind assertions were only added to JavaScript recently (ECMAScript 2018), so they are not available in every engine.
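Because older engines reject lookbehind syntax outright, one option is to detect support at runtime and fall back to slicing off the dot. This is a sketch; supportsLookbehind and getClasses are hypothetical names. The lookbehind regex is built with the RegExp constructor so that an old engine only fails inside the try block, rather than when parsing the whole script (a regex literal containing (?<=...) would be a syntax error there).

```javascript
// Sketch: feature-detect lookbehind, falling back to slicing off the dot.
const supportsLookbehind = (() => {
  try {
    new RegExp("(?<=a)b"); // SyntaxError in engines without lookbehind
    return true;
  } catch (e) {
    return false;
  }
})();

function getClasses(selector) {
  if (supportsLookbehind) {
    // Built with the constructor, so older engines never have to parse it.
    return selector.match(new RegExp("(?<=\\.)[^.#]+", "g")) || [];
  }
  // Fallback: match the dot too, then drop it from each result.
  return (selector.match(/\.[^.#]+/g) || []).map(s => s.substring(1));
}

console.log(getClasses("div.classOneA.classOneB#idOne")); // [ 'classOneA', 'classOneB' ]
```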
I'm not an expert on using regex - particularly in Javascript - but after some research on MDN I've figured out why your attempt wasn't working, and how to fix.
The problem is that using .match with a regexp with the /g flag will ignore capturing groups. So instead you have to use the .exec method on the regexp object, using a loop to execute it multiple times to get all the results.
So the following code is what works, and can be adapted for similar cases. (Note the grp[1] - this is because the first element of the array returned by .exec is the entire match, the groups are the subsequent elements.)
var regExp = /\.([^.#]+)/g
var result = [];
var grp;
while ((grp = regExp.exec('div.classOneA.classOneB#idOne')) !== null) {
result.push(grp[1]);
}
console.log(result)

Match regex using array

I made this in C++ and I wanted to convert to JavaScript:
foreach (QString pattern, extensions) {
regex.setPattern(QString("\\.%1").arg(pattern));
regex.setPatternOptions(QRegularExpression::CaseInsensitiveOption);
QRegularExpressionMatch match = regex.match(filename);
if (! match.hasMatch()) continue;
return pattern;
}
That is, for each pattern in extensions (an array of extensions), it creates a regex like \\.png (for example).
If there's a match it will return the found extension.
I tried to do exactly what I did in C++, but I don't know how to concatenate the string from the array into the pattern passed to match:
const filename = 'example.wutt'
const extensions = ['wutt', 'xnss']
extensions.forEach(pattern => {
const match = filename.match(`\\.${pattern}`)
console.log(match)
})
It does work, but it's not case-insensitive because I can't pass the i flag.
How can I do that (and if there's a solution using ES6)?
Have a look at How do you use a variable in a regular expression? for building the regex.
If you want to find the extension that matches, you can use Array#find:
const matchedExtension = extensions.find(
ext => new RegExp(String.raw`\.${ext}$`, 'i').test(filename)
);
var extensions = ['png', 'jpeg'];
var filename = 'foo.png';
console.log(extensions.find(
ext => new RegExp(String.raw `\.${ext}$`, 'i').test(filename)
));
Couple of notes:
String.raw is necessary so that \. is not treated as a string escape sequence but passed "as is" to the regular expression engine (alternatively you could escape the \ as \\, but String.raw is cool).
$ at the end of the pattern ensures that the pattern is only matched at the end of the file name.
If you just want to know whether a pattern matches or not, RegExp#test is the preferred method.
If you are doing this a lot it makes sense to generate an array of regular expressions first (instead of creating the regex every time you call the function).
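Following that last note, one way to build the case-insensitive regexes only once is sketched below. makeExtensionMatcher and escapeRegExp are hypothetical helpers: escapeRegExp uses the common escaping pattern shown on MDN, in case an extension ever contains regex metacharacters, and the pattern writes the dot as \\. in an ordinary template literal instead of using String.raw.

```javascript
// Hypothetical helper: escape characters that have a special meaning in regexes.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: compile one case-insensitive regex per extension, once.
function makeExtensionMatcher(extensions) {
  const patterns = extensions.map(ext => ({
    ext,
    re: new RegExp(`\\.${escapeRegExp(ext)}$`, "i"), // e.g. /\.png$/i
  }));
  // Returns the first extension whose pattern matches, or undefined.
  return filename => {
    const hit = patterns.find(p => p.re.test(filename));
    return hit && hit.ext;
  };
}

const findExtension = makeExtensionMatcher(["png", "jpeg"]);
console.log(findExtension("FOO.PNG"));   // png
console.log(findExtension("notes.txt")); // undefined
```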
You can use the RegExp constructor with "i" passed as the second argument:
extensions.forEach(pattern => {
const match = filename.match(new RegExp(`\\.${pattern}$`, "i"));
console.log(match);
})

Recursive regex match

I use /<?=>|[^\s\w]|\w+/g to match
<=>
=>
words/letters
but I also want to match K(a,...) where a can be any word/letter and ... can be anything also matched in this final regex. So it actually has to be recursive.
So the new regex should match
<=>
=>
words/letters
K(a,...)
where ... matches
<=>
=>
words/letters
K(a,...)
and so on...
I am not sure if this is possible.
I am not sure if it might be easier to create a function that walks through each character in a string recursively, which is something like https://en.wikipedia.org/wiki/Recursive_descent_parser
You can use the below regexp to match. Based on the example that you gave the below would work.
/(K\(.+,(.+)\)|<?=>)|\w+\1/g
The regex used in the JavaScript below doesn't use recursion, since that's not available in standard JavaScript regex.
Instead, it takes advantage of the fact that all the ) are at the end, as in the example string from the comments.
var str = "ps<=>q=>pb=>K(ab,K(b,K(c,p => q))) not)";
var re = /\w+(?:\([^()]+)+[)]+|<?=>|\w+(?=<?=>)/g;
var matchArray = [];
var m;
while (m = re.exec(str)) {
matchArray.push(m[0]);
}
console.log(matchArray);
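Following the recursive-descent idea from the question, here is a minimal sketch that reuses the question's regex only as a tokenizer and handles the nesting with ordinary recursive functions. The grammar it accepts is an assumption inferred from the question, and parseExpression is a hypothetical name. Unbalanced input, such as the extra ) at the end of the comment example, throws a SyntaxError instead of being matched partially.

```javascript
// Sketch of a recursive-descent parser for the question's input.
// Assumed grammar (inferred from the question, not a spec):
//   items := item*
//   item  := "<=>" | "=>" | word | other single symbol | "K" "(" word "," items ")"
function parseExpression(str) {
  // Reuse the question's regex purely as a tokenizer.
  const tokens = str.match(/<?=>|[^\s\w]|\w+/g) || [];
  let pos = 0;

  function expect(tok) {
    if (tokens[pos] !== tok) {
      throw new SyntaxError(`Expected "${tok}" at token ${pos}, got "${tokens[pos]}"`);
    }
    pos++;
  }

  // Parse items until a ")" (end of an enclosing K(...)) or the end of input.
  function parseItems() {
    const items = [];
    while (pos < tokens.length && tokens[pos] !== ")") {
      items.push(parseItem());
    }
    return items;
  }

  function parseItem() {
    if (tokens[pos] === "K" && tokens[pos + 1] === "(") {
      pos += 2; // consume "K" and "("
      const arg = tokens[pos];
      if (!/^\w+$/.test(arg || "")) {
        throw new SyntaxError(`Expected a word after "K(" at token ${pos}`);
      }
      pos++;
      expect(",");
      const body = parseItems(); // recursion handles nested K(...)
      expect(")");
      return { type: "K", arg, body };
    }
    return tokens[pos++]; // "<=>", "=>", a word, or another symbol
  }

  const result = parseItems();
  if (pos !== tokens.length) {
    throw new SyntaxError(`Unexpected "${tokens[pos]}" at token ${pos}`);
  }
  return result;
}

console.log(JSON.stringify(parseExpression("p<=>K(a,q=>K(b,r))")));
// ["p","<=>",{"type":"K","arg":"a","body":["q","=>",{"type":"K","arg":"b","body":["r"]}]}]
```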

Javascript - Regex finding multiple parentheses matches

So currently, my code works for inputs that contain one set of parentheses.
var re = /^.*\((.*\)).*$/;
var inPar = userIn.replace(re, '$1');
...meaning when the user enters the chemical formula Cu(NO3)2, alerting inPar returns NO3) , which I want.
However, if Cu(NO3)2(CO2)3 is the input, only CO2) is being returned.
I'm not too knowledgeable in RegEx, so why is this happening, and is there a way I could put NO3) and CO2) into an array after they are found?
You want to use String.match instead of String.replace. You'll also want your regex to match multiple strings in parentheses, so you can't have ^ (start of string) and $ (end of string). And we can't be greedy when matching inside the parentheses, so we'll use .*?
Stepping through the changes, we get:
// Use Match
"Cu(NO3)2(CO2)3".match(/^.*\((.*\)).*$/);
["Cu(NO3)2(CO2)3", "CO2)"]
// Let's stop including the ) in our match
"Cu(NO3)2(CO2)3".match(/^.*\((.*)\).*$/);
["Cu(NO3)2(CO2)3", "CO2"]
// Instead of matching the entire string, let's search for just what we want
"Cu(NO3)2(CO2)3".match(/\((.*)\)/);
["(NO3)2(CO2)", "NO3)2(CO2"]
// Oops, we're being a bit too greedy, and capturing everything in a single match
"Cu(NO3)2(CO2)3".match(/\((.*?)\)/);
["(NO3)", "NO3"]
// Looks like we're only searching for a single result. Let's add the Global flag
"Cu(NO3)2(CO2)3".match(/\((.*?)\)/g);
["(NO3)", "(CO2)"]
// Global captures the entire match and ignores our capture groups, so let's remove them
"Cu(NO3)2(CO2)3".match(/\(.*?\)/g);
["(NO3)", "(CO2)"]
// Now to remove the parentheses. We can use Array.prototype.map for that!
var elements = "Cu(NO3)2(CO2)3".match(/\(.*?\)/g);
elements = elements.map(function(match) { return match.slice(1, -1); })
["NO3", "CO2"]
// And if you want the closing parenthesis as Fabrício Matté mentioned
var elements = "Cu(NO3)2(CO2)3".match(/\(.*?\)/g);
elements = elements.map(function(match) { return match.slice(1); })
["NO3)", "CO2)"]
Your regex has anchors to match beginning and end of the string, so it won't suffice to match multiple occurrences. Updated code using String.match with the RegExp g flag (global modifier):
var userIn = 'Cu(NO3)2(CO2)3';
var inPar = userIn.match(/\([^)]*\)/g).map(function(s){ return s.slice(1); });
inPar; //["NO3)", "CO2)"]
In case you need old IE support: Array.prototype.map polyfill
Or without polyfills:
var userIn = 'Cu(NO3)2(CO2)3';
var inPar = [];
userIn.replace(/\(([^)]*\))/g, function(s, m) { inPar.push(m); });
inPar; //["NO3)", "CO2)"]
The above matches a ( and captures a sequence of zero or more non-) characters followed by a ), then pushes that capture to the inPar array.
The first regex does essentially the same, but uses the entire match including the opening ( parenthesis (which is later removed by mapping the array) instead of a capturing group.
From the question I assume the closing ) parenthesis is expected to be in the resulting strings, otherwise here are the updated solutions without the closing parenthesis:
For the first solution (using s.slice(1, -1)):
var inPar = userIn.match(/\([^)]*\)/g).map(function(s){ return s.slice(1, -1);});
For the second solution (\) outside of the capturing group):
userIn.replace(/\(([^)]*)\)/g, function(s, m) { inPar.push(m); });
You could try the below:
"Cu(NO3)2".match(/(\S\S\d)/gi) // returns NO3
"Cu(NO3)2(CO2)3".match(/(\S\S\d)/gi) // returns NO3 CO2

Can regex matches in javascript match any word after an equal operator?

I am trying to target ?state=wildcard in this query string:
?state=uncompleted&dancing=yes
I would like to target the entire ?state=uncompleted, but also allow it to find whatever word comes after the = operator. So uncompleted could also be completed, unscheduled, or what have you.
One caveat: granted, I could target the wildcard up to the ampersand, but what if there is no ampersand and the state param is by itself?
Try this regular expression:
var regex = /\?state=([^&]+)/;
var match = '?state=uncompleted&dancing=yes'.match(regex);
match; // => ["?state=uncompleted", "uncompleted"]
It will match every character after ?state= except an ampersand, running all the way to the end of the string if necessary.
Alternative regex: /\?state=(.+?)(?:&|$)/
It will match everything up to the first & char or the end of the string.
IMHO, you don't need regex here. As we all know, regexes tend to be slow, especially when using lookaheads. Why not do something like this:
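To cover the caveat from the question (state with or without a following &), the regex approach can be wrapped in a small function. getState is a hypothetical helper, and the [?&] prefix is a small variation beyond the answers here, letting state also appear after another parameter.

```javascript
// Sketch: return the value of state, or null when it is absent.
// [?&] lets state appear first or after another parameter, and
// ([^&]*) stops at the next & or at the end of the string.
function getState(query) {
  const m = query.match(/[?&]state=([^&]*)/);
  return m ? m[1] : null;
}

console.log(getState("?state=uncompleted&dancing=yes"));  // uncompleted
console.log(getState("?state=completed"));                // completed
console.log(getState("?dancing=yes&state=unscheduled"));  // unscheduled
console.log(getState("?dancing=yes"));                    // null
```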
var URI = '?state=done&user=ME'.split('&');
var passedVals = [];
This gives us ['?state=done','user=ME'], now just do a for loop:
for (var i=0;i<URI.length;i++)
{
passedVals.push(URI[i].split('=')[1]);
}
passedVals will contain whatever you need. The added benefit of this is that you can parse a request into an object:
var URI = 'state=done&user=ME'.split('&');
var urlObjects ={};
for (var i=0;i<URI.length;i++)
{
urlObjects[URI[i].split('=')[0]] = URI[i].split('=')[1];
}
I left out the '?' at the start of the string, because a simple .replace('?','') can fix that easily...
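In current browsers and Node.js, the standard URLSearchParams API can do this parsing (an alternative to the manual loop, not part of the original answer). It ignores a leading ?, so the .replace('?','') step is not needed, and it also percent-decodes the values. paramsObject is just an illustrative variable name.

```javascript
// Sketch using the standard URLSearchParams API.
const params = new URLSearchParams("?state=uncompleted&dancing=yes");
console.log(params.get("state"));   // uncompleted
console.log(params.get("missing")); // null

// Rough equivalent of the urlObjects loop above (Object.fromEntries is ES2019).
const paramsObject = Object.fromEntries(params);
console.log(paramsObject); // { state: 'uncompleted', dancing: 'yes' }
```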
You can match as many characters as possible that are not &. If there aren't any &s at all, that will of course also work:
/(\?state=[^&]+)/.exec("?state=uncompleted");
/(\?state=[^&]+)/.exec("?state=uncompleted&a=1");
// both: ["?state=uncompleted", "?state=uncompleted"]
