Regexp group not excluding dots - javascript

Let's say I have the following string: div.classOneA.classOneB#idOne
Trying to write a regexp which extracts the classes (classOneA, classOneB) from it. I was able to do this but with Lookbehind assertion only.
It looks like this:
'div.classOneA.classOneB#idOne'.match(/(?<=\.)([^.#]+)/g)
> (2) ["classOneA", "classOneB"]
Now I would like to achieve this without the lookbehind approach, and I do not really understand why my solution isn't working.
'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
> (2) [".classOneA", ".classOneB"]
I thought the grouping would solve my problem, but every matched item still contains the dot.

String.prototype.match() in Javascript cannot both match multiple times (/g option) and return capture groups (in the parens). Try this:
var input = "div.classOneA.classOneB#idOne";
var regex = /\.([^.#]+)/g;
var matches, output = [];
while (matches = regex.exec(input)) {
output.push(matches[1]);
}
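On engines that support String.prototype.matchAll() (mentioned further down this page), the same loop can probably be written more compactly. A minimal sketch, assuming such an engine is available; the variable names are only illustrative:

```javascript
const input = "div.classOneA.classOneB#idOne";

// matchAll() requires the g flag and yields one match array per hit.
// Index 0 is the whole match, index 1 is the first capture group.
const classes = Array.from(input.matchAll(/\.([^.#]+)/g), (m) => m[1]);

console.log(classes); // ["classOneA", "classOneB"]
```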

This is because with the g modifier you get all matching substrings but not their capture groups (that is, as if (...) pairs worked just like (?:...) ones).
Compare. Without the g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/)
[ '.classOneA',
'classOneA',
index: 3,
input: 'div.classOneA.classOneB#idOne',
groups: undefined ]
With g modifier:
> 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
[ '.classOneA', '.classOneB' ]
In other words: you obtain all matches, but only the whole match (item 0) for each.
There are many solutions:
1. Use LookBehind assertions as you pointed out yourself.
2. Fix each result later adding .map(x=>x.replace(/^\./, ""))
3. Or, if your input structure won't be much more complicated than the example you provide, simply use a cheaper approach:
> 'div.classOneA.classOneB#idOne'.replace(/#.*/, "").split(".").slice(1)
[ 'classOneA', 'classOneB' ]
4. Use .replace() + callback instead of .match() in order to be able to access capture groups of every match:
const str = 'div.classOneA.classOneB#idOne';
const matches = [];
str.replace(/\.([^.#]+)/g, (...args)=>matches.push(args[1]))
console.log(matches); // [ 'classOneA', 'classOneB' ]
I would recommend the third one (if there aren't other possible inputs that could eventually break it) because it is much more efficient (actual regular expressions are used only once to trim the '#idOne' part).
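To make the caveat about "other possible inputs" concrete, here is a small sketch comparing the split-based approach with a capture-group loop. The function names are hypothetical, and the second input (an id placed before a class) is an assumed example, not one from the question:

```javascript
// Split-based extraction: drop everything from "#" on, then split on dots.
function classesBySplit(selector) {
  return selector.replace(/#.*/, "").split(".").slice(1);
}

// Regex-based extraction using a capture group and an exec() loop.
function classesByExec(selector) {
  const re = /\.([^.#]+)/g;
  const out = [];
  let m;
  while ((m = re.exec(selector)) !== null) out.push(m[1]);
  return out;
}

console.log(classesBySplit("div.classOneA.classOneB#idOne")); // ["classOneA", "classOneB"]
console.log(classesBySplit("div#idOne.classOneA"));           // [] (the class after the id is lost)
console.log(classesByExec("div#idOne.classOneA"));            // ["classOneA"]
```

So the cheaper approach only holds while the id, if any, comes last.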

If you want to extend your regex, you can simply map over the results and replace the . with an empty string:
let op = 'div.classOneA.classOneB#idOne'.match(/\.([^.#]+)/g)
.map(e=> e.replace(/\./g,''))
console.log(op)

If you know you are searching for a text containing class, then you can use something like
'div.classOneA.classOneB#idOne'.match(/class[^.#]+/g)
If the only thing you know is that the text is preceded by a dot, then you need a lookbehind to get the result from match() alone.

This regex will work without lookbehind assertion:
'div.classOneA.classOneB#idOne'.match(/\.[^\.#]+/g).map(item => item.substring(1));
Lookbehind assertions were not available in JavaScript until recently (they were added in ECMAScript 2018).

I'm not an expert on using regex - particularly in Javascript - but after some research on MDN I've figured out why your attempt wasn't working, and how to fix it.
The problem is that using .match with a regexp with the /g flag will ignore capturing groups. So instead you have to use the .exec method on the regexp object, using a loop to execute it multiple times to get all the results.
So the following code is what works, and can be adapted for similar cases. (Note the grp[1] - this is because the first element of the array returned by .exec is the entire match, the groups are the subsequent elements.)
var regExp = /\.([^.#]+)/g
var result = [];
var grp;
while ((grp = regExp.exec('div.classOneA.classOneB#idOne')) !== null) {
result.push(grp[1]);
}
console.log(result)
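The loop works because a regexp with the g flag remembers where the previous match ended in its lastIndex property. A small sketch of what each .exec call does on the string from the question (variable names are illustrative):

```javascript
const re = /\.([^.#]+)/g;
const text = "div.classOneA.classOneB#idOne";

// Each call resumes searching at re.lastIndex.
const first = re.exec(text);
console.log(first[1], re.lastIndex);  // "classOneA" 13

const second = re.exec(text);
console.log(second[1], re.lastIndex); // "classOneB" 23

// No more matches: exec returns null and lastIndex is reset to 0.
const third = re.exec(text);
console.log(third, re.lastIndex);     // null 0
```

This is also why the regexp should be created once, outside the loop, rather than as a fresh literal in the loop condition.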

Related

How can I include the delimiter with regex String.split()?

I need to parse the tokens from a GS1 UDI format string:
"(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
I would like to split that string with a regex on the "(nnn)" and have the delimiter included with the split values, like this:
[ "(20)987111", "(240)A", "(10)ABC123", "(17)2022-04-01", "(21)888888888888888" ]
Below is a JSFiddle with examples, but in case you want to see it right here:
// This includes the delimiter match in the results, but I want the delimiter included WITH the value
// after it, e.g.: ["(20)987111", ...]
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\))/).filter(Boolean))
// Result: ["(20)", "987111", "(240)", "A", "(10)", "ABC123", "(17)", "2022-04-01", "(21)", "888888888888888"]
// If I include a pattern that should (I think) match the content following the delimiter I will
// only get a single result that is the full string:
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)\W+)/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
// I think this is because I'm effectively matching the entire string, hence a single result.
// So now I'll try to match only up to the start of the next "(":
str = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";
console.log(str.split(/(\(\d{2,}\)(^\())/).filter(Boolean))
// Result: ["(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"]
I've found and read this question, however the examples there are matching literals and I'm using character classes and getting different results.
I'm failing to create a regex pattern that will provide what I'm after. Here's a JSFiddle of some of the things I've tried: https://jsfiddle.net/6bogpqLy/
I can't guarantee the order of the "application identifiers" in the input string and as such, match with named captures isn't an attractive option.
You can split at the positions where a parenthesised element follows, by using a zero-length lookahead assertion:
const text = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888"
const parts = text.split(/(?=\(\d+\))/)
console.log(parts)
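Since the order of the application identifiers isn't guaranteed, it may help to separate each identifier from its value after splitting. A minimal sketch building on the lookahead split; the property names ai and value are hypothetical, chosen only for illustration:

```javascript
const udi = "(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888";

// Split before every "(digits)" group, then separate identifier from value.
const fields = udi.split(/(?=\(\d+\))/).map((part) => {
  const m = /^\((\d+)\)(.*)$/.exec(part);
  return { ai: m[1], value: m[2] };
});

console.log(fields[1]); // { ai: "240", value: "A" }
```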
Instead of split, use match to create the array. The pattern 1) finds digits in parentheses, followed by a run of digits, uppercase letters, or hyphens, and then 2) groups that whole match.
(PS. I often find a site like Regex101 really helps when it comes to testing out expressions outside of a development environment.)
const re = /(\(\d+\)[\d\-A-Z]+)/g;
const str = '(20)987111(240)A(10)ABC123(17)2022-04-01(21)888888888888888';
console.log(str.match(re));
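One thing to watch with this pattern: the character class only allows digits, hyphens, and uppercase letters. If values can contain other characters (an assumption; the example string doesn't show any), a class that accepts anything up to the next opening parenthesis might be safer. A small sketch with a made-up input:

```javascript
// Hypothetical input with a lowercase value, not taken from the question.
const sample = "(10)abc123(21)X";

// Original class: the lowercase value fails to match, so "(10)..." is dropped.
const strict = sample.match(/\(\d+\)[\d\-A-Z]+/g);
console.log(strict); // ["(21)X"]

// Looser class: everything up to the next "(" counts as the value.
const loose = sample.match(/\(\d+\)[^(]+/g);
console.log(loose);  // ["(10)abc123", "(21)X"]
```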

How can I inverse matched result of the pattern?

Here is my string:
Organization 2
info#something.org.au more#something.com market#gmail.com single#noidea.com
Organization 3
headmistress#money.com head#skull.com
Also this is my pattern:
/^.*?#[^ ]+|^.*$/gm
As you see in the demo, the pattern matches this:
Organization 2
info#something.org.au
Organization 3
headmistress#money.com
My question: How can I make it inverse? I mean I want to match this:
more#something.com market#gmail.com single#noidea.com
head#skull.com
How can I do that? I could write a new (and completely different) pattern to grab the expected result, but I want to know: is "inverting the result of a pattern" possible?
No, I don't believe there is a way to directly invert a regular expression while keeping it otherwise the same.
However, you could achieve something close to what you're after by using your existing RegExp to replace its matches with an empty string:
var everythingThatDidntMatchStr = str.replace(/^.*?#[^ ]+|^.*$/gm, '');
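Here is how that could look end to end on the string from the question. This is a sketch that assumes the lines are separated by "\n"; the trimming and filtering steps are an addition to tidy up the leftover whitespace and empty lines:

```javascript
const str = [
  "Organization 2",
  "info#something.org.au more#something.com market#gmail.com single#noidea.com",
  "Organization 3",
  "headmistress#money.com head#skull.com",
].join("\n");

// Remove everything the pattern matches, then clean up what remains.
const leftover = str
  .replace(/^.*?#[^ ]+|^.*$/gm, "")
  .split("\n")
  .map((line) => line.trim())
  .filter(Boolean);

console.log(leftover);
// ["more#something.com market#gmail.com single#noidea.com", "head#skull.com"]
```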
You can also collect the matches of the first RegExp and then use Array.prototype.forEach() to replace each matched string with an empty string using String.prototype.replace():
var re = str.match(/^.*?#[^ ]+|^.*$/gm) || [];
var res = str;
// Replace each match as a plain string; new RegExp(val) would treat characters such as . as metacharacters.
re.forEach(val => { res = res.replace(val, ""); });

match everything between brackets

I need to match the text between two brackets. Many posts have been made about it, but none of them work in JavaScript because they all use lookbehind.
The text is as follows:
"{Code} - {Description}"
I need Code and Description to be matched without the brackets.
The closest I have gotten is this:
/{([\s\S]*?)(?=})/g
leaving me with "{Code" and "{Description", which I then followed with a substring call.
So... is there a way to get lookbehind-type functionality in JavaScript?
You could simply try the below regex,
[^}{]+(?=})
Code:
> "{Code} - {Description}".match(/[^}{]+(?=})/g)
[ 'Code', 'Description' ]
Use it as:
var input = '{Code} - {Description}';
var matches = [], match, re = /{([\s\S]*?)(?=})/g;
while ((match = re.exec(input)) !== null) matches.push(match[1]);
console.log(matches); // ["Code", "Description"]
Actually, in this particular case, the solution is quite easy:
s = "{Code} - {Description}"
result = s.match(/[^{}]+(?=})/g) // ["Code", "Description"]
Have you tried something like this, which doesn't need a lookahead or lookbehind:
{([^}]*)}
You would probably need to add the global flag, but it seems to work in the regex tester.
The real problem is that you need to specify what you want to capture, which you do with capture groups in regular expressions. The part of the matched regular expression inside of parentheses will be the value returned by that capture group. So in order to omit { and } from the results, you just don't include those inside of the parentheses. It is still necessary to match them in your regular expression, however.
You can see how to get the value of capture groups in JavaScript here.
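To illustrate the point about what goes inside the parentheses, here is a small sketch using non-global .exec() calls on the string from the question (variable names are illustrative):

```javascript
const sample = "{Code} - {Description}";

// Braces outside the parentheses: they are matched, but not part of group 1.
const outside = /{([^}]*)}/.exec(sample);
console.log(outside[0], outside[1]); // "{Code}" "Code"

// Braces inside the parentheses: group 1 now includes them.
const inside = /({[^}]*})/.exec(sample);
console.log(inside[0], inside[1]);   // "{Code}" "{Code}"
```

In both cases index 0 is the whole match; only the contents of group 1 change.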

JavaScript regex not returning match group

I'm trying to get the content in between square brackets within a string but my Regex isn't working.
RegExp: /\[([^\n\]]+)\]/g
It returns the correct match groups on regex101 but when I try something like '[a][b]'.match(/\[([^\n\]]+)\]/g), I get ['[a]', '[b]'] instead of ['a', 'b'].
I can get the correct results if I iterate through and do RegExp.exec, but from looking at examples online it seems like I should be able to get the match groups using String.match
You're using the String .match() method, which has different behavior from RegExp .exec() in the case of regular expressions with the "g" flag. The .match() method gives you all the complete matches across the entire searched string for "g" regular expressions.
If you change your code to
/\[([^\n\]]+)\]/g.exec('[a][b]')
you'll get the result you expect: an array in which the first entry (index 0) is the entire match, and the second and subsequent entries are the groups from the regex.
You'll have to iterate to match all of them:
var re = /\[([^\n\]]+)\]/g, search = "[a][b]", bracketed = [];
for (var m = null; m = re.exec(search); bracketed.push(m[1]));

Javascript RegExp non-capturing groups

I am writing a set of RegExps to translate a CSS selector into arrays of ids and classes.
For example, I would like '#foo#bar' to return ['foo', 'bar'].
I have been trying to achieve this with
"#foo#bar".match(/((?:#)[a-zA-Z0-9\-_]*)/g)
but it returns ['#foo', '#bar'], when the non-capturing prefix ?: should ignore the # character.
Is there a better solution than slicing each one of the returned strings?
You could use .replace() or .exec() in a loop to build an Array.
With .replace():
var arr = [];
"#foo#bar".replace(/#([a-zA-Z0-9\-_]*)/g, function(s, g1) {
arr.push(g1);
});
With .exec():
var arr = [],
s = "#foo#bar",
re = /#([a-zA-Z0-9\-_]*)/g,
item;
while (item = re.exec(s))
arr.push(item[1]);
It matches #foo and #bar because the outer group (#1) is capturing. The inner group (#2) is not, but that's probably not what you are checking.
If you were not using global matching mode, an immediate fix would be to use /(?:#)([a-zA-Z0-9\-_]*)/ instead.
With global matching mode the result cannot be had in just one line because match behaves differently. Using regular expression only (i.e. no string operations) you would need to do it this way:
var re = /(?:#)([a-zA-Z0-9\-_]*)/g;
var matches = [], match;
while (match = re.exec("#foo#bar")) {
matches.push(match[1]);
}
I'm not sure if you can do that using match(), but you can do it by using the RegExp's exec() method:
var pattern = new RegExp('#([a-zA-Z0-9\-_]+)', 'g');
var matches, ids = [];
while (matches = pattern.exec('#foo#bar')) {
ids.push( matches[1] ); // -> 'foo' and then 'bar'
}
Unfortunately there is no lookbehind assertion in Javascript RegExp, otherwise you could do this:
/(?<=#)[a-zA-Z0-9\-_]*/g
Other than it being added to some new version of Javascript, I think using the split post processing is your best bet.
You can use a negative lookahead assertion:
"#foo#bar".match(/(?!#)[a-zA-Z0-9\-_]+/g); // ["foo", "bar"]
The lookbehind assertion mentioned some years ago by mVChr is added in ECMAScript 2018. This will allow you to do this:
'#foo#bar'.match(/(?<=#)[a-zA-Z0-9\-_]*/g) (returns ["foo", "bar"])
(A negative lookbehind is also possible: use (?<!#) to assert that the preceding character is not #, without consuming it.)
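A short sketch of the two lookbehind forms side by side, assuming an engine with ECMAScript 2018 support. The input string and variable names are made up for illustration:

```javascript
const text = "a1 #b2 c3";

// Positive lookbehind: words directly preceded by "#".
const preceded = text.match(/(?<=#)\w+/g);
console.log(preceded);    // ["b2"]

// Negative lookbehind: words NOT preceded by "#". The \w inside the
// lookbehind stops a match from starting in the middle of a word.
const notPreceded = text.match(/(?<![#\w])\w+/g);
console.log(notPreceded); // ["a1", "c3"]
```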
MDN does document that "Capture groups are ignored when using match() with the global /g flag", and recommends using matchAll(). At the time of writing, matchAll() isn't available on Edge or Safari iOS, and you still need to skip the complete match (including the #).
A simpler solution is to slice off the leading prefix, if you know its length - here, 1 for #.
const results = ('#foo#bar'.match(/#\w+/g) || []).map(s => s.slice(1));
console.log(results);
The || [] part is necessary in case there was no match; otherwise match returns null, and null.map won't work.
const results = ('nothing matches'.match(/#\w+/g) || []).map(s => s.slice(1));
console.log(results);
