Recursive regex match - javascript

I use /<?=>|[^\s\w]|\w+/g to match
<=>
=>
words/letters
but I also want to match K(a,...) where a can be any word/letter and ... can be anything also matched in this final regex. So it actually has to be recursive.
So the new regex should match
<=>
=>
words/letters
K(a,...)
where ... matches
<=>
=>
words/letters
K(a,...)
and so on...
I am not sure if this is possible.
I am not sure if it might be easier to create a function that walks through each character in a string recursively, which is something like https://en.wikipedia.org/wiki/Recursive_descent_parser

You can use the below regexp to match. Based on the example that you gave the below would work.
/(K\(.+,(.+)\)|<?=>)|\w+\1/g

The regex used in the JavaScript below doesn't use recursion, since that isn't available in standard JavaScript regex.
Instead, it takes advantage of the fact that all the ) are at the end, as in the example string from the comments.
var str = "ps<=>q=>pb=>K(ab,K(b,K(c,p => q))) not)";
var re = /\w+(?:\([^()]+)+[)]+|<?=>|\w+(?=<?=>)/g;
var matchArray = [];
var m;
while (m = re.exec(str)) {
  matchArray.push(m[0]);
}
console.log(matchArray);
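The character-walking idea from the question can be sketched without regex recursion. This is only a minimal sketch, not a full recursive descent parser: the function name tokenize is hypothetical, and it assumes a K(...) call should come back as one token whose parentheses are balanced, tracked with a depth counter.

```javascript
// Hypothetical helper: split a formula into tokens, treating a balanced
// K(...) call (including any nested K(...) calls) as a single token.
function tokenize(input) {
  const tokens = [];
  // Sticky flag: the regex only matches exactly at lastIndex.
  const simple = /<?=>|\w+|[^\s\w]/y;
  let pos = 0;
  while (pos < input.length) {
    if (/\s/.test(input[pos])) { // skip whitespace between tokens
      pos++;
      continue;
    }
    if (input.startsWith('K(', pos)) {
      // Walk forward, counting parentheses until the call closes.
      let depth = 0;
      let end = pos + 1;
      for (; end < input.length; end++) {
        if (input[end] === '(') depth++;
        else if (input[end] === ')' && --depth === 0) break;
      }
      if (end === input.length) {
        throw new SyntaxError('Unbalanced parentheses starting at ' + pos);
      }
      tokens.push(input.slice(pos, end + 1));
      pos = end + 1;
      continue;
    }
    // Otherwise fall back to the original pattern from the question.
    simple.lastIndex = pos;
    tokens.push(simple.exec(input)[0]);
    pos = simple.lastIndex;
  }
  return tokens;
}

console.log(tokenize('ps<=>q=>K(ab,K(b,p => q))'));
```

To inspect the inside of a K(...) token, the same function could be called again on the text between its outer parentheses.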

Related

How can I inverse matched result of the pattern?

Here is my string:
Organization 2
info#something.org.au more#something.com market#gmail.com single#noidea.com
Organization 3
headmistress#money.com head#skull.com
Also this is my pattern:
/^.*?#[^ ]+|^.*$/gm
As you see in the demo, the pattern matches this:
Organization 2
info#something.org.au
Organization 3
headmistress#money.com
My question: How can I make it inverse? I mean I want to match this:
more#something.com market#gmail.com single#noidea.com
head#skull.com
How can I do that? Actually I can write a new (and completely different) pattern to grab expected result, but I want to know, Is "inverting the result of a pattern" possible?
No, I don't believe there is a way to directly invert a regular expression while keeping it otherwise the same.
However, you could achieve something close to what you're after by using your existing RegExp to replace its matches with an empty string:
var everythingThatDidntMatchStr = str.replace(/^.*?#[^ ]+|^.*$/gm, '');
You can also collect the matches of the first RegExp and use Array.prototype.forEach() to replace each one with an empty string via String.prototype.replace():
var re = str.match(/^.*?#[^ ]+|^.*$/gm) || [];
var res = str;
// pass each match as a plain string so characters like "." are not treated as regex syntax
re.forEach(val => res = res.replace(val, ""));
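For illustration, here is the replace approach applied to the sample data from the question. The clean-up step (trimming and dropping the emptied lines) is an assumption about what the final result should look like, not something the answers specify.

```javascript
// Sample data copied from the question.
const str = [
  'Organization 2',
  'info#something.org.au more#something.com market#gmail.com single#noidea.com',
  'Organization 3',
  'headmistress#money.com head#skull.com',
].join('\n');

// Remove everything the original pattern matches...
const leftover = str.replace(/^.*?#[^ ]+|^.*$/gm, '');

// ...then tidy up: the emptied lines and the leading spaces are still there.
const inverted = leftover
  .split('\n')
  .map(line => line.trim())
  .filter(line => line.length > 0);

console.log(inverted);
```

Without the tidy-up step, the result still contains the line breaks and the space that followed each removed match.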

Get string between “-”

I have this string: 2015-07-023. I want to get 07 from this string.
I used RegExp like this
var regExp = /\(([^)]+-)\)/;
var matches = regExp.exec(id);
console.log(matches);
But I get null as output.
Any idea is appreciated on how to properly configure the RegExp.
The best way to do it is to not use RegEx at all, you can use regular JavaScript string methods:
var id_parts = id.split('-');
alert(id_parts[1]);
JavaScript string methods are often better than RegEx because they are faster, more straightforward, and more readable. Any programmer can read this code and quickly see that it splits id into parts at each "-" and then takes the item at index 1.
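A slightly more defensive variant of the same idea, as a sketch: the helper name middlePart and the rule that exactly three parts are required are assumptions added for illustration.

```javascript
const id = '2015-07-023';

// Destructuring gives each part a name.
const [year, month, day] = id.split('-');
console.log(month);

// Hypothetical helper: return the middle part, or null when the
// input does not have exactly three "-"-separated parts.
function middlePart(value) {
  const parts = value.split('-');
  return parts.length === 3 ? parts[1] : null;
}

console.log(middlePart(id));
console.log(middlePart('2015'));
```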
If you want regex, you can use the following one. Otherwise, it's better to go with string methods as in the answer by @vihan1086.
var str = '2015-07-023';
var matches = str.match(/-(\d+)-/)[1];
document.write(matches);
Regex Explanation
-: matches - literal
(): Capturing group
\d+: Matches one or more digits
EDIT
You can also use substr as follows, if the length of the required substring is fixed.
var str = '2015-07-023';
var newStr = str.substr(str.indexOf('-') + 1, 2);
document.write(newStr);
You may try the below positive lookahead based regex.
var string = "2015-07-02";
alert(string.match(/[^-]+(?=-[^-]*$)/))

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^#\n\r]+)#)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^#\n\r]+)#)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\]\\.,;:\\s#\"]+(\\.[^<>(),[\]\\.,;:\\s#\"]+)*)',
'|(\\".+\\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
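The escaping note above can be checked directly. This is a small sketch; the String.raw variant is an addition that the answer does not mention, shown because it avoids doubling the backslashes.

```javascript
// The same pattern written three ways.
const fromLiteral = /\d+\.\d+/g;
const fromString = new RegExp('\\d+\\.\\d+', 'g');        // backslashes doubled
const fromRaw = new RegExp(String.raw`\d+\.\d+`, 'g');    // raw string, no doubling

console.log(fromLiteral.source === fromString.source);
console.log(fromLiteral.source === fromRaw.source);

// Forgetting to double the backslash silently changes the pattern:
// the string literal '\d+' is just 'd+'.
const broken = new RegExp('\d+');
console.log(broken.source);
```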
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|
(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
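One way to follow that advice is to build the combined regex once at module level and reuse it inside the functions that need it. A minimal sketch, where joinRegExp, DATE_RE and parseDate are hypothetical names and the date pattern is only an example:

```javascript
// Hypothetical helper, same idea as multilineRegExp above.
function joinRegExp(parts, flags) {
  return new RegExp(parts.map(part => part.source).join(''), flags);
}

// Built once, when the script loads, not on every call.
const DATE_RE = joinRegExp([
  /^(\d{4})/,  // year
  /-(\d{2})/,  // month
  /-(\d{2})$/  // day
]);

function parseDate(text) {
  const m = DATE_RE.exec(text); // reuses the prebuilt regex
  return m ? { year: m[1], month: m[2], day: m[3] } : null;
}

console.log(parseDate('2015-07-23'));
console.log(parseDate('not a date'));
```

The regex here has no g flag, so exec() keeps no state between calls and sharing one instance is safe.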
Thanks to the wondrous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo?
// r is the regex built with the template above
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
  let parsed;
  parsed = input.match(r);
  input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox did not support lookbehinds or named groups at the time. So note that the example given in this answer is actually a neutered version and might easily get tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
The regex above is missing some backslashes, so it doesn't work properly. I edited the regex accordingly. Please consider this version, which works 99.99% of the time for email validation.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s#\"]+(\\.[^<>()\\[\\]\\\.,;:\\s#\"]+)*)',
'|(".+"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax:
// every backslash is doubled because the pattern is written as string literals
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s#"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s#"]+)*)|(".+"))#' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');
You can simply use string concatenation.
// every backslash is doubled because the pattern is written as string literals
var patternString = "^(([^<>()[\\]\\\\.,;:\\s#\"]+(\\.[^<>()[\\]\\\\.,;:\\s#\"]+)*)|"+
"(\".+\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var pattern = new RegExp(patternString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+#\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s#\"]+(\\.[^<>()[\\]\\\\.,;:\\s#\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")#(" + host1 + "|" + host2 + ")$");
@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i

Javascript RegExp non-capturing groups

I am writing a set of RegExps to translate a CSS selector into arrays of ids and classes.
For example, I would like '#foo#bar' to return ['foo', 'bar'].
I have been trying to achieve this with
"#foo#bar".match(/((?:#)[a-zA-Z0-9\-_]*)/g)
but it returns ['#foo', '#bar'], when the non-capturing prefix ?: should ignore the # character.
Is there a better solution than slicing each one of the returned strings?
You could use .replace() or .exec() in a loop to build an Array.
With .replace():
var arr = [];
"#foo#bar".replace(/#([a-zA-Z0-9\-_]*)/g, function(s, g1) {
arr.push(g1);
});
With .exec():
var arr = [],
s = "#foo#bar",
re = /#([a-zA-Z0-9\-_]*)/g,
item;
while (item = re.exec(s))
arr.push(item[1]);
It matches #foo and #bar because the outer group (#1) is capturing. The inner group (#2) is not, but that's probably not what you are checking.
If you were not using global matching mode, an immediate fix would be to use /(?:#)([a-zA-Z0-9\-_]*)/ instead.
With global matching mode the result cannot be had in just one line because match behaves differently. Using regular expression only (i.e. no string operations) you would need to do it this way:
var re = /(?:#)([a-zA-Z0-9\-_]*)/g;
var matches = [], match;
while (match = re.exec("#foo#bar")) {
matches.push(match[1]);
}
See it in action.
I'm not sure if you can do that using match(), but you can do it by using the RegExp's exec() method:
var pattern = new RegExp('#([a-zA-Z0-9\\-_]+)', 'g'); // backslash doubled inside a string literal
var matches, ids = [];
while (matches = pattern.exec('#foo#bar')) {
ids.push( matches[1] ); // -> 'foo' and then 'bar'
}
Unfortunately there is no lookbehind assertion in Javascript RegExp, otherwise you could do this:
/(?<=#)[a-zA-Z0-9\-_]*/g
Other than it being added to some new version of Javascript, I think using the split post processing is your best bet.
You can use a negative lookahead assertion:
"#foo#bar".match(/(?!#)[a-zA-Z0-9\-_]+/g); // ["foo", "bar"]
The lookbehind assertion mentioned some years ago by mVChr was added in ECMAScript 2018. This allows you to do this:
'#foo#bar'.match(/(?<=#)[a-zA-Z0-9\-_]*/g) (returns ["foo", "bar"])
(A negative lookbehind is also possible: (?<!#) asserts that the preceding character is not #, without consuming it.)
MDN does document that "Capture groups are ignored when using match() with the global /g flag", and recommends using matchAll(). matchAll() isn't available on Edge or Safari iOS, and you still need to skip the complete match (including the #).
A simpler solution is to slice off the leading prefix, if you know its length - here, 1 for #.
const results = ('#foo#bar'.match(/#\w+/g) || []).map(s => s.slice(1));
console.log(results);
The ... || [] part is necessary in case there was no match; otherwise match returns null, and null.map won't work.
const results = ('nothing matches'.match(/#\w+/g) || []).map(s => s.slice(1));
console.log(results);
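Where matchAll() is available, the MDN recommendation mentioned above could be sketched like this. The variable names are illustrative; the capture group at index 1 skips the leading #.

```javascript
const selector = '#foo#bar';

// matchAll() returns an iterator of full match objects, so capture
// groups are kept even with the g flag (which matchAll requires).
const ids = Array.from(
  selector.matchAll(/#([a-zA-Z0-9\-_]+)/g),
  m => m[1] // group 1: the name without the leading "#"
);

console.log(ids);

// With no match the iterator is simply empty, so no null check is needed.
const none = Array.from('nothing matches'.matchAll(/#([a-zA-Z0-9\-_]+)/g), m => m[1]);
console.log(none);
```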

Can regex matches in javascript match any word after an equal operator?

I am trying to target ?state=wildcard in this statement :
?state=uncompleted&dancing=yes
I would like to target the entire line ?state=uncomplete, but also allow it to find whatever word would be after the = operator. So uncomplete could also be completed, unscheduled, or what have you.
A caveat I am having is granted I could target the wildcard before the ampersand, but what if there is no ampersand and the param state is by itself?
Try this regular expression:
var regex = /\?state=([^&]+)/;
var match = '?state=uncompleted&dancing=yes'.match(regex);
match; // => ["?state=uncompleted", "uncompleted"]
It will match every character after the string "\?state=" except an ampersand, all the way to the end of the string, if necessary.
Alternative regex: /\?state=(.+?)(?:&|$)/
It will match everything up to the first & char or the end of the string
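A quick check of both patterns against the two cases from the question, with and without a trailing ampersand (variable names are illustrative):

```javascript
const withAmp = '?state=uncompleted&dancing=yes';
const alone = '?state=uncompleted';

const negatedClass = /\?state=([^&]+)/;        // stop at the first "&"
const lazyAlternative = /\?state=(.+?)(?:&|$)/; // stop at "&" or end of string

const captured = [];
for (const query of [withAmp, alone]) {
  captured.push(negatedClass.exec(query)[1]);
  captured.push(lazyAlternative.exec(query)[1]);
}

console.log(captured);
```

Both patterns capture the same value in both cases, so the choice between them is mostly a matter of readability.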
IMHO, you don't need regex here. As we all know, regexes tend to be slow, especially when using look aheads. Why not do something like this:
var URI = '?state=done&user=ME'.split('&');
var passedVals = [];
This gives us ['?state=done','user=ME'], now just do a for loop:
for (var i=0;i<URI.length;i++)
{
passedVals.push(URI[i].split('=')[1]);
}
passedVals will contain whatever you need. The added benefit of this is that you can parse a request into an Object:
var URI = 'state=done&user=ME'.split('&');
var urlObjects ={};
for (var i=0;i<URI.length;i++)
{
urlObjects[URI[i].split('=')[0]] = URI[i].split('=')[1];
}
I left out the '?' at the start of the string, because a simple .replace('?','') can fix that easily...
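As an alternative that none of the answers here mention, environments that provide the standard URLSearchParams class (modern browsers and Node.js) can do this parsing without manual splitting. A minimal sketch:

```javascript
// The constructor ignores a single leading "?".
const params = new URLSearchParams('?state=uncompleted&dancing=yes');

console.log(params.get('state'));   // value of the state parameter
console.log(params.get('missing')); // null when the parameter is absent

// Convert to a plain object (for repeated keys the last value is kept).
const asObject = Object.fromEntries(params);
console.log(asObject);
```

Unlike the manual split, this also decodes percent-encoded values.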
You can match as many characters that are not a &. If there aren't any &s at all, that will of course also work:
/(\?state=[^&]+)/.exec("?state=uncompleted");
/(\?state=[^&]+)/.exec("?state=uncompleted&a=1");
// both: ["?state=uncompleted", "?state=uncompleted"]
