Regexp (recursive?) to match nested pattern alternative in Javascript? [duplicate] - javascript

Example string: $${a},{s$${d}$$}$$
I'd like to match $${d}$$ first and replace it with some text, so the string becomes $${a},{sd}$$; then $${a},{sd}$$ would be matched in turn.

Annoyingly, Javascript does not provide the PCRE recursion construct (?R), so nested patterns are far from easy to deal with. It can be done, however.
I won't reproduce the code here, but Steve Levithan's blog has some good articles on the subject. As well it should: he is probably the leading authority on RegExp in JS. He wrote XRegExp, which fills in most of the PCRE features that are missing, and there is even a Match Recursive plugin!

I wrote this myself:
String.prototype.replacerec = function (pattern, what) {
  // Replace once, then recurse until the string stops changing.
  var newstr = this.replace(pattern, what);
  if (newstr == this)
    return newstr;
  return newstr.replacerec(pattern, what);
};
Usage:
"My text".replacerec(/pattern/g,"what");
P.S.: As suggested by @lededje, when using this function in production it's good to add a limiting counter to avoid a stack overflow.
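The limiting counter mentioned in the P.S. could look roughly like this. It is only a sketch: replaceUntilStable and maxPasses are made-up names, and it uses a loop instead of recursion and a standalone function instead of extending String.prototype.

```javascript
// Hypothetical helper (not from the answers above): repeat a replacement
// until the string stops changing or a safety cap is reached.
function replaceUntilStable(input, pattern, replacement, maxPasses = 100) {
  let current = input;
  for (let pass = 0; pass < maxPasses; pass++) {
    const next = current.replace(pattern, replacement);
    if (next === current) {
      return next; // nothing left to replace
    }
    current = next;
  }
  throw new Error('replaceUntilStable: gave up after ' + maxPasses + ' passes');
}

// Innermost $${...}$$ first: the group body may not contain '$'.
const innermost = /\$\$\{([^$]*)\}\$\$/g;
const unwrapped = replaceUntilStable('$${a},{s$${d}$$}$$', innermost, '$1');
console.log(unwrapped); // a},{sd
```

The cap turns a replacement that never settles (for example one that reintroduces the pattern) into an error instead of an endless loop.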

Since you want to do this recursively, you are probably best off doing multiple matches using a loop.
Regex itself is not well suited for recursive-anything.

You can try \$\${([^\$]*)}\$\$; the [^\$] means the captured group cannot contain a $, so only the innermost group matches.
var re = new RegExp(/\$\${([^\$]*)}\$\$/, 'g'),
original = '$${a},{s$${d}$$}$$',
result = original.replace(re, "$1");
console.log('original: ' + original)
console.log('result: ' + result);

var content = "your string content";
var found = true;
while (found) {
found = false;
content = content.replace(/regex/, () => { found = true; return "new value"; });
}

In general, regexps are not well suited for that kind of problem. It's better to use a state machine.
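As a rough illustration of the state-machine idea, here is a small stack-based scanner for the $${...}$$ syntax from the question. resolveNested and its transform callback are hypothetical names, and the handling of ambiguous input is an assumption (a closing }$$ is checked before an opening $${).

```javascript
// Hypothetical sketch: resolve nested $${...}$$ groups innermost-first
// with an explicit stack instead of a regex.
function resolveNested(input, transform) {
  const stack = ['']; // stack[0] collects the top-level output
  let i = 0;
  while (i < input.length) {
    if (stack.length > 1 && input.startsWith('}$$', i)) {
      // Close the current group: transform its body, append to the parent.
      const body = stack.pop();
      stack[stack.length - 1] += transform(body);
      i += 3;
    } else if (input.startsWith('$${', i)) {
      stack.push(''); // open a new group
      i += 3;
    } else {
      stack[stack.length - 1] += input[i];
      i += 1;
    }
  }
  if (stack.length !== 1) {
    throw new Error('unbalanced $${ ... }$$ group');
  }
  return stack[0];
}

const resolved = resolveNested('$${a},{s$${d}$$}$$', body => '[' + body + ']');
console.log(resolved); // [a},{s[d]]
```

Each group is handed to transform only after every group inside it has been replaced, which is the order the question asks for.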

Related

Escaping Regex single-quote creates extra backslashes on repl.it

I am trying to write a prototype method that first needs to escape backslashes \ when the input contains a plain single quote '. (I am aware that extending the prototype is bad practice in almost any other circumstance - this is merely a practice problem I'm trying to solve.)
I've checked out the Regex wiki and tried implementing the solutions to several regex-related questions, but I still seem to be missing something. In all of my attempts, I've been unable to 'escape the escape' as shown below:
String.prototype.escapeQuote = function () {
const regex = /\'/g;
const str = `${this}`;
const subst = `\\'`;
const result = str.replace(regex, subst);
return result;
};
var str = "this method doesn't work...";
str.escapeQuote();
When I run this code, I expect the output to be:
this method doesn\'t work...
But the output I get when I run it on repl.it is:
'this method doesn\\\'t work...'
Binding subst to \' or just ' doesn't work either (perhaps it goes without saying) - either way the result is:
'this method doesn\'t work...'
I am pretty fuzzy on Regex, but trying to improve, so I'd appreciate any help you could provide - and, for that matter, any relevant answers I might have missed.
That's a rendering artifact of how the REPL you are using represents a string when it displays a string as the result of evaluating your code.
Note that it also wraps it in ' to indicate it is a string.
There are no extra backslashes in the string itself (only the single one you added), which you can see in this example:
String.prototype.escapeQuote = function() {
const regex = /(')/g;
const subst = `\\'`;
const result = this.replace(regex, subst);
return result;
};
const str = `doesn't this sound awesome`;
alert(str + "\n\n" + str.escapeQuote());
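A quick way to check this outside a browser is to compare the printed string with its length and with a quoted representation. This is only a sketch, and the variable name escaped is made up for illustration.

```javascript
const escaped = "doesn't".replace(/'/g, "\\'");

console.log(escaped);                 // doesn\'t  (the actual characters)
console.log(escaped.length);          // 8: the 7 characters of "doesn't" plus one backslash
console.log(JSON.stringify(escaped)); // "doesn\\'t" (a quoted, re-escaped representation)
```

A REPL that echoes the value of an expression shows something like the last line, which is why the backslashes appear to multiply.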
You don't need to escape the quote inside the regular expression.
Like this:
String.prototype.escapeQuote = function(){
return this.replace(/'/g, '\\\'');
}
console.log("Try it, it's easier!".escapeQuote());
If you don't like that nasty '\\\'', you can use "\\'" instead.
Hope this works for you.
If you want to escape both single and double quotes, you can use this:
String.prototype.escapeQuotes = function(){
return this.replace(/["']/g, '\\$&');
}
console.log("Try it, it's easier!".escapeQuotes());

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line within 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^@\n\r]+)@)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^@\n\r]+)@)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\\]\\\\.,;:\\s@"]+(\\.[^<>()[\\]\\\\.,;:\\s@"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
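To make the first note concrete, here is a small comparison of a literal and its string-built equivalent. It is only an illustration; the pattern and variable names are made up.

```javascript
const fromLiteral = /\d+\.\d+/g;
const fromString = new RegExp('\\d+\\.\\d+', 'g'); // each backslash doubled

console.log(fromLiteral.source === fromString.source); // true
console.log(fromString.flags);                         // g

// With single backslashes the string literal drops them
// before RegExp ever sees the pattern:
const broken = new RegExp('\d+\.\d+');
console.log(broken.source);                            // d+.d+
```

The broken version still compiles, which is what makes this mistake easy to miss: it simply matches something else.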
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
The disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|
(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
Thanks to the wondrous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo (assuming the regex above has been assigned to r)?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox doesn't support backreferences or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
The string-based regex above was missing some backslashes and didn't work properly, so I edited it. Please consider this regex, which works 99.99% of the time for email validation.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\.,;:\\s@\"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax:
// Backslashes are doubled because the pattern is written as string literals.
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s@"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s@"]+)*)|(".+"))@' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');
You can simply use string concatenation.
// Backslashes are doubled because the pattern is written as string literals.
var patternString = "^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|"+
"(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var pattern = new RegExp(patternString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+@\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")@(" + host1 + "|" + host2 + ")$");
@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i

Taking a String and Capitalizing the first character - Why is this killing the browser

Given a string like bobby, I want the function to return Bobby.
I have the following:
// Capitalizes the first letter.
function toTitleCase(str) {
return str.replace(/(?:^|\s)\w/g, function(match) {
return match.toUpperCase();
});
}
For some reason this is killing the browser. Any ideas why? Am I missing something in the regex that could be causing memory issues? Thanks.
Why use regex for something like this?
var s = "my string";
s = s.substring(0, 1).toUpperCase() + s.substring(1);
console.log(s);
Regex is quite a bit more expensive to use than native string functions and as such should only be used when nothing else will solve your particular problem.
Edit
On another note, I'm not sure why it's causing your browser to bail, I have no issues running what you have in either FF or Chrome.

Matching _{number} with Regex in javascript

I am trying to match part of an image src; an example would be:
images/preview_1.jpg
and I want to change _1 to say _6
so I’m trying to match _1
function ClickImgOf(filename, itemid){
var re = new RegExp("(.+)_[0-9])\\.(gif|jpg|jpg|ashx|png)", "g");
return filename.replace(re, "$1_"+itemid+".$2");
}
is the function I have.
I know that it only matches 0-9, but I was just trying to get something to work and even that didn't.
It's fair to say I do not know much about regex at the moment.
You have an unmatched ) parenthesis there in your pattern. Is that what's throwing you off? Looks okay otherwise. If your problem is being able to match 2-or-more-digit numbers, try [0-9]+.
function ClickImgOf(filename, itemid){
var re = new RegExp("(.+)_([0-9]+)\\.(gif|jpg|jpg|ashx|png)", "g");
return filename.replace(re, "$1_"+itemid+".$3");
}
Try this:
(.+_)[0-9]+(\.(?:gif|jpg|jpg|ashx|png))
Then you can just do:
return filename.replace(re, "$1" + itemid + "$2");
Also, download and install this: http://www.ultrapico.com/ExpressoDownload.htm
It's invaluable when working with regular expressions.
You don't need to build your regex using the regex object, it's both easier and performs better to use a literal.
function ClickImgOf(filename, itemid) {
return filename.replace(/_\d+\.(gif|jpg|jpg|ashx|png)$/g, '_'+itemid+'.$1'); // $1: the only capture group (the extension)
}
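A quick check of how the capture-group numbering plays out with a regex literal: with a single group, the extension is $1. This is a sketch only; withItemId is a made-up name and the sample filenames are assumptions.

```javascript
// Hypothetical helper: swap the numeric suffix before the extension.
function withItemId(filename, itemid) {
  // $1 is the only capture group: the file extension.
  return filename.replace(/_\d+\.(gif|jpg|ashx|png)$/, '_' + itemid + '.$1');
}

console.log(withItemId('images/preview_1.jpg', 6));  // images/preview_6.jpg
console.log(withItemId('images/preview_12.png', 6)); // images/preview_6.png
console.log(withItemId('images/preview.jpg', 6));    // images/preview.jpg (no match, unchanged)
```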

JavaScript regex refactoring

I'm performing this on a string:
var poo = poo
.replace(/[%][<]/g, "'<")
.replace(/[>][%]/g, ">'")
.replace(/[%]\s*[+]/g, "'+")
.replace(/[+]\s*[%]/g, "+'");
Given the similarity of these statements, can these regexes be combined somehow?
No, I don't think so. At least, I suspect for any transformation involving fewer replaces I can come up with a string that your original and the proposed alternative treat differently. However, it may be that the text you're working with wouldn't trigger the differences, and so for practical purposes a shorter transformation would work as well. Depends on the text.
You can simplify it a little bit. You don't need all the bracketed character classes:
poo
.replace(/%</g, "'<")
.replace(/>%/g, ">'")
.replace(/%\s*\+/g, "'+")
.replace(/\+\s*%/g, "+'");
Since in either case, the replacement only turns % into ' and removes spaces:
var poo = 'some annoying %< string >% with some % + text + %';
poo = poo.replace(/%<|>%|%\s*\+|\+\s*%/g, function(match) {
return match.replace('%', '\'').replace(/\s/g,'');
});
// "some annoying '< string >' with some '+ text +'"
Although that's not much simpler...
Using lookahead assertions and capturing:
var poo = poo.replace(/%(?=<)|(>)%|%\s*(?=\+)|(\+)\s*%/g, "$1$2'");
Using capturing alone:
var poo = poo.replace(/(>)%|(\+)\s*%|%(<)|%\s*(\+)/g, "$1$2'$3$4");
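With lookbehind assertions (not available in JS's RegExp when this was written; they were added in ES2018):
var poo = poo.replace(/%(?=<)|(?<=>)%|%\s*(?=\+)|(?<=\+)\s*%/g, "'");
In older engines this throws a syntax error, so use one of the versions above there.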
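To compare the capture-based and lookbehind-based variants side by side, something like the following could be run. It assumes an engine with ES2018 lookbehind support; the sample string is borrowed from the answer above and the variable names are made up.

```javascript
const sample = 'some annoying %< string >% with some % + text + %';

// Lookahead plus capture groups (also works in older engines).
const viaCaptures = sample.replace(/%(?=<)|(>)%|%\s*(?=\+)|(\+)\s*%/g, "$1$2'");

// Lookbehind version; assumes an ES2018-capable engine.
const viaLookbehind = sample.replace(/%(?=<)|(?<=>)%|%\s*(?=\+)|(?<=\+)\s*%/g, "'");

console.log(viaCaptures);                   // some annoying '< string >' with some '+ text +'
console.log(viaCaptures === viaLookbehind); // true
```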
