I have a string as:
const str = 'My [Link format](https://google.com) demo'
I want the word array to be like:
['My', '[Link format](https://google.com)', 'demo']
How can I do this in JavaScript?
I tried using split() and str.match(), but nothing has worked yet.
This is a simple split on a space as a delimiter, but we use a negative lookahead to check for the combination of open and closed square brackets [] and round brackets ():
const str = 'My [Link format](https://google.com) demo'
console.log(str.split(/\s+(?![^\[]*\])(?![^\(]*\))/));
We also allow for spaces in the URL portion; a URL has a low chance of containing spaces, but it could still happen.
Try it here: https://jsfiddle.net/m4q6e9x7/
["My", "[Link format](https://google.com)", "demo"]
In the fiddle I've tried to show the two separate negative lookaheads, one for each type of bracket (I've put a space in the round brackets to prove the concept):
const str = 'My [Link format](http s://google.com) demo'
ignore space between []
console.log(str.split(/\s+(?![^\[]*\])/));
["My", "[Link format](http", "s://google.com)", "demo"]
ignore space between ()
console.log(str.split(/\s+(?![^\(]*\))/));
["My", "[Link", "format](http s://google.com)", "demo"]
So we can easily combine the two criteria because we need both of them to not match.
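As a quick check of that combination, the sketch below runs the combined pattern on the fiddle's input, including the deliberate space inside the round brackets. The names `input`, `combined` and `parts` are made up for this illustration.

```javascript
// Input from the fiddle, with a deliberate space inside the URL part.
const input = 'My [Link format](http s://google.com) demo';

// Split on whitespace unless a closing ] follows before the next [,
// and unless a closing ) follows before the next (.
const combined = /\s+(?![^\[]*\])(?![^\(]*\))/;

const parts = input.split(combined);
console.log(parts);
```

Both lookaheads have to fail for a space to count as a delimiter, which is why the spaces inside either bracket pair are left alone.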
Because [] and () need to be escaped, it might be easier to see the regex if we modify and test for spaces between braces {}
const str = 'My {Link format}(https://google.com) demo'
console.log(str.split(/\s+(?![^{]*})/));
["My", "{Link format}(https://google.com)", "demo"]
Both solutions assume that the string is well formed (basically: no space between ']' and '(', no ']' characters inside [...], and similar intuitions). You didn't really provide information about what the input string can be other than your concrete example, so the solutions work well in this and very similar cases. The second is very easily modified as needed; the first is easily extended to check whether the string is in fact malformed.
Solution using Regular Expressions
The code below finds everything before the first '[', everything in the '[...](...)' pattern (note: the first ... must not contain ']' and the second must not contain ')', but I assume that would make for incorrect input in the first place), and everything after that.
So
let regex = /(.*)(\[.*\]\(.*\))(.*)/
let res = str.match(regex).splice(1,3)
gives res as
['My ', '[Link format](https://google.com)', ' demo']
From there, you can trim every entry in this array ('My ' => 'My'), for example using the trim function like so (map returns a new array, so assign the result):
res = res.map((val) => val.trim());
Look here for explanation of what the array obtained from .match() method represents, but generally except index 0 it contains capture groups, meaning the parts of string corresponding to parts of regex surrounded by parentheses.
If you are not familiar with regular expressions (regexes) in JS, or at all, you will easily find many online resources about the topic. After grasping the basics, regex101 is a nice tool to experiment with regexes and explore their capabilities. When using it, you should probably choose the ECMAScript (JavaScript) flavor from the menu on the left.
Equivalent solution without regex
An equivalent solution is to find the position of the first '[' manually, as well as where the '[...](...)' pattern ends. Then slice the parts (before '[', the pattern, and after the pattern) out of the string, and probably trim them. So just loop over the characters of the string in search of '[' and then ']', '(', ')'. Note that in this case you can easily and granularly decide what to do if the string has an unexpected/incorrect form.
TODO: I will probably sketch some code when I have time for it
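Until then, here is a minimal sketch of the regex-free approach described above. It is only one possible reading of that description: the function name `splitAroundLink` is hypothetical, it uses indexOf instead of a hand-written character loop, and it assumes at most one link per string.

```javascript
// Hypothetical helper: splits a string around the first '[...](...)' it finds.
function splitAroundLink(str) {
  const open = str.indexOf('[');
  const mid = open === -1 ? -1 : str.indexOf('](', open);
  const close = mid === -1 ? -1 : str.indexOf(')', mid);

  if (close === -1) {
    // Unexpected form (no complete link): fall back to a plain space split.
    return str.split(' ').filter(Boolean);
  }

  return [
    str.slice(0, open),          // everything before '['
    str.slice(open, close + 1),  // the '[...](...)' part itself
    str.slice(close + 1),        // everything after ')'
  ].map((part) => part.trim()).filter(Boolean);
}

console.log(splitAroundLink('My [Link format](https://google.com) demo'));
```

The fallback branch is the place where you would decide how to treat malformed input; throwing an error there would be just as reasonable.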
Regex is your friend!
const regexMdLinks = /!?\[([^\]]*)\]\(([^\)]+)\)/gm
// Example md file contents
const str = `My [Link format](https://google.com) demo My [Link format2](https://google.com/2) demo2`
let regex_splitted = str.split(regexMdLinks);
let arr = [];
//1. Item will be the text (or empty text)
//2. Item is the link text
//3. Item is the url
for(let i = 0; i < regex_splitted.length; i++){
  if(i % 3 == 0){ //Split normal text
    arr.push(...regex_splitted[i].split(" ").filter(i => i));
  } else if(i % 3 == 1){//Add brackets around link text
    arr.push("["+regex_splitted[i]+"]");
  } else {
    arr.push("("+regex_splitted[i]+")");
  }
}
console.log(arr)
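Note that the loop above pushes the link text and the URL as two separate items. If each link should stay one item, as in the question, a possible alternative is to match tokens instead of splitting. This is a sketch, not part of the answer above; the pattern and the names `tokenPattern` and `tokens` are assumptions made for illustration.

```javascript
const text = 'My [Link format](https://google.com) demo';

// Try a whole markdown link first; otherwise take a run of non-space characters.
const tokenPattern = /!?\[[^\]]*\]\([^)]+\)|\S+/g;

// match() returns null when nothing matches, hence the fallback to [].
const tokens = text.match(tokenPattern) || [];
console.log(tokens);
```

The order of the alternatives matters: the link branch has to come first, otherwise `\S+` would cut the link at its first space.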
At the time of login, I need to allow either username (alphanumeric and some special characters) or email address or username\domain format only. For this purpose, I used this regex with or (|) condition. Along with this, I need to allow some other language characters like Japanese, Chinese etc., so included those as well in the same regex. Now, the issue is when I enter characters (>=30) and # or some special character, the evaluation of this regex is taking some seconds and browser goes in hang mode.
export const usernameRegex = /(^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4})+|^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$/gu;
When I tried removing the other language character set such as [\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+|^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF] it works fine.
I understood that generally regex looks simple but it does a lot under the hood. Is there any modification that needs to be done in this regex, so that it doesn't take time to evaluate. Any help is much appreciated!
Valid texts:
stackoverflow,
stackoverflow1~,
stackoverflow!#~^-,
stackoverflow#g.co,
stackoverflow!#~^-#g.co,
こんにちは,
你好,
tree\guava
EDIT:
e.g. Input causing the issue
stackoverflowstackoverflowstackoverflow#
On giving the above text it is taking long time.
https://imgur.com/T2Vg4lg
Your regex seems to consist of three regular expressions concatenated with |
(^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4})+
^[a-zA-Z0-9._~^#!\-]+\\([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+
^([._-~^#!]|[\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF])+$
First regex, (^...)+: how many times do you think this entire pattern, which is anchored to the beginning of the string, can occur? A repetition is either a second occurrence or it starts at the beginning of the string; it can't be both. So the outer group and its + can go:
^[a-zA-Z0-9._~^#!%+\-]+#[a-z0-9.-]+\.[a-z]{2,4}
Parts 2 and 3 are mostly identical, except that part 2 starts with the block [a-zA-Z0-9._~^#!\-]+\\ followed by what is otherwise the rest of part 3.
So let's combine them: ^(?:[a-zA-Z0-9._~^#!\-]+\\)? ... and make sure to use non-capturing groups where possible.
([abc]|[def])+ can be simplified to [abcdef]+. This, by the way, is the part that's killing your performance: your two classes overlap (the range _-~ in the first one already covers a-z), so every lowercase letter can be matched by either branch, and when the overall match fails the engine retries every combination of branches.
Your regex ends with a $. That was only part of the last alternative, but I assume you always want to match the entire string? So let's anchor all 3 (now 2) parts as ^ ... $
Summary:
/^[a-zA-Z0-9._~^#!%+-]+#[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u
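To sanity-check the simplified pattern, the sketch below runs it against the valid texts from the question and against the long input that caused the hang. The pattern is copied unchanged from the summary; the variable names and the rejected sample `'stack overflow'` are assumptions added for illustration.

```javascript
// The simplified pattern from the summary, copied unchanged.
const simplified = /^[a-zA-Z0-9._~^#!%+-]+#[a-z0-9.-]+\.[a-z]{2,4}$|^(?:[a-zA-Z0-9._~^#!-]+\\)?[._-~^#!\p{Ll}\p{Lm}\p{Lt}a-zA-Z0-9-\u3000-\u303f\u3040-\u309f\u30a0-\u30ff\uff00-\uff9f\u4e00-\u9faf\u3400-\u4dbf\u3130-\u318F\uAC00-\uD7AF]+$/u;

const validTexts = [
  'stackoverflow',
  'stackoverflow1~',
  'stackoverflow!#~^-',
  'stackoverflow#g.co',
  'stackoverflow!#~^-#g.co',
  'こんにちは',
  '你好',
  'tree\\guava', // one backslash, escaped for the string literal
];

// No g flag, so test() keeps no state between calls.
const validResults = validTexts.map((text) => simplified.test(text));
console.log(validResults);

// The input from the question's EDIT.
const longInput = 'stackoverflow'.repeat(3) + '#';
console.log(simplified.test(longInput));

// A space is in none of the character classes, so this one is rejected.
console.log(simplified.test('stack overflow'));
```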
Here is a JS example of how a simple regex would try to match a string, and how it fails, backtracks, retries with the other side of the |, and so on and so on.
// let's implement what `/([a-z]|[\p{Ll}])+/u` would do,
// how it would try to match something.
const a = /[a-z]/; // left part
const b = /[\p{Ll}]/u; // right part
const string = "abc,";
const testNextCharacter = (index) => {
  if (index === string.length) {
    return true;
  }
  const pattern = index + " ".repeat(index + 1) + "%o.test(%o)";
  const character = string.charAt(index);
  console.log(pattern, a, character);
  // checking the left part && if successful checking the next character
  if (a.test(character) && testNextCharacter(index + 1)) {
    return true;
  }
  // checking the right part && if successful checking the next character
  console.log(pattern, b, character);
  if (b.test(character) && testNextCharacter(index + 1)) {
    return true;
  }
  return false;
}
console.log("result", testNextCharacter(0));
And this is only 4 characters. Why don't you try this with 5 or 6 characters to get an impression of how much work this will be at 20 characters?
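To put numbers on that, here is a small sketch that counts the calls of the same naive matcher instead of logging them. It assumes the worst case used above (n lowercase letters followed by one character that neither branch accepts); `countCalls` is a hypothetical helper, and real regex engines may optimize some of this away.

```javascript
// Counts invocations of the naive matcher for n letters plus a trailing ','.
function countCalls(n) {
  const text = 'a'.repeat(n) + ',';
  const left = /[a-z]/;     // left side of the |
  const right = /\p{Ll}/u;  // right side of the |
  let calls = 0;

  const tryFrom = (index) => {
    calls++;
    if (index === text.length) return true;
    const ch = text.charAt(index);
    // Try the left branch, then the right branch, each followed by the rest.
    if (left.test(ch) && tryFrom(index + 1)) return true;
    if (right.test(ch) && tryFrom(index + 1)) return true;
    return false;
  };

  tryFrom(0);
  return calls;
}

for (const n of [3, 5, 10, 20]) {
  console.log(n + ' letters: ' + countCalls(n) + ' calls');
}
```

Every additional letter roughly doubles the work, because both branches accept it and both are retried once the final character fails.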
I have a problem with replacing all the occurrences of a backslash inside my string with a double backslash. Right now this problem is preventing me from opening a file, so I need to "escape" all the backslashes in my path.
The path looks something like this:
C:\Folder\tmp\c235adf5b8c79ee61910a0c04abf9bc1
I want to replace all the backslashes with double backslashes, so that in the end it would look like this:
C:\\Folder\\tmp\\c235adf5b8c79ee61910a0c04abf9bc1
I already tried using this solution but it doesn't work for me:
str.replace(/\\/g, "\\\\");
The output of the solution above produces the following string:
C:Folder mpc235adf5b8c79ee61910a0c04abf9bc1
EDIT This is the actual code where I'm trying to solve the problem:
exports.register = function (req, res) {
  let user = new db.User();
  req.files.forEach(function(file) {
    const uploadDir = 'upload/' + user._id + '/' + file.name;
    const filePath = path.resolve('./' + file.path);
    console.log(filePath);
    ftp.put(filePath, uploadDir, function(err) {
      if(err) return console.log(err);
      console.log('Uploaded file: ' + file.name);
    });
  });
};
I omitted irrelevant parts
The \ character is an escape character in a JavaScript string literal. It doesn't exist in the string.
You can't use a regular expression to fix your problem because it is the source code you need to change, not the data in the string itself.
By the time you get the string, the \ characters will have been parsed and (for example) \t turned into a tab. At this point it is too late to manipulate the data to get the actual directory path back.
If your input is hard coded in your JavaScript, then you're escaping the characters in that string unless you escape the escape characters (\\). Try just a console output of that string as you have it:
console.log('C:\Folder\tmp\c235adf5b8c79ee61910a0c04abf9bc1');
// outputs "C:Folder mpc235adf5b8c79ee61910a0c04abf9bc1"
So the replacement isn't replacing anything, because there aren't actually any backslashes in the string.
'C:\\Folder\\tmp\\c235adf5b8c79ee61910a0c04abf9bc1'.replace(/\\/g, '\\\\')
Running something like that above outputs exactly what you're looking for.
To elaborate a bit, that gap between C:Folder and mpc235adf5b8c79ee61910a0c04abf9bc1 is there because \t is the tab character, and that space is a tab. The c is still there after mp because \c isn't anything, it's not an escape sequence.
If you got this text from a different source (not hard coded in the JS), then you wouldn't need to worry about this and your str.replace(/\\/g, "\\\\"); would work as expected.
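The difference can be seen side by side in the sketch below. The path is shortened to a made-up `c235` for readability, and `hardCoded` and `realPath` are illustrative names, not taken from the question.

```javascript
// Written as a literal: \F and \c simply lose their backslash, \t becomes a tab.
const hardCoded = 'C:\Folder\tmp\c235';
console.log(JSON.stringify(hardCoded));

// A string that really contains backslashes (each one escaped in the literal),
// which is what you would have if the path came from a file, a form or an API.
const realPath = 'C:\\Folder\\tmp\\c235';
console.log(realPath);

// With real backslashes in the data, the replacement behaves as expected.
console.log(realPath.replace(/\\/g, '\\\\'));
```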
var Path = 'C:\\Folder\\tmp\\c235adf5b8c79ee61910a0c04abf9bc1';
console.log(Path);
console.log(Path.replace(/\\/g, '\\\\'))
console.log(Path.replace(/\\/g, '/'))
//VM2431:2 C:\Folder\tmp\c235adf5b8c79ee61910a0c04abf9bc1
//VM2431:3 C:\\Folder\\tmp\\c235adf5b8c79ee61910a0c04abf9bc1
//VM2431:4 C:/Folder/tmp/c235adf5b8c79ee61910a0c04abf9bc1
Use String.raw, which keeps the backslashes of a template literal exactly as typed:
String.raw`C:\Folder\tmp\c235adf5b8c79ee61910a0c04abf9bc1`
Output:
"C:\\Folder\\tmp\\c235adf5b8c79ee61910a0c04abf9bc1"
I need to find all lines with string "new qx.ui.form.Button" WHICH EXCLUDE lines starting with comments "//".
Example
line 1:" //btn = new qx.ui.form.Button(plugin.menuName, plugin.menuIcon).set({"
line 2:" btn = new qx.ui.form.Button(plugin.menuName, plugin.menuIcon).set({"
Pattern should catch only "line 2"!
Be aware about leading spaces.
Finally I have to FIND and REPLACE "new qx.ui.form.Button" in all UNCOMMENTED code lines with "this.__getButton".
I tried.
/new.*Button/g
/[^\/]new.*Button/g
and many others without success.
In JavaScript this is a bit icky:
^\s*(?=\S)(?!//)
excludes a comment at the start of a line. So far, so standard. But you cannot look backwards for this pattern in a JS engine without lookbehind support (lookbehind only arrived with ES2018), so you have to match and replace more than needed:
^(\s*)(?=\S)(?!//)(.*)(new qx\.ui\.form\.Button)
Replace that by
$1$2this.__getButton
Quick PowerShell test:
PS Home:\> $line1 -replace '^(\s*)(?=\S)(?!//)(.*)(new qx\.ui\.form\.Button)','$1$2this.__getButton'
//btn = new qx.ui.form.Button(plugin.menuName, plugin.menuIcon).set({
PS Home:\> $line2 -replace '^(\s*)(?=\S)(?!//)(.*)(new qx\.ui\.form\.Button)','$1$2this.__getButton'
btn = this.__getButton(plugin.menuName, plugin.menuIcon).set({
That being said, why do you care about what's in the commented lines anyway? It's not as if they had any effect on the program.
Ah, if only JavaScript had lookbehinds everywhere... Engines that implement ES2018 do have them, and there all you need is
/(?<!\/\/.*)new\s+qx\.ui\.form\.Button/g
If lookbehind isn't available to you, this'll work just fine too:
.replace(/(.*)new\s(qx\.ui\.form\.Button)/g,function(_,m) {
    // note that the second set of parentheses aren't needed
    // they are there for readability, especially with the \s there.
    if( m.indexOf("//") > -1) {
        // line is commented, return as-is
        // note that this allows comments in an arbitrary position
        // to only allow comments at the start of the line (with optional spaces)
        // use if(m.match(/^\s*\/\//))
        return _;
    }
    else {
        // uncommented! Perform replacement
        return m+"this.__getButton";
    }
});
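For engines that support lookbehind assertions (ES2018 and later, which is an assumption about your runtime), the lookbehind pattern quoted in this answer can be tried directly. The sketch below applies it to the two example lines from the question; `lines`, `uncommented` and `replaced` are names invented for this illustration.

```javascript
const lines = [
  '  //btn = new qx.ui.form.Button(plugin.menuName, plugin.menuIcon).set({',
  '  btn = new qx.ui.form.Button(plugin.menuName, plugin.menuIcon).set({',
];

// Match the constructor call only if no "//" appears earlier on the same line.
const uncommented = /(?<!\/\/.*)new\s+qx\.ui\.form\.Button/g;

const replaced = lines.map((line) => line.replace(uncommented, 'this.__getButton'));
console.log(replaced.join('\n'));
```

Like the callback version, this treats a "//" anywhere before the match as a comment, not only one at the start of the line.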
Grep uses regular expressions; this will exclude every line that starts with optional whitespace followed by //.
grep -v "^\s*//"
I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending #KooiInc answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^#\n\r]+)#)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^#\n\r]+)#)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\]\\.,;:\\s#\"]+(\\.[^<>(),[\]\\.,;:\\s#\"]+)*)',
'|(\\".+\\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
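A short sketch of both notes, using a made-up pattern for decimal numbers (the names `literalRe` and `stringRe` are illustrative only): the string form needs every backslash doubled, and the flags move into the second argument.

```javascript
// The same pattern written as a literal and as a string.
const literalRe = /\d+\.\d+/g;
const stringRe = new RegExp('\\d+\\.\\d+', 'g');

// Both end up with the same source and flags.
console.log(literalRe.source === stringRe.source);
console.log(literalRe.flags === stringRe.flags);

console.log('1.5 and 2.25'.match(stringRe));
```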
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|
(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
    return new RegExp(regs.map(
        function(reg){ return reg.source; }
    ).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
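One way to follow that advice is to build the combined regex once at module level and reuse it inside the functions that need it. This is a sketch under that assumption; `FOO_BAR` and `extractMiddle` are hypothetical names, and `multilineRegExp` is repeated here only so the example runs on its own.

```javascript
function multilineRegExp(regs, options) {
  return new RegExp(regs.map((reg) => reg.source).join(''), options);
}

// Built once, when the module is loaded...
const FOO_BAR = multilineRegExp([
  /^foo/,
  /(.*)/,
  /\bar$/, // note: \b is a word boundary here, followed by "ar"
]);

// ...and reused on every call instead of being rebuilt each time.
function extractMiddle(text) {
  const m = FOO_BAR.exec(text);
  return m ? m[1] : null;
}

console.log(JSON.stringify(extractMiddle('foo 123 ar')));
console.log(extractMiddle('foobar'));
```

No g or y flag is used, so the shared regex carries no lastIndex state between calls.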
Thanks to the wondrous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo?
// r holds the regex built by the template above
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox doesn't support lookbehinds or named groups. So note that the example given in this answer is actually a neutered version and might easily get tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention JavaScript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
The regex above is missing some backslashes, so it doesn't work properly. I edited the regex; please consider this version, which works 99.99% of the time for email validation.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s#\"]+(\\.[^<>()\\[\\]\\\.,;:\\s#\"]+)*)',
'|(".+"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax (every backslash has to be doubled inside a string literal):
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s#"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s#"]+)*)|(".+"))#' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');
You can simply use string concatenation (again with doubled backslashes).
var patternString = "^(([^<>()[\\]\\\\.,;:\\s#\"]+(\\.[^<>()[\\]\\\\.,;:\\s#\"]+)*)|"+
"(\".+\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var pattern = new RegExp(patternString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
    return regex.map(r => {
        if(Array.isArray(r))
            return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
        else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
            return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
        else
            return r.source.replace(dummy, "");
    }).join("");
}
function combineRegex(...regex)
{
    return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+#\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s#\"]+(\\.[^<>()[\\]\\\\.,;:\\s#\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")#(" + host1 + "|" + host2 + ")$");
#Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
  function cleanup(string) {
    // remove whitespace, single and multi-line comments
    return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
  }
  function escape(string) {
    // escape regular expression
    return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
  }
  function create(flags, strings, ...values) {
    let pattern = '';
    for (let i = 0; i < values.length; ++i) {
      pattern += cleanup(strings.raw[i]); // strings are cleaned up
      pattern += escape(values[i]); // values are escaped
    }
    pattern += cleanup(strings.raw[values.length]);
    return RegExp(pattern, flags);
  }
  if (Array.isArray(args[0])) {
    // used as a template tag (no flags)
    return create('', ...args);
  }
  // used as a function (with flags)
  return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i
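The tag also escapes interpolated values, which the example above does not exercise. The sketch below repeats the function so that it runs on its own and interpolates a made-up `version` string; the value and the pattern around it are assumptions chosen for illustration.

```javascript
function regexp(...args) {
  function cleanup(string) {
    // remove whitespace, single and multi-line comments
    return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
  }
  function escape(string) {
    // escape regular expression metacharacters in interpolated values
    return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
  }
  function create(flags, strings, ...values) {
    let pattern = '';
    for (let i = 0; i < values.length; ++i) {
      pattern += cleanup(strings.raw[i]);
      pattern += escape(values[i]);
    }
    pattern += cleanup(strings.raw[values.length]);
    return RegExp(pattern, flags);
  }
  if (Array.isArray(args[0])) {
    return create('', ...args); // used as a template tag (no flags)
  }
  return create.bind(void 0, args[0]); // used as a function (with flags)
}

const version = '1.2'; // hypothetical value to interpolate

const versionRe = regexp('i')`
  ^v ${version}   // the dot inside the value is escaped for us
  (-beta)? $
`;

console.log(versionRe);
console.log(versionRe.test('V1.2-beta'));
console.log(versionRe.test('v1x2'));
```

Because the value is escaped, the dot in `1.2` only matches a literal dot; written directly in the template it would match any character.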