row1: 10016/Documents/abc.pdf
row2: 10016-10017/10017/Documents/folder1/folder2/xyz.pdf
I'm trying to retrieve all the characters starting from /Documents, but without the last part (the file name).
In row 1, I want to retrieve /Documents/
In row 2, I want to retrieve /Documents/folder1/folder2/
I tried
var temp1 = FullPath.split("/Documents/")[0];
var A_Fpath = temp1.split("/");
A_Fpath = A_Fpath[A_Fpath.length - 1];
A simple regex would do the trick:
/\/Documents.*\//
/ start the regex
\/ match literally a "/" (the \ is to escape the / reserved character)
Documents match literally the word "Documents" (case sensitive)
.* match 0 or more characters (any characters)
\/ match literally a "/"
/ end the regex
This works because .* is greedy: it consumes as many characters as possible
and then backtracks only as far as needed to find a "/", so the match ends at the last slash in the string.
const row1 = "10016/Documents/abc.pdf";
const row2 = "10016-10017/10017/Documents/folder1/folder2/xyz.pdf";
const regex = /\/Documents.*\//;
const val1 = row1.match(regex)[0];
const val2 = row2.match(regex)[0];
console.log(val1);
console.log(val2);
Here's a Regex101 link to test it out and see more info about this specific regex.
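One thing the snippet above does not cover is a path with no /Documents/ segment: match() then returns null and the [0] lookup throws. A minimal sketch of a guarded version, where the helper name documentsDir is made up for illustration:

```javascript
// Hypothetical helper (name is an assumption, not from the answer above).
// Returns the directory part starting at "/Documents", or null when the
// path contains no "/Documents.../" segment.
function documentsDir(path) {
  const m = path.match(/\/Documents.*\//);
  return m === null ? null : m[0];
}

console.log(documentsDir("10016/Documents/abc.pdf"));
// "/Documents/"
console.log(documentsDir("10016-10017/10017/Documents/folder1/folder2/xyz.pdf"));
// "/Documents/folder1/folder2/"
console.log(documentsDir("10016/Other/abc.pdf"));
// null
```

Returning null rather than an empty string is just one choice; pick whatever your calling code handles most naturally.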
A regular expression with a lookahead could be used to determine when to stop (JavaScript does support lookahead assertions).
Still, the simple, clearer, and more efficient way is to not use a regular expression at all. The algorithm is simple:
Find the [first/leftmost] /Documents in the source text, then
Find the last/rightmost occurrence of / in the source text
Deal with the two special cases where:
The source string doesn't contain /Documents at all, and
The rightmost / is the / in /Documents
Failing a special case as noted above, return the desired substring
extending from /Documents up to and including the last /
Like this:
function getInterestingBitsFrom(path) {
const i = path.indexOf('/Documents');
const j = path.lastIndexOf('/');
const val = i < 0 ? undefined // no '/Documents' in string
: i === j ? path.slice(i) // last '/' in string is the '/' in '/Documents'
: path.slice(i, j+1) // '/Documents/' or '/Documents/.../'
;
return val;
}
This also has the laudable benefit of being easy to understand for someone who has to figure out what you were trying to accomplish.
So currently, my code works for inputs that contain one set of parentheses.
var re = /^.*\((.*\)).*$/;
var inPar = userIn.replace(re, '$1');
...meaning when the user enters the chemical formula Cu(NO3)2, alerting inPar returns NO3) , which I want.
However, if Cu(NO3)2(CO2)3 is the input, only CO2) is being returned.
I'm not too knowledgeable in RegEx, so why is this happening, and is there a way I could put NO3) and CO2) into an array after they are found?
You want to use String.match instead of String.replace. You'll also want your regex to match multiple strings in parentheses, so you can't have ^ (start of string) and $ (end of string). And we can't be greedy when matching inside the parentheses, so we'll use .*?
Stepping through the changes, we get:
// Use Match
"Cu(NO3)2(CO2)3".match(/^.*\((.*\)).*$/);
["Cu(NO3)2(CO2)3", "CO2)"]
// Let's stop including the ) in our match
"Cu(NO3)2(CO2)3".match(/^.*\((.*)\).*$/);
["Cu(NO3)2(CO2)3", "CO2"]
// Instead of matching the entire string, let's search for just what we want
"Cu(NO3)2(CO2)3".match(/\((.*)\)/);
["(NO3)2(CO2)", "NO3)2(CO2"]
// Oops, we're being a bit too greedy, and capturing everything in a single match
"Cu(NO3)2(CO2)3".match(/\((.*?)\)/);
["(NO3)", "NO3"]
// Looks like we're only finding a single result. Let's add the global flag
"Cu(NO3)2(CO2)3".match(/\((.*?)\)/g);
["(NO3)", "(CO2)"]
// With the global flag, match returns only the full matches and ignores our capture groups, so let's remove them
"Cu(NO3)2(CO2)3".match(/\(.*?\)/g);
["(NO3)", "(CO2)"]
// Now to remove the parentheses. We can use Array.prototype.map for that!
var elements = "Cu(NO3)2(CO2)3".match(/\(.*?\)/g);
elements = elements.map(function(match) { return match.slice(1, -1); })
["NO3", "CO2"]
// And if you want the closing parenthesis as Fabrício Matté mentioned
var elements = "Cu(NO3)2(CO2)3".match(/\(.*?\)/g);
elements = elements.map(function(match) { return match.substr(1); })
["NO3)", "CO2)"]
Your regex has anchors to match beginning and end of the string, so it won't suffice to match multiple occurrences. Updated code using String.match with the RegExp g flag (global modifier):
var userIn = 'Cu(NO3)2(CO2)3';
var inPar = userIn.match(/\([^)]*\)/g).map(function(s){ return s.substr(1); });
inPar; //["NO3)", "CO2)"]
In case you need old IE support: Array.prototype.map polyfill
Or without polyfills:
var userIn = 'Cu(NO3)2(CO2)3';
var inPar = [];
userIn.replace(/\(([^)]*\))/g, function(s, m) { inPar.push(m); });
inPar; //["NO3)", "CO2)"]
Above matches a ( and captures a sequence of zero or more non-) characters, followed by a ) and pushes it to the inPar array.
The first regex does essentially the same, but uses the entire match including the opening ( parenthesis (which is later removed by mapping the array) instead of a capturing group.
From the question I assume the closing ) parenthesis is expected to be in the resulting strings, otherwise here are the updated solutions without the closing parenthesis:
For the first solution (using s.slice(1, -1)):
var inPar = userIn.match(/\([^)]*\)/g).map(function(s){ return s.slice(1, -1);});
For the second solution (\) outside of capturing group):
userIn.replace(/\(([^)]*)\)/g, function(s, m) { inPar.push(m); });
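Both answers above work around the fact that String.match with the g flag drops capture groups. In engines that support ES2020, String.prototype.matchAll is another option, since it yields one match object per hit with the groups intact. A minimal sketch (the variable names are illustrative):

```javascript
const formula = "Cu(NO3)2(CO2)3";

// matchAll requires the g flag and returns an iterator of match objects;
// m[0] is the full match, m[1] the first capture group.
const withoutParen = Array.from(formula.matchAll(/\(([^)]*)\)/g), (m) => m[1]);
console.log(withoutParen); // ["NO3", "CO2"]

// Moving the closing parenthesis inside the group keeps it in the result.
const withParen = Array.from(formula.matchAll(/\(([^)]*\))/g), (m) => m[1]);
console.log(withParen); // ["NO3)", "CO2)"]
```

If you need to support older browsers, the map or replace approaches shown above remain the safer choice.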
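You could try the below (note that it only works for groups made of exactly two non-whitespace characters followed by a digit):
"Cu(NO3)2".match(/(\S\S\d)/gi) // returns ["NO3"]
"Cu(NO3)2(CO2)3".match(/(\S\S\d)/gi) // returns ["NO3", "CO2"]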
I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line within 80 characters, per JSLint rules. I think it's simply easier to read that way.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^@\n\r]+)@)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^@\n\r]+)@)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\]\\.,;:\\s@\"]+(\\.[^<>(),[\]\\.,;:\\s@\"]+)*)',
'|(\\".+\\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
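To illustrate the two notes above: the doubled backslashes are only needed because an ordinary string literal consumes one level of escaping. A small sketch comparing that with String.raw, a standard template tag that leaves backslashes untouched (the patterns and variable names here are made up for the example):

```javascript
// Ordinary string literal: every regex backslash has to be doubled.
const doubled = new RegExp("^\\d{1,3}\\.\\d{1,3}$");

// String.raw keeps backslashes as written, so the pattern reads like a literal.
const raw = new RegExp(String.raw`^\d{1,3}\.\d{1,3}$`);

console.log(doubled.source === raw.source); // true
console.log(raw.test("12.34")); // true

// Flags go in the second argument, exactly as with a plain string.
const withFlags = new RegExp(String.raw`\bfoo\b`, "gi");
console.log(withFlags.flags); // "gi"
```

Because the pieces are plain strings, they can still be split across lines and joined with + or Array.prototype.join.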
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
The disadvantage here is that you can't use plain whitespace in the regular expression string, because all whitespace is stripped (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|
(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
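As a rough sketch of the "build it once" advice: assemble the regex at module scope and have the functions that need it reuse the stored object. The names joinRegExps, FOO_BAR and matchesFooBar are hypothetical and only serve the example:

```javascript
// Hypothetical helper, equivalent in spirit to multilineRegExp above.
function joinRegExps(parts, flags) {
  return new RegExp(parts.map((p) => p.source).join(""), flags);
}

// Built a single time, when the module is loaded.
const FOO_BAR = joinRegExps([
  /^foo/,  // literal prefix
  /(.*)/,  // anything in between
  /\bar$/  // word boundary, then "ar" at the end
]);

// Callers reuse the prebuilt object instead of rebuilding it per call.
function matchesFooBar(text) {
  return FOO_BAR.test(text);
}

console.log(FOO_BAR.source);          // ^foo(.*)\bar$
console.log(matchesFooBar("foo ar")); // true
console.log(matchesFooBar("foobar")); // false
```

Note that FOO_BAR has no g or y flag, so sharing it between calls is safe; a global or sticky regex carries lastIndex state between test() calls.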
Thanks to the wondrous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo (assuming the regex above has been assigned to r)?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If you're curious, you can check out what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox (at the time of writing) doesn't support lookbehinds or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention JavaScript's core feature of inheritance through the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
The regex above is missing some backslashes, so it doesn't work properly. I edited it accordingly. Please consider this regex, which works for 99.99% of email validation cases.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\.,;:\\s@\"]+)*)',
'|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax (every backslash has to be doubled, because the string literal consumes one level of escaping):
var pattern = new RegExp('^(([^<>()[\\]\\\\.,;:\\s@\"]+' +
'(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|(\".+\"))@' +
'((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
'(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');
You can simply use string concatenation (remember to double every backslash inside the string literals).
var patternString = "^(([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)|"+
"(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var pattern = new RegExp(patternString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as the dummy beginning if the first character could be interpreted as a quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+@\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")@(" + host1 + "|" + host2 + ")$");
@Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i
The string str contains, somewhere within it, http://www.example.com/ followed by 2 digits and 7 random characters (upper or lower case). One possibility is http://www.example.com/45kaFkeLd or http://www.example.com/64kAleoFr. So the only certain aspect is that the code always starts with 2 digits.
I want to retrieve "64kAleoFr".
var url = str.match([regex here]);
The regex you’re looking for is /[0-9]{2}[a-zA-Z]{7}/.
var string = 'http://www.example.com/64kAleoFr',
match = (string.match(/[0-9]{2}[a-zA-Z]{7}/) || [''])[0];
console.log(match); // '64kAleoFr'
Note that on the second line, I use the good old .match() trick to make sure no TypeError is thrown when no match is found. Once this snippet has executed, match will either be the empty string ('') or the value you were after.
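In engines with ES2020 support, the same "empty string when nothing matches" fallback can be spelled with optional chaining and nullish coalescing. A small sketch, with extractCode as a made-up wrapper name:

```javascript
// Hypothetical wrapper around the regex from the answer above.
// ?.[0] short-circuits to undefined when match() returns null,
// and ?? '' turns that undefined into the empty string.
function extractCode(text) {
  return text.match(/[0-9]{2}[a-zA-Z]{7}/)?.[0] ?? '';
}

console.log(extractCode('http://www.example.com/64kAleoFr')); // '64kAleoFr'
console.log(extractCode('no code here'));                     // ''
```

The behaviour is the same as the (… || [''])[0] trick; it is only a matter of which form reads better to you and which browsers you target.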
You could use
var url = str.match(/\d{2}.{7}$/)[0];
where:
\d{2} //two digits
.{7} //seven characters
$ //end of the string
If you don't know whether it will be at the end, you could drop the $ and anchor on the preceding slash instead:
var url = str.match(/\/\d{2}.{7}/)[0].slice(1); // match the "/" at the beginning and slice it out
What about using split?
alert("http://www.example.com/64kAleoFr".split("/")[3]);
var url = "http://www.example.com/",
re = new RegExp(url.replace(/\./g,"\\.") + "(\\d{2}[A-Za-z]{7})");
var str = "This is a string with a url: http://www.example.com/45kaFkeLd in the middle.";
var code = str.match(re);
if (code != null) {
// we have a match
alert(code[1]); // "45kaFkeLd"
}
The url needs to be part of the regex if you want to avoid matching other strings of characters elsewhere in the input. The above assumes that the url should be configurable, so it constructs a regex from the url variable (noting that "." has special meaning in a regex so it needs to be escaped). The bit with the two digits and seven letters is then in parentheses so it can be captured.
Demo: http://jsfiddle.net/nnnnnn/NzELc/
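If the configurable url might contain regex metacharacters other than ".", escaping only the dots is not enough. A minimal sketch of a more general approach, assuming a helper named escapeRegExp (not a built-in in most engines, so it is defined here):

```javascript
// Hypothetical helper: backslash-escapes every character that has a
// special meaning in a regular expression.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const base = "http://www.example.com/";
const codeRe = new RegExp(escapeRegExp(base) + "(\\d{2}[A-Za-z]{7})");

const found = "See http://www.example.com/45kaFkeLd for details".match(codeRe);
console.log(found ? found[1] : null); // "45kaFkeLd"

// The escaped dots no longer match arbitrary characters.
console.log(codeRe.test("http://wwwXexample.com/45kaFkeLd")); // false
```

The capture group is still read from index 1 of the match, exactly as in the answer above.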
http://www\\.example\\.com/([0-9]{2}\\w{7}) this is your pattern. You'll get your 2 digits and 7 random characters in group 1.
Looking at your example strings, both have a few digits and a random string after the last slash (/). If that pattern is fixed, I would suggest splitting the string on the slash and taking the last element of the resulting array.
Here is how:
var string = "http://www.example.com/64kAleoFr";
var ar = string.split("/");
ar[ar.length - 1]; // "64kAleoFr"
Hope it helps
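A variation on the split idea, assuming the input is a complete, well-formed URL: the standard URL class (available in modern browsers and Node.js) separates the path from any query string or fragment before you split. The variable names are illustrative:

```javascript
// URL parsing keeps "?query" and "#fragment" out of the path.
const link = new URL("http://www.example.com/64kAleoFr?ref=mail#top");

// pathname is "/64kAleoFr"; the last "/"-separated piece is the code.
const lastSegment = link.pathname.split("/").pop();
console.log(lastSegment); // "64kAleoFr"
```

This does not help when the URL is embedded in a longer sentence; in that case one of the regex answers above is the better fit.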