Create RegExps on the fly using string variables - javascript

Say I wanted to make the following re-usable:
function replace_foo(target, replacement) {
return target.replace("string_to_replace",replacement);
}
I might do something like this:
function replace_foo(target, string_to_replace, replacement) {
return target.replace(string_to_replace,replacement);
}
With string literals this is easy enough. But what if I want to get a little more tricky with the regex? For example, say I want to replace everything but string_to_replace. Instinctually I would try to extend the above by doing something like:
function replace_foo(target, string_to_replace, replacement) {
return target.replace(/^string_to_replace/,replacement);
}
This doesn't seem to work. My guess is that it thinks string_to_replace is a string literal, rather than a variable representing a string. Is it possible to create JavaScript regexes on the fly using string variables? Something like this would be great if at all possible:
function replace_foo(target, string_to_replace, replacement) {
var regex = "/^" + string_to_replace + "/";
return target.replace(regex,replacement);
}

There's new RegExp(string, flags) where flags are g or i. So
'GODzilla'.replace( new RegExp('god', 'i'), '' )
evaluates to
zilla

With string literals this is easy enough.
Not really! The example only replaces the first occurrence of string_to_replace. More commonly you want to replace all occurrences, in which case, you have to convert the string into a global (/.../g) RegExp. You can do this from a string using the new RegExp constructor:
new RegExp(string_to_replace, 'g')
The problem with this is that any regex-special characters in the string literal will behave in their special ways instead of being normal characters. You would have to backslash-escape them to fix that. Unfortunately, there's not a built-in function to do this for you, so here's one you can use:
function escapeRegExp(s) {
return s.replace(/[-/\\^$*+?.()|[\]{}]/g, '\\$&')
}
Note also that when you use a RegExp in replace(), the replacement string now has a special character too, $. This must also be escaped if you want to have a literal $ in your replacement text!
function escapeSubstitute(s) {
return s.replace(/\$/g, '$$$$');
}
(Four $s because that is itself a replacement string—argh!)
Now you can implement global string replacement with RegExp:
function replace_foo(target, string_to_replace, replacement) {
var relit= escapeRegExp(string_to_replace);
var sub= escapeSubstitute(replacement);
var re= new RegExp(relit, 'g');
return target.replace(re, sub);
}
What a pain. Luckily if all you want to do is a straight string replace with no additional parts of regex, there is a quicker way:
s.split(string_to_replace).join(replacement)
...and that's all. This is a commonly-understood idiom.
say I want to replace everything but string_to_replace
What does that mean, you want to replace all stretches of text not taking part in a match against the string? A replacement with ^ certainly doesn't this, because ^ means a start-of-string token, not a negation. ^ is only a negation in [] character groups. There are also negative lookaheads (?!...), but there are problems with that in JScript so you should generally avoid it.
You might try matching ‘everything up to’ the string, and using a function to discard any empty stretch between matching strings:
var re= new RegExp('(.*)($|'+escapeRegExp(string_to_find)+')')
return target.replace(re, function(match) {
return match[1]===''? match[2] : replacement+match[2];
});
Here, again, a split might be simpler:
var parts= target.split(string_to_match);
for (var i= parts.length; i-->0;)
if (parts[i]!=='')
parts[i]= replacement;
return parts.join(string_to_match);

As the others have said, use new RegExp(pattern, flags) to do this. It is worth noting that you will be passing string literals into this constructor, so every backslash will have to be escaped. If, for instance you wanted your regex to match a backslash, you would need to say new RegExp('\\\\'), whereas the regex literal would only need to be /\\/. Depending on how you intend to use this, you should be wary of passing user input to such a function without adequate preprocessing (escaping special characters, etc.) Without this, your users may get some very unexpected results.

Yes you can.
https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions
function replace_foo(target, string_to_replace, replacement) {
var regex = new RegExp("^" + string_to_replace);
return target.replace(regex, replacement);
}

A really simple solution to this is this:
function replace(target, string_to_replace, replacement) {
return target.split(string_to_replace).join(replacement);
}
No need for Regexes at all
It also seems to be the fastest on modern browsers https://jsperf.com/replace-vs-split-join-vs-replaceall

I think I have very good example for highlight text in string (it finds not looking at register but highlighted using register)
function getHighlightedText(basicString, filterString) {
if ((basicString === "") || (basicString === null) || (filterString === "") || (filterString === null)) return basicString;
return basicString.replace(new RegExp(filterString.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\\\$&'), 'gi'),
function(match)
{return "<mark>"+match+"</mark>"});
}
http://jsfiddle.net/cdbzL/1258/

Related

Javascript regular expression: save matching value

I have the following expression in JS (typescript, but I think everyone understands what it transpiles to):
markString(text: string) {
const regEx = new RegExp(this.partToHighlight, 'ig');
return text.replace(regEx, `<mark>${this.partToHighlight}<\/mark>`);
}
The problem is that this way, with the 'ig'-option, the matching value can have any case, upper or lower, but is always replaced by the partToHighlight-value. Instead the function should read the matched value, save it and output it surrounded with the HTML-tags. How do I do this? I am pretty sure this is a duplicate question, but I couldn't find the one asked before.
You need to replace with the found match, $&:
markString(text: string) {
const regEx = new RegExp(this.partToHighlight, 'ig');
return text.replace(regEx, "<mark>$&</mark>");
}
Using $&, you replace with found match with the the same text found and do not need to hardcode the replacement, nor use any callbacks.
See "Specifying a string as a parameter" for more details.
As mentionned in comments you will need to use RegExp.lastMatch or $&, to point out to the matched substring, in your replace() method:
const regEx = new RegExp(this.partToHighlight, 'ig');
return text.replace(regEx, `<mark>$&<\/mark>`);

Regex return undefined in a JavaScript String

I have a little code snippet where I use Regular Expressions to rip off punctuation, numbers etc from a string. I am getting undefined along with output of my ripped string. Can someone explain whats happening? Thanks
var regex = /[^a-zA-z\s\.]|_/gi;
function ripPunct(str) {
if ( str.match(regex) ) {
str = str.replace(regex).replace(/\s+/g, "");
}
return str;
}
console.log(ripPunct("*#£#__-=-=_+_devide-00000110490and586#multiply.edu"));
You should pass a replacement pattern to the first replace method, and also use A-Z, not A-z, in the pattern. Also, there is no point to check for a match before replacing, just use replace directly. Also, it seems the second chained replace is redundant as the first one already removes whitespace (it contains \s). Besides, the |_ alternative is also redundant since the [^a-zA-Z\s.] already matches an underscore as it is not part of the symbols specified by this character class.
var regex = /[^a-zA-Z\s.]/gi;
function ripPunct(str) {
return str.replace(regex, "");
}
console.log(ripPunct("*#£#__-=-=_+_devide-00000110490and586#multiply.edu"));

Replace array of chars in a string

I'm looking for a function that does replaceAll with any start and endcharacter.
I know I can use the regex notation:
string=string.replace(/a/g,"b");
However, because the searched char is in a regex, I sometimes need to escape that character and sometimes not, which is annoying if I want to do this for a full list of chars
convertEncoding= function(string) {
var charMap= {'"':""",'&':"&",...}
for (startChar in charMap) {
endChar=charMap[startChar];
string= string.replaceAll(startChar,endChar);
}
}
Is they a good way to write that function replaceAll, without doing a for loop and using String.replace() (eg the naive way) ?
you can use escape the RegExp special characters in the strings such as described here https://stackoverflow.com/a/6969486/519995:
and then you can use the regexp global replace
function escapeRegExp(str) {
return str.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
}
for (startChar in charMap) {
endChar=charMap[startChar];
result = string.replace(RegExp(escapeRegExp(startChar), 'g'), endChar);
}

Regular expression that remove second occurrence of a character in a string

I'm trying to write a JavaScript function that removes any second occurrence of a character using the regular expression. Here is my function
var removeSecondOccurrence = function(string) {
return string.replace(/(.*)\1/gi, '');
}
It's only removing consecutive occurrence. I'd like it to remove even non consecutive one. for example papirana should become pairn.
Please help
A non-regexp solution:
"papirana".split("").filter(function(x, n, self) { return self.indexOf(x) == n }).join("")
Regexp code is complicated, because JS doesn't support lookbehinds:
str = "papirana";
re = /(.)(.*?)\1/;
while(str.match(re)) str = str.replace(re, "$1$2")
or a variation of the first method:
"papirana".replace(/./g, function(a, n, str) { return str.indexOf(a) == n ? a : "" })
Using a zero-width lookahead assertion you can do something similar
"papirana".replace(/(.)(?=.*\1)/g, "")
returns
"pirna"
The letters are of course the same, just in a different order.
Passing the reverse of the string and using the reverse of the result you can get what you're asking for.
This is how you would do it with a loop:
var removeSecondOccurrence = function(string) {
var results = "";
for (var i = 0; i < string.length; i++)
if (!results.contains(string.charAt(i)))
results += string.charAt(i);
}
Basically: for each character in the input, if you haven't seen that character already, add it to the results. Clear and readable, at least.
What Michelle said.
In fact, I strongly suspect it cannot be done using regular expressions. Or rather, you can if you reverse the string, remove all but the first occurences, then reverse again, but it's a dirty trick and what Michelle suggests is way better (and probably faster).
If you're still hot on regular expressions...
"papirana".
split("").
reverse().
join("").
replace(/(.)(?=.*\1)/g, '').
split("").
reverse().
join("")
// => "pairn"
The reason why you can't find all but the first occurence without all the flippage is twofold:
JavaScript does not have lookbehinds, only lookaheads
Even if it did, I don't think any regexp flavour allows variable-length lookbehinds

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending #KooiInc answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^#\n\r]+)#)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^#\n\r]+)#)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\]\\.,;:\\s#\"]+(\\.[^<>(),[\]\\.,;:\\s#\"]+)*)',
'|(\\".+\\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|
(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
Thanks to the wonderous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox doesn't support backreferences or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
The regex above is missing some black slashes which isn't working properly. So, I edited the regex. Please consider this regex which works 99.99% for email validation.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s#\"]+(\\.[^<>()\\[\\]\\\.,;:\\s#\"]+)*)',
'|(".+"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax:
var pattern = new RegExp('^(([^<>()[\]\\.,;:\s#\"]+' +
'(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#' +
'((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|' +
'(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$');
You can simply use string operation.
var pattenString = "^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|"+
"(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|"+
"(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$";
var patten = new RegExp(pattenString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+#\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\]\\\\.,;:\s#\"]+(\\.[^<>()[\\]\\\\.,;:\s#\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")#(" + host1 + "|" + host2 + ")$");
#Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i

Categories