How to ignore brackets in a regex [duplicate] - javascript

This question already has an answer here:
Escape string for use in Javascript regex [duplicate]
I have a regex, built with a template literal, that I match against a CSV of conditions and links.
const regex = new RegExp(`^${condition},\/.+`, 'gi');
For example, when condition is Sore throat, the regex matches
'Sore throat,/conditions/sore-throat/'
I've run into a problem: the condition might contain brackets, and then the regex no longer matches. For example, Diabetes (type 1) doesn't match
'Diabetes (type 1),/conditions/type-1-diabetes/'
I've tried removing the brackets and their contents from the template literal, but in some cases the brackets aren't at the end of the string, such as Lactate dehydrogenase (LDH) test
'Lactate dehydrogenase (LDH) test,/conditions/ldh-test/'
I'm not too familiar with regex, so apologies if this is simple, but I haven't been able to find a way to escape the brackets without knowing exactly where they will be in the string, which in my case isn't possible.

You are trying to use a variable that might contain special characters as part of a regex, but you /don't/ want those characters to be interpreted with their "regex" meaning. I'm not aware of any native syntax for this in a JavaScript regex; in Perl you would write \Q${condition}\E, but JavaScript doesn't support that.
Instead, you should escape your condition variable before passing it into the regex, using a function like this one:
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
}
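A minimal sketch of how the helper plugs into the regex from the question. It assumes the CSV is available as an array of lines; the `lines` array and the `findLink` helper are hypothetical names used only for illustration. It also drops the `g` flag, on the assumption that the regex is only used with `test()`, because a global regex carries `lastIndex` from one `test()` call to the next.

```javascript
function escapeRegExp(string) {
  // $& inserts the matched character, so each metacharacter gets a backslash in front
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical sample data: one "condition,link" pair per line
const lines = [
  'Sore throat,/conditions/sore-throat/',
  'Diabetes (type 1),/conditions/type-1-diabetes/',
  'Lactate dehydrogenase (LDH) test,/conditions/ldh-test/',
];

// Hypothetical helper: return the first line whose condition matches literally
function findLink(condition) {
  // No 'g' flag: test() on a global regex would advance lastIndex between lines
  const regex = new RegExp(`^${escapeRegExp(condition)},/.+`, 'i');
  return lines.find((line) => regex.test(line));
}

console.log(findLink('Diabetes (type 1)'));
// Diabetes (type 1),/conditions/type-1-diabetes/
```

Because the parentheses are escaped wherever they occur, the same code works for Lactate dehydrogenase (LDH) test, where the brackets sit in the middle of the name.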

Related

Regex non-greedy and global modifiers not working as expected with JavaScript .match [duplicate]

This question already has answers here:
Javascript regular expressions modifier U [duplicate]
I'm trying to create a regex that will return text wrapped in parentheses. For example, in the following string:
const regexString = "asdf (asdfasd asdfas) asdfasd asdfasd asdf(asfda) asdfasd (asdfasd)"
the regex should return only: (asdfasd asdfas), (asfda), and (asdfasd) as individual capture groups.
Using regex101.com I was able to put this combination together:
/(\(.+\))/gU
This regex combo works, but when I try to implement this in Javascript .match or even with .exec, I am simply returned the entire string.
For example,
regexString.match(/(\(.+\).*?)/g)
returns the entire string.
I believe the issue has to do with my use of the ungreedy .*? modifier and the global /g modifier. Both are used in the working example from regex101.com, but I haven't been able to determine why these modifiers, or possibly the regex itself, don't behave the same way when I use them in JavaScript directly.
Thank you for any insight!
I believe you don't get the entire string; because the + quantifier is greedy, you get all characters between the first opening and the last closing parenthesis. In your example the returned value is an array with a single string:
['(asdfasd asdfas) asdfasd asdfasd asdf(asfda) asdfasd (asdfasd)']
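JavaScript has no U (ungreedy) flag; regex101 accepts it in its PCRE flavor, but in JavaScript you have to make the quantifier lazy yourself. Add ? after + so it matches as few characters as possible between the parentheses: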
regexString.match(/(\(.+?\).*?)/g)
Then the returned result will be:
['(asdfasd asdfas)', '(asfda)', '(asdfasd)']
What you're searching for is /\([^)]*\)/g:
\( : matches the opening parenthesis
[^)] : matches any character except a closing parenthesis
* : repeats the previous token zero or more times
\) : matches the closing parenthesis
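A small sketch comparing the three patterns discussed above on the string from the question. It also checks that the U flag copied from regex101 is rejected by JavaScript's RegExp constructor.

```javascript
const regexString = "asdf (asdfasd asdfas) asdfasd asdfasd asdf(asfda) asdfasd (asdfasd)";

// Greedy: .+ runs on to the LAST ")" in the string, so there is a single match
const greedy = regexString.match(/\(.+\)/g);

// Lazy: .+? stops at the FIRST ")" after each "("
const lazy = regexString.match(/\(.+?\)/g);

// Negated class: [^)]* can never cross a ")", so no laziness is needed
const negated = regexString.match(/\([^)]*\)/g);

// JavaScript has no "U" (ungreedy) flag; the constructor throws a SyntaxError
let flagError = null;
try {
  new RegExp('\\(.+\\)', 'gU');
} catch (e) {
  flagError = e;
}

console.log(greedy);  // one element spanning from the first "(" to the last ")"
console.log(lazy);    // [ '(asdfasd asdfas)', '(asfda)', '(asdfasd)' ]
console.log(negated); // same three matches as lazy
console.log(flagError instanceof SyntaxError); // true
```

The negated-class version is often preferred because it states the intent directly ("anything but a closing parenthesis") instead of relying on backtracking behavior.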

How to combine regular expressions and template literals in JavaScript? [duplicate]

I am pretty new to regex, and it seems that \ is used for metacharacters. My problem is that I want to search for this exact string: \"mediaType\":\"img\"
I also want to substitute a variable for img dynamically, so I want something like this:
new RegExp(`\"mediaType\":\"${variable}\"`)
How do I write this to make it work?
Short answer:
function escapeRegEx(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}
var expression = new RegExp('\\\\"mediaType\\\\":\\\\"' + escapeRegEx(variable) + '\\\\"');
// or, using a template literal:
var expression = new RegExp(`\\\\"mediaType\\\\":\\\\"${escapeRegEx(variable)}\\\\"`);
Long answer:
Besides being used for meta characters, backslash in regular expressions can be used to escape characters that would otherwise have meaning (like *, $, parentheses, and \). So the way to match a backslash in a regular expression is to add another one as an escape character: \\.
Taking that into account, the regular expression you want to end up with is \\"mediaType\\":\\"img\\", and if you were using a regular expression literal, that would be it. Because you are creating the expression dynamically, though, you have to pass it to RegExp as a string, and string literals also treat the backslash as an escape character. That adds a second layer of escaping, so you need to double each \ character again, and you end up with new RegExp('\\\\"mediaType\\\\":\\\\"img\\\\"').
Another complication is that you want the contents of variable to be matched literally, not interpreted as a regular expression. Until recently JavaScript had no built-in way to escape a string for use in a regular expression (RegExp.escape was only added in ES2025 and is missing from older engines), so you'll need to use one of the solutions in Is there a RegExp.escape function in Javascript?. I used a slightly modified version of the accepted answer that defines it as a standalone function instead of adding it to the RegExp object. The exact solution doesn't matter, as long as you escape the dynamic part.
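A sketch that checks the short answer's pattern against text containing literal backslashes. The escaping is copied from the answer; `mediaTypeRegex` is a hypothetical wrapper added for illustration, and String.raw is used only to write the sample text without a third layer of escaping.

```javascript
// Escape every character that has a special meaning in a regex pattern
function escapeRegEx(s) {
  return s.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&');
}

// Hypothetical wrapper around the short answer's template literal:
// "\\\\" in the template literal becomes "\\" in the string, which the regex reads as one literal "\"
function mediaTypeRegex(variable) {
  return new RegExp(`\\\\"mediaType\\\\":\\\\"${escapeRegEx(variable)}\\\\"`);
}

// Sample text with a backslash before every quote, as in the question
const text = String.raw`{\"mediaType\":\"img\"}`;

console.log(mediaTypeRegex('img').source);       // \\"mediaType\\":\\"img\\"
console.log(mediaTypeRegex('img').test(text));   // true
console.log(mediaTypeRegex('video').test(text)); // false
// Without escapeRegEx, "i.g" would match "img"; escaped, the dot is literal
console.log(mediaTypeRegex('i.g').test(text));   // false
```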
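You just want to use String.raw. It passes the backslashes to RegExp unchanged, but the regex engine then reads \" as a plain quote, so the pattern below matches "mediaType":"text" without backslashes. To require literal backslashes, double them inside String.raw (\\"mediaType\\":\\"${variable}\\"), and escape variable first if it can contain metacharacters.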
const variable = 'text'
const regexp = new RegExp(String.raw `\"mediaType\":\"${variable}\"`)
console.log(regexp)
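A sketch of the difference between the two String.raw patterns. The sample strings are made up for illustration, and variable is interpolated unescaped, which is only safe while it contains no regex metacharacters.

```javascript
const variable = 'img';

// String.raw keeps the backslashes, but the regex engine reads \" as a plain quote
const plainQuotes = new RegExp(String.raw`\"mediaType\":\"${variable}\"`);

// Doubling each backslash inside String.raw gives the engine an escaped backslash
const literalBackslashes = new RegExp(String.raw`\\"mediaType\\":\\"${variable}\\"`);

const withBackslashes = String.raw`\"mediaType\":\"img\"`;
const withoutBackslashes = '"mediaType":"img"';

console.log(plainQuotes.test(withoutBackslashes));        // true
console.log(plainQuotes.test(withBackslashes));           // false
console.log(literalBackslashes.test(withBackslashes));    // true
console.log(literalBackslashes.test(withoutBackslashes)); // false
```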

Javascript Replace all commas not in double quotes [duplicate]

This question already has answers here:
Regex to match all instances not inside quotes
I would like to replace all commas in a comma-delimited string with a pipe ('|') except for those that are found in double quotes. I would prefer to use the JavaScript "replace" function if possible.
My regex knowledge is limited at best. I am able to replace all commas with pipes, but that does not give me the desired result for parsing the data. I also found a regex here that removes all commas except those inside quotation marks, but it does not insert a pipe or any other delimiter.
(?!\B"[^"]*),(?![^"]*"\B)
Here is an example of what I'm trying to accomplish:
string1 = 1234,Cake,,"Smith,John",,"Status: Acknowledge,Accept",,Red,,
and I would like it to look like:
string1 = 1234|Cake||"Smith,John"||"Status: Acknowledge,Accept"||Red||
One option is to use a replace callback to replace either a quote or a comma with the quote itself or a pipe respectively:
const str = `1234,Cake,,"Smith,John",,"Status: Acknowledge,Accept",,Red,,`;
const res = str.replace(/(".*?")|,/g, (...m) => m[1] || '|');
console.log(res);
Another option (and IMO a better one in the long run) is to use a dedicated CSV parser. CSV is actually trickier than it looks.
We can simply capture our desired commas using alternation with a simple expression such as:
(".+?")|(,)
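A minimal sketch of how the two-group pattern can drive replace(). The callback name `quoted` is just a label for capture group 1. At each position the engine tries the quoted alternative first, so a whole quoted field, inner commas included, is consumed before the bare-comma alternative gets a chance.

```javascript
const str = `1234,Cake,,"Smith,John",,"Status: Acknowledge,Accept",,Red,,`;

// Group 1: a quoted field (kept as-is); group 2: a bare comma (replaced with a pipe)
const res = str.replace(/(".+?")|(,)/g, (match, quoted) => quoted ?? '|');

console.log(res); // 1234|Cake||"Smith,John"||"Status: Acknowledge,Accept"||Red||
```

When group 1 does not participate in a match, its callback argument is undefined, which is why `??` falls back to the pipe.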

Why is the escape syntax different between two Javascript functions: search vs replace? [duplicate]

This question already has answers here:
JavaScript search() fails to find "()"
Take this string as an example:
s = "a(b"
These two work as expected
s.search("\\(")
1
s.replace("\(", "")
"ab"
But these don't
s.search("\(")
Uncaught SyntaxError: Invalid regular expression: /(/: Unterminated group
s.replace("\\(", "")
"a(b"
Huh? Why does search require one more escape than replace?
Also, shouldn't string input give a literal search, instead of being interpreted as a regexp? In theory, I shouldn't have to use escape characters at all.
The string literal '\(' is equivalent to '(', so you’re not really escaping anything at all with it.
String#search always interprets its argument as a regular expression; if you want to find an exact match, use String#indexOf.
> s.indexOf('(')
1
s.replace accepts either a string or a regular expression. You’re giving it the string ( in the first case (so it replaces the first opening parenthesis it sees) and \( in the second (not in the string, so it replaces nothing).
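The console session above, as a runnable sketch. It shows that search() builds a RegExp from its string argument, while indexOf() and replace() with a string pattern both match literally.

```javascript
const s = 'a(b';

// search() converts a string argument into a RegExp, so "(" must be escaped for the regex engine
console.log(s.search('\\('));       // 1

// indexOf() does a plain substring search; no escaping needed
console.log(s.indexOf('('));        // 1

// replace() with a string pattern matches literally (first occurrence only)
console.log(s.replace('(', ''));    // ab

// '\\(' is the two-character string \( which does not occur in s, so nothing changes
console.log(s.replace('\\(', ''));  // a(b

// The string literal '\(' is just '(', and an unterminated group makes search() throw
let error = null;
try {
  s.search('\(');
} catch (e) {
  error = e;
}
console.log(error instanceof SyntaxError); // true
```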

Javascript RegExp anomaly [duplicate]

I am trying to build a regexp from static text plus a variable in javascript. Obviously I am missing something very basic, see comments in code below. Help is very much appreciated:
var test_string = "goodweather";
// One regexp we just set:
var regexp1 = /goodweather/;
// The other regexp we built from a variable + static text:
var regexp_part = "good";
var regexp2 = "\/" + regexp_part + "weather\/";
// These alerts now show the 2 regexp are completely identical:
alert (regexp1);
alert (regexp2);
// But one works, the other doesn't ??
if (test_string.match(regexp1))
  alert ("This is displayed.");
if (test_string.match(regexp2))
  alert ("This is not displayed.");
First, the answer to the question:
The other answers are nearly correct, but they fail to consider what happens when the text to be matched contains a literal backslash (i.e. when regexp_part contains a literal backslash). For example, what happens when regexp_part equals "C:\Windows"? In this case the suggested methods do not work as expected: the resulting regex becomes /C:\Windows/, where \W is erroneously interpreted as the non-word character class. The fix is to first escape any backslashes in regexp_part (the needed regex is actually /C:\\Windows/); other metacharacters such as . or ( need the same treatment if the phrase must be matched literally.
To illustrate the correct way of handling this, here is a function which takes a passed phrase and creates a regex with the phrase wrapped in \b word boundaries:
// Given a phrase, create a RegExp object with word boundaries.
function makeRegExp(phrase) {
  // First escape any backslashes in the phrase string.
  // i.e. replace each backslash with two backslashes.
  phrase = phrase.replace(/\\/g, "\\\\");
  // Wrap the escaped phrase with \b word boundaries.
  var re_str = "\\b"+ phrase +"\\b";
  // Create a new regex object with "g" and "i" flags set.
  var re = new RegExp(re_str, "gi");
  return re;
}
// Here is a condensed version of same function.
function makeRegExpShort(phrase) {
  return new RegExp("\\b"+ phrase.replace(/\\/g, "\\\\") +"\\b", "gi");
}
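The function above escapes only backslashes, so a phrase containing other metacharacters such as "." would still be read as regex syntax. A hedged variant (makeLiteralWordRegExp is a hypothetical name) escapes the full metacharacter set using the same replace technique that appears in the escapeRegExp helpers elsewhere on this page, and keeps the \b boundaries and "gi" flags.

```javascript
// Hypothetical variant: escape every regex metacharacter, not only backslashes
function makeLiteralWordRegExp(phrase) {
  const escaped = phrase.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Same \b word boundaries and "gi" flags as makeRegExp
  return new RegExp('\\b' + escaped + '\\b', 'gi');
}

const re = makeLiteralWordRegExp('C:\\Windows');
console.log(re.source); // \bC:\\Windows\b

// The dot is now literal: "a.b" no longer matches "axb"
console.log('axb'.match(makeLiteralWordRegExp('a.b')));          // null
console.log('see a.b here'.match(makeLiteralWordRegExp('a.b'))); // [ 'a.b' ]
```

Note that \b only behaves as a word boundary next to word characters, so phrases that start or end with punctuation may need different anchoring.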
To understand this in more depth, here is a discussion...
In-depth discussion, or "What's up with all these backslashes!?"
JavaScript has two ways to create a RegExp object:
/pattern/flags - You can write a RegExp literal directly, where the pattern is delimited by a pair of forward slashes followed by optional modifier flags, classically 'g' (global), 'i' (ignore case) and 'm' (multiline); modern engines also accept 's', 'u', 'y', 'd' and 'v'. This type of regex cannot be created dynamically.
new RegExp("pattern", "flags") - You can create a RegExp object by calling the RegExp() constructor, passing the pattern as a string (without forward-slash delimiters) as the first argument and, optionally, the flags (also as a string) as the second. This type of regex can be created dynamically.
The following example demonstrates creating a simple RegExp object using both methods. Let's say we wish to match the word "apple". The regex pattern we need is simply apple. Additionally, we wish to set the g, i and m flags.
Example 1: Simple pattern having no special characters: apple
// A RegExp literal to match "apple" with all three flags set:
var re1 = /apple/gim;
// Create the same object using RegExp() constructor:
var re2 = new RegExp("apple", "gim");
Simple enough. However, the two methods differ significantly in how they handle escaped characters. The regex literal syntax is handy because you only need to escape forward slashes; all other characters are passed to the regex engine unaltered. With the RegExp constructor, however, you pass the pattern as a string, so there are two levels of escaping to consider: first the interpretation of the string literal, then the interpretation by the regex engine. Several examples illustrate these differences.
First, let's consider a pattern that contains a single literal forward slash. Say we wish to match the text sequence "and/or" in a case-insensitive manner. The needed pattern is and/or.
Example 2: Pattern having one forward slash: and/or
// A RegExp literal to match "and/or":
var re3 = /and\/or/i;
// Create the same object using RegExp() :
var re4 = new RegExp("and/or", "i");
Note that with the regex literal syntax, the forward slash must be escaped (preceded with a single backslash) because with a regex literal, the forward slash has special meaning (it is a special metacharacter which is used to delimit the pattern). On the other hand, with the RegExp constructor syntax (which uses a string to store the pattern), the forward slash does NOT have any special meaning and does NOT need to be escaped.
Next, let's consider a pattern that includes the special \b word-boundary metasequence. Say we wish to match the word "apple" as a whole word only (so that it won't match "pineapple"). The pattern (as seen by the regex engine) needs to be \bapple\b:
Example 3: Pattern having \b word boundaries: \bapple\b
// A RegExp literal to match the whole word "apple":
var re5 = /\bapple\b/;
// Create the same object using RegExp() constructor:
var re6 = new RegExp("\\bapple\\b");
In this case the backslash must be escaped when using the RegExp constructor, because the pattern is stored in a string, and to get a literal backslash into a string it must be escaped with another backslash. With a regex literal there is no need to escape the backslash. (Remember that with a regex literal, the forward slash is the only character that needs escaping beyond normal regex syntax.)
Backslash SOUP!
Things get even more interesting when we need to match a literal backslash. Let's say we want to match the text sequence "C:\Program Files\JGsoft\RegexBuddy3\RegexBuddy.exe". The pattern to be processed by the regex engine needs to be C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe. (Note that the regex pattern to match a single backslash is \\, i.e. each backslash must be escaped.) Here is how you create the needed RegExp object using the two JavaScript syntaxes:
Example 4: Pattern to match literal back slashes:
// A RegExp literal to match the ultimate Windows regex debugger app:
var re7 = /C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe/;
// Create the same object using RegExp() constructor:
var re8 = new RegExp(
"C:\\\\Program Files\\\\JGsoft\\\\RegexBuddy3\\\\RegexBuddy\\.exe");
This is why the /regex literal/ syntax is generally preferred over the new RegExp("pattern", "flags") method: it completely avoids the backslash soup that can frequently arise. However, when you need to create a regex dynamically, as the OP does here, you are forced to use the new RegExp() syntax and deal with the backslash soup. (It's really not that bad once you get your head wrapped 'round it.)
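A quick check, as a sketch, that each literal from Examples 1-4 behaves the same as its constructor counterpart. The sample strings are made up for illustration; search() is used for testing because it ignores the g flag and lastIndex.

```javascript
// [literal, constructor version, text that should match, text that should not]
const cases = [
  [/apple/gim, new RegExp("apple", "gim"), "APPLE pie", "pear"],
  [/and\/or/i, new RegExp("and/or", "i"), "AND/OR", "and or"],
  [/\bapple\b/, new RegExp("\\bapple\\b"), "an apple a day", "pineapple"],
  [/C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy\.exe/,
   new RegExp("C:\\\\Program Files\\\\JGsoft\\\\RegexBuddy3\\\\RegexBuddy\\.exe"),
   "C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddy.exe",
   "C:\\Program Files\\JGsoft\\RegexBuddy3\\RegexBuddyXexe"],
];

// Does re match anywhere in str? search() ignores the g flag and lastIndex
const matches = (re, str) => str.search(re) !== -1;

const results = cases.map(([literal, constructed, yes, no]) =>
  matches(literal, yes) && matches(constructed, yes) &&
  !matches(literal, no) && !matches(constructed, no));

console.log(results); // [ true, true, true, true ]
```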
RegexBuddy to the rescue!
RegexBuddy is a Windows app that can help with this backslash soup problem - it understands the regex syntaxes and escaping requirements of many languages and will automatically add and remove backslashes as required when pasting to and from the application. Inside the application you compose and debug the regex in native regex format. Once the regex works correctly, you export it using one of the many "copy as..." options to get the needed syntax. Very handy!
You should use the RegExp constructor to accomplish this:
var regexp2 = new RegExp(regexp_part + "weather");
Here's a related question that might help.
The forward slashes are just JavaScript syntax for enclosing regular expression literals. If you use a plain string as a regex, you shouldn't include them, because they will be matched literally. Therefore you should just build the regex like this:
var regexp2 = regexp_part + "weather";
I would use:
var regexp2 = new RegExp(regexp_part+"weather");
What you have done is equivalent to:
var regexp2 = "/goodweather/";
and then:
test_string.match("/goodweather/")
which calls match with a string (match converts it into a RegExp that requires the slashes to appear literally in the text) instead of with the regex you wanted:
test_string.match(/goodweather/)
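A minimal sketch reproducing the question's two cases, using the question's variable names, to show why the string with slashes never matches.

```javascript
const test_string = "goodweather";
const regexp_part = "good";

// A string passed to match() is turned into new RegExp(string), so the
// slashes become part of the pattern and must appear literally in the text
console.log(test_string.match("/" + regexp_part + "weather/")); // null

// Building a RegExp (or passing the bare string without slashes) gives the intended pattern
const regexp2 = new RegExp(regexp_part + "weather");
console.log(regexp2.test(test_string));                       // true
console.log(test_string.match(regexp_part + "weather")[0]);   // goodweather
```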
While this solution may be overkill for this specific question, if you want to build RegExps programmatically, compose-regexp can come in handy.
This specific problem would be solved by using
import {sequence} from 'compose-regexp'
const weatherify = x => sequence(x, /weather/)
Strings are escaped, so
weatherify('.')
returns
/\.weather/
But it can also accept RegExps
weatherify(/./u)
returns
/.weather/u
compose-regexp supports the whole range of RegExp features and lets you build RegExps from sub-parts, which helps with code reuse and testability.
