regex to parse string with escaped characters

I am reading information out of a formatted string.
The format looks like this:
"foo:bar:beer:123::lol"
Everything between the ":" is data I want to extract with regex. If a : is followed by another : (like "::") the data for this has to be "" (an empty string).
Currently I am parsing it with this regex:
(.*?)(:|$)
Now it came to my mind that ":" may exist within the data, as well. So it has to be escaped.
Example:
"foo:bar:beer:\::1337"
How can I change my regular expression so that it matches the "\:" as data, too?
Edit: I am using JavaScript as the programming language. It seems to have some limitations regarding complex regular expressions. The solution should work in JavaScript as well.
Thanks,
McFarlane

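var myregexp = /((?:\\.|[^\\:])*)(?::|$)/g;
var matches = [];
var match = myregexp.exec(subject);
while (match != null) {
    // Group 1 holds the field without its trailing delimiter
    matches.push(match[1]);
    // exec() does not advance lastIndex after a zero-length match,
    // so step forward manually to avoid looping forever at end-of-string
    if (match[0].length === 0) {
        myregexp.lastIndex++;
    }
    match = myregexp.exec(subject);
}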
Input: "foo:bar:beer:\\:::1337"
Output: ["foo", "bar", "beer", "\\:", "", "1337", ""]
You'll always get an empty string as the last match. This is unavoidable given the requirement that you also want empty strings to match between delimiters (and the lack of lookbehind assertions in JavaScript when this was written; lookbehind was only added in ES2018).
Explanation:
( # Match and capture:
(?: # Either match...
\\. # an escaped character
| # or
[^\\:] # any character except backslash or colon
)* # zero or more times
) # End of capturing group
(?::|$) # Match (but don't capture) a colon or end-of-string
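As a sketch of how the pattern above could be wrapped for reuse: the helper name parseFields is hypothetical, and the code assumes well-formed input (no dangling backslash at the end). It stops as soon as a match was terminated by end-of-string rather than by a colon, which sidesteps the extra empty match, and it also strips the escaping backslashes from each field.

```javascript
// Hypothetical helper built around the regex from this answer.
function parseFields(subject) {
    var re = /((?:\\.|[^\\:])*)(?::|$)/g;
    var fields = [];
    var match;
    while ((match = re.exec(subject)) !== null) {
        // Group 1 is the raw field; drop the backslash in front of escaped characters.
        fields.push(match[1].replace(/\\(.)/g, "$1"));
        // No ":" was consumed, so this match ended at end-of-string: stop here.
        if (match[0].length === match[1].length) {
            break;
        }
    }
    return fields;
}

console.log(parseFields("foo:bar:beer:\\:::1337"));
// logs the six fields foo, bar, beer, :, (empty string), 1337
```

With this stopping rule a trailing colon still yields a final empty field, so "a:" gives two fields while "a" gives one.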

Here's a solution:
function tokenize(str) {
var reg = /((\\.|[^\\:])*)/g;
var array = [];
while(reg.lastIndex < str.length) {
var match = reg.exec(str);
array.push(match[0].replace(/\\(\\|:)/g, "$1"));
reg.lastIndex++;
}
return array;
}
It splits a string into tokens on the : character.
But you can escape the : character with \ if you want it to be part of a token.
You can escape the \ with \ if you want it to be part of a token.
Any other \ won't be interpreted (i.e. \a remains \a).
So you can put any data in your tokens, provided that the data is correctly escaped beforehand.
Here is an example with the string \a:b:\n::\\:\::x, which should give these tokens: \a, b, \n, <empty string>, \, :, x.
>>> tokenize("\\a:b:\\n::\\\\:\\::x");
["\a", "b", "\n", "", "\", ":", "x"]
In an attempt to be clearer: the string put into the tokenizer will be interpreted, and it has 2 special characters: \ and :
\ has a special meaning only if followed by \ or :, and effectively "escapes" these characters, meaning that they lose their special meaning for the tokenizer and are treated as normal characters (and thus become part of tokens).
: is the marker separating 2 tokens.
I realize the OP didn't ask for slash escaping, but other viewers could need a complete parsing library allowing any character in data.

Use a negative lookbehind assertion.
(.*?)((?<!\\):|$)
This will only match : if it's not preceded by \.
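A minimal sketch of the same idea using split, assuming an engine with lookbehind support (ES2018 or later); the function name splitUnescaped is made up for illustration. Note that it does not unescape anything, and it cannot tell an escaped backslash followed by a real delimiter (\\:) from an escaped colon.

```javascript
// Split on ":" only when it is not directly preceded by a backslash.
// Requires lookbehind support (ES2018+).
function splitUnescaped(subject) {
    return subject.split(/(?<!\\):/);
}

// The JS literal below is the text foo:bar:beer:\::1337
console.log(splitUnescaped("foo:bar:beer:\\::1337"));
// fields: foo, bar, beer, \:, 1337

// Empty fields between two delimiters are preserved.
console.log(splitUnescaped("a::b"));
// fields: a, (empty string), b
```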

Related

Javascript Regex with variable and $1

I have read How do you pass a variable to a Regular Expression javascript
I'm looking to create a regular expression to get and replace a value with a variable..
section = 'abc';
reg = new RegExp('\[' + section + '\]\[\d+\]','g');
num = duplicate.replace(reg,"$1++");
where $1 = \d+ +1
and... without increment... it doesn't work...
it returns something like:
[abc]$1
Any idea?

Your regex is on the right track, however to perform any kind of operation you must use a replacement callback:
section = "abc";
reg = new RegExp("(\\["+section+"\\]\\[)(\\d+)(\\])","g");
num = duplicate.replace(reg,function(_,before,number,after) {
return before + (parseInt(number,10)+1) + after;
});

I think you need to read up more on Regular Expressions. Your current regular expression comes out to:
/[abc][d+]/g
Which will match an "a" "b" or "c", followed by a "d" or "+", like: ad or c+ or bd or even zebra++ etc.
A great resource to get started is: http://www.regular-expressions.info/javascript.html

I see at least two problems.
The \ character has a special meaning in JavaScript strings. It is used to escape special characters in the string. For example: \n is a new line, and \r is a carriage return. You can also escape quotes and apostrophes to include them in your string: "This isn't a normally \"quoted\" string... It has actual \" characters inside the string as well as delimiting it."
The second problem is that, in order to use a backreference ($1, $2, etc.) you must provide a capturing group in your pattern (the regex needs to know what to backreference). Try changing your pattern to:
'\\[' + section + '\\]\\[(\\d+)\\]'
Note the double-backslashes. This escapes the backslash character itself, allowing it to be a literal \ in a string. Also note the use of ( and ) (the capturing group). This tells the regex what to capture for $1.
After the regex is instantiated, with section === 'abc':
new RegExp('\\[' + section + '\\]\\[(\\d+)\\]', 'g');
Your pattern is now:
/\[abc\]\[(\d+)\]/g
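And your .replace will replace each match with the captured digits followed by a literal ++ (for example, [abc][5] becomes 5++). The replacement string cannot do arithmetic; to actually increment the number you need a replacement function.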
Demo: http://jsfiddle.net/U46yx/
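To make the difference concrete, here is a small sketch. The input string name[abc][5] is an assumed example (the question never shows the contents of duplicate), and the callback rebuilds the brackets by hand.

```javascript
var section = 'abc';
var reg = new RegExp('\\[' + section + '\\]\\[(\\d+)\\]', 'g');

// Hypothetical input; the original question does not show its data.
var duplicate = 'name[abc][5]';

// The pattern the engine actually sees:
console.log(reg.source); // \[abc\]\[(\d+)\]

// A replacement string only substitutes text, it cannot compute:
var literal = duplicate.replace(reg, '$1++');
console.log(literal); // name5++

// A replacement function can do the arithmetic:
var incremented = duplicate.replace(reg, function (whole, digits) {
    return '[' + section + '][' + (parseInt(digits, 10) + 1) + ']';
});
console.log(incremented); // name[abc][6]
```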

regexp to quote only string matches (not numbers)

I'm struggling with string:
"some text [2string] some another[test] and another [4]";
trying to quote every value but number within [], so it could be converted into
"some text ['2string'] some another['test'] and another [4]"
Thanks.

You need a regex that
matches content between [], i.e. a [, any number of characters except ], then a ]
asserts that there is at least one other character besides digits here.
You can solve this using character classes and negative lookahead assertions:
result = subject.replace(/\[(?!\d+\])([^\]]*)\]/g, "['$1']");
Explanation:
\[ # Match [
(?! # Assert that it's impossible to match...
\d+ # one or more digits
\] # followed by ]
) # End of lookahead assertion
( # Match and capture in group number 1:
[^\]]* # any number of characters except ]
) # End of capturing group
\] # Match ]
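Run against the string from the question, the pattern behaves as follows. This is only a sketch; the wrapper name quoteNonNumeric is hypothetical.

```javascript
// Quote every bracketed value unless it consists of digits only.
function quoteNonNumeric(subject) {
    return subject.replace(/\[(?!\d+\])([^\]]*)\]/g, "['$1']");
}

var subject = "some text [2string] some another[test] and another [4]";
console.log(quoteNonNumeric(subject));
// some text ['2string'] some another['test'] and another [4]
```

One edge case worth knowing: empty brackets [] are not "digits only", so they come out as [''].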

A longer, but IMO cleaner approach, if performance is not a big concern:
var string = "some text [2string] some another[test] and another [4]";
var output = string.replace(/(\[)(.*?)(\])/g, function(match, a, b, c) {
if(/^\d+$/.test(b)) {
return match;
} else {
return a + "'" + b + "'" + c;
}
});
console.log(output);
You basically match every expression inside square brackets, then test to see if it's a number. If it is, return the string as-it-is, otherwise insert quotes at the specific places.
Output:
some text ['2string'] some another['test'] and another [4]

I'd try something like \[(\d*?[a-z]\w*?)]. This should match any [...] as long as there's at least one letter inside. If underscores (_) aren't valid, replace the \w at the end with [a-z].
\[ is just a simple match for [, it has to be escaped due to the special meaning of [.
\d*? will match any amount of digits (or none), but as few as possible to fulfill the match.
[a-z] will match any character within the given range.
\w*? will match any "word" (alphanumeric) characters (letters, digits, and underscores), again as few as possible to fulfill the match.
] is another simple match, this one doesn't have to be escaped, as it's not misleading (no open [ at this level). It can be escaped, but this is usually a style preference (depends on the actual regex engine).

You can replace it with this regex
input.replace(/(?!\d+\])(\w+)(?=\])/g, "'$1'");

Another solution that adds a simple regex to your attempt (note the g flag, so that every numeric value is unquoted again, not just the first):
str.split('[').join("['").split(']').join("']").replace(/\['(\d+)'\]/g, "[$1]");

Commenting Regular Expressions

I'm trying to comment regular expressions in JavaScript.
There seem to be many resources on how to remove comments from code using regex, but not on how to comment regular expressions in JavaScript so they are easier to understand.

Unfortunately, JavaScript doesn't have a verbose mode for regular expression literals like some other languages do. You may find this interesting, though.
In lieu of any external libraries, your best bet is just to use a normal string and comment that:
var r = new RegExp(
'(' + //start capture
'[0-9]+' + // match digit
')' //end capture
);
r.test('9'); //true

While Javascript doesn't natively support multi-line and commented regular expressions, it's easy enough to construct something that accomplishes the same thing - use a function that takes in a (multi-line, commented) string and returns a regular expression from that string, sans comments and newlines.
The following snippet imitates the behavior of other flavors' x ("extended") flag, which ignores all whitespace characters in a pattern as well as comments, which are denoted with #:
function makeExtendedRegExp(inputPatternStr, flags) {
// Remove everything between the first unescaped `#` and the end of a line
// and then remove all unescaped whitespace
const cleanedPatternStr = inputPatternStr
.replace(/(^|[^\\])#.*/g, '$1')
.replace(/(^|[^\\])\s+/g, '$1');
return new RegExp(cleanedPatternStr, flags);
}
// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
^ # match the beginning of the line
(\w+) # 1st capture group: match one or more word characters
\s # match a whitespace character
(\w+) # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));
Ordinarily, to represent a backslash in a Javascript string, one must double-escape each literal backslash, eg str = 'abc\\def'. But regular expressions often use many backslashes, and the double-escaping can make the pattern much less readable, so when writing a Javascript string with many backslashes it's a good idea to use a String.raw template literal, which allows a single typed backslash to actually represent a literal backslash, without additional escaping.
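A short sketch of that point: the two constructions below produce the same pattern, one with doubled backslashes in a normal string and one with String.raw. The variable names are illustrative only.

```javascript
// Same pattern written twice: doubled backslashes vs. String.raw
const doubled = new RegExp('(\\d+)\\.(\\d+)');
const raw = new RegExp(String.raw`(\d+)\.(\d+)`);

console.log(doubled.source === raw.source); // true

// Both capture the digits on either side of the dot.
const groups = raw.exec('v3.14').slice(1);
console.log(groups); // [ '3', '14' ]
```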
Just like with the standard x modifier, to match an actual # in the string, just escape it first, eg
foo\#bar # comments go here
// this function is exactly the same as the one in the first snippet
function makeExtendedRegExp(inputPatternStr, flags) {
// Remove everything between the first unescaped `#` and the end of a line
// and then remove all unescaped whitespace
const cleanedPatternStr = inputPatternStr
.replace(/(^|[^\\])#.*/g, '$1')
.replace(/(^|[^\\])\s+/g, '$1');
return new RegExp(cleanedPatternStr, flags);
}
// The following switches the first word with the second word:
const input = 'foo#bar baz';
const pattern = makeExtendedRegExp(String.raw`
^ # match the beginning of the line
(\w+) # 1st capture group: match one or more word characters
\# # match a hash character
(\w+) # 2nd capture group: match one or more word characters
`);
console.log(input.replace(pattern, '$2 $1'));
Note that to match a literal space character (and not just any whitespace character), while using the x flag in any environment (including the above), you have to escape the space with a \ first, eg:
^(\S+)\ (\S+) # capture the first two words
If you want to frequently match space characters, this can get a bit tedious and make the pattern harder to read, similar to how double-escaping backslashes isn't very desirable. One possible (non-standard) modification to permit unescaped space characters would be to only strip out spaces at the beginning and end of a line, and spaces before a # comment:
function makeExtendedRegExp(inputPatternStr, flags) {
// Remove the first unescaped `#`, any preceding unescaped spaces, and everything that follows
// and then remove leading and trailing whitespace on each line, including linebreaks
const cleanedPatternStr = inputPatternStr
.replace(/(^|[^\\]) *#.*/g, '$1')
.replace(/^\s+|\s+$|\n/gm, '');
console.log(cleanedPatternStr);
return new RegExp(cleanedPatternStr, flags);
}
// The following switches the first word with the second word:
const input = 'foo bar baz';
const pattern = makeExtendedRegExp(String.raw`
^ # match the beginning of the line
(\w+) (\w+) # capture the first two words
`);
console.log(input.replace(pattern, '$2 $1'));

In several other languages (notably Perl), there's the special x flag. When set, the regexp ignores any whitespace and comments inside of it. Sadly, javascript regexps do not support the x flag.
Lacking syntax, the only way to leverage readability is convention. Mine is to add a comment before the tricky regular expression, containing it as if you've had the x flag. Example:
/*
\+? #optional + sign
(\d*) #the integer part
( #begin decimal portion
\.
\d+ #decimal part
)
*/
var re = /\+?(\d*)(\.\d+)/;
For more complex examples, you can see what I've done with the technique here and here.

In 2021 we can do this using template literals which have String.raw() applied to it.
VerboseRegExp `
(
foo* // zero or more foos
(?: bar | baz ) // bar or baz
quux? // maybe a quux
)
\s \t \r \n \[ \] \\ \/ \`
H e l l o // invisible whitespace is ignored ...
[ ] // ... unless you put it in a character class
W o r l d !
$ {} // Separate with whitespace to avoid interpolation!
`
`gimy` // flags go here
/*
returns the RegExp
/(foo*(?:bar|baz)quux?)\s\t\r\n\[\]\\\/\`Hello[ ]World!${}/gimy
*/
The implementation of VerboseRegExp:
const VerboseRegExp = (function init_once () {
const cleanupregexp = /(?<!\\)[\[\]]|\s+|\/\/[^\r\n]*(?:\r?\n|$)/g
return function first_parameter (pattern) {
return function second_parameter (flags) {
flags = flags.raw[0].trim()
let in_characterclass = false
const compressed = pattern.raw[0].replace(
cleanupregexp,
function on_each_match (match) {
switch (match) {
case '[': in_characterclass = true; return match
case ']': in_characterclass = false; return match
default: return in_characterclass ? match : ''
}
}
)
return flags ? new RegExp(compressed, flags) : new RegExp(compressed)
}
}
})()
See Verbose Regular Expressions in JavaScript for what .raw[0] does.
Notice that, unlike regex literals, the Javascript parser will not cache this, so save the generated regexp in a variable if you reuse it.

I would suggest you to put a regular comment above the line with the regular expression in order to explain it.
You will have much more freedom.

You can use verbose-regexp package.
import { rx } from 'verbose-regexp'
const dateTime = rx`
(\d{4}) // year
- // separator
(\d{2}) // month
`
// returns RegExp /(\d{4})-(\d{2})/

Perl's /x flag (allows whitespace and #comments) is a Javascript language proposal, but stuck at stage 1 (of 4) of the process.
The modifiers proposal, e.g. /(?i:ignore case)normal/, now at stage 3, had the x flag removed from it.

Regex needed to split a string by "."

I am in need for a regex in Javascript. I have a string:
'*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5'
I want to split this string by periods such that I get an array:
[
'*window',
'some1',
'some\.2', //ignore the . because it's escaped
'(a.b ? cc\.c : d.n [a.b, cc\.c])', //ignore everything inside ()
'some\.3',
'(this.o.p ? ".mike." [ff\.])',
'some5'
]
What regex will do this?

var string = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
var pattern = /(?:\((?:(['"])\)\1|[^)]+?)+\)+|\\\.|[^.]+?)+/g;
var result = string.match(pattern);
result = Array.apply(null, result); //Convert RegExp match to an Array
Fiddle: http://jsfiddle.net/66Zfh/3/
Explanation of the RegExp. Match a consecutive set of characters, satisfying:
/ Start of RegExp literal
(?: Create a group without reference (example: say, group A)
\( `(` character
(?: Create a group without reference (example: say, group B)
(['"]) ONE `'` OR `"`, group 1, referable through `\1` (inside RE)
\) `)` character
\1 The character as matched at group 1, either `'` or `"`
| OR
[^)]+? Any non-`)` character, at least once (see below)
)+ End of group (B). Let this group occur at least once
\)+ `)` character, at least once
| OR
\\\. `\.` (escaped backslash and dot, because they're special chars)
| OR
[^.]+? Any non-`.` character, at least once (see below)
)+ End of group (A). Let this group occur at least once
/g "End of RegExp, global flag"
/*Summary: Match everything which is not satisfying the split-by-dot
condition as specified by the OP*/
There's a difference between + and +?. A single plus attempts to match as many characters as possible, while +? matches only as many characters as are necessary for the RegExp to match. Example: on 123, \d+? matches 1 and \d+ matches 123.
The String.match method performs a global match because of the /g (global) flag. The match function with the g flag returns an array consisting of all matched substrings.
When the g flag is omitted, only the first match will be selected. The array will then consist of the following elements:
Index 0: <Whole match>
Index 1: <Group 1>
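A quick sketch of both points (greedy versus lazy quantifiers, and match with and without the g flag); the sample strings are arbitrary.

```javascript
// Lazy vs. greedy quantifier on the same input
var lazy = "123".match(/\d+?/)[0];   // "1"
var greedy = "123".match(/\d+/)[0];  // "123"

// With g: one entry per match, capture groups are not included
var withG = "a1b22".match(/[a-z](\d+)/g);    // ["a1", "b22"]

// Without g: first match only, followed by its capture groups
var withoutG = "a1b22".match(/[a-z](\d+)/);  // ["a1", "1"]

console.log(lazy, greedy);
console.log(withG);
console.log(withoutG[0], withoutG[1]);
```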

The regex below :
result = subject.match(/(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g);
Can be used to acquire the desired results. Group 1 has the results since you want to omit the .
Use this :
var myregexp = /(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))/g;
var match = myregexp.exec(subject);
while (match != null) {
for (var i = 0; i < match.length; i++) {
// matched text: match[i]
}
match = myregexp.exec(subject);
}
Explanation :
// (?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))
//
// Match the regular expression below «(?:(\(.*?[^'"]\)|.*?[^\\])(?:\.|$))»
// Match the regular expression below and capture its match into backreference number 1 «(\(.*?[^'"]\)|.*?[^\\])»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\(.*?[^'"]\)»
// Match the character “(” literally «\(»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match a single character NOT present in the list “'"” «[^'"]»
// Match the character “)” literally «\)»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «.*?[^\\]»
// Match any single character that is not a line break character «.*?»
// Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
// Match any character that is NOT a “A \ character” «[^\\]»
// Match the regular expression below «(?:\.|$)»
// Match either the regular expression below (attempting the next alternative only if this one fails) «\.»
// Match the character “.” literally «\.»
// Or match regular expression number 2 below (the entire group fails if this one fails to match) «$»
// Assert position at the end of the string (or before the line break at the end of the string, if any) «$»

It is notoriously difficult to use a regex to do balanced parenthesis matching, especially in JavaScript.
You would be way better off creating your own parser. Here's a clever way to do this that utilizes the strengths of regexes:
Create a Regex that matches and captures any "pattern of interest" - /(?:(\\.)|([\(\[\{])|([\)\]\}])|(\.))/g
Use string.replace(pattern, function (...)), and in the function, keep a count of opening braces and closing braces.
Add the matching text to a buffer.
If the split character is found and the opening and closing braces are balanced, add the buffer to your results array.
This solution will take a bit of work, and requires knowledge of closures, and you should probably see the documentation of string.replace, but I think it is a great way to solve your problem!
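The steps above can be sketched roughly as follows. This is not the answerer's linked code: the function name splitOutsideGroups is hypothetical, it uses an exec loop instead of a replace callback, and it adds quote tracking (because the example input contains a ")" inside double quotes). It slices the input instead of building a buffer, and it assumes brackets of different kinds are not interleaved.

```javascript
// Split on "." only when outside brackets and quotes; "\" escapes the next character.
function splitOutsideGroups(input) {
    // Tokens of interest: escaped char, quote, opening bracket, closing bracket, dot
    var pattern = /\\.|["']|[(\[{]|[)\]}]|\./g;
    var parts = [];
    var depth = 0;     // number of currently open brackets
    var quote = null;  // quote character we are inside, if any
    var start = 0;     // index where the current part begins
    var m;
    while ((m = pattern.exec(input)) !== null) {
        var tok = m[0];
        if (tok.length === 2) continue;           // escaped character: always data
        if (quote) {
            if (tok === quote) quote = null;      // only the closing quote matters here
        } else if (tok === '"' || tok === "'") {
            quote = tok;
        } else if ("([{".indexOf(tok) !== -1) {
            depth++;
        } else if (")]}".indexOf(tok) !== -1) {
            if (depth > 0) depth--;
        } else if (depth === 0) {                 // tok is "." outside any group
            parts.push(input.slice(start, m.index));
            start = m.index + 1;
        }
    }
    parts.push(input.slice(start));
    return parts;
}

var example = '*window.some1.some\\.2.(a.b + ")" ? cc\\.c : d.n [a.b, cc\\.c]).some\\.3.(this.o.p ? ".mike." [ff\\.]).some5';
console.log(splitOutsideGroups(example));
```

For the example string this yields seven parts, with the two parenthesized expressions kept whole and the escaped dots left in place.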
Update:
After noticing the number of questions related to this one, I decided to take on the above challenge.
Here is the live code to use a Regex to split a string.
This code has the following features:
Uses a Regex pattern to find the splits
Only splits if there are balanced parentheses
Only splits if there are balanced quotes
Allows escaping of parenthesis, quotes, and splits using \
This code will work perfectly for your example.

A complicated regex is not needed for this to work.
var s = '*window.some1.some\.2.(a.b + ")" ? cc\.c : d.n [a.b, cc\.c]).some\.3.(this.o.p ? ".mike." [ff\.]).some5';
console.log(s.match(/(?:\([^\)]+\)|.*?\.)/g));
output:
["*window.", "some1.", "some.", "2.", "(a.b + ")", "" ? cc.", "c : d.", "n [a.", "b, cc.", "c]).", "some.", "3.", "(this.o.p ? ".mike." [ff.])", "."]

So, was working with this, and now I see that #FailedDev is rather not a failure, since that was pretty nice. :)
Anyhow, here's my solution. I'll just post the regex only.
((\(.*?((?<!")\)(?!")))|((\\\.)|([^.]))+)
Sadly this won't work in your case however, since I'm using negative lookbehind, which I don't think is supported by javascript regex engine. It should work as intended in other engines however, as can be confirmed here: http://gskinner.com/RegExr/. Replace with $1\n.

List of all characters that should be escaped before put in to RegEx?

Could someone please give a complete list of special characters that should be escaped?
I fear I don't know some of them.

Take a look at PHP.JS's implementation of PHP's preg_quote function, that should do what you need:
http://phpjs.org/functions/preg_quote:491
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -

According to this site, the list of characters to escape is
the opening square bracket [, the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening round bracket ( and the closing round bracket ).
In addition to that, you need to escape characters that are interpreted by the Javascript interpreter as end of the string, that is either ' or ".

Based off of Tatu Ulmanen's answer, my solution in C# took this form:
private static List<string> RegexSpecialCharacters = new List<string>
{
"\\",
".",
"+",
"*",
"?",
"[",
"^",
"]",
"$",
"(",
")",
"{",
"}",
"=",
"!",
"<",
">",
"|",
":",
"-"
};
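// Start from the input and escape cumulatively; replacing on `input`
// each time would keep only the last character's replacement.
rgxPattern = input;
foreach (var rgxSpecialChar in RegexSpecialCharacters)
    rgxPattern = rgxPattern.Replace(rgxSpecialChar, "\\" + rgxSpecialChar);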
Note that I have switched the positions of '\' and '.', failure to process the slashes first will lead to doubling up of the '\'s
Edit
Here is a javascript translation
var regexSpecialCharacters = [
"\\", ".", "+", "*", "?",
"[", "^", "]", "$", "(",
")", "{", "}", "=", "!",
"<", ">", "|", ":", "-"
];
regexSpecialCharacters.forEach(rgxSpecChar =>
input = input.replace(new RegExp("\\" + rgxSpecChar,"gm"), "\\" +
rgxSpecChar))

Inside a character set, to match a literal hyphen -, it needs to be escaped when not positioned at the start or the end. For example, given the position of the last hyphen in the following pattern, it needs to be escaped:
[a-z0-9\-_]+
But it doesn't need to be escaped here:
[a-z0-9_-]+
If you fail to escape a hyphen, the engine will attempt to interpret it as a range between the preceding character and the next character (just like a-z matches any character between a and z).
Additionally, /s do not need to be escaped inside a character set (though they do need to be escaped when outside a character set). So, the following syntax is valid:
const pattern = /[/]/;
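A few quick checks that illustrate these rules; the test strings are arbitrary and the variable names are only for illustration.

```javascript
// Escaped hyphen in the middle, or unescaped hyphen at the end: both are literal
const escapedMiddle = /^[a-z0-9\-_]+$/.test("foo-bar_1"); // true
const hyphenAtEnd = /^[a-z0-9_-]+$/.test("foo-bar_1");    // true

// [a\-z] is the three characters a, -, z; [a-z] is the whole range
const literalSet = /^[a\-z]+$/.test("m"); // false
const rangeSet = /^[a-z]+$/.test("m");    // true

// An unescaped slash inside a character set is fine in a regex literal
const slashInSet = /[/]/.test("a/b");     // true

console.log(escapedMiddle, hyphenAtEnd, literalSet, rangeSet, slashInSet);
```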

I was looking for this list in regards to ESLint's "no-useless-escape" setting for reg-ex. And found some of these characters mentioned do not need to be escaped for a regular-expression in JS. The longer list in the other answer here is for PHP, which does require the additional characters to be escaped.
In this github issue for ESLint, about halfway down, user not-an-aardvark explains why the character referenced in the issue is a character that should maybe be escaped.
In javascript, a character that NEEDS to be escaped is a syntax character, or one of these:
^ $ \ . * + ? ( ) [ ] { } |
The response to the github issue I linked to above includes explanation about "Annex B" semantics (which I don't know much about) which allows 3 of the above mentioned characters to be left unescaped: ] { }. (An unescaped ) with no matching ( is still a syntax error.)
Another thing to note is that, in a regex without the u flag, escaping a punctuation character that doesn't require escaping won't do any harm (except maybe if you're trying to escape the escape character). So, my personal rule of thumb is: "When in doubt, escape"

The problem:
const character = '+'
new RegExp(character, 'gi') // error
Smart solutions:
// with babel-polyfill
// Warning: will be removed from babel-polyfill v7
const character = '+'
const escapeCharacter = RegExp.escape(character)
new RegExp(escapeCharacter, 'gi') // /\+/gi
// ES5
const character = '+'
const escapeCharacter = escapeRegExp(character)
new RegExp(escapeCharacter, 'gi') // /\+/gi
function escapeRegExp(string){
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

The answer here has become a bit more complicated with the introduction of Unicode regular expressions in JavaScript (that is, regular expressions constructed with the u flag). In particular:
Non-unicode regular expressions support "identity" escapes; that is, if a character does not have a special interpretation in the regular expression pattern, then escaping it does nothing. This implies that /a/ and /\a/ will match in an identical way.
Unicode regular expressions are more strict -- attempting to escape a character not considered "special" is an error. For example, /\a/u is not a valid regular expression.
The set of specially-interpreted characters can be divined from the ECMAScript standard; for example, with ECMAScript 2021, https://262.ecma-international.org/12.0/#sec-patterns, we see the following "syntax" characters:
SyntaxCharacter :: one of
^ $ \ . * + ? ( ) [ ] { } |
In particular, in contrast to other answers, note that the !, <, >, : and - are not considered syntax characters. Instead, these characters might only have a special interpretation in specific contexts.
For example, the < and > characters only have a special interpretation when used as a capturing group name; e.g. as in
/(?<name>\w+)/
And because < and > are not considered syntax characters, escaping them is an error in unicode regular expressions.
> /\</
/\</
> /\</u
Uncaught SyntaxError: Invalid regular expression: /\</: Invalid escape
Additionally, the - character is only specially interpreted within a character class, when used to express a character range, as in e.g.
/[a-z]/
It is valid to escape a - within a character class, but not outside a character class, for unicode regular expressions.
> /\-/
/\-/
> /\-/u
Uncaught SyntaxError: Invalid regular expression: /\-/: Invalid escape
> /[-]/
/[-]/
> /[\-]/u
/[\-]/u
For a regular expression constructed using the / / syntax (as opposed to new RegExp()), interior slashes (/) would need to be escaped, but this is required for the JavaScript parser rather than the regular expression itself, to avoid ambiguity between a / acting as the end marker for a pattern versus a literal / in the pattern.
> /\//.test("/")
true
> new RegExp("/").test("/")
true
Ultimately though, if your goal is to escape characters so they are not specially interpreted within a regular expression, it should suffice to escape only the syntax characters. For example, if we wanted to match the literal string (?:hello), we might use:
> /\(\?:hello\)/.test("(?:hello)")
true
> /\(\?:hello\)/u.test("(?:hello)")
true
Note that the : character is not escaped. It might seem necessary to escape the : character because it has a special interpretation in the pattern (?:hello), but because it is not considered a syntax character, escaping it is unnecessary. (Escaping the preceding ( and ? characters is enough to ensure : is not interpreted specially.)
Above code snippets were tested with:
$ node -v
v16.14.0
$ node -p process.versions.v8
9.4.146.24-node.20
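As a sketch of that last point, an escaping helper that touches only the syntax characters stays valid with the u flag, whereas also escaping - does not. The function name escapeSyntaxChars and the sample text are hypothetical, and this snippet was not part of the tested session above.

```javascript
// Escape only the SyntaxCharacter set: ^ $ \ . * + ? ( ) [ ] { } |
function escapeSyntaxChars(text) {
    return text.replace(/[\^$\\.*+?()[\]{}|]/g, '\\$&');
}

const literalText = '(?:hello) costs $5 - <cheap>!';
const strict = new RegExp(escapeSyntaxChars(literalText), 'u');
const matchesItself = strict.test(literalText);
console.log(matchesItself); // true

// Escaping a non-syntax character such as "-" is rejected in Unicode mode.
let overEscapeFails = false;
try {
    new RegExp('\\-', 'u');
} catch (e) {
    overEscapeFails = e instanceof SyntaxError;
}
console.log(overEscapeFails); // true
```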
