RegEx working in JavaScript but not in C# - javascript

I currently have a working WordWrap function in Javascript that uses RegEx. I pass the string I want wrapped and the length I want to begin wrapping the text, and the function returns a new string with newlines inserted at appropriate locations in the string as shown below:
wordWrap(string, width) {
let newString = string.replace(
new RegExp(`(?![^\\n]{1,${width}}$)([^\\n]{1,${width}})\\s`, 'g'), '$1\n'
);
return newString;
}
For consistency purposes I won't go into, I need to use an identical or similar RegEx in C#, but I am having trouble successfully replicating the function. I've been through a lot of iterations of this, but this is what I currently have:
private static string WordWrap(string str, int width)
{
Regex rgx = new Regex("(?![^\\n]{ 1,${" + width + "}}$)([^\\n]{1,${" + width + "}})\\s");
MatchCollection matches = rgx.Matches(str);
string newString = string.Empty;
if (matches.Count > 0)
{
foreach (Match match in matches)
{
newString += match.Value + "\n";
}
}
else
{
newString = "No matches found";
}
return newString;
}
This inevitably ends up finding no matches regardless of the string and length I pass. I've read that the RegEx used in JavaScript is different than the standard RegEx functionality in .NET. I looked into PCRE.NET but have had no luck with that either.
Am I heading in the right general direction with this? Can anyone help me convert the first code block in JavaScript to something moderately close in C#?
edit: For those looking for more clarity on what the working function does and what I am looking for the C# function to do: What I am looking to output is a string that has a newline (\n) inserted at the width passed to the function. One thing I forgot to mention (but really isn't related to my issue here) is that the working JavaScript version finds the end of the word so it doesn't cut up the word. So for example this string:
"This string is really really long so we want to use the word wrap function to keep it from running off the page.\n"
...would be converted to this with the width set to 20:
"This string is really \nreally long so we want \nto use the word wrap \nfunction to keep it \nfrom running off the \npage.\n"
Hope that clears it up a bit.

JavaScript and C# Regex engines are different. Also each language has it's own regex pattern executor, so Regex is language dependent. It's not the case, if it is working for one language so it will work for another.
C# supports named groups while JavaScript doesn't support them.
So you can find multiple difference between these two languages regex.

There are issues with the way you've translated the regex pattern from a JavaScript string to a C# string.
You have extra whitespace in the c# version, and you've also left in $ symbols and curly brackets { that are part of the string interpolation syntax in the JavaScript version (they are not part of the actual regex pattern).
You have:
"(?![^\\n]{ 1,${" + width + "}}$)([^\\n]{1,${" + width + "}})\\s"
when what I believe you want is:
"(?![^\\n]{1," + width + "}$)([^\\n]{1," + width + "})\\s"

Related

how to replace all occurrence of string between two symbols?

I'm working with RegEx on Javascript and here is where I stuck.
I have a simple string like
<html><body><span style=3D"font-family:Verdana; color:#000; font-size:10pt;=
"><div><font face=3D"verdana, geneva" size=3D"2">http://72.55.146.142:8880/=
order003.png.zip,120</body></html>
all i need to do is write javascript which can replace all strings in with "<" and ">" symbol.
I wrote something like this -
var strReplaceAll = Body;
var intIndexOfMatch = strReplaceAll.indexOf( "<" );
while (intIndexOfMatch != -1){
strReplaceAll = strReplaceAll.replace(/<.*>/,'')
intIndexOfMatch = strReplaceAll.indexOf( "<" );
}
but the problem is if body contains -
test<abc>test2<adg>
it will give me -
test
only or if body contains like -
<html>test<abc>test2<adg>
it will give me nothing please let me know how i can get-
testtest2
as a final output.
Try this regex instead:
<[^>]+>
DEMO:
http://regex101.com/r/kI5cJ7/2
DISCUSSION
Put the html code in a string and apply to this string the regex.
var htmlCode = ...;
htmlCode = htmlCode.replace(/<[^>]+>/g, '');
The original regex take too much characters (* is a greedy operator).
Check this page about Repetition with Star and Plus, especially the part on "Watch Out for The Greediness!".
Most people new to regular expressions will attempt to use <.+>. They will be surprised when they test it on a string like This is a <EM>first</EM> test. You might expect the regex to match <EM> and when continuing after that match, </EM>.
But it does not. The regex will match <EM>first</EM>. Obviously not what we wanted.
/(<.*?>)/
Just use this. Replace all the occurrences with "".
See demo.

Matching on words with possibly special characters

I'm trying to replace all the occurrence of a given word in a string but it is possible that the word contains a special character that needs to be escaped. Here's an example:
The ERA is the mean of earned runs given up by a pitcher per nine
innings pitched. Meanwhile, the ERA+, the adjusted ERA, is a pitcher's
earned run average (ERA) according to the pitcher's ballpark (in case
the ballpark favors batters or pitchers) and the ERA of the pitcher's
league.
I would like to be able to do the following:
string = "The ERA..." // from above
string = string.replaceAll("ERA", "<b>ERA</b>");
string = string.replaceAll("ERA+", "<u>ERA+</u>");
without ERA and ERA conflicting. I've been using the protoype replaceAll posted previously along with a regular expression found somewhere else on SO (I can't seem to find the link in my history unfortunately)
String.prototype.replaceAll = function (find, replace) {
var str = this;
return str.replace(new RegExp(find.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&'), 'g'), replace);
};
function loadfunc() {
var markup = document.getElementById('thetext').innerHTML;
var terms = Object.keys(acronyms);
for (i=0; i<terms.length; i++) {
markup = markup.replaceAll(terms[i], '<abbr title=\"' + acronyms[terms[i]] + '\">' + terms[i] + '</abbr>');
}
document.getElementById('thetext').innerHTML = markup;
}
Basically what the code does is adding an tag to abbreviation to include the definition when mouseovering. The problem is that the current regular expression is way too loose. My previous attempts worked partially but failed to make the difference between things like ERA and ERA+ or would completely skip over something like "K/9" or "IP/GS" (which should be a match by itself and not for "IP" or "GS" individually)
I should mention that acronyms is an array that looks like:
var acronyms = {
"ERA": "Earned Run Average: ...",
"ERA+": "Earned Run Average adjusted to ..."
};
Also (although this is fairly obvious) 'thetext' is a dummy div containing some text. The loadfunc() function is executed from <body onload="loadfunc()">
Thanks!
OK, this is a lot to work with -- after looking at your jsFiddle.
I think the best you're going to get is searching for whole words that begin with a capital letter and may contain / or %. Something like this: ([A-Z][\w/%]+)
Caveat: no matter how you do this, if you're doing it in the browser (e.g. you can't update the raw data) it's going to be process intensive.
And you can implement it like this:
var repl = str.replace(/([A-Z][\w\/%]+)/g, function(match) {
//alert(match);
if (match in acronyms)
return "<abbr title='" + acronyms[match] + "'>" + match + "</abbr>";
else
return match;
});
Here's a working jsFiddle: http://jsfiddle.net/remus/9z6fg/
Note that jQuery isn't required, just used it in this case for ease of updating the DOM in jsFiddle.
You want to use regular expressions with negative lookahead:
string.replace(/\bERA(?!\+)\b/g, "<b>ERA</b>");
and
string.replace(/\bERA\+/g, "<u>ERA+</u>");
The zero-width word boundary \b has been added for good measure, so you don't accidentally match strings like 'BERA', etc.
Another idea is to sort the list of acronyms by longest key to smallest. This way you are sure to substitute all 'ERA+' before 'ERA', so there is no substring conflict.

How to split a long regular expression into multiple lines in JavaScript?

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line length 80 characters according to JSLint rules. It's just better for reading, I think.
Here's pattern sample:
var pattern = /^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;
Extending #KooiInc answer, you can avoid manually escaping every special character by using the source property of the RegExp object.
Example:
var urlRegex= new RegExp(''
+ /(?:(?:(https?|ftp):)?\/\/)/.source // protocol
+ /(?:([^:\n\r]+):([^#\n\r]+)#)?/.source // user:pass
+ /(?:(?:www\.)?([^\/\n\r]+))/.source // domain
+ /(\/[^?\n\r]+)?/.source // request
+ /(\?[^#\n\r]*)?/.source // query
+ /(#?[^\n\r]*)?/.source // anchor
);
or if you want to avoid repeating the .source property you can do it using the Array.map() function:
var urlRegex= new RegExp([
/(?:(?:(https?|ftp):)?\/\/)/ // protocol
,/(?:([^:\n\r]+):([^#\n\r]+)#)?/ // user:pass
,/(?:(?:www\.)?([^\/\n\r]+))/ // domain
,/(\/[^?\n\r]+)?/ // request
,/(\?[^#\n\r]*)?/ // query
,/(#?[^\n\r]*)?/ // anchor
].map(function(r) {return r.source}).join(''));
In ES6 the map function can be reduced to:
.map(r => r.source)
[Edit 2022/08] Created a small github repository to create regular expressions with spaces, comments and templating.
You could convert it to a string and create the expression by calling new RegExp():
var myRE = new RegExp (['^(([^<>()[\]\\.,;:\\s#\"]+(\\.[^<>(),[\]\\.,;:\\s#\"]+)*)',
'|(\\".+\\"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
Notes:
when converting the expression literal to a string you need to escape all backslashes as backslashes are consumed when evaluating a string literal. (See Kayo's comment for more detail.)
RegExp accepts modifiers as a second parameter
/regex/g => new RegExp('regex', 'g')
[Addition ES20xx (tagged template)]
In ES20xx you can use tagged templates. See the snippet.
Note:
Disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n etc).
(() => {
const createRegExp = (str, opts) =>
new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
const yourRE = createRegExp`
^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|
(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
console.log(yourRE);
const anotherLongRE = createRegExp`
(\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
(\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
(\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
${"gi"}`;
console.log(anotherLongRE);
})();
Using strings in new RegExp is awkward because you must escape all the backslashes. You may write smaller regexes and concatenate them.
Let's split this regex
/^foo(.*)\bar$/
We will use a function to make things more beautiful later
function multilineRegExp(regs, options) {
return new RegExp(regs.map(
function(reg){ return reg.source; }
).join(''), options);
}
And now let's rock
var r = multilineRegExp([
/^foo/, // we can add comments too
/(.*)/,
/\bar$/
]);
Since it has a cost, try to build the real regex just once and then use that.
Thanks to the wonderous world of template literals you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.
//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
let clean = (piece) => (piece
.replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
.replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
.replace(/\n\s*/g, '')
);
window.regex = ({raw}, ...interpolations) => (
new RegExp(interpolations.reduce(
(regex, insert, index) => (regex + insert + clean(raw[index + 1])),
clean(raw[0])
))
);
Using this you can now write regexes like this:
let re = regex`I'm a special regex{3} //with a comment!`;
Outputs
/I'm a special regex{3}/
Or what about multiline?
'123hello'
.match(regex`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`)
[2]
Outputs hel, neat!
"What if I need to actually search a newline?", well then use \n silly!
Working on my Firefox and Chrome.
Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:
regex`^\s*
(
//closing the object
(\})|
//starting from open or comma you can...
(?:[,{]\s*)(?:
//have a rest operator
(\.\.\.)
|
//have a property key
(
//a non-negative integer
\b\d+\b
|
//any unencapsulated string of the following
\b[A-Za-z$_][\w$]*\b
|
//a quoted string
//this is #5!
("|')(?:
//that contains any non-escape, non-quote character
(?!\5|\\).
|
//or any escape sequence
(?:\\.)
//finished by the quote
)*\5
)
//after a property key, we can go inside
\s*(:|)
|
\s*(?={)
)
)
((?:
//after closing we expect either
// - the parent's comma/close,
// - or the end of the string
\s*(?:[,}\]=]|$)
|
//after the rest operator we expect the close
\s*\}
|
//after diving into a key we expect that object to open
\s*[{[:]
|
//otherwise we saw only a key, we now expect a comma or close
\s*[,}{]
).*)
$`
It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/
And running it with a little demo?
let input = '{why, hello, there, "you huge \\"", 17, {big,smelly}}';
for (
let parsed;
parsed = input.match(r);
input = parsed[parsed.length - 1]
) console.log(parsed[1]);
Successfully outputs
{why
, hello
, there
, "you huge \""
, 17
,
{big
,smelly
}
}
Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!
If curious you can checkout what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox doesn't support backreferences or named groups. So note the example given in this answer is actually a neutered version and might get easily tricked into accepting invalid strings.
There are good answers here, but for completeness someone should mention Javascript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:
RegExp.prototype.append = function(re) {
return new RegExp(this.source + re.source, this.flags);
};
let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);
console.log(regex); //=> /[a-z][A-Z][0-9]/g
The regex above is missing some black slashes which isn't working properly. So, I edited the regex. Please consider this regex which works 99.99% for email validation.
let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\\.,;:\\s#\"]+(\\.[^<>()\\[\\]\\\.,;:\\s#\"]+)*)',
'|(".+"))#((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
'[0-9]{1,3}\])|(([a-zA-Z\\-0-9]+\\.)+',
'[a-zA-Z]{2,}))$'].join(''));
To avoid the Array join, you can also use the following syntax:
var pattern = new RegExp('^(([^<>()[\]\\.,;:\s#\"]+' +
'(\.[^<>()[\]\\.,;:\s#\"]+)*)|(\".+\"))#' +
'((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|' +
'(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$');
You can simply use string operation.
var pattenString = "^(([^<>()[\]\\.,;:\s#\"]+(\.[^<>()[\]\\.,;:\s#\"]+)*)|"+
"(\".+\"))#((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|"+
"(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$";
var patten = new RegExp(pattenString);
I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.
To use this snippet you need to call the variadic function combineRegex whose arguments are the regular expression objects you need to combine. Its implementation can be found at the bottom.
Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.
Instead I'm simply passing the contents of the capture group inside an array. The parentheses are automatically added when combineRegex encounters an array.
Furthermore quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).
This is best explained using a simple example:
var regex = /abcd(efghi)+jkl/;
would become:
var regex = combineRegex(
/ab/,
/cd/,
[
/ef/,
/ghi/
],
/()+jkl/ // Note the added '()' in front of '+'
);
If you must split character sets you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key's content can be anything as long as the object only contains one key. Note that instead of () you have to use ] as dummy beginning if the first character could be interpreted as quantifier. I.e. /[+?]/ becomes {"":[/]+?/]}
Here is the snippet and a more complete example:
function combineRegexStr(dummy, ...regex)
{
return regex.map(r => {
if(Array.isArray(r))
return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
else
return r.source.replace(dummy, "");
}).join("");
}
function combineRegex(...regex)
{
return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}
//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
combineRegex(
/ab/,
/cd/,
[
/()?:ef/,
{"": [/]+A-Z/, /0-9/]},
/gh/
],
/()+$/
).source
);
Personally, I'd go for a less complicated regex:
/\S+#\S+\.\S+/
Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the first, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:
var box1 = "([^<>()[\]\\\\.,;:\s#\"]+(\\.[^<>()[\\]\\\\.,;:\s#\"]+)*)";
var box2 = "(\".+\")";
var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\-0-9]+\\.)+[a-zA-Z]{2,})";
var regex = new RegExp("^(" + box1 + "|" + box2 + ")#(" + host1 + "|" + host2 + ")$");
#Hashbrown's great answer got me on the right track. Here's my version, also inspired by this blog.
function regexp(...args) {
function cleanup(string) {
// remove whitespace, single and multi-line comments
return string.replace(/\s+|\/\/.*|\/\*[\s\S]*?\*\//g, '');
}
function escape(string) {
// escape regular expression
return string.replace(/[-.*+?^${}()|[\]\\]/g, '\\$&');
}
function create(flags, strings, ...values) {
let pattern = '';
for (let i = 0; i < values.length; ++i) {
pattern += cleanup(strings.raw[i]); // strings are cleaned up
pattern += escape(values[i]); // values are escaped
}
pattern += cleanup(strings.raw[values.length]);
return RegExp(pattern, flags);
}
if (Array.isArray(args[0])) {
// used as a template tag (no flags)
return create('', ...args);
}
// used as a function (with flags)
return create.bind(void 0, args[0]);
}
Use it like this:
regexp('i')`
//so this is a regex
//here I am matching some numbers
(\d+)
//Oh! See how I didn't need to double backslash that \d?
([a-z]{1,3}) /*note to self, this is group #2*/
`
To create this RegExp object:
/(\d+)([a-z]{1,3})/i

Javascript regex with deep objects replacement issue

Hey all, having an issue doing string replacement for template engine code I am writing. If my tokens are 1 level deep everything works fine. Example {someProperty}. But if I try searching for a nested object it never does the replace. Example {myobj.deep.test}. I've attached the code I am playing around with. Thanks for the help!
function replaceStuff(content, fieldName, fieldValue) {
var regexstr = "{" + fieldName + "}";
console.log("regexstr: ", regexstr);
//var regex = new RegExp("{myobj\.deep\.test}", "g"); //this works as expected
var regex = new RegExp(regexstr, "g"); //this doesn't
return content.replace(regex, fieldValue);
}
replaceStuff("test: {myobj.deep.test}", "myobj.deep.test", "my value");
See this SO question about curly braces. Perhaps your browser of choice isn't as understanding as chrome is?
You need to escape the '.' characters in the string you're passing in as the fieldName parameter. Ditto for any other special regex characters that you want to be interpreted literally. Basically, fieldName is being treated as part of a regex pattern.
If you don't want fieldName to be evaluated as regex code, you might want to consider using string manipulation for this instead.
Edit: I just ran your code in FireFox and it worked perfectly. You may have something else going on here.

Regexp for matching numbers and units in an HTML fragment?

I'm trying to make a regexp that will match numbers, excluding numbers that are part of other words or numbers inside certain html tags. The part for matching numbers works well but I can't figure out how to find the numbers inside the html.
Current code:
//number regexp part
var prefix = '\\b()';//for future use
var baseNumber = '((\\+|-)?([\\d,]+)(?:(\\.)(\\d+))?)';
var SIBaseUnit = 'm|kg|s|A|K|mol|cd';
var SIPrefix = 'Y|Z|E|P|T|G|M|k|h|ia|d|c|m|µ|n|p|f|a|z|y';
var SIUnit = '(?:('+SIPrefix+')?('+SIBaseUnit+'))';
var generalSuffix = '(PM|AM|pm|am|in|ft)';
var suffix = '('+SIUnit+'|'+generalSuffix+')?\\b';
var number = '(' + prefix + baseNumber + suffix + ')';
//trying to make it match only when not within tags or inside excluded tags
var htmlBlackList = 'script|style|head'
var htmlStartTag = '<[^(' + htmlBlackList + ')]\\b[^>]*?>';
var reDecimal = new RegExp(htmlStartTag + '[^<]*?' + number + '[^>]*?<');
<script>
var htmlFragment = "<script>alert('hi')</script>";
var style = "<style>.foo { font-size: 14pt }</style>";
// ...
</script>
<!-- turn off this style for now
<style> ... </style>
-->
Good luck getting a regular expression to figure that out.
You're using JavaScript, so I'm guessing you're probably running in a browser. Which means you have access to the DOM, giving you access to the browser's very capable HTML parser. Use it.
The [^] regex modifier only works on single characters, not on compound expressions like (script|style|head). What you want is ?! :
var htmlStartTag = '<(?!(' + htmlBlackList + ')\\b)[^>]*?>';
(?! ... ) means 'not followed by ...' but [^ ... ] means 'a single character not in ...'.
I'm trying to make a regexp that will match numbers, excluding numbers that are part of other words or numbers inside certain html tags.
Regex cannot parse HTML. Do not use regex to parse HTML. Do not pass Go. Do not collect £200.
To ‘only match something not-within something else’ you would need a negative lookbehind assertion (“(?<!”), but JavaScript Regexps do not support lookbehind, and most other regex implementations don't support the complex variable-length lookbehind you'd need to have any hope of matching a context like being inside a tag. Even if you did have variable-length lookbehind, that'd still not reliably parse HTML, because as previously mentioned many times every day, regex cannot parse HTML.
Use an HTML parser. A browser HTML parser will be able to digest even partial input without complaining.

Categories