Javascript: Back-Referencing in Regex without remembering last match - javascript

Is there a way to reuse a group's pattern without the back-reference remembering the last matched value?
(abc|def)=\1 matches abc=abc or def=def, but not abc=def or def=abc.
I need a pattern that also matches abc=def and def=abc using back-referencing, but with \1 I can only match abc=abc or def=def.
I can do it with a pattern like (abc|def)=(abc|def), which matches all four cases: abc=abc, def=def, abc=def, def=abc.
In my case abc or def is a very long expression, and if I have to change the regex I have to change it in both places. Is there any way to refer back to the group's pattern rather than to the text it matched?
https://regex101.com/r/bL9wU1/1

You cannot use back references for this task.
Backreferences match the same text as previously matched by a capturing group.
That is, they do not use the same pattern defined in a capturing group.
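To see the difference in practice, here is a small sketch (the variable names are only for illustration) that runs the back-reference pattern from the question against the four test strings:

```javascript
// \1 repeats the text captured by group 1, not the pattern (abc|def).
var backref = /^(abc|def)=\1$/;
var inputs = ["abc=abc", "def=def", "abc=def", "def=abc"];
var outcome = {};
inputs.forEach(function (s) {
  outcome[s] = backref.test(s);
});
// Only the two strings with identical halves come out as true.
console.log(outcome);
```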
A workaround is to build a dynamic regex pattern with blocks and use a RegExp constructor:
var ss = [ "abc=abc", "def=def", "abc=def", "def=abc"]; // Test strings
var block = "(?:abc|def)"; // Define the pattern block
var rx = RegExp(block + "=" + block); // Build the regex dynamically
document.body.innerHTML += "Pattern: <b>" + rx.source
+ "</b><br/>"; // Display resulting pattern
for (var s = 0; s < ss.length; s++) { // Demo
document.body.innerHTML += "Testing \"<i>" + ss[s] + "</i>\"... ";
document.body.innerHTML += "Matched: <b>" + rx.test(ss[s]) + "</b><br/>";
}

Related

Regex match character set without previous duplication

What is the best way to create a dynamic regex to match a distinct set of characters (characters and their order provided during runtime).
character set: abcd
character format: ??j? (question mark represents a character from character set)
Example
abjd = match
bdja = match
dbja = match
ab = no match
aajd = no match
abjdd = no match
abj = no match
I have created a regex builder (in js) as follows:
// characters are the character set
// wordFormat is the character format
// replace(str, search, replacement) replaces search in str with replacement
var chars = "[" + characters + "]{1}";
var afterSpecialConversion = replace(wordFormat, "?", chars);
var myRegex = new RegExp("^" + afterSpecialConversion + "$", "gi");
Unfortunately this does not achieve the result as it does not consider duplicate items. I thought about using matching groups to avoid duplicates however I don't know how to negate the already existing character group from the remainder of the set.
Also given character set aabcd now a can exist twice. Any suggestions?
Your regex-builder approach is correct (though a bit of a maintainability mess, so document it carefully), but not quite sophisticated enough. What you need to do is use lookaheads.
I've provided an example regex on Regex101 for the demo in your question.
The more general principle is to replace each set of n question marks with a pattern that matches this:
(?:([<chars>])(?!.*\<m>)){<n>}
Where <chars> is the character set you want to use, <m> is the index of the set of question marks (starting from 1 - more on this in a moment), and <n> is the number of question marks in the group. This yields regex-builder code that looks like this:
function getRe(pattern, chars) {
var re = "^";
var qMarkGroup = 1;
var qMarkCount = 0;
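for (var index = 0; index < pattern.length; index++) {
var char = pattern[index];
if (char === "?") {
qMarkCount += 1;
} else {
if (qMarkCount > 0) {
// Flush the pending run of question marks before the literal
re += "(?:([" + chars + "])(?!.*\\" + qMarkGroup + ")){" + qMarkCount + "}";
qMarkCount = 0;
qMarkGroup += 1;
}
re += char; // Always keep the literal, even when no question marks precede it
}
}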
// Need to do this again in case we have a group of question marks at the end of the pattern
if (qMarkCount > 0) {
re += "(?:([" + chars + "])(?!.*\\" + qMarkGroup + ")){" + qMarkCount + "}";
}
re += "$";
return new RegExp(re, "gi");
}
Code demo on Repl.it
Obviously, this function definition is very verbose, to demonstrate the principles involved. Feel free to golf it (but remember to watch out for fencepost issues like I've described in the comments).
Additionally, be sure to sanitize the inputs. This is an example and will break if someone, for instance, puts in ] in chars.
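For what it's worth, here is one possible compact, sanitized variant, as a sketch only. The helper names escapeForClass, escapeLiteral and buildPattern are made up for this example, and it drops the g flag because a global regex keeps lastIndex between test() calls.

```javascript
// Escape characters that are special inside a character class.
function escapeForClass(chars) {
  return chars.replace(/[\\\]\[^-]/g, "\\$&");
}

// Escape characters that are special anywhere in a pattern.
function escapeLiteral(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical builder: each run of "?" becomes one capturing group
// followed by a negative lookahead on that same group.
function buildPattern(format, chars) {
  var group = 0;
  var cls = "[" + escapeForClass(chars) + "]";
  // Splitting on a capturing group keeps the "?" runs in the result.
  var body = format.split(/(\?+)/).map(function (part) {
    if (part.charAt(0) === "?") {
      group += 1;
      return "(?:(" + cls + ")(?!.*\\" + group + ")){" + part.length + "}";
    }
    return escapeLiteral(part);
  }).join("");
  return new RegExp("^" + body + "$", "i");
}

var rx = buildPattern("??j?", "abcd");
console.log(rx.source);
```

With the examples from the question, rx accepts abjd, bdja and dbja and rejects ab, aajd, abjdd and abj.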

JS Regexp - how to find text in a string

There is some text, e.g. "The string class is an instantiation of the basic_string class template that uses char".
I need to find the text "basic_string", but only if the word "the" does not come right before it.
With a negative lookbehind, it would be:
(?<!\sthe)\s+basic_string
But JavaScript does not understand negative lookbehind. What can I do?
If the only allowed character between "the" and "basic_string" is the white-space:
([^e\s]|[^h]e|[^t]he)\s+basic_string
You can use xregexp library to get advanced regex features like lookbehind in Javascript.
Alternatively you can use alternation and capture group as a workaround:
var s = 'The string class is an instantiation of the basic_string class template that uses char';
var kw = s.match(/\bthe basic_string\b|(\bbasic_string\b)/)[1];
// undefined
s = 'instantiation of basic_string class template'
kw = s.match(/\bthe basic_string\b|(\bbasic_string\b)/)[1]
//=> "basic_string"
In this regex, capture group #1 is only populated if basic_string isn't preceded by the word the.
You can use the RegExp /(the)(?=\sbasic_string)/ or new RegExp("(" + before + ")(?=" + match + ")") to match "the" when it is followed by " basic_string", then use .match() to retrieve the .index of the matched string and .slice() to get "basic_string":
var str = "The string class is an instantiation of the basic_string class template that uses char";
var before = "the";
var match = " basic_string";
var index = str.match(new RegExp("(" + before + ")(?=" + match + ")")).index
+ before.length + 1; // Skip "the" and the single space after it
console.log(str.slice(index, index + match.length - 1)); // match includes the leading space
The easiest way to emulate the negative lookbehind is via an optional capturing group, and check if the group participated in the match:
/(\bthe)?\s+basic_string/g
 ^^^^^^^^
See this JS demo:
var s = 'The string class is an instantiation of the basic_string class template that uses char, not basic_string.';
var re = /(\bthe)?(\s+basic_string)/gi;
var res = s.replace(re, function(match, group1, group2) {
return group1 ? match : "<b>" + group2 + "</b>";
});
document.body.innerHTML = res;
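If you need the positions rather than a replaced string, the same optional-group check works in an exec() loop. This is only a sketch that runs outside the browser; the second capture group around basic_string and the positions array are my additions:

```javascript
var text = "The string class is an instantiation of the basic_string " +
  "class template that uses char, not basic_string.";
var re = /(\bthe)?\s+(basic_string)/gi;
var positions = [];
var m;
while ((m = re.exec(text)) !== null) {
  // Group 1 is undefined when "the" did not take part in the match.
  if (m[1] === undefined) {
    // Start index of "basic_string" itself, without the leading whitespace.
    positions.push(m.index + m[0].length - m[2].length);
  }
}
console.log(positions);
```

Only the last occurrence (after "not") is reported, because the first one is consumed together with the "the" in front of it.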

Specific regex test

I need to make a specific regex for something like this:
(\d{1,3}\/\d{1,3}\/\d{1,3})\-(.*)
example:
1/2/3-abc
It accepts:
1/2/3 - capture index 1
and abc - capture index 2
I need from capture index 1 just 123 without '/' characters.
I tried it with positive/ negative lookahead, but it won't work.
Many Thanks
You can achieve what you need with some string operations:
var s = "1/2/3-abc";
if (s.indexOf("-") > -1) { // Check if there is a hyphen in the string
document.write( s.substring(0, s.indexOf("-")).replace(/\//g, ""));
}
The s.indexOf("-") will find the index of the first - character in the input string, and after we get the substring from the start till the - (with s.substring(0, s.indexOf("-"))), we can remove the / symbols with .replace(/\//g, "").
You cannot extract characters out of an individual match. You need to capture the whole group. After that, you can replace the characters you do not want.
You can extract the group using a matcher or a replacer.
function processMatcher(str) {
var match = str.match(/(\d{1,3}\/\d{1,3}\/\d{1,3})\-(.*)/);
return match[1].replace(/[\/]/g, '');
}
function processReplacer(str) {
return str.replace(/(\d{1,3}\/\d{1,3}\/\d{1,3})\-(.*)/, function(match, p1, p2, offset, string) {
return p1.replace(/[\/]/g, '');
});
}
document.body.innerHTML = 'Matcher: ' + processMatcher('1/2/3-abc') + '<br />';
document.body.innerHTML += 'Replacer: ' + processReplacer('1/2/3-abc');
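Note that processMatcher above throws if the input does not match, because match() returns null. A minimal sketch that guards against that and returns both parts could look like this (splitRecord and the digits/rest property names are hypothetical, and I anchored the pattern with ^ and $, which the original does not do):

```javascript
// Returns { digits, rest } for input like "1/2/3-abc", or null if it does not match.
function splitRecord(str) {
  var m = /^(\d{1,3}\/\d{1,3}\/\d{1,3})-(.*)$/.exec(str);
  if (m === null) {
    return null;
  }
  return {
    digits: m[1].replace(/\//g, ""), // strip the slashes from capture group 1
    rest: m[2]
  };
}

var record = splitRecord("1/2/3-abc");
console.log(record);
```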

Regex match quotes inside bracket regex

I'm working on a regex that must match only the text inside quotes, but not when it is inside a comment. My matches must be only the strings in bold:
<"love";>
>/*"love"*/<
<>'love'<>
"lo
more love
ve"
I'm stuck on this:
/(?:((\"|\')(.|\n)*?(\"|\')))(?=(?:\/\**\*\/))/gm
The first part, (?:((\"|\')(.|\n)*?(\"|\'))), matches all the strings.
The second part, (?=(?:\/\**\*\/)), is supposed to skip text in quotes inside /* "mystring" */,
but my logic is clearly wrong.
Any suggestion?
Thanks
Maybe you just need to use a negative lookahead to check for the comment end */?
But first, I'd split the string into separate lines
var arrayOfLines = input_str.split(/\r?\n/);
or, without empty lines:
var arrayOfLines = input_str.match(/[^\r\n]+/g);
and then use this regex:
["']([^'"]+)["'](?!.*\*\/)
Sample code:
var rebuilt_string = '';
var re = /["']([^'"]+)["'](?!.*\*\/)/g;
var subst = '<b>$1</b>';
for (var i = 0; i < arrayOfLines.length; i++)
{
rebuilt_string = rebuilt_string + arrayOfLines[i].replace(re, subst) + "\r\n";
}
The way to avoid commented parts is to match them before. The global pattern looks like this:
/(capture parts to avoid)|target/
Then use a callback function for the replacement: when the capture group exists, return the match unchanged; otherwise, replace the match with what you want.
Example:
var result = text.replace(/(\/\*[^*]*(?:\*+(?!\/)[^*]*)*\*\/)|"[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*'/g,
function (m, g1) {
if (g1) return g1;
return '<b>' + m + '</b>';
});
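The same "match what you want to avoid first" idea also works when you want to collect the strings instead of replacing them. In this sketch I moved the capture group onto the quoted strings and left the comment branch uncaptured; stringsOutsideComments and sample are names I made up:

```javascript
function stringsOutsideComments(source) {
  // Branch 1 consumes /* ... */ comments; branch 2 (captured) is a quoted string.
  var re = /\/\*[^*]*(?:\*+(?!\/)[^*]*)*\*\/|("[^"\\]*(?:\\[\s\S][^"\\]*)*"|'[^'\\]*(?:\\[\s\S][^'\\]*)*')/g;
  var found = [];
  var m;
  while ((m = re.exec(source)) !== null) {
    if (m[1] !== undefined) {
      found.push(m[1]); // a string that was not swallowed by a comment
    }
  }
  return found;
}

var sample = '<"love";>\n>/*"love"*/<\n<>\'love\'<>\n"lo\nmore love\nve"';
var found = stringsOutsideComments(sample);
console.log(found);
```

For this sample it returns the first double-quoted string, the single-quoted string and the multi-line string, but not the one inside the comment.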

Regex to get word started with # in javascript

I have a problem replacing certain words that start with #. I have the following code:
var x="#google",
eval("var pattern = /" + '\\b' + x + '\\b');
txt.replace(pattern,"MyNewWord");
When I use the following code, it works fine:
var x="google",
eval("var pattern = /" + '\\b' + x + '\\b');
txt.replace(pattern,"MyNewWord");
Any suggestion on how to make the first snippet work?
P.S. I use eval because x will be user input.
The problem is that \b represents a boundary between a "word" character (letter, digit, or underscore) and a "non-word" character (anything else). # is a non-word character, so \b# means "a # that is preceded by a word character" — which is not at all what you want. If anything, you want something more like \B#; \B is a non-boundary, so \B# means "a # that is not preceded by a word character".
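A quick way to convince yourself is a sketch like the following (the property names in checks are only labels for this example):

```javascript
var checks = {
  // Space and # are both non-word characters, so there is no boundary between them.
  wordBoundaryAfterSpace: /\b#google/.test("see #google"),  // false
  // x is a word character, so there is a boundary right before #.
  wordBoundaryAfterLetter: /\b#google/.test("x#google"),    // true
  // \B matches exactly where \b does not.
  nonBoundaryAfterSpace: /\B#google/.test("see #google"),   // true
  nonBoundaryAtStart: /\B#google/.test("#google")           // true
};
console.log(checks);
```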
I'm guessing that you want your words to be separated by whitespace, instead of by a programming-language concept of what makes something a "word" character or a "non-word" character; for that, you could write:
var x = '#google'; // or 'google'
var pattern = new RegExp('(^|\\s)' + x);
var result = txt.replace(pattern, '$1' + 'MyNewWord');
Edited to add: If x is really supposed to be a literal string, not a regex at all, then you should "quote" all of the special characters in it, with a backslash. You can do that by writing this:
var x = '#google'; // or 'google' or '$google' or whatever
var quotedX = x.replace(/[^\w\s]/g, '\\$&');
var pattern = new RegExp('(^|\\s)' + quotedX);
var result = txt.replace(pattern, '$1' + 'MyNewWord');
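Wrapped up as a function, that could look roughly like the sketch below. The name replaceWord is hypothetical, and two things go beyond the snippet above, so treat them as assumptions about what you want: the g flag replaces every occurrence, and the (?=\s|$) lookahead stops #google from matching inside #googleplex. The replacement is done with a callback so that a $ in the replacement text is not interpreted.

```javascript
// Hypothetical helper: replaces every whitespace-delimited occurrence of `word`.
function replaceWord(txt, word, replacement) {
  // Backslash-escape everything that is not a word character or whitespace.
  var quoted = word.replace(/[^\w\s]/g, "\\$&");
  var pattern = new RegExp("(^|\\s)" + quoted + "(?=\\s|$)", "g");
  return txt.replace(pattern, function (match, lead) {
    return lead + replacement; // keep the leading whitespace, if any
  });
}

var replaced = replaceWord("try #google and #googleplex, then #google", "#google", "MyNewWord");
console.log(replaced);
```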
Make your pattern something like this:
/(#)?\w*/
If you want to make a Regular Expression, try this instead of eval:
var pattern = new RegExp(x);
Btw the line:
eval("var pattern = /" + '\\b' + x + '\\b');
will throw an error because the regex literal is never closed; it should be:
eval("var pattern = /" + '\\b' + x + '\\b/');
How about
var x = "#google";
x.match(/^\#/);
