Concatenate / simplify RegExp


I have this working RegExp in my JavaScript file:
var reA = new RegExp(urlValueToRemove);
var reB = new RegExp('(,&)');
var reC = new RegExp('(,,)');
var reD = new RegExp('(=,)');
var reE = new RegExp('(,$)');
window.history.pushState(null, null, decodeURIComponent(window.location.search).replace(reA, '').replace(reB, '&').replace(reC, ',').replace(reD, '=').replace(reE, ''));
Is it possible to concatenate / simplify this so that I don't need to do the replace 5 times?
I asked this on the Code Review site, but nobody there has answered, so I expect I would have to wait days for a reply.
Example
When I have this URL here:
http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S
When I now want to remove Green from the URL, I pass Green to the first regex, reA, and it gets removed from the URL:
http://localhost.com/?color=Red,Blue&size=X,L,M,S

You can use capture groups to indicate what should be kept, and join the two cases with a |: one case keeps the character that follows the word (&, a comma, or the end of the string), the other keeps the character that precedes it (=):
function removeWord(url, text) {
const re = new RegExp(`,${text}(&|,|$)|(=)${text},`, 'g');
return url.replace(re, '$1$2');
}
const url = "http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S"
console.log(removeWord(url, "Green"));
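One caveat with building the pattern from text: if the value contains RegExp metacharacters (for example C++), new RegExp either throws or matches the wrong thing. A minimal sketch that escapes the value first, assuming a hypothetical escapeRegExp helper (it is not a built-in) and otherwise keeping the answer's pattern unchanged:

```javascript
// escapeRegExp is a hypothetical helper, not a built-in:
// it backslash-escapes RegExp metacharacters so `text` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

function removeWord(url, text) {
  const t = escapeRegExp(text);
  // Alternative 1: ",word" followed by "&", "," or end of string -> keep the follower ($1).
  // Alternative 2: "=word," (first value in the list)             -> keep the "=" ($2).
  // A group that did not take part in the match is replaced by an empty string.
  const re = new RegExp(`,${t}(&|,|$)|(=)${t},`, 'g');
  return url.replace(re, '$1$2');
}

const url = "http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S";
console.log(removeWord(url, "Green"));          // http://localhost.com/?color=Red,Blue&size=X,L,M,S
console.log(removeWord(url, "Red"));            // http://localhost.com/?color=Blue,Green&size=X,L,M,S
console.log(removeWord("?tag=C++,Go", "C++"));  // ?tag=Go
```

Like the answer's version, this assumes the value sits in a comma-separated list with at least two items; a lone value such as color=Green is matched by neither alternative.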

Related

Regex Help - Match any URL Parameter & Value not in List

Thank you for looking at this!
I am trying to build some Regex that works in JavaScript that will match ALL URL parameters and their values that are not in my predefined list. Example:
Raw URL:
/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234
My List of Known Parameters:
((knownParamA|knownParamB|knownParamC)=[^&]*&?)/gi
Resulting (Cleaned up) URL:
/folder/index.html?knownParamA=1234&knownParamB=1234
Ultimately, I want to capture a cleaned-up version of any URL with only the parameters and values I need. There are tons of parameters on my website that are meaningless to me and only get in the way. One solution I found required a lookbehind, but I don't think JavaScript supports those.
Thank you so much for the help!!!
Solution Based on Feedback Below:
var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');

Negative searches are tricky, and require zero-width lookaheads.
This will find the unknown parameters and strip them out of the URL. (Update 2: It no longer keeps unknown parameters whose names merely start with a known parameter name.)
step1 = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
However, if the first parameter gets stripped out, your first remaining parameter will be preceded by a & instead of a ?, and you will need to replace that too:
clean = step1.replace(/[?&]([^=]+=[^&]*)/, '?$1');
// "/folder/index.html?knownParamA=1234&knownParamB=1234"
You can chain these together, of course:
clean = url.replace(/[?&](?!(?:knownParamA|knownParamB)(?==))[^=]+=[^&]*/gi, '').
replace(/[?&]([^=]+=[^&]*)/, '?$1');
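If the known names come from an array, the two steps can be wrapped in one function. A minimal sketch, assuming a hypothetical cleanUrl(pathAndQuery, knownNames) helper that escapes each name before joining them with |:

```javascript
// cleanUrl is a hypothetical helper built from the two-step regex above.
function cleanUrl(pathAndQuery, knownNames) {
  // Escape each name so characters like "." are matched literally, then join with |.
  const alternation = knownNames
    .map(n => n.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('|');
  // Step 1: drop every ?name=value or &name=value whose name is not exactly a known one
  // (the (?==) lookahead stops "knownParamAB" from passing as "knownParamA").
  const step1 = new RegExp('[?&](?!(?:' + alternation + ')(?==))[^=]+=[^&]*', 'gi');
  // Step 2: if the first surviving parameter now starts with "&", turn it into "?".
  const step2 = /[?&]([^=]+=[^&]*)/;
  return pathAndQuery.replace(step1, '').replace(step2, '?$1');
}

console.log(cleanUrl(
  '/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234',
  ['knownParamA', 'knownParamB']
)); // /folder/index.html?knownParamA=1234&knownParamB=1234
```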
Update: I have included user3842539's expansion of the code, as it's easier to read here than in a comment.
var pageURL = window.location.pathname + window.location.search;
var knownParams = 'knownParamA|knownParamB|knownParamC|knownParamD';
var urlCleanerRegexStep1 = new RegExp('[?&](?!(?:' + knownParams + ')(?==))[^=]+=[^&]*', 'gi');
var urlCleanerRegexStep2 = new RegExp('[?&]([^=]+=[^&]*)', '');
var cleanPageURL = pageURL.replace(urlCleanerRegexStep1, '').replace(urlCleanerRegexStep2, '?$1');
To help you interpret these regexes:
[?&] = either ? or &
(...) = captured group
(?!...) = not followed by a match for this group
(?:...) = uncaptured group
(?=...) = followed by a match for this group
= = a literal = sign
[^=] = any character other than =
+ = one or more times
[^&] = any character other than &
* = zero or more times
Outside the regex body,
The g flag means 'all matches' (as opposed to only the first)
The i flag means 'case-insensitive'
In the replacement string, $1 means 'captured group 1'
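A different technique, not regex-based, is to let the built-in URLSearchParams parse the query string and keep only the known names. A sketch assuming a hypothetical keepKnownParams helper that takes the path-plus-query string (for example window.location.pathname + window.location.search):

```javascript
// Alternative technique (not the regex approach): filter with URLSearchParams.
// keepKnownParams is a hypothetical helper name.
function keepKnownParams(pathAndQuery, knownNames) {
  const q = pathAndQuery.indexOf('?');
  if (q === -1) return pathAndQuery; // no query string at all
  const path = pathAndQuery.slice(0, q);
  const params = new URLSearchParams(pathAndQuery.slice(q + 1));
  // Lower-case comparison mimics the regex's i flag.
  const known = new Set(knownNames.map(n => n.toLowerCase()));
  const kept = new URLSearchParams();
  for (const [name, value] of params) {
    if (known.has(name.toLowerCase())) kept.append(name, value);
  }
  const query = kept.toString();
  return query ? path + '?' + query : path;
}

console.log(keepKnownParams(
  '/folder/index.html?knownParamA=1234&unknownParamA=1234&knownParamB=1234&unknownParamB=1234',
  ['knownParamA', 'knownParamB', 'knownParamC', 'knownParamD']
)); // /folder/index.html?knownParamA=1234&knownParamB=1234
```

One trade-off: URLSearchParams.toString() re-serializes the values, so characters such as spaces come back form-encoded (as +) and the result may not be byte-for-byte identical to the input.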

Why replace() with regex change condition not result

I have the following code:
var code = $('#code');
var str = code.val();
console.log(str);
var re = new RegExp("^press\\\(\\\"" + "KEY_6" + "\\\"\\\)\\n+(.*)", "m");
console.log(re);
var res = str.replace(re,'');
console.log(res);
code.val(res);
When a user inputs this into textarea:
press("KEY_3")
press("KEY_6")
press("KEY_9")
It should replace press("KEY_9") with an empty string. However, it also removes the condition, press("KEY_6").
Could you help me understand why it's not working as intended? Here is an example: http://jsfiddle.net/vfn8dtn4/

You should capture the group you want to keep, and then replace with $1:
...
var re = new RegExp("^([\\s\\S]*press\\\(\\\"" + "KEY_6" + "\\\"\\\))[\\n\\r]+.*", "m");
console.log(re);
var res = str.replace(re,'$1');
...
See updated code
Output:
press("KEY_6")
press("KEY_1")
press("KEY_6")
Adding [\\s\\S]* at the start of the pattern makes it consume as many characters as possible before press, so the group captures everything up to and including the last press("KEY_6").
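The same idea wrapped in a function, as a sketch: removeLineAfter is a hypothetical name, and escaping the key is an addition not in the answer, so a key containing characters like . or ( is matched literally:

```javascript
// Hypothetical helper: delete the line that follows the LAST press("<key>") line,
// keeping everything up to and including that line via capture group 1.
function removeLineAfter(str, key) {
  const escaped = key.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Resulting pattern: ^([\s\S]*press\("KEY"\))[\r\n]+.*
  const re = new RegExp('^([\\s\\S]*press\\("' + escaped + '"\\))[\\r\\n]+.*', 'm');
  return str.replace(re, '$1'); // unchanged if there is no match
}

console.log(removeLineAfter('press("KEY_3")\npress("KEY_6")\npress("KEY_9")', 'KEY_6'));
// press("KEY_3")
// press("KEY_6")
```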

The (.*) at the end consumes all the characters after "KEY_6" and the newline. If you remove it, i.e. use
"^press\\\(\\\"" + "KEY_6" + "\\\"\\\)\n+"
it works fine.

Javascript RegExp match & Multiple backreferences

I'm having trouble using multiple backreferences in a JavaScript match. So far I've got:
function newIlluminate() {
var string = "the time is a quarter to two";
var param = "time";
var re = new RegExp("(" + param + ")", "i");
var test = new RegExp("(time)(quarter)(the)", "i");
var matches = string.match(test);
$("#debug").text(matches[1]);
}
newIlluminate();
#debug, when matching the regex re, prints 'time', which is the value of param.
I've seen match examples where multiple backreferences are used by wrapping parts of the pattern in parentheses, however my match for (time)(quarter)... is returning null.
Where am I going wrong? Any help would be greatly appreciated!

Your regex is literally looking for timequarterthe and splitting the match (if it finds one) into the three backreferences.
I think you mean this:
var test = /time|quarter|the/ig;

Your regex test simply doesn't match the string (as it does not contain the substring timequarterthe). I guess you want alternation:
var test = /time|quarter|the/ig; // does not even need a capturing group
var matches = string.match(test);
$("#debug").text(matches!=null ? matches.join(", ") : "did not match");
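A sketch of both readings of the question (the second is an assumption about what was wanted): alternation with matchAll to list every word and its position, and a single match with three capture groups when the words appear in a known order:

```javascript
const string = "the time is a quarter to two";

// Alternation with the g flag: each word is a separate match, in string order.
const found = [...string.matchAll(/time|quarter|the/gi)].map(m => [m[0], m.index]);
console.log(found); // found: [['the', 0], ['time', 4], ['quarter', 14]]

// One match with three capture groups: the words must occur in this order,
// with anything (matched lazily) in between.
const ordered = string.match(/(the).*?(time).*?(quarter)/i);
console.log(ordered ? ordered.slice(1) : "did not match"); // groups: 'the', 'time', 'quarter'
```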

How to obtain index of subpattern in JavaScript regexp?

I wrote a regular expression in JavaScript to search for searchedUrl in a string:
var input = '1234 url( test ) 5678';
var searchedUrl = 'test';
var regexpStr = "url\\(\\s*"+searchedUrl+"\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = input.match(regex);
console.log(match); // return an array
Output:
["url( test )", index: 5, input: "1234 url( test ) 5678"]
Now I would like to obtain the position of searchedUrl (in the example above, the position of test in 1234 url( test ) 5678).
How can I do that?

As far as I could tell it wasn't possible to get the offset of a sub-match automatically, you have to do the calculation yourself using either lastIndex of the RegExp, or the index property of the match object returned by exec(). Depending on which you use you'll either have to add or subtract the length of groups leading up to your sub-match. However, this does mean you have to group the first or last part of the Regular Expression, up to the pattern you wish to locate.
lastIndex only comes into play when using the g (global) flag, and it records the index just after the entire match. So if you wish to use lastIndex, you'll need to work backwards from the end of your pattern.
For more information on the exec() method, see here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
The following succinctly shows the solution in operation:
var str = '---hello123';
var r = /([a-z]+)([0-9]+)/;
var m = r.exec( str );
alert( m.index + m[1].length ); // will give the position of 123
update
This would apply to your issue using the following:
var input = '1234 url( test ) 5678';
var searchedUrl = 'test';
var regexpStr = "(url\\(\\s*)("+searchedUrl+")\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = regex.exec(input);
Then to get the submatch offset you can use:
match.index + match[1].length
match[1] now contains url( plus the single space before test, thanks to the grouping parentheses, which lets us compute the internal offset.
update 2
Obviously things are a little more complicated if you have patterns in the RegExp, that you wish to group, before the actual pattern you want to locate. This is just a simple act of adding together each group length.
var s = '~- [This may or may not be random|it depends on your perspective] -~';
var r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
var m = r.exec( s );
To get the offset of "it depends on your perspective" you would use:
m.index + m[1].length + m[2].length + m[3].length;
Obviously if you know the RegExp has portions that never change length, you can replace those with hard coded numeric values. However, it's probably best to keep the above .length checks, just in case you — or someone else — ever changes what your expression matches.
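The answer also mentions working backwards from lastIndex when the g flag is set. A minimal sketch of that variant, where the text after the target is grouped so its length can be subtracted:

```javascript
const input = '1234 url( test ) 5678';
// The g flag makes exec() set re.lastIndex to the index just after the match.
// Group 1 is the target; group 2 is everything after it up to the end of the match.
const re = /url\(\s*(test)(\s*\))/g;
const m = re.exec(input);
// Walk back from the end of the match over group 2, then over group 1 itself.
const start = m ? re.lastIndex - m[2].length - m[1].length : -1;
console.log(start); // 10
```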

JS doesn't have a direct way to get the index of a subpattern/capturing group. But you can work around that with some tricks. For example:
var reStr = "(url\\(\\s*)" + searchedUrl + "\\s*\\)";
var re = new RegExp(reStr, 'i');
var m = re.exec(input);
if(m){
var index = m.index + m[1].length;
console.log("url found at " + index);
}

You can add the d flag (hasIndices, ES2022) to the regex to generate start and end indices for each capture group.
const input = '1234 url( test ) 5678';
const searchedUrl = 'test';
const regexpStr = "url\\(\\s*("+searchedUrl+")\\s*\\)";
const regex = new RegExp(regexpStr , 'id');
const match = regex.exec(input);
console.log(match.indices[1]); // [10, 14]
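With the d flag, named groups also get their own entry in indices.groups, so the offset can be read by name instead of by group number. A sketch, assuming an ES2022-capable engine (for example Node 16+) and an illustrative group name url:

```javascript
const input = '1234 url( test ) 5678';
// "url" is an illustrative group name around the part we want to locate.
const re = /url\(\s*(?<url>test)\s*\)/d;
const m = re.exec(input);
// indices.groups maps each group name to its [start, end) pair; null if no match.
const span = m ? m.indices.groups.url : null;
console.log(span);                                    // [ 10, 14 ]
if (span) console.log(input.slice(span[0], span[1])); // test
```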

You don't need the index.
This is a case where providing just a bit more information would have gotten a much better answer. I can't fault you for it; we're encouraged to create simple test cases and cut out irrelevant detail.
But one important item was missing: what you plan to do with that index. In the meantime, we were all chasing the wrong problem. :-)
I had a feeling something was missing; that's why I asked you about it.
As you mentioned in the comment, you want to find the URL in the input string and highlight it in some way, perhaps by wrapping it in a <b></b> tag or the like:
'1234 url( <b>test</b> ) 5678'
(Let me know if you meant something else by "highlight".)
You can use character indexes to do that, but there is a much easier way using the regular expression itself.
Getting the index
But since you asked, if you did need the index, you could get it with code like this:
var input = '1234 url( test ) 5678';
var url = 'test';
var regexpStr = "^(.*url\\(\\s*)"+ url +"\\s*\\)";
var regex = new RegExp( regexpStr , 'i' );
var match = input.match( regex );
var start = match[1].length;
This is a bit simpler than the code in the other answers, but any of them would work equally well. This approach works by anchoring the regex to the beginning of the string with ^ and putting all the characters before the URL in a group with (). The length of that group string, match[1], is your index.
Slicing and dicing
Once you know the starting index of test in your string, you could use .slice() or other string methods to cut up the string and insert the tags, perhaps with code something like this:
// Wrap url in <b></b> tag by slicing and pasting strings
var output =
input.slice( 0, start ) +
'<b>' + url + '</b>' +
input.slice( start + url.length );
console.log( output );
That will certainly work, but it is really doing things the hard way.
Also, I left out some error handling code. What if there is no matching URL? match will be null, and match[1] will throw a TypeError. But instead of worrying about that, let's see how we can do it without any character indexing at all.
The easy way
Let the regular expression do the work for you. Here's the whole thing:
var input = '1234 url( test ) 5678';
var url = 'test';
var regexpStr = "(url\\(\\s*)(" + url + ")(\\s*\\))";
var regex = new RegExp( regexpStr , 'i' );
var output = input.replace( regex, "$1<b>$2</b>$3" );
console.log( output );
This code has three groups in the regular expression, one to capture the URL itself, with groups before and after the URL to capture the other matching text so we don't lose it. Then a simple .replace() and you're done!
You don't have to worry about any string lengths or indexes this way. And the code works cleanly if the URL isn't found: it returns the input string unchanged.
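Because the URL is pasted into the pattern, characters like . or ( in it would be interpreted as regex syntax. A sketch of the same replace with the URL escaped first; highlightUrl and the escaping step are additions, not part of the answer:

```javascript
// Hypothetical helper: wrap the URL inside url( ... ) in <b></b>.
function highlightUrl(input, url) {
  // Escape RegExp metacharacters so the URL is matched literally.
  const escaped = url.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Groups: 1 = "url(" plus spaces, 2 = the URL, 3 = spaces plus ")".
  const re = new RegExp('(url\\(\\s*)(' + escaped + ')(\\s*\\))', 'i');
  return input.replace(re, '$1<b>$2</b>$3'); // unchanged if not found
}

console.log(highlightUrl('1234 url( test ) 5678', 'test'));    // 1234 url( <b>test</b> ) 5678
console.log(highlightUrl('a url(img.png) b', 'img.png'));      // a url(<b>img.png</b>) b
```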

Javascript Regexp loop all matches

I'm trying to do something similar to Stack Overflow's rich text editor. Given this text:
[Text Example][1]
[1][http://www.example.com]
I want to loop over each [string][int] that is found, which I do this way:
var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var arrMatch = null;
var rePattern = new RegExp(
"\\[(.+?)\\]\\[([0-9]+)\\]",
"gi"
);
while (arrMatch = rePattern.exec(Text)) {
console.log("ok");
}
This works great: it logs 'ok' for each [string][int]. What I need to do, though, is for each match found, replace the initial match with components of the second match.
So in the loop, $2 would represent the int part originally matched, and I would run this regexp (pseudocode):
while (arrMatch = rePattern.exec(Text)) {
var FindIndex = $2; // This would be 1 in our example
new RegExp("\\[" + FindIndex + "\\]\\[(.+?)\\]", "g")
// Replace original match now with hyperlink
}
This would match
[1][http://www.example.com]
End result for the first example would be:
<a href="http://www.example.com">Text Example</a>
Edit
I've gotten as far as this now:
var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
reg = new RegExp(
"\\[(.+?)\\]\\[([0-9]+)\\]",
"gi");
var result;
while ((result = reg.exec(Text)) !== null) {
var LinkText = result[1];
var Match = result[0];
Text = Text.replace(new RegExp(Match, "g"), '" + LinkText + "');
}
console.log(Text);

I agree with Jason that it’d be faster/safer to use an existing Markdown library, but you’re looking for String.prototype.replace (also, use RegExp literals!):
var Text = "[Text Example][1]\n[1][http://www.example.com]";
var rePattern = /\[(.+?)\]\[([0-9]+)\]/gi;
console.log(Text.replace(rePattern, function(match, text, urlId) {
// return an appropriately-formatted link
return `${text}`;
}));

I managed to do it in the end with this:
var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
reg = new RegExp(
"\\[(.+?)\\]\\[([0-9]+)\\]",
"gi");
var result;
while (result = reg.exec(Text)) {
var LinkText = result[1];
var Match = result[0];
var LinkID = result[2];
var FoundURL = new RegExp("\\[" + LinkID + "\\]\\[(.+?)\\]", "g").exec(Text);
Text = Text.replace(Match, '<a href="' + FoundURL[1] + '">' + LinkText + '</a>');
}
console.log(Text);

Here we're using the exec method: called in a while loop, it returns every match in turn along with the position of the matched string.
var input = "A 3 numbers in 333";
var regExp = /\b(\d+)\b/g, match;
while (match = regExp.exec(input))
console.log("Found", match[1], "at", match.index);
// → Found 3 at 2
// → Found 333 at 15

Use backreferences to restrict the match, so that the code will match if your text is:
[Text Example][1]\n[1][http://www.example.com]
and the code will not match if your text is:
[Text Example][1]\n[2][http://www.example.com]
var re = /\[(.+?)\]\[([0-9]+)\s*.*\s*\[(\2)\]\[(.+?)\]/gi;
var str = '[Text Example][1]\n[1][http://www.example.com]';
var subst = '<a href="$4">$1</a>';
var result = str.replace(re, subst);
console.log(result);
\number is used inside a regex to refer back to the text matched by a numbered group, and $number is used in the replacement string of replace() in the same way, to refer to group results.

This format is based on Markdown. There are several JavaScript ports available. If you don't want the whole syntax, then I recommend stealing the portions related to links.

Another way to iterate over all matches, without relying on exec and match subtleties, is to call the string replace function with the regex as the first parameter and a function as the second. When used like this, the function receives the whole match as its first argument, then one argument per capture group, then the offset (index) of the match, and then the whole string being examined:
var text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var arrMatch = null;
var rePattern = new RegExp("\\[(.+?)\\]\\[([0-9]+)\\]", "gi");
text.replace(rePattern, function(match, g1, g2, index){
// Do whatever
})
You can even iterate over all the groups of each match using the function's local arguments object, excluding the first argument and, for a regex without named groups, the last two (offset and string).
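On engines that support String.prototype.matchAll (ES2020), the same iteration can be written as a for...of loop. A minimal sketch, not taken from the answers above:

```javascript
const text = "[Text Example][1]\n[1][http://www.example.com]";
// matchAll throws a TypeError if the regex lacks the g flag.
const rePattern = /\[(.+?)\]\[([0-9]+)\]/g;
const links = [];
for (const m of text.matchAll(rePattern)) {
  // m[0] = whole match, m[1] = link text, m[2] = reference id, m.index = offset
  links.push({ text: m[1], id: m[2], index: m.index });
}
console.log(links); // one entry: { text: 'Text Example', id: '1', index: 0 }
```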

I know it's old, but since I stumbled upon this post, I want to set things straight.
First of all, your way of thinking about this problem is too complicated, and when the solution to a supposedly simple problem becomes too complicated, it is time to stop and think about what went wrong.
Second, your solution is very inefficient, because you first find what you want to replace and then search the same text again for the referenced link information. So the computational complexity eventually becomes O(n^2).
It is very disappointing to see so many upvotes on something wrong, because people coming here learn mostly from the accepted solution, assume it is a legitimate answer, and use this concept in their projects, which then become badly implemented products.
The approach to this problem is pretty simple. All you need to do is find all referenced links in the text, save them in a dictionary, and only then search for the placeholders to replace, using the dictionary. That's it. It is so simple! And in this case you get a complexity of just O(n).
So this is how it goes:
const text = `
[2][https://en.wikipedia.org/wiki/Scientific_journal][5][https://en.wikipedia.org/wiki/Herpetology]
The Wells and Wellington affair was a dispute about the publication of three papers in the Australian Journal of [Herpetology][5] in 1983 and 1985. The publication was established in 1981 as a [peer-reviewed][1] [scientific journal][2] focusing on the study of [3][https://en.wikipedia.org/wiki/Amphibian][amphibians][3] and [reptiles][4] ([herpetology][5]). Its first two issues were published under the editorship of Richard W. Wells, a first-year biology student at Australia's University of New England. Wells then ceased communicating with the journal's editorial board for two years before suddenly publishing three papers without peer review in the journal in 1983 and 1985. Coauthored by himself and high school teacher Cliff Ross Wellington, the papers reorganized the taxonomy of all of Australia's and New Zealand's [amphibians][3] and [reptiles][4] and proposed over 700 changes to the binomial nomenclature of the region's herpetofauna.
[1][https://en.wikipedia.org/wiki/Academic_peer_review]
[4][https://en.wikipedia.org/wiki/Reptile]
`;
const linkRefs = {};
const linkRefPattern = /\[(?<id>\d+)\]\[(?<link>[^\]]+)\]/g;
const linkPlaceholderPattern = /\[(?<text>[^\]]+)\]\[(?<refid>\d+)\]/g;
const parsedText = text
.replace(linkRefPattern, (...[,,,,,ref]) => (linkRefs[ref.id] = ref.link, ''))
.replace(linkPlaceholderPattern, (...[,,,,,placeholder]) => `<a href="${linkRefs[placeholder.refid]}">${placeholder.text}</a>`)
.trim();
console.log(parsedText);
