Javascript Regexp loop all matches

I'm trying to do something similar to Stack Overflow's rich text editor. Given this text:
[Text Example][1]
[1][http://www.example.com]
I want to loop over each [string][int] that is found, which I do this way:
var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var arrMatch = null;
var rePattern = new RegExp(
    "\\[(.+?)\\]\\[([0-9]+)\\]",
    "gi"
);
while (arrMatch = rePattern.exec(Text)) {
    console.log("ok");
}
This works great: it logs 'ok' for each [string][int]. What I need to do, though, is to replace each match found with components of a second match.
So in the loop, $2 would represent the int part originally matched, and I would run this regexp (pseudocode):
while (arrMatch = rePattern.exec(Text)) {
var FindIndex = $2; // This would be 1 in our example
new RegExp("\\[" + FindIndex + "\\]\\[(.+?)\\]", "g")
// Replace original match now with hyperlink
}
This would match
[1][http://www.example.com]
The end result for the first example would be:
<a href="http://www.example.com">Text Example</a>
Edit
I've gotten as far as this now:
var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var reg = new RegExp(
    "\\[(.+?)\\]\\[([0-9]+)\\]",
    "gi");
var result;
while ((result = reg.exec(Text)) !== null) {
    var LinkText = result[1];
    var Match = result[0];
    // Replace by plain string: Match contains [ and ], which new RegExp() would read as character classes
    Text = Text.replace(Match, '<a href="#">' + LinkText + '</a>');
}
console.log(Text);

I agree with Jason that it’d be faster/safer to use an existing Markdown library, but you’re looking for String.prototype.replace (also, use RegExp literals!):
var Text = "[Text Example][1]\n[1][http://www.example.com]";
var rePattern = /\[(.+?)\]\[([0-9]+)\]/gi;
console.log(Text.replace(rePattern, function(match, text, urlId) {
    // return an appropriately-formatted link
    return `<a href="#${urlId}">${text}</a>`;
}));

I managed to do it in the end with this:
var Text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var reg = new RegExp(
    "\\[(.+?)\\]\\[([0-9]+)\\]",
    "gi");
var result;
while (result = reg.exec(Text)) {
    var LinkText = result[1];
    var Match = result[0];
    var LinkID = result[2];
    // Look up the reference line "[LinkID][url]" that belongs to this match
    var FoundURL = new RegExp("\\[" + LinkID + "\\]\\[(.+?)\\]", "g").exec(Text);
    if (FoundURL) {
        Text = Text.replace(Match, '<a href="' + FoundURL[1] + '">' + LinkText + '</a>');
    }
}
console.log(Text);

Here we use the exec method in a while loop: with the g flag, each call returns the next match, and match.index gives the position of the matched string.
var input = "A 3 numbers in 333";
var regExp = /\b(\d+)\b/g, match;
while (match = regExp.exec(input))
    console.log("Found", match[1], "at", match.index);
// → Found 3 at 2
// → Found 333 at 15
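On engines that support ES2020, String.prototype.matchAll gives the same information as the exec loop without managing lastIndex by hand. This is only a sketch using the sample text from the question; the names sampleText, linkPattern and found are made up for illustration.

```javascript
// Sample input taken from the question (assumed here for illustration).
const sampleText = "[Text Example][1]\n[1][http://www.example.com]";
// matchAll requires the g flag on the pattern.
const linkPattern = /\[(.+?)\]\[([0-9]+)\]/g;

const found = [];
for (const m of sampleText.matchAll(linkPattern)) {
  // m[0] is the whole match, m[1] and m[2] are the capture groups,
  // and m.index is the position of the match in the input.
  found.push({ label: m[1], id: m[2], index: m.index });
}
console.log(found);
```

Each iteration yields the same kind of match array that exec returns, so existing loop bodies usually carry over unchanged.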

Use a back-reference to restrict the match, so that the pattern matches if your text is:
[Text Example][1]\n[1][http://www.example.com]
and does not match if your text is:
[Text Example][1]\n[2][http://www.example.com]
var re = /\[(.+?)\]\[([0-9]+)\]\s*.*\s*\[(\2)\]\[(.+?)\]/gi;
var str = '[Text Example][1]\n[1][http://www.example.com]';
var subst = '<a href="$4">$1</a>';
var result = str.replace(re, subst);
console.log(result);
Inside the pattern, \number refers back to the text captured by that group; in the replacement string passed to replace, $number inserts the text captured by that group.

This format is based on Markdown. There are several JavaScript ports available. If you don't want the whole syntax, then I recommend stealing the portions related to links.

Another way to iterate over all matches without relying on the subtleties of exec and match is to call the string replace function with the regex as the first parameter and a function as the second. The function receives the whole match as its first argument, then one argument per capture group, then the index of the match, and finally the whole input string:
var text = "[Text Example][1]\n[1][http://www.example.com]";
// Find resource links
var rePattern = new RegExp("\\[(.+?)\\]\\[([0-9]+)\\]", "gi");
text.replace(rePattern, function(match, g1, g2, index){
    // Do whatever
});
You can even iterate over all groups of each match through the function's arguments object, skipping the first entry (the whole match) and the trailing index and input string.
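Iterating over the groups can be sketched with rest parameters instead of the arguments object. The sample string and the names sample, pattern and seen are assumptions for illustration, and the slicing below only holds for patterns without named groups (named groups add one more trailing argument).

```javascript
const sample = "[First][1] and [Second][22]";
const pattern = /\[(.+?)\]\[([0-9]+)\]/g;

const seen = [];
sample.replace(pattern, (match, ...rest) => {
  // Without named groups, the last two entries of rest are the offset of
  // the match and the whole input string; everything before them is a
  // capture group.
  const groups = rest.slice(0, -2);
  const offset = rest[rest.length - 2];
  seen.push({ match, groups, offset });
  return match; // return the match itself so the text stays unchanged
});
console.log(seen);
```

The return value of replace is ignored here; the call is used purely as an iterator over the matches.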

I know it's old, but since I stumbled upon this post, I want to set things straight.
First of all, your approach to this problem is too complicated, and when the solution to a supposedly simple problem becomes too complicated, it is time to stop and think about what went wrong.
Second, your solution is inefficient: you first find what you want to replace and then, for each match, search the same text again for the referenced link. The computational complexity therefore grows to roughly O(n^2).
It is disappointing to see so many upvotes on something wrong, because people who come here learn mostly from the accepted solution, assume it is a legitimate answer, and use the concept in their own projects, which then end up badly implemented.
The approach to this problem is pretty simple. All you need to do is find all referenced links in the text, save them in a dictionary, and only then search for the placeholders to replace, using the dictionary. That's it. It is that simple! In this case the complexity is just O(n).
So this is how it goes:
const text = `
[2][https://en.wikipedia.org/wiki/Scientific_journal][5][https://en.wikipedia.org/wiki/Herpetology]
The Wells and Wellington affair was a dispute about the publication of three papers in the Australian Journal of [Herpetology][5] in 1983 and 1985. The publication was established in 1981 as a [peer-reviewed][1] [scientific journal][2] focusing on the study of [3][https://en.wikipedia.org/wiki/Amphibian][amphibians][3] and [reptiles][4] ([herpetology][5]). Its first two issues were published under the editorship of Richard W. Wells, a first-year biology student at Australia's University of New England. Wells then ceased communicating with the journal's editorial board for two years before suddenly publishing three papers without peer review in the journal in 1983 and 1985. Coauthored by himself and high school teacher Cliff Ross Wellington, the papers reorganized the taxonomy of all of Australia's and New Zealand's [amphibians][3] and [reptiles][4] and proposed over 700 changes to the binomial nomenclature of the region's herpetofauna.
[1][https://en.wikipedia.org/wiki/Academic_peer_review]
[4][https://en.wikipedia.org/wiki/Reptile]
`;
const linkRefs = {};
const linkRefPattern = /\[(?<id>\d+)\]\[(?<link>[^\]]+)\]/g;
const linkPlaceholderPattern = /\[(?<text>[^\]]+)\]\[(?<refid>\d+)\]/g;
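const parsedText = text
    // First pass: record each [id][url] reference in linkRefs and remove it from the text
    .replace(linkRefPattern, (...[,,,,,ref]) => (linkRefs[ref.id] = ref.link, ''))
    // Second pass: replace each [text][id] placeholder with a link to the recorded URL
    .replace(linkPlaceholderPattern, (...[,,,,,placeholder]) => `<a href="${linkRefs[placeholder.refid]}">${placeholder.text}</a>`)
    .trim();
console.log(parsedText);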
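The two-pass idea described above can also be packaged as a small function. This is a sketch rather than a full Markdown implementation: the name resolveReferenceLinks is hypothetical, the output is assumed to be an HTML anchor, URLs and labels are not HTML-escaped, and a placeholder whose visible text consists only of digits would be mistaken for a reference line.

```javascript
// Hypothetical helper: resolve [label][id] placeholders against
// [id][url] reference lines found anywhere in the same text.
function resolveReferenceLinks(source) {
  const refs = new Map();
  const refPattern = /\[(\d+)\]\[([^\]]+)\]/g;
  const placeholderPattern = /\[([^\]]+)\]\[(\d+)\]/g;

  // Pass 1: collect every reference and strip it from the text.
  const withoutRefs = source.replace(refPattern, (whole, id, url) => {
    refs.set(id, url);
    return "";
  });

  // Pass 2: turn each placeholder into a link; placeholders whose id
  // has no recorded reference are left untouched.
  return withoutRefs
    .replace(placeholderPattern, (whole, label, id) =>
      refs.has(id) ? `<a href="${refs.get(id)}">${label}</a>` : whole)
    .trim();
}

const resolved = resolveReferenceLinks("[Text Example][1]\n[1][http://www.example.com]");
console.log(resolved);
// → <a href="http://www.example.com">Text Example</a>
```

Each pass scans the text once, and lookups in the Map do not depend on the length of the text.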

Related

js Split array add space between words (not first)

I have a string Topic: Computer Science
And want to strip out topic: (but in fact I want this to work with any header on the string line) and return Computer Science.
I thought about splitting the components and then adding the spaces back in:
var subjectLine = thisLine.split(" ");
var subjectString = "";
for (var i = 1; i < subjectLine.length; i++) {
    subjectString += subjectLine[i] + " ";
}
But then I need to remove the last space from the string.
For each doesn't work as I need to NOT have the first element appended.
I'm not sure how to do this in js so it is reusable for many different lines and topic names that can come from the subjectLine

After splitting the line, remove the first element from the array, then join the rest back together.
var thisLine = "Topic: Computer Science";
var subjectLine = thisLine.split(" ");
subjectLine.splice(0, 1);
var subjectString = subjectLine.join(" ");
console.log(subjectString);

You might consider using a regular expression, it'll probably be a lot easier than working with arrays: match the non-space characters at the beginning of the string, followed by at least one space, and .replace with the empty string:
const subjectString = thisLine.replace(/^\S+\s+/, '');
const transform = line => line.replace(/^\S+\s+/, '');
console.log(transform('Topic: Computer Science'));

You need to know where the heading stops and the real data starts. Then delete all characters up to that point.
So, for instance, if you know that the heading ends with a colon, then do:
var line = "this is the topic: Computer Science";
var topic = line.replace(/^.*:\s*/, "");
console.log(topic);

Concatenate / simplify RegExp

I've this working RegExp in my JavaScript file:
var reA = new RegExp(urlValueToRemove);
var reB = new RegExp('(,&)');
var reC = new RegExp('(,,)');
var reD = new RegExp('(=,)');
var reE = new RegExp('(,$)');
window.history.pushState(null, null, decodeURIComponent(window.location.search).replace(reA, '').replace(reB, '&').replace(reC, ',').replace(reD, '=').replace(reE, ''));
Is it possible to concatenate / simplify this so that I don't need to call replace 5 times?
I've asked this in the Code Review community, but nobody is available there, so I might have to wait days for an answer.
Example
When I have this URL here:
http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S
When I now want to remove Green from the URL, I can pass Green to the first regex, reA, and it gets removed from the URL:
http://localhost.com/?color=Red,Blue&size=X,L,M,S

You can use the capture group to indicate what should be kept, and join the two cases with a |: one case needs to keep the character that precedes the word (like =), the other what follows the word (like &):
function removeWord(url, text) {
    const re = new RegExp(`,${text}(&|,|$)|(=)${text},`, 'g');
    return url.replace(re, '$1$2');
}
const url = "http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S";
console.log(removeWord(url, "Green"));
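One caveat when building a pattern from a caller-supplied word: characters such as + or ( have a special meaning in a regular expression. The sketch below keeps the same two-branch pattern but escapes the word first. The helper names escapeRegExp and removeListValue and the second sample URL are assumptions for illustration. Like the pattern above, it does not handle a value that is the only item in its list.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Same idea as removeWord, but safe for values containing metacharacters.
function removeListValue(url, value) {
  const v = escapeRegExp(value);
  // Branch 1 keeps the delimiter after the value, branch 2 keeps the "=" before it.
  const re = new RegExp(`,${v}(&|,|$)|(=)${v},`, "g");
  return url.replace(re, "$1$2");
}

console.log(removeListValue("http://localhost.com/?color=Red,Blue,Green&size=X,L,M,S", "Green"));
// → http://localhost.com/?color=Red,Blue&size=X,L,M,S
console.log(removeListValue("http://localhost.com/?tag=a,c++,b", "c++"));
// → http://localhost.com/?tag=a,b
```

Without the escaping step, passing "c++" would make the RegExp constructor throw, because "++" is not a valid quantifier sequence.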

Change occurrences of sum(something) to something_sum

Admittedly I'm terrible with RegEx and pattern replacements, so I'm wondering if anyone can help me out with this one as I've been trying now for a few hours and in the process of pulling my hair out.
Examples:
sum(Sales) needs to be converted to Sales_sum
max(Sales) needs to be converted to Sales_max
min(Revenue) needs to be converted to Revenue_min
The only available prefixed words will be sum, min, max, avg, xcount - not sure if this makes a difference in the solution.
Hopefully that's enough information to kind of show what I'm trying to do. Is this possible via RegEx?
Thanks in advance.

There are a few possible ways, for example :
var str = "min(Revenue)";
var arr = str.match(/([^(]+)\(([^)]+)/);
var result = arr[2]+'_'+arr[1];
result is then "Revenue_min".
Here's a more complex example following your comment, handling many matches and lowercasing the verb :
var str = "SUM(Sales) + MIN(Revenue)";
var result = str.replace(/\b([^()]+)\(([^()]+)\)/g, function(_,a,b){
return b+'_'+a.toLowerCase()
});
Result : "Sales_sum + Revenue_min"

Try with:
var input = 'sum(Sales)',
matches = input.match(/^([^(]*)\(([^)]*)/),
output = matches[2] + '_' + matches[1];
console.log(output); // Sales_sum
Also:
var input = 'sum(Sales)',
output = input.replace(/^([^(]*)\(([^)]*)\)/, '$2_$1');

You can use replace with tokens:
'sum(Sales)'.replace(/(\w+)\((\w+)\)/, '$2_$1')

Using a whitelist for your list of prefixed words:
output = input.replace(/\b(sum|min|max|avg|xcount)\((.*?)\)/gi, function(_, a, b) {
    return b + "_" + a.toLowerCase();
});
The \b is a word boundary. It prevents something like "haxcount(xorz)" from becoming "haxorz_xcount".
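The whitelist can also be kept in an array and the pattern built from it, so that adding a prefix later is a one-line change. This is a sketch under that assumption; the names aggregates, aggregatePattern and renameAggregates are made up for illustration.

```javascript
// Prefixes listed in the question.
const aggregates = ["sum", "min", "max", "avg", "xcount"];

// \b keeps the prefix from matching in the middle of a longer word.
const aggregatePattern = new RegExp(
  `\\b(${aggregates.join("|")})\\(([^()]+)\\)`,
  "gi"
);

function renameAggregates(expression) {
  return expression.replace(
    aggregatePattern,
    (whole, fn, field) => `${field}_${fn.toLowerCase()}`
  );
}

console.log(renameAggregates("SUM(Sales) + min(Revenue) - haxcount(xorz)"));
// → Sales_sum + Revenue_min - haxcount(xorz)
```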

Javascript RegExp match & Multiple backreferences

I'm having trouble using multiple back references in a JavaScript match. So far I've got:
function newIlluminate() {
    var string = "the time is a quarter to two";
    var param = "time";
    var re = new RegExp("(" + param + ")", "i");
    var test = new RegExp("(time)(quarter)(the)", "i");
    var matches = string.match(test);
    $("#debug").text(matches[1]);
}
newIlluminate();
When I match against the regex 're', #debug prints 'time', which is the value of param.
I've seen match examples where multiple back references are used by wrapping parts of the pattern in parentheses; however, my match for (time)(quarter)... returns null.
Where am I going wrong? Any help would be greatly appreciated!

Your regex is literally looking for timequarterthe and splitting the match (if it finds one) into the three backreferences.
I think you mean this:
var test = /time|quarter|the/ig;

Your regex test simply doesn't match the string (as it does not contain the substring timequarterthe). I guess you want alternation:
var test = /time|quarter|the/ig; // does not even need a capturing group
var matches = string.match(test);
$("#debug").text(matches!=null ? matches.join(", ") : "did not match");

How to obtain index of subpattern in JavaScript regexp?

I wrote a regular expression in JavaScript for searching searchedUrl in a string:
var input = '1234 url( test ) 5678';
var searchedUrl = 'test';
var regexpStr = "url\\(\\s*"+searchedUrl+"\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = input.match(regex);
console.log(match); // return an array
Output:
["url( test )", index: 5, input: "1234 url( test ) 5678"]
Now I would like to obtain the position of searchedUrl (in the example above, the position of test in 1234 url( test ) 5678).
How can I do that?

As far as I could tell, it wasn't possible to get the offset of a sub-match automatically; you have to do the calculation yourself, using either the lastIndex of the RegExp or the index property of the match object returned by exec(). Depending on which you use, you'll have to either add or subtract the length of the groups leading up to your sub-match. This does mean you have to group the first or last part of the regular expression, up to the pattern you wish to locate.
lastIndex only seems to come into play when using the g (global) flag, and it records the index just after the entire match. So if you wish to use lastIndex, you'll need to work backwards from the end of your pattern.
For more information on the exec() method, see here:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec
The following succinctly shows the solution in operation:
var str = '---hello123';
var r = /([a-z]+)([0-9]+)/;
var m = r.exec( str );
alert( m.index + m[1].length ); // will give the position of 123
update
This would apply to your issue using the following:
var input = '1234 url( test ) 5678';
var searchedUrl = 'test';
var regexpStr = "(url\\(\\s*)("+searchedUrl+")\\s*\\)";
var regex = new RegExp(regexpStr , 'i');
var match = regex.exec(input);
Then to get the submatch offset you can use:
match.index + match[1].length
match[1] now contains url( plus the whitespace that follows it, thanks to the bracket grouping, which lets us work out the internal offset.
update 2
Obviously things are a little more complicated if you have patterns in the RegExp, that you wish to group, before the actual pattern you want to locate. This is just a simple act of adding together each group length.
var s = '~- [This may or may not be random|it depends on your perspective] -~';
var r = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
var m = r.exec( s );
To get the offset position of it depends on your perspective you would use:
m.index + m[1].length + m[2].length + m[3].length;
Obviously if you know the RegExp has portions that never change length, you can replace those with hard coded numeric values. However, it's probably best to keep the above .length checks, just in case you — or someone else — ever changes what your expression matches.
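The summing of group lengths can be wrapped in a small helper. This is a sketch with a made-up name, offsetOfGroup, and it only gives the right answer under the assumptions stated in the comment.

```javascript
// Hypothetical helper: start offset of capture group `groupNumber`, found by
// adding the lengths of all earlier groups to the start of the match.
// Only valid when groups 1..groupNumber-1 are adjacent, not nested, all take
// part in the match, and group 1 begins where the match begins.
function offsetOfGroup(match, groupNumber) {
  let offset = match.index;
  for (let i = 1; i < groupNumber; i++) {
    offset += match[i].length;
  }
  return offset;
}

const sampleString = '~- [This may or may not be random|it depends on your perspective] -~';
const samplePattern = /(\[)([a-z ]+)(\|)([a-z ]+)(\])/i;
const sampleMatch = samplePattern.exec(sampleString);
const start = offsetOfGroup(sampleMatch, 4);

console.log(sampleString.slice(start, start + sampleMatch[4].length));
// → it depends on your perspective
```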

JS doesn't have a direct way to get the index of a subpattern/capturing group. But you can work around that with some tricks. For example:
var reStr = "(url\\(\\s*)" + searchedUrl + "\\s*\\)";
var re = new RegExp(reStr, 'i');
var m = re.exec(input);
if(m){
var index = m.index + m[1].length;
console.log("url found at " + index);
}

You can add the 'd' flag to the regex to generate start and end indices for each capture group.
const input = '1234 url( test ) 5678';
const searchedUrl = 'test';
const regexpStr = "url\\(\\s*(" + searchedUrl + ")\\s*\\)";
const regex = new RegExp(regexpStr, 'id');
const indices = regex.exec(input).indices[1];
console.log(indices); // [10, 14] (start and end index of group 1)

You don't need the index.
This is a case where providing just a bit more information would have gotten a much better answer. I can't fault you for it; we're encouraged to create simple test cases and cut out irrelevant detail.
But one important item was missing: what you plan to do with that index. In the meantime, we were all chasing the wrong problem. :-)
I had a feeling something was missing; that's why I asked you about it.
As you mentioned in the comment, you want to find the URL in the input string and highlight it in some way, perhaps by wrapping it in a <b></b> tag or the like:
'1234 url( <b>test</b> ) 5678'
(Let me know if you meant something else by "highlight".)
You can use character indexes to do that, however there is a much easier way using the regular expression itself.
Getting the index
But since you asked, if you did need the index, you could get it with code like this:
var input = '1234 url( test ) 5678';
var url = 'test';
var regexpStr = "^(.*url\\(\\s*)"+ url +"\\s*\\)";
var regex = new RegExp( regexpStr , 'i' );
var match = input.match( regex );
var start = match[1].length;
This is a bit simpler than the code in the other answers, but any of them would work equally well. This approach works by anchoring the regex to the beginning of the string with ^ and putting all the characters before the URL in a group with (). The length of that group string, match[1], is your index.
Slicing and dicing
Once you know the starting index of test in your string, you could use .slice() or other string methods to cut up the string and insert the tags, perhaps with code something like this:
// Wrap url in <b></b> tag by slicing and pasting strings
var output =
input.slice( 0, start ) +
'<b>' + url + '</b>' +
input.slice( start + url.length );
console.log( output );
That will certainly work, but it is really doing things the hard way.
Also, I left out some error handling code. What if there is no matching URL? match will be null and accessing match[1] will throw. But instead of worrying about that, let's see how we can do it without any character indexing at all.
The easy way
Let the regular expression do the work for you. Here's the whole thing:
var input = '1234 url( test ) 5678';
var url = 'test';
var regexpStr = "(url\\(\\s*)(" + url + ")(\\s*\\))";
var regex = new RegExp( regexpStr , 'i' );
var output = input.replace( regex, "$1<b>$2</b>$3" );
console.log( output );
This code has three groups in the regular expression, one to capture the URL itself, with groups before and after the URL to capture the other matching text so we don't lose it. Then a simple .replace() and you're done!
You don't have to worry about any string lengths or indexes this way. And the code works cleanly if the URL isn't found: it returns the input string unchanged.
