I'm highlighting lines that contain a certain phrase using regex.
My current highlight function will read the whole text and place every instance of the phrase within a highlight span.
const START = "<span name='highlight' style='background-color: yellow;'>";
const END = "</span>";
function highlight(text, toReplace) {
let reg = new RegExp(toReplace, 'ig');
return text.replace(reg, START + toReplace + END);
}
I want to expand my regex so that, for each phrase, it highlights from the preceding <br> to the following <br>.
highlight("This<br>is some text to<br>highlight.", "text");
Current output:
This<br>is some <span name="highlight" style="background-color:yellow;">text</span> to<br>highlight.
Wanted output:
This<br><span name="highlight" style="background-color:yellow;">is some text to</span><br>highlight.
You may want to match all chars other than < and > before and after the text, and it is advisable to escape the literal text you pass to the RegExp constructor. To replace with the whole match you can normally use the $& placeholder; here a callback is used instead so that <br> tags can be passed through untouched:
const START = "<span name='highlight' style='background-color: yellow;'>";
const END = "</span>";
function highlight(text, toReplace) {
let reg = new RegExp("(<br/?>)|[^<>]*" + toReplace.replace(/[-\/\\^$*+?.()|[\]{}]/g, '\\$&') + "[^<>]*", 'ig');
return text.replace(reg, function ($0,$1) { return $1 ? $1 : START + $0 + END; });
}
console.log(highlight("This<br>is some text to<br>highlight.", "text"));
console.log(highlight("This<br>is a bunch of<br>text", "b"));
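The regex will look like /(<br\/?>)|[^<>]*text[^<>]*/gi. The first alternative captures a <br> or <br/> tag into Group 1, so the tag is consumed and returned unchanged (otherwise a search for b would match inside the tag itself). The second alternative matches 0 or more chars other than < and >, then text in a case-insensitive way, and then again 0 or more chars other than < and >. In the callback, if Group 1 matched it is returned as is; otherwise the whole match is wrapped in the highlighting tags.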
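As an alternative to a single regex, the same "highlight the whole segment between <br> tags" idea can be sketched by splitting on the tags first. This is only a minimal sketch: the names escapeRegExp and highlightSegments are hypothetical helpers, not part of the question or the answer above, and it assumes <br> is the only markup in the text.

```javascript
const START = "<span name='highlight' style='background-color: yellow;'>";
const END = "</span>";

// Hypothetical helper: escape regex metacharacters in a literal phrase.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every <br>-delimited segment that contains the phrase.
function highlightSegments(text, phrase) {
  const needle = new RegExp(escapeRegExp(phrase), 'i');
  return text
    // The capturing group keeps the <br> separators in the resulting array.
    .split(/(<br\s*\/?>)/i)
    .map(part => {
      const isTag = /^<br\s*\/?>$/i.test(part);
      return (isTag || !needle.test(part)) ? part : START + part + END;
    })
    .join('');
}

console.log(highlightSegments("This<br>is some text to<br>highlight.", "text"));
console.log(highlightSegments("This<br>is a bunch of<br>text", "b"));
```

Because the tags are separated out before the phrase is searched for, a phrase such as b cannot match inside a <br> tag.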
My guess is that this simple expression,
(<br>)(.*?)(\1)
might work here.
const regex = /(<br>)(.*?)(\1)/gs;
const str = `This<br>is some text to<br>highlight. This<br>is some text to<br>highlight. This<br>is some text to<br>highlight.
This<br>is some
text to<br>highlight. This<br>is some text to<br>highlight. This<br>is some text to<br>highlight.`;
const subst = `$1<span name='highlight' style='background-color: yellow;'>$2</span>$3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log(result);
I'm trying to replace every space with - between __tt and tt__.
So far I can only replace the spaces in the entire string, using the regex below.
var str = document.getElementById('tt').value;
str = str.replace(/(?<=__tt.*) (?=.*tt__)/g, '-');
console.log(str);
textarea {
width: 400px;
min-height: 100px;
}
<textarea id="tt">This is a long text __tt where i want
to replace
some text tt__ between some character
</textarea>
Is there a way to do the replacement only between the __tt and tt__ markers?
You could use a positive lookbehind and a positive lookahead.
var str = 'This is a long text __tt where i want to replace some text tt__ between some character';
str = str.replace(/(?<=__tt.*) (?=.*tt__)/g, '-');
console.log(str);
Without lookarounds, which are not yet fully supported by all browsers, you might also use a callback function that performs the replacement only inside each selected match.
var str = 'This is a long text __tt where i want to replace some text tt__ between some character';
str = str.replace(/__tt.*?tt__/g, m => m.replace(/ /g, "-"));
console.log(str);
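The two approaches can behave differently once the text contains more than one __tt ... tt__ pair. The sketch below uses a made-up sample string (an assumption for illustration, not from the question) to compare them: the lookaround version only checks that some __tt occurs earlier and some tt__ occurs later, so spaces between two pairs are replaced as well, while the callback version only touches the matched spans.

```javascript
// Sample input with two marked spans (hypothetical, for illustration only).
const sample = '__tt a b tt__ c d __tt e f tt__';

// Lookaround version: any space with "__tt" somewhere before and "tt__" somewhere after.
const viaLookarounds = sample.replace(/(?<=__tt.*) (?=.*tt__)/g, '-');

// Callback version: match each span lazily, then replace spaces inside that span only.
const viaCallback = sample.replace(/__tt.*?tt__/g, m => m.replace(/ /g, '-'));

console.log(viaLookarounds); // __tt-a-b-tt__-c-d-__tt-e-f-tt__
console.log(viaCallback);    // __tt-a-b-tt__ c d __tt-e-f-tt__
```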
Note
If you want a single hyphen in place of multiple consecutive spaces, you could repeat the space 1 or more times using +, or match 1 or more whitespace chars (including newlines) using \s+.
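A small sketch of that note, using a made-up input that contains a run of spaces and a line break inside the markers. [\s\S] is used here so the span can cross line breaks; the input string is an assumption for illustration.

```javascript
// Hypothetical input: three consecutive spaces and a newline inside the markers.
const input = 'keep this __tt one   two\nthree tt__ and this';

// Match each marked span (across lines), then collapse every whitespace run to one hyphen.
const result = input.replace(/__tt[\s\S]*?tt__/g, m => m.replace(/\s+/g, '-'));

console.log(result); // keep this __tt-one-two-three-tt__ and this
```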
With the updated question, get the text of the element:
var elm = document.getElementById("tt");
elm.textContent = elm.textContent.replace(/__tt[^]*?tt__/g, m => m.replace(/ +/g, "-"));
<textarea id="tt" rows="4" cols="50">This is a long text __tt where i want
to replace
some text tt__ between some character
</textarea>
You can try this:
let str = 'This is a long text __tt where i want to replace some text tt__ between some character';
str = str.replace(/__tt.*?tt__/g, (item) => item.replace(/ /g, "-"));
console.log(str);
I am building an autocomplete in JavaScript that needs to highlight words when doing a search:
That works fine, but there's an issue with escaped characters.
When trying to highlight a text with escaped characters (for example regex &>< example), the following is happening:
That's happening because I am doing the following:
element.innerHTML.replace(/a/g, highlight)
function highlight(str) {
return '<span class="foo"' + '>' + str + '</span>';
}
and innerHTML includes the entity &amp;amp; (the escaped form of &), which contains the letter a, so it makes sense.
In conclusion, I need a way to solve that so I would like a function that:
receives a and regex <br> example and returns regex <br> ex<span class="foo">a</span>mple
receives r and regex <br> example and returns <span class="foo">r</span>egex <b<span class="foo">r</span>> example
receives < and regex <br> example and returns regex <span class="foo"><</span>br> example
The entries may or may not contain html blocks, see the issue here (search for <br> or &)
str.replace only returns a new string with the intended replacements. The original string is unchanged.
var str = 'replace me';
var str2 = str.replace(/e/g, 'E');
// For display only
document.write('<pre>' + JSON.stringify({
str: str,
str2: str2
}, null, 2) + '</pre>');
Therefore the code needs to set the returned value from the replace back to the desired element.
Also, innerHTML returns the escaped text rather than the unescaped text. You could unescape it inside the function, but there is no need to if you work with the plain text (textContent) instead. The highlighted result is then assigned through innerHTML, with the non-highlighted parts escaped by a small helper that lets the browser do the escaping (set textContent, read back innerHTML).
UPDATE: the values are passed to the function and then set to the element:
NOTES:
The regexp could probably be made a bit more robust to avoid having to handle the special case using lastIndex
There needs to be some protection on the input as someone could provide a nasty regexp pattern. There is a minimal protection check in this example.
higlightElemById('a', 'regex &>< example', 'a');
higlightElemById('b', 'regex &>< example', '&');
higlightElemById('c', 'regex <br> example', '<');
higlightElemById('d', 'regex <br> example', 'e');
higlightElemById('e', 'regex <br> example', '[aex]');
function higlightElemById(id, str, match) {
var itemElem = document.getElementById(id);
// minimal regexp escape to prevent shenanigans
var safeMatch = match.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
// construct regexp to match highlight text
var regexp = new RegExp('(.*?)(' + safeMatch + ')', 'g');
var text = '';
var lastIndex;
var matches;
while (matches = regexp.exec(str)) {
// Escape the non-matching prefix
text += escapeHTML(matches[1]);
// Highlight the match (escaped too, so a '<' or '&' match cannot break the markup)
text += highlight(escapeHTML(matches[2]));
// Cache the lastIndex in case no regexp at end
lastIndex = regexp.lastIndex;
}
if (text) {
text += escapeHTML(str.slice(lastIndex));
} else {
text += escapeHTML(str);
}
itemElem.innerHTML = text;
}
function highlight(str) {
return '<span class="myHighlightClass">' + str + '</span>';
}
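function escapeHTML(html) {
// Cache the helper element on the function itself; a bare `this` is undefined in strict mode
escapeHTML.el = escapeHTML.el || document.createElement('textarea');
escapeHTML.el.textContent = html;
return escapeHTML.el.innerHTML;
}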
.myHighlightClass {
text-decoration: underline;
color: red;
}
<div id="a"></div>
<div id="b"></div>
<div id="c"></div>
<div id="d"></div>
<div id="e"></div>
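The same idea (escape the non-matching parts and the matches separately, then join them) can also be sketched without touching the DOM. This is a minimal sketch under stated assumptions: escapeHtml, escapeRegExp and highlightText are hypothetical names, the search term is treated as literal text rather than a pattern, and the foo class comes from the question.

```javascript
// Hypothetical helper: escape the characters that are special in HTML text and attributes.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[&<>"']/g, c => map[c]);
}

// Hypothetical helper: escape regex metacharacters so the term is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Search the raw (unescaped) text, and escape each piece only when building the output.
function highlightText(raw, term) {
  if (!term) return escapeHtml(raw);
  const re = new RegExp(escapeRegExp(term), 'g');
  let out = '';
  let last = 0;
  for (const m of raw.matchAll(re)) {
    out += escapeHtml(raw.slice(last, m.index));               // text before the match
    out += '<span class="foo">' + escapeHtml(m[0]) + '</span>'; // the match itself
    last = m.index + m[0].length;
  }
  return out + escapeHtml(raw.slice(last));                     // remaining tail
}

console.log(highlightText('regex <br> example', 'a'));
console.log(highlightText('regex &>< example', '&'));
```

Searching the raw text means a term such as a can never match inside an entity like &amp;amp;, because entities only appear after the search is done.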
I know a regex that separates two glued words, as follows:
input:
'WonderWorld'
output:
'Wonder World'
"WonderWorld".replace(/([A-Z])/g, ' $1');
Now I also want to remove a number in year format from the string. What changes are needed in the code above to get:
input
'WonderWorld 2016'
output
'Wonder World'
You can match the location before an uppercase letter (but not at the start of a word) with \B(?=[A-Z]), and match a 4-digit number together with any whitespace before it with \s*\b\d{4}\b. In a callback, check whether the match is empty. If it is empty, we matched the location before an uppercase letter (=> replace with a space); if not, we matched the year (=> replace with an empty string). The four-digit chunks are only matched as whole words due to the \b word boundaries around \d{4}.
var re = /\B(?=[A-Z])|\s*\b\d{4}\b/g;
var str = 'WonderWorld 2016';
var result = str.replace(re, function(match) {
return match ? "" : " ";
});
document.body.innerHTML = "<pre>'" + result + "'</pre>";
A similar approach, just a different pattern for matching glued words (might turn out more reliable):
var re = /([a-z])(?=[A-Z])|\s*\b\d{4}\b/g;
var str = 'WonderWorld 2016';
var result = str.replace(re, function(match, group1) {
return group1 ? group1 + " " : "";
});
document.body.innerHTML = "<pre>'" + result + "'</pre>";
Here, ([a-z])(?=[A-Z]) matches and captures into Group 1 a lowercase letter that is followed by an uppercase one, and inside the callback we check whether Group 1 matched (with group1 ?). If it matched, we return group1 plus a space. If not, we matched the year, and remove it.
Try this:
"WonderWorld 2016".replace(/([A-Z])|\b[0-9]{4}\b/g, ' $1').trim()
How about this, a single regex to do what you want:
"WonderWorld 2016".replace(/([A-Z][a-z]+)([A-Z].*)\s.*/g, '$1 $2');
"Wonder World"
This keeps everything apart from the trailing space and digits.
A rework of @Wiktor Stribiżew's solution:
str can be any of "WonderWorld 2016" | "OneTwo 1000 ThreeFour" | "Ruby 1999 IamOnline"
str.replace(/([a-z])(?=[A-Z])|\s*\d{4}\b/g, function(m, g) {
return g ? g + " " : "";
});
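To make the behaviour on those three sample strings concrete, here is a small runnable sketch. The wrapper name splitWordsDropYears is hypothetical, and the pattern uses the \b-delimited form of the year match from the earlier answer.

```javascript
// Hypothetical wrapper around the replace call above.
function splitWordsDropYears(str) {
  return str.replace(/([a-z])(?=[A-Z])|\s*\b\d{4}\b/g, function (m, g) {
    // g is set when a lowercase letter before an uppercase one matched;
    // otherwise the match is a 4-digit number with its leading whitespace.
    return g ? g + ' ' : '';
  });
}

console.log(splitWordsDropYears('WonderWorld 2016'));      // Wonder World
console.log(splitWordsDropYears('OneTwo 1000 ThreeFour')); // One Two Three Four
console.log(splitWordsDropYears('Ruby 1999 IamOnline'));   // Ruby Iam Online
```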
import re
remove_year_regex = re.compile(r"[0-9]{4}")
We have a string:
var dynamicString = "This isn't so dynamic, but it will be in real life.";
User types in some input:
var userInput = "REAL";
I want to match on this input, and wrap it with a span to highlight it:
var result = " ... but it will be in <span class='highlight'>real</span> life.";
So I use some RegExp magic to do that:
// Escapes user input,
var searchString = userInput.replace(/[\-\[\]\/\{\}\(\)\*\+\?\.\\\^\$\|]/g, "\\$&");
// Now we make a regex that matches *all* instances
// and (most important point) is case-insensitive.
var searchRegex = new RegExp(searchString , 'ig');
// Now we highlight the matches on the dynamic string:
dynamicString = dynamicString.replace(searchRegex, '<span class="highlight">' + userInput + '</span>');
This is all great, except here is the result:
console.log(dynamicString);
// -> " ... but it will be in <span class='highlight'>REAL</span> life.";
I replaced the content with the user's input, which means the text now gets the user's dirty case-insensitivity.
How do I wrap all matches with the span shown above, while maintaining the original value of the matches?
To be clear, the ideal result would be:
// user inputs 'REAL',
// We get:
console.log(dynamicString);
// -> " ... but it will be in <span class='highlight'>real</span> life.";
You'd use a regex capturing group and a backreference to capture the match and insert it into the replacement string (built from the escaped searchString rather than the raw input):
var searchRegex = new RegExp('(' + searchString + ')', 'ig');
dynamicString = dynamicString.replace(searchRegex, '<span class="highlight">$1</span>');
You can do it without a capturing group too, since $& stands for the whole match:
dynamicString = dynamicString.replace(new RegExp(searchString, 'ig'), '<span class="highlight">$&</span>');
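Putting the pieces together, a minimal self-contained sketch might look like the following. The function names escapeRegExp and highlightMatches are hypothetical; the escaping step is the one from the question, and $& inserts the text exactly as it appears in the source string, so the original casing is kept.

```javascript
// Hypothetical helper: escape regex metacharacters in the user's input.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wrap every case-insensitive occurrence, keeping the original casing.
function highlightMatches(text, userInput) {
  if (!userInput) return text; // nothing to search for
  const re = new RegExp(escapeRegExp(userInput), 'ig');
  return text.replace(re, '<span class="highlight">$&</span>');
}

const dynamicString = "This isn't so dynamic, but it will be in real life.";
console.log(highlightMatches(dynamicString, 'REAL'));
// Metacharacters in the input are matched literally instead of breaking the regex:
console.log(highlightMatches('a.b axb', 'a.b'));
```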
I'm trying to take a chunk of plain text and convert parts of it into html tags. I don't need a full rich editor, just these few tags:
**bold**
__underline__
~~italics~~
--strike--
<<http://www.link.com>>
This is the method I have attempted to write but my lack of regex/js seems to be holding it back:
function toMarkup($this) {
var text = $this.text();
text = text.replace("\*\*(.*)\*\*", "<b>$1</b>");
text = text.replace("__(.*)__", "<u>$1</u>");
text = text.replace("~~(.*)~~", "<i>$1</i>");
text = text.replace("--(.*)--", "<del>$1</del>");
text = text.replace("<<(.*)>>", "<a href='$1'>Link</a>");
$this.html(text);
}
Any glaring errors as to why these replaces are not working? Another issue I'm just now realizing is by converting this text to html I am unescaping any other potential tags that may be malicious. A bonus would be any advice on how to only escape these elements and nothing else.
First of all, those patterns are just strings, not regexes. Secondly, you should use the non-greedy .*? instead of .*.
Also, you may want to use the g modifier to match every occurrence in the text.
function toMarkup($this) {
var text = $this.text();
text = text.replace(/\*\*(.*?)\*\*/g, "<b>$1</b>");
text = text.replace(/__(.*?)__/g, "<u>$1</u>");
text = text.replace(/~~(.*?)~~/g, "<i>$1</i>");
text = text.replace(/--(.*?)--/g, "<del>$1</del>");
text = text.replace(/<<(.*?)>>/g, "<a href='$1'>Link</a>");
$this.html(text);
}
Use a Regexp object as the first argument to text.replace() instead of a string:
function toMarkup($this) {
var text = $this.text();
text = text.replace(/\*\*(.*?)\*\*/g, "<b>$1</b>");
text = text.replace(/__(.*?)__/g, "<u>$1</u>");
text = text.replace(/~~(.*?)~~/g, "<i>$1</i>");
text = text.replace(/--(.*?)--/g, "<del>$1</del>");
text = text.replace(/<<(.*?)>>/g, "<a href='$1'>Link</a>");
$this.html(text);
}
Note that I also replaced all of the .* with .*? which will match as few characters as possible, otherwise your matches may be too long. For example you would match from the first ** to the very last ** instead of stopping at the next one. The regex also needs the g flag so that all matches will be replaced (thanks Aaron).
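The three points above (string vs. regex, greedy vs. lazy, and the g flag) can be seen on a short made-up sample; the input string is an assumption for illustration.

```javascript
const sample = '**one** and **two**';

// A string pattern is searched for literally, so nothing is replaced here.
const asString = sample.replace("\*\*(.*)\*\*", '<b>$1</b>');

// Greedy .* runs from the first ** to the very last **.
const greedy = sample.replace(/\*\*(.*)\*\*/g, '<b>$1</b>');

// Lazy .*? stops at the next **, and g replaces every occurrence.
const lazy = sample.replace(/\*\*(.*?)\*\*/g, '<b>$1</b>');

// Lazy but without g: only the first occurrence is replaced.
const lazyNoG = sample.replace(/\*\*(.*?)\*\*/, '<b>$1</b>');

console.log(asString); // **one** and **two**
console.log(greedy);   // <b>one** and **two</b>
console.log(lazy);     // <b>one</b> and <b>two</b>
console.log(lazyNoG);  // <b>one</b> and **two**
```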
function toMarkup($this) {
  $this.html($this.text().replace(/(__|~~|--|\*\*)(.*?)\1|<<(.*?)>>/g,
    function (m, m1, m2, m3) {
      // m1: delimiter, m2: delimited text, m3: URL when the <<...>> alternative matched
      var tag = {'**': 'b', '__': 'u', '--': 'del', '~~': 'i'}[m1];
      return m3 ? "<a href='" + m3 + "'>Link</a>"
                : '<' + tag + '>' + m2 + '</' + tag + '>';
    }));
}
Note that you cannot nest these, e.g. __--abc--__ will be converted to <u>--abc--</u>.
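On the bonus part of the question (keeping other, possibly malicious, tags escaped), one possible approach is to escape the whole text first and only then turn the markers into tags. This is a rough sketch, not a complete sanitizer: escapeHtml and toMarkupSafe are hypothetical names, the function works on a plain string rather than a jQuery object, and it does not validate the link URL (a javascript: URL would still get through, so real code would need to check the scheme).

```javascript
// Hypothetical helper: escape the characters that are special in HTML text and attributes.
function escapeHtml(s) {
  const map = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return s.replace(/[&<>"']/g, c => map[c]);
}

// Escape everything, then convert only the known markers into tags.
function toMarkupSafe(text) {
  return escapeHtml(text)
    .replace(/\*\*(.*?)\*\*/g, '<b>$1</b>')
    .replace(/__(.*?)__/g, '<u>$1</u>')
    .replace(/~~(.*?)~~/g, '<i>$1</i>')
    .replace(/--(.*?)--/g, '<del>$1</del>')
    // After escaping, << and >> appear as &lt;&lt; and &gt;&gt;.
    .replace(/&lt;&lt;(.*?)&gt;&gt;/g, '<a href="$1">Link</a>');
}

console.log(toMarkupSafe('**bold** <img src=x onerror=alert(1)>'));
console.log(toMarkupSafe('<<http://www.link.com>>'));
```

With jQuery, the result could then be passed to $this.html(...) in place of the unescaped text.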