I have a short but complex regular expression to trim spaces regardless of html tags present in the string.
var text = "<span><span>ex ample </span> </span>";
// trim from start; not relevant in this example
text = text.replace(/^((<[^>]*>)*)\s+/g, "$1");
// trim from end
text = text.replace(/\s+((<[^>]*>)*)$/g, "$1");
console.log(text);
<span><span>ex ample </span> </span> - example input
<span><span>ex ample</span></span> - expected output
<span><span>ex ample </span></span> - observed output
How do I achieve my expected output?
I've tried adding the /g flag because it should supposedly match more than once and that should fix it (running the replace twice does work for the example) but it doesn't seem to repeat anything at all.
Alternative ways to trim strings regardless of tags are also appreciated because that is my primary objective. The secondary objective is learning why this didn't work.
You need to add some meaning to your tags, some need their spaces, some don't.
Try this:
text.replace(/\s*(<\/?(span|div)>)\s*/g, "$1")
.trim()
.replace(/\s+/g, ' ');
It:
1. removes spaces around tags "surrounding" content
2. trims spaces around the whole string
3. collapses redundant spaces
The list of "surrounding" tags can be changed to include things like tr...
Steps 2 and 3 might come first to speed things up.
Tried it with:
var text = "<div> <i>ano</i> <b>ther</b> <span> <b>my</b> <i>ex</i> <u> ample </u> </span> </div>";
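As a rough sketch, the three steps can be wrapped in a function that takes the list of "surrounding" tags as a parameter. The name trimAroundTags and the tags argument are made up for illustration, and the tag names are assumed to be plain words without attributes or regex metacharacters:

```javascript
// Hypothetical helper: `tags` lists the tag names whose neighbouring spaces are dropped.
function trimAroundTags(text, tags) {
  // Builds the same pattern as above, e.g. /\s*(<\/?(?:span|div)>)\s*/g
  var pattern = new RegExp("\\s*(<\\/?(?:" + tags.join("|") + ")>)\\s*", "g");
  return text
    .replace(pattern, "$1")   // 1. remove spaces around the listed tags
    .trim()                   // 2. trim the whole string
    .replace(/\s+/g, " ");    // 3. collapse runs of whitespace
}

var sample = "<div> <i>ano</i> <b>ther</b> <span> <b>my</b> <i>ex</i> <u> ample </u> </span> </div>";
console.log(trimAroundTags(sample, ["span", "div"]));
// <div><i>ano</i> <b>ther</b><span><b>my</b> <i>ex</i> <u> ample </u></span></div>
```

The spaces next to div and span are gone, while those around i, b and u are kept because those tags are not in the list.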
First answer, prior to comments.
The idea is to remove all spaces between:
a non-space character and an opening tag
a closing tag and a non-space character
text.replace(/([^\s])\s*(<)/g, "$1$2")
.replace(/([>])\s*([^\s])/g, "$1$2")
.trim();
Preamble: don't just copy this, read to the end.
Thinking from the other way around - by replacing until no match is found instead of until no change is made, this seems to work very simply.
var text = "<span><span>ex ample </span> </span>";
var trim_start = /^((<[^>]*>)*)\s+/;
while(text.match(trim_start)) {
text = text.replace(trim_start, "$1");
}
var trim_end = /\s+((<[^>]*>)*)$/;
while (text.match(trim_end)) {
text = text.replace(trim_end, "$1");
}
console.log(text);
The output is as expected - the only space is between ex ample
But this has a big problem if the replace might not change anything. Simply changing \s+ to \s* turns it into an infinite loop. So, all in all, it works for my case but is not robust: to use it, you must be completely sure that every replace changes something whenever the regex matches.
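As for why /g did not help: replace scans the string once, left to right, and never re-examines text it has already produced, and since both patterns are anchored they can succeed at most once per call. A possible way around the infinite-loop risk is to repeat until the string stops changing rather than until the regex stops matching. This is only a sketch; replaceUntilStable and trimIgnoringTags are names invented here:

```javascript
// Hypothetical helper: reapply one replacement until the result no longer changes.
function replaceUntilStable(text, pattern, replacement) {
  var previous;
  do {
    previous = text;
    text = text.replace(pattern, replacement);
  } while (text !== previous);
  return text;
}

function trimIgnoringTags(text) {
  // \s* is safe here: a match that changes nothing ends the loop.
  text = replaceUntilStable(text, /^((<[^>]*>)*)\s*/, "$1");
  return replaceUntilStable(text, /\s*((<[^>]*>)*)$/, "$1");
}

console.log(trimIgnoringTags("<span><span>ex ample </span> </span>"));
// <span><span>ex ample</span></span>
```

The loop costs one extra pass at the end, but it terminates even when a pattern matches without removing anything.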
Related
I have the lyrics in the format like:
[00:26.8]Lo [00:27.0]rem[00:27.2] Ipsum[00:27.4] sam[00:27.6]ple [00:27.9]text[00:28.1] to [00:28.5]
[00:28.51]demonstrate[00:28.7] the[00:28.9] lyrics[00:29.1] text
I use the following regex to match time tags ([hh:mm.ss]):
/\[\d{1,2}:\d{1,2}\.\d{0,2}\]/ig
But how can I find and delete the last time tag ([00:29.1] in the example above)? I understand that I can match all occurrences, take the last one, find the position of the corresponding tag within the text (using lastIndexOf), then delete the tag. But is there a better way to achieve it?
Upd. There is one more condition - if the time tag is at the beginning of the line, then it shouldn't be removed. I.e. in case of the lyrics:
[00:26.8]Lo [00:27.0]rem[00:27.2] Ipsum[00:27.4] sam[00:27.6]ple [00:27.9]text[00:28.1] to [00:28.5]
[00:28.51]demonstrate
The tag found and deleted should be [00:28.5], not [00:28.51].
Add a negative lookahead assertion to ensure that no other [..] follows the matched string:
/\[\d{1,2}:\d{1,2}\.\d{0,2}\](?!(.|\n)*\[)/
Regex Demo
Also can do this without lookahead by adding a greedy [^]* or [\s\S]* before to eat up.
var str = str.replace(/^([^]*)(\[\d{1,2}:\d{1,2}\.\d{0,2}\])/, "$1");
Replace with captured first part. See fiddle
Regarding the update: add a [^\n] before the tag, so that a tag at the beginning of a line is skipped:
var str = str.replace(/^([\s\S]*[^\n])(\[\d{1,2}:\d{1,2}\.\d{0,2}\])/, "$1");
See fiddle
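To see the greedy-prefix variant at work on the two-line lyrics from the update, here is a small sketch. The function name removeLastTimeTag is made up for illustration; the regex is the one given just above:

```javascript
// Hypothetical wrapper around the greedy-prefix regex.
function removeLastTimeTag(lyrics) {
  // [\s\S]* swallows as much as possible, so the tag captured last is the
  // final one that is preceded by a character other than a newline.
  return lyrics.replace(/^([\s\S]*[^\n])(\[\d{1,2}:\d{1,2}\.\d{0,2}\])/, "$1");
}

var lyrics =
  "[00:26.8]Lo [00:27.0]rem[00:27.2] Ipsum[00:27.4] sam[00:27.6]ple [00:27.9]text[00:28.1] to [00:28.5]\n" +
  "[00:28.51]demonstrate";

console.log(removeLastTimeTag(lyrics));
// The first line now ends in "to " and [00:28.51] at the start of line two is untouched.
```

Because [^\n] needs a character to consume, a tag at the very start of the whole string is never removed either.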
The following should do. It makes sure the wanted [hh:mm.ss] isn't followed by any other opening or closing bracket up to the end of the string, which means that it's the last one you're looking for:
(\[\d{1,2}:\d{1,2}\.\d{0,2}\])(?=[^\]\[]*$)
DEMO
I need help with a Regex in JavaScript (for a Photoshop script) to match bold tags around words in a string. (not worried about italic or bolditalic at this time).
I don't want to split the string at this stage, I just want to chop it up into alternating chunks using match.
// Be <b>bold!</b> Be fabulous!
Should get match to // ("Be ", "bold!", "Be fabulous!") // line commented for obvious reasons
After that, I'll remove the bold tags - unless Regex can do that in one pass - don't underestimate its power!
This is what I have so far
/(.*?)([<b>]+[\S]+[<\/b>]+[\s]+)+(.*)/g
Only it doesn't match everything as seen here
Just for the record, before anyone suggests a much easier JS solution:
In the Photoshop DOM you can't script regular text mixed with bold. You probably can with Action Manager code, but with generating text that could be a big headache.
To get around this (not an ideal solution) I'll be using regular text & splitting it up at the appropriate places & swapping to bold.
[<b>] is a character class; simply use <b> instead.
/(.*?)(<b>+\S+<\/b>+\s+)+(.*)/g
and change \S to [^<]
/(.*?)(<b>+[^<]+<\/b>+\s+)+(.*)/g
You can try:
<b>(.*?)<\/b>
Here is online demo
sample code:
var re = /<b>(.*?)<\/b>/gi;
var str = 'Be <b>bold!</b> Be fabulous! ';
var subst = '$1';
var result = str.replace(re, subst);
output:
Be bold! Be fabulous!
Better try with String.split() function:
var re = /\s*<\/?b>\s*/gi;
var str = 'Be <b>bold!</b> Be fabulous!';
console.log(str.split(re));
output:
["Be", "bold!", "Be fabulous!"]
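If you also need to know which chunks were bold, one option is to put a capturing group in the split pattern: in standard JavaScript, split then keeps the captured text in the result, so the bold chunks land at the odd indexes. This is a sketch only. splitBold and the { text, bold } shape are invented for illustration, and whether Photoshop's script engine treats capture groups in split the same way is an assumption that should be checked:

```javascript
// Hypothetical helper: returns chunks in order, flagged as bold or not.
// Written with a plain for loop to avoid relying on newer array methods.
function splitBold(str) {
  var parts = str.split(/<b>(.*?)<\/b>/i); // captured text sits at odd indexes
  var chunks = [];
  for (var i = 0; i < parts.length; i++) {
    if (parts[i] !== "") {
      chunks.push({ text: parts[i], bold: i % 2 === 1 });
    }
  }
  return chunks;
}

var chunks = splitBold("Be <b>bold!</b> Be fabulous!");
for (var j = 0; j < chunks.length; j++) {
  console.log(JSON.stringify(chunks[j].text) + (chunks[j].bold ? " (bold)" : ""));
}
// "Be "
// "bold!" (bold)
// " Be fabulous!"
```

Unlike the split above, this keeps the surrounding spaces, which matters if the chunks are later joined back together.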
I have made a javascript function to replace some words with other words in a text area, but it doesn't work. I have made this:
function wordCheck() {
var text = document.getElementById("eC").value;
var newText = text.replace(/hello/g, '<b>hello</b>');
document.getElementById("eC").innerText = newText;
}
When I alert the variable newText, the console says that the variable doesn't exist.
Can anyone help me?
Edit:
Now it replaces the words, but it replaces them with the literal text <b>hello</b>, whereas I want the word to appear in bold. Is there a solution?
Update:
In response to your edit, about wanting to see the word "hello" show up in bold: the short answer is that it can't be done. Not in a simple textarea, at least. You're probably looking for something more like an online WYSIWYG editor, or at least an RTE (Rich Text Editor). There are a couple of them out there, like tinyMCE, for example, which is a decent WYSIWYG editor. A list of RTEs and HTML editors can be found here.
First off: As others have already pointed out: a textarea element's contents is available through its value property, not the innerText. You get the contents alright, but you're trying to update it through the wrong property: use value in both cases.
If you want to replace all occurrences of a string/word/substring, you'll have to resort to using a regular expression, using the g modifier. I'd also recommend making the matching case-insensitive, to replace "hello", "Hello" and "HELLO" all the same:
var txtArea = document.querySelector('#eC');
txtArea.value = txtArea.value.replace(/(hello)/gi, '<b>$1</b>');
As you can see: I captured the match, and used it in the replacement string, to preserve the caps the user might have used.
But wait, there's more:
What if, for some reason, the input already contains <b>Hello</b>, or contains a word containing the string "hello" like "The company is called hellonearth?" Enter conditional matches (aka lookaround assertions) and word boundaries:
txtArea.value = txtArea.value.replace(/(?!>)\b(hello)\b(?!<)/gi, '<b>$1</b>');
fiddle
How it works:
(?!>): a negative look-ahead. It asserts that the next character is not >. Because a look-ahead inspects what follows the current position, it does not test the character before hello; excluding a preceding > would require a negative look-behind, (?<!>), which only ES2018+ engines support
\b: a word boundary, to make sure we're not matching part of a word
(hello): match and capture the string literal, provided there is a word boundary on both sides
(?!<): another negative look-ahead: the match must not be followed by <, which skips a hello that sits directly before a closing </b>. You can replace this with the more specific (?!<\/b>)
/gi: modifiers, or flags, that affect the entire pattern: g for global (meaning this pattern will be applied to the entire string, not just a single match). The i tells the regex engine the pattern is case-insensitive, ie: h matches both the upper and lowercase character.
The replacement string <b>$1</b>: when the replacement string contains $n substrings, where n is a number, they are treated as backreferences. A regex can group matches into various parts, each group has a number, starting with 1, depending on how many groups you have. We're only grouping one part of the pattern, but suppose we wrote:
'foobar hello foobar'.replace(/(hel)(lo)/g, '<b>$1-$2</b>');
The output would be "foobar <b>hel-lo</b> foobar", because we've split the match up into 2 parts, and added a dash in the replacement string.
I think I'll leave the introduction to RegExp at that... even though we've only scratched the surface, I think it's quite clear now just how powerful regex's can be. Put some time and effort into learning more about this fantastic tool, it is well worth it.
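To check the word-boundary and look-ahead behaviour on plain strings, here is a minimal sketch. The function name boldHello is made up, it uses the more specific (?!<\/b>) form, and it only produces markup as text; a textarea would still show that markup literally:

```javascript
// Hypothetical helper: wraps standalone "hello" (any case) in <b> tags,
// skipping occurrences that are directly followed by a closing </b>.
function boldHello(input) {
  return input.replace(/\b(hello)\b(?!<\/b>)/gi, "<b>$1</b>");
}

console.log(boldHello("Hello world, hellonearth, <b>hello</b>"));
// <b>Hello</b> world, hellonearth, <b>hello</b>
```

"Hello" keeps its capital because the replacement reuses the captured text, "hellonearth" is skipped by the trailing \b, and the already bold "hello" is skipped by the look-ahead.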
For a <textarea>, you need to use the .value property.
document.getElementById("eC").value = newText;
And, as Barmar mentioned, replace() with a string pattern replaces only the first occurrence. To replace every occurrence, you need to use a simple regex. Note that I removed the quotes around the pattern. /g means global replace.
var newText = text.replace(/hello/g, '<b>hello</b>');
But if you want to really bold your text, you need to use content editable div, not text area:
<div id="eC" contenteditable></div>
So then you need to access innerHTML:
function wordCheck() {
var text = document.getElementById("eC").innerHTML;
var newText = text.replace(/hello/g, '<b>hello</b>');
newText = newText.replace(/<b><b>/g,"<b>");//These two lines are there to prevent <b><b>hello</b></b>
newText = newText.replace(/<\/b><\/b>/g,"</b>");
document.getElementById("eC").innerHTML = newText;
}
I got a task to write a code highlighter for C#. Everything's pretty good, but I wish to optimize indentation. I have a regexp like /(\t|[ ]{4})/g, and I replace each tab or run of 4 spaces with <span style="margin-left: 2em;" />. It looks good, but it creates a lot of unnecessary spans. I want to use something like /^[ ]{x}/g and replace it with <span style='margin-left: "+(0.5*x)+"em;' /> to have only one span per line with the appropriate margin. str.match() won't work because it searches the whole document, not each line.
If your regular expression has the g flag, you can execute it over and over again, getting all matches from the string, including the length of the match:
var re = /^(\t|[ ]{4})+/gm; // m: ^ matches at the start of every line; +: take the whole indent
var match;
while ((match = re.exec(text)) !== null) {
// use match.index and match[0].length
}
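If the goal is simply one span per line, a replace with a callback may be enough, since the callback receives each line's leading indentation in turn. This is a sketch under the question's own assumptions (one level is a tab or 4 spaces and is worth 2em); indentToSpan is a made-up name:

```javascript
// Hypothetical helper: replaces each line's leading indentation with a single span.
function indentToSpan(code) {
  // The m flag makes ^ match at the start of every line, not just the string.
  return code.replace(/^(?:\t|[ ]{4})+/gm, function (indent) {
    // Normalise 4-space groups to tabs so the length equals the indent level.
    var levels = indent.replace(/[ ]{4}/g, "\t").length;
    return '<span style="margin-left: ' + (2 * levels) + 'em;" />';
  });
}

console.log(indentToSpan("\tfoo\n        bar\nbaz"));
// <span style="margin-left: 2em;" />foo
// <span style="margin-left: 4em;" />bar
// baz
```

Leftover spaces that do not make up a full group of 4 stay in the line as they are.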
I want to wrap every word of a string in a <span> tag, without breaking any existing html tags and without including any punctuation marks.
For example the following string:
This... is, an. example! <em>string</em>?!
should be wrapped as:
<span>This</span>... <span>is</span>, <span>an</span>. <span>example</span>!
<span><em>string</em></span>?!
Ideally, I just need to wrap the words and nothing else.
Except for apostrophes, they should be wrapped, too.
it's => <span>it's</span>
give 'em => <span>give</span> <span>'em</span>
teachers' => <span>teachers'</span>
Right now I'm using a very simple regular expression:
str.replace(/([^\s<>]+)(?:(?=\s)|$)/g, '<span>$1</span>');
I found it somewhere here on stackoverflow. But it only wraps every word on white spaces and wraps punctuation marks too, which is undesirable in my case.
I know I should be ashamed for being so lousy at regular expressions.
Can someone please help me?
Many thanks!
Try this regex:
var str = "This string... it's, an. example! <em>string</em>?!";
str.replace(/([A-Za-z0-9'<>/]+)/g, '<span>$1</span>');
// "<span>This</span> <span>string</span>... <span>it's</span>, <span>an</span>. <span>example</span>! <span><em>string</em></span>?!"
I played around and got this to work :
String toMarkUp = "Each word needs a strong tag around it. I really want to wrap each and every word";
String markedUp = toMarkUp.replaceAll("\\b(\\w+)\\b","<span>$1</span>");
The regex captures every word of one or more characters (\w+) surrounded by word boundaries, and the replacement refers back to it with $1, 1 being the first capture group in the regex.
Output :
<span>Each</span> <span>word</span> <span>needs</span> <span>a</span> <span>strong</span> <span>tag</span> <span>around</span> <span>it</span>. <span>I</span> <span>really</span> <span>want</span> <span>to</span> <span>wrap</span> <span>each</span> <span>and</span> <span>every</span> <span>word</span>