Regular expressions: grab part of string - javascript

So I got the following input inside my textarea element:
<quote>hey</quote>
what's up?
I want to extract the text between <quote> and </quote> (so the result would be 'hey' and nothing else in this case).
I tried with .replace and the following regular expression, but it did not achieve the right result and I can't see why:
quoteContent = value.replace(/<quote>|<\/quote>.*/gi, ''); (the result is 'hey what's up'; it doesn't remove the last part, in this case 'what's up', it only removes the quote tags)
Does someone know how to solve this?

Even if it's only a small html snippet, don't use regex to do any html parsing. Instead, take the value, use DOM methods and extract the text from an element. A bit more code, but the better and safer way to do that:
const el = document.getElementById('foo');
const tmp = document.createElement('template');
tmp.innerHTML = el.value;
console.log(tmp.content.querySelector('quote').innerText);
<textarea id="foo">
<quote>hey</quote>
what's up?
</textarea>

You could also try using the match method:
quoteContent = value.match(/<quote>(.+)<\/quote>/)[1];
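A caveat with the one-liner above: match() returns null when nothing matches, so indexing with [1] throws on input that has no quote, and .+ is greedy and does not cross line breaks. Below is a minimal sketch of a more defensive variant. The helper name extractQuotes is made up for illustration, and it assumes the tags are never nested.

```javascript
// Hypothetical helper: returns the text of every <quote>...</quote> pair in `input`.
function extractQuotes(input) {
  // [\s\S]*? matches any character (including newlines) lazily,
  // so each match ends at the nearest closing tag.
  const pattern = /<quote>([\s\S]*?)<\/quote>/gi;
  return Array.from(input.matchAll(pattern), (m) => m[1]);
}

const textareaValue = "<quote>hey</quote>\nwhat's up?";
const quotes = extractQuotes(textareaValue);
console.log(quotes); // an empty array means no quote was found
```

Returning an array avoids the null check at the call site; take quotes[0] if only the first quote matters.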

You should try to avoid parsing HTML using regular expressions.
<quote><!-- parsing HTML is hard when </quote> can appear in a comment -->hey</quote>
You can just use the DOM to do it for you.
// Parse your fragment
let doc = new DOMParser().parseFromString(
'<quote>hey</quote>\nWhat\'s up?', 'text/html')
// Use DOM lookup to find a <quote> element and get its
// text content.
let { textContent } = doc.getElementsByTagName('quote')[0]
// We get plain text and don't need to worry about "<"s
textContent === 'hey'

The dot . will not match new lines.
Try this:
//(.|\n)* will match anything OR a line break
quoteContent = value.replace(/<quote>|<\/quote>(.|\n)*/gi, '');
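As an alternative to (.|\n)*, engines that support the s (dotAll) flag from ES2018 let the dot match line breaks directly. The short comparison below is only a sketch; the variable names are illustrative.

```javascript
const input = "<quote>hey</quote>\nwhat's up?";

// Without the s flag, .* stops at the line break, so the second line survives.
const withoutDotAll = input.replace(/<quote>|<\/quote>.*/gi, '');

// With the s flag, . also matches "\n", so everything after </quote> is removed.
const withDotAll = input.replace(/<quote>|<\/quote>.*/gis, '');

console.log(JSON.stringify(withoutDotAll)); // "hey\nwhat's up?"
console.log(JSON.stringify(withDotAll));    // "hey"
```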

Related

Javascript - regex replace string [duplicate]

I want to find and replace text in a HTML document between, say inside the <title> tags. For example,
var str = "<html><head><title>Just a title</title></head><body>Do nothing</body></html>";
var newTitle = "Updated title information";
I tried using parseXML() in jQuery (example below), but it is not working:
var doc= $($.parseXML(str));
doc.find('title').text(newTitle);
str=doc.text();
Is there a different way to find and replace text inside HTML tags? Regex or may be using replaceWith() or something similar?
I did something similar in a question earlier today using regexes:
str = str.replace(/<title>[\s\S]*?<\/title>/, '<title>' + newTitle + '<\/title>');
That should find and replace it. [\s\S]*? means "any character, including spaces and line breaks, any number of times", and the ? makes the asterisk "not greedy," so it stops at the first </title> it finds rather than the last one.
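One thing the concatenated replacement string does not cover: if newTitle contains sequences such as $& or $1, replace() expands them. A small sketch of the same regex wrapped in a function, using a replacer callback so the title is inserted literally (replaceTitle is a made-up name, and the title is not HTML-escaped here):

```javascript
// Hypothetical helper: swaps the text between the first <title>...</title> pair.
function replaceTitle(html, newTitle) {
  // A replacer function is used so that sequences such as "$&" or "$1"
  // inside newTitle are inserted literally instead of being expanded.
  return html.replace(
    /<title>[\s\S]*?<\/title>/i,
    () => '<title>' + newTitle + '</title>'
  );
}

const page = '<html><head><title>Just a title</title></head><body>Do nothing</body></html>';
console.log(replaceTitle(page, 'Updated title information'));
```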
You can also do something like this:
var doc = $($.parseXML(str));
doc.find('title').text(newTitle);
// get your new data back to a string
str = (new XMLSerializer()).serializeToString(doc[0]);
Here is a fiddle: http://jsfiddle.net/Z89dL/1/
This would be a good time for something like PHP's stristr(haystack, needle, bool). JavaScript has no built-in function of that name, so you would have to write a small helper yourself (for example with toLowerCase(), indexOf() and slice()). First, you need to get the contents of the document's head, either with $('head').html() or with document.head.innerHTML.
For the sake of the answer, let's store that string in a var called head. First, let's get everything before the title with stristr(head, '<title>', true), and everything from the closing tag onwards with stristr(head, '</title>'), and store them in vars called before and after, respectively. Now, the final line is simple:
document.head.innerHTML = before + "<title>" + newTitle + after;
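Since stristr is not part of JavaScript, here is a rough sketch of what such a helper could look like, modelled loosely on the PHP function. It returns null instead of PHP's false when the needle is missing, and it assumes that lowercasing does not change the string length, which holds for ASCII markup. The sample head string is made up.

```javascript
// Hypothetical JavaScript stand-in for PHP's stristr().
function stristr(haystack, needle, beforeNeedle = false) {
  // Case-insensitive search by comparing lowercased copies.
  const index = haystack.toLowerCase().indexOf(needle.toLowerCase());
  if (index === -1) return null;
  // Either the part before the needle, or the needle and everything after it.
  return beforeNeedle ? haystack.slice(0, index) : haystack.slice(index);
}

const head = '<meta charset="utf-8"><TITLE>Just a title</title>';
const before = stristr(head, '<title>', true);
const after = stristr(head, '</title>');
const updated = before + '<title>' + 'Updated title information' + after;
console.log(updated);
```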

how to force jquery attr() to add the attribute with single quotes

I create an in memory div:
var video_div = document.createElement('div');
video_div.className = "vidinfo-inline";
In essence I have some variables:
var key = "data-video-srcs";
var value = '{"video1":"http://www.youtube.com/watch?v=KdxEAt91D7k&list=TLhaPoOja-0f4","video2":"http://www.youtube.com/watch?v=dVlaZfLlWQc&list=TLalXwg9bTOmo"}';
And I use jquery to add that data attribute to the div:
$(video_div).attr(key, value);
Here is my problem. After doing that I get this:
<div class="vidinfo-inline" data-video-srcs="{"video1":"http://www.youtube.com/watch?v=KdxEAt91D7k&list=TLhaPoOja-0f4","video2":"http://www.youtube.com/watch?v=dVlaZfLlWQc&list=TLalXwg9bTOmo"}"></div>
And that doesn't work putting that json in there. It has to be in single quotes. It has to look like this:
<div class="vidinfo-inline" data-video-srcs='{"video1":"http://www.youtube.com/watch?v=KdxEAt91D7k&list=TLhaPoOja-0f4","video2":"http://www.youtube.com/watch?v=dVlaZfLlWQc&list=TLalXwg9bTOmo"}'></div>
As later on I do something like this:
var video_srcs = $('.vidinfo-inline').data('video-srcs');
And that won't work unless the json is in single quotes.
Does anyone have any ideas?
EDIT:
According to jquery: http://api.jquery.com/data/#data-html5
When the data attribute is an object (starts with '{') or array (starts with '[') then jQuery.parseJSON is used to parse the string; it must follow valid JSON syntax including quoted property names. If the value isn't parseable as a JavaScript value, it is left as a string.
Thus I can't escape the double quotes, it has to be inside single quotes. I have a work around and I'll post that as an answer unless someone else has a better answer.
I have a workaround. And if anyone has a better solution, I'd love to see it.
I wrote a replace method:
var fixJson = function(str) {
return String(str)
.replace(/"{/g, "'{")
.replace(/}"/g, "}'");
};
So basically I send the html into this function and insert it into the DOM.
For example:
var html = htmlUnescape($('#temp_container').html());
html = fixJson(html);
I realize that has some code smell to it. I mean, going through everything on that element just to fix the double quotes to single quotes stinks. But for lack of other options or ideas, it works. :\
Replace the double quotes with HTML entities:
var value = '{"video1":"http://www.youtube.com/watch?v=KdxEAt91D7k&list=TLhaPoOja-0f4","video2":"http://www.youtube.com/watch?v=dVlaZfLlWQc&list=TLalXwg9bTOmo"}';
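// Naive approach (regexes with the g flag, so every occurrence is replaced):
value = value.replace(/&/g, '&amp;').replace(/"/g, '&quot;');
// Using jQuery (note: this escapes &, < and >, but leaves double quotes as they are):
var $tmp = jQuery('<div></div>');
value = $tmp.text(value).html();
// Then store it as normal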

Is there a way to convert HTML into normal text without actually write it to a selector with Jquery?

My understanding so far is that in jQuery, the html() function converts HTML into text. For example,
$("#myDiv").html(result);
converts "result" (which is the html code) into normal text and display it in myDiv.
Now, my question is, is there a way I can simply convert the html and put it into a variable?
for example:
var temp;
temp = html(result);
Something like this. Of course this does not work, but how can I put the converted text into a variable without writing it to the screen? Since I'm checking the converted text in a loop, I thought it would be quite a waste of resources to keep writing it to the screen on every iteration.
Edit:
Sorry for the confusion. For example, if result is "<p>abc</p>", then $("#mydiv").html(result) makes mydiv display "abc", which "converts" the HTML into normal text by removing the <p> tags. So how can I put "abc" into a variable without doing something like var temp = $("#mydiv").text()?
Here is no-jQuery solution:
function htmlToText(html) {
var temp = document.createElement('div');
temp.innerHTML = html;
return temp.textContent; // Or return temp.innerText if you need to return only visible text. It's slower.
}
Works great in IE ≥9.
No, the html method doesn't turn HTML code into text, it turns HTML code into DOM elements. The browser will parse the HTML code and create elements from it.
You don't have to put the HTML code into the page to have it parsed into elements, you can do that in an independent element:
var d = $('<div>').html(result);
Now you have a jQuery object that contains a div element that has the elements from the parsed HTML code as children. Or:
var d = $(result);
Now you have a jQuery object that contains the elements from the parsed HTML code.
You could simply strip all HTML tags:
var text = html.replace(/(<([^>]+)>)/g, "");
Why not use .text()
$("#myDiv").html($(result).text());
you can try:
var tmp = $("<div>").attr("style","display:none");
var html_text = tmp.html(result).text();
tmp.remove();
But modifying the string with a regular expression is simpler, because it doesn't involve DOM traversal.
You can convert the HTML to a text string with a regexp, as in user Crozin's answer.
P.S.
You may also like to replace <br> tags with newline characters first:
var text = html.replace(/<\s*br[^>]*>/gi, '\n')
.replace(/(<([^>]+)>)/g, "");
var temp = $(your_selector).html();
the variable temp is a string containing the HTML
$("#myDiv").html(result); is not formatting text into html code. You can use .html() to do a couple of things.
If you call $("#myDiv").html(); without passing any parameters to the html() function, then you are "GETTING" the HTML that is currently in that div element.
so you could say,
var whatsInThisDiv = $("#myDiv").html();
console.log(whatsInThisDiv); //will print whatever is nested inside of <div id="myDiv"></div>
if you pass in a parameter with your .html() call you will be setting the html to what is stored inside the variable or string you pass. For instance
var htmlToReplaceCurrent = '<div id="childOfmyDiv">Hi! Im a child.</div>';
$("#myDiv").html(htmlToReplaceCurrent);
That will leave your dom looking like this...
<div id="myDiv">
<div id="childOfmyDiv">Hi! Im a child.</div>
</div>
Easiest, safe solution: use DOMParser.
For more advanced usage, I suggest you try DOMPurify.
It's cross-browser (and supports Node.js), and only 19 KB gzipped.
Here is a fiddle I've created that converts HTML to text
const dirty = "Hello <script>in script<\/script> <b>world</b><p> Many other <br/>tags are stripped</p>";
const config = { ALLOWED_TAGS: [''], KEEP_CONTENT: true, USE_PROFILES: { html: true } };
// Clean HTML string and write into the div
const clean = DOMPurify.sanitize(dirty, config);
document.getElementById('sanitized').innerText = clean;
Input: Hello <script>in script<\/script> <b>world</b><p> Many other <br/>tags are stripped</p>
Output: Hello world Many other tags are stripped
Using the dom has several disadvantages. The one not mentioned in the other answers: Media will be loaded, causing network traffic.
I recommend using a regular expression to remove the tags after replacing certain tags like br, p, ol, ul, and headers into \n newlines.
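A minimal sketch of that regex-only approach is shown below. The function name and the list of block-level tags are assumptions chosen for illustration. It does not decode entities, it keeps the text inside script and style elements, and it can be confused by a > inside an attribute value, so it only suits simple, trusted markup.

```javascript
// Hypothetical helper: converts simple HTML to plain text without touching the DOM.
function htmlToPlainText(html) {
  return html
    // Line breaks and the end of common block-level elements become "\n".
    .replace(/<br\s*\/?>/gi, '\n')
    .replace(/<\/(p|div|li|ol|ul|h[1-6])\s*>/gi, '\n')
    // Every remaining tag is dropped.
    .replace(/<[^>]+>/g, '')
    // Collapse runs of blank lines and trim the ends.
    .replace(/\n{2,}/g, '\n')
    .trim();
}

console.log(htmlToPlainText('<h1>Title</h1><p>First<br>second</p><ul><li>item</li></ul>'));
```

Because nothing is parsed into elements, no images or other media are requested while converting.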

Regex for visible text, not HTML

If i had a string:
hey user, what are you doing?
How, with regex could I say: look for user, but not inside of < or > characters? So the match would grab the user between the <a></a> but not the one inside of the href
I'd like this to work for any tag, so it wont matter what tags.
== Update ==
The reason I can't use .text() or innerText is that this is being used to highlight results, much like the native cmd/ctrl+f functionality in browsers, and I don't want to lose formatting. For example, if I search for strong here:
Some <strong>strong</strong> text.
If I use .text() it'll return "Some strong text", and then I'll wrap strong with a <span> which has a class for styling, but when I go back and try to insert this into the DOM it'll be missing the <strong> tags.
If you plan to replace the HTML using html() again, then you will lose all event handlers that might be bound to inner elements, as well as their data (as I said in my comment).
Whenever you set the content of an element as HTML string, you are creating new elements.
It might be better to recursively apply this function to every text node only. Something like:
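$.fn.highlight = function(word) {
    // Escape regex metacharacters so the word is matched literally
    var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'),
        pattern = new RegExp(escaped, 'g'),
        repl = '<span class="high">' + word + '</span>';
    this.each(function() {
        $(this).contents().each(function() {
            if (this.nodeType === 3) {
                // Only text nodes are rewritten, so tags and attributes stay untouched
                pattern.lastIndex = 0; // test() on a global regex is stateful
                if (pattern.test(this.nodeValue)) {
                    $(this).replaceWith(this.nodeValue.replace(pattern, repl));
                }
            }
            else if (!$(this).hasClass('high')) {
                // Recurse into child elements that are not highlights themselves
                $(this).highlight(word);
            }
        });
    });
    return this;
};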
It could very well be that this is not very efficient though.
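The string-level part of that idea, turning the value of one text node into highlighted HTML, can be sketched without jQuery. The function names here are made up for illustration. The sketch also HTML-escapes the text, because a text node containing a literal < would otherwise be parsed as markup when the result is inserted.

```javascript
// Escapes regex metacharacters so the search word is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Escapes the characters that are significant in HTML text content.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Hypothetical helper: takes the plain text of one text node and
// returns HTML in which every occurrence of `word` is wrapped in a span.
function highlightTextNodeValue(text, word) {
  if (word === '') return escapeHtml(text);
  const pattern = new RegExp('(' + escapeRegExp(word) + ')', 'g');
  // split() with a capturing group keeps the matches at odd indexes.
  return text
    .split(pattern)
    .map((part, i) =>
      i % 2 === 1 ? '<span class="high">' + escapeHtml(part) + '</span>' : escapeHtml(part)
    )
    .join('');
}

console.log(highlightTextNodeValue('1 < 2 is strong (really)', 'strong'));
```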
To emulate Ctrl-F (which I assume is what you're doing), you can use window.find for Firefox, Chrome, and Safari and TextRange.findText for IE.
You should use a feature detect to choose which method you use:
function highlightText(str) {
if (window.find)
window.find(str);
else if (window.TextRange && window.TextRange.prototype.findText) {
var bodyRange = document.body.createTextRange();
bodyRange.findText(str);
bodyRange.select();
}
}
Then, after the text is selected, you can style the selection with CSS using the ::selection selector.
Edit: To search within a certain DOM object, you could use a roundabout method: use window.find and see whether the selection is in a certain element. (Perhaps say s = window.getSelection().anchorNode and compare s.parentNode == obj, s.parentNode.parentNode == obj, etc.). If it's not in the correct element, repeat the process. IE is a lot easier: instead of document.body.createTextRange(), you can use obj.createTextRange().
$("body > *").each(function (index, element) {
var parts = $(element).text().split("needle");
if (parts.length > 1)
$(element).html(parts.join('<span class="highlight">needle</span>'));
});
at this point it's evolving to be more and more like Felix's, so I think he's got the winner
original:
If you're doing this in javascript, you already have a handy parsed version of the web page in the DOM.
// gives "user"
alert(document.getElementById('user').innerHTML);
or with jQuery you can do lots of nice shortcuts:
alert($('#user').html()); // same as above
$("a").each(function (index, element) {
alert(element.innerHTML); // shows label text of every link in page
});
I like regexes, but because tags can be nested, you will have to use a parser. I recommend http://simplehtmldom.sourceforge.net/ it is really powerful and easy to use. If you have wellformed xhtml you can also use SimpleXML from php.
edit: Didn't see the javascript tag.
Try this:
/[(<.+>)(^<)]*user[(^>)(<.*>)]/
It means:
Before the keyword, you can have as many <...> or non-<.
Samewise after it.
EDIT:
The correct one would be:
/((<.+>)|(^<))*user((^>)|(<.*>))*/
Here is what works, I tried it on your JS Bin:
var s = 'hey user, what are you doing?';
s = s.replace(/(<[^>]*)user([^>]*>)/g,'$1NEVER_WRITE_THAT_ANYWHERE_ELSE$2');
s = s.replace(/user/g,'Mr Smith');
s = s.replace(/NEVER_WRITE_THAT_ANYWHERE_ELSE/g,'user');
document.body.innerHTML = s;
It may be a tiny little bit complicated, but it works!
Explanation:
1. Replace the "user" that is inside the tag (which is easy to find) with a placeholder string of your choice that must never appear anywhere else in the text. A good choice would be its hash (md5, sha-1, ...).
2. Replace every remaining occurrence of "user" with the text you want.
3. Replace your unique string with "user" again.
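A variation that avoids the placeholder altogether is to split the string on tags and only touch the pieces in between. The sketch below assumes well-formed tags with no > inside attribute values and a non-empty search string. The function name and the sample string are made up for illustration.

```javascript
// Hypothetical helper: replaces `search` only in the text between tags.
function replaceOutsideTags(html, search, replacement) {
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes hold text, odd indexes hold tags.
  return html
    .split(/(<[^>]*>)/)
    .map((part, i) => (i % 2 === 1 ? part : part.split(search).join(replacement)))
    .join('');
}

const sample = '<a href="/user">user</a> says hi to user';
console.log(replaceOutsideTags(sample, 'user', 'Mr Smith'));
// <a href="/user">Mr Smith</a> says hi to Mr Smith
```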
This code will strip all tags from the string:
var s = 'hey user, what are you doing?';
s = s.replace(/<[^<>]+>/g,'');
