JavaScript: preserve HTML entities as code when serializing

I have the following HTML:
<p>This contains an HTML space entity &#160;.</p>
I need to serialize this HTML to text with the HTML entities kept in their existing code form (spaces added to prevent SO from rendering literal characters):
< p >This contains an HTML space entity & #160;.< / p >
When I serialize the HTML, the entities are rendered as characters instead of being kept in their code/text form:
new XMLSerializer().serializeToString(element)
I've looked into other methods of converting HTML code to text, including innerHTML, but I haven't found any direct way to output the HTML code as it exists without the browser modifying it.
I'm also open to replacing HTML entities with a createTreeWalker if need be though I'd prefer a more direct approach. No frameworks. Suggestions please?

Please see this SO answer: https://stackoverflow.com/a/3700369/3218479.
You can use code like:
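// Prepare an element whose text literally contains the entity code
var myEl = document.createElement("p");
myEl.innerText = "This contains an HTML space entity &#160;.";
// outerHTML escapes the ampersand (&amp;#160;); a textarea decodes it
// once, which gives back the markup as plain text
var textArea = document.createElement("textarea");
textArea.innerHTML = myEl.outerHTML;
var myElText = textArea.value;
// No cleanup needed: the detached textarea is garbage-collected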
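If the entity has already been parsed into a character, the DOM no longer knows how it was written, so one option is to re-encode after serializing. The following is a minimal sketch, assuming the string comes from XMLSerializer (which emits the literal character) and that it is acceptable to encode every non-ASCII character, not only those that were entities in the source. The name encodeNonAscii is hypothetical.

```javascript
// Hypothetical helper: re-encode every non-ASCII character of an already
// serialized string as a decimal numeric entity (U+00A0 becomes &#160;).
function encodeNonAscii(html) {
  // The "u" flag makes the regex work on code points, so a character
  // outside the Basic Multilingual Plane becomes a single entity.
  return html.replace(/[\u0080-\u{10FFFF}]/gu, function (ch) {
    return "&#" + ch.codePointAt(0) + ";";
  });
}

// In a browser the input would be new XMLSerializer().serializeToString(element).
var serialized = "<p>This contains an HTML space entity \u00A0.</p>";
console.log(encodeNonAscii(serialized));
```

Note that this cannot tell a character typed literally from one written as an entity; both end up encoded.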

Related

How to work around setting innerHTML causing escape sequences to expand?

I am trying to avoid a cross-site scripting vulnerability on my server. Before any user-inputted string is embedded within HTML or sent to client-side JavaScript code, it is escaped ('<' replaced with '&lt;', '&' replaced with '&amp;', etc.). When embedding into HTML this works mostly fine; the HTML code produced does not contain any HTML elements inside the user-provided string. However, when the client-side JavaScript inserts HTML into the document, the escape sequences get expanded back into their special characters, which can result in user-inputted tags appearing in the document HTML. Here's approximately what I'm doing in the client-side JavaScript:
// response_data received from XMLHttpRequest and parsed as JSON
var s = "";
for (var i = 0; i < response_data.length; ++i) {
s += "<p>";
s += response_data[i];
s += "</p>";
}
console.log(s);
elem.innerHTML = s;
Suppose the user inputted the string "abcde <script>alert("Hello!");</script>" earlier. Then response_data could be ["abcde &lt;script&gt;alert(&quot;Hello!&quot;);&lt;/script&gt;"]. The print to console shows s to be "<p>abcde &lt;script&gt;alert(&quot;Hello!&quot;);&lt;/script&gt;</p>". However, when I assign elem.innerHTML, I can see in Inspect Element that the inner HTML of the element is actually <p>abcde <script>alert("Hello!");</script></p>! I don't think it executed, probably because of some browser security features regarding script tags within p tags, but it's obviously not very good. How do I work around this?
Code snippet (run and inspect element over the text created, it shows a script tag within the p tag):
var div_elem = document.querySelector("div");
div_elem.innerHTML = "<p>&lt;script&gt;alert(&quot;Hello!&quot;);&lt;/script&gt;</p>";
<html>
<head></head>
<body>
<div></div>
</body>
</html>
Use innerText. It works like innerHTML, but the value is treated as plain text, so HTML entities are not decoded.
Edit:
Set innerHTML to the p tags, then set the actual text using innerText on the p element:
elem.innerHTML = "<p></p>";
elem.childNodes[0].innerText = s;
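If the markup still has to be built as one string for innerHTML, another option is to escape each value at the point where it is concatenated. This is a minimal sketch, assuming response_data holds the raw, unescaped strings; escapeHtml and buildParagraphs are hypothetical helper names, not part of any library.

```javascript
// Hypothetical helper: escape the five characters that are significant
// in HTML text and attribute values. "&" must be replaced first.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Wrap every item in its own paragraph, escaping the item text.
function buildParagraphs(items) {
  return items.map(function (item) {
    return "<p>" + escapeHtml(item) + "</p>";
  }).join("");
}

var response_data = ['abcde <script>alert("Hello!");</script>'];
console.log(buildParagraphs(response_data));
// In a browser: elem.innerHTML = buildParagraphs(response_data);
```

Escaping exactly once, as late as possible, avoids both double-encoded text and accidentally decoded markup.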

How to use regex to replace text between tags

I'd like to replace some text in a string that represents a div tag that may or may not also include style and class attributes. For example,
var s = "<div style='xxx' class='xxx'>replaceThisText</div>";
If it were just the tag, I believe I could just do this:
s = s.replace(/<div>[\s\S]*?<\/div>/, '<div>' + newText + '</div>');
But how do I take the attributes into account?
Create a temporary element with your string as its HTML content, get the div inside it, update that div's content, and then read back the HTML of the temporary element.
var s = "<div style='xxx' class='xxx'>replaceThisText</div>";
// create a temporary div element
var temp = document.createElement('div');
// set content as string
temp.innerHTML = s;
// get div within the temporary element
// and update the content within the div
temp.querySelector('div').innerHTML = 'newText';
// get back the current HTML content in the
// temporary div element
console.log(temp.innerHTML)
Why not regex?
RegEx match open tags except XHTML self-contained tags
Using regular expressions to parse HTML: why not?
Regex will never be a good decision to parse html content.
Consider the following short solution using the DOMParser object (for browsers that support DOMParser; see the compatibility table at the link below):
var s = "<div style='xxx' class='xxx'>replaceThisText</div>",
tag = (new DOMParser()).parseFromString(s, 'text/html').querySelector('.xxx');
tag.textContent = 'newText'; // replacing with a new text
console.log(tag.outerHTML); // outputs the initial tag representation with replaced content
https://developer.mozilla.org/ru/docs/Web/API/DOMParser
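For the narrow case in the question, a single non-nested div whose attribute values contain no ">" character, the original regex can be extended to tolerate attributes. This is only a sketch under those assumptions (replaceDivText is a hypothetical name); the DOM-based approaches above stay more robust for anything less predictable.

```javascript
// Hypothetical helper: replace the content of the first <div ...>...</div>
// while keeping the opening tag and its attributes.
function replaceDivText(html, newText) {
  // Group 1 is the opening tag with any attributes, group 2 the closing tag.
  // A replacer function is used so that "$" sequences in newText are
  // inserted literally instead of being read as replacement patterns.
  return html.replace(/(<div\b[^>]*>)[\s\S]*?(<\/div>)/, function (match, open, close) {
    return open + newText + close;
  });
}

var s = "<div style='xxx' class='xxx'>replaceThisText</div>";
console.log(replaceDivText(s, "newText"));
```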

How can I get the string including HTML elements after highlighting the text in HTML using JavaScript?

Say, I highlighted this text The Title is Superman and Batman in my page.
How can I get the text including its HTML elements?
Based on my example, I should get this:
The Title is <i>Superman</i> and <i>Batman</i>
Use jQuery's .html() method to get the content including its HTML markup.
HTML:
<div id="test">this is my <i>test</i></div>
JS:
$('#test').html()
You can select the elements by giving them a class or id.
HTML:
<div class="test"><i>Superman</i></div>
<div class="test"><i>Batman</i></div>
JS:
$('.test').html()
Since everyone is requiring OP to use jQuery, here's the native JS equivalent. You can select the html content of an element like so :
var html = document.getElementById('text-container').innerHTML;
You might want to redisplay all the HTML from the container in different forms, e.g. as HTML markup, as text, or as HTML-encoded text. By that I mean HTML entities (e.g. &gt; for > (greater-than sign)). Here are the methods for producing each type of output:
Here's a variable for the subsequent code:
var target = document.getElementById('text-output'); // for later
1. HTML in a container element
Output: Rendered HTML
Javascript:
target.innerHTML = html;
2. Text in a container element
Output: Text, HTML entities encoded
Javascript:
// a text node is never parsed as HTML, so the markup is displayed literally
var text = document.createTextNode(html);
target.appendChild(text);
3. HTML in a textarea element
Output: Text, HTML entities non-encoded
Javascript:
yourTextArea.value = html;
4. Text in a textarea element
Output: Text, HTML entities encoded
Javascript:
// The virtual container encodes entities when its .innerHTML property
// is read after a text node has been appended.
var virtualContainer = document.createElement('div');
var text = document.createTextNode(html);
virtualContainer.appendChild(text);
yourTextArea.value = virtualContainer.innerHTML;
Demo: http://jsbin.com/mozibezi/1/edit
PS: It is impossible to display the output from #4 in a non-form input.

How can I Strip all regular html tags except <a></a>, <img>(attributes inside) and <br> with javascript?

When a user creates a message there is a multibox, and this multibox is connected to a design panel which lets users change fonts, color, size, etc. When the message is submitted, it is displayed with HTML tags if the user has changed the color, size, etc. of the font.
Note: I need the design panel. I know it's possible to remove it, but that is not an option here :)
It's a SharePoint standard. The only solution I have is to use JavaScript to strip these tags when the message is displayed. The user should only be able to insert links, images and line breaks.
Which means that all HTML tags should be stripped except <a></a>, <img> and <br> tags.
It's also important that the attributes inside the <img> tag are not removed. It could be displayed like this:
<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">
How can I accomplish this with javascript?
I used to use the following C# code-behind, which worked perfectly, but it strips all HTML tags except the <br> tag.
public string Strip(string text)
{
return Regex.Replace(text, @"<(?!br[\x20/>])[^<>]+>", string.Empty);
}
Any kind of help is appreciated a lot.
Does this do what you want? http://jsfiddle.net/smerny/r7vhd/
$("body").find("*").not("a,img,br").each(function() {
$(this).replaceWith(this.innerHTML);
});
Basically select everything except a, img, br and replace them with their content.
Smerny's answer works well except when the HTML structure looks like this:
var s = '<div><div>Link<span> Span</span><li></li></div></div>';
var $s = $(s);
$s.find("*").not("a,img,br").each(function() {
$(this).replaceWith(this.innerHTML);
});
console.log($s.html());
The live code is here: http://jsfiddle.net/btvuut55/1/
This happens when there are two or more wrapper elements (two divs in the example above).
jQuery reaches the outermost div first, and its innerHTML, which still contains the span, is retained.
This answer $('#container').find('*:not(br,a,img)').contents().unwrap() fails to deal with tags with empty content.
A working solution is simple: loop from the innermost element outwards:
var $elements = $s.find("*").not("a,img,br");
for (var i = $elements.length - 1; i >= 0; i--) {
var e = $elements[i];
$(e).replaceWith(e.innerHTML);
}
The working copy is: http://jsfiddle.net/btvuut55/3/
With jQuery you can find all the elements you don't want, then use unwrap to strip the tags:
$('#container').find('*:not(br,a,img)').contents().unwrap()
I think it would be better to extract the good tags. It is easier to match a few tags than to remove all the other elements and every HTML possibility. Try something like this; I tested it and it works fine:
// the following regex matches the good tags with attributes and inner content
var ptt = new RegExp("<(?:img|a|br){1}.*/?>(?:(?:.|\n)*</(?:img|a|br){1}>)?", "g");
var input = "<this string would contain the html input to clean>";
var result = "";
var match = ptt.exec(input);
while (match) {
result += match[0]; // match is an array; index 0 holds the matched text
match = ptt.exec(input);
}
// result will contain the clean HTML with only the good tags
console.log(result);
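The C# regex from the question can also be ported to JavaScript and widened to keep a, img and br. This is a minimal sketch, assuming attribute values never contain "<" or ">"; stripTagsExcept is a hypothetical name. It only removes tags for display purposes and is not a security sanitizer, since the kept tags may still carry attributes such as onclick.

```javascript
// Hypothetical helper: remove every tag except the ones named in `allowed`.
function stripTagsExcept(html, allowed) {
  // Negative lookahead: skip a "<" that starts an allowed tag name
  // (optionally preceded by "/") followed by whitespace, "/" or ">".
  // The inner lookahead stops "a" from also matching <abbr> and similar.
  var pattern = new RegExp(
    "<(?!\\/?(?:" + allowed.join("|") + ")(?=[\\s/>]))[^<>]+>",
    "gi"
  );
  return html.replace(pattern, "");
}

var input = '<div><span style="color:red">Hi</span> <a href="/x">link</a><br>' +
  '<img src="/image/Penguins.jpg" alt="Penguins.jpg"></div>';
console.log(stripTagsExcept(input, ["a", "img", "br"]));
```

Unlike the extraction approach, this keeps the text between the removed tags, which matches what the question asks for.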

string search in body.html() not working

Hi, here is my complete code to search for a string in HTML and highlight it if it is found in the document.
The problem is here:
var SearchItems = text.split(/\r\n|\r|\n/);
var replaced = body.html();
for(var i=0;i<SearchItems.length;i++)
{
var tempRep= '<span class="highlight" style="background-color: yellow">';
tempRep = tempRep + SearchItems[i];
tempRep = tempRep + '</span>';
replaced = replaced.replace(SearchItems[i],tempRep); // It is trying to match along with html tags...
// As the <b> tags will not be there in search text, it is not matching...
}
$("body").html(replaced);
The HTML I'm using is as follows;
<div>
The clipboardData object is reserved for editing actions performed through the Edit menu, shortcut menus, and shortcut keys. It transfers information using the system clipboard, and retains it until data from the next editing operation replaces it. This form of data transfer is particularly suited to multiple pastes of the same data.
<br><br>
This object is available in script as of <b>Microsoft Internet Explorer 5.</b>
</div>
<div class='b'></div>
If I search a page that is plain text without any HTML tags, it matches. However, if there are any tags in the HTML, it does not work, because I am taking the body html() text as the target text, so the search string is compared against text that includes the HTML tags.
In the fiddle, the second paragraph does not match.
First of all, to ignore the HTML tags of the element to look within, use the .text() method.
Secondly, in your fiddle, it wasn't working because you weren't calling the SearchQueue function on load.
Try this amended fiddle
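If the highlight has to be written back into the markup, a different technique from .text() is to split the HTML string into tags and text and replace only inside the text segments. This is a rough sketch under the assumption that no attribute value contains ">"; highlightText is a hypothetical name. A search term that spans a tag boundary still will not match.

```javascript
// Hypothetical helper: wrap each occurrence of `term` in a highlight span,
// touching only the text between tags and never the tags themselves.
function highlightText(html, term) {
  if (!term) return html; // nothing to search for
  var open = '<span class="highlight" style="background-color: yellow">';
  // The capturing group keeps the tags in the result of split(),
  // so even indexes are text segments and odd indexes are tags.
  return html.split(/(<[^>]+>)/).map(function (part, i) {
    if (i % 2 === 1) return part; // a tag: leave untouched
    return part.split(term).join(open + term + "</span>");
  }).join("");
}

var html = "This object is available in script as of <b>Microsoft Internet Explorer 5.</b>";
console.log(highlightText(html, "Internet"));
```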
