How to remove content within the &lt; and &gt; in JavaScript

I have content that contains a string of elements along with images, e.g.:
var str = "<p>&lt;img src=\"v\"&gt;fwefwefw&lt;/img&gt;</p><p><br></p><p><br></p>";
The text between the &lt; and &gt; is a dirty tag, and I would like to remove it along with the content inside it. The tag is generated dynamically and hence could be any tag, i.e. <div>, <a>, <h1>, etc.
Expected output: <p></p><p><br></p><p><br></p>
However, with this code I'm only able to remove the tags and not the content inside them:
str.replaceAll(/&lt;.*?&gt;/g, "");
It renders like this, which is not what I'm looking for:
<p>fwefwefw</p><p><br></p><p><br></p><p><br></p>
How can I remove the &lt; &gt; tags along with their content, so that I get rid of the dirty tags and the text inside them?
fiddle: https://jsfiddle.net/3rozjn8m/
thanks

A safe way is to use a DOM parser and visit each text node, cleaning each text separately. This way you can be certain the DOM structure is not altered, only the text:
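// The dirty tag is HTML-escaped, so the parser treats it as plain text
let str = "<p>&lt;img src=\"v\"&gt;fwefwefw&lt;/img&gt;</p><p><br></p><p><br></p>";
let doc = new DOMParser().parseFromString(str, "text/html");
// NodeFilter.SHOW_TEXT (numeric value 4) restricts the walk to text nodes
let walk = doc.createTreeWalker(doc.body, NodeFilter.SHOW_TEXT);
let node = walk.nextNode();
while (node) {
    // Greedy match: remove everything from the first "<" to the last ">" in this text node
    node.nodeValue = node.nodeValue.replace(/<.*>/gs, "");
    node = walk.nextNode();
}
let clean = doc.body.innerHTML;
console.log(clean);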
This will also work when you have more than one <p> element that has such content.

Remove the question mark, which makes the match greedy.
var str = "<p>&lt;img src=\"v\"&gt;fwefwefw&lt;/img&gt;</p><p><br></p><p><br></p>";
console.log(str.replaceAll(/&lt;.*&gt;/g, ""));
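The difference between the lazy and the greedy pattern can be checked in isolation. This is a minimal sketch, assuming the dirty tag arrives HTML-escaped (as &lt; ... &gt;) and that the string holds a single dirty run; the helper name stripEscapedTag is made up for illustration.

```javascript
// Hypothetical helper: removes everything from the first "&lt;" to the last "&gt;".
// Assumption: there is only one dirty run in the string.
function stripEscapedTag(html) {
  return html.replace(/&lt;.*&gt;/gs, "");
}

const input = '<p>&lt;img src="v"&gt;fwefwefw&lt;/img&gt;</p><p><br></p><p><br></p>';

// Lazy: each escaped tag is matched on its own, so the text between them survives.
const lazy = input.replace(/&lt;.*?&gt;/g, "");
// Greedy: one match spans the opening tag, the text and the closing tag.
const greedy = stripEscapedTag(input);

console.log(lazy);   // <p>fwefwefw</p><p><br></p><p><br></p>
console.log(greedy); // <p></p><p><br></p><p><br></p>
```

Because the greedy match runs to the last "&gt;" in the string, two dirty runs in different paragraphs would also swallow the markup between them; in that case cleaning each text node separately is the safer route.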

Related

Change start tag and end tag separately

I need to replace the following:
<a>something</a>
with this:
(start-a)something(end-a)
I have not mastered regex, so the regex I have just replaces every tag completely.
string.replace(/(<([^>]+)>)/ig,"(start-a)");
What should I change in my regex to replace only the opening tag, and what other change would replace the closing tag?
Thanks.
Find all the tags you wish to replace.
Create a new text node based on each tag.
Insert new node.
Delete old tag.
// Find all tags to be replaced, put them in an Array and loop through them
// (forEach returns undefined, so there is nothing to assign)
Array.prototype.slice.call(document.querySelectorAll("a")).forEach(function(tag){
    // Find the parent element of the node
    var parent = tag.parentNode;
    // Create a new text node with the contents of the current element
    var replaceString = document.createTextNode("(start-a)" + tag.textContent + "(end-a)");
    // Insert the new text node where the current tag is
    parent.insertBefore(replaceString, tag);
    // Delete the old tag
    parent.removeChild(tag);
});
<div>
    <a>something</a>
    <a>something</a>
    <a>something</a>
    <a>something</a>
    <a>something</a>
</div>
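For simple, trusted markup held in a string, the two separate replacements the question asks about can be sketched with one regex per tag. This is only an illustration under the assumption that no attribute value contains a ">" character; the function name markAnchors is hypothetical.

```javascript
// Hypothetical helper: replaces opening and closing <a> tags separately.
// Assumption: attribute values never contain ">".
function markAnchors(html) {
  return html
    .replace(/<a\b[^>]*>/gi, "(start-a)")  // opening tag, with or without attributes
    .replace(/<\/a\s*>/gi, "(end-a)");     // closing tag
}

console.log(markAnchors('<div><a href="#">something</a></div>'));
// <div>(start-a)something(end-a)</div>
```

The \b after "a" keeps tags such as <abbr> or <article> from being treated as anchors.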

How to use regex to replace text between tags

I'd like to replace some text in a string that represents a div tag that may or may not also include style and class attributes. For example,
var s = "<div style='xxx' class='xxx'>replaceThisText</div>";
If it were just the tag, I believe I could just do this:
str = str.replace(/<div>[\s\S]*?<\/div>/, '<div>' + newText+ '<\/div>');
But how do I take the attributes into account?
Create a temporary element with your string as its HTML content, get the div inside it and update that div's content, then read the HTML back from the temporary element.
var s = "<div style='xxx' class='xxx'>replaceThisText</div>";
// create a temporary div element
var temp = document.createElement('div');
// set the string as its content
temp.innerHTML = s;
// get the div within the temporary element
// and update the content within that div
temp.querySelector('div').innerHTML = 'newText';
// get back the current HTML content of the
// temporary div element
console.log(temp.innerHTML);
Why not regex?
RegEx match open tags except XHTML self-contained tags
Using regular expressions to parse HTML: why not?
Regex is never a good choice for parsing HTML content.
Consider the following short solution using the DOMParser object (for browsers that support DOMParser; see the compatibility table):
var s = "<div style='xxx' class='xxx'>replaceThisText</div>",
    tag = (new DOMParser()).parseFromString(s, 'text/html').querySelector('.xxx');
tag.textContent = 'newText'; // replace the content with new text
console.log(tag.outerHTML); // outputs the original tag with the replaced content
https://developer.mozilla.org/ru/docs/Web/API/DOMParser
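If a regex has to be used anyway, for example where no DOM is available, the attributes can be taken into account by capturing the whole opening tag. This is a rough sketch that assumes a single, non-nested div and attribute values without ">"; the name replaceDivText is made up for illustration.

```javascript
// Hypothetical helper: keeps the opening tag (with its attributes) and the
// closing tag, and swaps only what lies between them.
// Assumption: one non-nested <div> in the string.
function replaceDivText(html, newText) {
  return html.replace(
    /(<div\b[^>]*>)[\s\S]*?(<\/div>)/i,
    // A replacer function avoids "$" in newText being read as a group reference.
    (match, open, close) => open + newText + close
  );
}

const s = "<div style='xxx' class='xxx'>replaceThisText</div>";
console.log(replaceDivText(s, "newText"));
// <div style='xxx' class='xxx'>newText</div>
```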

replacing text from a paste when looping over html elements

I am trying to replace html links (and eventually other elements) with bbcode when a user does a paste from a document (like gdocs or libre office). So we are dealing with rich html already formatted (which is why it needs to copy HTML and not text).
Essentially, I want to be able to copy stuff pre-written from a document into a textarea on my website without having to manually write BBCode tags in the original document (as it's messy for proof-reading).
Thanks to the help here Adjust regex to ignore anything else inside link HTML tags I have gotten mostly there, but I am stuck on replacing the found tags with the original text.
Here's what I have:
function fragmentFromString(strHTML) {
    return document.createRange().createContextualFragment(strHTML);
}
$('textarea').on('paste',function(e) {
    e.preventDefault();
    var text = (e.originalEvent || e).clipboardData.getData('text/html') || prompt('Paste something..');
    var fragment = fragmentFromString(text);
    var aTags = Array.from(fragment.querySelectorAll('a'));
    aTags.forEach(a => {
        text = text.replace(a, "[url="+a.href+"]"+a.textContent+"[/url]");
    });
    window.document.execCommand('insertText', false, text);
});
You can see it loops over the found a tags and I am essentially trying to replace them from the original text with the new stuff.
Here's an example of the type of content that could be pasted (this is a single link from google docs):
<a href="https://www.test.com"><span style="font-size:14.666666666666666px;font-family:Arial;color:#1155cc;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap;">Link test</span></a>
Expected to be replaced with:
[url=https://www.test.com]Link test[/url]
So I want that HTML replaced, with the BBCode within the original text that's then sent to the textarea from the paste.
The aTags forEach does not do what you want. You need to create a new text node and replace each existing anchor element with it, then insert the text of the modified fragment rather than the original string.
aTags.forEach(a => {
    var new_text = document.createTextNode("[url=" + a.href + "]" + a.textContent + "[/url]");
    a.parentNode.insertBefore(new_text, a);
    a.parentNode.removeChild(a);
});
// text is still the original HTML string, so read the result from the fragment
window.document.execCommand('insertText', false, fragment.textContent);
This replaces every a tag with the given text.
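The link-to-BBCode conversion itself can also be tried on a plain string, which is handy for testing outside a browser. The following is a minimal sketch, not the approach from the answer above: it assumes double-quoted href attributes and non-nested links, and the name linksToBBCode is hypothetical.

```javascript
// Hypothetical helper: turns <a href="...">...</a> into [url=...]...[/url]
// and drops any tags nested inside the link (such as the Google Docs span).
// Assumptions: href is double-quoted and links are not nested.
function linksToBBCode(html) {
  return html.replace(
    /<a\b[^>]*\bhref="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi,
    (match, href, inner) =>
      "[url=" + href + "]" + inner.replace(/<[^>]*>/g, "") + "[/url]"
  );
}

const pasted = '<a href="https://www.test.com"><span style="font-weight:700;">Link test</span></a>';
console.log(linksToBBCode(pasted));
// [url=https://www.test.com]Link test[/url]
```

Unlike a.href in the DOM version, this uses the raw attribute value, so relative URLs are not resolved.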

How can I Strip all regular html tags except <a></a>, <img>(attributes inside) and <br> with javascript?

When a user creates a message there is a multibox, and this multibox is connected to a design panel that lets users change fonts, color, size, etc. When the message is submitted, it is displayed with HTML tags if the user has changed the color, size, etc. of the font.
Note: I need the design panel. I know it's possible to remove it, but that is not an option here :)
It's a SharePoint standard. The only solution I have is to use JavaScript to strip these tags when the message is displayed. The user should only be able to insert links, images and line breaks.
This means that all HTML tags should be stripped except the <a></a>, <img> and <br> tags.
It's also important that the attributes inside the <img> tag are not removed. It could be displayed like this:
<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">
How can I accomplish this with javascript?
I used to use the following C# code-behind, which worked perfectly, but it strips all HTML tags except the <br> tag.
public string Strip(string text)
{
    return Regex.Replace(text, @"<(?!br[\x20/>])[^<>]+>", string.Empty);
}
Any kind of help is much appreciated.
Does this do what you want? http://jsfiddle.net/smerny/r7vhd/
$("body").find("*").not("a,img,br").each(function() {
    $(this).replaceWith(this.innerHTML);
});
Basically select everything except a, img, br and replace them with their content.
Smerny's answer works well, except when the HTML structure is like this:
var s = '<div><div>Link<span> Span</span><li></li></div></div>';
var $s = $(s);
$s.find("*").not("a,img,br").each(function() {
$(this).replaceWith(this.innerHTML);
});
console.log($s.html());
The live code is here: http://jsfiddle.net/btvuut55/1/
This happens when there are two or more wrappers outside (two divs in the example above).
jQuery reaches the outer div first and replaces it with its innerHTML. The span in that new markup is a fresh node that the loop never visits, so it is retained.
This answer $('#container').find('*:not(br,a,img)').contents().unwrap() fails to deal with tags with empty content.
A working solution is simple: loop from the innermost element outwards:
var $elements = $s.find("*").not("a,img,br");
for (var i = $elements.length - 1; i >= 0; i--) {
    var e = $elements[i];
    $(e).replaceWith(e.innerHTML);
}
The working copy is: http://jsfiddle.net/btvuut55/3/
With jQuery you can find all the elements you don't want, then use unwrap to strip the tags:
$('#container').find('*:not(br,a,img)').contents().unwrap()
I think it would be better to extract the good tags. It is easier to match a few tags than to remove all the other elements and every HTML possibility. Try something like this; I tested it and it works fine:
// the following regex matches the good tags with their attributes and inner content
var ptt = new RegExp("<(?:img|a|br){1}.*/?>(?:(?:.|\n)*</(?:img|a|br){1}>)?", "g");
var input = "<this string would contain the html input to clean>";
var result = "";
var match = ptt.exec(input);
while (match) {
    // match[0] is the full matched text
    result += match[0];
    match = ptt.exec(input);
}
// result will contain the clean HTML with only the good tags
console.log(result);
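The C# pattern from the question can also be carried over to JavaScript and widened to the three allowed tags. This is a sketch under the assumption that the markup is reasonably well formed (no stray "<" or ">" inside attribute values); the name stripExceptAllowed is made up for illustration.

```javascript
// Hypothetical helper: removes every tag except <a>, </a>, <img> and <br>,
// keeping the text content and the attributes of the allowed tags.
// Assumption: no "<" or ">" characters inside attribute values.
function stripExceptAllowed(html) {
  // The negative lookahead skips tags whose name is exactly a, img or br.
  return html.replace(/<(?!\/?(?:a|img|br)\b)[^<>]+>/gi, "");
}

const message =
  '<div><font color="red">Hi</font> <a href="/x">link</a><br>' +
  '<img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;"></div>';

console.log(stripExceptAllowed(message));
// Hi <a href="/x">link</a><br><img src="/image/Penguins.jpg" alt="Penguins.jpg" style="margin:5px;width:331px;">
```

Only the tags are removed here; this is not a sanitizer, and attributes on the allowed tags are passed through unchanged.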

Replace an HTML element with some HTML contained in a string

Let's say I have this portion of HTML document:
<div>hello world <span id="test"></span></div>
In straight JavaScript, I need to replace the span with some HTML content contained in a string like '<span>other</span> yo google'
So the end result would be like:
<div>hello world <span>other</span> yo google</div>
The problem I'm facing is that the HTML string can contain any number of tags at its "root". So it is not a 1 to 1 replacement of tags.
I need to do that in straight JavaScript (no jQuery).
If anyone can help!
Thanks
What is the reason you can't just set the innerHTML of <span id="test">? There's no harm in having the extra span...
If you really need to remove the outer span, you can just insert all the childNodes before it.
var html = '<span>other</span> yo google';
var removeMe = document.getElementById('test');
removeMe.innerHTML = html;
var child;
while(child = removeMe.childNodes[0]) {
    removeMe.parentNode.insertBefore(child, removeMe);
}
removeMe.parentNode.removeChild(removeMe);
See http://jsfiddle.net/4tLVC/1/
Can't you use
var span = document.getElement...('test')
document.getElementById('yourDiv').removeChild(span)
or actually you can do
span.parentNode.removeChild(span)
This should work too.
After that
document.getElementById('yourDiv').innerHTML += 'your cool <span>content</span>'
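Another option for an in-place replacement is the element's outerHTML setter, which parses a string and puts the resulting nodes where the element was, however many root nodes the string has. A minimal sketch, assuming the target is attached to a parent element and the string is trusted; the helper name replaceWithHtml is made up for illustration.

```javascript
// Hypothetical helper: replaces `el` with the nodes parsed from `html`.
// Assumption: `el` has a parent element. The old `el` reference is detached afterwards.
function replaceWithHtml(el, html) {
  el.outerHTML = html;
}

// Browser-only usage; skipped where no DOM is available (e.g. plain Node.js).
if (typeof document !== "undefined") {
  const target = document.getElementById("test");
  if (target) {
    replaceWithHtml(target, "<span>other</span> yo google");
  }
}
```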
