How can I exclude a class inside a regex string? - javascript

I'm trying to build a regex that removes all HTML tags from a string, except for one special element. The problem is that I've found no way to also exclude the closing tag of that special element. This is my code:
let str = 'You have to pay <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span> <div class="keep-this">$500</div> also';
console.log(str.replace(/(?!<div class="keep-this">)(<\/?[^>]+(>|$))/g, ""));
How can I fix this?

Try this option, which matches all HTML tags, excluding those tags which have the attribute class="keep-this".
let str = 'You have to pay <input class="some-class"/> blah <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span> <div class="keep-this">$500</div> also';
console.log(str.replace(/<\s*([^\s>]+)(?:(?!\bclass="keep-this")[^>])*>(.*?)(?:<\/\1>)|<\s*([^\s>]+)(?:(?!\bclass="keep-this")[^>])*\/>/g, "$2"));
Here is an explanation of the regex pattern:
<                                   match < of an opening tag
\s*                                 optional whitespace
([^\s>]+)                           match and capture the HTML tag name in $1 (\1)
(?:(?!\bclass="keep-this")[^>])*    match remainder of tag,
                                    so long as class="keep-this" is not seen
>                                   match > of an opening tag
(.*?)                               match and capture the tag's content in $2,
                                    until hitting the nearest
(?:<\/\1>)                          closing tag, which matches the opening one
|                                   OR
<\s*([^\s>]+)                       match a standalone tag e.g. <input/>
(?:(?!\bclass="keep-this")[^>])*    without a closing tag,
\/>                                 which ends with />
Then, we replace each match with $2, the captured content (empty for a standalone tag). This removes the tags but keeps the text between them.
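As a quick check, the pattern can be wrapped in a small helper and run against the sample string. This is only a sketch: the name stripTagsExceptKeep is made up for illustration, and the pattern is copied unchanged from the answer, so it inherits the usual limits of regex-based HTML handling (for instance, it assumes the attribute is written exactly as class="keep-this" and that elements are not nested).

```javascript
// Hypothetical helper name; the regex is the one from the answer.
function stripTagsExceptKeep(html) {
  const pattern = /<\s*([^\s>]+)(?:(?!\bclass="keep-this")[^>])*>(.*?)(?:<\/\1>)|<\s*([^\s>]+)(?:(?!\bclass="keep-this")[^>])*\/>/g;
  // "$2" inserts the captured content of a paired tag. For a standalone tag
  // group 2 did not participate, so the match is replaced by an empty string.
  return html.replace(pattern, "$2");
}

const sample = 'You have to pay <input class="some-class"/> blah <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span> <div class="keep-this">$500</div> also';
const stripped = stripTagsExceptKeep(sample);
console.log(stripped);
```

The two keep-this elements survive untouched, the span tags are dropped while their text stays, and the standalone input tag disappears entirely.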

If you want to remove all the HTML elements that do not have the class keep-this, you could also use DOMParser together with the :not selector.
let str = 'You have to pay <div class="keep-this">$200</div> per <span class="date">month</span> for your <span class="vehicle">car</span> <div class="keep-this">$500</div> also';
let parser = new DOMParser();
let doc = parser.parseFromString(str, "text/html");
doc.querySelectorAll("body *:not(.keep-this)").forEach(e => e.replaceWith(e.innerHTML));
console.log(doc.body.innerHTML);

Related

Keep an element by id in a html string using javascript or jquery

I have an HTML string that contains:
<div class="infodiv">
<p id="111"><span class="text">111</span></p>
<p id="222"><span class="text">222</span></p>
<p id="333"><span class="text">333</span></p>
</div>
I will put it on the infowindow content on Google Maps.
However, I want to remove some elements by id before showing on the infowindow.
For example: I want to keep only an element with id=111.
So, my html string will only show:
<div class="infodiv">
<p id="111"><span class="text">111</span></p>
</div>
Any ideas how I can achieve this?
Thanks
If you only want to keep a single child of .infodiv, :not is a good choice. Otherwise, take a look at the filter() method.
Could you use the :not CSS selector in jQuery after parsing the string with jQuery.parseHTML()?
$(".infodiv p:not(#111)").remove();
console.log($(".infodiv").html())
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div class="infodiv">
<p id="111"><span class="text">111</span><span></span></p>
<p id="222"><span class="text">222</span><span></span></p>
<p id="333"><span class="text">333</span><span></span></p>
</div>
Try this:
$('.infodiv p:not(#111)').remove();
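In HTML 4, the id of an element cannot begin with a digit [0-9]. (HTML5 relaxes this, but a selector such as #111 is still not a valid CSS ID selector unless the digit is escaped.)
From section 6, Basic HTML data types, of the HTML 4 specification:
ID and NAME tokens must begin with a letter ([A-Za-z]) and may be
followed by any number of letters, digits ([0-9]), hyphens ("-"),
underscores ("_"), colons (":"), and periods (".").
Replace the first digit with a letter, for example "a", in the id attribute of the HTML.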
To convert the HTML string to DOM elements, you can assign it to the .innerHTML of a <template> element.
Create an array of the ids that should be removed from the HTML string, iterate over that array, call .removeChild() on the parent element with the child found by .querySelector() and an attribute selector, then use .replace() to remove unnecessary newline characters from the resulting HTML string.
const html = `<div class="infodiv">
<p id="111"><span class="text">111</span></p>
<p id="222"><span class="text">222</span></p>
<p id="333"><span class="text">333</span></p>
</div>`;
const template = document.createElement("template");
template.innerHTML = html;
const p = template.content.children[0];
const not = ["222", "333"];
for (let id of not) {
p.removeChild(p.querySelector(`[id="${id}"]`))
}
let parsedHTML = template.innerHTML.replace(/\n/g, "");
console.log(parsedHTML);

Include HTML tag but just include list of characters

I have a text string (for example in Russian) that contains HTML tags.
I need to get all the words with a JavaScript RegEx and exclude the HTML tags.
This is my RegEx:
reg = /([^\r\n\t\f>< /]+(?!>))\b/g;
For example, in Russian, I need to keep all HTML tags in my string but match all words in Russian ([\wа-я]+).
Is it possible to exclude and include some things in a JavaScript RegEx?
I would not try to parse HTML with a regexp. Instead, get the innerText property of the DOM node:
HTML:
<div id="myRussianText">
Lorem <span>ipsum</span>
</div>
JS:
var el = document.getElementById('myRussianText');
var text = el.innerText; // 'Lorem ipsum'
https://jsfiddle.net/cn0np3yf/
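Once the plain text has been extracted (for example via innerText), the words themselves can be matched with a Unicode-aware pattern. A minimal sketch, assuming the text is already free of tags: the helper name extractWords and the sample string are made up for illustration, and \p{L} with the u flag is a swapped-in alternative to the [\wа-я] class from the question, not something the answer proposes.

```javascript
// Hypothetical helper: returns every run of letters, digits, or underscores.
function extractWords(text) {
  // \p{L} matches any letter (Latin, Cyrillic, ...), \p{N} any number.
  // Unicode property escapes require the "u" flag.
  return text.match(/[\p{L}\p{N}_]+/gu) || [];
}

const words = extractWords("Привет, мир! Lorem ipsum 42");
console.log(words);
```

Unlike [\wа-я], this also covers uppercase Cyrillic letters and ё without listing them explicitly.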

replace text, but ignore a certain string inside the document

I have a big string: Hello <span class="ashakd">my</span> name is <bob>!
I have a second string: llo my name
I have what I want to replace it with: <span class="ashakd">llo my name</span>
I need to replace() it as if the <span class="ashakd"> and </span> didn't exist, but they are replaced along with the string, so the final result is: He<span class="ashakd">llo my name</span> is <bob>!
PS: <bob> exists, so you can't ignore any text between two angle brackets; it must specifically ignore <span class="ashakd"> and </span>.
Sorry if this is confusing. Ask me to make it clearer if it is.
Edit:
Sorry for being unclear, but it must only replace the spans within the matched text. So if the original string was: Hello <span class="ashakd">my</span> name is <bob><span class="ashakd">hello</span>!
the result would be: He<span class="ashakd">llo my name</span> is <bob><span class="ashakd">hello</span>!
This may be too destructive to the original string, but I propose this solution:
var a = 'Hello <span class="ashakd">my</span> name is <bob>!';
var searchString = 'llo my name';
// remove all <span> and </span> tags, you may not want to remove any and all span tags???
a = a.replace(/<\/?span[^>]*?>/g,'');
a = a.replace(searchString,"<span class='ashakd'>"+searchString+"</span>");
What this does is remove all span tags, then search for your "llo my name" search string, and wrap that with a span tag.
Since you said you don't know regex that well, here's a description of:
/<\/?span[^>]*?>/g
<\/? means match on '<' and then optionally a /. This matches both the start and end tags, i.e. <span...> and </span>
[^>]*? means match any character that is NOT > in a non-greedy fashion, i.e. stop matching at the first > found.
The final /g means 'global', which means match <span> and </span> as many times as possible.
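To leave spans outside the matched text alone, as the edit to the question asks, one option is to build a pattern from the search string that tolerates the specific tags between any two characters. This is a rough sketch rather than the answer's method: the function name wrapIgnoringSpans is made up, the tag strings are hard-coded from the question, and it assumes that any tags inside the match are balanced.

```javascript
// Hypothetical helper: finds `search` in `text` while skipping over
// <span class="ashakd"> and </span>, then wraps the match in one span.
function wrapIgnoringSpans(text, search) {
  // Zero or more of the two tags that should be treated as invisible.
  const tag = '(?:<span class="ashakd">|<\\/span>)*';
  // Escape regex metacharacters so each character is matched literally.
  const escapeChar = (ch) => ch.replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&");
  // Allow the ignorable tags between every pair of characters.
  const pattern = new RegExp(Array.from(search, escapeChar).join(tag));
  // A function replacement avoids special handling of "$" sequences.
  return text.replace(pattern, () => `<span class="ashakd">${search}</span>`);
}

const original = 'Hello <span class="ashakd">my</span> name is <bob><span class="ashakd">hello</span>!';
const wrapped = wrapIgnoringSpans(original, "llo my name");
console.log(wrapped);
```

Only the first occurrence is replaced, and the span around "hello" after <bob> is left untouched because it lies outside the match.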

Javascript regexp replace of multiline content between two tags (including the tags)

In the string
some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/>
I need to remove
<p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/>
Can't find a way how to do it.
var id = 'item_1';
var patt=new RegExp("<p id='"+id+"'(.)*|([\S\s]*?)end_of_"+id+"'\/>","g");
var str="some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/>";
document.write(str.replace(patt,""));
The result is
some text for
<br>
remove
<p></p>
<br id="<p id=" class="item" clear="all" item_2'="">
another multiline content
<p></p>
<br id="end_of_item_2" clear="all">
Please help to solve this.
Here's the regex for the current scenario. When the regex approach eventually breaks, remember that we warned that parsing HTML with regex was a fool's errand. ;)
This:
var s = "some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/><ul><li>";
var id = 'item_1';
var patt = new RegExp ("<p[^<>]*\\sid=['\"]" + id + "['\"](?:.|\\n|\\r)*<br[^<>]*\\sid=['\"]end_of_" + id + "['\"][^<>]*>", "ig")
var stripped = s.replace (patt, "");
Produces this:
"some text <p id='item_2' class='item'>another multiline content
</p><br clear='all' id='end_of_item_2'/><ul><li>"
Why can't you use the DOM API to remove it? (add everything to the document, and then remove what you don't need)
var item1 = document.getElementById('item_1'),
endOfItem1 = document.getElementById('end_of_item_1');
item1.parentNode.removeChild(item1);
endOfItem1.parentNode.removeChild(endOfItem1);
I need to assume a few unspoken constraints from your question to get this to work:
Am I right in guessing that you want a regex that can find (and then replace) any 'p' tag with a specific id, up to a certain tag (e.g. a 'br' tag) with an id of 'end_of_[firstid]'?
If that is correct, then the following regex might work for you. You may need to modify it a bit to get JS to accept it:
<p\s+id='([a-zA-Z0-9_]+)'.*?id='end_of_\1'\s*\/>
This will give you any constellation matching the criteria described above, with the name of the id as group 1. It should then be a simple task to check whether group 1 contains the id you want to remove and, if so, replace the whole match with an empty string.
If I understand your example correctly (I am not that good with JavaScript, and my RegEx was based on the general Perl regex style), you could do something like the following:
// Backslashes are doubled inside the string literal; [\s\S] replaces . so the match can span line breaks.
var patt=new RegExp("<p\\s+id='"+id+"'[\\s\\S]*?id='end_of_"+id+"'\\s*\\/>","g");
That way, you don't have to worry about group matching, although I find it more elegant to match the id via a group than to insert it into the RegEx.
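Two details matter when the pattern is built from a string in JavaScript: every backslash has to be doubled (a bare "\s" in a string literal is just "s"), and . does not match line breaks by default. The following is a minimal sketch under those assumptions. The function name removeItem is made up for illustration, and it expects the same single-quoted attribute layout as the sample string.

```javascript
// Hypothetical helper: removes <p id='ID' ...> up to and including the
// tag carrying id='end_of_ID', for markup shaped like the sample.
function removeItem(html, id) {
  // Escape regex metacharacters in case the id contains any.
  const safeId = id.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "<p\\s+id='" + safeId + "'" + // opening <p> with the wanted id
      "[\\s\\S]*?" +              // anything, including line breaks, lazily
      "id='end_of_" + safeId + "'\\s*\\/>", // the terminating self-closed tag
    "g"
  );
  return html.replace(pattern, "");
}

const str = "some text <p id='item_1' class='item'>multiline content\r\n\r\n for <br/>remove</p><br clear='all' id='end_of_item_1'/><p id='item_2' class='item'>another multiline content\r\n\r\n</p><br clear='all' id='end_of_item_2'/>";
const cleaned = removeItem(str, "item_1");
console.log(cleaned);
```

The lazy quantifier stops at the first matching end marker, so the item_2 block that follows is left in place.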

Regexp for content inside tags

I use Javascript
I have this:
<(div|span) class="search-result-(body-text|title)">(.*?)</(span|div)>
And I use it on this content:
<div class="search-result-item club">
<span class="search-result-type">Projekt</span>
<span class="search-result-title">Titel</span>
<div class="search-result-body-text">
Body text
</div>
<div class="search-result-attributes">
<span class="search-result-attribute">Attribute</span>
</div>
</div>
My result is:
<span class="search-result-title">Titel</span>,
<div class="search-result-body-text">
Body text
</div>
That makes sense, but what should my regexp look like so that it strips the tags and I only get: Titel, Body text
It is required by law that someone post a link to this: RegEx match open tags except XHTML self-contained tags which you should read and reconsider whether you really want to be parsing HTML using regular expressions.
However, what you want is the contents of the third () group in your match. The exec method of a JS regular expression object returns an array containing the whole match at index 0, and the matches from all the groups at indices 1, 2, ... (in this case index 3 is what you need).
[NOTE: an earlier version of this answer had "first" and "1" instead of "third" and "3" above, because I misread your regexp. Sorry.]
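Reading group 3 in a loop over exec could look like the sketch below. Two things here are assumptions and not part of the answer: (.*?) is swapped for ([\s\S]*?) because . does not match line breaks in JavaScript without the s flag, and the captured text is trimmed to drop the surrounding newlines. The sample markup is the one from the question, with the first span properly closed.

```javascript
const html = `<div class="search-result-item club">
<span class="search-result-type">Projekt</span>
<span class="search-result-title">Titel</span>
<div class="search-result-body-text">
Body text
</div>
<div class="search-result-attributes">
<span class="search-result-attribute">Attribute</span>
</div>
</div>`;

// The "g" flag makes exec() continue from the previous match on each call.
const re = /<(div|span) class="search-result-(body-text|title)">([\s\S]*?)<\/(span|div)>/g;

const texts = [];
let match;
while ((match = re.exec(html)) !== null) {
  // match[0] is the whole match; match[3] is the content between the tags.
  texts.push(match[3].trim());
}
console.log(texts.join(", "));
```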
