RegEx for matching style tag - javascript

I have HTML that contains CSS inside a style tag under the head tag. I want to use a regex to extract all the text in the HTML, only the pure text (between HTML tags). I tried:
console.log(HTML_TEXT.replace(/(<([^>]+)>)/g, ""))
which replaces everything between < and > with an empty string. The problem is that the CSS code inside the style tag is still there, so I want to know how to write a regular expression that removes the CSS code inside those tags.
How do I solve this problem?

This RegEx might help you to do so:
(\>)(.+)(<\/style>)
It creates a right boundary in a capturing group: (<\/style>)
It has a left boundary in another capturing group, (\>), to which you can add further boundaries if necessary.
Then it has an unbounded middle capturing group, (.+), where your target is located; you can refer to it as $2 and replace it with an empty string or anything else. Note that . does not match line breaks in JavaScript, so (.+) cannot span multiple lines.
I have not tested it, but your code might look something like this:
console.log(HTML_TEXT.replace(/(\>)(.+)(<\/style>)/g, '$1$3'))
This post explains how to do a string replace in JavaScript.
Edit:
Based on the comment, this RegEx might help you to filter your tags using $1:
(\<style type=\"text\/css\"\>)([\s\S]*)(\<\/style\>)
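To tie this back to the question, here is a minimal sketch of a two-step approach: remove each style block first, then strip the remaining tags with the pattern from the question. The function name extractText and the sample markup are made up for illustration, and the lazy [\s\S]*? is my substitution so that several style blocks are not swallowed in one match.

```javascript
// Hypothetical helper (not from the original answer): drop <style> blocks,
// then strip every remaining tag.
function extractText(html) {
  return html
    // [\s\S]*? is lazy and crosses line breaks, so each block is removed on its own.
    .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, "")
    // Same tag-stripping pattern as in the question.
    .replace(/(<([^>]+)>)/g, "")
    .trim();
}

// Made-up sample input.
const sample = `<head><style type="text/css">
p { color: red; }
</style></head>
<body><p>Hello</p></body>`;

console.log(extractText(sample));
```

A regex like this is fine for quick text extraction, but it will not cope with a literal </style> inside a CSS string or comment.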

Related

Select single text character from body element with regular expression

Using regex in JavaScript or jQuery, how do I select a lone character in the body element? We have noticed that there's a random "s" at the bottom of our site, after the footer. We have no idea where it came from. It's really bizarre because it's in the body element but not tied to any child element; it's just floating there in between some script tags.
screenshot of the 's' after the footer
here is the 's' in the DOM in between scripts
It might be from a plugin - but in the meantime, we were hoping we could hide this until we find the culprit.
I'm trying to select the element first with this in the console, but I've had no success.
jQuery('body').filter( function(){ return /([s]$)/g.test( jQuery(this).text() ) } )
However, I can't select this element.. does anyone know how I can do this? Thanks so much
If the s is at the start of the string after the closing script tag and is the only text on that line, you could use a capturing group and use that group $1 in the replacement:
(<\/script>\s*\n)s$
( Capturing group
<\/script>\s*\n Match closing script tag followed by 0+ times a whitespace char and a newline
) Close capturing group
s Match s
$ End of string
Regex demo
const s = `<script type="text/javascript"><\/script>
s`;
console.log(s.replace(/(<\/script>\s*\n)s$/, "$1"));
If your app is PHP, try searching for something like this at the beginning of a PHP file.
It is a piece of text that will just be echoed, not executed.
s<?php

Regex - how to replace css when not inside of html

Alright, so I've been looking around for quite a while trying to figure out how to get this to work out. So what I'm trying to do is replace anything in strings that looks like this:
foo: bar;
But only if it's not inside something like this:
<div style='foo: bar; ofoo: obar'>
So the basic idea is that I want to replace CSS when it's not inside HTML style attributes. I understand that you can use a for loop and check it, but I would like to do this with just the regex replace.
I'm using JavaScript regex; here's what my attempt currently looks like:
\b(.*?):(|\s)(.*?);
https://regex101.com/r/LWohvu/1
Notes:
I understand that you could use a ^ to check whether the string starts with it, but that only works for the first line.
If I left out any needed information, please feel free to comment!
According to your description, you want to replace all styles in your HTML page except those inside an HTML tag. I've updated your regex and it works for your case. Please check this.
Regex:
^(?!(\=|\<))(.*?):(.*?);
Regex in JavaScript:
/^(?!(\=|\<))(.*?):(.*?);/gm
Every style starts with style= when it sits inside an HTML tag, so I tried to skip those using ^(?!(\=|\<)). This asserts that the line does not start with = or <: the = is skipped for style and the < for an HTML tag.
Please check this in Updated Regex.
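As a rough illustration of how the pattern above behaves, here is a small sketch with the m flag applied to a three-line string. The helper name replaceLooseCss and the sample input are invented for this example, and it assumes every declaration and every tag starts at the very beginning of its line.

```javascript
// Pattern copied from the answer above; g replaces every match,
// m lets ^ match at the start of each line.
const cssOutsideTags = /^(?!(\=|\<))(.*?):(.*?);/gm;

// Hypothetical helper for illustration only.
function replaceLooseCss(text, replacement) {
  return text.replace(cssOutsideTags, replacement);
}

// Made-up input: two loose declarations around one inline style attribute.
const input = [
  "foo: bar;",
  "<div style='foo: bar; ofoo: obar'>",
  "baz: qux;",
].join("\n");

console.log(replaceLooseCss(input, "x: y;"));
```

The middle line is left alone only because it begins with <; an indented tag would no longer be protected by the lookahead.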

How do I allow <img> and <a> tags for innerHTML, but no others? (Making a forum)

I am currently programming a forum using only javascript (No JQuery please). I am doing very well, however, there is one issue I would love help with.
Currently I am getting the post from a database, assigning it to variable MainPost, and then attaching it to a div via a text node:
var theDiv = document.getElementById("MainBody");
var content = document.createTextNode(MainPost);
theDiv.appendChild(content);
This is working quite well, however, I would LOVE to be able to do this:
document.getElementById("MainBody").innerHTML += MainPost;
But I know this would allow people to use ANY html tag they want, even something like "script" followed by javascript code. This would be bad for business, obviously, but I do like the idea of allowing posters to use the "img" tag as well as the "a href" tags. Is there a way to somehow disable all tags except these two for the innerHTML?
Thank you all so much for any help you can offer.
Ok, the first thought that came to my mind when I read this question was to find a regular expression to exclude a specific string in a word. Simple search gave a lot of results from SO.
Starting point - To remove all the HTML tags from a string (from this answer):
var regex = /(<([^>]+)>)/ig
, body = "<p>test</p>"
, result = body.replace(regex, "");
console.log(result);
To exclude a string you would do something like this (again from all the source mentioned above):
(?!StringToBeExcluded)
Since you want to exclude the <a href and <img tags, a suitable regex in your case could be:
(<(?![\/]?a)(?![\/]?img)([^>]+)>)
Explanation :
Think of it as two negative lookaheads followed by one capturing group:
(?![\/]?a) : Negative lookahead asserting that what follows < is not the string "a", optionally preceded by a forward slash (this should take care of the a href tags)
(?![\/]?img) : Same as the first, except that it looks for the string "img". I don't know why I allowed the </img> tag. Yes, <img> doesn't have a closing tag. You could remove the [\/]? bit from it to fix this.
([^>]+) : Matches one or more characters that are not >, i.e. the rest of the tag up to its closing bracket.
All three parts lie between < and >. You might want to try a regex demo that I've created incorporating them to take care of ignoring all HTML elements except the image and link tags.
Sidenote - I haven't thoroughly given this regex a try. Feel free to play around with it and tweak it according to your needs. In any case, I hope this gets you started in the right direction.
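Here is one way the idea could be tried out. This is only a sketch: the function name stripExceptLinksAndImages and the sample post are made up, and I added \b after the tag names (my change, not part of the regex above) because the lookahead (?![\/]?a) also spares any other tag whose name starts with "a", such as <article>.

```javascript
// Hypothetical helper: remove every tag except <a ...>, </a> and <img ...>.
function stripExceptLinksAndImages(html) {
  // \b stops "a" from also matching the start of "article", "audio", etc.
  return html.replace(/<(?!\/?a\b)(?!img\b)[^>]+>/gi, "");
}

// Made-up post content.
const post =
  '<p>Hi <a href="x">link</a> <img src="y.png"> ' +
  "<script>alert(1)</script><article>t</article></p>";

console.log(stripExceptLinksAndImages(post));
```

Keep in mind that this only filters tag names. It does not look at attributes, so something like an onclick attribute on an allowed tag would pass through untouched.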

Removing html line breaks using Javascript

I'm trying to grab an element's HTML using jQuery and then post it to the server. I successfully grabbed it, but I am not able to remove the white space between the tags and the line breaks that are rendered by default. The HTML code grabbed is shown below:
<table><tbody><tr><th></th><th>1</th><th>2</th><th>3</th><th>4</th><th>5</th><th>6</th><th>7</th></tr>
<tr><th>2nd row</th><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr></tbody></table>
I would like to trim the spaces between the tags only. I've used this regular expression: str.replace(/\s+/g, ' ');. But that doesn't seem to work, any suggestions?
Currently, you are replacing all consecutive sequences of whitespace with a single space.
This is what you want:
str.replace(/>\s+</g, '><');
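A quick sketch of the difference, using a shortened, made-up version of the table from the question (the function name collapseBetweenTags is just for illustration). Only whitespace that sits between a > and the next < is removed, so the space inside "2nd row" survives.

```javascript
// Hypothetical helper wrapping the replace call from the answer.
function collapseBetweenTags(str) {
  return str.replace(/>\s+</g, "><");
}

// Shortened sample: a line break plus indentation between the two rows.
const tableHtml = `<table><tbody><tr><th></th><th>1</th></tr>
   <tr><th>2nd row</th><td></td></tr></tbody></table>`;

console.log(collapseBetweenTags(tableHtml));
```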
I needed to add an escape character (a backslash, \) at the end of each line to wrap the string.

Whitelist javascript to strip html tags

I have modified a whitelist JavaScript regex that strips unwanted tags.
I am trying to allow this code:
<span style="color: #000000"></span>
but I am unable to do it in regex.
Below is what I have so far:
(/<(?!(br|\/br|p|\/p|b|\/b|u|\/u|ol|\/ol|ul|\/ul|li|\/li))([^>])+>/gi
Thanks
Works for me as well - unless there is more that you are trying to do - e.g. if there is any content between the tags, or if you want to match the opening and closing tag in the same run - then post the example in your question.
BTW: the regex can be simplified a little the following way:
<(?!((?:\/\s*)?(?:br|p|b|u|[ou]l|li)))([^>])+>
(?:\/\s*)? - an optional slash
(?:br|p|b|u|[ou]l|li) - followed by any of these tags
UPDATE:
Here's my last try:
If you want to match all the other tags, use this:
<(?!(?:\/\s*)?(?:br|p|b|[ou]l|li|span)(?:\s*style='color: #[A-Fa-f0-9]+'))([^>])*>
If you want to match the tags with color, use this:
<((?:\/\s*)?(?:br|p|b|[ou]l|li|span)(?:\s*style='color: #[A-Fa-f0-9]+'))([^>])*>
this works for me (no parenthesis at the beginning):
/<(?!(br|\/br|p|\/p|b|\/b|u|\/u|ol|\/ol|ul|\/ul|li|\/li))([^>])+>/gi
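If a single lookahead gets hard to maintain, another option is a replace callback that checks each tag against the whitelist. This is a different technique from the regexes above, shown only as a sketch: the names allowedTag, colorSpan and stripUnlisted are made up, and it assumes the span is written exactly as in the question (double quotes, one color declaration).

```javascript
// Plain whitelisted tags, opening or closing, with no attributes.
const allowedTag = /^<\/?(?:br|p|b|u|ol|ul|li)\s*\/?>$/i;
// A span carrying only a hex color style, or its closing tag.
const colorSpan = /^<span style="color: #[0-9a-f]{3,8}">$|^<\/span>$/i;

// Hypothetical helper: keep a tag only if one of the two patterns accepts it.
function stripUnlisted(html) {
  return html.replace(/<[^>]+>/g, (tag) =>
    allowedTag.test(tag) || colorSpan.test(tag) ? tag : ""
  );
}

// Made-up input mixing allowed and disallowed tags.
const markup =
  '<p><span style="color: #000000">x</span><div onclick="y">z</div><br/></p>';

console.log(stripUnlisted(markup));
```

Checking each tag in a callback keeps the whitelist readable and makes it easy to add another allowed form later.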
I have developed a tool, with source code, that strips all tags except those in an exception list provided by the user: try this HTML Tag Stripper
