My regex is meant to find certain words in text, but not words that sit inside an HTML element.
REGEXP
RegExp('\\b([^<(.*?)>(.?+)<\/(.*?)>])(' + wregex.join('|') + ')\\b(?=\\W)')
EXAMPLE
This is some text that should be looked through
though this text <code>Should not be looked at </code> and this text is ok to
look at
Let me explain the part of my regular expression that I am having trouble with:
([^<(.*?)>(.?+)<\/(.*?)>]) is meant to say: do not match any text that starts with <element>, and match nothing inside it until the closing </element>.
That is the most important part. I've tried multiple methods and I'm not sure this is possible with a regex. I don't want to match anything from a basic HTML opening tag until its closing tag appears; after that, searching should start over.
EDIT
I know that regex shouldn't be used to parse HTML; this is looking through TEXT.
Assuming that the text you are searching over is correctly formed (as in, no tag mismatches) the following regex should work:
^([^<]*<([^>]*)>[^<]*</\2>)*[^<]*Your Text
This ensures that your text is outside of any opening and closing pair of tags, by matching all such pairs before getting to your text.
It won't work for nested tags. Regex is incapable of parsing arbitrarily nested tags.
However, please remember that you should not parse HTML with regex.
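As a rough sketch of how the anchored pattern above might be used from JavaScript: the helper name isOutsideTags is made up for illustration, the outer group is written as non-capturing (so the tag name becomes \1), and [^<]* is used before the search word so any amount of text may precede it. It assumes well-formed, non-nested tags, as the answer states.

```javascript
// Hypothetical helper (not from the original answer): returns true when
// `word` can be reached from the start of `text` by skipping only plain
// text and complete <tag>...</tag> pairs.
function isOutsideTags(text, word) {
  // Escape regex metacharacters so the word is matched literally.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // (?: ... )* consumes "text <tag>text</tag>" blocks; \1 is the tag name.
  const re = new RegExp('^(?:[^<]*<([^>]*)>[^<]*</\\1>)*[^<]*' + escaped);
  return re.test(text);
}

const sample = 'some text <code>Should not be looked at</code> and this text is ok';
console.log(isOutsideTags(sample, 'ok'));     // word after the element
console.log(isOutsideTags(sample, 'Should')); // word inside the element
```

Like the pattern it wraps, this only tells you whether the word occurs outside an element at least once; it does not list every occurrence.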
Why cram everything into a single regex? It can be as simple as this. Notice that I'm using [^] instead of ., to also match newlines.
string.replace(/<[^]+?<\/[^]+?>/g, '').match(/what i really want to find/gi)
And yes, this is prone to breakage, as any regex solution would be.
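A minimal sketch of the strip-then-search idea applied to a word list like the question's wregex array. The function name findWordsOutsideElements is hypothetical, and the sketch assumes simple, non-nested, well-formed elements; it is just as fragile as any other regex approach to HTML.

```javascript
// Hypothetical helper: return whole-word matches found outside elements.
function findWordsOutsideElements(text, words) {
  // Drop every <tag ...>...</tag> pair; \1 forces the same tag name to close.
  const outside = text.replace(/<([a-z][\w-]*)\b[^>]*>[^]*?<\/\1>/gi, ' ');
  // Escape regex metacharacters in each word before joining with "|".
  const escaped = words.map(w => w.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const wordRe = new RegExp('\\b(?:' + escaped.join('|') + ')\\b', 'gi');
  return outside.match(wordRe) || [];
}

const text = 'This is some text that should be looked through\n' +
  'though this text <code>Should not be looked at </code> and this text is ok to\n' +
  'look at';
console.log(findWordsOutsideElements(text, ['should', 'looked']));
```

Replacing each element with a space rather than an empty string keeps the words on either side of it from fusing into one.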
Related
I want to append a word after the <body> tag. It should not modify or replace anything else, just append the word. I have done something like this. Is it valid to use empty parentheses for the second capture group, and will it match everything?
/(<body[^>]*>)()/, `$1${my_variable}$2`)
The second capture group, designed to capture nothing, will match "nothing" - it will form a match immediately after your closed body tag. There's nothing wrong with doing this for the regex, though you might want to be wary of using [^>]* - this negated character class will gladly match across lines and grab as much input as it can. Handy for matching multi-line tags, but often very dangerous.
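For illustration, here is one way the same insertion could be written without the empty second group. The function name insertAfterBodyTag is hypothetical; a replacer function is used so that any $ sequences in the inserted word are taken literally rather than as replacement patterns.

```javascript
// Hypothetical helper: insert `word` right after the opening <body ...> tag.
function insertAfterBodyTag(page, word) {
  // The opening tag is captured so it can be put back unchanged;
  // the i flag also accepts <BODY>.
  return page.replace(/(<body\b[^>]*>)/i, (match, openTag) => openTag + word);
}

const page = '<html><body class="x">Some info</body></html>';
console.log(insertAfterBodyTag(page, 'Hello '));
```

This is still a regex over HTML, so the same caveats about [^>]* apply.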
Also, if you're on Linux and for some reason have > symbols in filenames (which is valid!), your regex will break horribly, as shown here.
That being said, valid regex or not, it's usually a bad idea to use regex with html, since HTML isn't a regular language. Also, you could accidentally summon Cthulhu.
let page = "<html><body>Some info</body></html>";
page = page.replace("<body>", `<body>${my_variable}`); // replace() returns a new string, so assign the result
or
page = page.replace(/<body>|<BODY>/, `<body>${my_variable}`);
If you are in the browser, you can also use document.querySelector("body").innerHTML
Also depending on which framework you're using there are better ways to accomplish this.
I have a little interesting issue here. I have a plaintext URL coming from Excel and I need to change it to an HTML URL with a unique body. Here is the regex code for javascript:
text = text.toString().replace(/=hyperlink\(([#\\\w\s\(\)-\.\/]+)\)/g, "<a href='file:///$1'>$1</a>");
This works perfectly fine for what it does. Example, text is:
=hyperlink("\\share\folder\log\2013\13-05-13\13-05-13.txt")
regex turns it into
\\share\folder\log\2013\13-05-13\13-05-13.txt
However, I need the inner HTML to be just the text file name:
13-05-13.txt
To further complicate the matter, the original text the regex is going through is not a single occurrence. It is an entire spreadsheet with hundreds of rows that contain this, so the regex will be matching and replacing hundreds of these strings in one operation.
Hopefully it is possible to get this all done in one regexp on the entire string, but I suppose I could loop through each line of the string first...
If there is no way to do this with one regex engine, what do you think the best approach is? (no PHP/Python/Server side. Just Javascript, HTML, Jquery, etc).
I guess you could use this regex:
=hyperlink\("([#\\\w\s\(\)\-\.\/]+\\([^"]+))"\)
And this new replace:
<a href="file:///$1">$2</a>
I'm not sure how your regex was working, but I added the quotes in the regex and replaced the single quotes by double quotes in the replace. Revert those if need be.
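A short sketch of how this regex could be applied to a whole block of spreadsheet text in one call. The function name linkifyHyperlinks is made up for illustration, and the replacement string assumes the link target is group 1 (the full path) and the link text is group 2 (the file name).

```javascript
// Hypothetical helper: turn every =hyperlink("...") cell into an <a> element
// whose text is only the file name.
function linkifyHyperlinks(text) {
  // Group 1: the full path. Group 2: everything after the last backslash.
  const re = /=hyperlink\("([#\\\w\s()\-.\/]+\\([^"]+))"\)/g;
  return text.replace(re, '<a href="file:///$1">$2</a>');
}

// Backslashes are doubled here only because this is a JavaScript string literal.
const rows = '=hyperlink("\\\\share\\folder\\log\\2013\\13-05-13\\13-05-13.txt")';
console.log(linkifyHyperlinks(rows));
```

Because of the g flag, every occurrence in the input is replaced, so there is no need to loop over lines first.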
I'm trying to build a text formatter that will add p and br tags to text based on line breaks. I currently have this:
s.replace(/\n\n/g, "\n</p><p>\n");
Which works wonderfully for creating paragraph ends and beginnings. However, trying to find single line breaks (for the br tags) isn't working so well. Attempting to do a matched group replacement isn't working, as it ignores the parentheses and replaces the entire regex match:
s.replace(/\w(\n)\w/g, "<br />\n");
I've tried removing the g option (still replaced entire match, but only on first match). Is there another way to do this?
Thanks!
You can capture the parts you don't want to replace and include them in the replacement string with $ followed by the group number:
s.replace(/(\w)\n(\w)/g, "$1<br />\n$2");
See this section in the MDN docs for more info on referring to parts of the input string in your replacement string.
Catch the surrounding characters also:
s.replace(/(\w)(\n\w)/g, "$1<br />$2");
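A small sketch combining the paragraph replacement from the question with the single-line-break replacement. The function name formatText is hypothetical. It uses a look-ahead, (?=\w), instead of a second capture group; this is a swapped-in variant, chosen so that the character after the line break is not consumed and can also serve as the character before the next line break (which matters for one-character lines).

```javascript
// Hypothetical helper: add paragraph and <br /> markup based on line breaks.
function formatText(s) {
  return s
    // Blank line: close one paragraph and open the next.
    .replace(/\n\n/g, '\n</p><p>\n')
    // Single line break between word characters: keep the character before
    // it ($1) and only peek at the one after it.
    .replace(/(\w)\n(?=\w)/g, '$1<br />\n');
}

console.log(formatText('one\ntwo\n\nthree'));
```

The paragraph markers inserted by the first step are surrounded by newlines that touch < or >, not word characters, so the second step leaves them alone.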
<script type="text/javascript">
var haystackText = document.getElementById("navigation").innerHTML;
var matchText = 'Subscribe to RSS';
var replacementText = '<ul><li>Some Other Thing Here</li></ul>';
var replaced = haystackText.replace(matchText, replacementText);
document.getElementById("navigation").innerHTML = replaced;
</script>
I'm trying to replace a string of HTML code with something else. I cannot edit the code directly, so I'm using JavaScript to alter it.
If I use the above method on a regular string, such as just 'Subscribe to RSS', I can replace it fine. However, once I try to replace an HTML string, the code 'fails'.
Also, what if the HTML I wish to replace contains line breaks? How would I search for that?
<ul><li>\n</li></ul>
??
What should I be using or doing instead of this? Or am I just missing a small step? I did search around here, but maybe my keywords for the search weren't optimal to find a result that fit my situation...
Edit: Gonna mention, I'm writing this script in the footer of my page, well after the text I wish to replace, so it's not an issue of the script being written before what I want to overwrite to appear. :)
Currently you are using String.replace(substring, replacement) that will search for an exact match of the substring and replace it with the replacement e.g.
"Hello world".replace("world", "Kojichan") => "Hello Kojichan"
The problem with exact matches is that it doesn't allow anything else but exact matches.
To solve the problem, you'll have to start using regular expressions. When using regular expressions, you have to be aware of:
special characters such as ?, /, and \, which need to be escaped as \?, \/, and \\
multiline mode /regexp/m, which makes ^ and $ match at line breaks
global matching /regexp/g, if you want to replace more than one instance of the expression
quantifiers for allowing multiple instances of white space: \s+ for [1..n] white-space characters and \s* for [0..n] white-space characters
To use regular expression instead of substring matching you just need to change String.replace("substring", "replacement") to String.replace(/regexp/, "replacement") e.g.
"Hello world".replace(/world/, "Kojichan") => "Hello Kojichan"
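To make the points above concrete, here is a minimal sketch that escapes special characters and allows optional white space (including line breaks) between pieces of markup. Both function names, escapeRegExp and replaceAllowingWhitespace, are hypothetical helpers written for this example, not built-in APIs.

```javascript
// Hypothetical helper: escape regex metacharacters so a literal string
// can be embedded in a RegExp.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: replace every occurrence of the given pieces in order,
// allowing any amount of white space (\s*) between them.
function replaceAllowingWhitespace(html, parts, replacement) {
  const pattern = parts.map(escapeRegExp).join('\\s*');
  // A replacer function keeps "$" in the replacement from being interpreted.
  return html.replace(new RegExp(pattern, 'g'), () => replacement);
}

const html = '<div><ul>\n  <li>\n</li>\n</ul></div>';
console.log(replaceAllowingWhitespace(html, ['<ul>', '<li>', '</li>', '</ul>'], '<p>None</p>'));
```

Since \s already matches newline characters, the line breaks inside the markup are handled without the m flag.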
From MDN:
Note: If a <div>, <span>, or <noembed> node has a child text node that
includes the characters (&), (<), or (>), innerHTML returns these
characters as &amp;, &lt; and &gt; respectively. Use element.textContent
to get a correct copy of these text nodes' contents.
So since textContent (or innerText) won't get you the HTML, you'd have to modify your search string appropriately.
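One possible way to "modify your search string appropriately" is to apply the same three entity substitutions that the MDN note describes before searching in innerHTML. The function name escapeForInnerHTML is hypothetical, and this sketch covers only the three characters named in the note.

```javascript
// Hypothetical helper: rewrite a plain-text search string the way innerHTML
// serializes text nodes, so it can be found in the innerHTML string.
function escapeForInnerHTML(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

console.log(escapeForInnerHTML('Tom & Jerry <3'));
```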
You can use Regular Expressions.
I recommend using a regular expression. Note that ? and / are special characters in regular expressions. For global, multi-line matching, you need the g and m flags set on the regular expression.
Regular expression matching of HTML (other than plain text) that comes out of a web page is a bad idea and is troublesome to make work cross-browser (particularly in IE). The HTML that comes out of a web page does not always look the same as what was put in, because some browsers reconstitute the HTML and don't actually store what went in. Attributes can change order, quote marks can change or disappear, entities can change, etc.
If you want to modify whole tags, then you should directly access the DOM and operate on the actual objects in the page.
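As a sketch of what operating on the actual objects might look like for the plain-text case, the function below walks an element's child nodes and changes only text nodes, leaving tags and attributes untouched. The name replaceInTextNodes is hypothetical; it relies only on the standard childNodes, nodeType and nodeValue properties, and it replaces text with text (inserting new elements would need DOM creation calls instead).

```javascript
// Hypothetical helper: replace matchText with replacementText in every text
// node below `node`, without touching the markup around it.
function replaceInTextNodes(node, matchText, replacementText) {
  for (const child of Array.from(node.childNodes || [])) {
    if (child.nodeType === 3) { // 3 is Node.TEXT_NODE
      // split/join replaces every literal occurrence, no regex escaping needed.
      child.nodeValue = child.nodeValue.split(matchText).join(replacementText);
    } else {
      replaceInTextNodes(child, matchText, replacementText);
    }
  }
}

// In a browser this could be called as, for example:
// replaceInTextNodes(document.getElementById("navigation"), 'Subscribe to RSS', 'Some Other Thing Here');
```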
I want to wrap all text before, between, and after <span style="[^"]+">.*?</span> tags in a string with span tags (there are no other HTML tags, just simple text).
I have regular expression like this.
var span_re = /^(.+)(?=<span)|(?=<\/span>)(.+)(?=<span)|(?=<\/span>)(.+)$/g;
str = str.replace(span_re, '<span>$1</span>');
for this string
'foo<span style="text-decoration:underline;">bar</span>baz'
I got
'<span>foo</span><span style="text-decoration:underline;">bar<span></span>'
I want this:
'<span>foo</span><span style="text-decoration:underline;">bar</span><span>baz</span>'
I also tried using .+? and .*? instead of .+, and capturing the whole expression, with no result.
I don't need a parser: I'm not parsing XHTML and I don't have self-contained tags.
Parsing HTML using regex is seldom a good idea, particularly in the context of a web browser. Here's a simple example that gets what you want, using jQuery:
Even if that HTML isn't already inside the DOM, it is easy to wrap it in a dummy element:
var wrapper = $('<div />');
wrapper.html('foo<span style="text-decoration:underline;">bar</span>baz');
wrapper.contents()
  .filter(function () {
    return this.nodeType == 3; // select text nodes only
  })
  .wrap('<span />');
As a bonus, that will work well with other tags, and even if you have several <span> tags with free text between them.
Working example: http://jsbin.com/acigu5/
You may be overcomplicating it. If you know you may only have a single <span>, no other tags and no unescaped > signs, you can use this simple regex:
s = s.replace(/^[^<]+|[^>]+$/gi, '<span>$&</span>');
This regex finds text before the tag (from the beginning, not <) or after the tag (not >, until the end), and wraps them with a <span>. $& in JavaScript regex replace stands for the whole match, or group 0 (on other flavors that may be \0 or $0).
Note that (?=<\/span>) from your original regex is a look-ahead, not a look-behind (JavaScript had no look-behind when this was written; it was added in ES2018). That caused (.+) to match the closing tag and consume it, resulting in invalid HTML.
Working example: http://jsbin.com/acexu4/
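If there can be several <span> elements, one possible regex-only variant is to split the string on the spans and wrap whatever is left over. This is a sketch under the question's stated assumptions (only non-nested span tags, otherwise plain text); the function name wrapTextAroundSpans is made up for illustration.

```javascript
// Hypothetical helper: wrap every run of plain text outside <span>...</span>
// in its own <span>.
function wrapTextAroundSpans(s) {
  // Splitting on a capturing group keeps the matched spans in the result:
  // even indexes hold plain text, odd indexes hold the existing spans.
  const spanRe = /(<span\b[^>]*>[^]*?<\/span>)/;
  return s
    .split(spanRe)
    .map((part, i) => (i % 2 === 1 ? part : part && '<span>' + part + '</span>'))
    .join('');
}

console.log(wrapTextAroundSpans('foo<span style="text-decoration:underline;">bar</span>baz'));
```

Empty pieces (for example between two adjacent spans) are left empty rather than wrapped.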