I'm trying to build a text formatter that will add p and br tags to text based on line breaks. I currently have this:
s.replace(/\n\n/g, "\n</p><p>\n");
Which works wonderfully for creating paragraph ends and beginnings. However, finding single line breaks isn't working so well. Attempting a matched-group replacement isn't working either, as it ignores the parentheses and replaces the entire regex match:
s.replace(/\w(\n)\w/g, "<br />\n");
I've tried removing the g option (still replaced entire match, but only on first match). Is there another way to do this?
Thanks!
You can capture the parts you don't want to replace and include them in the replacement string with $ followed by the group number:
s.replace(/(\w)\n(\w)/g, "$1<br />\n$2");
See this section in the MDN docs for more info on referring to parts of the input string in your replacement string.
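As a minimal sketch of how the two replacements might be combined, assuming a made-up sample string (the variable names here are illustrative, not from the question):

```javascript
// Hypothetical sample input: two paragraphs, the first containing a single line break.
const input = "first line\nsecond line\n\nnew paragraph";

// Handle single line breaks first: (\w) on each side captures the neighbouring
// characters, and $1 / $2 put them back around the inserted <br />.
const withBreaks = input.replace(/(\w)\n(\w)/g, "$1<br />\n$2");

// Then turn blank lines into paragraph boundaries, as in the question.
const formatted = withBreaks.replace(/\n\n/g, "\n</p><p>\n");

console.log(formatted);
```

One thing to watch: each match consumes the character after the line break, so in input like "a\nb\nc" only the first break is converted. Replacing the second group with a lookahead, /(\w)\n(?=\w)/g with "$1<br />\n" as the replacement, avoids that.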
Catch the surrounding characters also:
s.replace(/(\w)(\n\w)/g, "$1<br />$2");
Because of some poor forward thinking when building my search database, I'm left with some links in the format of: (Google Homepage)[http://google.com]
I've been trying to mess with regex in JavaScript to convert the format above into a regular HTML link in the format of <a href="http://google.com">Google Homepage</a>.
I've been able to pick out the parentheses and brackets via regex, but am having trouble getting regex to replace the parentheses and brackets with HTML as appropriate. Thanks!
This is going to be pretty straightforward. Basically, just make two capture groups. One capture group will have the text inside the parentheses and the other will have the URL inside the square brackets.
\((.*?)\)\[(.*?)\]
#1 #2
Then, you can simply stick each captured part into your tag, like this:
<a href="\2">\1</a>
Here is a demo
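In JavaScript the replacement string uses $1 and $2 rather than \1 and \2. A small sketch of the idea, where toAnchor is a hypothetical helper name and the pattern uses negated character classes instead of .*? (my choice, not something the answer requires):

```javascript
// Hypothetical helper: converts every "(text)[url]" occurrence into an HTML anchor.
function toAnchor(text) {
  // Group 1: link text between ( and ); group 2: URL between [ and ].
  // The negated classes stop each group at its own closing delimiter.
  return text.replace(/\(([^)]+)\)\[([^\]]+)\]/g, '<a href="$2">$1</a>');
}

console.log(toAnchor("(Google Homepage)[http://google.com]"));
// <a href="http://google.com">Google Homepage</a>
```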
This works for me
var str1="(Google Homepage)[http://google.com]";
var pattern=/\((.*)\)\[(.*)\]/;
var str2=str1.replace(pattern,"<a href=\"$2\">$1</a>");
console.log(str2);
Considering that ( ) and [ ] define the boundaries, you can try:
\(([^\)]+)\).*\[([^\]]+)\]
\1 will be text and \2 will be link
I have this line of code
thePage.html(thePage.html().replace(/DECC([A-Z]{2}|[A-Z]{3})[A-Z]-[0-9]+-[0-9]+/g, '<a class="DeccDocumentId" onclick="TG.DECC.EDRMSLinks.redirectToDocument()">$1$2$3</a>'));
I want to replace the text found by the regex with a link that still uses that text. I've already attempted (most likely incorrectly) a backreference in the form of $1$2$3, but it's not working.
If it's any help, the text I'm trying to replace is
DECCMIA-1-1
DECCMIC-1-103
DECCFCSE-92-12
and it turns out to be like this
MI$2$3
MI$2$3
FCS$2$3
In the replacement string, $& refers to the entire match, while $1, $2, ... refer to the individual capture groups inside the match. Your expression has only one capture group:
/DECC([A-Z]{2}|[A-Z]{3})[A-Z]-[0-9]+-[0-9]+/g
([A-Z]{2}|[A-Z]{3}) // $1 refers to this capture group.
But you try to refer to three when replacing.
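A sketch of one way to apply this, using $& to reinsert the whole match. The sample text comes from the question; the onclick attribute is left out for brevity, and the group is made non-capturing since it is no longer referenced:

```javascript
// Sample document IDs taken from the question.
const text = "DECCMIA-1-1 DECCMIC-1-103 DECCFCSE-92-12";

// (?: ... ) groups the alternation without creating a numbered capture.
const pattern = /DECC(?:[A-Z]{2}|[A-Z]{3})[A-Z]-[0-9]+-[0-9]+/g;

// $& stands for the entire matched text, so the ID survives inside the link.
const linked = text.replace(pattern, '<a class="DeccDocumentId">$&</a>');

console.log(linked);
```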
My regex is for finding certain words in text, and not words inside elemental text.
REGEXP
RegExp('\\b([^<(.*?)>(.?+)<\/(.*?)>])(' + wregex.join('|') + ')\\b(?=\\W)
EXAMPLE
This is some text that should be looked through
though this text <code>Should not be looked at </code> and this text is ok to
look at
Here is the part of my regex that I am having trouble with, and what I intend it to do:
([^<(.*?)>(.?+)<\/(.*?)>]) should not match any text that starts with <element>, nor anything inside it, until the closing </element>
That's the most important part. I've tried multiple methods and I'm not sure whether this regex is possible. I don't want to match anything from a basic HTML opening tag until the closing tag appears; after that, searching should start again.
EDIT
I know that regex shouldn't be used to parse HTML; this is looking through TEXT.
Testing Example HERE
Assuming that the text you are searching over is correctly formed (as in, no tag mismatches) the following regex should work:
^([^<]*<([^>]*)>[^<]*</\2>)*[^<]*Your Text
This ensures that your text is outside of an open and closed set of tags by matching all open and closed sets before getting to your text.
It won't work for nested tags. Regex is incapable of parsing arbitrarily nested tags.
However, please remember, you should not parse html with regex
Why cram everything into a single regex? It can be as simple as this. Notice that I'm using [^] instead of ., to also match newlines.
string.replace(/<[^]+?<\/[^]+?>/g, '').match(/what i really want to find/gi)
And yes, this is prone to breakage, as any regex solution would be.
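To make the two-step idea concrete, here is a rough sketch on made-up sample text; the search word "text" is just an assumption for illustration. It shares the fragility mentioned above: nested or malformed tags will confuse it.

```javascript
// Hypothetical sample: one <code> element spanning a line break.
const sample =
  "This text is ok\nthough <code>this text is not\nok</code> and this text is ok";

// Step 1: drop every element together with its contents.
// [^] matches any character including newlines; g removes all elements, not just the first.
const outside = sample.replace(/<[^]+?<\/[^]+?>/g, "");

// Step 2: search only what is left. match() returns null when nothing is found.
const hits = outside.match(/\btext\b/gi) || [];

console.log(outside);
console.log(hits.length);
```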
<script type="text/javascript">
var haystackText = document.getElementById("navigation").innerHTML;
var matchText = 'Subscribe to RSS';
var replacementText = '<ul><li>Some Other Thing Here</li></ul>';
var replaced = haystackText.replace(matchText, replacementText);
document.getElementById("navigation").innerHTML = replaced;
</script>
I'm trying to replace a string of HTML code with something else. I cannot edit the code directly, so I'm using JavaScript to alter it.
If I use the above method on a regular string, such as just 'Subscribe to RSS', I can replace it fine. However, once I try to replace an HTML string, the code 'fails'.
Also, what if the HTML I wish to replace contains line breaks? How would I search for that?
<ul><li>\n</li></ul>
??
What should I be using or doing instead of this? Or am I just missing a small step? I did search around here, but maybe my keywords for the search weren't optimal to find a result that fit my situation...
Edit: Gonna mention, I'm writing this script in the footer of my page, well after the text I wish to replace, so it's not an issue of the script being written before what I want to overwrite to appear. :)
Currently you are using String.replace(substring, replacement) that will search for an exact match of the substring and replace it with the replacement e.g.
"Hello world".replace("world", "Kojichan") => "Hello Kojichan"
The problem with exact matches is that it doesn't allow anything else but exact matches.
To solve the problem, you'll have to start to use regular expressions. When using regular expression you have to be aware of
special characters such as ?, /, and \ that need to be escaped: \?, \/, \\
multiline mode /regexp/m
global matching if you want to replace more than one instance of the expression /regexp/g
quantifiers for allowing multiple instances of white space: \s+ for [1..n] white-space characters and \s* for [0..n] white-space characters.
To use regular expression instead of substring matching you just need to change String.replace("substring", "replacement") to String.replace(/regexp/, "replacement") e.g.
"Hello world".replace(/world/, "Kojichan") => "Hello Kojichan"
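For the case where the HTML to be replaced contains line breaks, a sketch along these lines might work. The haystack markup is invented for illustration, and the usual caveat applies that innerHTML may not return the markup exactly as it was written:

```javascript
// Hypothetical markup standing in for the navigation element's innerHTML.
const haystack = "<div><ul><li>\n  Subscribe to RSS\n</li></ul></div>";

// \s* tolerates any amount of whitespace (including line breaks) between the
// tags and the text; \/ escapes the slash inside the regex literal.
const pattern = /<ul>\s*<li>\s*Subscribe to RSS\s*<\/li>\s*<\/ul>/g;

const replaced = haystack.replace(pattern, "<ul><li>Some Other Thing Here</li></ul>");

console.log(replaced);
```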
From MDN:
Note: If a <div>, <span>, or <noembed> node has a child text node that
includes the characters (&), (<), or (>), innerHTML returns these
characters as &amp;, &lt; and &gt; respectively. Use element.textContent
to get a correct copy of these text nodes' contents.
So since textContent (or innerText) won't get you the HTML, you'd have to modify your search string appropriately.
You can use Regular Expressions.
I recommend using a regular expression. Notice that ? and / are special characters in regular expressions. For global multi-line matching, you need the g and m flags set on the regular expression.
Regular expression matching of HTML (other than plain text) that comes out of a web page is a bad idea and is troublesome to make work cross-browser (particularly in IE). The HTML that comes out of a web page does not always look the same as what was put in, because some browsers reconstitute the HTML and don't actually store what went in. Attributes can change order, quote marks can change or disappear, entities can change, etc.
If you want to modify whole tags, then you should directly access the DOM and operate on the actual objects in the page.
Because of the way that jQuery deals with script tags, I've found it necessary to do some HTML manipulation using regular expressions (yes, I know... not the ideal tool for the job). Unfortunately, it seems like my understanding of how captured groups work in JavaScript is flawed, because when I try this:
var scriptTagFormat = /<script .*?(src="(.*?)")?.*?>(.*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;" title="$2">$3</span>');
The script tags get replaced with the spans, but the resulting title attribute is blank. Shouldn't $2 match the content of the src attribute of a script tag?
Nesting of groups is irrelevant; their numbering is determined strictly by the positions of their opening parentheses within the regex. In your case, that means it's group #1 that captures the whole src="value" sequence, and group #2 that captures just the value part.
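A quick way to see the numbering, using a made-up tag (app.js is just a placeholder value):

```javascript
// Groups are numbered by the position of their opening parenthesis:
// group 1 opens first (the whole attribute), group 2 opens second (the value).
const m = /(src="(.*?)")/.exec('<script src="app.js">');

console.log(m[1]); // src="app.js"
console.log(m[2]); // app.js
```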
Try this:
/<script (?:(?!src).)*(?:src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
As stema wrote, the .*? matches too much. With the negative lookahead (?:(?!src).)* you will match only until a src attribute.
But actually in this case you could also just move the .*? into the optional part:
/<script (?:.*?src="(.*?)")?.*?>(.*?)<\/script>/ig
See here: rubular
The .*? matches too much because the following group is optional: the src attribute ends up being consumed by one of the surrounding .*? instead of by the group. If you remove the ? after your first group, it works.
Update: As @morja pointed out, your solution is to move the first .*? into the optional src part.
Just for completeness: /<script (?:.*?(src="(.*?)"))?.*?>(.*?)<\/script>/ig
You can see it here on rubular (corrected my link also)
If you don't want to use the content of the first capturing group, then make it a non capturing group using (?:)
/<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig
Then your wanted result is in $1 and $2.
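A sketch of that last pattern applied to the replacement from the question, with title="$1" and $2 in place of the original $2 and $3. The input markup is invented for illustration:

```javascript
const scriptTagFormat = /<script (?:.*?(?:src="(.*?)"))?.*?>(.*?)<\/script>/ig;

// Hypothetical input: one script tag with a src attribute, one without.
const html =
  '<script type="text/javascript" src="a.js">x()</script>' +
  '<script type="text/javascript">y()</script>';

// $1 is the src value (an empty string when the optional group did not match),
// $2 is the content between the tags.
const out = html.replace(
  scriptTagFormat,
  '<span class="script-placeholder" style="display:none;" title="$1">$2</span>');

console.log(out);
```

A caveat worth knowing: if a tag without src is followed on the same line by a tag with src, the lazy .*? inside the optional group can run across the first tag to reach the later src, merging both tags into one match.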
Could you post the html you are retrieving? Your code works fine in a simple example: jsfiddle (warning: alert box)
My first guess is that one of your script tags does not have a src meaning you are left with a single capture group (the script contents).
I'm thinking that regular expressions by themselves can't do exactly what I'm looking for, so here's my modification to work around the problem:
var scriptTagFormat = /<script\s+((.*?)="(.*?)")*\s*>(.*?)<\/script>/ig;
html = html.replace(
scriptTagFormat,
'<span class="script-placeholder" style="display:none;" $1>$4</span>');
Before, I wanted to avoid setting non-standard attributes on the replacement span. This code blindly copies all attributes instead. Luckily, the non-standard attributes aren't stripped out of the DOM when I insert the HTML, so it will work for my purposes.