Javascript: Adding '<' seems to break html - javascript

I have a scenario where I have strings that contain the character '<'. For example:
<a href='#'>foobar</a> foo <bar
When I add such a string to the DOM, using either native JavaScript (innerHTML) or jQuery (.html()), everything from the < onward gets stripped off:
<a href='#'>foobar</a> foo
I tried escaping the '<', but that escapes the <a> as well. Anchors are not the only HTML tags that may be present, so how can I encode only the '<' characters that are not part of an HTML tag?
An example of the problem - http://jsfiddle.net/ovbg3p8m/
Please note i get the entire HTML from somewhere else and have to work with that.

You can use &lt; instead for a less-than symbol.

Use &lt;, not <.

Try using the appropriate entry from the HTML name column here.

&lt; is the escaping you want for this symbol.
The key is to escape what needs to be escaped and not the tags, since that is HTML you want to preserve.
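Since the question's HTML arrives pre-built, here is a rough heuristic sketch (not taken from any answer here, and no substitute for a real parser): escape a `<` unless it looks like the start of a tag, meaning it is followed by a letter, `/` or `!` and then reaches a `>` before any other `<`. The function name `escapeStrayLessThan` is hypothetical.

```javascript
// Heuristic sketch: turn '<' into '&lt;' unless it appears to open a tag,
// i.e. it is followed by a letter, '/' or '!' and then a '>' with no '<' or '>' in between.
// Limitations: a stray "<bar>" still looks like a tag, and a '<' inside an
// attribute value makes a real tag look unterminated.
function escapeStrayLessThan(html) {
  return html.replace(/<(?![a-zA-Z\/!][^<>]*>)/g, '&lt;');
}

var input = "<a href='#'>foobar</a> foo <bar";
console.log(escapeStrayLessThan(input));
// <a href='#'>foobar</a> foo &lt;bar
```

The result can then be passed to innerHTML or .html() without the trailing text being swallowed.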

Let's say you build your HTML like this:
var text = "foo <bar";
var link_text = "foobar";
var html = '<a href="#">' + link_text + '</a> ' + text;
$(".node").html(html);
If you escape just text and link_text (but not html) you're good.
The distinction between what is html and what isn't is very important.
In PHP there is the function htmlspecialchars, which replaces &, ", ', < and > with their escaped forms: &amp; &quot; &apos; &lt; &gt;. This is sufficient to avoid breaking HTML.
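JavaScript has no built-in equivalent of htmlspecialchars, so here is a minimal hand-written sketch (the name `escapeHtml` is hypothetical) applied to the link-building example: only the text pieces are escaped, while the markup you write yourself is left alone. It uses the numeric reference `&#39;` for the single quote.

```javascript
// Minimal htmlspecialchars-style escaper (hypothetical helper, not a built-in).
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;') // first, so the entities added below are not re-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Escape only the text pieces; the <a> markup is written by us and stays as-is.
var text = 'foo <bar';
var link_text = 'foobar';
var html = '<a href="#">' + escapeHtml(link_text) + '</a> ' + escapeHtml(text);
console.log(html); // <a href="#">foobar</a> foo &lt;bar
```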

You have to escape the <:
var text = 'foobar foo &lt;bar foobar';
Find more here: Escape html

Related

quotation marks within quotation marks syntax [duplicate]

I'm outputting values from a database (it isn't really open to public entry, but it is open to entry by a user at the company -- meaning, I'm not worried about XSS).
I'm trying to output a tag like this:
<a href="#" onclick="DoEdit('DESCRIPTION');">Click Me</a>
DESCRIPTION is actually a value from the database that is something like this:
Prelim Assess "Mini" Report
I've tried replacing " with \", but no matter what I try, Firefox keeps chopping off my JavaScript call after the space after the word Assess, and it is causing all sorts of issues.
I must be missing the obvious answer, but for the life of me I can't figure it out.
Anyone care to point out my idiocy?
Here is the entire HTML page (it will be an ASP.NET page eventually, but in order to solve this I took out everything except the problem code):
<html>
<body>
<a href="#" onclick="DoEdit('Preliminary Assessment \"Mini\"'); return false;">edit</a>
</body>
</html>
You need to escape the string you are writing out into DoEdit to scrub out the double-quote characters. They are causing the onclick HTML attribute to close prematurely.
Using the JavaScript escape character, \, isn't sufficient in the HTML context. You need to replace the double quote with the proper XML entity representation, &quot;.
&quot; would work in this particular case, as suggested before me, because of the HTML context.
However, if you want your JavaScript code to be independently escaped for any context, you could opt for the native JavaScript encoding:
' becomes \x27
" becomes \x22
So your onclick would become: DoEdit('Preliminary Assessment \x22Mini\x22');
This would work for example also when passing a JavaScript string as a parameter to another JavaScript method (alert() is an easy test method for this).
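A small sketch of that JavaScript-level encoding (the helper name `jsStringBody` is hypothetical). Backslashes are doubled first so an existing backslash cannot combine with what follows it; because the output contains no raw quote characters, it can close neither the JS string nor a quoted onclick attribute.

```javascript
// Encode a value for use inside a quoted JavaScript string literal,
// using \x27 / \x22 instead of raw quote characters.
function jsStringBody(s) {
  return String(s)
    .replace(/\\/g, '\\\\') // escape existing backslashes first
    .replace(/'/g, '\\x27')
    .replace(/"/g, '\\x22');
}

var description = 'Preliminary Assessment "Mini"';
var onclick = "DoEdit('" + jsStringBody(description) + "');";
console.log(onclick); // DoEdit('Preliminary Assessment \x22Mini\x22');
```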
I am referring you to the duplicate Stack Overflow question, How do I escape a string inside JavaScript code inside an onClick handler?.
<html>
<body>
<a href="#" onclick="DoEdit('Preliminary Assessment &quot;Mini&quot;'); return false;">edit</a>
</body>
</html>
Should do the trick.
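Folks, there is already the unescape function in JavaScript. Note, though, that in the example below the \" sequences are resolved by the string literal itself; unescape (a deprecated legacy function) only decodes %XX and %uXXXX sequences: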
<script type="text/javascript">
var str="this is \"good\"";
document.write(unescape(str))
</script>
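A quick way to check what unescape actually does, runnable in Node or a browser console (unescape is a deprecated legacy global; encodeURIComponent/decodeURIComponent are the modern pair for URI encoding):

```javascript
var str = "this is \"good\"";

// The \" escapes were already consumed by the string literal,
// so str contains no backslashes at all...
console.log(str.indexOf('\\'));     // -1
// ...and unescape() leaves it unchanged:
console.log(unescape(str) === str); // true

// What unescape() really decodes are %XX sequences:
console.log(unescape('%22good%22')); // "good"
```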
The problem is that HTML doesn't recognize the JavaScript escape character. You could work around that by using single quotes for the HTML attribute and double quotes for the JavaScript string inside the onclick:
<a href="#" onclick='DoEdit("Preliminary Assessment \"Mini\""); return false;'>edit</a>
This is how I do it, basically str.replace(/"/g, '\\"'):
var display = document.getElementById('output');
var str = 'class="whatever-foo__input" id="node-key"';
display.innerHTML = str.replace(/"/g, '\\"');
//will return class=\"whatever-foo__input\" id=\"node-key\"
<span id="output"></span>
If you're assembling the HTML in Java, you can use this nice utility class from Apache commons-lang to do all the escaping correctly:
org.apache.commons.lang.StringEscapeUtils: Escapes and unescapes Strings for Java, Java Script, HTML, XML, and SQL.
The code below uses a regular expression to check that the user-entered string is a comma-separated list of single-quoted values, while still allowing single quote(s) inside a value as long as they are escaped.
To include a single quote in a value, enter a backslash followed by a single quote, like \', as part of the string. I used the jQuery Validation plugin for this example; adapt it as needed.
Valid String Examples:
'Hello'
'Hello', 'World'
'Hello','World'
'Hello','World',' '
'It\'s my world', 'Can\'t enjoy this without me.', 'Welcome, Guest'
HTML:
<tr>
<td>
<label class="control-label">
String Field:
</label>
<div class="inner-addon right-addon">
<input type="text" id="stringField"
name="stringField"
class="form-control"
autocomplete="off"
data-rule-required="true"
data-msg-required="Cannot be blank."
data-rule-commaSeparatedText="true"
data-msg-commaSeparatedText="Invalid comma separated value(s).">
</div>
</td>
</tr>
JavaScript:
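/**
 * Validates a comma-separated list of single-quoted strings.
 * @param {string} value - the current field value
 * @param {HTMLElement} element - the input element being validated
 */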
jQuery.validator.addMethod('commaSeparatedText', function(value, element) {
if (value.length === 0) {
return true;
}
var expression = new RegExp("^((')([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.\\-_\\[\\]\\)\\(]+([^\'\\\\]*(?:\\\\.[^\'\\\\])*)('))(((,)|(,\\s))(')([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.\\-_\\[\\]\\)\\(]+([^\'\\\\]*(?:\\\\.[^\'\\\\])*)('))*$");
return expression.test(value);
}, 'Invalid comma separated string values.');
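To try the expression outside the plugin, the same pattern string can be compiled on its own and run against the sample values (a quick sketch; the two invalid strings at the end are just illustrations):

```javascript
// Same pattern string as in the validator method above.
var expression = new RegExp("^((')([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.\\-_\\[\\]\\)\\(]+([^\'\\\\]*(?:\\\\.[^\'\\\\])*)('))(((,)|(,\\s))(')([^\'\\\\]*(?:\\\\.[^\'\\\\])*)[\\w\\s,\\.\\-_\\[\\]\\)\\(]+([^\'\\\\]*(?:\\\\.[^\'\\\\])*)('))*$");

var samples = [
  "'Hello'",
  "'Hello', 'World'",
  "'Hello','World'",
  "'Hello','World',' '",
  "'It\\'s my world', 'Can\\'t enjoy this without me.', 'Welcome, Guest'",
  "Hello",  // not quoted: should fail
  "'It's'"  // unescaped quote inside the value: should fail
];

samples.forEach(function (s) {
  console.log(expression.test(s), s);
});
```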
Here is a sample using jQuery:
var descr = 'test"inside"outside';
$(function(){
$("#div1").append('<a href="#" onclick="DoEdit(descr);">Click Me</a>');
});
function DoEdit(desc)
{
alert ( desc );
}
And this works in Internet Explorer and Firefox.
You can copy those two functions (listed below), and use them to escape/unescape all quotes and special characters. You don't have to use jQuery or any other library for this.
function escape(s) {
return ('' + s)
.replace(/\\/g, '\\\\')
.replace(/\t/g, '\\t')
.replace(/\n/g, '\\n')
.replace(/\u00A0/g, '\\u00A0')
.replace(/&/g, '\\x26')
.replace(/'/g, '\\x27')
.replace(/"/g, '\\x22')
.replace(/</g, '\\x3C')
.replace(/>/g, '\\x3E');
}
function unescape(s) {
s = ('' + s)
.replace(/\\x3E/g, '>')
.replace(/\\x3C/g, '<')
.replace(/\\x22/g, '"')
.replace(/\\x27/g, "'")
.replace(/\\x26/g, '&')
.replace(/\\u00A0/g, '\u00A0')
.replace(/\\n/g, '\n')
.replace(/\\t/g, '\t');
return s.replace(/\\\\/g, '\\');
}
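One caveat: because unescape above runs its replacements one after another, an input that literally contains a backslash followed by, say, `x22` does not survive a round trip (escape turns `\x22` into `\\x22`, and the `\x22` rule then matches inside it). Below is a sketch of a single-pass decoder that avoids this; the functions are renamed `jsEscape`/`jsUnescape` here (hypothetical names) so the built-in global `escape`/`unescape` are not shadowed.

```javascript
// Encoder: same replacements as escape() above, just renamed.
function jsEscape(s) {
  return ('' + s)
    .replace(/\\/g, '\\\\')
    .replace(/\t/g, '\\t')
    .replace(/\n/g, '\\n')
    .replace(/\u00A0/g, '\\u00A0')
    .replace(/&/g, '\\x26')
    .replace(/'/g, '\\x27')
    .replace(/"/g, '\\x22')
    .replace(/</g, '\\x3C')
    .replace(/>/g, '\\x3E');
}

// Decoder: one left-to-right pass, so an escaped backslash (two backslash
// characters) is consumed before it can be mistaken for the start of another sequence.
var JS_UNESCAPES = {
  '\\': '\\', t: '\t', n: '\n', u00A0: '\u00A0',
  x26: '&', x27: "'", x22: '"', x3C: '<', x3E: '>'
};
function jsUnescape(s) {
  return ('' + s).replace(/\\(\\|t|n|u00A0|x26|x27|x22|x3C|x3E)/g, function (m, code) {
    return JS_UNESCAPES[code];
  });
}

var tricky = 'path\\x22 "q" it\'s <b> & \t\n\u00A0 end\\';
console.log(jsUnescape(jsEscape(tricky)) === tricky); // true
```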
Escape whitespace as well. It sounds to me like Firefox is assuming three arguments instead of one. &nbsp; is the non-breaking space character. Even if it's not the whole problem, it may still be a good idea.
You need to escape quotes with double backslashes.
This fails (produced by PHP's json_encode):
<script>
var jsonString = '[{"key":"my \"value\" "}]';
var parsedJson = JSON.parse(jsonString);
</script>
This works:
<script>
var jsonString = '[{"key":"my \\"value\\" "}]';
var parsedJson = JSON.parse(jsonString);
</script>
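The reason is that the JavaScript string literal is parsed before JSON.parse ever sees the text: `\"` inside a single-quoted JS string just becomes `"`, which leaves invalid JSON, while `\\"` leaves a backslash in place for the JSON parser. A quick check:

```javascript
var broken = '[{"key":"my \"value\" "}]';
console.log(broken);  // [{"key":"my "value" "}]  (backslashes already gone)

var fixed = '[{"key":"my \\"value\\" "}]';
console.log(fixed);   // [{"key":"my \"value\" "}]
console.log(JSON.parse(fixed)[0].key); // my "value"

try {
  JSON.parse(broken);
} catch (e) {
  console.log('broken JSON:', e.name); // broken JSON: SyntaxError
}
```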
You can use the built-in escape() and unescape() functions (these are deprecated global JavaScript functions, not jQuery methods), like below:
Use escape(str) to escape the string and recover it again using unescape(str_esc).

Javascript: replace() all but only outside html tags

I have an autocomplete form, and when showing the results matching the user's search string, I want to highlight the search string itself. I plan to do this by wrapping every occurrence of the search string in a tag such as <strong>, or a <span> with a given class. The problem is that with a regex, things break if the pattern occurs inside an HTML tag.
For instance
var searchPattern = 'pa';
var originalString = 'The pattern to <span class="something">be replaced is pa but only outside the html tag</span>';
var regEx = new RegExp(searchPattern, "gi")
var output = originalString.replace(regEx, "<strong>" + searchPattern + "</strong>");
alert(output);
(Demo: http://jsfiddle.net/cumufLm3/7/ )
This also replaces the occurrence of "pa" inside the tag
<span class="something">
which breaks the code. I'm not sure how to deal with this. I've checked various similar questions, and I understand that in general I shouldn't use regular expressions to parse HTML. But is there a quick way to parse the HTML string, alter the text of each node, and "rebuild" the string with the altered text?
Of course I suppose I could use $.parseHTML(), iterate over each node, and somehow rewrite the string, but this seems to me to be too complex and prone to errors.
Is there a smart way to parse the html string somehow to tell "do this only outside of html tags"?
Please note that the content of the tag itself must be handled. So, in my example above, replace() should also act on the part "be replaced is pa but only outside the html tag".
Any ideas for either a regular expression robust enough to deal with this or (better, I suppose) a way to elegantly handle the text parts within the HTML string?
Your code should look like this:
var searchWord = 'pa';
var originalString = 'The pattern to <span class="something">be replaced is pa but only outside the html tag</span>';
var regEx = new RegExp("(" + searchWord + ")(?!([^<]+)?>)", "gi");
var output = originalString.replace(regEx, "<strong>$1</strong>");
alert(output);
Source: http://pureform.wordpress.com/2008/01/04/matching-a-word-characters-outside-of-html-tags/
Parse the HTML, find all the text nodes in it, and do the replacement in each of them. If you are using jQuery, you can do this by passing the snippet to $(), which parses it into a document fragment; you can then step over all the elements and do the replacement on their text.
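Along the lines of the "text only" idea, but without a DOM, here is a sketch that splits the string on tags and runs the replacement only on the pieces between them. It assumes tags are well-formed and contain no `>` inside attribute values; the helper names `escapeRegExp` and `highlightOutsideTags` are hypothetical. Escaping the search term matters because it comes from user input, and `$1` keeps the original capitalization of each match.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every occurrence of `term` in <strong>, but only outside tags.
function highlightOutsideTags(html, term) {
  var termRe = new RegExp('(' + escapeRegExp(term) + ')', 'gi');
  // split() with a capturing group keeps the tags in the array,
  // so even indexes are text and odd indexes are tags.
  return html.split(/(<[^>]*>)/).map(function (part, i) {
    return i % 2 === 1 ? part : part.replace(termRe, '<strong>$1</strong>');
  }).join('');
}

var originalString = 'The pattern to <span class="something">be replaced is pa but only outside the html tag</span>';
console.log(highlightOutsideTags(originalString, 'pa'));
// The <strong>pa</strong>ttern to <span class="something">be replaced is <strong>pa</strong> but only outside the html tag</span>
```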

Regular expression in javascript to match outside of XML tags

I want to find all matches of "a" in <span class="get">habbitant morbi</span> triastbbitique, except an "a" inside tags. The matches I want are marked with ** below:
<span class="get">h*a*bbit*a*nt morbi</span> tri*a*stbbitique.
If I find them, I want to replace them while preserving the original tags.
This expression doesn't work:
var variable = "a";
var reg = new RegExp("[^<]."+variable+".[^>]$",'gi');
I would recommend not using a regular expression to parse HTML; it's not a regular grammar, and you will experience pain in all but the simplest cases.
Your question is still a bit unclear, but let me try rephrasing to see if I have it right:
You'd like to get all matches of a given string in a HTML document, except for matches in <tag> bodies?
Assuming you're using jQuery or similar:
// Let the browser parse it for you:
var container = document.createElement('div');
container.innerHTML = '<span class="get">habbitant morbi</span> triastbbitique';
var doc_text = $(container).text();
// And then you can just regex away normally:
doc_text.match(/a/gi);
(Even better would be to use DOMParser, but that doesn't have wide browser support yet)
If you're in Node, look for a library that helps you parse HTML nodes (like jsdom), and then just pull out all the text nodes.
Note that this question isn't really about parsing; it is about lexing, which regexes are regularly and properly used for.
If you want to go with regex there are a couple of ways you could do this.
A simple lookahead hack like:
a(?![^<>]*>)
Note that this won't properly handle < and > that are quoted inside tags or unescaped outside of tags.
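Applied to the string from the question (a quick sketch; `$&` in the replacement string inserts the matched text):

```javascript
var html = '<span class="get">habbitant morbi</span> triastbbitique';
// Skip any "a" that is followed by a ">" with no "<" or ">" in between,
// i.e. an "a" that sits inside a tag.
var marked = html.replace(/a(?![^<>]*>)/gi, '*$&*');
console.log(marked);
// <span class="get">h*a*bbit*a*nt morbi</span> tri*a*stbbitique
```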
A full-blown tokenizer of the form:
(expression for tags|comments|etc.)|(stuff outside those that I'm interested in)
replaced with a function that does different things depending on which part matched: if $1 matched, replace it with itself; if $2 matched, replace it with *$2*.
The full tokenizer approach is of course not a trivial task; the spec isn't small.
But if simplifying to only match the basic tags, ignore CDATA, comments, script/style tags, etc, you could use the following:
var str = '<span class="a <lal> a" attr>habbitant 2 > morbi. 2a < 3a</span> triastbbitique';
var re = /(<[a-z\/](?:"[^"]*"|'[^']*'|[^'">]+)*>)|(a)/gi;
var res = str.replace(re, function(m, tag, a){
return tag ? tag : "*" + a + "*";
});
Result:
<span class="a <lal> a" attr>h*a*bbit*a*nt 2 > morbi. 2*a* < 3*a*</span> tri*a*stbbitique
This handles messy tags, quotes, and unescaped < and > in the HTML.
A couple of examples of tokenizing HTML tags with regex (which should translate fine to JS regex):
Remove on* JS event attributes from HTML tags
Regex to allow only set of HTML Tags and Attributes

Javascript Regex Replace HTML Tags

I'm having a lot of difficulty with regex.
Here's what I am trying to do:
text<div> text </div><div> text </div><div> text </div>
to turn it in to
text<br> text<br>text<br>text
I've tried:
newhtml = newhtml.replace(/\<div>/g,'<br>');
newhtml = newhtml.replace(/\</div>/g,' ');
but this gives the wrong output. Does jquery provide a better way of doing this?
That's because you're escaping the wrong thing: only the forward slash needs to be escaped.
newhtml = newhtml.replace(/<div>/g,'<br>');
newhtml = newhtml.replace(/<\/div>/g,' ');
Yes, you are correct: jQuery does provide a better way of doing this.
An interesting read first.
An easy, elegant solution to your specific problem:
$('div').replaceWith(function(){
return "<br>"+$(this).html();
});
Don't use regexes if you don't need them; just replace string literals.
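text.replaceAll("<div>", "<br>").replaceAll("</div>", "");
Note: with a string pattern, replace() only replaces the first occurrence, hence replaceAll() (ES2021). This solution applies exactly to this scenario; I don't normally have anything against using regular expressions.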
This should do the job:
text.replace(/(<\/?\w+?>)\s*?(<\/?\w+?>)|(<\/?\w+?>)/g,'<br>')
This only works if the tags have no attributes; it won't match something like <div id="foo1">.
You do not need to escape < as you did in your example, but you do need to escape /.
A simple way to do this is the following:
$('.container').html(function(i, html) {
return html.replace(/<(|\/)div>/g, function(match) {
return match == '<div>' ? '<br>' : '';
});
});
/<(|\/)div>/: Matches <div> or </div>.
Note: .container is where your html is placed.
One-liner using jQuery:
newhtml = $(newhtml).text().split(' ').join('<br/>');
You can achieve this using a simple RegExp
output = inputText.replace(/<\w{0,}\W{0,}>|<\W{0,}\w{1,}>/ig, "With whatever you want it to be replaced with")
Or you can do this
String.prototype.replaceTags = function( replacementText )
{
var x = new RegExp( "(" + replacementText + ")+" , "ig");
return this
.replace( /<\w{0,}\W{0,}>|<\W{0,}\w{1,}>/ig, replacementText )
.replace( x, replacementText )
}
And then call it directly on the String as follows
"text<div> text </div><div> text </div><div> text </div>".replaceTags( "<br>" )
You'll get this -- "text<br> text <br> text <br> text <br>"
This searches for substrings that begin with "<", contain a tag name such as div, p or br (optionally preceded by "/" for a closing tag), and end with ">". The i (ignore case) flag helps when you are not sure whether the element name is written in upper or lower case.

jquery / javascript: regex to replace instances of an html tag

I'm trying to take some parsed XML data, search it for instances of the <font> tag, and replace that tag (and anything that may be inside the font tag) with a simple <span> tag.
This is how I've been doing my regexes:
var emailReg = /^([\w-\.]+@([\w-]+\.)+[\w-]{2,4})?$/; //Test against valid email
console.log('regex: ' + emailReg.test(testString));
and so I figured the font regex would be something like this:
var fontReg = /'<'+font+'[^><]*>|<.'+font+'[^><]*>','g'/;
console.log('regex: ' + fontReg.test(testString));
but that isn't working. Does anyone know a way to do this, or what I might be doing wrong in the code above?
I think namuol's answer will serve you better than any RegExp-based solution, but I also think the RegExp deserves some explanation.
JavaScript doesn't allow for interpolation of variable values in RegExp literals.
The quotations become literal character matches and the addition operators become 1-or-more quantifiers. So, your current regex becomes capable of matching these:
# left of the pipe `|`
'<'font'>
'<''''fontttt'>
# right of the pipe `|`
<#'font'>','g'
<#''''fontttttt'>','g'
But, it will not match these:
<font>
</font>
To inject a variable value into a RegExp, you'll need to use the constructor and string concat:
var fontReg = new RegExp('<' + font + '[^><]*>|<.' + font + '[^><]*>', 'g');
On the other hand, if you meant for literal font, then you just needed:
var fontReg = /<font[^><]*>|<.font[^><]*>/g;
Also, each of those can be shortened by using .?, allowing the halves to be combined:
var fontReg = new RegExp('<.?' + font + '[^><]*>', 'g');
var fontReg = /<.?font[^><]*>/g;
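Putting the constructor approach to work on a sample string (a sketch; the `(/?)` capture is a variation on the regexes above that remembers whether the tag was a closing one, so `<font ...>` becomes `<span>` and `</font>` becomes `</span>`; the `i` flag also catches `<FONT>`):

```javascript
var font = 'font';
// '/?' instead of '.?' so only an optional slash is allowed before the name.
var fontReg = new RegExp('<(/?)' + font + '[^><]*>', 'gi');

var input = '<p><font color="red">Hi</font> there</p>';
console.log(input.replace(fontReg, '<$1span>'));
// <p><span>Hi</span> there</p>
```

Note that this drops the font tag's attributes, which matches the goal of a "simple" span.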
If I understand your problem correctly, this should replace all font tags with simple span tags using jQuery:
$('font').replaceWith(function () {
return $('<span>').append($(this).contents());
});
Here's a working fiddle: http://jsfiddle.net/RhLmk/2/
