I want to initialize a Javascript variable from some JSON (generated via Jackson) in my JSPX, something like this:
<script>
var x = <c:out value="${myJson}" />;
</script>
But the output I get looks like:
<script>
var x = {"foo":"bar"};
</script>
I see what you did there, HTML-escaping the string. Obviously, I can't leave it completely unescaped because angle brackets in the data could break the page. But I don't really need all the quotes to be escaped, since I'm not putting the JSON within an attribute value, do I?
Now, this looks like it would be a perfectly valid way to write a script in HTML, just needlessly complicated (like, say, replacing spaces with ). As it turns out, it works just fine in XHTML, but with an HTML content type, I get an error, both in Firefox and IE. I'm not sure of the rationale, but that's how it is.
So, what's the best approach here? Do I really want to simply escape angle brackets but not escape double quotes, or are there any other gotchas? Is there a tag out there that would replace c:out (I know there are Spring tags for escaping Javascript, but that's still not the right kind of escaping)? How do people get this to work?
BTW, yes, I could make a separate AJAX call, but an extra round trip just to work around this problem seems silly.
UPDATE
I had a lot to learn about CDATA vs. PCDATA and how HTML is different from XHTML. Here I thought JSPX would make polyglot markup easy, but it turns out to be, as someone put it, a big ball of nasty.
For HTML, the <script> element has a CDATA content model (not to be confused with CDATA sections), which means nothing can be escaped, but </ must be absolutely avoided.
In the special case of JSON, where end tags can only occur within a quoted string, this therefore means the safe way to escape is to use Javascript (rather than HTML) escaping and replace </ with <\/.
For XHTML (if you care about such things) on the other hand, you just XML-escape everything as usual (& becomes &, etc.) and it all works beautifully. A compatible solution would have to use CDATA with guarding comments (<!--/*--><![CDATA[/*><!--*/ etc.) around the entire <script> body and then escape any occurrences of ]]> within the JSON; furthermore, I'd still escape </ too just to be safe. Big ball of nasty, indeed, but at least it can be automated.
set escapeXml=false
<c:out value="${myJson}" escapeXml="false"/>
OK, answering my own question here, after much research and no real help.
Based my "update" above, the most straightforward way targeting HTML is just:
<script>
var x = ${fn:replace(myJson, "\</", "\<\\/")};
</script>
Ugly but simple.
This will not yield valid XML or XHTML, unfortunately. If you really need that, the original c:out will work fine, though it will not yield valid HTML. And if you really need a single solution to work on both, you probably need a custom taglib (or TAGX) that will either switch from the content type or do all of the following:
wrap the script body in a comment-guarded CDATA section
replace each </ with <\/
replace each ]]> with ]]\>
Related
I would like to store a JSON's contents in a HTML document's source, inside a script tag.
The content of that JSON does depend on user submitted input, thus great care is needed to sanitise that string for XSS.
I've read two concept here on SO.
1. Replace all occurrences of the </script tag into <\/script, or replace all </ into <\/ server side.
Code wise it looks like the following (using Python and jinja2 for the example):
// view
data = {
'test': 'asdas</script><b>as\'da</b><b>as"da</b>',
}
context_dict = {
'data_json': json.dumps(data, ensure_ascii=False).replace('</script', r'<\/script'),
}
// template
<script>
var data_json = {{ data_json | safe }};
</script>
// js
access it simply as window.data_json object
2. Encode the data as a HTML entity encoded JSON string, and unescape + parse it in client side. Unescape is from this answer: https://stackoverflow.com/a/34064434/518169
// view
context_dict = {
'data_json': json.dumps(data, ensure_ascii=False),
}
// template
<script>
var data_json = '{{ data_json }}'; // encoded into HTML entities, like < > &
</script>
// js
function htmlDecode(input) {
var doc = new DOMParser().parseFromString(input, "text/html");
return doc.documentElement.textContent;
}
var decoded = htmlDecode(window.data_json);
var data_json = JSON.parse(decoded);
This method doesn't work because \" in a script source becames " in a JS variable. Also, it creates a much bigger HTML document and also is not really human readable, so I'd go with the first one if it doesn't mean a huge security risk.
Is there any security risk in using the first version? Is it enough to sanitise a JSON encoded string with .replace('</script', r'<\/script')?
Reference on SO:
Best way to store JSON in an HTML attribute?
Why split the <script> tag when writing it with document.write()?
Script tag in JavaScript string
Sanitize <script> element contents
Escape </ in script tag contents
Some great external resources about this issue:
Flask's tojson filter's implementation source
Rail's json_escape method's help and source
A 5 year long discussion in Django ticket and proposed code
Here's how I dealt with the relatively minor part of this issue, the encoding problem with storing JSON in a script element. The short answer is you have to escape either < or / as together they terminate the script element -- even inside a JSON string literal. You can't HTML-encode entities for a script element. You could JavaScript-backslash-escape the slash. I preferred to JavaScript-hex-escape the less-than angle-bracket as \u003C.
.replace('<', r'\u003C')
I ran into this problem trying to pass the json from oembed results. Some of them contain script close tags (without mentioning Twitter by name).
json_for_script = json.dumps(data).replace('<', r'\u003C');
This turns data = {'test': 'foo </script> bar'}; into
'{"test": "foo \\u003C/script> bar"}'
which is valid JSON that won't terminate a script element.
I got the idea from this little gem inside the Jinja template engine. It's what's run when you use the {{data|tojson}} filter.
def htmlsafe_json_dumps(obj, dumper=None, **kwargs):
"""Works exactly like :func:`dumps` but is safe for use in ``<script>``
tags. It accepts the same arguments and returns a JSON string. Note that
this is available in templates through the ``|tojson`` filter which will
also mark the result as safe. Due to how this function escapes certain
characters this is safe even if used outside of ``<script>`` tags.
The following characters are escaped in strings:
- ``<``
- ``>``
- ``&``
- ``'``
This makes it safe to embed such strings in any place in HTML with the
notable exception of double quoted attributes. In that case single
quote your attributes or HTML escape it in addition.
"""
if dumper is None:
dumper = json.dumps
rv = dumper(obj, **kwargs) \
.replace(u'<', u'\\u003c') \
.replace(u'>', u'\\u003e') \
.replace(u'&', u'\\u0026') \
.replace(u"'", u'\\u0027')
return Markup(rv)
(You could use \x3C instead of \u003C and that would work in a script element because it's valid JavaScript. But might as well stick to valid JSON.)
First of all, your paranoia is well founded.
an HTML-parser could be tricked by a closing script tag (better assume by any closing tag)
a JS-parser could be tricked by backslashes and quotes (with a really bad encoder)
Yes, it would be much "safer" to encode all characters that could confuse the different parsers involved. Keeping it human-readable might be contradicting your security paradigm.
Note: The result of JSON String encoding should be canoncical and OFC, not broken, as in parsable. JSON is a subset of JS and thus be JS parsable without any risk. So all you have to do is make sure the HTML-Parser instance that extracts the JS-code is not tricked by your user data.
So the real pitfall is the nesting of both parsers. Actually, I would urge you to put something like that into a separate request. That way you would avoid that scenario completely.
Assuming all possible styles and error-corrections that could happen in such a parser it might be that other tags (open or close) might achieve a similar feat.
As in: suggesting to the parser that the script tag has ended implicitly.
So it is advisable to encode slash and all tag braces (/,<,>), not just the closing of a script-tag, in whatever reversible method you choose, as long as long as it would not confuse the HTML-Parser:
Best choice would be base64 (but you want more readable)
HTMLentities will do, although confusing humans :)
Doing your own escaping will work as well, just escape the individual characters rather than the </script fragment
In conclusion, yes, it's probably best with a few changes, but please note that you will be one step away from "safe" already, by trying something like this in the first place, instead of loading the JSON via XHR or at least using a rigorous string encoding like base64.
P.S.: If you can learn from other people's code encoding the strings that's nice, but you should not resort to "libraries" or other people's functions if they don't do exactly what you need.
So rather write and thoroughly test your own (de/en)coder and know that this pitfall has been sealed.
I have the below code in my JSP. UI displays every character correctly other than "&".
<c:out value="<script>var escapedData=unescape('${column}');
$('div').html(escapedData);</script>" escapeXml="false" /> </div>
E.g. 1) working case
input = ni!er#
Value in my escapedData variable is ni%21er%40. Now when I put it in my div using
$('div').html(escapedData); then o/p on html is as expected
E.g. 2) Issue case
input = nice&
Value in my escapedData variable is nice%26. Now when I put it in my div using
$('div').html(escapedData); then also it displays below
$('#test20').html('nice%26');
However, when output is displayed in JSP, it just prints "nice". It truncates everything after &.
Any suggestions?
It looks like you have some misunderstandings what unescape(val)/escape(val) do and where you need them. And what you need to take attention of when you use .html().
HTML and URI have certain character that have special meanings. The most important ones are:
HTML: <, >, &
URI: /,?,%,&
If you want to use one of those characters in HTML or URI you need to escape them.
The escaping for URI and for HTML are different.
The functions unescape/escape (deprecated) and decodeURI/endcodeURI are for URI. But was you want is to escape your data into the HTML format.
There is no build-in function in_JS_ that does this but you could e.g. use the code of the answer to this question Can I escape html special chars in javascript?.
But as it seems that you use jQuery you could think of just using .text instead of .html as this will do the escaping for you.
An additional note:
I'm pretty sure that the var escapedData=unescape('${column}'); does not do anything. I assume that ${column} already is ni!er#/nice&.
So please check your source code. If var escapedData=unescape('${column}'); will look like var escapedData=unescape('ni!er#'); then you should remove the unescape otherwise you would not get the expected result if the ${column} contains something like e.g. %23.
(when rendering a HTML template)
<hidden name=”param${ns?htmlattr}” />
<a href=”${url?urlencode}”>${usercontent?htmlencode}</a>
${rawhtml?htmlliteral}
<script>
var a = “${str?jsstr}”; //null becomes “”
var b = ${str?quote,jsstr}; //allow null, render quotes if nonnull
var c = ${func?jsliteral}
var ${func?jsidentifier} = null;
</script>
jsstr escapes \t\b\f\n\r\\\'\" and </
jsliteral escapes </
jsidentifier replaces non-alnum with a dummy character
xmlattr escapes <>& and filters characters that aren't legal UTF-8
htmlencode encodes almost all edge cases into stuff like &
quote causes a string to render out quoted (including empty), or null
A few of these might not be relevant for security--they just help the code stay sane. Which escape mode do we choose as the default to help prevent XSS -- be "more secure" by default? What if we default to the most restrictive (htmlencode) and relax/switch excape modes from there?
I'm not interested in discussing the merits of all these escape modes -- for better or worse, they all exist in our codebase. Am I missing any modes? Any good reading material?
Take a look at http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/safetemplate.html
That defines contexts in HTML and a mapping from those contexts to escaping functions.
For a runnable example, take a look at http://js-quasis-libraries-and-repl.googlecode.com/svn/trunk/index.html . Try starting with the "Safe HTML" examples from the dropdown menu at the top-right.
To address your specific example, jsliteral looks a little widgy. What benefit do you get from html encoding anything inside a <script> block? The content is CDATA.
What are jsidentifier and jsliteral guarding? Do they stop dangerous identifiers like eval from being assigned? They should probably prevent <!-- in addition to </ since an injected /*<!-- could cause the </script> to be ignored possibly allowing an later interpolation to masquerade as script content.
I have some addHtml JavaScript function in my JS code. I wonder how to escape HTML/JS code properly. Basically, what I am trying right now is:
addHtml("<a onclick=\"alert(\\\"Hello from JS\\\")\">click me</a>")
However, that doesn't work. It adds the a element but it doesn't do anything when I click it.
I don't want to replace all " by ' as a workaround. (If I do, it works.)
I wonder how to escape HTML/JS code properly.
To insert string content into an HTML event handler attribute:
(1) Encode it as a JavaScript string literal:
alert("Hello \"world\"");
(2) Encode the complete JavaScript statement as HTML:
<a onclick="alert("Hello \"world\""">foo</a>
And since you seem to be including that HTML inside a JavaScript string literal again, you have to JS-encode it again:
html= "<a onclick=\"alert("Hello \\"world\\""\">foo<\/a>";
Notice the double-backslashes and also the <\/, which is necessary to avoid a </ sequence in a <script> block, which would otherwise be invalid and might break.
You can make this less bad for yourself by mixing single and double quotes to cut down on the amount of double-escaping going on, but you can't solve it for the general case; there are many other characters that will cause problems.
All this escaping horror is another good reason to avoid inline event handler attributes. Slinging strings full of HTML around sucks. Use DOM-style methods, assigning event handlers directly from JavaScript instead:
var a= document.createElement('a');
a.onclick= function() {
alert('Hello from normal JS with no extra escaping!');
};
My solution would be
addHtml('<a onclick="alert(\'Hello from JS\')">click me</a>')
I typically use single quotes in Javascript strings, and double quotes in HTML attributes. I think it's a good rule to follow.
How about this?
addHtml("<a onclick=\"alert("Hello from JS")\">click me</a>");
It worked when I tested in Firefox, at any rate.
addHtml("<a onclick='alert(\"Hello from JS\")'>click me</a>")
The problem is probably this...
As your code is now, it will add this to the HTML
<a onclick="alert("Hello from Javascript")"></a>
This is assuming the escape slashes will all be removed properly.
The problem is that the alert can't handle the " inside it... you'll have to change those quotes to single quotes.
addHtml("<a onclick=\"alert(\\\'Hello from JS\\\')\">click me</a>")
That should work for you.
What does the final HTML rendered in the browser look like ? I think the three slashes might be causing an issue .
The reason for this "escapes" me.
JSON escapes the forward slash, so a hash {a: "a/b/c"} is serialized as {"a":"a\/b\/c"} instead of {"a":"a/b/c"}.
Why?
JSON doesn't require you to do that, it allows you to do that. It also allows you to use "\u0061" for "A", but it's not required, like Harold L points out:
The JSON spec says you CAN escape forward slash, but you don't have to.
Harold L answered Oct 16 '09 at 21:59
Allowing \/ helps when embedding JSON in a <script> tag, which doesn't allow </ inside strings, like Seb points out:
This is because HTML does not allow a string inside a <script> tag to contain </, so in case that substring's there, you should escape every forward slash.
Seb answered Oct 16 '09 at 22:00 (#1580667)
Some of Microsoft's ASP.NET Ajax/JSON API's use this loophole to add extra information, e.g., a datetime will be sent as "\/Date(milliseconds)\/". (Yuck)
The JSON spec says you CAN escape forward slash, but you don't have to.
I asked the same question some time ago and had to answer it myself. Here's what I came up with:
It seems, my first thought [that it comes from its JavaScript
roots] was correct.
'\/' === '/' in JavaScript, and JSON is valid JavaScript. However,
why are the other ignored escapes (like \z) not allowed in JSON?
The key for this was reading
http://www.cs.tut.fi/~jkorpela/www/revsol.html, followed by
http://www.w3.org/TR/html4/appendix/notes.html#h-B.3.2. The feature of
the slash escape allows JSON to be embedded in HTML (as SGML) and XML.
PHP escapes forward slashes by default which is probably why this appears so commonly. I suspect it's because embedding the string "</script>" inside a <script> tag is considered unsafe.
Example:
<script>
var searchData = <?= json_encode(['searchTerm' => $_GET['search'], ...]) ?>;
// Do something else with the data...
</script>
Based on this code, an attacker could append this to the page's URL:
?search=</script> <some attack code here>
Which, if PHP's protection was not in place, would produce the following HTML:
<script>
var searchData = {"searchTerm":"</script> <some attack code here>"};
...
</script>
Even though the closing script tag is inside a string, it will cause many (most?) browsers to exit the script tag and interpret the items following as valid HTML.
With PHP's protection in place, it will appear instead like this, which will NOT break out of the script tag:
<script>
var searchData = {"searchTerm":"<\/script> <some attack code here>"};
...
</script>
This functionality can be disabled by passing in the JSON_UNESCAPED_SLASHES flag but most developers will not use this since the original result is already valid JSON.
Yes, some JSON utiltiy libraries do it for various good but mostly legacy reasons. But then they should also offer something like setEscapeForwardSlashAlways method to set this behaviour OFF.
In Java, org.codehaus.jettison.json.JSONObject does offer a method called
setEscapeForwardSlashAlways(boolean escapeForwardSlashAlways)
to switch this default behaviour off.