I am writing JavaScript templates for a content management system where users fill out text input fields that are passed to my templates.
My problem is the quotation marks in the input fields are not escaped before they are passed to my template, so I have no way of knowing if they will contain single or double quotes, or even both. Whichever way I try to handle the data my code ends up breaking because the quotes terminate the string declaration. I want to run a function on the data to escape quotes but I can't find a way to get the data into a valid variable first.
Is there any way to safely handle the data in JavaScript without it breaking a string variable declaration?
Edit: Here is a code example:
CMS Text Input Field value is: Who'll win our "Big Contest"?
Text Input Field placeholder macro is [%TextInput%]
I'm building an HTML template for this input, using just JS/HTML/CSS
<script>
(function(){
var textInputStr = "[%TextInput%]";
})();
</script>
This breaks the string declaration if the value of TextInput contains a double quote; switching to single-quote delimiters breaks it on a single quote instead.
This is an awesome question, and one that deserves an answer. Unlike many other modern languages, JS has no custom string delimiters, so you can get really stuck. One solution is to put the placeholder inside a script tag of its own, then read the innerHTML of that tag to get the string into a variable, e.g.
<script id="parseMe" type="text/template">
[%TextInput%]
</script>
then use
var yourString = document.getElementById("parseMe").innerHTML
Now you can manipulate the string as you please.
HTH!
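A minimal sketch of that approach, assuming the CMS substitutes the placeholder into the script element before the page is served. The helper names readTemplateText and escapeQuotes are made up for illustration, and the doc parameter stands in for the browser's document so the code can also be run outside a browser. It reads textContent rather than innerHTML, and it would still break if the input itself contained </script>.

```javascript
// Reads the raw text of a non-executing <script type="text/template"> element.
// `doc` is expected to be the browser's `document` (or anything with getElementById).
function readTemplateText(doc, id) {
  const el = doc.getElementById(id);
  if (el === null) {
    return "";
  }
  // Trim the line breaks that surround the placeholder in the markup
  return el.textContent.trim();
}

// Hypothetical helper: backslash-escapes backslashes and both kinds of quotes
function escapeQuotes(text) {
  return text.replace(/[\\'"]/g, "\\$&");
}

// Stand-in for `document`, used here only so the sketch runs without a browser
const fakeDocument = {
  getElementById(id) {
    return id === "parseMe"
      ? { textContent: "\nWho'll win our \"Big Contest\"?\n" }
      : null;
  },
};

const raw = readTemplateText(fakeDocument, "parseMe");
const escaped = escapeQuotes(raw);
console.log(raw);
console.log(escaped);
```

In a real page you would call readTemplateText(document, "parseMe") instead of passing the stand-in object.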
I want to run a function on the data to escape quotes but I can't find a way to get the data into a valid variable first.
Well, you will have to make it a valid string literal before you could run JavaScript functions on it. There's no other way (unless you count an ajax request to the template script to get a string representation of it).
The input fields are not escaped before they are passed to my template
Then fix that. There's nothing you can do about it in JavaScript.
I have this code where I grab an attribute value and load it into a form, the headline line can look something like:
Welcome to America's best valued whatever
But when using this escape function, the string is cut off at the apostrophe,
var headline = escape($(this).attr("data-headline"));
//populate the textbox
$(e.currentTarget).find('input[name="headline"]').val(headline);
I've also tried using the solutions here: HtmlSpecialChars equivalent in Javascript? with no luck.
How can I populate my input and keep apostrophes/quotes?
Just use
$(this).find('input[name="headline"]').val(this.dataset.headline);
No need for any escaping.
However, notice that escape does not cut off apostrophes, it replaces them with %27. If your current code does not work with apostrophes in the headline, make sure that the markup containing the data-headline attribute is properly escaped by whatever tool is creating it.
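To illustrate both points, here is a small sketch. The helper name escapeAttributeValue and the anchor markup are assumptions for the example, not part of the original code; the idea is that the tool generating the page would apply something equivalent on its side.

```javascript
const headline = "Welcome to America's best valued whatever";

// escape() percent-encodes the apostrophe; nothing is cut off
const encoded = escape(headline);
const roundTrip = unescape(encoded);

// Hypothetical helper: encodes the characters that can end an attribute value early
function escapeAttributeValue(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// With a single-quoted attribute, the raw apostrophe would end the value after "America"
const broken = "<a data-headline='" + headline + "'>";
const fixed = "<a data-headline='" + escapeAttributeValue(headline) + "'>";

console.log(encoded); // Welcome%20to%20America%27s%20best%20valued%20whatever
console.log(broken);
console.log(fixed);
```

If the attribute markup looks like the broken variant, the value is already truncated by the time the script reads it, which would explain the cut-off headline.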
var headline = $(this).attr("data-headline").replace(/'/g, '%27');
//populate the textbox
$(e.currentTarget).find('input[name="headline"]').val(unescape(headline));
If browser compatibility is important, note that dataset is only available in IE11+: https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/dataset#Browser_compatibility
I would like to store JSON content in an HTML document's source, inside a script tag.
The content of that JSON depends on user-submitted input, so great care is needed to sanitise the string against XSS.
I've read about two approaches here on SO.
1. Replace all occurrences of the </script tag into <\/script, or replace all </ into <\/ server side.
Code wise it looks like the following (using Python and jinja2 for the example):
// view
data = {
'test': 'asdas</script><b>as\'da</b><b>as"da</b>',
}
context_dict = {
'data_json': json.dumps(data, ensure_ascii=False).replace('</script', r'<\/script'),
}
// template
<script>
var data_json = {{ data_json | safe }};
</script>
// js
access it simply as window.data_json object
2. Encode the data as a HTML entity encoded JSON string, and unescape + parse it in client side. Unescape is from this answer: https://stackoverflow.com/a/34064434/518169
// view
context_dict = {
'data_json': json.dumps(data, ensure_ascii=False),
}
// template
<script>
var data_json = '{{ data_json }}'; // encoded into HTML entities, like &lt; &gt; &amp;
</script>
// js
function htmlDecode(input) {
var doc = new DOMParser().parseFromString(input, "text/html");
return doc.documentElement.textContent;
}
var decoded = htmlDecode(window.data_json);
var data_json = JSON.parse(decoded);
This method doesn't work because \" in the script source becomes " in the JS variable. It also creates a much bigger HTML document that is not really human-readable, so I'd go with the first one unless it poses a huge security risk.
Is there any security risk in using the first version? Is it enough to sanitise a JSON encoded string with .replace('</script', r'<\/script')?
Reference on SO:
Best way to store JSON in an HTML attribute?
Why split the <script> tag when writing it with document.write()?
Script tag in JavaScript string
Sanitize <script> element contents
Escape </ in script tag contents
Some great external resources about this issue:
Flask's tojson filter's implementation source
Rails' json_escape method's help and source
A 5 year long discussion in Django ticket and proposed code
Here's how I dealt with the relatively minor part of this issue, the encoding problem with storing JSON in a script element. The short answer is you have to escape either < or / as together they terminate the script element -- even inside a JSON string literal. You can't HTML-encode entities for a script element. You could JavaScript-backslash-escape the slash. I preferred to JavaScript-hex-escape the less-than angle-bracket as \u003C.
.replace('<', r'\u003C')
I ran into this problem trying to pass the json from oembed results. Some of them contain script close tags (without mentioning Twitter by name).
json_for_script = json.dumps(data).replace('<', r'\u003C')
This turns data = {'test': 'foo </script> bar'} into
'{"test": "foo \\u003C/script> bar"}'
which is valid JSON that won't terminate a script element.
I got the idea from this little gem inside the Jinja template engine. It's what's run when you use the {{data|tojson}} filter.
def htmlsafe_json_dumps(obj, dumper=None, **kwargs):
    """Works exactly like :func:`dumps` but is safe for use in ``<script>``
    tags. It accepts the same arguments and returns a JSON string. Note that
    this is available in templates through the ``|tojson`` filter which will
    also mark the result as safe. Due to how this function escapes certain
    characters this is safe even if used outside of ``<script>`` tags.

    The following characters are escaped in strings:

    - ``<``
    - ``>``
    - ``&``
    - ``'``

    This makes it safe to embed such strings in any place in HTML with the
    notable exception of double quoted attributes. In that case single
    quote your attributes or HTML escape it in addition.
    """
    if dumper is None:
        dumper = json.dumps
    rv = dumper(obj, **kwargs) \
        .replace(u'<', u'\\u003c') \
        .replace(u'>', u'\\u003e') \
        .replace(u'&', u'\\u0026') \
        .replace(u"'", u'\\u0027')
    return Markup(rv)
(You could use \x3C instead of \u003C and that would work in a script element because it's valid JavaScript. But might as well stick to valid JSON.)
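For anyone generating the page from JavaScript rather than Python, the same idea can be sketched as below. This is my own port of the escaping shown above, not code from Flask or Jinja, and the name jsonForScript is made up for the example. It also shows why replacing only the literal </script is fragile: the replace is case-sensitive, while HTML matches tag names case-insensitively.

```javascript
// The same four characters that the Python function above escapes
const UNSAFE = /[<>&']/g;

// Serializes a value to JSON and rewrites risky characters as \uXXXX escapes,
// which are still valid JSON (and valid JavaScript) inside string literals
function jsonForScript(value) {
  return JSON.stringify(value).replace(
    UNSAFE,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const data = { test: "asdas</script><b>as'da</b><b>as\"da</b>" };
const safe = jsonForScript(data);
console.log(safe);

// The escaped text parses back to the original data
const parsed = JSON.parse(safe);

// A case-sensitive replace of "</script" leaves "</SCRIPT>" untouched
const naive = "</SCRIPT>".replace("</script", "<\\/script");
console.log(naive); // </SCRIPT>
```

Escaping the single characters means it does not matter how the closing tag is spelled, which is the main reason to prefer it over matching the whole </script fragment.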
First of all, your paranoia is well founded.
an HTML-parser could be tricked by a closing script tag (better assume by any closing tag)
a JS-parser could be tricked by backslashes and quotes (with a really bad encoder)
Yes, it would be much "safer" to encode all characters that could confuse the different parsers involved. Keeping it human-readable might be contradicting your security paradigm.
Note: The result of JSON string encoding should be canonical and, of course, not broken, i.e. parsable. JSON is a subset of JS and can thus be parsed as JS without any risk. So all you have to do is make sure the HTML parser instance that extracts the JS code is not tricked by your user data.
So the real pitfall is the nesting of both parsers. Actually, I would urge you to put something like that into a separate request. That way you would avoid that scenario completely.
Assuming all possible styles and error-corrections that could happen in such a parser it might be that other tags (open or close) might achieve a similar feat.
As in: suggesting to the parser that the script tag has ended implicitly.
So it is advisable to encode the slash and all tag brackets (/, <, >), not just the closing script tag, using whatever reversible method you choose, as long as it does not confuse the HTML parser:
Best choice would be base64 (but you want more readable)
HTMLentities will do, although confusing humans :)
Doing your own escaping will work as well, just escape the individual characters rather than the </script fragment
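As a rough sketch of the base64 option, assuming an environment where btoa, atob, TextEncoder and TextDecoder are available as globals (current browsers and recent Node.js versions). The function names are hypothetical, and the encoding step would normally run on the server in whatever language generates the page.

```javascript
// Encodes a value as base64 of its UTF-8 JSON text.
// The result only contains A-Z, a-z, 0-9, "+", "/" and "=".
function encodeJsonBase64(value) {
  const bytes = new TextEncoder().encode(JSON.stringify(value));
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// Reverses encodeJsonBase64: base64 -> bytes -> UTF-8 text -> parsed JSON
function decodeJsonBase64(b64) {
  const binary = atob(b64);
  const bytes = Uint8Array.from(binary, (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

const payload = { test: "asdas</script><b>as'da</b> caf\u00e9" };
const encodedPayload = encodeJsonBase64(payload);
const decodedPayload = decodeJsonBase64(encodedPayload);
console.log(encodedPayload);
console.log(decodedPayload.test);
```

The price is the one already mentioned: the embedded data is larger and no longer readable in the page source.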
In conclusion, yes, it's probably best with a few changes, but please note that you will be one step away from "safe" already, by trying something like this in the first place, instead of loading the JSON via XHR or at least using a rigorous string encoding like base64.
P.S.: If you can learn from other people's code encoding the strings that's nice, but you should not resort to "libraries" or other people's functions if they don't do exactly what you need.
So rather write and thoroughly test your own (de/en)coder and know that this pitfall has been sealed.
I am using AJAX to handle a form submission. The AJAX request returns a javascript script with text string arguments. I run into a problem when I try to add the AJAX returned script to the existing page.
Here are the different things I've tried to accomplish this already:
newAjaxBlock.appendChild(document.createTextNode(ajaxRequest.responseText));
newAjaxBlock.innerHTML = ajaxRequest.responseText;
newAjaxBlock.textContent = ajaxRequest.responseText;
The problem is that if I use .innerHTML to insert the returned script, it converts the escaped characters in the argument text string to their HTML equivalent and the script will throw errors because of single quotes and other characters in the string.
I expected .innerHTML to take the text and write it exactly as PHP provides it without unexpected conversions from escaped characters to their HTML equivalents.
For example I would generate a script in PHP and run it through htmlspecialchars() and make a text string exactly as follows:
<script type='text/javascript' id='layerScript'>Lib.alertFunction(arg1, $arg2, '&lt;p&gt;You changed THING from &quot;value1&quot; to &quot;newValue&quot;.&lt;/p&gt;');</script>
But instead .innerHTML converts it to this:
<script type='text/javascript' id='layerScript'>Lib.alertFunction(arg1, $arg2, '<p>You changed THING from "value1" to "newValue".</p>');</script>
and as you can see, the script won't work with single quotes and other characters messing up the argument list.
In contrast, when I tried using the createTextNode or .textContent options it creates a text node that ignores the HTML tags and shows it ALL as text instead of interpreting the HTML. This is not a surprise to me but leaves me with no option that actually just puts the HTML code in as it's written without converting the escaped characters.
All of the code works exactly as I expect and need it to except when the script argument contains single quotes or lt and gt symbols so I know I have narrowed the problem down to this single issue. I don't want jquery suggestions and I know I could code for an extra few days to make a function that does what I need but I want to know if there's something that does what .innerHTML does without converting escaped characters before I waste that time.
This exact question was already asked and was answered with "use .textContent" which as I mentioned doesn't work to insert formatted HTML with AJAX.
In the following scenario:
var evil_string = "...";
$('#mytextarea').val(evil_string);
Do I have to escape an untrusted string before using it as the value of a textarea element?
I understand that I will have to handle the string with care if I want to do anything with it later on, but is the act of putting the string in a textarea without escaping inherently dangerous?
I have done some basic testing and the usual special characters &'"< seem to be successfully added to the textarea without interpretation.
No, you don't need to do that. When you assign directly to a property of a DOM element (which jQuery's .val does under the hood), the data is interpreted verbatim. You only need to escape text with methods that explicitly treat input as HTML, i.e. outerHTML/innerHTML and the like.
Putting unescaped strings as values of textboxes or textareas is fine. You only need to worry about it when you are putting strings in your HTML that could potentially be interpreted as other HTML. Generally speaking, this means you should escape the strings when the text could be a child of some HTML DOM Element. This could be done on the server (as lolka_bolka suggested), or on the client before adding the potentially dangerous string to the DOM.
I have the below code in my JSP. UI displays every character correctly other than "&".
<c:out value="<script>var escapedData=unescape('${column}');
$('div').html(escapedData);</script>" escapeXml="false" /> </div>
E.g. 1) working case
input = ni!er#
The value in my escapedData variable is ni%21er%40. When I put it in my div using
$('div').html(escapedData); the output in the HTML is as expected.
E.g. 2) Issue case
input = nice&
The value in my escapedData variable is nice%26. When I put it in my div using
$('div').html(escapedData); the generated script also looks correct, as shown below:
$('#test20').html('nice%26');
However, when output is displayed in JSP, it just prints "nice". It truncates everything after &.
Any suggestions?
It looks like you have some misunderstandings about what unescape(val)/escape(val) do and where you need them, and about what you need to pay attention to when you use .html().
HTML and URIs have certain characters that have special meanings. The most important ones are:
HTML: <, >, &
URI: /,?,%,&
If you want to use one of those characters in HTML or URI you need to escape them.
The escaping for URI and for HTML are different.
The functions unescape/escape (deprecated) and decodeURI/encodeURI are for URIs. But what you want is to escape your data into the HTML format.
There is no built-in function in JS that does this, but you could e.g. use the code from the answer to this question: Can I escape html special chars in javascript?.
But as it seems that you use jQuery you could think of just using .text instead of .html as this will do the escaping for you.
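To make the difference between the two kinds of escaping concrete, here is a small sketch. The escapeHtml helper is my own minimal version written for this example, not a built-in function and not the exact code from the linked question.

```javascript
// Hypothetical helper: replaces the characters that have a special meaning in HTML
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

const input = "nice&";

// URI escaping: meant for building URLs, not for HTML output
console.log(encodeURIComponent(input)); // nice%26

// HTML escaping: this is what is needed before passing text to .html()
console.log(escapeHtml(input)); // nice&amp;
console.log(escapeHtml("<b>bold</b>")); // &lt;b&gt;bold&lt;/b&gt;
```

With jQuery, $('div').text(input) gives the same visible result without a helper, since the value is inserted as text rather than parsed as markup.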
An additional note:
I'm pretty sure that the var escapedData=unescape('${column}'); does not do anything. I assume that ${column} already is ni!er#/nice&.
So please check your source code. If var escapedData=unescape('${column}'); looks like var escapedData=unescape('ni!er#'); in the generated page, then you should remove the unescape call; otherwise you will not get the expected result when ${column} contains something like %23.