Escaping HTML in Ruby & adding to DOM with JS - javascript

I'm trying to dynamically add contents of a div using JS. Back end is Ruby on Rails. I am having a problem. Here's what is included in the view file:
var product_sidebar_inner = "<%= CGI.escapeHTML(render(...some partial...)).gsub(/\r/," ").gsub(/\n/," ") %>";
document.getElementById("left_sidebar_wrapper").innerHTML = unescape(product_sidebar_inner);
The above inserts the HTML as text into div#left_sidebar_wrapper. I've spent some time on this but still can't make it work. Any idea what I am doing wrong?

Based on your comment to macarthy, I think you want CGI.escape (or CGI.unescape), that's what you use for URL encoding. You can also use URI.escape (or URI.unescape) but you'll get tired of having to pass the unsafe regex all the time to get it to do what you want.
Also, on the JavaScript side, you should be using encodeURI or encodeURIComponent as escape is deprecated because it has problems with non-ASCII characters.
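To illustrate the point about escape and non-ASCII characters, here is a minimal sketch that compares the deprecated escape with encodeURIComponent. The sample string is made up for illustration and does not come from the question.

```javascript
// Sample value with non-ASCII characters (illustrative only).
const sample = "caf\u00e9 & cr\u00e8me";

// encodeURIComponent percent-encodes the UTF-8 bytes of each character.
const encoded = encodeURIComponent(sample);

// decodeURIComponent reverses the encoding exactly.
const roundTripped = decodeURIComponent(encoded);

// The deprecated escape() emits a single Latin-1 style byte (%E9) for "é"
// instead of the UTF-8 sequence (%C3%A9), so servers expecting UTF-8 misread it.
const legacy = escape(sample);

console.log(encoded);
console.log(legacy);
console.log(roundTripped === sample);
```

Note that neither function undoes HTML entity escaping such as what CGI.escapeHTML produces; they only deal with percent-encoding.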

I think you need to use raw:
unescape(raw(product_sidebar_inner));


Stripping filename from HTML Encoded UNIX Path

I am writing a NodeJS application using Express and Google Datastore. I am trying to get the filename from a UNIX path. The path is stored in an HTML encoded format in the database.
Here's the path un-encoded:
/toplevel/example/text123.txt
Here's how the path is stored in the database HTML encoded format:
&#x2F;toplevel&#x2F;example&#x2F;test123.txt
Since the path is HTML encoded, this line is not working.
let filename_only = requested_filepath_unescaped.split('/').pop().toString();
I also tried splitting by the encoded characters but that does not work either (perhaps because split doesn't work with multiple characters?)
let filename_only = requested_filepath_unescaped.split('&#x2F').pop().toString();
What is the best way to either split the string as-is, or de-code the HTML back into an unencoded string?
Well, split works with multiple characters, so I don't know what goes wrong when you tried it.
However if you can use jQuery, you can also decode the html like this:
var htmlDecoded = $('<div />').html(htmlEncoded).text()
After that you can split on '/'.
(The code I gave creates a div element in memory (it is not added to the DOM, i.e. the web page); it then sets the div's HTML, which automatically decodes the HTML entities.)
EDIT:
As I am unsure what the problem of the OP is and I can't comment due to low reputation, I give some more suggestions here.
Maybe the variable you call split on is not really a string object. Try converting to string first:
var filename = filepath.toString().split('/');
Another option is to split on a regex. I'm not sure it solves the problem, but it might be worth trying.
var filename = filepath.toString().split(/&#x2F;/);
EDIT2: Tested and working in Chrome v62 and Node v6.11.4.
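Since the question is about Node, where jQuery and the DOM are not available by default, here is a minimal sketch of decoding the path without them. Both function names are hypothetical, and the sketch only handles numeric entities such as &#x2F;; named entities like &amp; would need a fuller decoder or a library.

```javascript
// Decode numeric HTML entities such as &#x2F; (hex) or &#47; (decimal) in one pass.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

// Hypothetical helper: return the last segment of an entity-encoded path.
function filenameFromEncodedPath(encodedPath) {
  return decodeNumericEntities(String(encodedPath)).split("/").pop();
}

console.log(filenameFromEncodedPath("&#x2F;toplevel&#x2F;example&#x2F;test123.txt"));
```

Decoding first and splitting on a plain "/" afterwards avoids having to know which entity form (hex, decimal, upper or lower case) the database used.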

Escaping string using JavaScript before populating form input

I have this code where I grab an attribute value and load it into a form. The headline can look something like:
Welcome to America's best valued whatever
But when using this escape function, the string is cut off at the apostrophe,
var headline = escape($(this).attr("data-headline"));
//populate the textbox
$(e.currentTarget).find('input[name="headline"]').val(headline);
I've also tried using the solutions here: HtmlSpecialChars equivalent in Javascript? with no luck.
How can I populate my input and keep apostrophes/quotes?
Just use
$(this).find('input[name="headline"]').val(this.dataset.headline);
No need for any escaping.
However, notice that escape does not cut off apostrophes, it replaces them with %27. If your current code does not work with apostrophes in the headline, make sure that the markup containing the data-headline attribute is properly escaped by whatever tool is creating it.
var headline = $(this).attr("data-headline").replace(/'/g, '%27');
//populate the textbox
$(e.currentTarget).find('input[name="headline"]').val(unescape(headline));
If browser compatibility is important, note that dataset is only available in IE11+: https://developer.mozilla.org/en-US/docs/Web/API/HTMLElement/dataset#Browser_compatibility
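The advice about properly escaping the markup that carries data-headline can be sketched as follows. escapeHtmlAttribute is a hypothetical helper, and the sketch assumes the markup is assembled in JavaScript; server-side template engines usually ship their own equivalent.

```javascript
// Hypothetical helper: escape a value so it is safe inside a quoted HTML attribute.
function escapeHtmlAttribute(value) {
  const replacements = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(value).replace(/[&<>"']/g, (ch) => replacements[ch]);
}

const headline = "Welcome to America's best valued whatever";

// Without escaping, the apostrophe would end a single-quoted attribute early,
// which is what makes the value look "cut off" when it is read back.
const markup = "<a data-headline='" + escapeHtmlAttribute(headline) + "'>Edit</a>";

console.log(markup);
// For comparison: escape() does not truncate, it percent-encodes the apostrophe.
console.log(escape(headline));
```

The browser decodes the entities when it parses the attribute, so reading data-headline back returns the original text with the apostrophe intact.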

How to insert arbitrary JSON in HTML's script tag

I would like to store a JSON's contents in a HTML document's source, inside a script tag.
The content of that JSON does depend on user submitted input, thus great care is needed to sanitise that string for XSS.
I've read about two concepts here on SO.
1. Replace all occurrences of </script with <\/script, or replace all </ with <\/, server side.
Code wise it looks like the following (using Python and jinja2 for the example):
// view
data = {
    'test': 'asdas</script><b>as\'da</b><b>as"da</b>',
}
context_dict = {
    'data_json': json.dumps(data, ensure_ascii=False).replace('</script', r'<\/script'),
}
// template
<script>
var data_json = {{ data_json | safe }};
</script>
// js
access it simply as window.data_json object
2. Encode the data as a HTML entity encoded JSON string, and unescape + parse it in client side. Unescape is from this answer: https://stackoverflow.com/a/34064434/518169
// view
context_dict = {
    'data_json': json.dumps(data, ensure_ascii=False),
}
// template
<script>
var data_json = '{{ data_json }}'; // encoded into HTML entities, like &lt; &gt; &amp;
</script>
// js
function htmlDecode(input) {
    var doc = new DOMParser().parseFromString(input, "text/html");
    return doc.documentElement.textContent;
}
var decoded = htmlDecode(window.data_json);
var data_json = JSON.parse(decoded);
This method doesn't work because \" in a script source becomes " in a JS variable. Also, it creates a much bigger HTML document and is not really human readable, so I'd go with the first one if it doesn't mean a huge security risk.
Is there any security risk in using the first version? Is it enough to sanitise a JSON encoded string with .replace('</script', r'<\/script')?
Reference on SO:
Best way to store JSON in an HTML attribute?
Why split the <script> tag when writing it with document.write()?
Script tag in JavaScript string
Sanitize <script> element contents
Escape </ in script tag contents
Some great external resources about this issue:
Flask's tojson filter's implementation source
Rails' json_escape method's help and source
A 5 year long discussion in Django ticket and proposed code
Here's how I dealt with the relatively minor part of this issue, the encoding problem with storing JSON in a script element. The short answer is you have to escape either < or / as together they terminate the script element -- even inside a JSON string literal. You can't HTML-encode entities for a script element. You could JavaScript-backslash-escape the slash. I preferred to JavaScript-hex-escape the less-than angle-bracket as \u003C.
.replace('<', r'\u003C')
I ran into this problem trying to pass the json from oembed results. Some of them contain script close tags (without mentioning Twitter by name).
json_for_script = json.dumps(data).replace('<', r'\u003C');
This turns data = {'test': 'foo </script> bar'}; into
'{"test": "foo \\u003C/script> bar"}'
which is valid JSON that won't terminate a script element.
I got the idea from this little gem inside the Jinja template engine. It's what's run when you use the {{data|tojson}} filter.
def htmlsafe_json_dumps(obj, dumper=None, **kwargs):
    """Works exactly like :func:`dumps` but is safe for use in ``<script>``
    tags. It accepts the same arguments and returns a JSON string. Note that
    this is available in templates through the ``|tojson`` filter which will
    also mark the result as safe. Due to how this function escapes certain
    characters this is safe even if used outside of ``<script>`` tags.

    The following characters are escaped in strings:

    - ``<``
    - ``>``
    - ``&``
    - ``'``

    This makes it safe to embed such strings in any place in HTML with the
    notable exception of double quoted attributes. In that case single
    quote your attributes or HTML escape it in addition.
    """
    if dumper is None:
        dumper = json.dumps
    rv = dumper(obj, **kwargs) \
        .replace(u'<', u'\\u003c') \
        .replace(u'>', u'\\u003e') \
        .replace(u'&', u'\\u0026') \
        .replace(u"'", u'\\u0027')
    return Markup(rv)
(You could use \x3C instead of \u003C and that would work in a script element because it's valid JavaScript. But might as well stick to valid JSON.)
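For a JavaScript server (for example Node rendering the page), the same escaping can be sketched as below. htmlSafeJsonStringify is a hypothetical name and this is only an approximation of what the Jinja helper does, not a port that has been checked against it.

```javascript
// Characters that matter to the HTML parser, mapped to JSON-valid \u escapes.
const HTML_UNSAFE = {
  "<": "\\u003c",
  ">": "\\u003e",
  "&": "\\u0026",
  "'": "\\u0027",
};

// Hypothetical helper: serialize to JSON, then hide <, >, & and ' from the HTML parser.
function htmlSafeJsonStringify(value) {
  return JSON.stringify(value).replace(/[<>&']/g, (ch) => HTML_UNSAFE[ch]);
}

// Same test data as in the question.
const data = { test: "asdas</script><b>as'da</b><b>as\"da</b>" };
const embedded = htmlSafeJsonStringify(data);

console.log(embedded);
// The escapes are ordinary JSON string escapes, so parsing restores the original.
console.log(JSON.parse(embedded).test === data.test);
```

Because these characters can only occur inside string literals in JSON output, replacing them globally never touches the JSON structure itself.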
First of all, your paranoia is well founded.
An HTML parser could be tricked by a closing script tag (better to assume by any closing tag).
A JS parser could be tricked by backslashes and quotes (with a really bad encoder).
Yes, it would be much "safer" to encode all characters that could confuse the different parsers involved. Keeping it human-readable might be contradicting your security paradigm.
Note: The result of JSON string encoding should be canonical and, of course, not broken, i.e. parsable. JSON is a subset of JS and should thus be JS-parsable without any risk. So all you have to do is make sure the HTML parser instance that extracts the JS code is not tricked by your user data.
So the real pitfall is the nesting of both parsers. Actually, I would urge you to put something like that into a separate request. That way you would avoid that scenario completely.
Assuming all possible styles and error-corrections that could happen in such a parser it might be that other tags (open or close) might achieve a similar feat.
As in: suggesting to the parser that the script tag has ended implicitly.
So it is advisable to encode the slash and all tag braces (/, <, >), not just the closing of a script tag, in whatever reversible method you choose, as long as it would not confuse the HTML parser:
Best choice would be base64 (but you want more readable)
HTMLentities will do, although confusing humans :)
Doing your own escaping will work as well, just escape the individual characters rather than the </script fragment
In conclusion, yes, it's probably best with a few changes, but please note that you will be one step away from "safe" already, by trying something like this in the first place, instead of loading the JSON via XHR or at least using a rigorous string encoding like base64.
P.S.: If you can learn from other people's code encoding the strings that's nice, but you should not resort to "libraries" or other people's functions if they don't do exactly what you need.
So rather write and thoroughly test your own (de/en)coder and know that this pitfall has been sealed.
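The base64 option mentioned in this answer can be sketched as follows, assuming a Node server where Buffer is available. Both function names are hypothetical; in a browser the decoding side would need atob plus UTF-8 handling instead of Buffer.

```javascript
// Hypothetical helper: JSON -> UTF-8 bytes -> base64 text for embedding in HTML.
function encodeJsonBase64(value) {
  return Buffer.from(JSON.stringify(value), "utf8").toString("base64");
}

// Hypothetical helper: reverse of encodeJsonBase64.
function decodeJsonBase64(encoded) {
  return JSON.parse(Buffer.from(encoded, "base64").toString("utf8"));
}

const data = { test: "asdas</script><b>as'da</b>" };
const encoded = encodeJsonBase64(data);

console.log(encoded);
console.log(decodeJsonBase64(encoded).test === data.test);
```

The standard base64 alphabet is limited to A-Z, a-z, 0-9, +, / and =, so the encoded text cannot contain a < and therefore cannot form a closing tag, at the cost of readability and roughly a third more bytes.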

display double escaped special characters - change &quot; to " using Javascript or Jquery

I have a page developed by someone who is retiring. The page is programmed in ASP, and pulls info from a database we maintain. All special characters are double escaped (i.e. &amp;quot; and &amp;amp;), so the browser is rendering them as &quot; and &amp;.
I would like to switch the behavior to properly display " or & respectively.
The data is loaded into a table, and is wrapped only in <td></td>.
The simplest method for me would be javascript/jquery, but I cannot get it to work.
I have tried a few methods that I have found here on SO, such as:
$('td').innerText.each(function() {
var text = $(this).text();
$(this).text(text.replace('&amp;', '&'));
$(this).text(text.replace('&quot;', '"'));
});
I haven't had luck so far. Any ideas?
Have you tried unescape() method in Javascript?
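Note that unescape() decodes percent sequences such as %27, not HTML entities. A minimal sketch of undoing one level of entity escaping, assuming the cell text literally contains strings like &quot;, is shown below. decodeEntitiesOnce is a hypothetical helper covering only a handful of common entities.

```javascript
// Only a few common entities are covered; extend the map as needed.
const ENTITY_MAP = {
  "&amp;": "&",
  "&quot;": '"',
  "&lt;": "<",
  "&gt;": ">",
  "&#39;": "'",
};

// Hypothetical helper: undo exactly one level of escaping in a single pass.
// The g flag replaces every occurrence, unlike replace() with a plain string.
function decodeEntitiesOnce(text) {
  return text.replace(/&(?:amp|quot|lt|gt|#39);/g, (entity) => ENTITY_MAP[entity]);
}

console.log(decodeEntitiesOnce("&quot;Tom &amp; Jerry&quot;"));

// In the page this could be applied per cell, for example with jQuery:
// $('td').each(function () {
//   $(this).text(decodeEntitiesOnce($(this).text()));
// });
```

Setting the result with .text() rather than .html() keeps the decoded characters from being interpreted as markup.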

Using javascript regex containing '@' in razor

I am performing email address validation with javascript in a razor view page.
The regex I am going to use is similar to the one proposed at Validate email address in JavaScript?
However, because the regex contains an '@' character, I am getting a parser error when I try to run the web app.
My regex looks like
/^...otherpart @* ... other part$/
I tried adding another '@' character to the original regex ( ... @@* ... ). This eliminated the compilation error but seems to make the regex stop working. (We have used it in another web app that does not use the Razor engine, so I know it works.)
Is there any other way to escape the '@' character?
You can add another @ in front of it to escape it (@@); try leaving out the quantifier *. If this doesn't seem to work, then add <text></text> around the function, which tells Razor not to parse the contents. Alternatively, you can put the JavaScript in a separate file.
If for some reason you have multiple @@ in your string, place code blocks ahead: @:@@
Put the following inside the regex instead of @:
@('@')
The above will be rendered as a single @.
Your example will become:
/^...otherpart @('@')* ... other part$/
Simply use a double @@.
It works for me and I think that is the only way.
It's better to use @('@') instead of @@ (not working for me).
Example
<input class="form-control" pattern="^[a-zA-Z@('@')]{5,20}$"/>
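The separate-file suggestion from the first answer can be sketched like this. The file name, the function name and the simplified pattern are all assumptions made for illustration; the asker's real regex is elided in the question and is not reproduced here.

```javascript
// validate-email.js (hypothetical file name). Razor does not parse static .js
// files, so the at sign in the pattern needs no escaping here.
function isLikelyEmail(value) {
  // Simplified stand-in pattern: something, an at sign, something, a dot, something.
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(value);
}

console.log(isLikelyEmail("user@example.com"));
console.log(isLikelyEmail("no-at-sign"));
```

The view then only needs a script reference to the file, which keeps the pattern identical to the one used in non-Razor apps.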
