XSS prevention and .innerHTML - javascript

When I allow users to insert data as an argument to the JS innerHTML function like this:
element.innerHTML = "User provided variable";
I understood that in order to prevent XSS, I have to HTML encode, and then JS encode the user input because the user could insert something like this:
<img src=a onerror='alert();'>
Only HTML or only JS encoding would not help because the .innerHTML method as I understood decodes the input before inserting it into the page. With HTML+JS encoding, I noticed that the .innerHTML decodes only the JS, but the HTML encoding remains.
But I was able to achieve the same by double encoding into HTML.
My question is: Could somebody provide an example of why I should HTML encode and then JS encode, and not double encode in HTML when using the .innerHTML method?

Could somebody provide an example of why I should HTML encode and then
JS encode, and not double encode in HTML when using the .innerHTML
method?
Sure.
Assuming the "user provided data" is populated in your JavaScript by the server, then you will have to JS encode to get it there.
The following is server-side template pseudocode that renders JavaScript for the front end:
var userProdividedData = "<%=serverVariableSetByUser %>";
element.innerHTML = userProdividedData;
As in ASP.NET, <%= %> outputs the server-side variable without encoding. If the user is "good" and supplies the value foo, this results in the following JavaScript being rendered:
var userProdividedData = "foo";
element.innerHTML = userProdividedData;
So far no problems.
Now say a malicious user supplies the value "; alert("xss attack!");//. This would be rendered as:
var userProdividedData = ""; alert("xss attack!");//";
element.innerHTML = userProdividedData;
which would result in an XSS exploit where the code is actually executed in the first line of the above.
To prevent this, as you say, you JS encode. The OWASP XSS prevention cheat sheet rule #3 says:
Except for alphanumeric characters, escape all characters less than
256 with the \xHH format to prevent switching out of the data value
into the script context or into another attribute.
So to secure against this your code would be
var userProdividedData = "<%=JsEncode(serverVariableSetByUser) %>";
element.innerHTML = userProdividedData;
where JsEncode encodes as per the OWASP recommendation.
This would prevent the above attack as it would now render as follows:
var userProdividedData = "\x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f";
element.innerHTML = userProdividedData;
Now you have secured your JavaScript variable assignment against XSS.
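The answer names JsEncode without showing it. As a rough sketch only, assuming a hypothetical jsEncode helper written in JavaScript (a real application would do this on the server with a vetted library), the \xHH rule quoted above could look like this. Escaping characters above 255 as \uHHHH is an added assumption, not part of the quoted rule.

```javascript
// Hypothetical helper illustrating the quoted OWASP rule; not a vetted encoder.
function jsEncode(value) {
  const input = String(value);
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const ch = input[i];
    const code = input.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch; // alphanumerics pass through unchanged
    } else if (code < 256) {
      out += "\\x" + code.toString(16).padStart(2, "0"); // \xHH
    } else {
      out += "\\u" + code.toString(16).padStart(4, "0"); // assumption: \uHHHH
    }
  }
  return out;
}

// The malicious value from the answer can no longer close the string literal.
console.log(jsEncode('"; alert("xss attack!");//'));
```

Because every quote, semicolon and slash is turned into an escape sequence, the value stays inside the string literal it is rendered into.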
However, what if a malicious user supplied <img src="xx" onerror="alert('xss attack')" /> as the value? This would be fine for the variable assignment part, as it would simply get converted into the equivalent \xHH escape sequences like above.
However the line
element.innerHTML = userProdividedData;
would cause alert('xss attack') to be executed when the browser renders the inner HTML. This would be like a DOM Based XSS attack, as it is using rendered JavaScript rather than HTML; however, as it passes through the server it is still classed as reflected or stored XSS, depending on where the value is initially set.
This is why you would need to HTML encode too. This can be done via a function such as:
function escapeHTML (unsafe_str) {
  return unsafe_str
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/\"/g, '&quot;')
    .replace(/\'/g, '&#39;')
    .replace(/\//g, '&#x2F;');
}
making your code
element.innerHTML = escapeHTML(userProdividedData);
or it could be done via jQuery's text() function.
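To make the effect of HTML encoding concrete, here is a small sketch that applies the same six replacements in a single pass and runs them over the <img> payload from above. The names HTML_ESCAPES and escapeHtmlSinglePass are made up for this illustration.

```javascript
// Illustrative single-pass variant of the same six replacements.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
  "/": "&#x2F;",
};

function escapeHtmlSinglePass(unsafeStr) {
  // Each special character is looked up once, so "&" is never double-encoded.
  return String(unsafeStr).replace(/[&<>"'\/]/g, (ch) => HTML_ESCAPES[ch]);
}

const payload = `<img src="xx" onerror="alert('xss attack')" />`;
const escaped = escapeHtmlSinglePass(payload);
console.log(escaped);
// -> &lt;img src=&quot;xx&quot; onerror=&quot;alert(&#39;xss attack&#39;)&quot; &#x2F;&gt;
```

Assigned to innerHTML, the escaped string is displayed as text and no <img> element is created.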
Update regarding question in comments
I just have one more question: You mentioned that we must JS encode
because an attacker could enter "; alert("xss attack!");//. But if we
would use HTML encoding instead of JS encoding, wouldn't that also
HTML encode the " sign and make this attack impossible because we
would have: var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);//";
I'm taking your question to mean the following: Rather than JS encoding followed by HTML encoding, why don't we just HTML encode in the first place, and leave it at that?
Well, because an attacker could submit an attack such as <img src="xx" onerror="alert('xss attack')" /> with its special characters written in the \xHH format. This would achieve the desired HTML sequence of the attack without using any of the characters that HTML encoding would affect.
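The following sketch illustrates that point under a few assumptions: htmlEncode stands in for whatever HTML encoder the server uses, and new Function is used only to simulate the browser evaluating the string literal that the server rendered into the script.

```javascript
// Stand-in for a server-side HTML encoder (assumption for this demo).
function htmlEncode(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/\//g, "&#x2F;");
}

// Attacker input: the tag characters are written as \xHH escapes.
const attackerInput = String.raw`\x3cimg src\x3dxx onerror\x3dalert(1)\x3e`;

// HTML encoding finds nothing to change in this input.
const htmlEncoded = htmlEncode(attackerInput);

// Simulate: var userProdividedData = "<%= HtmlEncode(value) %>";
const valueSeenByInnerHTML = new Function('return "' + htmlEncoded + '";')();

console.log(htmlEncoded === attackerInput); // true
console.log(valueSeenByInnerHTML); // <img src=xx onerror=alert(1)>
```

The JavaScript engine decodes the escapes when it parses the literal, so the string that reaches innerHTML contains a real tag.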
There are some other attacks too: If the attacker entered \ then they could force the browser to miss the closing quote (as \ is the escape character in JavaScript).
This would render as:
var userProdividedData = "\";
which would trigger a JavaScript error because it is not a properly terminated statement. This could cause a Denial of Service to the application if it is rendered in a prominent place.
Additionally say there were two pieces of user controlled data:
var userProdividedData = "<%=serverVariableSetByUser1 %>" + ' - ' + "<%=serverVariableSetByUser2 %>";
the user could then enter \ in the first and ;alert('xss');// in the second. This would change the string concatenation into one big assignment, followed by an XSS attack:
var userProdividedData = "\" + ' - ' + ";alert('xss');//";
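As a rough demonstration of that rendering, the sketch below builds the script text the way the template would and runs it with a stubbed alert. renderScript is a hypothetical stand-in for the server template, and new Function simulates the browser executing the rendered script.

```javascript
// Hypothetical stand-in for the server template with two unencoded values.
function renderScript(first, second) {
  return 'var userProdividedData = "' + first + '" + \' - \' + "' + second + '";';
}

const calls = [];
const script = renderScript("\\", ";alert('xss');//");
console.log(script);
// -> var userProdividedData = "\" + ' - ' + ";alert('xss');//";

// Run the rendered script with a harmless stub in place of the real alert.
new Function("alert", script)((message) => calls.push(message));

console.log(calls); // [ 'xss' ]
```

The single backslash escapes the first closing quote, so the string only ends at the quote that was meant to open the second value, and everything after it runs as code.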
Because of edge cases like these it is recommended to follow the OWASP guidelines as they are as close to bulletproof as you can get. You might think that adding \ to the list of HTML encoded values solves this, however there are other reasons to use JS followed by HTML when rendering content in this manner because this method also works for data in attribute values:
<a href="javascript:void(0)" onclick="myFunction('<%=JsEncode(serverVariableSetByUser) %>'); return false">
Regardless of whether it is single or double quoted:
<a href='javascript:void(0)' onclick='myFunction("<%=JsEncode(serverVariableSetByUser) %>"); return false'>
Or even unquoted:
<a href=javascript:void(0) onclick=myFunction("<%=JsEncode(serverVariableSetByUser) %>");return false;>
If you HTML encoded an entity value as mentioned in your comment:
onclick='var userProdividedData ="&quot;;"' (shortened version)
the code is actually run via the browser's HTML parser first, so userProdividedData would be
";;
instead of
&quot;;
so when you add it to the innerHTML call you would have XSS again. Note that <script> blocks are not processed via the browser's HTML parser, except for the closing </script> tag, but that's another story.
It is always wise to encode as late as possible, as shown above. Then, if you need to output the value in anything other than an HTML context (e.g. an actual alert box, which does not render HTML), it will still display correctly.
That is, with the above I can call
alert(serverVariableSetByUser);
just as easily as setting HTML
element.innerHTML = escapeHTML(userProdividedData);
In both cases it will be displayed correctly, without certain characters disrupting the output or causing undesirable code execution.

A simple way to make sure the contents of your element is properly encoded (and will not be parsed as HTML) is to use textContent instead of innerHTML:
element.textContent = "User provided variable with <img src=a>";
Another option is to use innerHTML only after you have encoded (preferably on the server if you get the chance) the values you intend to use.

I have faced this issue in my ASP.NET Webforms application. The fix to this is relatively simple.
Install HtmlSanitizationLibrary from the NuGet Package Manager and reference it in your application. In the code-behind, use the Sanitizer class in the following way.
For example, if the current code looks something like this,
YourHtmlElement.InnerHtml = "Your HTML content" ;
Then, replace this with the following:
string unsafeHtml = "Your HTML content";
YourHtmlElement.InnerHtml = Sanitizer.GetSafeHtml(unsafeHtml);
This fix will remove the Veracode vulnerability and make sure that the string still gets rendered as HTML. Encoding the string in the code-behind instead would display it as literal, encoded text rather than as HTML, because it is encoded before rendering begins.

Related

Using Javascript to format an HTML string to display properly

The back end hands off a string that gets displayed like:
"Hello, <br><br> This notice is to inform you that you are in violation of <font color=red><b>HR POLICY XXXXX</b></font>."
The point of this page is to let you easily copy-paste pre-generated emails, but spewing out a bunch of html tags through the sentences is unwanted.
The string in question is inside of a <td> with an id of "textBlock".
The back end is Java with an Oracle DB. I can edit the Java to some extent and I can't touch the DB at all. I've used the console to play around with the string, and editing it in any way seems to make it display properly once I finish editing. The innerText includes tags as in my summary, while the innerHTML displays the tags like &lt;br&gt;.
So far I've attempted to give the <td> an onload attribute that calls a function named formatText(); that does:
var temp = document.getElementById("textBlock").innerText;
document.getElementById("textBlock").innerText = temp;
as well as the above function with innerHTML instead of innerText. I've also tried using document.write(); but that clears the rest of the page. Finally, I've added some random characters in front of the string and tried to use the replace("!##","") function to remove them, in an effort to mimic the "editing it in any way seems to make it display properly" behavior that I noticed.
java
out.println("<td align=left id=textBlock onload=formatText();> !##" + strTemp + "</td>" );
Expected:
Hello,
This notice is to inform you that you are in violation of HR POLICY XXXXX.
Actual:
Hello, <br><br> This notice is to inform you that you are in violation of <font color=red><b>HR POLICY XXXXX</b></font>.
What you want, if I understood correctly, is a function that strips HTML tags. You can use a regex:
var str = "Hello, <br><br> This notice is to inform you that you are in violation of <font color=red><b>HR POLICY XXXXX</b></font>."
console.log(str)
var str2 = str.replace(/<[^>]*>?/gm, '')
console.log(str2)
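Since the expected output in the question keeps the line breaks, a variation on the regex above could turn <br> tags into newlines before stripping the rest. This is only a sketch: htmlToPlainText is a made-up name, and regex stripping is a display convenience for trusted markup, not a sanitizer for untrusted input.

```javascript
// Hypothetical helper: keep <br> as line breaks, drop all other tags.
function htmlToPlainText(html) {
  return String(html)
    .replace(/<br\s*\/?>/gi, "\n") // <br>, <br/>, <BR /> become newlines
    .replace(/<[^>]*>?/gm, "") // strip remaining tags (same regex as above)
    .replace(/[ \t]*\n[ \t]*/g, "\n"); // trim spaces around line breaks
}

const notice =
  "Hello, <br><br> This notice is to inform you that you are in violation of <font color=red><b>HR POLICY XXXXX</b></font>.";

console.log(htmlToPlainText(notice));
```

The result is "Hello," followed by a blank line and then the sentence without any tags.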
If you want the HTML element to render your HTML, you need to use the DOM property innerHTML:
var str = "Hello, <br><br> This notice is to inform you that you are in violation of <font color=red><b>HR POLICY XXXXX</b></font>."
document.getElementById('myDiv').innerHTML = str
<div id="myDiv">Hi</div>
(resolved in comments, answer added for completeness)
When HTML tags are visible in the browser, the markup has usually been encoded with HTML entities, which prevents it from being parsed as HTML. In this case a post-processing script was replacing the < and > characters with their entity counterparts &lt; and &gt;.
Disabling these replacements resolved the issue.

escaping/encoding characters ready for use in an attribute

Context:
I want to pass a title field into an Angular attribute. The title field is sometimes crazy with the characters people put in.
I have the following Csharp property:
Model.StoryTitle = "!\"£$%^&*()<>;><~andanythingelsethatisweird";
<my-directive-thing story-title="@Model.StoryTitle"></my-directive-thing>
I also have this on a page that pulls the same field out of an Ajax call and gets populated by Kendo (darn legacy frameworks):
<my-directive-thing story-title="#= storyTitle #"></my-directive-thing>
On my directive side, I have the following code:
var storyTitle = $attrs.storyTitle || "";
Issue:
Due to the issue of having weird characters sometimes, I decided to escape it on the javascript side:
<my-directive-thing story-title="#= escape(storyTitle) #"></my-directive-thing>
The job was then easy as I put an unescape in the directive:
var storyTitle = unescape($attrs.storyTitle) || "";
Then everything works fine.
However, I don't know an equivalent for the Csharp.
Question:
Is there a trick I'm missing on the JavaScript + Csharp way of making sure ugly characters don't break attributes?
Escape those characters or transform them into HTML entities. You should not do that on your client side. Your backend should deliver properly encoded data.
Model.StoryTitle = HttpUtility.HtmlEncode("!\"£$%^&*()<>;><~andanythingelsethatisweird");
> HttpUtility.HtmlEncode() documentation
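On the JavaScript side of the question, escape() and unescape() are deprecated. If the value really has to be encoded in the client-side template, one possible substitute (an assumption, not something the answer proposes) is the encodeURIComponent / decodeURIComponent pair, sketched here with the title from the question:

```javascript
// The awkward title from the question.
const storyTitle = '!"£$%^&*()<>;><~andanythingelsethatisweird';

// Template side: encode before writing into story-title="...".
const attributeValue = encodeURIComponent(storyTitle);

// Directive side: decode what was read from $attrs.storyTitle.
const decodedTitle = decodeURIComponent(attributeValue);

console.log(attributeValue);
console.log(decodedTitle === storyTitle); // true
```

The encoded value contains no double quote, ampersand or angle bracket, so it cannot terminate a double-quoted attribute. Note that decodeURIComponent throws a URIError on malformed percent sequences, so both pages would need to encode the value the same way.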

ejs won't print a new line in a variable

I have a web app in Node.js/MySQL where users can upload their stories. They write in an HTML textarea tag. Now I'm trying to get the uploaded story from the database into a script tag using ejs, so I can do further processing:
<script>
var text = "<%=story.Content%>",
result = anchorme.js(text);
document.getElementById('story-content').innerHTML = twemoji.parse(result);
</script>
The problem arises if the user hit Enter to start a new line while writing. It gives me an error at the text variable and nothing is printed. How do I fix this?
If you view source on the page so that you can see how the browser receives it, you'll see something like this - note the line feeds:
var text = "I am a story over multiple lines
and that's what javascript gets confused about
because it can't figure out where the closing quotes are.
Let's not even go into what could happen when someone uses quotes!"
So you really just need a way to pass the story content to javascript without it breaking the page. Instead of just pushing out the string like this...
var text = "<%=story.Content%>"
...you can pass stuff to javascript in JSON format, as that escapes newlines into \r\n, quotes into \", and so on. Note the use of <%- rather than <%= here because you don't want it to pass escaped HTML:
var text = <%-JSON.stringify({ content: story.Content })%>.content;
That passes an object with a nicely escaped string in it to your inline script for it to carry on processing.
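A small sketch of what JSON.stringify produces for such a story follows. One caveat the answer does not cover: JSON.stringify leaves < untouched, so a story containing </script> would still end the inline script block early. The helper below, with the made-up name toInlineScriptLiteral, additionally rewrites < as \u003c; treat that step as an assumption layered on top of the answer.

```javascript
// Hypothetical helper: a JS string literal that is safe inside <script>.
function toInlineScriptLiteral(value) {
  return JSON.stringify(String(value)).replace(/</g, "\\u003c");
}

const story = 'First line\r\nShe said "hi" </script>';
const literal = toInlineScriptLiteral(story);

console.log(literal);
// -> "First line\r\nShe said \"hi\" \u003c/script>"

// Evaluating the literal (as the browser would) gives back the original text.
const roundTripped = new Function("return " + literal + ";")();
console.log(roundTripped === story); // true
```

In the template this would be emitted with <%- so that ejs does not HTML-escape the quotes of the literal.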

C# verbatim string insert to Acrobat Javascript

I have a syntax error and I can't solve it at the moment.
Task: C# app with Acrobat JS Invoke...
I pass this as a string command:
acrofields.ExecuteThisJavascript(@"this.getField(""TM"").value = """ + TM_Textbox.Text + @""";");
I use a verbatim string to make my life easier in other situations (similar to this). So as you can see, the textbox content has to be in "" as well. And this works fine! BUT: if I have a path as content:
\\\Computername\Folder1\Folder2\\...
it won't work. I have tried many possibilities of the quoting.
Since it is JavaScript that will be executed, turn your internal quotes into single quotes:
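acrofields.ExecuteThisJavascript(@"this.getField('TM').value = '" + TM_Textbox.Text + @"';");
or, better yet:
string execStr = string.Format("this.getField('TM').value = '{0}';", TM_Textbox.Text);
acrofields.ExecuteThisJavascript(execStr);
Of course, you also probably want to sanitize the textbox input to prevent malicious script attacks. In particular, a backslash starts an escape sequence inside a JavaScript string literal, so backslashes (as in your path) and quotes in the text need to be escaped before they are inserted.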

strange characters (amp;) added to moss service output

I have a MOSS service which outputs the URL of an image.
Let's say the output URL has an '&' character; the service appends amp; next to the &.
For example: Directory.aspx?&amp;z=BWxNK
Here amp; is additionally added. It is a MOSS service, so I don't have control over the service.
What I can do is decode the output. As I am using Ajax calls to call the MOSS service, I am forced to decode the output in JavaScript. I tried decodeURIComponent, decodeURI and unescape. Nothing solved the problem.
Any help is greatly appreciated. Even a server-side function would be helpful. I am using ASP.NET MVC3.
Regards,
Kumar.
&amp; is not URI encoded, it's HTML encoded.
For a server side solution, you could do this:
Server.HtmlDecode("&amp;") // yields "&"
For a JavaScript solution, you could set the html to "&amp;" and read out the text, to simulate HTML decoding. In jQuery, it could look like this:
$("<span/>").html("&amp;").text(); // yields "&"
&amp; is SGML/XML/HTML for &.
If the service is outputting an XML document, then make sure you are using an XML parser to parse it (and not regular expressions or something equally crazy).
Otherwise, you need to decode the (presumably) HTML. In JavaScript, the easiest way to do that is:
var foo = document.createElement('div');
foo.innerHTML = myString;
var url = foo.firstChild.data;
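Both DOM-based approaches above pass the string through the HTML parser, which is worth avoiding if the service output is not fully trusted, and they are unavailable outside a browser. As a limited alternative, here is a sketch that decodes only a handful of common entities by lookup. decodeBasicEntities is a made-up name, and the entity list is deliberately incomplete.

```javascript
// Hypothetical helper: decodes only the entities listed here.
const BASIC_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  "#39": "'",
};

function decodeBasicEntities(text) {
  // Single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return String(text).replace(/&(amp|lt|gt|quot|#39);/g, (match, name) => BASIC_ENTITIES[name]);
}

console.log(decodeBasicEntities("Directory.aspx?&amp;z=BWxNK"));
// -> Directory.aspx?&z=BWxNK
```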
