strange characters (amp;) added to moss service output - javascript

I have a MOSS service which outputs the URL of an image.
If the output URL contains an '&' character, the service appends amp; after the &.
For example: Directory.aspx?&amp;z=BWxNK
Here amp; is added in addition. It is a MOSS service, so I don't have control over the service.
What I can do is decode the output. As I am using Ajax calls to call the MOSS service, I am forced to decode the output in JavaScript. I tried decodeURIComponent, decodeURI and unescape; none of them solved the problem.
Any help is greatly appreciated. Even a server-side function would be helpful. I am using ASP.NET MVC3.
Regards,
Kumar.

&amp; is not URI encoded, it's HTML encoded.
For a server side solution, you could do this:
Server.HtmlDecode("&amp;") // yields "&"
For a JavaScript solution, you could set the html to "&amp;" and read out the text, to simulate HTML decoding. In jQuery, it could look like this:
$("<span/>").html("&amp;").text(); // yields "&"

&amp; is SGML/XML/HTML for &.
If the service is outputting an XML document, then make sure you are using an XML parser to parse it (and not regular expressions or something equally crazy).
Otherwise, you need to decode the (presumably) HTML. In JavaScript, the easiest way to do that is:
var foo = document.createElement('div');
foo.innerHTML = myString;
var url = foo.firstChild.data;
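Where no DOM is available (for example in a server-side script), the same idea can be sketched without one. The helper below is hypothetical, not part of any library, and it deliberately handles only the five most common entities; a real HTML decoder covers many more.

```javascript
// Hypothetical helper: decodes only the five most common HTML entities.
const BASIC_ENTITIES = {
  "&amp;": "&",
  "&lt;": "<",
  "&gt;": ">",
  "&quot;": '"',
  "&#39;": "'",
};

function decodeBasicEntities(text) {
  // A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return text.replace(/&(?:amp|lt|gt|quot|#39);/g, (entity) => BASIC_ENTITIES[entity]);
}

const url = decodeBasicEntities("Directory.aspx?&amp;z=BWxNK");
console.log(url); // Directory.aspx?&z=BWxNK
```

Decoding in one pass matters: replacing &amp; first and the other entities afterwards would decode some inputs twice.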

Related

python django json.dumps() and javascript cookies [duplicate]

I'm trying to encode an object in a Python script and set it as a cookie so I can read it with client-side JavaScript.
I've run into problems every way I've tried to do this. Generally, the cookie is formatted in a way that makes JSON.parse() break.
My current script:
cookie = Cookie.SimpleCookie()
data = {"name": "Janet", "if_nasty": "Ms. Jackson"}
cookie['test'] = json.dumps(data)
self.response.headers.add_header("Set-Cookie", cookie.output(header=''))
... which results in
test="{\"name\": \"janet\"\054 \"if_nasty\": \"Ms. Jackson\"}"
on the client.
I don't really want to introduce a hack-y solution to replace instances of commas when they appear. Any ideas how I can pass complex data structures (both by setting and reading cookies) with Python?
I also wanted to read a cookie (that had been set on the server) on the client. I worked around the issue by base64 encoding the JSON String, however there are a few small gotchas involved with this approach as well.
1: Base64 strings end with 0-2 equal signs, and these were being converted into the string \075. My approach is to revert those characters into equal characters on the client.
2: The base64 string is being enclosed in double quote characters in the cookie. I remove these on the client.
Server:
nav_json = json.dumps(nav_data)
nav_b64=base64.b64encode(nav_json)
self.response.set_cookie('nav_data', nav_b64)
Client:
var user_data_base64= $.cookie('nav_data');
// remove quotes from around the string
user_data_base64 = user_data_base64.replace(/"/g,"");
// replace \075 with =
user_data_base64 = user_data_base64.replace(/\\075/g,"=");
var user_data_encoded=$.base64.decode( user_data_base64 );
var user_data = $.parseJSON(user_data_encoded);
I am using 2 jquery plugins here:
https://github.com/carlo/jquery-base64
and
https://github.com/carhartl/jquery-cookie
Note: I consider this a hack. It would be better to re-implement in JavaScript the Python code that encodes the cookie; however, this has the downside that you would need to notice and port any changes to that code.
I have now moved to a solution where I use a small html file to set the cookie on the client side and then redirect to the actual page requested. Here is a snippet from the JINJA2 template that I am using:
<script type="text/javascript">
var nav_data='{% autoescape false %}{{nav_data}}{% endautoescape %}';
$.cookie('nav_data', nav_data, { path: '/' });
window.location.replace("{{next}}")
</script>
Note 2: Cookies are not ideal for my use case and I will probably move on to Session or Local Storage to reduce network overhead (although my nav_data is quite small - a dozen characters or so.)
On the Python side:
json.dumps the data
Call urllib.parse.quote_plus() on the result
escape spaces, which quote_plus() writes as + - just call .replace('+', '%20') - then write the string to the cookie
On the JavaScript side:
read the cookie
pass it through decodeURIComponent()
JSON.parse it
This seems to be the cleanest way I've found.
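The client-side half of these steps can be sketched as follows. The server step is simulated here with encodeURIComponent(), which, like a percent-encoder that writes spaces as %20, leaves no commas, quotes, or spaces in the value. readCookie is a hypothetical helper, and the cookie name test is taken from the question.

```javascript
// Stand-in for the server step: percent-encode the JSON text so the cookie
// value contains no commas, quotes, or spaces.
const data = { name: "Janet", if_nasty: "Ms. Jackson" };
const cookieValue = encodeURIComponent(JSON.stringify(data));

// Hypothetical helper: find one cookie in a "k=v; k2=v2" string such as document.cookie.
function readCookie(cookieString, name) {
  for (const pair of cookieString.split("; ")) {
    const eq = pair.indexOf("=");
    if (eq !== -1 && pair.slice(0, eq) === name) {
      return pair.slice(eq + 1);
    }
  }
  return null;
}

// Client side: read the cookie, decode it, then parse it.
const cookieString = "other=1; test=" + cookieValue; // what document.cookie might hold
const parsed = JSON.parse(decodeURIComponent(readCookie(cookieString, "test")));
console.log(parsed.if_nasty); // Ms. Jackson
```

Note that decodeURIComponent() does not turn + back into a space, which is why the spaces have to arrive as %20.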
Not sure a cookie is the best way of doing this. See the getting started guide for info on rendering data to the client.

What is the right way to safely and accurately insert user-provided URL data into an HTML5 document?

Given an arbitrary customer input in a web form for a URL, I want to generate a new HTML document containing that URL within an href. My question is how am I supposed to protect that URL within my HTML.
What should be rendered into the HTML for the following URLs that are entered by an unknown end user:
http://example.com/?file=some_19%affordable.txt
http://example.com/url?source=web&last="f o o"&bar=<
https://www.google.com/url?source=web&sqi=2&url=https%3A%2F%2Ftwitter.com%2F%3Flang%3Den&last=%22foo%22
If we assume that the URLs are already uri-encoded, which I think is reasonable if they are copying it from a URL bar, then simply passing it to attr() produces a valid URL and document that passes the Nu HTML checker at validator.w3.org/nu.
To see it in action, we set up a JS fiddle at https://jsfiddle.net/kamelkev/w8ygpcsz/2/ where replacing the URLs in there with the examples above can show what is happening.
For future reference, this consists of an HTML snippet
<a>My Link</a>
and this JS:
$(document).ready(function() {
$('a').attr('href', 'http://example.com/request.html?data=>');
$('a').attr('href2', 'http://example.com/request.html?data=<');
alert($('a').get(0).outerHTML);
});
So with URL 1, it is not possible to tell if it is URI encoded or not by looking at it mechanically. You can surmise based on your human knowledge that it is not, and is referring to a file named some_19%affordable.txt. When run through the fiddle, it produces
My Link
Which passes the HTML5 validator no problem. It likely is not what the user intended though.
The second URL is clearly not URI encoded. The question becomes what is the right thing to put into the HTML to prevent HTML parsing problems.
Running it thru the fiddle, Safari 10 produces this:
My Link
and pretty much every other browser produces this:
My Link
Neither of these passes the validator. Three complaints are possible: the literal double quote (from un-escaping HTML), the spaces, or the trailing < character (also from un-escaping HTML). It just shows you the first of these it finds. This is clearly not valid HTML.
Two ways to try to fix this are a) html-escape the URL before giving it to attr(). This however results in every & becoming &amp; and the entities such as &amp; and &lt; become double-escaped by attr(), and the URL in the document is entirely inaccurate. It looks like this:
My Link
The other is to URI-encode it before passing to attr(), which does result in a proper validating URL which actually clicks to the intended destination. It looks like this:
My Link
Finally, for the third URL, which is properly URI encoded, the proper HTML that validates does come out.
My Link
and it does what the user would expect to happen when clicked.
Based on this, the algorithm should be:
if url is encoded then
pass as-is to attr()
else
pass encodeURI(url) to attr()
however, the "is encoded" test seems to be impossible to detect in the affirmative based on these two prior discussions (indeed, see example URL 1):
How to find out if string has already been URL encoded?
How to know if a URL is decoded/encoded?
If we bypass the attr() method and forcibly insert the HTML-escaped version of example URL 2 into the document structure, it would look like this:
My Link
Which seemingly looks like valid HTML, yet fails the HTML5 validator because it unescapes to have invalid URL characters. The browsers, however, don't seem to mind it. Unfortunately, if you do any other manipulation of the object, the browser will re-escape all the &'s anyway.
As you can see, this is all very confusing. This is the first time we're using the browser itself to generate the HTML, and we are not sure if we are getting it right. Previously, we did it server side using templates, and only did the HTML-escape filter.
What is the right way to safely and accurately insert user-provided
URL data into an HTML5 document (using JavaScript)?
If you can assume the URL is either encoded or not encoded, you may be able to get away with something along the lines of this. Try to decode the URL, treat an error as the URL not being encoded and you should be left with a decoded URL.
<script>
var inputurl = 'http://example.com/?file=some_19%affordable.txt';
var myurl;
try {
myurl = decodeURI(inputurl);
}
catch(error) {
myurl = inputurl;
}
console.log(myurl);
</script>
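A different approach, not taken from the answer above, is to let the URL parser do the normalizing instead of guessing whether the input is already encoded. The sketch below assumes an environment with the standard URL constructor (modern browsers, Node); toSafeHref is a hypothetical name, and limiting the scheme to http and https is an assumption about what "safe" should mean here.

```javascript
// Hypothetical helper: returns a normalized href, or null if the input is rejected.
function toSafeHref(input) {
  let parsed;
  try {
    parsed = new URL(input); // throws TypeError for strings that are not absolute URLs
  } catch (error) {
    return null;
  }
  // Assumption: only web links are wanted, which also rejects javascript: URLs.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return null;
  }
  // href is the serialized URL: spaces, quotes and angle brackets in the query
  // are percent-encoded, while existing %XX sequences are left as they are.
  return parsed.href;
}

console.log(toSafeHref('http://example.com/url?source=web&last="f o o"&bar=<'));
console.log(toSafeHref("javascript:alert(1)")); // null
```

The returned string can then be handed to attr() or setAttribute(), which take care of the HTML escaping when the document is serialized. A lone % such as the one in example URL 1 is still passed through unchanged, so the ambiguity described in the question remains for that case.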

encodeURIComponent() adds too many characters

Either encodeURIComponent() in JavaScript is adding too many characters, or I don't understand exactly how it works.
I am using this line of code:
var encoded = encodeURIComponent(searchTerm);
When I look in the chrome inspect element after passing Abt 12 it shows the encoded variable added to the URL as this:
Abt%252012
I would think it should be this:
Abt%12
So when I pass it through PHP I get really odd results when actually conducting the search.
From the comments, it looks like you are sending the value to the server via a jQuery ajax request. jQuery takes care of encoding the parameters, so there is no need for you to encode the value again.
$.get("website.php", { p: searchTerm })
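The extra characters can be reproduced without jQuery. This small sketch only illustrates what happens when an already encoded value is encoded a second time, which is the effect described above:

```javascript
const searchTerm = "Abt 12";

// First encoding: the space becomes %20.
const once = encodeURIComponent(searchTerm);

// Second encoding: the % of %20 is itself encoded as %25.
const twice = encodeURIComponent(once);

console.log(once);  // Abt%2012
console.log(twice); // Abt%252012

// A single decode on the server only undoes the outer layer.
console.log(decodeURIComponent(twice)); // Abt%2012
```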

XSS prevention and .innerHTML

When I allow users to insert data as an argument to the JS innerHTML function like this:
element.innerHTML = "User provided variable";
I understood that in order to prevent XSS, I have to HTML encode, and then JS encode the user input because the user could insert something like this:
<img src=a onerror='alert();'>
Only HTML or only JS encoding would not help because the .innerHTML method as I understood decodes the input before inserting it into the page. With HTML+JS encoding, I noticed that the .innerHTML decodes only the JS, but the HTML encoding remains.
But I was able to achieve the same by double encoding into HTML.
My question is: Could somebody provide an example of why I should HTML encode and then JS encode, and not double encode in HTML when using the .innerHTML method?
Could somebody provide an example of why I should HTML encode and then
JS encode, and not double encode in HTML when using the .innerHTML
method?
Sure.
Assuming the "user provided data" is populated in your JavaScript by the server, then you will have to JS encode to get it there.
The following is pseudocode on the server side, but JavaScript on the front end:
var userProdividedData = "<%=serverVariableSetByUser %>";
element.innerHTML = userProdividedData;
As in ASP.NET, <%= %> outputs the server-side variable without encoding. If the user is "good" and supplies the value foo then this results in the following JavaScript being rendered:
var userProdividedData = "foo";
element.innerHTML = userProdividedData;
So far no problems.
Now say a malicious user supplies the value "; alert("xss attack!");//. This would be rendered as:
var userProdividedData = ""; alert("xss attack!");//";
element.innerHTML = userProdividedData;
which would result in an XSS exploit where the code is actually executed in the first line of the above.
To prevent this, as you say you JS encode. The OWASP XSS prevention cheat sheet rule #3 says:
Except for alphanumeric characters, escape all characters less than
256 with the \xHH format to prevent switching out of the data value
into the script context or into another attribute.
So to secure against this your code would be
var userProdividedData = "<%=JsEncode(serverVariableSetByUser) %>";
element.innerHTML = userProdividedData;
where JsEncode encodes as per the OWASP recommendation.
This would prevent the above attack as it would now render as follows:
var userProdividedData = "\x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f";
element.innerHTML = userProdividedData;
Now you have secured your JavaScript variable assignment against XSS.
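JsEncode itself is not shown in this answer. A minimal sketch of what such a function might look like, written in JavaScript for illustration (on a real site it would run in the server-side language), is below. The name jsEncode and the choice of \uHHHH for characters at or above 256 are assumptions; the OWASP rule quoted above only covers characters below 256.

```javascript
// Hypothetical JsEncode sketch: every non-alphanumeric character becomes an escape.
function jsEncode(value) {
  return String(value).replace(/[^A-Za-z0-9]/g, (ch) => {
    const code = ch.charCodeAt(0);
    return code < 256
      ? "\\x" + code.toString(16).padStart(2, "0")   // \xHH for characters below 256
      : "\\u" + code.toString(16).padStart(4, "0");  // assumption: \uHHHH for the rest
  });
}

console.log(jsEncode('"; alert("xss attack!");//'));
// \x22\x3b\x20alert\x28\x22xss\x20attack\x21\x22\x29\x3b\x2f\x2f
```

Because the quote, the semicolon, and the backslash are all replaced, the value can no longer close the string literal it is rendered into.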
However, what if a malicious user supplied <img src="xx" onerror="alert('xss attack')" /> as the value? This would be fine for the variable assignment part as it would simply get converted into the hex entity equivalent like above.
However the line
element.innerHTML = userProdividedData;
would cause alert('xss attack') to be executed when the browser renders the inner HTML. This would be like a DOM Based XSS attack as it is using rendered JavaScript rather than HTML; however, as it passes through the server it is still classed as reflected or stored XSS depending on where the value is initially set.
This is why you would need to HTML encode too. This can be done via a function such as:
function escapeHTML (unsafe_str) {
return unsafe_str
.replace(/&/g, '&amp;')
.replace(/</g, '&lt;')
.replace(/>/g, '&gt;')
.replace(/\"/g, '&quot;')
.replace(/\'/g, '&#39;')
.replace(/\//g, '&#x2F;')
}
making your code
element.innerHTML = escapeHTML(userProdividedData);
or could be done via JQuery's text() function.
Update regarding question in comments
I just have one more question: You mentioned that we must JS encode
because an attacker could enter "; alert("xss attack!");//. But if we
would use HTML encoding instead of JS encoding, wouldn't that also
HTML encode the " sign and make this attack impossible because we
would have: var userProdividedData ="&quot;; alert(&quot;xss attack!&quot;);//";
I'm taking your question to mean the following: Rather than JS encoding followed by HTML encoding, why don't we just HTML encode in the first place, and leave it at that?
Well because they could encode an attack such as <img src="xx" onerror="alert('xss attack')" /> all encoded using the \xHH format to insert their payload - this would achieve the desired HTML sequence of the attack without using any of the characters that HTML encoding would affect.
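This bypass can be simulated outside a browser. In the sketch below, escapeHTML uses the same replacements as the function earlier in this answer, and new Function stands in for the browser evaluating the rendered script; both the payload and that simulation are illustrative assumptions, not a real server setup.

```javascript
// Same replacements as the escapeHTML function shown earlier.
function escapeHTML(unsafeStr) {
  return unsafeStr
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;")
    .replace(/\//g, "&#x2F;");
}

// Attacker input: every HTML-significant character is written as a \xHH escape,
// so the text itself contains no & < > " ' or / characters.
const attackerInput = "\\x3cimg src\\x3d\\x22xx\\x22 onerror\\x3d\\x22alert(1)\\x22 \\x2f\\x3e";

// HTML encoding alone leaves the input untouched.
const htmlEncoded = escapeHTML(attackerInput);

// Simulate the server rendering the value into a JavaScript string literal
// and the browser then evaluating that literal.
const valueSeenByScript = new Function('return "' + htmlEncoded + '";')();

console.log(htmlEncoded === attackerInput); // true
console.log(valueSeenByScript); // <img src="xx" onerror="alert(1)" />
```

Assigning valueSeenByScript to innerHTML would then run the onerror handler, which is why HTML encoding on its own is not enough in this context.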
There are some other attacks too: If the attacker entered \ then they could force the browser to miss the closing quote (as \ is the escape character in JavaScript).
This would render as:
var userProdividedData = "\";
which would trigger a JavaScript error because it is not a properly terminated statement. This could cause a Denial of Service to the application if it is rendered in a prominent place.
Additionally say there were two pieces of user controlled data:
var userProdividedData = "<%=serverVariableSetByUser1 %>" + ' - ' + "<%=serverVariableSetByUser2 %>";
the user could then enter \ in the first and ;alert('xss');// in the second. This would change the string concatenation into one big assignment, followed by an XSS attack:
var userProdividedData = "\" + ' - ' + ";alert('xss');//";
Because of edge cases like these it is recommended to follow the OWASP guidelines as they are as close to bulletproof as you can get. You might think that adding \ to the list of HTML encoded values solves this, however there are other reasons to use JS followed by HTML when rendering content in this manner because this method also works for data in attribute values:
<a href="javascript:void(0)" onclick="myFunction('<%=JsEncode(serverVariableSetByUser) %>'); return false">
Regardless of whether it is single or double quoted:
<a href='javascript:void(0)' onclick='myFunction("<%=JsEncode(serverVariableSetByUser) %>"); return false'>
Or even unquoted:
<a href=javascript:void(0) onclick=myFunction("<%=JsEncode(serverVariableSetByUser) %>");return false;>
If you HTML encoded an entity value as mentioned in your comment:
onclick='var userProdividedData ="&quot;;"' (shortened version)
the code is actually run via the browser's HTML parser first, so userProdividedData would be
";;
instead of
&quot;;
so when you add it to the innerHTML call you would have XSS again. Note that <script> blocks are not processed via the browser's HTML parser, except for the closing </script> tag, but that's another story.
It is always wise to encode as late as possible, as shown above. Then, if you need to output the value in anything other than a JavaScript context (e.g. an actual alert box, which does not render HTML), it will still display correctly.
That is, with the above I can call
alert(serverVariableSetByUser);
just as easily as setting HTML
element.innerHTML = escapeHTML(userProdividedData);
In both cases it will be displayed correctly, without certain characters disrupting the output or causing undesirable code execution.
A simple way to make sure the contents of your element is properly encoded (and will not be parsed as HTML) is to use textContent instead of innerHTML:
element.textContent = "User provided variable with <img src=a>";
Another option is to use innerHTML only after you have encoded (preferably on the server if you get the chance) the values you intend to use.
I have faced this issue in my ASP.NET Webforms application. The fix to this is relatively simple.
Install HtmlSanitizationLibrary from NuGet Package Manager and reference it in your application. In the code-behind, use the Sanitizer class in the following way.
For example, if the current code looks something like this,
YourHtmlElement.InnerHtml = "Your HTML content" ;
Then, replace this with the following:
string unsafeHtml = "Your HTML content";
YourHtmlElement.InnerHtml = Sanitizer.GetSafeHtml(unsafeHtml);
This fix will remove the Veracode vulnerability and make sure that the string gets rendered as HTML. Encoding the string at code behind will render it as 'un-encoded string' rather than RAW HTML as it is encoded before the render begins.

Making a URL W3C valid AND work in Ajax Request

I have a generic function that returns URLs. (It's a plugin function that returns URLs to resources [images, stylesheets] within a plugin).
I use GET parameters in those URLs.
If I want to use these URLs within an HTML page, to pass W3C validation, I need to mask ampersands as &amp;
/plugin.php?plugin=xyz&amp;resource=stylesheet&amp;....
but, if I want to use the URL as the "url" parameter for an AJAX call, the ampersand is not interpreted correctly, screwing up my calls.
Can I do something to get &amp; to work in AJAX calls?
I would very much like to avoid adding parameters to the URL generating function (intendedUse="ajax" or whatever) or manipulating the URL in JavaScript, as this plugin model will be re-used many times (and possibly by many people) and I want it as simple as possible.
It seems to me that you're running into the problem of having one piece of your application cross multiple layers. In this case it's the plugin.
A URL as specified by RFC 1738 states that a URL should use a & token to separate key/value pairs from one another. However ampersand is a reserved token in HTML and therefore should be escaped into &amp;. Since escaping the ampersands is an artifact of HTML, your plugin should probably not be escaping them directly. Instead you should have a function or something that escapes a canonical URL so that it can be embedded in HTML markup.
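As a rough sketch of that separation: the plugin keeps returning the canonical URL, and escaping happens only at the point where the URL is written into markup. It is shown here in JavaScript; the question's plugin is PHP, where a built-in such as htmlspecialchars() plays the same role. The helper name is hypothetical and the URL is a shortened version of the one in the question.

```javascript
// Hypothetical helper: escape a canonical URL for use inside an HTML attribute.
function escapeForHtmlAttribute(value) {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// What the plugin function returns: a plain URL with unescaped ampersands.
const canonicalUrl = "/plugin.php?plugin=xyz&resource=stylesheet";

// Escaped only when it is embedded in HTML markup.
const forMarkup = '<link rel="stylesheet" href="' + escapeForHtmlAttribute(canonicalUrl) + '">';

// Used as-is for an AJAX request.
const forAjax = canonicalUrl;

console.log(forMarkup);
console.log(forAjax);
```

With this split the URL generating function needs no extra parameter; each consumer applies the escaping its own context requires.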
The only place that this is likely to actually happen is if you are:
Using XHTML
Serving it as text/html
Using inline <script>
This is not a happy combination, and the solution is in the spec.
Use external scripts if your script
uses < or & or ]]> or --.
The XHTML media types note includes the same advice, but also provides a workaround if you choose to ignore it.
Try returning JSON instead of just a string; that way your JavaScript can read the URL value as an object, and you shouldn't have that issue. Other than that, try simply HTML decoding the string, using something like:
function unescapeHTML (str)
{
var textarea = document.createElement('textarea');
// Assigning to innerHTML makes the parser decode entities such as &amp;
textarea.innerHTML = str;
// A textarea's content is plain text, so value holds the decoded string
return textarea.value;
};
Obviously you'll want to make sure you remove any reference to DOM elements you might create (which I've not done here to simplify the example).
I use this technique in the AJAX sites I create at my work and have used it many times to solve this problem.
When you have markup of the form:
<a href="?a=1&amp;b=2">
Then the value of the href attribute is ?a=1&b=2. The &amp; is only an escape sequence in HTML/XML and doesn't affect the value of the attribute. This is similar to:
<a href="&lt;&gt;">
Where the value of the attribute is <>.
If, instead, you have code of the form:
<script>
var s = "?a=1&b=2";
</script>
Then you can use a JavaScript function:
<script>
var amp = String.fromCharCode(38);
var s = "?a=1"+amp+"b=2";
</script>
This allows code that would otherwise only be valid HTML or only valid XHTML to be valid in both. (See Dorwald's comments for more info.)
