Bypass server-side URL encoding -[Firing range automation]

I'm trying to build an automated XSS detection toolkit for myself, and I'm using Google's Firing Range to test it. I'm stuck on this particular case of escaped XSS in an HTML context: https://public-firing-range.appspot.com/escape/serverside/encodeUrl/attribute_name?q=a
Whatever value q takes gets encoded server-side and is then reflected back into the HTML. How can I solve this? I tried double encoding and other tricks such as homoglyphs, but nothing seems to work.

Related

How to protect against Encoded URL XSS Attack

I have the following two questions:
1) How to protect from this kind of XSS Attacks?
https://www.example.com/index.php?&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041
Suppose that, for some reason, the query parameter is embedded in an image load event; then it would look like this:
<img src=x onload="&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041">
// And the browser treats it as
<img src=x onload="javascript:alert('XSS')">
I am already using PHP's htmlspecialchars() and filter_var() with URL sanitization, but this kind of encoded XSS easily gets past these functions.
How can we defend against such encoded XSS or neutralize any such attack?
2) Does an XSS payload have to be embedded in an HTML page, JavaScript, CSS, etc. in order to be triggered, or is there a way for XSS to work without being embedded?

htmlspecialchars is a perfectly good defence against XSS when you are inserting user input into an HTML document.
It stops any HTML syntax in the user input from breaking out of where you intend for it to go and being treated as JavaScript.
Your problem has nothing to do with the fact that the attack is encoded. The problem is that you are putting user input somewhere that JavaScript is expected (an onload attribute), so it is already being treated as JavaScript.
json_encode is the usual solution here (and then htmlspecialchars because the JavaScript is inside an HTML attribute).
However, that only works when you are taking user input and putting it into a script to be used as data. Here it seems that you are taking user input and just treating the whole thing as a JavaScript function.
If you do that then you are going to be vulnerable to XSS. You can mitigate it to some degree by implementing defenses against CSRF, but you almost certainly shouldn't be doing this in the first place.
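As a rough sketch of the "encode as data, then escape for the attribute" order described above, here is the same idea in JavaScript rather than PHP. JSON.stringify stands in for json_encode, and the escapeHtmlAttr helper stands in for htmlspecialchars with ENT_QUOTES. Both helper names and the showMessage handler are assumptions made up for illustration; they are not part of the question's code.

```javascript
// Hypothetical helper: escape the characters that matter inside a quoted HTML attribute.
function escapeHtmlAttr(text) {
  return String(text)
    .replace(/&/g, '&amp;') // ampersand first, so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Build an inline handler that passes untrusted input as a string literal (data),
// never as code. Step 1: JSON-encode. Step 2: HTML-escape for the attribute.
function buildOnloadAttr(untrusted) {
  const asJsLiteral = JSON.stringify(String(untrusted));
  return 'onload="showMessage(' + escapeHtmlAttr(asJsLiteral) + ')"';
}

// An input that tries to close the string and the call, then run its own code.
const attr = buildOnloadAttr('"); alert("XSS");//');
console.log(attr);
```

This only helps when the input is used as data inside a script you wrote. If the whole handler body comes from the user, no amount of encoding makes it safe.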

2016 Sending html across to a server

I'm grabbing web page data (think similar issue to instapaper) and sending it back to a LOCAL server, not across the web. Both will be on the same machine, but I'd still like to make it a bit more secure.
I'm currently grabbing HTML from web pages and attempting to encode it into a URI. Here's the problem: the built-in function encodeURI doesn't work properly because some content inside an HTML page is already encoded and some of it isn't, like the HTML itself. Let me give an example from a decades-old website that still exists.
This code:
<title>You've Got Mail</title> needs to be encoded to: "%3Ctitle%3EYou've%20Got%20Mail%3C/title%3E"
But some of it comes in pre-encoded (before encodeURI is called):
<noframes>
<body bgcolor="#FFFFFF" background="../img/1bgbottom.gif" text="#000000" link="#2100c5" vlink="#2100c5" alink="#bd0031">
Is there any simple way to take an HTML page (in all its ugliness) and encode it in a URI reliably?

You're looking for encodeURIComponent(). There's never really a good reason (that I know of anyway) for encodeURI().
Once you use encodeURIComponent(), it can be decoded as-is, getting you exactly what was sent in.
On another note, I'm not sure where you're doing this encoding but if you plan to use this in a query string, beware that there are finite query string limits, usually 4k or 64k. (At least sometimes you can configure the limit server-side.)
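To illustrate the round trip mentioned above, here is a small sketch using the title string from the question. The extra test strings (tricky and preEncoded) are made up for illustration.

```javascript
// HTML fragment taken from the question.
const html = "<title>You've Got Mail</title>";

// encodeURIComponent escapes every reserved character, including / & = ? #.
const encoded = encodeURIComponent(html);
console.log(encoded); // %3Ctitle%3EYou've%20Got%20Mail%3C%2Ftitle%3E

// decodeURIComponent reverses it exactly.
const decoded = decodeURIComponent(encoded);
console.log(decoded === html); // true

// Content that already looks encoded survives too, because "%" itself becomes "%25".
const preEncoded = 'a%20b &amp; c';
console.log(decodeURIComponent(encodeURIComponent(preEncoded)) === preEncoded); // true

// encodeURI leaves characters such as # & = alone, so inside a query string
// they would be misread as delimiters.
const tricky = 'bgcolor="#FFFFFF"&x=1';
console.log(encodeURI(tricky));          // bgcolor=%22#FFFFFF%22&x=1
console.log(encodeURIComponent(tricky)); // bgcolor%3D%22%23FFFFFF%22%26x%3D1
```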

Javascript implementation of anti-XSS escaping functions

The OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet lists rules to prevent XSS attacks by escaping data appropriately, and it contains links to reference implementations of these escaping methods in the Java language (HTML Escape, Attribute Escape, JavaScript Escape, CSS Escape, URL Escape).
Is there an implementation anywhere of these in Javascript, or do I have to 'roll my own'?
UPDATE: I mean Javascript running in the browser. For example, for escaping text rendered with the jQuery html() method (though of course text() is safer), or escaping data rendered using a template engine such as EJS.
UPDATE2: ESAPI JavaScript seems to be what I was looking for, though it's still only "Alpha Quality"

Since you tend to work with the DOM in (client-side) JavaScript, there is no need for HTML and HTML attribute escaping. For example, given untrusted input input,
var el = document.createElement('div');
el.setAttribute('title', input);
el.appendChild(document.createTextNode(input));
is perfectly safe, since you are never constructing (serialized) HTML in the first place.
If you are writing custom JavaScript or CSS from JavaScript code, you are doing something wrong (including using document.write or some data URI script src abominations), so there is no escaping provided for either. You can simply write your code or styles beforehand and then call the appropriate functions or set the appropriate classes.
encodeURI and encodeURIComponent can be used to encode URIs or their components.

You can use the js-xss library. It worked for me against the test cases I've been using for injecting scripts into HTML.

Secure database entry against XSS

I'm creating an app that retrieves the text of a tweet, stores it in the database and then displays it in the browser.
The problem is that if the text contains PHP tags or HTML tags, it might open a security hole.
I looked into strip_tags() but saw some bad reviews. I also saw suggestions to use HTML Purifier, but it was last updated years ago.
So my question is how can I be 100% secure that if the tweet text is "<script> something_bad() </script>" it won't matter?
To state the obvious the tweets are sent to the database from users so I don't want to check all individually before displaying them.

You are NEVER 100% secure; however, you should take a look at this. If you use the ENT_QUOTES parameter too, there is currently no way to inject ANY XSS into your website, provided you're using a valid charset (and your users don't use outdated browsers). However, if you want to allow people to post only SOME HTML tags in their "Tweet" (for example <b> for bold text), you will need to take a deep look at EACH whitelisted tag.

You've passed the first stage, which is to recognise that there is a potential issue, and skipped straight to trying to find a solution without stopping to think about how you want to deal with the content. This is a critical precursor to solving the problem.
The general rule is that you validate input and escape output
validate input
- decide whether to accept or reject it in its entirety
if (htmlentities($input) != $input) {
    die("yuck! that tastes bad");
}
escape output
- transform the data appropriately according to where its going.
If you simply....
print "<script> something_bad() </script>";
That would be bad, but....
print json_encode(htmlentities("<script> something_bad() </script>"));
...then you would have to have done something very strange at the front end to make the client susceptible to a stored XSS attack.

If you're outputting to HTML (and I recommend you always do), simply HTML encode on output to the page.
As client script code is only dangerous when interpreted by the browser, it only needs to be encoded on output. After all, to the database <script> is just text. To the browser <script> tells the browser to interpret the following text as executable code, which is why you should encode it to &lt;script&gt;.
The OWASP XSS Prevention Cheat Sheet shows how you should do this properly depending on output context. Things get complicated when outputting to JavaScript (you may need to hex encode and HTML encode in the right order), so it is often much easier to always output to a HTML tag and then read that tag using JavaScript in the DOM rather than inserting dynamic data in scripts directly.
At the very minimum you should be encoding the < & characters and specifying the charset in metatag/HTTP header to avoid UTF7 XSS.
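A minimal sketch of "encode on output", written in JavaScript for illustration (in PHP, htmlspecialchars with ENT_QUOTES plays this role). The escapeHtml helper is a hypothetical name, and the sketch covers only the HTML element and quoted-attribute contexts, not output into scripts, CSS or URLs.

```javascript
// Hypothetical helper: HTML-encode untrusted text for element content
// or quoted attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;') // ampersand first, so later entities are not double-encoded
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The stored tweet is kept as plain text; it is encoded only when written to the page.
const tweet = '<script> something_bad() </script>';
const page = '<p>' + escapeHtml(tweet) + '</p>';
console.log(page);
// <p>&lt;script&gt; something_bad() &lt;/script&gt;</p>
```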

You need to convert the HTML characters <, > (mainly) into their HTML equivalents &lt;, &gt;.
This will make < and > be displayed in the browser but not executed, i.e. if you look at the source, an example may be &lt;script&gt;alert('xss')&lt;/script&gt;.
Before you input your data into your database - or on output - use htmlentities().
Further reading: https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

div contenteditable, XSS

I currently have a simple <div contenteditable="true"> working, but, here's my problem.
Currently, the user can create a persistent XSS by inserting a <script> into the div, which I definitely do not want.
However, my current ideas to fix this are:
Allow only a and img tags
Use a textarea (not a good idea, because then have users copy and paste images)
What do you guys suggest?

You have to keep in mind that to prevent xss, you've GOT TO DO IT ON THE SERVER SIDE. If your rich text editor (ex YUI or tinyMCE) has some javascript to prevent a script tag from being inputted, that doesn't stop me from inspecting your http post requests, looking at the variable names you're using, and then using firefox poster to send whatever string I like to your server to bypass all client side validation. If you aren't validating user input SERVER SIDE then you're doing almost nothing productive to protect from XSS.
Any client side xss protection would have to do with how you render user input; not how you receive it. So, for example, if you encoded all input so it does not render as html. This goes away from what you want to accomplish though (just anchor and img tags). Just keep in mind the more you allow to be rendered the more possible vulnerabilities you expose.
That being said the bulk of your protection should come from the server side and there are a lot of XSS filters out there depending on what you're writing with (ex, asp.net or tomcat/derby/jboss) that you can look into.
I think you're on the right path by allowing ONLY a and img tags. The one thing you have to keep in mind is that you can put javascript commands into the href attributes of a tags (and the src attributes of img tags), so take care to validate those attributes. But the basic idea of "allow nothing and then change the filters to only allow certain things" (AKA whitelist filtering) is better than "allow everything and then filter out what I don't want" (AKA blacklist filtering).
In the comments below, Brian Nickel also said this which illustrates the point:
Everything but the elements and attributes you want to keep. I
know you mentioned it in your answer but that bears repeating since it
is so scary. <img onerror="stealMoney()">
The other thing you're going to want to do is define a XSSFilterRequest object (or something along those lines) and in a filter, override your requests so that any call to whatever your "getUrlParameter" and "getRequestParameter" objects run the request values through your xss filter. This provides a clean way to filter everything without rewriting existing code.
EDIT: A python example of xss filtering:
Python HTML sanitizer / scrubber / filter
Python library for XSS filtering?
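The whitelist idea from this answer can be sketched as a decision function, here in JavaScript. The policy table, the allowed schemes and the function names are assumptions for illustration. A real server-side filter would also need a proper HTML parser to find the tags and attributes in the first place; this only shows the "keep or drop" decision.

```javascript
// Assumed policy for illustration: only these tags and attributes survive.
const ALLOWED = new Map([
  ['a', ['href']],
  ['img', ['src', 'alt']],
]);
const URL_ATTRIBUTES = ['href', 'src'];
const SAFE_SCHEMES = ['http:', 'https:'];

// Accept only absolute http(s) URLs; anything that does not parse is rejected.
function isSafeUrl(value) {
  try {
    return SAFE_SCHEMES.includes(new URL(String(value)).protocol);
  } catch (e) {
    return false;
  }
}

// Decide whether one attribute on one tag may be kept.
function isAllowedAttribute(tag, name, value) {
  const attrs = ALLOWED.get(String(tag).toLowerCase());
  const attr = String(name).toLowerCase();
  if (!attrs || !attrs.includes(attr)) return false; // not on the whitelist
  if (URL_ATTRIBUTES.includes(attr)) return isSafeUrl(value);
  return true;
}

console.log(isAllowedAttribute('a', 'href', 'https://example.com/'));  // true
console.log(isAllowedAttribute('a', 'href', 'javascript:alert(1)'));   // false
console.log(isAllowedAttribute('img', 'onerror', 'stealMoney()'));     // false
```

Rejecting relative URLs is a deliberately strict choice in this sketch; a filter that needs them would have to resolve them against a known base first.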

What about using google caja (a source-to-source translator for securing Javascript-based web content)?
Unless you have xss validation on server side you could apply html_sanitize both to data sent from the user and data received from the server that is to be displayed. In worst case scenario you'll get XSSed content in database that will never be displayed to the user.
