How to remove executable JavaScript from a QueryString variable?

I have a page URL something like this: https://site.com/page.aspx?QSVariableName=Value
Our security team has raised a vulnerability stating that if the Value portion is injected with executable JS, that is something like:
https://site.com/page.aspx?QSVariableName=Value%27%2balert%281234567890%29%2b%27 (Equivalent to passing Value'+alert(1234567890)+')
Then 1234567890 is alerted from the OnClick event of a hyperlink I have on the page that uses Value.
Like I said, this is not something I am doing intentionally, but it has been identified as a vulnerability in the code. So the question is, how do I make sure QSVariableName uses just the Value and ignores unnecessary code (let's keep to JS only for now)?
The complexity that comes to my mind is QSVariableName could contain ANY JS code, not just alert(). And it could be present anywhere, not just at the end. Is there any way to identify JS executable code embedded in a string?
It is OK if Value is a bad 'string' as long as it doesn't contain anything executable.

What if, instead of putting QSVariableName directly into the onclick attribute, you used a switch/case pattern to choose a function based on QSVariableName? You could also map appropriate query strings to appropriate functions using an associative array. Either way, you'd only ever be treating QSVariableName as a string and only be using safe and pre-approved JS in your onclick event. The downside being, anyone who tries to inject JS into the query string will get an error, but they're trying to hack your site...their user experience should suffer accordingly.
Without seeing more of your code, I'm not sure I can come up with a better answer...
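The allow-list idea above can be sketched roughly as follows. This is only an illustration under assumptions: the handlers table, its keys, and the resolveHandler function are hypothetical names, and the real page would map its own approved values to its own functions.

```javascript
// Hypothetical allow-list: the only values QSVariableName may take,
// each mapped to a pre-approved function.
const handlers = {
  Value: () => "showing Value",
  OtherValue: () => "showing OtherValue",
};

// Look up the handler for the query string; anything not on the list is rejected.
function resolveHandler(queryString) {
  const requested = new URLSearchParams(queryString).get("QSVariableName");
  // hasOwnProperty keeps inherited keys such as "toString" from matching
  if (requested !== null && Object.prototype.hasOwnProperty.call(handlers, requested)) {
    return handlers[requested];
  }
  return null;
}

const ok = resolveHandler("?QSVariableName=Value");
const attack = resolveHandler("?QSVariableName=Value%27%2balert%281234567890%29%2b%27");
console.log(ok !== null, attack === null);
```

The query string value is only ever compared against known keys, so it is never concatenated into the onclick attribute or evaluated as code.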

Related

javascript encodeURIComponent and escape?

I use JS to send an encodeURIComponent-encoded string to a PHP file for writing, and it has been working fine for years. Recently, though, I ran into a strange effect where the text needs to be further encoded with escape in order to work! The symptom started to show only when I used an open source WYSIWYG editor!
What could be the offending characters in a URI that need escape to fix it? I used to think a URI only reserves ? & = for its syntax to work.

The situation you describe could possibly be explained--although there's no way of knowing without you telling us what the string is, and how it's being used--by a URL which involves two levels of nested URL-like values.
Consider a URL taking a query parameter which is another URL:
http://me.com?url=http://you.com?qp=1
That URL is subject to misinterpretation, so we would normally URL-encode the you.com URL, giving us:
http://me.com?url=http%3A%2F%2Fyou.com%3Fqp%3D1
Whoever is working with this URL can now extract the query parameter named url with the value http%3A%2F%2Fyou.com%3Fqp%3D1, decode it (often a framework or library will decode it for you), and then use it to jump to or call that URL.
Consider, however, the case where the you.com URL itself has a query parameter, not ?qp=1 as given in the first example, but rather something that itself needs to be URL-encoded. To keep things simple, we'll just use "cat?pictures". We'd need to encode that, making the query parameter
?qp=cat%3Fpictures
In other words, the nested URL in question is going to be
http://you.com?qp=cat%3Fpictures
If we just use that as is, then our entire URL becomes
http://me.com?url=http%3A%2F%2Fyou.com%3Fqp%3Dcat%3Fpictures
Unfortunately, if we now decode that in a naive way, we get
http://me.com?url=http://you.com?qp=cat?pictures
In other words, the nested URL has been decoded as well, meaning that whoever reads it will think the URL has two query parameters, namely url and qp, and the ? inside "cat?pictures" is no longer distinguishable from a real delimiter. To successfully deal with this problem, we need to encode the second query parameter a second time, yielding
http://me.com?url=http%3A%2F%2Fyou.com%3Fqp%3Dcat%253Fpictures
Please note, however, that if you use your language or environment's built-in tools and libraries for handling query parameters, most of this will happen automatically and prevent you from having to worry about it.
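As a small sketch of the point about built-in tools, the following uses the standard encodeURIComponent function and the URL class to build and unpack the nested URL from the example above. The variable names are illustrative only.

```javascript
// Encode the innermost value first, then encode the whole nested URL once more.
const innerUrl = "http://you.com?qp=" + encodeURIComponent("cat?pictures");
const outerUrl = "http://me.com?url=" + encodeURIComponent(innerUrl);

// Decoding happens one level at a time, mirroring the encoding.
const recoveredInner = new URL(outerUrl).searchParams.get("url");
const recoveredValue = new URL(recoveredInner).searchParams.get("qp");

console.log(innerUrl);       // http://you.com?qp=cat%3Fpictures
console.log(outerUrl);       // http://me.com?url=http%3A%2F%2Fyou.com%3Fqp%3Dcat%253Fpictures
console.log(recoveredValue); // cat?pictures
```

Each level of nesting gets exactly one round of encoding on the way in and one round of decoding on the way out, which is why the % of the inner %3F shows up as %25 in the outer URL.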
"The symptom starts to show only when I use an open source wysiwyg editor"
An editor merely places characters in a file. It's very hard to imagine that an editor is causing the problem you refer to, unless perhaps one editor is configured to use smart quotes, for example, which would pretty much break everything that involved quotes.

Prevent user-entered scripts from running in webpage

In my application, there is a comment box. If someone enters a comment like
<script>alert("hello")</script>
then an alert appears when I load that page.
Is there any way to prevent this?

There are several ways to address this, but since you haven't mentioned which back-end technology you are using, it is hard to give anything but rough answers.
Also, you haven't mentioned if you want to allow, or deny, the ability to enter regular HTML in the box.
Method 1:
Sanitize inputs on the way in. When you accept something at the server, look for the script tags and remove them.
This is actually far more difficult to get right than might be expected.
Method 2:
Escape the data on the way back down to the browser. In PHP there is a function called htmlentities which will turn all HTML special characters into entities, so the text renders as literally what was typed.
The words <script>alert("hello")</script> would appear on your page.
Method 3:
White-list.
This is far beyond the scope of a single post and really requires knowing your back-end system, but it is possible to allow some HTML tags while disallowing others.
This is insanely difficult to get right and you really are best using a library package that has been very well tested.
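To illustrate Method 2, here is a minimal sketch of output escaping written in JavaScript, for example for a Node back end. It is an assumption-laden stand-in for a function like PHP's htmlentities, not a replacement for a well-tested library; the escapeHtml name is made up for this example.

```javascript
// Replace the characters that have special meaning in HTML with entities,
// so the browser displays the text instead of interpreting it as markup.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const comment = '<script>alert("hello")</script>';
const safe = escapeHtml(comment);
console.log(safe); // &lt;script&gt;alert(&quot;hello&quot;)&lt;/script&gt;
```

The stored comment stays exactly as the user typed it; only the copy written into the page is escaped.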

You should treat user input as plain text rather than HTML. By correctly escaping HTML entities, you can render what looks like valid HTML text without having the browser try to execute it. This is good practice in general, for your client-side code as well as any user provided values passed to your back-end. Issues arising from this are broadly referred to as script injection or cross-site scripting.
Practically on the client-side this is pretty easy since you're using jQuery. When updating the DOM based on user input, rely on the text method in place of the html method. You can see a simple example of the difference in this jsFiddle.

The best way is to replace <script> with another string. For example, in C# use:
str = str.Replace("<script>", "O_o");
The other options have a lot of disadvantages.
1. Block JavaScript: this also disables some validation, namely the validation that is done in the front end. It also stops protecting you once data is retrieved from the database. I mean an attacker can inject a script as input in a form and it gets saved in the database; when you return the records from the database on another page, it renders as script!
2. Render as text: in some technologies this needs third-party packages, which is a risk in itself. Maybe these packages have a backdoor!

Convert the value into a string; that solved it in my case.
Example:
var anything

Should I worry that using GET in a form element doesn't automatically URL-encode angle brackets?

So I decided to use GET in my form element, point it to my cshtml page, and found (as expected) that it automatically URL encodes any passed form values.
I then, however, decided to test if it encodes angle brackets and surprisingly found that it did not when the WebMatrix validator threw a server error warning me about a potentially dangerous value being passed.
I said to myself, "Okay, then I guess I'll use Request.Unvalidated["searchText"] instead of Request.QueryString["searchText"]." Then, as any smart developer who uses Request.Unvalidated does, I tried to make sure that I was being extra careful, but I honestly don't know much about inserting JavaScript into URLs, so I am not sure if I should worry about this or not. I have noticed that it encodes apostrophes, quotation marks, parentheses, and many other JavaScript special characters. (Actually, I'm not even sure if an angle bracket has special meaning in JavaScript OR URLs, but it probably does in one, if not both. I know it helps denote a List in C#, but in any event you can write script tags with it if you can find a way to get it on the HTML page, so I guess that's why WebMatrix's validator screams at me when it sees them.)
Should I find another way to submit this form, whereas I can intercept and encode the user data myself, or is it okay to use Request.Unvalidated in this instance without any sense of worry?
Please note, as you have probably already noticed, my question comes from a WebMatrix C#.net environment.
Bonus question (if you feel like saving me some time and you already know the answer off the top of your head): If I use Request.Unvalidated will I have to URL-decode the value, or does it do that automatically like Request.QueryString does?
---------------------------UPDATE----------------------------
Since I know I want neither a YSOD nor a custom error page to appear simply because a user included angle brackets in their "searchText", I know I have to use Request.Unvalidated either way, and I know I can encode whatever I want once the value reaches the cshtml page.
So I guess the question really becomes: Should I worry about possible XSS attacks (or any other threat for that matter) inside the URL based on angle brackets alone?
Also, in case this is relevant:
Actually, the value I am using (i.e. "searchText") goes straight to a cshtml page where the value is run through a (rather complex) SQL query that queries many tables in a database (using both JOINS and UNIONS, as well as Aliases and function-based calculations) to determine the number of matches found against "searchText" in each applicable field. Then I remember the page locations of all of these matches, determine a search results order based on relevance (determined by type and number of matches found) and finally use C# to write the search results (as links, of course) to a page.
And I guess it is important to note that the database values could easily contain angle brackets. I know it's safe so far (thanks to HTML encoding), but I suppose it may not be necessary to actually "search" against them. I am confused as to how to proceed to meet both security and functional expectations, and if I choose one way or the other, I may not know I made the wrong decision until it is much too late...

URL and special characters
The URL http://test.com/?param="><script>alert('xss')</script> is "benign" until it is read and then either
printed in a template: Hello #param. (Potential reflected/persisted XSS)
or used in JavaScript: divContent.innerHTML = '<a href="' + window.location.href + ... (Potential DOM XSS)
Otherwise, the browser doesn't evaluate the query string as HTML/script.
Request.Unvalidated/Request.QueryString
You should use Request.Unvalidated["searchText"] if you are expecting to receive special characters.
For example: <b>User content</b><p>Some text...</p>
If your application is working as expected with QueryString["searchText"], you should keep it since it validates for potential XSS.
Ref: http://msdn.microsoft.com/en-us/library/system.web.httprequest.unvalidated.aspx

Is it possible to use JavaScript to break the HTML of a page?

I've been asked at work whether it is possible to write, on purpose or by accident, JavaScript that will remove specific characters from a HTML document and thus break the HTML. An example would be adding some JavaScript that removes the < symbol in the page. I've tried searching online and I know JavaScript can replace strings, but my knowledge of the language is negligible.
I've been asked to look into it as a way of hopefully addressing why a site I work on needs to have controls over who can add bespoke functionality to the page. I'm hoping it's not possible but would be grateful for the peace of mind!

Yes, and in fact you can do things far more insidious with javascript as well.
http://en.wikipedia.org/wiki/Cross-site_scripting

Yes, that's possible. The easiest example is
var body = document.getElementsByTagName('body')[0];
body.innerHTML = 'destroyed';
which will remove the whole page and just write "destroyed" instead. To get back to your example: in the same way it's possible to replace <:
var body = document.getElementsByTagName('body')[0];
// the g flag replaces every <, not just the first one
body.innerHTML = body.innerHTML.replace(/</g, 'some other character');
Such "extreme" cases are very unlikely to happen by accident, but it's absolutely possible (particularly for inexperienced JavaScript developers) to break things on a site that usually shouldn't be affected by JavaScript.
Note that this will only mess up the displayed page in the client's browser and doesn't change your HTML file on the server in any way. Just find and remove/fix the "bad" lines of code and everything is fine again.

Any client/browser can manipulate how the page is viewed at any time, for instance in chrome hit F12 and then you can write whatever you want in the html and you will see the changes immediately. But that's not to worry about...
The scary part is when JavaScript on the site communicates with the back-end server and supplies it with some input parameters that are not being sanitized on the server side before it is processed in some way. SQL Injection can also happen this way if the back-end utilizes a database which they almost always do, and so on...
A webpage can be manipulated in two ways: either non-persistently or persistently.
[non-persistent]: this way you can manipulate your own access to a webpage. This won't affect other users in itself, but you can do harm once you're in.
[persistent]: this way the server-side data or code is permanently affected by the injected code, which will most likely affect other users.
The key thing here is to always sanitize any input the back-end server uses before it processes it.

You could definitely write some javascript function to modify the contents of a file. If that file is your HTML page, then sure.
If you want to prevent this from happening, you can just set the permissions of that HTML file to be read-only, though.

you could:
Overwrite the page,
Mess with the innerHTML of the body tag (almost the same),
Insert illegal elements.

Yes. At the very least, you could use it to write CSS that sets any element, class, ID... even the body to display:none;

Optimal way to pass system values to javascript

What is the most effective way to pass object and category ids or other system variables which shouldn't be presented to the user, from server to the browser?
Let's say I have a list of items where I can do something with each of them by javascript, for example show tooltip html or add to favorites by ajax, or display on a map. Where is it best to save that tooltip html, or database id, or geoposition?
Some options I can think of are:
some dictionary within <script></script> tag for each item,
microformats,
inline xml,
rel attributes,
css class names with specific information, e.g. class="lat-12345 uid-45678",
one <script></script> with a dictionary of html ids mapping dictionaries with system values in the template,
javascript generated from the database and included via <script src="..."></script> with a dictionary of html ids mapping dictionaries with system values in the template,
ajax requests for all cases when I need more information than just id,
event handlers with parameters within html tags, e.g. onmouseover="tooltip(this, 123, 'Hello world!')".
P.S. I prefer unobtrusive solutions and also the loading/execution time is important.

Perhaps I am missing something... why not just JSON?
How you "send" it (either in the initial page load as "javascript" or via AJAX or whatnot) is really just a trivial detail determined mostly by when the data is available. (JSON is a subset of legal JavaScript syntax.)
Then it's just a matter of the correct transformation. Of course, by pushing this to JSON/JS, you may render some non-JS clients non-interoperable, if that's a consideration for you. If such is indeed the case, why not just perform the transformation server-side using well, any number of the techniques you put at top?
You can also use arbitrary attributes in HTML (the HTML5 spec formally legalizes these as "data-*" attributes) -- other unknown attributes are not technically "correct", but all major web browsers will accept them, and they can be accessed through the DOM API.

I'd prefer a single AJAX call to fetch whatever data you know you need at the outset, so you can have a simple JSON object available in your script. You can, of course, supplement that with additional calls should you find you need more information.
If that's impractical, then "hardcoding" a JavaScript object in a <script>...</script> tag is the next best option. Of course, "hardcoding" is from the browser's perspective. The actual content would surely be written by server-side script from your database.
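As a rough sketch of the "hardcoded" object written by a server-side script, the following assumes a Node-style back end; the toInlineScript function, the itemData variable name, and the sample data are all hypothetical. The one non-obvious step is escaping < so that a value containing </script> cannot end the tag early.

```javascript
// Serialize server-side data as a JavaScript object literal inside a script tag.
function toInlineScript(variableName, data) {
  // JSON is valid JavaScript syntax; \u003c is the JSON escape for "<"
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return "<script>var " + variableName + " = " + json + ";</script>";
}

// Hypothetical system values keyed by HTML id
const items = {
  "item-1": { id: 123, lat: 12.345, tooltip: "Hello </script> world!" },
};

const html = toInlineScript("itemData", items);
console.log(html);
```

Client-side code can then look up itemData["item-1"] by the element's id without any extra request.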

One method you can use is custom attributes. I think you refer to this as micro-formats, but I am not entirely sure if they are the same thing so I have written a description below.
Having hit the same question before, I basically use something like the following:
<div data-pid="1234">
    <a href="#" class="add-to-favourites">
        <img src="addToFavourites.png" />
    </a>
</div>
$("a.add-to-favourites").click(function() {
    // read the product id from the enclosing div and send it with a GET request
    $.get("/Favourites/Add.php?prodID=" + encodeURIComponent($(this).parent().attr("data-pid")));
});
This should do exactly what you want to do. The reason I have put the pid in the div, not the a tag, is that you can then place all the other product information within the div with other actions the user can take, for example displaying a tooltip on mouseover using data-description, or displaying on a map using data-geo-x and data-geo-y. Of course you can name these anything you want.
Support / Acceptance
This is becoming a perfectly accepted way to do what you want to do. HTML 5 supports this for precisely the kind of thing you are trying to achieve.
So it is supported by HTML 5, but what about HTML 4?
It may make HTML 4 invalid, but the world is moving on to bigger and better things. Older browsers (IE6 and before, FF1 / 2, Opera 7 / 8 / 9) are becoming less common, so it shouldn't be a problem. It won't actually break older browsers; the functionality will still work.
Important validity note
Make sure you prepend the data- onto the attribute name. This will make the attribute perfectly valid in HTML 5.
A few extra hints
In jQuery, you can call data("pid") to read the value of data-pid (attr("pid") would look for a plain pid attribute instead). Even so, I would be careful about naming the second part of the attribute name after an actual attribute (for example, instead of data-id, use data-pid, especially if the id attribute is specified). I am not sure what effect it would have if you didn't do this, but it's better to avoid the problem in the first place than to have issues with the site at a later date.
Hope this is what you were looking for.

ASP.NET offers a very convenient way to do this. You can simply write a JavaScript object. I am sure other templating engines offer similar ways to do this.
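var person = {
    // JavaScriptStringEncode(value, true) escapes the string and adds the surrounding quotes
    Name : <%= HttpUtility.JavaScriptStringEncode(_person.Name, true) %>,
    Age : <%= _person.Age %>
};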

I would implement a JavaScript singleton AppCacheManager that initializes in the document.ready event. With a bit of JS OOP you have a fully fledged OOP datastore.
Whenever information is needed, you load it through Ajax / a RESTful web service and cache it in the AppCacheManager. So you have two caches: 1. the browser cache, possible due to RESTful web service URL caching, and 2. the JS cache manager.
You route all requests through the AppCacheManager, which transparently fetches the new data or returns the cached data, so that the calling code doesn't need to know anything about the caching.
in short:
write a JS CacheManager
don't fetch the whole data at once but in small parts when needed and cache them
define a convenient interface for the cachemanager
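A minimal sketch of such a cache manager might look like the following. The class shape, the get and invalidate methods, and the injected loader function are assumptions made for illustration; in a browser the loader would typically wrap an Ajax call, while here a fake loader stands in so the example runs on its own.

```javascript
class AppCacheManager {
  // loader: function (key) -> value or Promise of value, e.g. an Ajax call
  constructor(loader) {
    this.loader = loader;
    this.cache = new Map();
  }

  // Return the cached promise for key, loading the data only on first use.
  get(key) {
    if (!this.cache.has(key)) {
      const pending = Promise.resolve(this.loader(key));
      this.cache.set(key, pending);
      // drop failed loads so a later call can retry
      pending.catch(() => this.cache.delete(key));
    }
    return this.cache.get(key);
  }

  // Forget a cached entry, for example after the item was changed.
  invalidate(key) {
    this.cache.delete(key);
  }
}

// Fake loader standing in for an Ajax request; it counts how often it is called.
let calls = 0;
const fakeLoader = (url) => {
  calls += 1;
  return Promise.resolve({ url: url, tooltip: "Hello world!" });
};

const manager = new AppCacheManager(fakeLoader);
const first = manager.get("/items/42");
const second = manager.get("/items/42"); // served from the cache
first.then((item) => console.log(item.tooltip, "loader calls:", calls));
```

Caching the promise rather than the resolved value means two requests made before the first response arrives still share a single load.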
Unobtrusiveness is a very difficult thing in JS and I'd be eager to know more about that, too.
Hope that helped.
