Sanitize function too sanitary?

I'm working on a webapp that sanitizes models for the view. However, it is stripping too many wanted characters, like forward slashes, semi-colons, colons, dollar signs, quote marks and accented letters from foreign languages. e.g. 3/8"W becomes 38w.
Do I need to modify the function to be less aggressive, or should I simply not use the sanitize function at all? I guess the bigger question is, what is sanitization for?
Full disclosure - I didn't write the function and I'm not fantastic with regex.
value = value.replace(/[^a-z0-9áéíóúñü .,_-]/gim, "").trim();

Sanitization is mainly about removing dangerous characters from data before it is saved in a database or used in any kind of query.
That said, you shouldn't rely on sanitizing data on the front end, because JavaScript can be disabled.
Anything on the client side can be bypassed.
The back end is where you need to take care of this.
Sanitization should be applied to data before it is saved in the database.
Escaping should be applied to data after it is retrieved from the database.
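To illustrate the difference, here is a minimal sketch that puts the stripping regex from the question next to escaping at output time. The `escapeHtml` helper is a hypothetical name (it is not part of the original code) and it only covers the HTML text and attribute context.

```javascript
// The whitelist regex from the question: deletes every character not listed.
function sanitize(value) {
  return value.replace(/[^a-z0-9áéíóúñü .,_-]/gim, "").trim();
}

// Hypothetical helper (an assumption, not the original code): escapes the
// characters that are special in HTML instead of deleting them.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

const input = '3/8"W';
console.log(sanitize(input));   // 38W  (slash and quote are lost)
console.log(escapeHtml(input)); // 3/8&quot;W  (nothing is lost)
```

With escaping, the browser shows the original text unchanged, so the wanted characters survive.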

Related

How to efficiently handle the line break exploit when implementing server sent events?

When implementing Server Sent Events on your application server, you can terminate a message and have it sent by ending it with two line breaks: \n\n, as demonstrated on this documentation page.
So, what if you're receiving user input and forwarding it to all interested parties (as is typical in a chat application)? Could a malicious user not insert two line breaks in their payload to terminate the message early? Even more, could they not then set special fields such as the id and retry fields, now that they have access to the first characters of a line?
It seems that the only alternative is to instead scan their entire payload, and then replace instances of \n with something like \ndata:, such that their entire message payload has to maintain its position in the data tag.
However, is this not very inefficient? Scanning the entire payload of every message, and potentially doing replacements, means not only a full pass over each payload but also a reallocation whenever the payload is malicious.
Or is there an alternative? I'm currently trying to decide between websockets and SSE, as they are quite similar, and this issue is making me lean more towards WebSockets, because it feels as if they would be more efficient if they are able to avoid this potential vulnerability.
Edit: To clarify, I'm mostly ignorant as to whether or not there is a way around having to scan each message in its entirety for \n\n. And if not, do WebSockets have the same issue where you need to scan each message in its entirety? Because if they do, then no matter. But if that's not the case, then it seems to be a point in favor of using websockets over SSE.

It shouldn't be necessary to scan the payload if you're encoding the user data correctly. With JSON it is safe to use the "data" field in server-sent events, because JSON encoding escapes newlines and other control characters by default, as the RFC says:
The representation of strings is similar to conventions used in the C
family of programming languages. A string begins and ends with
quotation marks. All Unicode characters may be placed within the
quotation marks, except for the characters that must be escaped:
quotation mark, reverse solidus, and the control characters (U+0000
through U+001F).
https://www.rfc-editor.org/rfc/rfc7159#page-8
The important thing is that nobody sneaks in a newline character, but this isn't new to server-sent events: HTTP headers are separated by a single newline and can be tampered with too (if not correctly encoded); see https://www.owasp.org/index.php/HTTP_Response_Splitting
Here's an example of a server-sent events application with JSON encoding:
https://repl.it/#BlackEspresso/PointedWelloffCircles
You shouldn't be able to tamper with the data field even when newline characters are allowed.
Encoding shouldn't stop you from using server-sent events, but there are major differences between websockets and SSE. For a comparison see this answer: https://stackoverflow.com/a/5326159/1749420
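The JSON-encoding idea above can be sketched roughly as follows. `formatSseMessage` is a hypothetical helper name, and the example assumes the whole payload goes into a single "data:" field; it is an illustration, not the code behind the linked demo.

```javascript
// Hypothetical helper: formats one server-sent event whose data is JSON.
// JSON.stringify escapes line feeds, carriage returns and other control
// characters inside strings, so the payload stays on a single "data:" line.
function formatSseMessage(payload) {
  return "data: " + JSON.stringify(payload) + "\n\n";
}

// A chat message that tries to end the event early and inject extra fields.
const malicious = "hello\n\nid: 999\nretry: 1";
const frame = formatSseMessage({ text: malicious });

// The only raw line breaks are the two that terminate the event.
console.log(frame.indexOf("\n") === frame.length - 2); // true

// The receiver gets the original text back with JSON.parse.
const received = JSON.parse(frame.slice("data: ".length).trim());
console.log(received.text === malicious); // true
```

No separate scan-and-replace pass is needed here, because the escaping happens as part of serialization.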

Unless I'm missing something obvious, sanitizing input is a common thing in web development.
Since the source that you shared explicitly mentioned a PHP example, I just did some research and lookie here:
https://www.php.net/manual/en/filter.filters.sanitize.php
FILTER_SANITIZE_SPECIAL_CHARS
HTML-escape '"<>& and characters with ASCII value less than 32,
optionally strip or encode other special characters.
and:
'\n' = 10 = 0x0A = line feed
So I'm not sure I understand why you would assume that converting certain input to character entities would necessarily be a bad thing.
Preventing users from abusing the system by submitting unwanted input is what sanitization is for.

Encoding user input to be stored in MongoDB

I'm trying to determine the best practices for storing and displaying user input in MongoDB. Obviously, in SQL databases, all user input needs to be encoded to prevent injection attacks. However, my understanding is that with MongoDB we need to be more worried about XSS attacks, so does user input need to be encoded on the server before being stored in mongo? Or, is it enough to simply encode the string immediately before it is displayed on the client side using a template library like handlebars?
Here's the flow I'm talking about:
1. On the client side, user updates their name to "<script>alert('hi');</script>".
   Does this need to be escaped to "&lt;script&gt;alert('hi');&lt;/script&gt;" before sending it to the server?
2. The updated string is passed to the server in a JSON document via an ajax request.
3. The server stores the string in mongodb under "user.name".
   Does the server need to escape the string in the same way just to be safe? Would it have to first un-escape the string before fully escaping so as to not double up on the '&'?
4. Later, user info is requested by client, and the name string is sent in JSON ajax response.
5. Immediately before display, user name is encoded using something like _.escape(name).
Would this flow display the correct information and be safe from XSS attacks? What about Unicode characters, such as Chinese characters?
This also could change how text search would need to be done, as the search term may need to be encoded before starting the search if all user text is encoded.
Thanks a lot!

Does this need to be escaped to "&lt;script&gt;alert('hi');&lt;/script&gt;" before sending it to the server?
No, it has to be escaped like that just before it ends up in an HTML page - step (5) above.
The right type of escaping has to be applied when text is injected into a new surrounding context. That means you HTML-encode data at the moment you include it in an HTML page. Ideally you are using a modern templating system that will do that escaping for you automatically.
(Similarly, if you include data in a JavaScript string literal in a <script> block, you have to JS-encode it; if you include data in a stylesheet rule you have to CSS-encode it, and so on. If we were using SQL queries with data injected into their strings then we would need to do SQL-escaping, but luckily Mongo queries are typically done with JavaScript objects rather than a string language, so there is no escaping to worry about.)
The database is not an HTML context so HTML-encoding input data on the way to the database is not the right thing to do.
(There are also other sources of XSS than injections, most commonly unsafe URL schemes.)
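As one possible illustration of the URL-scheme point, a user-supplied link can be checked before it is written into an href. `isSafeLink` and the placeholder base URL are assumptions made for this sketch; the allow-list of http and https is a choice, not a rule from the answer.

```javascript
// Hypothetical helper: accepts a link only if it resolves to http or https.
// The base URL is a placeholder so that relative links can be parsed too.
function isSafeLink(href, base = "https://example.com/") {
  try {
    const url = new URL(href, base);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (err) {
    return false; // could not be parsed as a URL at all
  }
}

console.log(isSafeLink("https://example.org/page")); // true
console.log(isSafeLink("/profile/42"));              // true (relative link)
console.log(isSafeLink("javascript:alert(1)"));      // false
```

HTML-encoding alone would not stop the last case, since a javascript: URL contains no characters that need escaping.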

The short answer is yes, you should still encode all user input.
Whenever you do string concatenation, you need to escape the data correctly. MongoDB supports converting JavaScript queries to its native query language expression in BSON. When doing this there are two contexts to be aware of:
Inside a Javascript string
Everywhere else
If you are concatenating user input outside a string, you really need to be careful. It's really hard to get the escaping right unless the datatype of the variable is an integer or similar where the possible values are known and limited.
The best practice would be to avoid string concatenation whenever possible. You can read more about how MongoDB addresses SQL-Injection here.
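A minimal sketch of the "avoid string concatenation" advice: build the query as a plain object and check the datatype of the user-supplied value first. `buildNameQuery` is a hypothetical helper, and the field name "name" is only an example; the resulting object would be handed to whatever find call your driver provides.

```javascript
// Hypothetical helper: turns user input into a query object without
// concatenating it into a string.
function buildNameQuery(userInput) {
  // Only accept a plain string. An object such as { $ne: null }, parsed from
  // a JSON request body, would otherwise be treated as a query operator.
  if (typeof userInput !== "string") {
    throw new TypeError("name must be a string");
  }
  // The value is carried as data; no escaping is involved.
  return { name: userInput };
}

console.log(buildNameQuery("O'Brien <admin>")); // { name: "O'Brien <admin>" }
```

Because the datatype is known and limited to a string, quotes and angle brackets in the value have no special meaning to the query.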

How can I remove escaping from a RegExp pattern?

I'm trying to simplify input for a particular regex for my users. A simple example of the regex might be
\b(C|C\+\+|Java)\b
I'm now giving the user the option of appending another branch at the end of the regex by inputting the raw string into a <input type="text"> field. The branch will be interpreted literally, so I need to escape it. I've used https://stackoverflow.com/a/2593661/785663 to get RegExp.quote to do this. I then store the complete regex in a database.
Now, when I retrieve the regex from the database and split it back up and display the branches to the user, I need to remove all the escape characters again. Is there some pre-made function for this or do I need to roll my own?
Yes, I know I ought to replace this with a list of strings to search for. But this is only part of a larger (regex-based) picture.

The optimal solution is to change your design: store the unescaped regex, then only escape it when you actually use it. That way you don't have to worry about this messy business of converting it back and forth all the time.
If you use this regex a lot and are worried about the overhead of having to escape it all the time, then store both the unescaped and escaped versions. Update both whenever the user makes a change.
p.s. Allowing user-entered regexes may make your site vulnerable to attack. (Update: Though in this case it is less likely to be a problem, since you are only allowing literal strings)
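A rough sketch of the "store unescaped, escape on use" design, assuming every branch is a literal string. `escapeRegExp` stands in for the RegExp.quote function mentioned in the question, and `buildPattern` and `storedBranches` are hypothetical names.

```javascript
// Escapes characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds the pattern from unescaped branches,
// escaping each one only at the point of use.
function buildPattern(branches) {
  return new RegExp("\\b(" + branches.map(escapeRegExp).join("|") + ")\\b");
}

// What the database holds and what the user sees: plain strings.
const storedBranches = ["C", "C++", "Java"];

const pattern = buildPattern(storedBranches);
console.log(pattern.source); // \b(C|C\+\+|Java)\b
```

Displaying the branches is then just a matter of listing `storedBranches`, with no un-escaping step.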

How to avoid "Cross-Site Script Attacks"

How do you avoid cross-site script attacks?
A cross-site script attack (or cross-site scripting) is when, for example, you have a guestbook on your homepage and a client posts some JavaScript code that redirects you to another website, or sends your cookies in an email to a malicious user, or does any number of other things that can prove really harmful to you and the people visiting your page.
I'm sure it can be done e.g. in PHP by validating forms, but I'm not experienced enough to, for example, ban JavaScript or other things which can harm you.
I hope you understand my question and that you are able to help me.

I'm sure it can be done e.g. in PHP by validating forms
Not really. The input stage is entirely the wrong place to be addressing XSS issues.
If the user types, say <script>alert(document.cookie)</script> into an input, there is nothing wrong with that in itself. I just did it in this message, and if StackOverflow didn't allow it we'd have great difficulty talking about JavaScript on the site! In most cases you want to allow any input(*), so that users can use a < character to literally mean a less-than sign.
The thing is, when you write some text into an HTML page, you must escape it correctly for the context it's going into. For PHP, that means using htmlspecialchars() at the output stage:
<p> Hello, <?php echo htmlspecialchars($name); ?>! </p>
[PHP hint: you can define yourself a function with a shorter name to do echo htmlspecialchars, since this is quite a lot of typing to do every time you want to put a variable into some HTML.]
This is necessary regardless of where the text comes from, whether it's from a user-submitted form or not. Whilst user-submitted data is the most dangerous place to forget your HTML-encoding, the point is really that you're taking a string in one format (plain text) and inserting it into a context in another format (HTML). Any time you throw text into a different context, you're going to need an encoding/escaping scheme appropriate to that context.
For example if you insert text into a JavaScript string literal, you would have to escape the quote character, the backslash and newlines. If you insert text into a query component in a URL, you will need to convert most non-alphanumerics into %xx sequences. Every context has its own rules; you have to know which is the right function for each context in your chosen language/framework. You cannot solve these problems by mangling form submissions at the input stage—though many naïve PHP programmers try, which is why so many apps mess up your input in corner cases and still aren't secure.
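To make the "every context has its own rules" point concrete, here is a small sketch of the two contexts just mentioned, using functions built into JavaScript. The sample text and variable names are made up for illustration, and using JSON.stringify as a string-literal encoder is an assumption that holds for current JavaScript engines.

```javascript
const text = 'Tom & "Jerry" </script>\n';

// URL query component: most non-alphanumerics become %xx sequences.
const url = "/search?q=" + encodeURIComponent(text);

// JavaScript string literal: JSON.stringify escapes the quote character,
// the backslash and newlines. Inside an inline <script> block, "<" is
// escaped as well so that "</script>" cannot close the block early.
const jsLiteral = JSON.stringify(text).replace(/</g, "\\u003c");

console.log(url);
console.log("var name = " + jsLiteral + ";");
```

The same input produces two different encodings, and neither would be correct in the other context.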
(*: well, almost any. There's a reasonable argument for filtering out the ASCII control characters from submitted text. It's very unlikely that allowing them would do any good.
Plus of course you will have application-specific validations that you'll want to do, like making sure an e-mail field looks like an e-mail address or that numbers really are numeric. But this is not something that can be blanket-applied to all input to get you out of trouble.)
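A minimal sketch of the control-character filter mentioned in the footnote. `stripControlChars` is a hypothetical helper, and keeping tab, line feed and carriage return is an assumption made here so that multi-line text still works.

```javascript
// Hypothetical helper: removes ASCII control characters (U+0000 to U+001F
// and U+007F), except tab, line feed and carriage return.
function stripControlChars(text) {
  return text.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
}

console.log(JSON.stringify(stripControlChars("a\u0000b\u0007c\n"))); // "abc\n"
console.log(stripControlChars("1 < 2"));                             // 1 < 2
```

Note that ordinary punctuation such as < is left alone; it is handled by escaping at the output stage.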

Cross-site scripting attacks (XSS) happen when a server accepts input from the client and then blindly writes that input back to the page. Most of the protection from these attacks involves escaping the output, so that any injected JavaScript is displayed as plain text instead of being executed.
One thing to keep in mind is that it is not only data coming directly from the client that may contain an attack. A Stored XSS attack involves writing malicious JavaScript to a database, whose contents are then queried by the web application. If the database can be written separately from the client, the application may not be able to be sure that the data had been escaped properly. For this reason, the web application should treat ALL data that it writes to the client as if it may contain an attack.
See this link for a thorough resource on how to protect yourself: http://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet

Strip the last character sent by JavaScript through websockets to Python

I'm currently trying out websockets, creating a client in JavaScript and a server in Python.
I'm stuck on a simple problem, though: when I send something from the client to the server it always contains a special ending character, but I don't know how to remove it.
I've tried data[:-1] thinking that would get rid of it, but it didn't.
With the character my JSON code won't validate.
This is what I send through JavaScript:
ws.send('{"test":"test"}');
This is what I get in python:
{"test":"test"}�
I thought the ending character was \xff

The expression "data[:-1]" produces a copy of data without the last character. It doesn't modify the "data" variable. To do that, you have to assign the result back to "data", like so:
data = data[:-1]
My suspicion is the "special ending character" is a bug, somewhere, either in your code or how you're using the APIs. Network code does not generally introduce random characters into the data stream. Good luck!
