Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
From a production standpoint, I do client-side checks (type, length, regex, etc.) on every data field sent to the server using my methods. Of course, I double-check everything on the server in the corresponding methods.
Considering that every type or format error should already have been handled by the client code, I assume that, on the server, it is better to handle such errors quietly instead of throwing an explicit error: if all my client-side checks are correct, the offending data can only come from a client whose original code has been tampered with. In practice, I would then use Match.test() (quiet) instead of check() (which throws an error).
Is it good practice to handle these errors quietly on the server every time they occur, given that this kind of error should have been flagged on the client first? If not, why?
Besides, I am considering keeping track of these quiet errors and auto-blocking or flagging accounts that repeat them more than x times. Is that a good idea?
You are right in that a server should never, ever throw an exception without handling it - that will crash your server, and it'll be a nice denial of service for everybody.
However, the server should still be able to inform the user that the data is malformed, using a 400-class HTTP error message. Both security and user-friendliness must absolutely be accounted for when dealing with user-supplied data - and since the client cannot be trusted and a round trip to the server is slow, both layers should have their own checks: the server's for security, the client's for responsiveness.
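As a sketch of that double layer, the server can quietly re-run the same checks the client performed and translate a failure into a 400-class response instead of throwing. Everything here is illustrative: the validation rule, the `handleInput` name, and the threshold for flagging repeat offenders (the "x times" idea from the question) are assumptions, not any framework's API.

```javascript
// Sketch: quiet server-side re-validation with per-account failure
// tracking. Names, the rule, and the threshold are illustrative.
const MAX_QUIET_ERRORS = 5; // the "x times" from the question
const quietErrors = new Map(); // accountId -> failure count

function isValidUsername(value) {
  // the same rules the client already enforces
  return typeof value === "string" && /^[a-z0-9_]{3,20}$/.test(value);
}

function handleInput(accountId, value) {
  if (isValidUsername(value)) return { ok: true };

  // Quiet failure: nothing is thrown, but the event is recorded,
  // because an untampered client would never have sent this.
  const count = (quietErrors.get(accountId) || 0) + 1;
  quietErrors.set(accountId, count);
  return {
    ok: false, // translate this to a 400-class response
    flagged: count > MAX_QUIET_ERRORS, // candidate for auto-blocking
  };
}
```

In Meteor terms, `isValidUsername` would be a `Match.test()` call inside the method body; the point is that the quiet path still records and reports the failure rather than silently succeeding.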
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
A user submits a search query to my site.
I then take this query and use it in other places, as well as echoing it back out to the page.
Right now I'm using htmlspecialchars() to filter it.
What other steps should I take to prevent XSS, SQL injection, etc., and things I can't even think of? I want to have all my bases covered.
<?php
$query = $_GET["query"];
$query = htmlspecialchars($query);
?>
Right now I'm using htmlspecialchars() to filter it.
What other steps should I take to prevent XSS, SQL injection, etc., and things I can't even think of? I want to have all my bases covered.
To cover all your bases: this depends a lot. The most straightforward (but unsatisfying) answer then probably is: do not accept user input.
And even if this may sound easy, it often is not - and it is often forgotten that any input from a different context has to be considered user input. For example, a file opened from the file-system, records read from a database, or data from some other system or service - not only parameters from the HTTP request or a file upload.
Thinking this through, in the context of PHP, this normally also includes the PHP code itself, which is usually read from disk: not SQL injection, but PHP code injection.
So if you really think about the question in such a generally broad way ("etc."), the first thing you need to ensure is that you have a defined process to deploy the application, with checks in place so that the deployed files can't be tampered with (e.g. a read-only file-system). And from the operational side: you can create and restore the known state of the program within seconds, with little or no side effects.
Only after that should you start to worry about other kinds of user input. For those - to complete the answer - you should only accept what is acceptable.
A user submits a search query to my site.
Accepting a search query is the higher art of user input. It involves (free-form) text, which tends to become more and more complex with every other day, and it may also include logical operators and instructions that require parsing - which involves even more components that can break and be exploited by various kinds of attacks (SQL injection is only one of them, albeit still a pretty popular one). So plan ahead for it.
As a first-level mitigation, you can question whether search is really a feature that is needed. If you decide it is, outline which problems it generally creates and check whether those problems are common. That question is important because common problems may already have answers, even common answers - if a problem is common, it is likely that it has already been solved. Leaning towards an existing solution then only leaves the problem of integrating it (that is: understanding the problem - you always need to do that, and you learn it soon enough, one or two decades is normally fine - and then understanding the specific solution, as you need to integrate it).
For example:
$query = $_GET["query"];
$query = htmlspecialchars($query);
is making use of variable re-use, which is commonly known to be error-prone. Using different variable names that mark the context of their value(s) can help:
$getQuery = $_GET["query"];
$htmlQuery = htmlspecialchars($getQuery);
It is then more visible that $htmlQuery can be used in HTML output to show the search query (or at least was intended for it). Similar to $_GET["query"], it is then totally visible that $getQuery would not be appropriate for HTML output or for string concatenation into HTML.
In the original example, this would not be equally visible for $query.
It would then perhaps also become visible that in output contexts other than HTML, it ($htmlQuery) is not appropriate either. As your question suggests, you already imagine that neither $getQuery nor $htmlQuery is appropriate to deal with the risk of an SQL injection, for example.
The example is intentionally verbose in its naming; real-life naming schemes are normally different and wouldn't emphasize the context in the variable name that much, but would use a concrete type:
try {
    // ...
    $query = new Query($_GET["query"]);
    // ...
} catch (InvalidArgumentException $e) {
    // refuse the input (the exception type is illustrative)
}

// later, in the HTML output:
<?= htmlspecialchars($query) ?>
If you have read up to this point, it may have become clearer that there can hardly be any one-size-fits-all function that magically prevents all attacks (apart from muting any kind of user input, which sometimes is equal to deleting the overall software in the first place - which is known to be safe, perhaps most of all for your software's users). If you allow me the joke, maybe this is it:
unset($_GET["query"]); $safeQuery = null;
which technically works in PHP (unlike assigning unset() directly, which would be a parse error), but I hope you get the idea - it's not really meant as an answer to your question.
So, now that it is hopefully clear that each input needs to be treated in its context of input and output, this should give some pointers on how and where to look for the data handling that is needed.
Context is a big word here. One guide is to look at whether you're dealing with user data (user input) in the input phase of a system or in the output phase.
In the input phase, what you normally want to do is sanitize and verify the data. Is it correctly encoded? Can the actual value or values the data represents (or is intended to represent) be safely decoded? Can any actual value be obtained from the data at all? If the encoding is already broken, ensure no further processing of that data is done. This is basically error handling and commonly means refusing the input. In the context of a web application this can mean: closing the connection on the TCP transport layer (or not sending anything back over UDP); responding with an HTTP status code that denotes an error (with or without sparse details in the response body); a more user-friendly hypertext message in the response body; dedicated error messages for the parts of an HTML form that were not accepted; or, for an API, errors channelled out with the request's input data in the format that the client consumes for the API protocol (the deeper you go, the more complicated it gets).
In the output phase it is a bit different. Say you identified the user input as a search query, passed the query (as a value) to a search service or system, and got back the results - which reflect the user input and therefore still are user input. All of this data needs to be correctly encoded to transport the result value(s) back to the user. For example, if you output the search query along with the search results, all of it needs to be passed in the expected format. In the context of a web application, the user normally tells you with each request what the preferred encoding of the response is. Let's say it is normally hypertext encoded as HTML. Then all values need to be output in a form that properly represents them in HTML - and not, by mistake, as HTML itself: a search for <marquee> must not cause the whole output to move all over the page (you get the idea).
htmlspecialchars() may do the job here, and so might htmlentities(), but which function to use, and with which parameters, depends highly on the underlying encodings (HTTP, HTML, character encoding) and on which part of the response something belongs to (e.g. using htmlspecialchars() on a value that is sent back in a cookie response header would certainly not lead to the intended result).
In the input phase you assert that the input matches your expectations, so that you can safely pass it along into the application - or refuse further processing. Only you can know in detail what those requirements are.
In the output phase your job is to ensure that all data is properly encoded and formatted so that the overall output works and the user can safely consume it.
In the input phase you should not try to "fix" issues with the incoming data yourself. Instead, assume the best and communicate back either that there will be no further communication - or - what the problem was. (Note: do not let yourself be fooled: if this involves outputting user input, mind what matters for its output phase; there is less risk in just dropping the user input and not processing it further, e.g. not reflecting it back in the response.)
This is a bit different in the non-error output phase (given the input was acceptable): here you err on the safe side and encode everything properly. You may even be fine with filtering the user data so that it is safe in the output (not as the output, which belongs to your overall process - and mind that filtering is harder than it looks at first sight).
In short: don't filter input; only let it pass along if it is acceptable (sanitize). Filter only in/for output, and only if you have no other option (it is a fall-back, and one that often goes wrong). Mind that filtering is often much harder and much more error-prone - including opening you up to attacks - than just refusing the data overall (so there is some truth in the initial joke).
Next to the input or output context of the data, there is also the context in which the values are used - in your example, the search query. How could anyone here on Stack Overflow or any other internet site answer that, when it remains completely undefined in your question? A search query - a search query for what? Isn't your question itself a search for an answer? Taking it as an example, Stack Overflow can take it:
Verify the input is in the form of a question title and text message that can safely enter their database - it passed that check, which can be verified by the fact that your question was published.
With your attempt to enter that query on Stack Overflow, some input-validation steps were done before sending it to the database - while already querying it: similar questions, whether your user is valid, etc.
As this short example shows, a concrete application (your application, your code) needs not only the basic foundation to work (and therefore error handling at the protocol level - standard input and output, so to say), but also what is built on top of it to work technically correctly: a database search for existing questions must not be prone to SQL injection, neither through the title nor through the question text, nor must the display of error messages or hints introduce other forms of injection.
To come back to your own example: $htmlQuery is not appropriate if you need to encode the value as a JavaScript string in a response. To encode a value as a string within JavaScript, you would certainly use a different function - maybe json_encode($string) instead of htmlspecialchars($string).
And for passing the search query to a search service, it may again be encoded differently, e.g. as XML, JSON or SQL (for which most database drivers offer a nice feature called parameterized queries, or the more formalized prepared statements, which are a great help in handling input and output context more easily - common problems, common solutions).
prevent XSS, SQL Injection, etc, and things I can't even think of. I want to have all my bases covered.
You may by now spot the "error" in this "search query". It is not the part about things you or anyone else can't even think of - regardless of how much knowledge you have, there will always be known and unknown unknowns, next to the sheer number of mistakes we encode into software every other day. The one "wrong" is perhaps in thinking that there is a one-size-fits-all solution (even with the good intent that these things must have been solved already - and truly, most have been, but one still needs to learn about them first, so it is good that you ask). The other, perhaps more important one, is to assume that others will solve your problems: your technical problems perhaps, but your problems only you can solve. And if that sentence sounds hard, take the good side of it: you can solve them. I write this even though I can only give a lengthy answer to your question.
So take any security advice - including the text-wall I just placed here - on Stack Overflow or elsewhere with a grain of salt. Only your own sharp eyes can decide whether it is appropriate to cover your bases.
Older PHP Security Poster (via my blog)
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I was learning fetch() and found out that the body of the response uses something called a ReadableStream. According to my research, a readable stream allows us to start using the data while it is still being downloaded from the server (I hope I am correct :)). In terms of fetch(), how can a readable stream be useful - don't we need to download all of the data anyway before we can start using it? Overall, I just cannot understand the point of a readable stream in fetch(), so I need your kind help :).
Here's one scenario: a primitive ASCII art "video player".
For simplicity, imagine a frame of "video" in this demo is 80 x 50 = 4000 characters. Your video "decoder" reads 4000 characters, displays them in an 80 x 50 grid, reads another 4000 characters, and so on until the data is finished.
One way to do this is to send a GET request using fetch, get the whole body as one really long string, then start displaying. For a 100-frame "video", that means receiving 400,000 characters before the first frame is shown to the user.
But why should the user have to wait for the last frame to be sent before they can view the first one? Instead, still using fetch, read 4000 characters at a time from the response's ReadableStream. You can read those characters before the remaining data has even reached the client.
Potentially, you can be processing data at the start of the stream in the client, before the server has even begun to process the data at the end of the stream.
Potentially, a stream might not even have a defined end (consider a streaming radio station for example).
There are lots of situations where it's better to work with streaming data than it is to slurp up the whole of a response. A simple example is summing a long list of numbers coming from some data source. You don't need all the numbers in memory at once to achieve this - you just need to read one at a time, add it to the total, then discard it.
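The summing example can be sketched like this. A hand-built ReadableStream stands in for `response.body` from fetch(), and the comma-separated number format is made up for illustration; the point is that each chunk is processed and discarded as it arrives:

```javascript
// Sketch: summing numbers from a stream chunk by chunk, without
// buffering the whole response in memory.
async function sumStream(stream) {
  const reader = stream.getReader();
  let total = 0;
  let carry = ""; // holds a number split across chunk boundaries
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    carry += value;
    const parts = carry.split(",");
    carry = parts.pop(); // the last piece may be incomplete
    for (const p of parts) total += Number(p);
  }
  if (carry) total += Number(carry);
  return total;
}

// Demo source: numbers arrive with arbitrary chunk boundaries.
const demo = new ReadableStream({
  start(controller) {
    controller.enqueue("1,2,3");
    controller.enqueue("0,4"); // "30" only completes here: 1, 2, 30, 4
    controller.close();
  },
});

sumStream(demo).then((t) => console.log(t)); // 37
```

With a real fetch you would pass `response.body` (after decoding bytes to text) instead of the demo stream; the reading loop is the same.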
Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I have a problem where, on one specific view, request.user returns AnonymousUser.
This is caused by a JavaScript library I use to collect payments. That library sets a cookie which makes Django see a logged-in user as AnonymousUser.
If I delete that cookie, Django sees the user as logged in, but after a couple of refreshes I get a new cookie which again turns the logged-in user into an AnonymousUser.
And I have this issue only on the one specific page where that library is inserted.
Any ideas what is wrong?
The JavaScript in question sets a cookie by the name mistertango[collect][mt_fp].
When cookies were defined (RFC 6265, I guess), the allowed characters in a cookie name were never really pinned down beyond basically «text» (formally the name should be an HTTP token, which excludes brackets, but browsers don't enforce that).
This causes problems when parsing cookie names. Django relies on Python's http.cookies for this, and http.cookies doesn't allow brackets in cookie names: it fails to parse cookie pairs with brackets in them and stops parsing the pairs that follow, which means it never sees the sessionid cookie Django uses for authentication.
I'm not able to tell whether Django/http.cookies should or shouldn't support this.
PHP, however, does seem to support it (even if it's broken), while Ruby on Rails does not.
The easy solution is to use only alphanumeric characters in cookie names.
For your case, the best solution is to get the JavaScript's author to change their cookie name. If that's not possible, or in the meantime, you could host the JavaScript yourself and change the cookie name in your copy. (This may not work if the cookie is used by something outside this JavaScript snippet - I don't know the library well enough to see what it is used for.)
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I have a REST API served by NodeJS to an AngularJS front end.
I work with users:
GET /api/users #Returns all users
POST /api/users #Create new user
GET /api/users/:id #Return a user
PUT /api/users/:id #Edit a user
DELETE /api/users/:id #Delete a user
This is a user:
{
    login: "admin",
    email: "admin@admin.com",
    pass: "hashedpassword",
    ...
}
My users can belong to groups:
GET /api/users/:id/groups #Return the groups of a user
They can also have constraints, or can inherit constraints from their groups:
GET /api/users/:id/constraints #Return the constraints of a user
GET /api/groups/:id/constraints #Return the constraints of a group
The problem :
I'm making an admin page displaying all the users, their groups, and their constraints.
Should I :
Make many requests in a for loop in the JavaScript (Angular) front end?
Something like :
$http.get('/api/users').then(function(result) {
    result.data.forEach(function(user) {
        $http.get('/api/users/' + user.id + '/groups').then(function(result) {
            result.data.forEach(function(group) {
                $http.get('/api/groups/' + group.id + '/constraints');
            });
        });
    });
});
Create a path /api/users/withConstraintsAndGroups
That would return a big list of all the users with their groups and their constraints.
I find solution 1 to be very nice, maintainable, easy, and generic, but I'm afraid of very bad performance.
I find solution 2 to be ugly, hard to maintain, hard to code, and not generic, but with good performance.
Which solution should I choose?
Your question basically boils down to:
Which is better, one big HTTP request, or many small ones?
One thing to keep in mind is the expected network latency (ping time) between your clients and your server. In a high-latency situation with otherwise good bandwidth, many small requests will perform significantly worse than one large one.
Also, having one big request gives you better efficiency for compressing the response, and avoids the overhead of the extra HTTP requests and response headers.
Having said all that, I would still advise you to start with option 1, solely because you stated it is very easy for you to code. Then see if it meets your requirements, and if not, try option 2.
And as a final note, I generally prefer one big request to many small requests because that makes it easy to define each request as a transaction which is either completely applied or rolled back. That way my clients do not run into a state where some of their many small requests succeeded while others failed and they now have an inconsistent local copy of the data my API supplied.
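To illustrate the latency point about many small requests: if you do go with option 1, at least issue the requests concurrently rather than one after another. This is a generic sketch - `fakeFetch` is a stand-in for $http.get or fetch, and the 50 ms delay is an assumed round-trip time, not a measurement:

```javascript
// Sketch: sequential vs parallel issuing of many small requests.
function fakeFetch(url) {
  // simulate ~50 ms of network latency per request
  return new Promise((resolve) =>
    setTimeout(() => resolve({ url, data: [] }), 50)
  );
}

async function loadSequentially(urls) {
  const results = [];
  for (const u of urls) results.push(await fakeFetch(u)); // latencies add up
  return results;
}

function loadInParallel(urls) {
  return Promise.all(urls.map(fakeFetch)); // latencies overlap
}

// With 10 URLs, the sequential version takes roughly 10 round trips,
// the parallel one roughly 1 (bounded by server capacity).
```

This mitigates, but does not remove, the per-request HTTP overhead that the one-big-request option avoids entirely.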
Come on - how much would it really cost in performance? On the thesis that the result will be almost the same either way, multiple requests vs one massive request: it's 2016, and unless you're dealing with poor internet connections, you should do it by making multiple requests.
Are you developing your app for the 99% of the globe's population, or for the 1% that uses Opera (for the Turbo feature)?
Also, if we are talking about designing a REST API, consistency is probably the main idea of such an API. In your case, if you write the second method, you'll have to write something similar throughout the rest of your code to keep all controllers consistent - which can't really be done.
Also, an API is an API and should not be modified whenever a front-end application is built on top of it. Think of the situation where you have 10 apps requesting your API in the very situation you presented, but each needing a different response. Are you going to add new methods to the API for each one? That's bad practice.
Routing in a REST API should be done according to the logical resources you have (objects that can be manipulated with HTTP verbs):
Examples:
* when you have a relation between entities
/users/3/accounts // should return a list of accounts from user 3
/users/3/accounts/2 // should return account 2 from user 3
* custom actions on your logical resources
/users/3/approve
/accounts/2/disable
A good API should also offer partial requests (for example, adding some query-string parameters to a usual request: users/3?fields=Name,FirstName), versioning, documentation (apiDocs.js is very useful in this case), a consistent request type (JSON), pagination, token auth and compression (I've never done the last one :)).
I recommend modifying your endpoint for serving user resources so that it accepts (through a query parameter) which associated resources should be included as part of the response:
/api/users?includes=groups&includes=constraints
This is a very flexible solution and if your requirement expands in future, you can easily support more associated resources.
If you are concerned about how to structure the response, I recommend taking a look at JSON:API.
An advantage of using JSON:API is that if multiple users share the same groups/constraints, you will not have to fetch multiple representations of the same entity.
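For illustration, a compound document in roughly the JSON:API shape could look like this (the ids, attribute names, and the group are made up; see the JSON:API specification for the exact structure):

```json
{
  "data": [
    {
      "type": "users", "id": "1",
      "attributes": { "login": "admin" },
      "relationships": {
        "groups": { "data": [ { "type": "groups", "id": "10" } ] }
      }
    },
    {
      "type": "users", "id": "2",
      "attributes": { "login": "guest" },
      "relationships": {
        "groups": { "data": [ { "type": "groups", "id": "10" } ] }
      }
    }
  ],
  "included": [
    { "type": "groups", "id": "10", "attributes": { "name": "editors" } }
  ]
}
```

Both users reference group 10, but its full representation appears only once, under `included`.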
You're asking
which is better...
and that's not acceptable according to SO rules, so I will assume your question lies around what REST is supposed to do in certain cases.
REST is based on resources which, on their own, are like objects with their own data and accessors/getters.
When you ask for /api/users/:id/groups, you are saying that you want to access, from the users resource/table, a specific user by id and, from that user, the list of groups he/she belongs to (or owns, or whatever interpretation you want). In any case, the query is very specific to this groups resource (which, in REST terms, is unique, because every resource is universally addressable through its URL) and should not collide or be confused with another resource. If more than one object can be targeted by the URL (for example, if you call just /api/groups), then your resource is an array.
Taking that into consideration, I would consider (and recommend) always returning the widest selection that matches the use case you specify. For example, if I were to create that API, I would probably do:
/users list of all users
/users/:id a specific user
/users/:id/groups groups to which the specific user has joined
/users/groups list of all groups that has at least one user
/users/groups/:id description of a particular group from /users/groups (detail)
etc...
Every use case should be completely defined by the URL. If not, then your REST specification is probably wrong (and I'm almost sure of it).
==========
So, answering your question with that description in mind, I would say: it depends on what you need to do.
If you want to show every single constraint on the main page (possible, but not a very elegant UI), then yes, the first approach is fine (hope your end users don't kill you).
If you want to ask for resources only on demand (for example, when popping up info on mouse hover, or after going to a specific section of the admin site), then your solution is something like the second one (yes, that's a big one... use it only if you really need that resource).
Have you considered the path /api/users/:id/constraints on demand?
BTW, withConstraintsAndGroups is not a REST resource. REST resources are nouns (constraints or groups, but not both), not verbs or adjectives.
It doesn't seem to me that performance is too much of an issue for the admin page. Pretty much the only difference between the two is that #1 makes a cascade of API calls while #2 makes only one. The data should be the same, and reasonably sized, not extremely huge. So if #1 is easier to code and maintain, you should go with that - you should not have any performance issues.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I have implemented search functionality in my project.
On submitting the form via ajax, I need to show all the results in a div.
There are two ways I can do this.
Getting the JSON data as the ajax response and binding it to the HTML elements.
Getting a completely formatted HTML response from the ajax call and binding it directly to the result div on the search page.
So which way is advisable?
To make a service (server-side script) the most re-usable - or even turn it into an API - the suggested way is to return JSON data (converted from data models) to the front end, where JavaScript populates the data into the HTML.
As for HTML - you can certainly make the server return the response as HTML (setting the correct mime/content type in the headers), but this gives the server control over the UI layer, and the separation between the interface and the server/db is no longer balanced properly...
Either option is fine, depending on how much HTML there is and how much server-side processing you need to do on it. If it is just a div and a value that needs to be inserted, then I say just go with JSON. The JSON approach is more lightweight (it consumes less bandwidth and keeps the server in the role of an API that is transferable to non-web-page requests).
If you need to do a lot of server-side processing and assembling, and what you are returning is really a sub-page, then you might consider HTML from the server. In this case, have a partial HTML file that you read and send (inserting data where relevant) rather than building the HTML from strings on the fly. With a partial file you can edit and check it with standard HTML editors, you can see the layout easily, and it keeps the UI aspects separate from the business logic.
I'd send it as JSON and build the HTML client side for a few reasons:
The JSON payload would be lighter and therefore faster to send over the wire
The API becomes more reusable (if you ever wanted to hook up additional clients that render differently etc.)
If you build the HTML on the client side then it's probably easier to take advantage of templating libraries (e.g. JQuery Templates) or even better, directly binding the data to the UI (such as Knockout)
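The client-side step can be sketched like this - turning the JSON payload into markup while still escaping values for HTML. `escapeHtml` and `renderResults` are illustrative names, not from any particular library:

```javascript
// Sketch: building result markup on the client from a JSON response.
function escapeHtml(s) {
  // minimal HTML escaping so user-supplied text can't inject markup
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderResults(results) {
  // results: e.g. [{ title: "First hit" }, ...] from the ajax response
  return results
    .map((r) => "<li>" + escapeHtml(r.title) + "</li>")
    .join("");
}

// In the page, the output would then be bound to the result div:
// document.querySelector("#results").innerHTML =
//   "<ul>" + renderResults(json.results) + "</ul>";
```

A templating or binding library (as mentioned above) does the same job with less hand-written string code, but the escaping concern is the same either way.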
I always do the JSON response. To me it looks like a much more consistent and flexible way since you can return more data than only presentation. If you still want to return HTML you can also do it through a JSON response:
{
    error: false,
    html: "<div>Done!</div>"
}
I'd send it as a JSON response, too.
As suggested by emiolioicai in his code, with JSON you can easily handle errors. For example:
{
    error: true,
    "error-message": "wrong parameters"
}
If you build your HTML client-side, in the future you will be able to use the same AJAX request and render the HTML differently in another part of your website.