So, I am in the process of learning JavaScript and was wondering whether I could use a custom header field of a GET request to pass certain parameters to tell the Node-Server how to handle the data passed in the URL.
As I understand, this is not standard practice, but are there actually technical reasons that would prohibit such a use case?
It is possible to pass actions to the server via headers.
It is not common practice, though, because it shifts security-relevant decisions to the client: complete control of the operation sits with the client, and clients can be compromised into doing something malicious.
The whole point of having a back end between your app data and your client is to mitigate those security risks by keeping more control over the operation on the server.
One rule of thumb you can keep in mind: the clients (the website, apps, etc.) are dumb, and it is the server's responsibility to take preventive measures.
That's why passing operations via headers can be dangerous.
But if you mean operations like add, delete and update, you can follow the REST API conventions, which use a different HTTP method for each kind of operation.
GET - Read
POST - Create
PUT - Update
DELETE - Delete
And the response can carry a readable message and/or an HTTP status code like
200 - OK
404 - Not Found
401 - Unauthorized
403 - Forbidden
204 - No Content (OK, but no body)
More descriptive list: https://httpstatuses.com/
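For concreteness, here is a minimal Express sketch of how those methods and status codes typically map onto routes; the /items resource and the in-memory Map are illustrative assumptions, not part of the original question.

const express = require('express');
const app = express();
app.use(express.json());

const items = new Map(); // in-memory store, purely illustrative

// GET - Read
app.get('/items/:id', (req, res) => {
  const item = items.get(req.params.id);
  if (!item) return res.status(404).json({ message: 'Not Found' });
  res.status(200).json(item);
});

// POST - Create
app.post('/items', (req, res) => {
  const id = String(items.size + 1);
  items.set(id, req.body);
  res.status(201).json({ id });
});

// PUT - Update
app.put('/items/:id', (req, res) => {
  if (!items.has(req.params.id)) return res.status(404).json({ message: 'Not Found' });
  items.set(req.params.id, req.body);
  res.status(200).json(items.get(req.params.id));
});

// DELETE - Delete
app.delete('/items/:id', (req, res) => {
  if (!items.has(req.params.id)) return res.status(404).json({ message: 'Not Found' });
  items.delete(req.params.id);
  res.sendStatus(204); // OK but no content
});

app.listen(3000);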
In theory, one should use the HTTP GET method only for idempotent requests.
But, for some intricate reasons, I cannot use any other method than GET and my requests are not idempotent (they mutate the database). So my idea is to use the Cache-Control: no-cache header to ensure that any GET request actually hits the database. Also, I cannot change the URLs which means I cannot append a random URL argument to bust caches.
Am I safe or shall I implement some kind of mechanism to ensure that the GET request was received exactly once? (The client is the browser and the server is Node.js.)
What about a GET request that gets duplicated by some middleman, resulting in the same GET request being received twice by the server? I believe the spec allows such a situation, but does this ever happen in "real life"?
I've never seen a middleman, such as Cloudflare or NGINX, prevent or duplicate a GET request with Cache-Control: no-cache.
Let's start with what you've already pointed out: GET requests should be safe and idempotent. That is, they should not modify the resource and should therefore return the same thing every time (barring any other methods being used to modify it in the meantime).
It's worth pointing out, as restcookbook.com notes, that this doesn't mean nothing can change as a result of the request. Rather, the resource's representation should not change. So for instance, your database might log the request, but shouldn't return a different value in the response.
The main concern you've listed is middleware caching.
The danger isn't that the middleware sends the request to your server more than once (you mentioned 'duplicating' a request), but rather that (a) it sends an old, cached, no-longer-accurate response to whatever is making the request, and (b) the request does not reach the server.
For instance, imagine a response returning a count property that starts at 0 and increments when the GET endpoint is hit. Request #1 will return "1" as the count. Request #2 should now return "2" as the count, but if it's cached, it might still show 1 and never hit the server to increase the count to 2. That's two separate problems (serving stale data, and not updating).
So, will a middleware prevent a request from reaching the server and serve a cached copy instead? We don't know. It depends on the middleware. You can absolutely write one right now that does just that. You can also write one that doesn't.
If you don't know what will be consuming your API, then it's not a great option. But whether it's "safe" depends on the specifics.
As you know, it's always best to follow the set of expectations that comes with the grammar of HTTP requests. Deviating from them sets you up for failure in many ways. (For instance, there are different security expectations for requests based on the method. A browser may treat a GET request as "simple" from a CORS perspective, while it would never treat a PATCH request as such.)
I would go to great lengths not to break this convention, but if I were forced by circumstances to break this expectation, I would definitely note it in my API's documentation.
One workaround to ensure that your GET request only hits the backend once is to allow caching of responses and use the Vary header (specified in RFC 7231, Section 7.1.4).
In summary, a Vary header tells any HTTP cache which parts of the request header to take into account when trying to find the cached object.
For example, you have an endpoint /api/v1/something that accepts GET requests and does the required database updates. Let's say that when successful, this endpoint returns the following response.
HTTP/1.1 200 OK
Content-Length: 3458
Cache-Control: max-age=86400
Vary: X-Unique-ID
Notice the Vary header has a value of X-Unique-ID. This means that if you include the X-Unique-ID header in your request, any HTTP caching layer (be it the browser, CDN, or other middleware) will use the value in this header to determine whether to use a previously cached response or not.
Say you make a first request that includes an X-Unique-ID header with the value id_1, then make a subsequent request with an X-Unique-ID value of id_2. The caching layer will not use a previously cached response for the second request because the value of X-Unique-ID is different.
However, if you make another request that contains the X-Unique-ID value of id_1 again, the caching layer will not make a request to the backend but instead reuse the cached response for the first request assuming that the cache hasn't expired yet.
One thing you have to consider, though, is that this only works if the caching layer actually respects the specification for the Vary header.
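A small sketch of that setup, assuming an Express backend behind the /api/v1/something endpoint from the example (the id values mirror the ones above):

// Server: mark the response as cacheable for a day, keyed on X-Unique-ID
app.get('/api/v1/something', (req, res) => {
  // ... perform the required database update here ...
  res.set('Cache-Control', 'max-age=86400');
  res.set('Vary', 'X-Unique-ID');
  res.json({ ok: true });
});

// Client: a new X-Unique-ID value misses the cache; repeating a value may be answered from cache
fetch('/api/v1/something', { headers: { 'X-Unique-ID': 'id_1' } });
fetch('/api/v1/something', { headers: { 'X-Unique-ID': 'id_2' } });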
The Hypertext Transfer Protocol (HTTP) is designed to enable communication between clients and servers, and the GET method requests data from a specified resource.
Strictly speaking, 'Cache-Control: no-cache' does not forbid a cache from storing the response; it requires the cache to revalidate with the origin server before reusing a stored copy, so every request still reaches the server. If you want to prevent the response from being stored at all, use 'Cache-Control: no-store'.
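For example, in Express you might set the header like this (the /data route is purely illustrative):

app.get('/data', (req, res) => {
  // no-store forbids storing the response at all; no-cache allows storing
  // but forces revalidation with the origin server before any stored copy is reused
  res.set('Cache-Control', 'no-store, no-cache, must-revalidate');
  res.json({ fresh: Date.now() });
});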
This depends a lot on what sits in the middle and where the retry logic lives, if there is any. Almost all of your problems will come from failure handling and retry handling, not the basic requests.
Let's say, for example that Alice talks to Bob via a proxy. Let's assume for the sake of simplicity that the requests are small and the proxy logic is pure store-and-forward. i.e. most of the time a request either gets through or doesn't but is unlikely to get stalled half-way through. There's no guarantee this is the case and some proxies will stop requests part-way through by design.
Alice -> Proxy GET
Proxy -> Bob GET
Bob -> Proxy 200
Proxy -> Alice 200
So far so good. Now imagine Bob dies before responding to the proxy. Does the proxy retry? If so, we have this:
Alice -> Proxy GET
Proxy -> Bob GET
Bob manipulates database then dies
Proxy -> Bob GET (retry)
Now we have a dupe
Unlikely, but possible.
Now imagine (much more likely) that the proxy (or even more likely, some bit of the network between the proxy and the client) dies. Does the client retry? If so, we have this:
Alice -> Proxy GET
Proxy -> Bob GET
Bob -> Proxy 200
Proxy or network dies
Alice -> Proxy GET (retry)
Proxy -> Bob GET
Is this a dupe or not? It depends on your point of view.
Plus, for completeness there's also the degenerate case where the server receives the request zero times.
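If duplicates or retries like these would be harmful, one option (not taken from the answers above, just a sketch) is to have the client attach a unique id to each logical request and have the server ignore repeats. A minimal Express example, assuming a hypothetical X-Request-ID header and a single-process server:

const seen = new Set(); // would need expiry and shared storage in a real deployment

app.get('/mutate', (req, res) => {
  const requestId = req.get('X-Request-ID');
  if (requestId && seen.has(requestId)) {
    // the mutation already ran once; acknowledge without repeating it
    return res.json({ duplicate: true });
  }
  if (requestId) seen.add(requestId);
  // ... perform the non-idempotent database change here ...
  res.json({ duplicate: false });
});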
This question is related to Cross-Origin Resource Sharing (CORS, http://www.w3.org/TR/cors/).
If there is an error when making a CORS request, Chrome (and AFAIK other browsers as well) logs an error to the error console. An example message may look like this:
XMLHttpRequest cannot load http://domain2.example. Origin http://domain1.example is not allowed by Access-Control-Allow-Origin.
I'm wondering if there's a way to programmatically get this error message? I've tried wrapping my xhr.send() call in try/catch, I've also tried adding an onerror() event handler. Neither of which receives the error message.
See:
http://www.w3.org/TR/cors/#handling-a-response-to-a-cross-origin-request
...as well as notes in XHR Level 2 about CORS:
http://www.w3.org/TR/XMLHttpRequest2/
The information is intentionally filtered.
Edit many months later: A followup comment here asked for "why"; the anchor in the first link was missing a few characters which made it hard to see what part of the document I was referring to.
It's a security thing - an attempt to avoid exposing information in HTTP headers which might be sensitive. The W3C link about CORS says:
User agents must filter out all response headers other than those that are a simple response header or of which the field name is an ASCII case-insensitive match for one of the values of the Access-Control-Expose-Headers headers (if any), before exposing response headers to APIs defined in CORS API specifications.
That passage includes links for "simple response header", which lists Cache-Control, Content-Language, Content-Type, Expires, Last-Modified and Pragma. So those get passed. The "Access-Control-Expose-Headers headers" part lets the remote server expose other headers too by listing them in there. See the W3C documentation for more information.
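As a rough server-side illustration of that opt-in, here is an Express sketch; the origin is taken from the question and the X-Request-ID header is just an example:

app.get('/api/data', (req, res) => {
  res.set('Access-Control-Allow-Origin', 'http://domain1.example');
  // without the next line, xhr.getResponseHeader('X-Request-ID') returns null cross-origin
  res.set('Access-Control-Expose-Headers', 'X-Request-ID');
  res.set('X-Request-ID', 'abc123');
  res.json({ ok: true });
});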
Remember you have one origin - let's say that's the web page you've loaded in your browser, running some bit of JavaScript - and the script is making a request to another origin, which isn't ordinarily allowed because malware can do nasty things that way. So, the browser, running the script and performing the HTTP requests on its behalf, acts as gatekeeper.
The browser looks at the response from that "other origin" server and, if it doesn't seem to be "taking part" in CORS - the required headers are missing or malformed - then we're in a position of no trust. We can't be sure that the script running locally is acting in good faith, since it seems to be trying to contact servers that aren't expecting to be contacted in this way. The browser certainly shouldn't "leak" any sensitive information from that remote server by just passing its entire response to the script without filtering - that would basically be allowing a cross-origin request, of sorts. An information disclosure vulnerability would arise.
This can make debugging difficult, but it's a security vs usability tradeoff where, since the "user" is a developer in this context, security is given significant priority.
I've read some articles about HTTP statuses, and I'm still not sure why I should specify them in my app. Let's say a user provided a wrong password while trying to log in. In my Express app I can send the response in either of these ways:
res.status(401).json({ message: 'Wrong password' })
or
res.json({ message: 'Wrong password' })
Then my application logic is based on the message value.
I understand that in the second case the status code is going to be 200 OK, but why should I care about it?
Many reasons. A few off the top of my head:
Caching. 2xx responses are cacheable by default. You shouldn't be sending a 200 response for an error, or the error may be cached and it could look to the user like they can never log in!
Compatibility with client libraries. It sure is nice to be able to use things that are standards compliant, so you can maybe do a .catch() and show the user an error rather than having a ton of custom logic to figure out whether the response was good or not.
Status codes also come from other code in the chain. You're often not just talking to your app, but usually to a proxy on your end. And then there may be several proxies between the client and you. Why should your client code have to figure out both your application-level signalling and the status code signalling when they can be one and the same? Ask someone with satellite internet... they'll tell you what a pain intermediate proxies can be when people don't follow the standards.
Automatic error logging. Surely you use some system to track errors in your system, right? If you use standard HTTP response codes, they can be categorized automatically.
You need to care about HTTP because you are relying on it for your application to work. Using the appropriate HTTP verbs and status codes is necessary for this, allowing e.g. middleware or browser caches to do their work. You don't want your error message to be cached.
There are plenty of reasons. At the highest, clearest level - it's accessibility / universality.
Consider a few scenarios:
What if the consumer of the request doesn't read the same language as your app?
Your web browser has immense power when it comes to intelligently handling HTTP requests and responses; many HTTP 2xx responses are cached, and so on.
Many intermediary services, from firewalls to traffic filters and everything in between, rely on status codes for the decisions they need to make.
Reliability. It's much safer to rely on finding a 401 Unauthorized than on finding "wrong password" in your message field. What if you change the string, even just adding a period or capitalizing something?
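For example, client code can branch on the status code instead of parsing the message string; a sketch, where the /login endpoint and the showError helper are hypothetical:

async function login(credentials) {
  const res = await fetch('/login', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(credentials),
  });
  if (res.status === 401) {
    showError('Wrong password');        // hypothetical UI helper
  } else if (!res.ok) {
    showError('Something went wrong');  // any other non-2xx status
  } else {
    return res.json();                  // 200 OK: the happy path
  }
}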
I've created a batch HTTP API that receives a JSON array of many different requests for our backend server. The batch API simply issues all of those requests against a load balancer, waits for all of them to return, and sends a new JSON array back to the client.
The client receives a huge JSON array response whose indices are in the same position as the requests, so it is easy to know which response belongs to which request.
The motivation for this API was to work around the browser limit of five simultaneous connections and to improve performance, as the batch API has much more direct access to the server (we do not have a reverse proxy or an SSL server between them).
The service is running pretty well, but now I have some new requirements as it gains more use. First, the service can use a lot of memory, since it keeps a buffer for each request that is only flushed when all responses are ready (I am using an ordered JSON array). Moreover, since it can take some time for all the requests to complete, the client has to wait for everything to be processed before receiving a single byte.
I am planning to change the service to return each response as soon as it is available (which would solve both issues above). I would like to share and validate my ideas with you:
I will change the response from a JSON response to a multipart response.
The server will include, for every part, the index of the response
The server will flush the response once it's available
The client XHR will need to understand a multipart content type response and be able to process each part as soon as it is available.
I will create a PoC to validate every step, but at this moment I would like to validate the idea and hear some thoughts about it. Here are some doubts I have about the solution:
From what I have read, I am in doubt about which content type is right for the response: multipart/mixed? multipart/digest?
Can I use an Accept request header to identify whether the client is able to handle the new service implementation? If so, what is the right Accept header for this? My plan is to use the same endpoint but vary the Accept header.
How can I develop an XHR client that is able to process the parts of a single response as soon as they are available? I found some ideas on the Web, but I am not entirely confident with them.
I will change the response from a JSON response to a multipart response.
The server will include, for every part, the index of the response.
The server will flush the response once it's available.
The client XHR will need to understand a multipart content type response and be able to process each part as soon as it is available.
The XHR protocol will not support this workflow through a single request from the client. Since XHR relies heavily on the HTTP protocol for communication, XHR follows the HTTP connection rules. The first and most important rule: HTTP connections are always initiated by the client. Another rule: XHR returns the entire content body, or it fails.
The implication for your workflow is that each part of the multipart response must be requested individually by the client.
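In practice that could look like the sketch below, where the client issues one request per index and handles each response as soon as it arrives; the /batch-item/:index endpoint and the handlePart callback are hypothetical:

[0, 1, 2, 3].forEach((i) => {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', '/batch-item/' + i);
  xhr.onload = () => handlePart(i, JSON.parse(xhr.responseText)); // process this part immediately
  xhr.send();
});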
From what I have read, I am in doubt about which content type is right for the response: multipart/mixed? multipart/digest?
You are right to be in doubt, as there is no provision in the specification for this. The responseType attribute is limited to the empty string (the default), "arraybuffer", "blob", "document", "json", and "text". It is possible to override the MIME type (via overrideMimeType()), but that does not change the response type. Even in that case, the XHR spec is very clear about what it will send back: one of the types listed above, as documented in the XHR specification.
Can I use an Accept request header to identify whether the client is able to handle the new service implementation? If so, what is the right Accept header for this? My plan is to use the same endpoint but vary the Accept header.
Custom HTTP headers are designed to help us tell the server what the client's capabilities are, and this is easily done. It doesn't necessarily have to be the Accept header (which is itself restricted to a defined list of MIME types).
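A sketch of that kind of capability signalling, using an invented X-Batch-Response-Mode header on the existing endpoint:

// Client
const xhr = new XMLHttpRequest();
xhr.open('POST', '/batch');
xhr.setRequestHeader('Content-Type', 'application/json');
xhr.setRequestHeader('X-Batch-Response-Mode', 'per-part'); // "I can handle the new behaviour"
xhr.send(JSON.stringify(requests)); // `requests` is the batch payload described in the question

// Server (Express): pick the implementation based on the header
app.post('/batch', express.json(), (req, res) => {
  if (req.get('X-Batch-Response-Mode') === 'per-part') {
    res.json({ mode: 'per-part' }); // new behaviour would go here
  } else {
    res.json({ mode: 'legacy' });   // existing single-JSON-array behaviour
  }
});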
How can I develop an XHR client that is able to process the parts of a single response as soon as they are available? I found some ideas on the Web, but I am not entirely confident with them.
XHR is processed natively by the client and cannot be overridden, for all sorts of security reasons, so this is unlikely to be available as a solution.
Note: ordinarily one might suggest the use of a custom version of Chromium, but your constraints do not allow for that.
I'm trying to determine the best practice in a REST API for determining whether the client can access a particular resource. Two quick example scenarios:
A phone directory lookup service. Client looks up a phone number by accessing eg.
GET http://host/directoryEntries/numbers/12345
... where 12345 is the phone number to try and find in the directory. If it exists, it would return information like the name and address of the person whose phone number it is.
A video format shifting service. Client submits a video in one format to eg.
POST http://host/videos/
... and receives a 'video GUID' which has been generated by the server for this video. Client then checks eg.
GET http://host/videos/[GUID]/flv
... to get the video, converted into the FLV format, if the converted version exists.
You'll notice that in both cases above, I didn't mention what should happen if the resource being checked for doesn't exist. That's my question here. I've read in various other places that the proper RESTful way for the client to check whether the resource exists here is to call HEAD (or maybe GET) on the resource, and if the resource doesn't exist, it should expect a 404 response. This would be fine, except that a 404 response is widely considered an 'error'; the HTTP/1.1 spec states that the 4xx class of status code is intended for cases in which the client 'seems to have erred'. But wait; in these examples, the client has surely not erred. It expects that it may get back a 404 (or others; maybe a 403 if it's not authorized to access this resource), and it has made no mistake whatsoever in requesting the resource. The 404 isn't intended to indicate an 'error condition', it is merely information - 'this does not exist'.
And browsers behave, as the HTTP spec suggests, as if the 404 response is a genuine error. Both Google Chrome and Firebug's console spew out a big red "404 Not Found" error message into the Javascript console each time a 404 is received by an XHR request, regardless of whether it was handled by an error handler or not, and there is no way to disable it. This isn't a problem for the user, as they don't see the console, but as a developer I don't want to see a bunch of 404 (or 403, etc.) errors in my JS console when I know perfectly well that they aren't errors, but information being handled by my Javascript code. It's line noise. In the second example I gave, it's line noise to the extreme, because the client is likely to be polling the server for that /flv as it may take a while to compile and the client wants to display 'not compiled yet' until it gets a non-404. There may be a 404 error appearing in the JS console every second or two.
So, is this the best or most proper way we have with REST to check for the existence of a resource? How do we get around the line noise in the JS console? It may well be suggested that, in my second example, a different URI could be queried to check the status of the compilation, like:
GET http://host/videos/[GUID]/compileStatus
... however, this seems to violate the REST principle a little, to me; you're not using HTTP to its full extent and paying attention to the HTTP headers, but instead creating your own protocol whereby you return information in the body telling you what you want to know, and always return an HTTP 200 to shut the browser up. This was a major criticism of SOAP - it tries to 'get around' HTTP rather than use it to its full. By this principle, why does one ever need to return a 404 status code? You could always return a 200 - of course, the 200 indicates that the resource's status information is available, and the status information tells you what you really wanted to know - the resource was not found. Surely the RESTful way should be to return a 404 status code.
This mechanism seems even more contrived if we apply it to the first of my above examples; the client would perhaps query:
GET http://host/directoryEntries/numberStatuses/12345
... and of course receive a 200; the number 12345's status information exists, and tells you... that the number is not found in the directory. This would mean that ANY number queried would be '200 OK', even though it may not exist - does this seem like a good REST interface?
Am I missing something? Is there a better way to determine whether a resource exists RESTfully, or should HTTP perhaps be updated to indicate that non-2xx status codes should not necessarily be considered 'errors', and are just information? Should browsers be able to be configured so that they don't always output non-2xx status responses as 'errors' in the JS console?
PS. If you read this far, thanks. ;-)
It is perfectly okay to use 404 to indicate that the resource is not found. Some quotes from the book "RESTful Web Services" (a very good book about REST, by the way):
404 indicates that the server can’t map the client’s URI to a resource. [...] A web service may use a 404 response as a signal to the client that the URI is “free”; the client can then create a new resource by sending a PUT request to that URI. Remember that a 404 may be a lie to cover up a 403 or 401. It might be that the resource exists, but the server doesn’t want to let the client know about it.
Use 404 when the service can't find the requested resource; do not overuse it to indicate errors that are not actually about the existence of the resource. Also, the client may "query" the service to find out whether a URI is free or not.
Performing long-running operations like encoding of video files
HTTP has a synchronous request-response model. The client opens an Internet socket to the server, makes its request, and keeps the socket open until the server has sent the response. [...]
The problem is not all operations can be completed in the time we expect an HTTP request to take. Some operations take hours or days. An HTTP request would surely be timed out after that kind of inactivity. Even if it didn’t, who wants to keep a socket open for days just waiting for a server to respond? Is there no way to expose such operations asynchronously through HTTP?
There is, but it requires that the operation be split into two or more synchronous requests. The first request spawns the operation, and subsequent requests let the client learn about the status of the operation. The secret is the status code 202 (“Accepted”).
So you could do POST /videos to create a video encoding task. The service will accept the task, answer with 202 and provide a link to a resource describing the state of the task.
202 Accepted
Location: http://tasks.example.com/video/task45543
The client may query this URI to see the status of the task. Once the task is complete, a representation of the resource will become available.
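A rough Express sketch of that flow; the route shapes and the startEncoding/getTask helpers are placeholders, not a prescribed API:

app.post('/videos', (req, res) => {
  const taskId = startEncoding(req.body); // placeholder: queue the encoding job
  res.status(202)
    .location('/videos/tasks/' + taskId)
    .json({ status: 'pending' });
});

app.get('/videos/tasks/:id', (req, res) => {
  const task = getTask(req.params.id);    // placeholder: look the task up
  if (!task) return res.sendStatus(404);
  res.json({ status: task.done ? 'done' : 'pending' });
});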
I think you have changed the semantics of the request.
With a RESTful architecture, you are requesting a resource. Therefore, requesting a resource that does not exist (i.e. is not found) is considered an error.
I use:
404 if GET http://host/directoryEntries/numbers/12345 does not exist.
400 Bad Request, by contrast, means the request itself was malformed.
Perhaps, in your case you could think about searching instead.
Searches are done with query parameters on a collection of resources
What you want is
GET http://host/directoryEntries/numbers?id=1234
Which would return 200 and an empty list if none exist or a list of matches.
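In Express that search endpoint could look roughly like this, assuming an in-memory directory array just for illustration:

app.get('/directoryEntries/numbers', (req, res) => {
  const matches = directory.filter((entry) => entry.number === req.query.id);
  res.status(200).json(matches); // an empty array when nothing matches, still a 200
});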
IMO the client has indeed erred in requesting a non-existent resource. In both of your examples the service can be designed differently so that an error can be avoided on the client side. For example, in the video conversion service, since the GUID has already been assigned, the message body at /videos/[GUID] can contain a flag indicating whether the conversion is done or not.
Similarly, in the phone directory example, you are searching for a resource and this can be handled through something like /numbers/?search_number=12345 etc. so that the server returns a list of matching resources which you can then query further.
Browsers are designed to work with the HTTP spec, and showing an error for a 404 is a genuine response (pretty helpful, too). However, you need to think about your JavaScript code as a separate entity from the browser. So you have your JavaScript REST client, which knows what the service is like, and the browser, which is sort of dumb with regard to your service.
Also, REST is independent of protocols in theory. HTTP happens to be the most common protocol where REST is used. Another example I can think of is Android content providers whose design is RESTful but not dependent on HTTP.
I've only ever seen GET/HEAD requests return 404 (Not Found) when a resource doesn't exist. If you are just trying to get the status of a resource, a HEAD request would be fine, as it shouldn't return the body of the resource. This way you can differentiate between requests where you are trying to retrieve the resource and requests where you are only checking for its existence.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
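A small sketch of that HEAD-based existence check from the browser, where the URL shape comes from the question and guid is assumed to be known to the client:

fetch('http://host/videos/' + guid + '/flv', { method: 'HEAD' })
  .then((res) => {
    const exists = res.status !== 404; // HEAD carries status and headers only, no body
    console.log(exists ? 'converted version available' : 'not compiled yet');
  });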
Edit: I remember reading about an alternative solution of adding a header to the original request that indicated how the server should handle a 404, something along the lines of responding with 200 but an empty body.