Create a Batch HTTP API With multipart response - javascript

I've created a Batch HTTP API that receives a JSON array with many different requests to our backend server. The Batch API just sends all of these requests to a load balancer, waits for all of them to return, and returns a new JSON to the client.
The client receives a huge JSON array response with its indices in the same position as the requests, so it is easy to know which response belongs to which request.
The motivation for this API was to work around the browser limit of 5 simultaneous connections and to improve performance, as the Batch API has much more direct access to the server (we do not have a reverse proxy or an SSL server between them).
The service is running pretty well, but now I have some new requirements as it is seeing more use. First, the service can use a lot of memory, as it keeps a buffer for each request that is only flushed when all responses are ready (I am using an ordered JSON array). Also, since it can take some time for all requests to complete, the client has to wait for everything to be processed before receiving a single byte.
I am planning to change the service to return each response as soon as it is available (which would solve both issues above), and I would like to share and validate my ideas with you:
I will change the response from a JSON response to a multipart response.
The server will include, for every part, the index of the response.
The server will flush each response once it's available.
The client XHR will need to understand the multipart content type and be able to process each part as soon as it is available.
I will create a PoC to validate every step, but at this moment I would like to validate the idea and hear some thoughts about it. Here are some doubts I have about the solution:
From what I read, I am in doubt about which content type is right for the response. multipart/mixed? multipart/digest?
Can I use an Accept request header to identify whether the client is able to handle the new service implementation? If so, what is the right Accept header for this? My plan is to use the same endpoint but vary the Accept header.
How can I develop an XHR client that is able to process many parts of a single response as soon as they are available? I found some ideas on the Web but I am not entirely confident in them.
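Independent of the transport that delivers the chunks, the parsing side of this idea might be sketched roughly as follows. This is only an illustration under assumptions: the parts are text, lines end in CRLF, and each part carries a made-up X-Batch-Index header holding the position of the original request. The function name createMultipartParser is hypothetical too.

```javascript
// Hypothetical incremental parser for a multipart body that arrives in chunks.
// Assumes text parts, CRLF line endings, and an "X-Batch-Index" part header
// (an invented name) that carries the index of the original request.
function createMultipartParser(boundary, onPart) {
  const delimiter = "--" + boundary;
  let buffer = "";
  let started = false;

  function emit(rawPart) {
    // Drop the CRLF that follows the opening delimiter line.
    const part = rawPart.replace(/^\r\n/, "");
    const split = part.indexOf("\r\n\r\n");
    if (split === -1) return; // no header/body separator: skip malformed part
    const headers = {};
    for (const line of part.slice(0, split).split("\r\n")) {
      const colon = line.indexOf(":");
      if (colon > 0) {
        headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
      }
    }
    onPart({
      index: Number(headers["x-batch-index"]),
      headers,
      body: part.slice(split + 4),
    });
  }

  // Feed each received chunk to this function, in order.
  return function push(chunk) {
    buffer += chunk;
    if (!started) {
      const first = buffer.indexOf(delimiter);
      if (first === -1) return; // still waiting for the first delimiter
      buffer = buffer.slice(first + delimiter.length);
      started = true;
    }
    // Every further delimiter closes the part that precedes it.
    let next;
    while ((next = buffer.indexOf("\r\n" + delimiter)) !== -1) {
      emit(buffer.slice(0, next));
      buffer = buffer.slice(next + 2 + delimiter.length);
    }
  };
}
```

Because a part is only emitted once the delimiter that follows it has arrived, the callback fires per part rather than once at the end, and the index header lets the client match each part to its request even when parts arrive out of order.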

I will change the response from a JSON response to a multipart response.
The server will include, for every part, the index of the response
The server will flush the response once its available
The client XHR will need to understand multipart content type response and be able to process each part as soon as it is available.
XHR will not support this workflow through a single request from the client. Since XHR relies heavily on the HTTP protocol for communication, it follows the HTTP connection rules. The first and most important rule: HTTP connections are always initiated by the client. Another rule: XHR returns the entire content-body or fails.
The implication for your workflow is that each part of the multipart response must be requested individually by the client.
From what I read, I am in doubbt of that content-type is right for the response. multipart/mixed? multipart/digest?
You should be in doubt, as there is no provision in the specification to do this. The responseType attribute is limited to the empty string (default), "arraybuffer", "blob", "document", "json", and "text". It is possible to override the MIME type, but that does not change the response type. Even in that case, the XHR spec is very clear about what it will send back: one of the types listed above, as documented here.
Can I use an accept request header to identify if the client is able to handle the new service implementation? If so, what is the right accept header for this? My plan is to use the same endpoint but very accept header.
Custom HTTP headers are designed to help us tell the server what our capabilities are on the client. This is easily done. It doesn't necessarily have to be in the Accept header (as that is also a defined list of MIME types).
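As a rough sketch of that idea, the server could pick the response format from the request headers. The header name X-Batch-Streaming and the function chooseBatchFormat are invented for illustration, and the header object is assumed to use lower-case keys (as Node.js does for incoming requests).

```javascript
// Decide which batch response format to produce from the request headers.
// "x-batch-streaming" is a hypothetical custom header used for illustration.
function chooseBatchFormat(headers) {
  const accept = (headers["accept"] || "").toLowerCase();
  // Compare media types only, ignoring parameters such as ";q=0.9".
  const acceptsMultipart = accept
    .split(",")
    .map((item) => item.split(";")[0].trim())
    .includes("multipart/mixed");
  const optedIn = headers["x-batch-streaming"] === "1";
  return acceptsMultipart || optedIn ? "multipart" : "json";
}

console.log(chooseBatchFormat({ accept: "application/json" }));
console.log(chooseBatchFormat({ accept: "multipart/mixed; q=0.9, application/json" }));
```

Old clients that send neither signal keep getting the single JSON array, so the same endpoint can serve both.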
How can I develop a XHR client that is able to process many parts of a single response as soon as they are available? I found some ideias on the Web but I am not entirely confident with then.
XHR is processed natively by the client and cannot be overridden, for all sorts of security reasons. So this is unlikely to be available as a solution.
Note: ordinarily one might suggest the use of a custom version of Chromium, but your constraints do not allow for that.

Related

Is it possible for an HTTP `GET` request with `Cache-Control: no-cache` to not hit the server exactly once? (Levering out idempotency of `GET`.)

In theory, one should use the HTTP GET method only for idempotent requests.
But, for some intricate reasons, I cannot use any other method than GET and my requests are not idempotent (they mutate the database). So my idea is to use the Cache-Control: no-cache header to ensure that any GET request actually hits the database. Also, I cannot change the URLs which means I cannot append a random URL argument to bust caches.
Am I safe or shall I implement some kind of mechanism to ensure that the GET request was received exactly once? (The client is the browser and the server is Node.js.)
What about a GET request that gets duplicated by some middle-man resulting in the same GET request being received twice by the server? I believe the spec allows such situation but does this ever happen in "real life"?
I've never seen a middleman, such as Cloudflare or NGINX, preventing or duplicating a GET request with Cache-Control: no-cache.

Let's start by saying what you've already pointed out -- GET requests should be idempotent. That is, they should not modify the resource and therefore should return the same thing every time (barring any other methods being used to modify it in the meantime.)
It's worth pointing out, as restcookbook.com notes, that this doesn't mean nothing can change as a result of the request. Rather, the resource's representation should not change. So for instance, your database might log the request, but shouldn't return a different value in the response.
The main concern you've listed is middleware caching.
The danger isn't that the middleware sends the request to your server more than once (you mentioned 'duplicating' a request), but rather that (a) it sends an old, cached, no-longer-accurate response to whatever is making the request, and (b) the request does not reach the server.
For instance, imagine a response returning a count property that starts at 0 and increments when the GET endpoint is hit. Request #1 will return "1" as the count. Request #2 should now return "2" as the count, but if it's cached, it might still show 1 and never hit the server to increase the count to 2. Those are two separate problems (a stale response, and a missed update).
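The counter scenario can be simulated in a few lines. This is a toy model only: createNaiveCache stands in for a hypothetical middleware that ignores Cache-Control and keys on the URL alone.

```javascript
// A non-idempotent GET handler: every call that reaches it bumps the counter.
let count = 0;
function originGet() {
  count += 1;
  return { count };
}

// A naive cache that ignores Cache-Control and keys only on the URL.
function createNaiveCache(origin) {
  const store = new Map();
  return function get(url) {
    if (!store.has(url)) store.set(url, origin(url));
    return store.get(url);
  };
}

const cachedGet = createNaiveCache(originGet);
const first = cachedGet("/counter");
const second = cachedGet("/counter");
// Both responses say 1, and the origin was only reached once.
console.log(first.count, second.count, count); // 1 1 1
```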
So, will a middleware prevent a request from reaching the server and serve a cached copy instead? We don't know. It depends on the middleware. You can absolutely write one right now that does just that. You can also write one that doesn't.
If you don't know what will be consuming your API, then it's not a great option. But whether it's "safe" depends on the specifics.
As you know, it's always best to follow the set of expectations that comes with the grammar of HTTP requests. Deviating from them sets you up for failure in many ways. (For instance, there are different security expectations for requests based on method. A browser may treat a GET request as "simple" from a CORS perspective, while it would never treat a PATCH request as such.)
I would go to great lengths to not break this convention, but if I were forced by circumstances to break this expectation, I would definitely note it in my API's documentation.

One workaround to ensure that your GET request is only called once is to allow caching of responses and use the Vary header. The spec for the Vary header can be found here.
In summary, a Vary header basically tells any HTTP cache, which parts of the request header to take into account when trying to find the cached object.
For example, you have an endpoint /api/v1/something that accepts GET requests and does the required database updates. Let's say that when successful, this endpoint returns the following response.
HTTP/1.1 200 OK
Content-Length: 3458
Cache-Control: max-age=86400
Vary: X-Unique-ID
Notice the Vary header has a value of X-Unique-ID. This means that if you include the X-Unique-ID header in your request, any HTTP caching layer (be it the browser, CDN, or other middleware) will use the value in this header to determine whether to use a previously cached response or not.
Say you make a first request that includes an X-Unique-ID header with the value id_1, then you make a subsequent request with an X-Unique-ID value of id_2. The caching layer will not use a previously cached response for the second request because the value of X-Unique-ID is different.
However, if you make another request that contains the X-Unique-ID value of id_1 again, the caching layer will not make a request to the backend but instead reuse the cached response for the first request assuming that the cache hasn't expired yet.
One thing you have to consider though is this will only work if the caching layer actually respects the specifications for the Vary header.
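To make the behaviour concrete, here is a toy cache that honours Vary, reusing the endpoint and the X-Unique-ID header from the example above. It is a simplified model (no expiry handling, in-memory only), and createVaryCache is a made-up name, not a real library API.

```javascript
// Toy cache that builds its key from the URL plus the headers named in Vary.
// Expiry (max-age) is deliberately ignored to keep the sketch short.
function createVaryCache(origin) {
  const varyByUrl = new Map(); // url -> header names listed in Vary
  const entries = new Map(); // cache key -> stored response

  function keyFor(url, headers, varyNames) {
    return url + "|" + varyNames.map((n) => n + "=" + (headers[n] || "")).join("&");
  }

  return function get(url, headers) {
    const known = varyByUrl.get(url);
    if (known) {
      const hit = entries.get(keyFor(url, headers, known));
      if (hit) return hit;
    }
    const response = origin(url, headers);
    const varyNames = (response.headers["vary"] || "")
      .split(",")
      .map((n) => n.trim().toLowerCase())
      .filter(Boolean);
    varyByUrl.set(url, varyNames);
    entries.set(keyFor(url, headers, varyNames), response);
    return response;
  };
}

let originHits = 0;
function origin(url, headers) {
  originHits += 1; // stands in for the database update
  return {
    status: 200,
    headers: { vary: "X-Unique-ID", "cache-control": "max-age=86400" },
    body: "hit " + originHits,
  };
}

const get = createVaryCache(origin);
get("/api/v1/something", { "x-unique-id": "id_1" });
get("/api/v1/something", { "x-unique-id": "id_2" });
get("/api/v1/something", { "x-unique-id": "id_1" }); // served from cache
console.log(originHits); // 2
```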

The Hypertext Transfer Protocol (HTTP) is designed to enable communication between clients and servers, and the GET method is used to request data from a specified resource.
Sending 'Cache-Control: no-cache' means a cache must not answer from a stored response without first revalidating it with the server (it is 'no-store' that forbids storing anything about the client request or server response).
So the request reaches the server each time, although the server may answer a conditional request with 304 Not Modified instead of sending the full response again.

This depends a lot on what's sat in the middle and where the retry logic sits, if there is any. Almost all of your problems will be in failure handling and retry handling - not the basic requests.
Let's say, for example that Alice talks to Bob via a proxy. Let's assume for the sake of simplicity that the requests are small and the proxy logic is pure store-and-forward. i.e. most of the time a request either gets through or doesn't but is unlikely to get stalled half-way through. There's no guarantee this is the case and some proxies will stop requests part-way through by design.
Alice -> Proxy GET
Proxy -> Bob GET
Bob -> Proxy 200
Proxy -> Alice 200
So far so good. Now imagine Bob dies before responding to the proxy. Does the proxy retry? If so, we have this:
Alice -> Proxy GET
Proxy -> Bob GET
Bob manipulates database then dies
Proxy -> Bob GET (retry)
Now we have a dupe
Unlikely, but possible.
Now imagine (much more likely) that the proxy (or even more likely, some bit of the network between the proxy and the client) dies. Does the client retry? If so, we have this:
Alice -> Proxy GET
Proxy -> Bob GET
Bob -> Proxy 200
Proxy or network dies
Alice -> Proxy GET (retry)
Proxy -> Bob GET
Is this a dupe or not? Depends on your point of view
Plus, for completeness there's also the degenerate case where the server receives the request zero times.
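If duplicates from retries matter, one possible mitigation is to have the client attach a unique ID to each logical request and let the server remember which IDs it has already processed. Since the URLs cannot change, the ID would have to travel in a header; the sketch below only models the server-side bookkeeping, and createOnceHandler and the request IDs are invented names. It does not cover the case where the server dies between mutating the database and recording the ID, which would require storing both in one transaction.

```javascript
// Remember responses by request ID so that a retried request is replayed
// instead of being applied a second time. In-memory and unbounded: a sketch.
function createOnceHandler(mutate) {
  const seen = new Map(); // request id -> stored response
  return function handle(requestId) {
    if (seen.has(requestId)) {
      return { ...seen.get(requestId), replayed: true };
    }
    const response = { status: 200, body: mutate(), replayed: false };
    seen.set(requestId, response);
    return response;
  };
}

let balance = 0;
const handle = createOnceHandler(() => (balance += 10));
handle("req-1");
handle("req-1"); // retry of the same logical request
handle("req-2");
console.log(balance); // 20
```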

Ignore response from a PUT call - javascript

I've a JS (Angular) client that makes a PUT request (REST API) to server and server sends back a large payload that I'm not using in the client currently.
Is there a way to just fire the request and ignore any response that comes back? The main need here is to avoid the data cost incurred by receiving that payload. I've looked at closing the connection once the request is fired, but am not sure if that's the best way to handle this.

If able, I think the only way to change this would be to change the api endpoint to not include a payload from the put request.
I'm assuming you are using Angular's http class and Observables. But even if you aren't, your Angular client is going to need to read the response status sent back from the server to determine whether the PUT request was successful. In order to read the status, you'll need the response, and unfortunately that means the full response sent from the server.
You could close the connection right after the request, but as I've mentioned you'll have no way of knowing whether or not the request was successful.
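If the backend can be changed, one standard way to let the client opt out of the payload is the Prefer request header from RFC 7240 (Prefer: return=minimal). The sketch below shows how a server-side handler might honour it; respondToPut and the plain response object are hypothetical stand-ins for whatever framework the API actually uses.

```javascript
// Return 204 with no body when the client sent "Prefer: return=minimal",
// otherwise return the full representation as before.
function respondToPut(requestHeaders, updatedResource) {
  const prefer = (requestHeaders["prefer"] || "").toLowerCase();
  const wantsMinimal = prefer
    .split(",")
    .map((p) => p.trim())
    .includes("return=minimal");
  if (wantsMinimal) {
    return {
      status: 204,
      headers: { "preference-applied": "return=minimal" },
      body: null,
    };
  }
  return {
    status: 200,
    headers: { "content-type": "application/json" },
    body: JSON.stringify(updatedResource),
  };
}
```

The client still gets a status code to check, but no large payload travels over the wire.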

To ignore the response, just don't do anything when the request succeeds.
If you don't want the response payload to exist at all, then remove it on the backend.

JSON/JSONP how to use for(;;); in the response body

I can't seem to figure out a way to ignore the for(;;); in the response body of my cross domain JSONP requests. I am doing this on my own servers, nothing else going on here. I am trying to include that for(;;); inside the response body of my callback as such:
_callbacks_.callback(for(;;);[jsondata....]);
but how can I remove it from the response body before the JS code gets parsed? I am using the Google Closure Library btw.

Ok I think I figured it out.
The reason why the for(;;); is there is to prevent cross-domain data requests of certain information. So basically if you have information you are trying to protect you go through a normal Ajax JSON channel and if you are storing data on multiple servers you deal with it on server level.
JSONP requests are actually a remote script inclusion, which means whatever the server outputs is actual Javascript code, so if you have a for(;;); before your _callbacks_.callback(); the code will be executed on the origin domain on request success. If it's an infinite for loop, it will obviously jam the page.
So the normal implementation method is the following:
Send a normal Ajax request to a file located on the same server.
Perform the server level stuff and send requests to external servers via encrypted CURL.
Add security to the server response (a for(;;); or while(1); or throw(1); followed by a <prevent eval statements> string).
Get the response as a text string.
Remove your security implementations from the string.
Convert the string (which is now a "JSON string") to a JS Object/Array etc. with a standard JSON parser.
Do whatever you want to do with the data.
Just thought I should put this out here in case someone else will Google it in the future, as I didn't find proper information by Google-ing. This should help prevent cross domain request forgery.
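The client-side part of the steps above (take the response as text, strip the guard, parse the rest) could look roughly like this. The prefix list mirrors the guards named above; parseGuardedJson is a made-up helper name.

```javascript
// Guards that a server might prepend so the response cannot be used as a script.
const GUARD_PREFIXES = ["for(;;);", "while(1);", "throw(1);"];

// Strip a known guard prefix, then parse what is left as JSON.
function parseGuardedJson(text) {
  const trimmed = text.trimStart();
  const prefix = GUARD_PREFIXES.find((p) => trimmed.startsWith(p));
  return JSON.parse(prefix ? trimmed.slice(prefix.length) : trimmed);
}

console.log(parseGuardedJson('for(;;);{"user":"alice"}').user); // alice
```

This only works when the response is fetched as text over a normal Ajax request; a JSONP script include would execute the guard instead.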

What's the RESTful way to check whether the client can access a resource?

I'm trying to determine the best practice in a REST API for determining whether the client can access a particular resource. Two quick example scenarios:
A phone directory lookup service. Client looks up a phone number by accessing eg.
GET http://host/directoryEntries/numbers/12345
... where 12345 is the phone number to try and find in the directory. If it exists, it would return information like the name and address of the person whose phone number it is.
A video format shifting service. Client submits a video in one format to eg.
POST http://host/videos/
... and receives a 'video GUID' which has been generated by the server for this video. Client then checks eg.
GET http://host/videos/[GUID]/flv
... to get the video, converted into the FLV format, if the converted version exists.
You'll notice that in both cases above, I didn't mention what should happen if the resource being checked for doesn't exist. That's my question here. I've read in various other places that the proper RESTful way for the client to check whether the resource exists here is to call HEAD (or maybe GET) on the resource, and if the resource doesn't exist, it should expect a 404 response. This would be fine, except that a 404 response is widely considered an 'error'; the HTTP/1.1 spec states that the 4xx class of status code is intended for cases in which the client 'seems to have erred'. But wait; in these examples, the client has surely not erred. It expects that it may get back a 404 (or others; maybe a 403 if it's not authorized to access this resource), and it has made no mistake whatsoever in requesting the resource. The 404 isn't intended to indicate an 'error condition', it is merely information - 'this does not exist'.
And browsers behave, as the HTTP spec suggests, as if the 404 response is a genuine error. Both Google Chrome and Firebug's console spew out a big red "404 Not Found" error message into the Javascript console each time a 404 is received by an XHR request, regardless of whether it was handled by an error handler or not, and there is no way to disable it. This isn't a problem for the user, as they don't see the console, but as a developer I don't want to see a bunch of 404 (or 403, etc.) errors in my JS console when I know perfectly well that they aren't errors, but information being handled by my Javascript code. It's line noise. In the second example I gave, it's line noise to the extreme, because the client is likely to be polling the server for that /flv as it may take a while to compile and the client wants to display 'not compiled yet' until it gets a non-404. There may be a 404 error appearing in the JS console every second or two.
So, is this the best or most proper way we have with REST to check for the existence of a resource? How do we get around the line noise in the JS console? It may well be suggested that, in my second example, a different URI could be queried to check the status of the compilation, like:
GET http://host/videos/[GUID]/compileStatus
... however, this seems to violate the REST principle a little, to me; you're not using HTTP to its full and paying attention to the HTTP headers, but instead creating your own protocol whereby you return information in the body telling you what you want to know instead, and always return an HTTP 200 to shut the browser up. This was a major criticism of SOAP - it tries to 'get around' HTTP rather than use it to its full. By this principle, why does one ever need to return a 404 status code? You could always return a 200 - of course, the 200 is indicating that a resource's status information is available, and the status information tells you what you really wanted to know - the resource was not found. Surely the RESTful way should be to return a 404 status code.
This mechanism seems even more contrived if we apply it to the first of my above examples; the client would perhaps query:
GET http://host/directoryEntries/numberStatuses/12345
... and of course receive a 200; the number 12345's status information exists, and tells you... that the number is not found in the directory. This would mean that ANY number queried would be '200 OK', even though it may not exist - does this seem like a good REST interface?
Am I missing something? Is there a better way to determine whether a resource exists RESTfully, or should HTTP perhaps be updated to indicate that non-2xx status codes should not necessarily be considered 'errors', and are just information? Should browsers be able to be configured so that they don't always output non-2xx status responses as 'errors' in the JS console?
PS. If you read this far, thanks. ;-)

It is perfectly okay to use 404 to indicate that resource is not found. Some quotes from the book "RESTful Web Services" (very good book about REST by the way):
404 indicates that the server can’t map the client’s URI to a
resource. [...] A web service may use a 404 response as a signal to
the client that the URI is “free”; the client can then create a new
resource by sending a PUT request to that URI. Remember that a 404 may
be a lie to cover up a 403 or 401. It might be that the resource
exists, but the server doesn’t want to let the client know about it.
Use 404 when the service can't find the requested resource; do not overuse it to indicate errors that are unrelated to the existence of the resource. Also, the client may "query" the service to find out whether a URI is free or not.
Performing long-running operations like encoding of video files
HTTP has a synchronous request-response model. The client opens an
Internet socket to the server, makes its request, and keeps the socket
open until the server has sent the response. [...]
The problem is not all operations can be completed in the time we
expect an HTTP request to take. Some operations take hours or days. An
HTTP request would surely be timed out after that kind of inactivity.
Even if it didn’t, who wants to keep a socket open for days just
waiting for a server to respond? Is there no way to expose such
operations asynchronously through HTTP?
There is, but it requires that the operation be split into two or more
synchronous requests. The first request spawns the operation, and
subsequent requests let the client learn about the status of the
operation. The secret is the status code 202 (“Accepted”).
So you could do POST /videos to create a video encoding task. The service will accept the task, answer with 202 and provide a link to a resource describing the state of the task.
202 Accepted
Location: http://tasks.example.com/video/task45543
Client may query this URI to see the status of the task. Once the task is complete, representation of resource will become available.
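A minimal in-memory model of that flow is sketched below. The service object, its method names, and the /tasks/ path are all invented for illustration; a real service would run the conversion in the background and persist task state.

```javascript
// In-memory model of "202 Accepted plus a task resource to poll".
function createVideoService() {
  const tasks = new Map();
  let nextId = 1;
  return {
    // POST /videos: accept the job and point the client at a task resource.
    submit(video) {
      const id = "task" + nextId++;
      tasks.set(id, { state: "pending", source: video, result: null });
      return { status: 202, headers: { location: "/tasks/" + id } };
    },
    // Called by the (hypothetical) encoder when the conversion is done.
    finish(id, resultUrl) {
      const task = tasks.get(id);
      if (task) {
        task.state = "done";
        task.result = resultUrl;
      }
    },
    // GET /tasks/{id}: 200 with the current state, 404 for unknown tasks.
    getTask(path) {
      const task = tasks.get(path.replace("/tasks/", ""));
      if (!task) return { status: 404, body: null };
      return { status: 200, body: { state: task.state, result: task.result } };
    },
  };
}
```

While the conversion is running the client polls the task URI and keeps getting a 200 with a pending state, so nothing shows up as an error until something actually goes wrong.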

I think you have changed the semantics of the request.
With a RESTful architecture, you are requesting a resource. Therefore requesting a resource that does not exist or is not found is considered an error.
I use:
404 if GET http://host/directoryEntries/numbers/12345 does not exist.
400 if it is actually a bad request (400 Bad Request).
Perhaps, in your case you could think about searching instead.
Searches are done with query parameters on a collection of resources
What you want is
GET http://host/directoryEntries/numbers?id=1234
Which would return 200 and an empty list if none exist or a list of matches.
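The difference between the two styles can be shown side by side. The directory contents and both function names below are made up for the example.

```javascript
// Sample data; the entry itself is invented.
const directory = new Map([
  ["12345", { number: "12345", name: "Alice Example" }],
]);

// Direct lookup of one resource: a missing entry is a 404.
function getEntry(number) {
  const entry = directory.get(number);
  return entry ? { status: 200, body: entry } : { status: 404, body: null };
}

// Search on the collection: always 200, possibly with an empty list.
function searchEntries(id) {
  const entry = directory.get(id);
  return { status: 200, body: entry ? [entry] : [] };
}

console.log(getEntry("99999").status); // 404
console.log(searchEntries("99999").body.length); // 0
```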

IMO the client has indeed erred in requesting a non-existent resource. In both your examples the service can be designed in a different way so an error can be avoided on the client side. For example, in the video conversion service as the GUID has already been assigned, the message body at videos/id can contain a flag indicating whether the conversion was done or not.
Similarly, in the phone directory example, you are searching for a resource and this can be handled through something like /numbers/?search_number=12345 etc. so that the server returns a list of matching resources which you can then query further.
Browsers are designed for working with the HTTP spec and showing an error is a genuine response (pretty helpful too). However, you need to think about your Javascript code as a separate entity from the browser. So you have your Javascript REST client which knows what the service is like and the browser which is sort of dumb with regards to your service.
Also, REST is independent of protocols in theory. HTTP happens to be the most common protocol where REST is used. Another example I can think of is Android content providers whose design is RESTful but not dependent on HTTP.

I've only ever seen GET/HEAD requests return 404 (Not Found) when a resource doesn't exist. I think if you are just trying to get the status of a resource, a HEAD request would be fine, as it shouldn't return the body of the resource. This way you can differentiate between requests where you are trying to retrieve the resource and requests where you are only checking for its existence.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
Edit: I remember reading about an alternative solution by adding a header to the original request that indicated how the server should handle 404 errors. Something along the lines of responding with 200, but an empty body.

jQuery.getJSON - Access-Control-Allow-Origin Issue

I'm using jQuery's $.getJSON() function to return a short set of JSON data.
I've got the JSON data sitting on a url such as example.com.
I didn't realize it, but as I was accessing that same url, the JSON data couldn't be loaded. I followed through the console and found that XMLHttpRequest couldn't load due to Access-Control-Allow-Origin.
Now, I've read through, a lot of sites that just said to use $.getJSON() and that would be the work around, but obviously it didn't work. Is there something I should change in the headers or in the function?
Help is greatly appreciated.

It's simple, use $.getJSON() function and in your URL just include
callback=?
as a parameter. That will convert the call to JSONP which is necessary to make cross-domain calls. More info: http://api.jquery.com/jQuery.getJSON/
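Note that this only works if the server cooperates by wrapping its JSON in a call to the function named in the callback parameter. A minimal sketch of that server-side wrapping, with a made-up helper name wrapJsonp, might be:

```javascript
// Wrap JSON data in a call to the client-supplied callback name.
function wrapJsonp(callbackName, data) {
  // Only allow identifier-like names (optionally dotted) so the
  // parameter cannot be used to inject arbitrary script.
  if (!/^[A-Za-z_$][\w$]*(\.[A-Za-z_$][\w$]*)*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ");";
}

console.log(wrapJsonp("jsonp123", { weather: "fine" }));
// jsonp123({"weather":"fine"});
```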

You may well want to use JSON-P instead (see below). First a quick explanation.
The header you've mentioned is from the Cross Origin Resource Sharing standard. Beware that it is not supported by some browsers people actually use, and on other browsers (Microsoft's, sigh) it requires using a special object (XDomainRequest) rather than the standard XMLHttpRequest that jQuery uses. It also requires that you change server-side resources to explicitly allow the other origin (www.xxxx.com).
To get the JSON data you're requesting, you basically have three options:
If possible, you can be maximally-compatible by correcting the location of the files you're loading so they have the same origin as the document you're loading them into. (I assume you must be loading them via Ajax, hence the Same Origin Policy issue showing up.)
Use JSON-P, which isn't subject to the SOP. jQuery has built-in support for it in its ajax call (just set dataType to "jsonp" and jQuery will do all the client-side work). This requires server side changes, but not very big ones; basically whatever you have that's generating the JSON response just looks for a query string parameter called "callback" and wraps the JSON in JavaScript code that would call that function. E.g., if your current JSON response is:
{"weather": "Dreary start but soon brightening into a fine summer day."}
Your script would look for the "callback" query string parameter (let's say that the parameter's value is "jsonp123") and wrap that JSON in the syntax for a JavaScript function call:
jsonp123({"weather": "Dreary start but soon brightening into a fine summer day."});
That's it. JSON-P is very broadly compatible (because it works via JavaScript script tags). JSON-P is only for GET, though, not POST (again because it works via script tags).
Use CORS (the mechanism related to the header you quoted). Details in the specification linked above, but basically:
A. The browser will send your server a "preflight" message using the OPTIONS HTTP verb (method). It will contain the various headers it would send with the GET or POST as well as the headers "Origin", "Access-Control-Request-Method" (e.g., GET or POST), and "Access-Control-Request-Headers" (the headers it wants to send).
B. Your PHP decides, based on that information, whether the request is okay and if so responds with the "Access-Control-Allow-Origin", "Access-Control-Allow-Methods", and "Access-Control-Allow-Headers" headers with the values it will allow. You don't send any body (page) with that response.
C. The browser will look at your response and see whether it's allowed to send you the actual GET or POST. If so, it will send that request, again with the "Origin" header (the "Access-Control-Request-xyz" headers are only sent with the preflight).
D. Your PHP examines those headers again to make sure they're still okay, and if so responds to the request.
In pseudo-code (I haven't done much PHP, so I'm not trying to do PHP syntax here):
// Find out what the request is asking for
corsOrigin = get_request_header("Origin")
corsMethod = get_request_header("Access-Control-Request-Method")
corsHeaders = get_request_header("Access-Control-Request-Headers")
if corsOrigin is null or "null" {
    // Requests from a `file://` path seem to come through without an
    // origin or with "null" (literally) as the origin.
    // In my case, for testing, I wanted to allow those and so I output
    // "*", but you may want to go another way.
    corsOrigin = "*"
}

// Decide whether to accept that request with those headers
// If so:

// Respond with headers saying what's allowed (here we're just echoing what they
// asked for, except we may be using "*" [all] instead of the actual origin for
// the "Access-Control-Allow-Origin" one)
set_response_header("Access-Control-Allow-Origin", corsOrigin)
set_response_header("Access-Control-Allow-Methods", corsMethod)
set_response_header("Access-Control-Allow-Headers", corsHeaders)
if the HTTP request method is "OPTIONS" {
    // Done, no body in response to OPTIONS
    stop
}
// Process the GET or POST here; output the body of the response
Again stressing that this is pseudo-code.
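For comparison, here is a runnable JavaScript take on the same decision logic. It differs from the pseudo-code in one deliberate way: instead of echoing any origin, it checks the origin against an allow-list. The function name buildCorsResponse and the allow-list parameter are assumptions for the example, and header names are expected in lower case as Node.js provides them.

```javascript
// Compute the CORS response headers for a request, or refuse it.
function buildCorsResponse(method, headers, allowedOrigins) {
  const origin = headers["origin"];
  if (!origin || !allowedOrigins.includes(origin)) {
    return { allowed: false, preflight: false, headers: {} };
  }
  const responseHeaders = {
    "Access-Control-Allow-Origin": origin,
    // The response differs per origin, so caches must key on it.
    Vary: "Origin",
  };
  const isPreflight =
    method === "OPTIONS" && "access-control-request-method" in headers;
  if (isPreflight) {
    // Echo what the browser asked for; a stricter server would filter these.
    responseHeaders["Access-Control-Allow-Methods"] =
      headers["access-control-request-method"];
    if (headers["access-control-request-headers"]) {
      responseHeaders["Access-Control-Allow-Headers"] =
        headers["access-control-request-headers"];
    }
  }
  // For a preflight, the caller sends these headers with an empty body;
  // otherwise it adds them to the normal GET or POST response.
  return { allowed: true, preflight: isPreflight, headers: responseHeaders };
}
```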
