Why are my server-sent events arriving as a batch? - javascript

I have a Java 8 / Spring 4-based web application that reports the progress of a long-running process to a browser-based client using Server-Sent Events (SSEs); the client runs some JavaScript and updates a progress bar. In my development environment and on our development server, the SSEs arrive in near-real-time at the client. I can see them arriving (along with their timestamps) using Chrome dev tools, and the progress bar updates smoothly.
However, when I deploy to our production environment, I observe different behaviour. The events do not arrive at the browser until the long-running process completes. Then they all arrive in a burst (the events all have timestamps within a few hundred milliseconds of each other according to dev tools). The progress bar is stuck at 0% for the duration and then jumps to 100% really quickly. Meanwhile, my server logs tell me the events were generated and sent at regular intervals.
Here's the relevant server side code:
public class LongRunningProcess extends Thread {

    private SseEmitter emitter;

    public LongRunningProcess(SseEmitter emitter) {
        this.emitter = emitter;
    }

    public void run() {
        ...
        // Sample event, representing 10% progress
        SseEventBuilder event = SseEmitter.event();
        event.name("progress");
        event.data("{ \"progress\": 10 }"); // Hand-coded JSON
        emitter.send(event);
        ...
    }
}
@RestController
public class UploadController {

    @GetMapping("/start")
    public SseEmitter start() {
        SseEmitter emitter = new SseEmitter();
        LongRunningProcess process = new LongRunningProcess(emitter);
        process.start();
        return emitter;
    }
}
Here's the relevant client-side JavaScript:
var src = new EventSource("https://www.example.com/app/start");
src.addEventListener('progress', function(event) {
    // Process event.data and update the progress bar accordingly
});
I believe my code is fairly typical and it works just fine in DEV. However, if anyone can see an issue, let me know.
The issue could be related to the configuration of our production servers. DEV and PROD are all running the same version of Tomcat. However, some of the servers are accessed via a load balancer (F5 in our case), and almost all of them are behind a CDN (Akamai in our case). Could there be some part of this setup that causes the SSEs to be buffered (or queued or cached) and that might produce what I'm seeing?
Following up on the infrastructure configuration idea, I've observed the following in the response headers. In the development environment, my browser receives:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: Keep-Alive
Content-Type: text/event-stream;charset=UTF-8
Keep-Alive: timeout=15, max=99
Pragma: no-cache
Server: Apache
Transfer-Encoding: chunked
Via: 1.1 example.com
This is what I'd expect for an event stream: a chunked response of unknown content length. In the production environment, my browser receives something different:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: keep-alive
Content-Type: text/event-stream;charset=UTF-8
Content-Encoding: gzip
Content-Length: 318
Pragma: no-cache
Vary: Accept-Encoding
Here the returned content has a known length and is compressed. I don't think this should happen for an event stream. It would appear that something is converting my event stream into a single file. Any thoughts on how I can figure out what's doing this?

It took a significant amount of investigation to determine that the cause of the issue was elements in our network path. So the code above is correct and safe to use. If you see SSE buffering, you will most likely want to check the configuration of key networking elements.
In my case, it was Akamai as our CDN and the use of an F5 device as a load balancer. Indeed, the fact that both can introduce buffering made it quite difficult to diagnose the issue.
Akamai Edge servers buffer event streams by default. This can be disabled through the use of Akamai's advanced metadata and controlled via custom behaviours. At this time, this cannot be controlled directly through Akamai's portal, so you will need to get their engineers to do some of the work for you.
F5 devices appear to default to buffering response data as well. Fortunately, this is quite simple to change and can be done yourself via the device's configuration portal. For the virtual device in question, go to Profile : Services : HTTP and change the configuration of Response Chunking to Preserve (in our case it had defaulted to Selective).
Once I made these changes, I began to receive SSEs in near real-time from our PROD servers (and not just our DEV servers).
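One addition beyond the fixes above: some reverse proxies (nginx-style ones, at least) honour a per-response opt-out header, so it can be worth emitting it alongside the stream as a belt-and-braces measure. Here is a minimal sketch in Node/Express (illustrative only; the asker's stack is Spring, and Akamai/F5 required the configuration changes above regardless):
const express = require('express');
const app = express();

app.get('/start', (req, res) => {
    res.set({
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',   // discourage caching of the stream
        'X-Accel-Buffering': 'no'      // ask nginx-style proxies not to buffer
    });
    res.flushHeaders();                // push the headers out immediately

    let progress = 0;
    const timer = setInterval(() => {
        progress += 10;
        res.write('event: progress\ndata: {"progress": ' + progress + '}\n\n');
        if (progress >= 100) {
            clearInterval(timer);
            res.end();
        }
    }, 1000);
    req.on('close', () => clearInterval(timer));
});

app.listen(3000);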

Have you tried alternative browsers? I'm trying to debug a similar problem in which SSE works on an iPhone client but not on MacOS/Safari or Firefox.
There may be a work-around for your issue - if the server sends "Connection: close" instead of keep-alive, or even closes the connection itself, the client should re-connect in a few seconds and the server will send the current progress bar event.
I'm guessing that closing the connection will flush whatever buffer is causing the problem.
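For what it's worth, a sketch of that close-and-reconnect idea in Node (illustrative; getProgress() is a hypothetical accessor for the job's current state). Each request returns the latest progress event and closes, and the EventSource reconnects on the retry interval, so no intermediary holds an open stream to buffer:
const http = require('http');

http.createServer((req, res) => {
    res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'close'   // close after one event instead of holding the stream open
    });
    // retry: tells the browser how soon (in ms) to reconnect after the close.
    res.end('retry: 2000\nevent: progress\ndata: {"progress": ' + getProgress() + '}\n\n');
}).listen(3000);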

This is not an exact solution to this question, but it is related to SSE, Spring, and the use of compression.
In my case, I had the ziplet CompressionFilter configured in my Spring application, and it was closing the HTTP response, causing SSE to fail. This seems to be related to an open issue in the ziplet project. I disabled the filter and enabled Tomcat compression in application.properties (server.compression.enabled=true) instead, and that solved the SSE issue.
Note that I did not change the default compressionMinSize setting, which may have something to do with SSE traffic not being compressed and passing through.

The webpack dev server also buffers server-sent events when using the proxy setting.
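In my experience the culprit there is usually the dev server's gzip middleware; here is a hedged sketch of the usual workaround (option names as in webpack-dev-server v4, and the backend target is hypothetical; verify against your version):
// webpack.config.js
module.exports = {
    // ...
    devServer: {
        compress: false,   // the compression middleware buffers event streams
        proxy: {
            '/app': {
                target: 'http://localhost:8080'   // backend emitting the SSEs
            }
        }
    }
};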

Related

Send some data to server without REST in JS

As far as I understand it, all REST does is standardize the data sent to the server by adding some headers. For example, a REST request can generate a line of bytes like: POST /qwe HTTP/1.1 Host: 127.0.0.1 Connection: keep-alive, finished with some user input.
Now I'm just playing with writing my own JS server, and here is my question: is there a way in JS to send some data (bytes) without these REST additions like headers/method, and will it work for browsers and the HTTP protocol itself?
For example, instead of sending POST /qwe HTTP/1.1 Host: 127.0.0.1 Connection: keep-alive MY DATA OVER THERE!!!, just send MY DATA OVER THERE!!!, so my server can read only the user data without everything else.
I've tried Googling and ended up finding that XMLHttpRequest and fetch both require some CRUD method to be specified and add some headers to the request anyway.
HTTP requests:
Need to specify the method
Need to specify the Host as a header (in HTTP 1.1. and newer)
Will include some other request headers automatically when made using JS from a browser
This has nothing to do with REST. It's just how HTTP works.
A non-HTTP protocol could avoid having that, but JavaScript in a browser has no mechanism for making non-HTTP requests.
You might want to research WebSocket, which allows two-way communication over a single connection … but that is bootstrapped by HTTP, so it doesn't really fulfil your requirement.
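For illustration, here is a minimal browser-side WebSocket exchange (the ws:// URL is made up). Once the HTTP upgrade handshake completes, each message carries only your bytes plus a small frame header:
const ws = new WebSocket('ws://127.0.0.1:8080');
ws.onopen = () => ws.send('MY DATA OVER THERE!!!');   // no method, path or headers per message
ws.onmessage = (e) => console.log('server said:', e.data);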
For example instead of sending POST /qwe HTTP/1.1 Host: 127.0.0.1 Connection: keep-alive MY DATA OVER THERE!!! just send MY DATA OVER THERE!!! so my server can read only user data without everything else.
I suspect you're misunderstanding what a request is, on a fundamental level. Without POST (the method), /qwe (the path), HTTP/1.1 (the protocol) and 127.0.0.1 (the address) there is no way for your computer to know where and how to send the data. These are necessary if you want to communicate with a server, and removing them will mean your code no longer works.
You're working with very low-level data here, which is probably not what you actually want to be doing. There are some packages which will let you ignore the how and what of the request, and focus on just the data inside it. Express might be a good place to start. You can set up a simple express server to handle requests on specific paths, and reply with data that your frontend can then use.
A REST API is a high-level concept and largely unrelated to what you're asking about.
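To make the Express suggestion above concrete, here is a minimal sketch (the route path and port are made up for the example):
const express = require('express');
const app = express();

app.use(express.text());   // expose a plain-text request body as req.body

app.post('/qwe', (req, res) => {
    // Express has already dealt with the method, path and headers;
    // all that is left to look at is the user data.
    console.log('user data:', req.body);   // "MY DATA OVER THERE!!!"
    res.send('got it');
});

app.listen(3000);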

Firefox combining 'Connection: keep-alive, Upgrade' conflicts with mobile operator proxy

I have a WebSocketServer running on a server box, with a website attempting to connect to it and send back and forth information.
I have noticed that on WiFi it works perfectly in all the browsers I have tested; however, over mobile data it fails in Firefox. I intercepted and edited headers and managed to reproduce the problem. Firefox sends a combined header, Connection: keep-alive, Upgrade, in the request; Chrome, in comparison, sends just Connection: Upgrade. My theory is that when the request passes through the mobile data provider's proxy, as well as adding its own identifying headers, the proxy re-parses all of the other headers and does not understand the combined header. This is confirmed by the fact that at the server end, the request is received (from Firefox) but the Connection header is truncated to Connection: keep-alive. If I manually remove keep-alive from the Connection header using the interception program, the problem is solved.
I don't need the keep-alive part of the request (in fact if anything I would prefer it not to be enabled) so I'm asking if there is a way to stop Firefox sending it without using about:config etc (e.g. in JS or HTML), as I would like for this to work for the general end-user.
Many thanks,
Richard
I had a similar problem, since resolved.
In my case, the problem was that my hosting provider had a proxy which was not dealing correctly with the Connection and/or Upgrade headers. Indeed, these headers are hop-by-hop and as such:
Hop-by-hop headers
are meaningful only for a single transport-level connection and must not be retransmitted by proxies or cached. Such headers are: Connection, Keep-Alive, Proxy-Authenticate, Proxy-Authorization, TE, Trailer, Transfer-Encoding and Upgrade. Note that only hop-by-hop headers may be set using the Connection general header.
Source: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers
In short, these headers are not retransmitted but are somehow interpreted before being passed to your server. When these headers are sent by Firefox, this interpretation phase becomes critical, since the value associated with the Connection header is more "complicated" than the one sent by other browsers, i.e.
Firefox sends Connection: keep-alive, Upgrade
Chrome/Edge/... sends Connection: Upgrade
Solution: I simply told my hosting provider that only Connection: keep-alive arrives at my server when a client sends Upgrade: <my_protocol> AND Connection: keep-alive, Upgrade (and they were able to correct the issue within 72 hours).
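If you control the server code yourself, the robust fix is to treat Connection as a comma-separated token list rather than comparing the whole value against Upgrade. A hedged Node sketch:
const http = require('http');

const server = http.createServer();
server.on('upgrade', (req, socket) => {
    // "keep-alive, Upgrade" (Firefox) and "Upgrade" (Chrome) both pass this test.
    const tokens = (req.headers.connection || '')
        .split(',')
        .map((t) => t.trim().toLowerCase());
    if (!tokens.includes('upgrade')) {
        socket.end('HTTP/1.1 400 Bad Request\r\n\r\n');
        return;
    }
    // ... continue with the WebSocket handshake here ...
});
server.listen(8080);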

What do I put in my HTML to ensure users get latest version of my page, not old version?

I have a mostly static HTML website served from a CDN (plus a bit of AJAX to the server), and I want users' browsers to cache everything until I update any file, at which point I want them to fetch the new versions.
How do I achieve this for all types of static files on my site (HTML, JS, CSS, images, etc.)? (Settings in HTML or elsewhere.) Obviously I can tell the CDN to expire its cache, so it's the client side I'm thinking of.
Thanks.
One way to achieve this is to make use of the HTTP Last-Modified or ETag headers. In the HTTP headers of the served file, the server will send either the date when the page was last modified (in the Last-Modified header), or a random ID representing the current state of the page (ETag), or both:
HTTP/1.1 200 OK
Content-Type: text/html
Last-Modified: Fri, 18 Dec 2015 08:24:52 GMT
ETag: "208f11-52727df9c7751"
Cache-Control: must-revalidate
If the header Cache-Control is set to must-revalidate, it causes the browser to cache the page along with the Last-Modified and ETag headers it received with it. On the next request, it will send them as If-Modified-Since and If-None-Match:
GET / HTTP/1.1
Host: example.com
If-None-Match: "208f11-52727df9c7751"
If-Modified-Since: Fri, 18 Dec 2015 08:24:52 GMT
If the current ETag of the page matches the one that comes from the browser, or if the page hasn’t been modified since the date that was sent by the browser, instead of sending the page, the server will send a Not Modified header with an empty body:
HTTP/1.1 304 Not Modified
Note that only one of the two mechanisms (ETag or Last-Modified) is required; they both work on their own.
The disadvantage of this is that a request has to be sent anyway, so the performance benefit will mostly be for pages that contain a lot of data; particularly on internet connections with high latency, the page will still take a long time to load. (It will for sure reduce your traffic, though.)
Apache automatically generates an ETag (using the file’s inode number, modification time, and size) and a Last-Modified header (based on the modification time of the file) for static files. I don’t know about other web-servers, but I assume it will be similar. For dynamic pages, you can set the headers yourself (for example by sending the MD5 sum of the content as ETag).
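For that dynamic case, here is a hedged sketch in Node (renderPage() is a hypothetical function that produces the page body):
const http = require('http');
const crypto = require('crypto');

http.createServer((req, res) => {
    const body = renderPage();   // hypothetical page renderer
    const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';

    if (req.headers['if-none-match'] === etag) {
        res.writeHead(304);      // Not Modified: empty body, browser uses its cache
        res.end();
        return;
    }
    res.writeHead(200, {
        'Content-Type': 'text/html',
        'ETag': etag,
        'Cache-Control': 'must-revalidate'
    });
    res.end(body);
}).listen(8080);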
By default, Apache doesn’t send a Cache-Control header (and the default is Cache-Control: private). This example .htaccess file makes Apache send the header for all .html files:
<FilesMatch "\.html$">
Header set Cache-Control "must-revalidate"
</FilesMatch>
The other mechanism is to make the browser cache the page by sending Cache-Control: public, but to dynamically vary the URL, for example by appending the modification time of the file as a query string (?12345). This is only really possible if your page/file is only linked from within your web application, in which case you can generate the links to it dynamically. For example, in PHP you could do something like this:
<script src="script.js?<?php echo filemtime("script.js"); ?>"></script>
To achieve what you want on the client side, you have to change the url of your static files when you load them in HTML, i.e. change the file name, add a random query string like unicorn.css?p=1234, etc. An easy way to automate this is to use a task runner such as Gulp and have a look at this package gulp-rev.
In short, if you integrate gulp-rev in your Gulp task, it will automatically append a content hash to all the static files piped into the task stream and generate a JSON manifest file which maps the old files to newly renamed files. So a file like unicorn.css will become unicorn-d41d8cd98f.css. You can then write another Gulp task to crawl through your HTML/JS/CSS files and replace all the urls or use this package gulp-rev-replace.
There should be plenty of online tutorials that show you how to accomplish this. If you use Yeoman, you can check out this static webapp generator I wrote here, which contains a Gulp routine for this.
This is what the HTML5 Application Cache does for you. Put all of your static content into the Cache Manifest and it will be cached in the browser until the manifest file is changed. As an added bonus, the static content will be available even if the browser is offline.
The only change to your HTML is in the <head> tag:
<!DOCTYPE HTML>
<html manifest="cache.appcache">
...
</html>

How can I use deflated/gzipped content with an XHR onProgress function?

I've seen a bunch of similar questions to this get asked before, but I haven't found one that describes my current problem exactly, so here goes:
I have a page which loads a large (between 0.5 and 10 MB) JSON document via AJAX so that the client-side code can process it. Once the file is loaded, I don't have any problems that I don't expect. However, it takes a long time to download, so I tried leveraging the XHR Progress API to render a progress bar to indicate to the user that the document is loading. This worked well.
Then, in an effort to speed things up, I tried compressing the output on the server side via gzip and deflate. This worked too, with tremendous gains; however, my progress bar stopped working.
I've looked into the issue for a while and found that if a proper Content-Length header isn't sent with the requested AJAX resource, the onProgress event handler cannot function as intended because it doesn't know how far along in the download it is. When this happens, a property called lengthComputable is set to false on the event object.
This made sense, so I tried setting the header explicitly with both the uncompressed and the compressed length of the output. I can verify that the header is being sent, and I can verify that my browser knows how to decompress the content. But the onProgress handler still reports lengthComputable = false.
So my question is: is there a way to use gzipped/deflated content with the AJAX Progress API? And if so, what am I doing wrong right now?
The Chrome Network panel confirms that compression is working. These are the relevant request headers, showing that the request is AJAX and that Accept-Encoding is set properly:
GET /dashboard/reports/ajax/load HTTP/1.1
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.99 Safari/537.22
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
These are the relevant response headers, showing that the Content-Length and Content-Type are being set correctly:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Encoding: deflate
Content-Type: application/json
Date: Tue, 26 Feb 2013 18:59:07 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="CAO PSA OUR"
Pragma: no-cache
Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8g PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 223879
Connection: keep-alive
For what it's worth, I've tried this on both a standard (http) and secure (https) connection, with no differences: the content loads fine in the browser, but isn't processed by the Progress API.
Per Adam's suggestion, I tried switching the server side to gzip encoding with no success or change. Here are the relevant response headers:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Encoding: gzip
Content-Type: application/json
Date: Mon, 04 Mar 2013 22:33:19 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="CAO PSA OUR"
Pragma: no-cache
Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8g PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 28250
Connection: keep-alive
Just to repeat: the content is being downloaded and decoded properly, it's just the progress API that I'm having trouble with.
Per Bertrand's request, here's the request:
$.ajax({
    url: '<url snipped>',
    data: {},
    success: onDone,
    dataType: 'json',
    cache: true,
    progress: onProgress || function() {}
});
And here's the onProgress event handler I'm using (it's not too crazy):
function(jqXHR, evt)
{
    // yes, I know this generates Infinity sometimes
    var pct = 100 * evt.position / evt.total;
    // just a method that updates some styles and javascript
    updateProgress(pct);
}
A slightly more elegant variation on your solution would be to set a header like x-decompressed-content-length (or whatever) in your HTTP response, with the full decompressed size of the content in bytes, and read it off the XHR object in your onprogress handler.
Your code might look something like:
request.onprogress = function (e) {
    var contentLength;
    if (e.lengthComputable) {
        contentLength = e.total;
    } else {
        contentLength = parseInt(e.target.getResponseHeader('x-decompressed-content-length'), 10);
    }
    progressIndicator.update(e.loaded / contentLength);
};
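For completeness, the server-side counterpart might look like this (a Node sketch; the header name is just the convention proposed above, not a standard):
const zlib = require('zlib');

// Send JSON gzipped, but advertise the uncompressed size first so the
// client-side handler above has something to divide by.
function sendCompressedJson(res, payload) {
    const raw = Buffer.from(JSON.stringify(payload));
    const gzipped = zlib.gzipSync(raw);
    res.writeHead(200, {
        'Content-Type': 'application/json',
        'Content-Encoding': 'gzip',
        'Content-Length': gzipped.length,
        'x-decompressed-content-length': raw.length
    });
    res.end(gzipped);
}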
I wasn't able to solve the issue of using onProgress on the compressed content itself, but I came up with this semi-simple workaround. In a nutshell: send a HEAD request to the server at the same time as a GET request, and render the progress bar once there's enough information to do so.
function loader(onDone, onProgress, url, data)
{
    // onDone = event handler to run on successful download
    // onProgress = event handler to run during a download
    // url = url to load
    // data = extra parameters to be sent with the AJAX request
    var content_length = null;
    self.meta_xhr = $.ajax({
        url: url,
        data: data,
        dataType: 'json',
        type: 'HEAD',
        success: function(data, status, jqXHR)
        {
            content_length = jqXHR.getResponseHeader("X-Content-Length");
        }
    });
    self.xhr = $.ajax({
        url: url,
        data: data,
        success: onDone,
        dataType: 'json',
        progress: function(jqXHR, evt)
        {
            var pct = 0;
            if (evt.lengthComputable)
            {
                pct = 100 * evt.position / evt.total;
            }
            else if (content_length != null)
            {
                pct = 100 * evt.position / content_length;
            }
            onProgress(pct);
        }
    });
}
And then to use it:
loader(function(response)
{
    console.log("Content loaded! do stuff now.");
},
function(pct)
{
    console.log("The content is " + pct + "% loaded.");
},
'<url here>', {});
On the server side, set the X-Content-Length header on both the GET and the HEAD requests (which should represent the uncompressed content length), and abort sending the content on the HEAD request.
In PHP, setting the header looks like:
header("X-Content-Length: ".strlen($payload));
And then abort sending the content if it's a HEAD request:
if ($_SERVER['REQUEST_METHOD'] == "HEAD")
{
    exit;
}
In practice, the HEAD request takes nearly as long as the GET, because the server still has to parse the file to know how long it is, but that's something I can definitely improve on, and it's definitely an improvement from where it was.
Don't get stuck just because there isn't a native solution; a one-line hack can solve your problem without messing with the Apache configuration (which on some hosts is prohibited or very restricted):
PHP to the rescue:
var size = <?php echo filesize('file.json') ?>;
That's it. You probably already know the rest, but just as a reference, here it is:
<script>
var progressBar = document.getElementById("p"),
    client = new XMLHttpRequest(),
    size = <?php echo filesize('file.json') ?>;

progressBar.max = size;
client.open("GET", "file.json");

function loadHandler() {
    var loaded = client.responseText.length;
    progressBar.value = loaded;
}

client.onprogress = loadHandler;
client.onloadend = function(pe) {
    loadHandler();
    console.log("Success, loaded: " + client.responseText.length + " of " + size);
};
client.send();
</script>
Live example:
Another SO user thinks I am lying about the validity of this solution, so here it is live: http://nyudvik.com/zip/; it is gzipped and the real file weighs 8 MB.
Related links:
SO: Content-Length not sent when gzip compression enabled in Apache?
Apache Module mod_deflate doc
PHP filesize function doc
Try changing your server encoding to gzip.
Your request header shows three potential encodings (gzip, deflate, sdch), so the server can pick any one of those three. From the response header, we can see that your server is choosing to respond with deflate.
Gzip is an encoding format that wraps a deflate payload with additional headers and a footer (which includes the original uncompressed length) and uses a different checksum algorithm:
Gzip at Wikipedia
Deflate has some issues. Due to legacy problems with improper decoding algorithms, client implementations of deflate have to run through silly checks just to figure out which implementation they're dealing with and, unfortunately, they often still get it wrong:
Why use deflate instead of gzip for text files served by Apache?
In the case of your question, the browser probably sees a deflate file coming down the pipe and just throws up its arms and says, "When I don't even know exactly how I'll end up decoding this thing, how can you expect me to worry about getting the progress right, human?"
If you switch your server configuration so the response is gzipped (i.e., gzip shows up as the content-encoding), I'm hopeful your script works as you'd hoped/expected it would.
We have created a library that estimates the progress and always sets lengthComputable to true.
Chrome 64 still has this issue (see Bug)
It is a JavaScript shim that you can include in your page which fixes this issue, and you can use the standard new XMLHttpRequest() normally.
The JavaScript library can be found here:
https://github.com/AirConsole/xmlhttprequest-length-computable
This solution worked for me.
I increased deflate buffer size to cover biggest file size I may have, which is going to be compressed generally, to around 10mb, and it yielded from 9.3mb to 3.2mb compression, in apache configuration so content-length header to be returned instead of omitted as result of Transfer Encoding specification which is used when loading compressed file exceeds the buffer size, refer to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding for more info about chunked encoding header which is used in compression as well as more info about deflate buffer size in https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#deflatebuffersize.
1- Include the following in your Apache configuration; note that the buffer size value is in bytes.
<IfModule mod_deflate.c>
DeflateBufferSize 10000000
</IfModule>
2- Restart the Apache server.
3- Include the following in your .htaccess file to make sure the Content-Length header is exposed to JS HTTP requests.
<IfModule mod_headers.c>
Header set Access-Control-Expose-Headers "Content-Length"
</IfModule>
4- In the onDownloadProgress event, before calculating the total progress percentage, add the following to retrieve the total bytes value.
var total = e.total;
if (!e.lengthComputable) {
    total = e.target.getResponseHeader('content-length') * 2.2;
}
5- A note on lengthComputable: I learnt by comparing responses that it is not really the omission of the Content-Length header that sets this flag to false, but the presence of the Content-Encoding header; when that header appears in a file's response headers, lengthComputable is set to false, which seems to be normal behaviour per the JS HTTP request specification. The reason I multiply the compressed Content-Length by 2.2 is that this achieves more accurate download/upload progress tracking with my server's compression level and method: the loaded total reported by HTTP progress events reflects the decompressed data rather than the compressed data, so the logic needs a little tweaking to match your server's compression, which may differ from mine. First examine the general compression ratio across several files and check whether multiplying by, say, 2 gets closest to the decompressed (original) sizes, then choose the multiplier accordingly, making sure the result stays smaller than or equal to the original file size, never bigger, so that the loaded total is guaranteed to reach (and most likely slightly surpass) 100% in all cases. A hacky enhancement is to cap the progress calculation at 100, so there is no need to check whether it was exceeded, though you must still make sure 100% is actually reached.
In my case, this let me know when each file/resource had finished loading, i.e. by checking the total as follows, where >= accounts for slightly surpassing 100% after multiplying the compressed total (if the percentage calculation is capped at 100, use == instead). I also thought about resolving this issue at the root by storing a fixed decompressed total for each file (i.e. the original file size) and using it while preloading files, such as the resources in my case, to calculate the progress percentage. Here is a snippet from my onProgress event handling conditions.
// Sometimes 100 is reached more than once in the progress event.
if (preloadedResources < resourcesLength && progressPercentage < 100) {
    canIncreaseCounter = true;
}
if (progressPercentage >= 100 && canIncreaseCounter && preloadedResources < resourcesLength) {
    preloadedResources++;
    canIncreaseCounter = false;
}
Also note that relying on a known decompressed total is valid in all circumstances except when you have no prior access to the files being preloaded or downloaded, which I think is rare: most of the time we know which files we want to preload, so we can retrieve their sizes beforehand, perhaps by having a PHP script serve a list of sizes for the files of interest in a first HTTP request, and then preloading the files in a second request using each file's original size. One could even store the decompressed sizes of the preloaded resources ahead of time in an associative array in the code and use that while tracking loading progress.
For a live example of my progress-tracking implementation, see the resource preloading on my personal website at https://zakaria.website.
Lastly, I'm not aware of any downsides to increasing the deflate buffer size, except extra load on server memory; if anyone has input on this issue, it would be very much appreciated.
The only solution I can think of is manually compressing the data (rather than leaving it to the server and browser), as that allows you to use the normal progress bar and should still give you considerable gains over the uncompressed version. If, for example, the system only needs to work in the latest generation of web browsers, you can zip the data on the server side (whatever language you use, I am sure there is a zip function or library) and use zip.js on the client side. If more browser support is required, you can check this SO answer for a number of compression and decompression functions (just choose one supported by the server-side language you're using). Overall this should be reasonably simple to implement, although it will perform worse (though probably still well) than native compression/decompression. (After giving it a bit more thought, it could in theory even perform better than the native version, if you choose a compression algorithm that fits your type of data and the data is sufficiently big.)
Another option would be to use a WebSocket and load the data in parts, parsing/handling each part as it loads (you don't need WebSockets for that, but making dozens of HTTP requests one after another can be quite a hassle). Whether this is possible depends on the specific scenario, but to me it sounds like report data is the kind of data that can be loaded in parts and doesn't need to be fully downloaded first.
I do not clearly understand the issue; it should not happen, since the decompression should be done by the browser.
You may try to move away from jQuery or patch jQuery, because $.ajax does not seem to work well with binary data:
Ref: http://blog.vjeux.com/2011/javascript/jquery-binary-ajax.html
You could try to do your own implementation of the ajax request
See: https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest/Using_XMLHttpRequest#Handling_binary_data
You could try to decompress the JSON content with JavaScript (see resources in comments).
* UPDATE 2 *
The $.ajax function does not support the progress event handler, or at least it is not part of the jQuery documentation (see comment below).
Here is a way to get this handler to work, but I have never tried it myself:
http://www.dave-bond.com/blog/2010/01/JQuery-ajax-progress-HMTL5/
* UPDATE 3 *
The solution above uses a third-party library to extend jQuery's ajax functionality, so my suggestion does not apply.

Why serve 1x1 pixel GIF (web bugs) data at all?

Many analytics and tracking tools request a 1x1 GIF image (a web bug, invisible to the user) for cross-domain event storing/processing.
Why serve this GIF image at all? Wouldn't it be more efficient to simply return an error code such as 503 Service Temporarily Unavailable, or an empty file?
Update: To be clearer, I'm asking why serve GIF image data at all when all the information required has already been sent in the request headers. The GIF image itself does not return any useful information.
Doug's answer is pretty comprehensive; I thought I'd add an additional note (at the OP's request, expanding on my comment).
Doug's answer explains why 1x1 pixel beacons are used for the purpose they are used for. I thought I'd outline a potential alternative approach, which is to use HTTP status code 204, No Content, for the response, and not send an image body.
204 No Content
The server has fulfilled the request but does not need to return an entity-body, and might want to return updated metainformation. The response MAY include new or updated metainformation in the form of entity-headers, which if present SHOULD be associated with the requested variant.
Basically, the server receives the request and decides not to send a body (in this case, not to send an image). But it replies with a code to inform the agent that this was a conscious decision; basically, it's just a shorter way to respond affirmatively.
From Google's Page Speed documentation:
One popular way of recording page views in an asynchronous fashion is to include a JavaScript snippet at the bottom of the target page (or as an onload event handler), that notifies a logging server when a user loads the page. The most common way of doing this is to construct a request to the server for a "beacon", and encode all the data of interest as parameters in the URL for the beacon resource. To keep the HTTP response very small, a transparent 1x1-pixel image is a good candidate for a beacon request. A slightly more optimal beacon would use an HTTP 204 response ("no content") which is marginally smaller than a 1x1 GIF.
I've never tried it, but in theory it should serve the same purpose without requiring the gif itself to be transmitted, saving you 35 bytes, in the case of Google Analytics. (In the scheme of things, unless you're Google Analytics serving many trillions of hits per day, 35 bytes is really nothing.)
You can test it with this code:
var i = new Image();
i.src = "http://httpstat.us/204";
First, I disagree with the two previous answers--neither engages the question.
The one-pixel image solves an intrinsic problem for web-based analytics apps (like Google Analytics) when working within the HTTP protocol--how to transfer (web metrics) data from the client to the server.
The simplest of the methods described by the Protocol is the GET request. According to this Protocol method, clients initiate requests to servers for resources; servers process those requests and return appropriate responses.
For a web-based analytics app like GA, this uni-directional scheme is bad news, because it doesn't appear to allow a server to retrieve data from a client on demand--again, all servers can do is supply resources, not request them.
So what's the solution to the problem of getting data from the client back to the server? Within the HTTP context, there are Protocol methods other than GET (e.g., POST), but that's a limited option for many reasons (as evidenced by its infrequent and specialized use, such as submitting form data).
If you look at a GET request from a browser, you'll see it is comprised of a Request URL and Request Headers (e.g., Referer and User-Agent headers); the latter contain information about the client--e.g., browser type and version, browser language, operating system, etc.
Again, this is part of the Request that the client sends to the server. So the idea that motivates the one-pixel GIF is for the client to send the web-metrics data to the server, wrapped inside a Request Header.
But then how do you get the client to request a resource so it can be "tricked" into sending the metrics data? And how do you get the client to send the actual data the server wants?
Google Analytics is a good example: the ga.js file (the large file whose download to the client is triggered by a small script in the web page) includes a few lines of code that direct the client to request a particular resource from a particular server (the GA server) and to send certain data wrapped in the Request Header.
But since the purpose of this Request is not to actually get a resource but to send data to the server, this resource should be as small as possible and it should not be visible when rendered in the web page--hence, the 1 x 1 pixel transparent GIF. The size is the smallest possible, and the format (GIF) is the smallest among the image formats.
More precisely, all GA data--every single item--is assembled and packed into the Request URL's query string (everything after the '?'). But for that data to travel from the client (where it is created) to the GA server (where it is logged and aggregated), there must be an HTTP Request. So ga.js (the Google Analytics script downloaded by the client, unless cached, as a result of a function called when the page loads) directs the client to assemble all of the analytics data--e.g., cookies, location bar, request headers, etc.--concatenate it into a single string, and append it as a query string to a URL (*http://www.google-analytics.com/__utm.gif*?), which becomes the Request URL.
It's easy to prove this using any web browser that allows you to view the HTTP Request for the web page displayed in your browser (e.g., Safari's Web Inspector, Firefox/Chrome Firebug, etc.).
For instance, I typed a valid URL for a corporate home page into my browser's location bar, which returned that home page and displayed it in my browser (I could have chosen any web site/page that uses one of the major analytics apps: GA, Omniture, Coremetrics, etc.).
The browser I used was Safari, so I clicked Develop in the menu bar, then Show Web Inspector. On the top row of the Web Inspector, click Resources, find and click the utm.gif resource in the list shown in the left-hand column, then click the Headers tab. That will show you something like this:
Request URL:http://www.google-analytics.com/__utm.gif?
utmwv=1&utmn=1520570865&
utmcs=UTF-8&
utmsr=1280x800&
utmsc=24-bit&
utmul=enus&
utmje=1&
utmfl=10.3%20r181&
Request Method:GET
Status Code:200 OK
Request Headers
User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/533.21.1
(KHTML, like Gecko) Version/5.0.5 Safari/533.21.1
Response Headers
Cache-Control:private, no-cache, no-cache=Set-Cookie, proxy-revalidate
Content-Length:35
Content-Type:image/gif
Date:Wed, 06 Jul 2011 21:31:28 GMT
The key points to notice are:
The Request was in fact a request for the utm.gif, as evidenced by the first line above: *Request URL: http://www.google-analytics.com/__utm.gif*.
The Google Analytics parameters are clearly visible in the query string appended to the Request URL: e.g., utmsr is GA's variable name for the client screen resolution (for me it shows a value of 1280x800); utmfl is the variable name for the Flash version (a value of 10.3), etc.
The Response Header called Content-Type (sent by the server back to the client) also confirms that the resource requested and returned was a 1x1 pixel GIF: Content-Type: image/gif
This general scheme for transferring data between a client and a server has been around forever; there could very well be a better way of doing this, but it's the only way i know of (that satisfies the constraints imposed by a hosted analytics service).
Some browsers may display an error icon if the resource cannot load. It also makes debugging/monitoring the service a little more complicated; you have to make sure that your monitoring tools treat the error as a good result.
OTOH, you don't gain anything. The error message returned by the server/framework is typically bigger than the 1x1 image, which means you increase your network traffic for basically nothing.
Because such a GIF has a known presentation in a browser - it's a single pixel, period. Anything else presents a risk of visually interfering with the actual content of the page.
HTTP errors could appear as oversized frames of error text or even as a pop-up window. Some browsers may also complain if they receive empty replies.
In addition, in-page images are one of the very few data types allowed by default in all browsers. Anything else may require explicit user action to be downloaded.
This is to answer the OP's question - "why to serve GIF image data..."
Some users will put a simple img tag to call your event logging service -
<img src="http://www.example.com/logger?event_id=1234">
In this case, if you don't serve an image, the browser will show a placeholder icon that looks ugly and gives the impression that your service is broken!
What I do is look for the Accept header field. When your script is called via an img tag like this, you will see something like the following in the headers of the request:
Accept: image/gif, image/*
Accept-Encoding:gzip,deflate
...
When there is an "image/*" string in the Accept header field, I supply the image; otherwise I just reply with 204.
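A hedged Node sketch of that switch (the event logging itself is elided; the base64 string is a widely used transparent 1x1 GIF):
const http = require('http');

// A widely used transparent 1x1 GIF, base64-encoded.
const PIXEL = Buffer.from(
    'R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7',
    'base64'
);

http.createServer((req, res) => {
    // ... record the event from req.url and req.headers here ...
    if ((req.headers.accept || '').includes('image/')) {
        res.writeHead(200, { 'Content-Type': 'image/gif', 'Content-Length': PIXEL.length });
        res.end(PIXEL);
    } else {
        res.writeHead(204);   // No Content
        res.end();
    }
}).listen(8080);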
Well, the major reason is to attach the cookie to it, so if users go from one site to another we still have the same element to attach the cookie to.
@Maciej Perliński is basically correct, but I feel a detailed answer will be beneficial.
why 1x1 GIF and not a 204 No-Content status code?
204 No Content enables the server to omit all response headers (Content-Type, Content-Length, Content-Encoding, Cache-Control, etc.) and return an empty response body of 0 bytes (saving a lot of unneeded bandwidth).
Browsers know to respect 204 No Content responses and not to expect/wait for response headers and a response body.
If the server needs to set any response header (e.g. Cache-Control or a cookie), it cannot use 204 No Content, because browsers will ignore any response header by design (according to the HTTP protocol spec).
why 1x1 GIF and not a Content-Length: 0 header with 200 OK status code?
Probably a mix of several issues, just to name a few:
legacy browsers compatibility
MIME type checks on browsers, 0 bytes is not a valid image.
200 OK with 0 bytes might not be fully supported by intermediate proxy servers and VPNs
You don't have to serve an image if you are using the Beacon API (https://w3c.github.io/beacon/) implementation method.
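For reference, a minimal Beacon API call (the logger URL is hypothetical):
// Queues the data for delivery even while the page unloads; no image involved.
const queued = navigator.sendBeacon(
    'https://www.example.com/logger',
    JSON.stringify({ event_id: 1234 })
);
if (!queued) {
    // The user agent declined to queue the beacon (e.g., payload quota exceeded).
}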
An error code would work if you have access to the log files of your server. The purpose of serving the image is to obtain more data about the user than you normally would with a log file.
