How can I use deflated/gzipped content with an XHR onProgress function? - javascript

I've seen a bunch of similar questions to this get asked before, but I haven't found one that describes my current problem exactly, so here goes:
I have a page which loads a large (between 0.5 and 10 MB) JSON document via AJAX so that the client-side code can process it. Once the file is loaded, everything behaves as I expect. However, it takes a long time to download, so I tried leveraging the XHR Progress API to render a progress bar that indicates to the user that the document is loading. This worked well.
Then, in an effort to speed things up, I tried compressing the output on the server side via gzip and deflate. This worked too, with tremendous gains; however, my progress bar stopped working.
I've looked into the issue for a while and found that if a proper Content-Length header isn't sent with the requested AJAX resource, the onProgress event handler cannot function as intended because it doesn't know how far along in the download it is. When this happens, a property called lengthComputable is set to false on the event object.
This made sense, so I tried setting the header explicitly with both the uncompressed and the compressed length of the output. I can verify that the header is being sent, and I can verify that my browser knows how to decompress the content. But the onProgress handler still reports lengthComputable = false.
So my question is: is there a way to use gzipped/deflated content with the AJAX Progress API? And if so, what am I doing wrong right now?
This is how the resource appears in the Chrome Network panel, showing that compression is working:
These are the relevant request headers, showing that the request is AJAX and that Accept-Encoding is set properly:
GET /dashboard/reports/ajax/load HTTP/1.1
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.99 Safari/537.22
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
These are the relevant response headers, showing that the Content-Length and Content-Type are being set correctly:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Encoding: deflate
Content-Type: application/json
Date: Tue, 26 Feb 2013 18:59:07 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="CAO PSA OUR"
Pragma: no-cache
Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8g PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 223879
Connection: keep-alive
For what it's worth, I've tried this on both a standard (http) and secure (https) connection, with no differences: the content loads fine in the browser, but isn't processed by the Progress API.
Per Adam's suggestion, I tried switching the server side to gzip encoding with no success or change. Here are the relevant response headers:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Encoding: gzip
Content-Type: application/json
Date: Mon, 04 Mar 2013 22:33:19 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="CAO PSA OUR"
Pragma: no-cache
Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8g PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 28250
Connection: keep-alive
Just to repeat: the content is being downloaded and decoded properly; it's just the Progress API that I'm having trouble with.
Per Bertrand's request, here's the request:
$.ajax({
    url: '<url snipped>',
    data: {},
    success: onDone,
    dataType: 'json',
    cache: true,
    progress: onProgress || function(){}
});
And here's the onProgress event handler I'm using (it's not too crazy):
function(jqXHR, evt)
{
    // yes, I know this generates Infinity sometimes
    var pct = 100 * evt.position / evt.total;
    // just a method that updates some styles and javascript
    updateProgress(pct);
}

A slightly more elegant variation on your solution would be to set a header like 'x-decompressed-content-length' (or whatever) in your HTTP response, containing the full decompressed length of the content in bytes, and read it off the xhr object in your onProgress handler.
Your code might look something like:
request.onprogress = function (e) {
    var contentLength;
    if (e.lengthComputable) {
        contentLength = e.total;
    } else {
        // fall back to the custom header when the browser can't compute the total
        contentLength = parseInt(e.target.getResponseHeader('x-decompressed-content-length'), 10);
    }
    progressIndicator.update(e.loaded / contentLength);
};
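For completeness, here is a hedged sketch of the server side of that idea. The question's stack is PHP/Apache, but Node/Express with the compression middleware is assumed here purely for illustration, and buildReportJson is a made-up stand-in for the real report generator; the point is simply to record the uncompressed byte length before compression runs:
const express = require('express');
const compression = require('compression');

const app = express();
app.use(compression()); // gzip/deflate the response; the middleware manages Content-Length itself

// Stand-in for however the real report payload is produced.
const buildReportJson = () => ({ rows: [], generatedAt: Date.now() });

app.get('/dashboard/reports/ajax/load', (req, res) => {
    const payload = JSON.stringify(buildReportJson());
    // Record the uncompressed size before compression so the client can compute progress.
    res.set('x-decompressed-content-length', String(Buffer.byteLength(payload)));
    res.type('application/json').send(payload);
});

app.listen(3000);
Because compression typically drops Content-Length and switches to chunked transfer, a custom header like this is the only place left to carry the uncompressed size.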

I wasn't able to solve the issue of using onProgress on the compressed content itself, but I came up with this semi-simple workaround. In a nutshell: send a HEAD request to the server at the same time as a GET request, and render the progress bar once there's enough information to do so.
function loader(onDone, onProgress, url, data)
{
    // onDone = event handler to run on successful download
    // onProgress = event handler to run during a download
    // url = url to load
    // data = extra parameters to be sent with the AJAX request
    var content_length = null;
    var meta_xhr = $.ajax({
        url: url,
        data: data,
        dataType: 'json',
        type: 'HEAD',
        success: function(data, status, jqXHR)
        {
            // the server sets X-Content-Length to the uncompressed size
            content_length = jqXHR.getResponseHeader("X-Content-Length");
        }
    });
    var xhr = $.ajax({
        url: url,
        data: data,
        success: onDone,
        dataType: 'json',
        progress: function(jqXHR, evt)
        {
            var pct = 0;
            if (evt.lengthComputable)
            {
                pct = 100 * evt.position / evt.total;
            }
            else if (content_length != null)
            {
                pct = 100 * evt.position / content_length;
            }
            onProgress(pct);
        }
    });
}
And then to use it:
loader(function(response)
{
    console.log("Content loaded! do stuff now.");
},
function(pct)
{
    console.log("The content is " + pct + "% loaded.");
},
'<url here>', {});
On the server side, set the X-Content-Length header on both the GET and the HEAD requests (which should represent the uncompressed content length), and abort sending the content on the HEAD request.
In PHP, setting the header looks like:
header("X-Content-Length: ".strlen($payload));
And then abort sending the content if it's a HEAD request:
if ($_SERVER['REQUEST_METHOD'] == "HEAD")
{
    exit;
}
Here's what it looks like in action:
The reason the HEAD request takes so long in the screenshot below is that the server still has to parse the file to know how long it is. That's something I can definitely improve on, but it's already a big improvement from where it was.

Don't get stuck just because there isn't a native solution; a one-line hack can solve your problem without messing with the Apache configuration (which on some hosts is prohibited or heavily restricted):
PHP to the rescue:
var size = <?php echo filesize('file.json') ?>;
That's it, you probably already know the rest, but just as a reference here it is:
<script>
var progressBar = document.getElementById("p"),
    client = new XMLHttpRequest(),
    size = <?php echo filesize('file.json') ?>;
progressBar.max = size;
client.open("GET", "file.json");
function loadHandler () {
    var loaded = client.responseText.length;
    progressBar.value = loaded;
}
client.onprogress = loadHandler;
client.onloadend = function(pe) {
    loadHandler();
    console.log("Success, loaded: " + client.responseText.length + " of " + size);
};
client.send();
</script>
Live example:
Another SO user thinks I am lying about the validity of this solution, so here it is live: http://nyudvik.com/zip/. It is gzipped and the real file weighs 8 MB.
Related links:
SO: Content-Length not sent when gzip compression enabled in Apache?
Apache Module mod_deflate doc
PHP filesize function doc

Try changing your server encoding to gzip.
Your request header shows three potential encodings (gzip,deflate,sdch), so the server can pick any one of those three. By the response header, we can see that your server is choosing to respond with deflate.
Gzip is an encoding format that wraps a deflate payload in additional headers and a footer (which includes the original uncompressed length) and uses a different checksum algorithm:
Gzip at Wikipedia
Deflate has some problems. Due to legacy issues with improper decoding algorithms, client implementations of deflate have to run through silly checks just to figure out which implementation they're dealing with, and unfortunately, they often still get it wrong:
Why use deflate instead of gzip for text files served by Apache?
In the case of your question, the browser probably sees a deflate file coming down the pipe and just throws up its arms and says, "When I don't even know exactly how I'll end up decoding this thing, how can you expect me to worry about getting the progress right, human?"
If you switch your server configuration so the response is gzipped (i.e., gzip shows up as the content-encoding), I'm hopeful your script works as you'd hoped/expected it would.

We have created a library that estimates the progress and always sets lengthComputable to true.
Chrome 64 still has this issue (see the bug report).
It is a JavaScript shim that you can include in your page; it fixes this issue, and you can use the standard new XMLHttpRequest() normally.
The javascript library can be found here:
https://github.com/AirConsole/xmlhttprequest-length-computable
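As a rough usage sketch, assuming the shim script from that repository has been included on the page (check its README for the exact file name), the request code itself stays plain XHR; updateProgress is the question's own handler:
// With the shim loaded, lengthComputable should be true even for
// gzip/deflate responses, because the library estimates the total.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/dashboard/reports/ajax/load');
xhr.onprogress = function (e) {
    if (e.lengthComputable) {
        updateProgress(100 * e.loaded / e.total);
    }
};
xhr.onload = function () {
    var data = JSON.parse(xhr.responseText);
    // ... process the report data
};
xhr.send();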

This solution worked for me.
I increased Apache's deflate buffer size to cover the largest file I expect to compress, around 10 MB (in my case a 9.3 MB file compressed down to 3.2 MB). With the buffer large enough, Apache returns a Content-Length header instead of omitting it and falling back to Transfer-Encoding: chunked, which it uses when the compressed response exceeds the buffer size. See https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding for more about chunked encoding, and https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#deflatebuffersize for the deflate buffer size directive.
1- Include the following in your Apache configuration; note that the buffer size value is in bytes.
<IfModule mod_deflate.c>
DeflateBufferSize 10000000
</IfModule>
2- Restart the Apache server.
3- Include the following in your .htaccess file to make sure the Content-Length header is exposed to JavaScript HTTP requests.
<IfModule mod_headers.c>
Header set Access-Control-Expose-Headers "Content-Length"
</IfModule>
4- In the onDownloadProgress event, before calculating the total progress percentage, add the following to retrieve the total byte count:
var total = e.total;
if (!e.lengthComputable) {
    total = e.target.getResponseHeader('content-length') * 2.2;
}
5- Note that, by comparing responses, I learned that lengthComputable is not driven by the omission of the Content-Length header alone but by the presence of the Content-Encoding header: when Content-Encoding appears in the response headers, lengthComputable is set to false, which seems to be normal behaviour per the JS HTTP request specification. Also, the reason I multiply the compressed Content-Length by 2.2 is that it gives more accurate download/upload progress tracking for my server's compression level and method: the loaded total reported by the HTTP progress event reflects the decompressed data, not the compressed data, so the factor needs tweaking to match your server's compression, which may differ from mine. As a first step, examine the typical compression ratio across several files and see whether multiplying by, say, 2 gets you closest to the decompressed (original) sizes; pick the factor accordingly, but keep the result at or below the original file size so that the loaded data is guaranteed to reach, and most likely slightly surpass, 100% in all cases. A hacky refinement is to cap the progress calculation at 100 so you don't need to check whether it has been exceeded, as long as your implementation still makes sure 100% is actually reached.
In my case, this let me know when each file/resource finished loading: check the total as in the snippet below, where >= accounts for slightly surpassing 100% after multiplying the compressed total (if your percentage calculation is capped at 100, use == instead) to detect when each file has completed preloading. I also considered solving this at the root by storing a fixed decompressed total for each file, i.e. the original file size, and using it while preloading (for example, for the resources in my case) to calculate the progress percentage. Here is a snippet of the conditions from my onProgress event handling.
// Sometimes 100 is reached in the progress event more than once.
if (preloadedResources < resourcesLength && progressPercentage < 100) {
    canIncreaseCounter = true;
}
if (progressPercentage >= 100 && canIncreaseCounter && preloadedResources < resourcesLength) {
    preloadedResources++;
    canIncreaseCounter = false;
}
Also, note that using a known decompressed total as the fix is valid in all circumstances except when you have no prior access to the files you are going to preload or download, which I think is rare: most of the time we know which files we want to preload, so we can retrieve their sizes beforehand, perhaps by having a PHP script serve a list of sizes for the files of interest in a first HTTP request and then using each file's original size in the second (preloading) request, or even by storing the fixed decompressed sizes of the preloaded resources in an associative array in the code ahead of time and using that to track loading progress.
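To make that last idea concrete, here is a minimal sketch of the "known decompressed sizes" approach. The /sizes.json manifest, its contents, and the resource name are assumptions for illustration only:
// Fetch a manifest of original (decompressed) sizes first, then use the
// known size of each resource to compute progress while it downloads.
fetch('/sizes.json') // e.g. { "data/report.json": 9500000 }
    .then(function (res) { return res.json(); })
    .then(function (sizes) {
        var total = sizes['data/report.json'];
        var xhr = new XMLHttpRequest();
        xhr.open('GET', 'data/report.json');
        xhr.onprogress = function (e) {
            // e.loaded reflects decompressed bytes, so compare against the original size
            var pct = Math.min(100, 100 * e.loaded / total);
            console.log(pct.toFixed(1) + '% loaded');
        };
        xhr.send();
    });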
For my tracking loading progress implementation live example refer to resources preloading in my personal website at https://zakaria.website.
Lastly, I'm not aware of any downside to increasing the deflate buffer size other than extra load on server memory; if anyone has input on this, it would be very much appreciated.

The only solution I can think of is manually compressing the data (rather than leaving it to the server and browser), as that allows you to use the normal progress bar and should still give you considerable gains over the uncompressed version. If, for example, the system is only required to work in the latest generation of web browsers, you can zip it on the server side (whatever language you use, I am sure there is a zip function or library) and on the client side you can use zip.js. If more browser support is required, you can check this SO answer for a number of compression and decompression functions (just choose one that is supported in the server-side language you're using). Overall this should be reasonably simple to implement, although it will perform worse (though probably still well) than native compression/decompression. (Btw, after giving it a bit more thought, it could in theory perform even better than the native version if you choose a compression algorithm that fits the type of data you're using and the data is sufficiently big.)
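Here is a hedged sketch of that manual-compression idea, using pako as the client-side inflate library instead of the zip.js mentioned above, and assuming the server serves a pre-deflated payload at a made-up /report.json.deflate URL without a Content-Encoding header, so Content-Length survives and lengthComputable stays true; onDone and updateProgress are the question's own handlers:
// Download the pre-compressed payload as binary; because the server does not
// set Content-Encoding, the browser reports real byte progress with a total.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/report.json.deflate'); // hypothetical pre-compressed resource
xhr.responseType = 'arraybuffer';
xhr.onprogress = function (e) {
    if (e.lengthComputable) {
        updateProgress(100 * e.loaded / e.total);
    }
};
xhr.onload = function () {
    // Inflate in the browser with pako, then parse the JSON as usual.
    var json = pako.inflate(new Uint8Array(xhr.response), { to: 'string' });
    onDone(JSON.parse(json));
};
xhr.send();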
Another option would be to use a websocket and load the data in parts, parsing/handling every part as it is loaded (you don't need websockets for that, but doing tens of HTTP requests one after another can be quite a hassle). Whether this is possible depends on the specific scenario, but to me it sounds like report data is the kind of data that can be loaded in parts and doesn't need to be fully downloaded first.

I do not clearly understand the issue; it should not happen, since the decompression should be done by the browser.
You may try to move away from jQuery or hack jQuery, because $.ajax does not seem to work well with binary data:
Ref: http://blog.vjeux.com/2011/javascript/jquery-binary-ajax.html
You could try to do your own implementation of the ajax request
See: https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest/Using_XMLHttpRequest#Handling_binary_data
You could try to decompress the JSON content with JavaScript (see resources in comments).
* UPDATE 2 *
The $.ajax function does not support the progress event handler, or at least it is not part of the jQuery documentation (see comment below).
Here is a way to get this handler to work, but I have never tried it myself:
http://www.dave-bond.com/blog/2010/01/JQuery-ajax-progress-HMTL5/
* UPDATE 3 *
The solution uses a third-party library to extend (?) jQuery's ajax functionality, so my suggestion does not apply.

Related

Why are my server sent events arriving as a batch?

I have a Java 8 / Spring4-based web application that is reporting the progress of a long-running process using Server Sent Events (SSEs) to a browser-based client running some Javascript and updating a progress bar. In my development environment and on our development server, the SSEs arrive in near-real-time at the client. I can see them arriving (along with their timestamps) using Chrome dev tools and the progress bar updates smoothly.
However, when I deploy to our production environment, I observe different behaviour. The events do not arrive at the browser until the long-running process completes. Then they all arrive in a burst (the events all have the timestamps within a few hundred milliseconds of each other according to dev tools). The progress bar is stuck at 0% for the duration and then skips to 100% really quickly. Meanwhile, my server logs tell me the events were generated and sent at regular intervals.
Here's the relevant server side code:
public class LongRunningProcess extends Thread {
    private SseEmitter emitter;
    public LongRunningProcess(SseEmitter emitter) {
        this.emitter = emitter;
    }
    public void run() {
        ...
        // Sample event, representing 10% progress
        SseEventBuilder event = SseEmitter.event();
        event.name("progress");
        event.data("{ \"progress\": 10 }"); // Hand-coded JSON
        emitter.send(event);
        ...
    }
}
@RestController
public class UploadController {
    @GetMapping("/start")
    public SseEmitter start() {
        SseEmitter emitter = new SseEmitter();
        LongRunningProcess process = new LongRunningProcess(emitter);
        process.start();
        return emitter;
    }
}
Here's the relevant client-side Javascript:
var src = new EventSource("https://www.example.com/app/start");
src.addEventListener('progress', function(event) {
    // Process event.data and update progress bar accordingly
});
I believe my code is fairly typical and it works just fine in DEV. However if anyone can see an issue let me know.
The issue could be related to the configuration of our production servers. DEV and PROD are all running the same version of Tomcat. However, some of them are accessed via a load balancer (F5 in our case). Almost all of them are behind a CDN (Akamai in our case). Could there be some part of this setup that causes the SSEs to be buffered (or queued or cached) that might produce what I'm seeing?
Following up on the infrastructure configuration idea, I've observed the following in the response headers. In the development environment, my browser receives:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: Keep-Alive
Content-Type: text/event-stream;charset=UTF-8
Keep-Alive: timeout=15, max=99
Pragma: no-cache
Server: Apache
Transfer-Encoding: chunked
Via: 1.1 example.com
This is what I'd expect for an event stream. A chunked response of an unknown content length. In the production environment, my browser receives something different:
Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Connection: keep-alive
Content-Type: text/event-stream;charset=UTF-8
Content-Encoding: gzip
Content-Length: 318
Pragma: no-cache
Vary: Accept-Encoding
Here the returned content has a known length and is compressed. I don't think this should happen for an event stream. It would appear that something is converting my event stream into a single file. Any thoughts on how I can figure out what's doing this?
It took a significant amount of investigation to determine that the cause of the issue was the elements in our network path. So the code above is correct and safe to use. If you find SSE buffering you will most likely want to check the configuration of key networking elements.
In my case, it was Akamai as our CDN and the use of an F5 device as a load balancer. Indeed it was the fact that both can introduce buffering that made it quite difficult to diagnose the issue.
Akamai Edge servers buffer event streams by default. This can be disabled through the use of Akamai's advanced metadata and controlled via custom behaviours. At this time, this cannot be controlled directly through Akamai's portal, so you will need to get their engineers to do some of the work for you.
F5 devices appear to default to buffering response data as well. Fortunately, this is quite simple to change and can be done yourself via the device's configuration portal. For the virtual device in question, go to Profile : Services : HTTP and change the configuration of Response Chunking to Preserve (in our case it had defaulted to Selective).
Once I made these changes, I began to receive SSEs in near real-time from our PROD servers (and not just our DEV servers).
Have you tried alternative browsers? I'm trying to debug a similar problem in which SSE works on an iPhone client but not on MacOS/Safari or Firefox.
There may be a work-around for your issue - if the server sends "Connection: close" instead of keep-alive, or even closes the connection itself, the client should re-connect in a few seconds and the server will send the current progress bar event.
I'm guessing that closing the connection will flush whatever buffer is causing the problem.
This is not a solution to this question exactly, but related to SSE, Spring and use of compression.
In my case I had the ziplet CompressionFilter configured in my Spring application and it was closing the HTTP response and causing SSE to fail. This seems to be related to an open issue in the ziplet project. I disabled the filter and enabled Tomcat compression in application.properties (server.compression.enabled=true) and it solved the SSE issue.
Note that I did not change the default compressionMinSize setting, which may have something to do with SSE traffic not getting compressed and passing through.
The webpack dev server also buffers server sent events when using the proxy setting.

How can I compress the response of an ajax request?

I return whole HTML as the response of an ajax request (not just an array as JSON). As you'd expect, the response is a bit larger than JSON would be, because it contains extra things like HTML tags and attributes. So, to get:
Increased scalability (reduced server load).
Less network bandwidth (lower cost).
Better user experience (faster).
I want to compress the response, something like the gzip format. Based on some tests, this is the size of an ajax response:
And this is the size of the same ajax response when zipped:
See? There is a huge difference between their sizes.
All I want to know is: is it possible to compress the response of an ajax request in transit and convert it back to regular text on the client side? Do I need to make changes to the web server (like nginx) configuration for that, or does it happen automatically?
You should use the lz-string library to compress the data, send it with the ajax call, and decompress it again when the response comes back.
GitHub link: https://github.com/pieroxy/lz-string/
Documentation & usage: https://coderwall.com/p/mekopw/jsonc-compress-your-json-data-up-to-80
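A tiny sketch of the lz-string round trip, just to show the shape of the API; the markup string here is made up, and in practice the compressed form is what would travel over the wire:
// Compress a chunk of HTML and get it back unchanged.
var original = '<div class="report"><table><tr><td>42</td></tr></table></div>';
var compressed = LZString.compressToBase64(original);
var restored = LZString.decompressFromBase64(compressed);

console.log('compressed length:', compressed.length, 'original length:', original.length);
console.log(restored === original); // true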
You may want to check out some of the other StackOverflow posts, plus Wikipedia links & articles on the web regarding Deflate & Gzip. Here are some links for you to peruse. They have some interesting charts & information in them:
StackOverflow: Why Use Deflate Instead of Gzip for Text Files Served by Apache
StackOverflow: Is There Any Performance Hit Involved in Choosing Gzip Over Deflate for HTTP Compression
Wikipedia: Deflate
Wikipedia: Gzip
Article: How to Optimize Your Site with GZip Compression
RFC: Deflate

What do I put in my HTML to ensure users get latest version of my page, not old version?

I have a mostly static HTML website served from a CDN (plus a bit of AJAX to the server), and I want users' browsers to cache everything until I update any files, at which point I want their browsers to fetch the new versions.
How do I achieve this for all types of static files on my site (HTML, JS, CSS, images, etc.)? (Settings in HTML or elsewhere.) Obviously I can tell the CDN to expire its cache, so it's the client side I'm thinking of.
Thanks.
One way to achieve this is to make use of the HTTP Last-Modified or ETag headers. In the HTTP headers of the served file, the server will send either the date when the page was last modified (in the Last-Modified header), or a random ID representing the current state of the page (ETag), or both:
HTTP/1.1 200 OK
Content-Type: text/html
Last-Modified: Fri, 18 Dec 2015 08:24:52 GMT
ETag: "208f11-52727df9c7751"
Cache-Control: must-revalidate
If the header Cache-Control is set to must-revalidate, it causes the browser to cache the page along with the Last-Modified and ETag headers it received with it. On the next request, it will send them as If-Modified-Since and If-None-Match:
GET / HTTP/1.1
Host: example.com
If-None-Match: "208f11-52727df9c7751"
If-Modified-Since: Fri, 18 Dec 2015 08:24:52 GMT
If the current ETag of the page matches the one that comes from the browser, or if the page hasn't been modified since the date that was sent by the browser, instead of sending the page the server will respond with 304 Not Modified and an empty body:
HTTP/1.1 304 Not Modified
Note that only one of the two mechanisms (ETag or Last-Modified) is required; they both work on their own.
The disadvantage of this is that a request has to be sent anyway, so the performance benefit will mostly apply to pages that contain a lot of data; particularly on internet connections with high latency, the page will still take a while to load. (It will for sure reduce your traffic, though.)
Apache automatically generates an ETag (using the file’s inode number, modification time, and size) and a Last-Modified header (based on the modification time of the file) for static files. I don’t know about other web-servers, but I assume it will be similar. For dynamic pages, you can set the headers yourself (for example by sending the MD5 sum of the content as ETag).
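For dynamic pages, a hedged sketch of doing that by hand; Node/Express is assumed here purely for illustration (the rest of this answer uses Apache/PHP), and renderPage is a stand-in for your own content generation:
const crypto = require('crypto');
const express = require('express');

const app = express();
const renderPage = () => '<html><body>dynamic content</body></html>'; // stand-in

app.get('/page', (req, res) => {
    const body = renderPage();
    // ETag derived from the content, as suggested above (MD5 of the body)
    const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';
    res.set('ETag', etag);
    res.set('Cache-Control', 'must-revalidate');
    if (req.headers['if-none-match'] === etag) {
        // The browser's cached copy is still current: answer with 304 and no body
        return res.status(304).end();
    }
    res.type('html').send(body);
});

app.listen(3000);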
By default, Apache doesn’t send a Cache-Control header (and the default is Cache-Control: private). This example .htaccess file makes Apache send the header for all .html files:
<FilesMatch "\.html$">
Header set Cache-Control "must-revalidate"
</FilesMatch>
The other mechanism is to make the browser cache the page by sending Cache-Control: public, but to dynamically vary the URL, for example by appending the modification time of the file as a query string (?12345). This is only really possible if your page/file is only linked from within your web application, in which case you can generate the links to it dynamically. For example, in PHP you could do something like this:
<script src="script.js?<?php echo filemtime("script.js"); ?>"></script>
To achieve what you want on the client side, you have to change the url of your static files when you load them in HTML, i.e. change the file name, add a random query string like unicorn.css?p=1234, etc. An easy way to automate this is to use a task runner such as Gulp and have a look at this package gulp-rev.
In short, if you integrate gulp-rev in your Gulp task, it will automatically append a content hash to all the static files piped into the task stream and generate a JSON manifest file which maps the old files to newly renamed files. So a file like unicorn.css will become unicorn-d41d8cd98f.css. You can then write another Gulp task to crawl through your HTML/JS/CSS files and replace all the urls or use this package gulp-rev-replace.
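A minimal Gulp task along those lines might look like the following sketch (Gulp 4 syntax assumed; the dist paths are placeholders, and the gulp-rev/gulp-rev-replace docs cover the full options):
const gulp = require('gulp');
const rev = require('gulp-rev');
const revReplace = require('gulp-rev-replace');

// Fingerprint the static assets: unicorn.css -> unicorn-d41d8cd98f.css,
// and write a manifest mapping old names to new ones.
gulp.task('revision', () =>
    gulp.src(['dist/**/*.css', 'dist/**/*.js'])
        .pipe(rev())
        .pipe(gulp.dest('dist'))
        .pipe(rev.manifest())
        .pipe(gulp.dest('dist'))
);

// Rewrite references in the HTML to point at the fingerprinted files.
gulp.task('rev-replace', gulp.series('revision', () =>
    gulp.src('dist/**/*.html')
        .pipe(revReplace({ manifest: gulp.src('dist/rev-manifest.json') }))
        .pipe(gulp.dest('dist'))
));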
There should be plenty of online tutorials that show you how to accomplish this. If you use Yeoman, you can check out this static webapp generator I wrote here, which contains a Gulp routine for this.
This is what the HTML5 Application Cache does for you. Put all of your static content into the Cache Manifest and it will be cached in the browser until the manifest file is changed. As an added bonus, the static content will be available even if the browser is offline.
The only change to your HTML is in the <head> tag:
<!DOCTYPE HTML>
<html manifest="cache.appcache">
...
</html>

How to prevent POST spamming in node.js Express using busboy?

I am following the example here: https://www.npmjs.com/package/busboy
I am worried that someone may deliberately try to overload the server. I wonder if there is a convenient way, before the data is uploaded, to prevent spamming by measuring the size of the entire POST body, not just the file(s) uploaded. I tried the following, which apparently didn't work:
if (JSON.stringify(req.body).length > 5 * 1024 * 1024) res.redirect('/');
You cannot rely on Content-Length being set. Even if it were set, if the person was acting malicious, they either may use an incorrect Content-Length or they may use Transfer-Encoding: chunked, in which case there is no way to tell how large the request body is.
Additionally, calling stringify() every time on req.body could easily cause a DoS-style attack as well.
However, busboy does have several options for limiting various aspects of application/x-www-form-urlencoded and multipart/form-data requests (e.g. max file size, max number of files, etc.).
You might also limit body parsing to the routes where you actually expect request bodies, instead of trying to parse request bodies for all requests.
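As a hedged sketch of those limits in an Express route (busboy v1.x API assumed; the route and the numbers are placeholders to adjust):
const express = require('express');
const busboy = require('busboy'); // v1.x exports a factory function

const app = express();

app.post('/upload', (req, res) => {
    const bb = busboy({
        headers: req.headers,
        limits: {
            fileSize: 5 * 1024 * 1024, // cap each file at 5 MB
            files: 1,                  // allow a single file per request
            fields: 20,                // at most 20 non-file fields
            fieldSize: 10 * 1024       // each text field capped at 10 KB
        }
    });

    bb.on('file', (name, file, info) => {
        file.on('limit', () => {
            // The file hit the fileSize cap; busboy stops reading past the limit.
            if (!res.headersSent) res.status(413).send('File too large');
        });
        file.resume(); // discard in this sketch; pipe to storage in real code
    });

    bb.on('close', () => {
        if (!res.headersSent) res.send('OK');
    });

    req.pipe(bb);
});

app.listen(3000);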

XMLHttpRequest and Chrome developer tools don't say the same thing

I'm downloading a ~50MB file in 5 MB chunks using XMLHttpRequest and the Range header. Things work great, except for detecting when I've downloaded the last chunk.
Here's a screenshot of the request and response for the first chunk. Notice the Content-Length is 1024 * 1024 * 5 (5 MB). Also notice that the server responds correctly with the first 5 MB, and in the Content-Range header, properly specifies the size of the entire file (after the /):
When I copy the response body into a text editor (Sublime), I only get 5,242,736 characters instead of the expected 5,242,880 as indicated by Content-Length:
Why are 144 characters missing? This is true of every chunk that gets downloaded, though the exact difference varies a little bit.
However, what's especially strange is the last chunk. The server responds with the last ~2.9 MB of the file (instead of a whole 5 MB) and apparently properly indicates this in the response:
Notice that I am requesting the next 5 MB (even though it goes beyond the total file size). No biggie, the server responds with the last part of the file and the headers indicate the actual byte range returned.
But does it really?
When I call xhr.getResponseHeader("Content-Length") with Javascript, I see a different story in Chrome:
The XMLHttpRequest object is telling me that another 5 MB was downloaded, beyond the end of the file. Is there something I don't understand about the xhr object?
What's even weirder is that it works in Firefox 30 as expected:
So between the xhr.responseText.length not matching the Content-Length and these headers not agreeing between the xhr object and the Network tools, I don't know what to do to fix this.
What's causing these discrepancies?
Update: I have confirmed that the server itself is properly sending the response, despite the overshot Range header in the request for the last chunk. This is the raw HTTP response, thanks to good ol' telnet:
HTTP/1.1 206 Partial Content
Server: nginx/1.4.5
Date: Mon, 14 Jul 2014 21:50:06 GMT
Content-Type: application/octet-stream
Content-Length: 2987360
Last-Modified: Sun, 13 Jul 2014 22:05:10 GMT
Connection: keep-alive
ETag: "53c30296-2fd9560"
Content-Range: bytes 47185920-50173279/50173280
So it looks like Chrome is malfunctioning. Should this be filed as a bug? Where?
The main issue is that you are reading binary data as text. Note that the server responds with Content-Type: application/octet-stream which doesn't specify the encoding explicitly - in that case the browser will typically assume that the data is encoded in UTF-8. While the length will mostly be unchanged (bytes with values 0 to 127 are interpreted as a single character in UTF-8 and bytes with higher values will usually be replaced by the replacement character �), your binary file will certainly contain a few valid multi-byte UTF-8 sequences - and these will be combined into one character. That explains why responseText.length doesn't match the number of bytes received from the server.
Now you could of course force some specific encoding using request.overrideMimeType() method, ISO 8859-1 would make sense in particular because the first 256 Unicode code points are identical with ISO 8859-1:
request.overrideMimeType("application/octet-stream; charset=iso-8859-1");
That should make sure that one byte will always be interpreted as one character. Still, a better approach would be storing the server response in an ArrayBuffer which is explicitly meant to deal with binary data.
var request = new XMLHttpRequest();
request.open(...);
request.responseType = "arraybuffer";
request.send();
...
var array = new Uint8Array(request.response);
alert("First byte has value " + array[0]);
alert("Array length is " + array.length);
According to MDN,
responseType = "arraybuffer" is supported starting with Chrome 10, Firefox 6 and Internet Explorer 10. See also: Typed arrays.
Side-note: Firefox also supports responseType = "moz-chunked-text" and responseType = "moz-chunked-arraybuffer" starting with Firefox 9 which allow receiving data in chunks without resorting to ranged requests. It seems that Chrome doesn't plan to implement it, instead they are working on implementing the Streams API.
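For reference, here is a rough sketch of reading a response incrementally with the Streams API (fetch plus a ReadableStream reader); the /bigfile.bin URL is made up:
fetch('/bigfile.bin')
    .then(function (response) {
        var reader = response.body.getReader();
        var received = 0;
        var chunks = [];
        function pump() {
            return reader.read().then(function (result) {
                if (result.done) {
                    console.log('Done: received ' + received + ' bytes in ' + chunks.length + ' chunks');
                    return;
                }
                chunks.push(result.value); // result.value is a Uint8Array
                received += result.value.length;
                console.log('Received ' + received + ' bytes so far');
                return pump();
            });
        }
        return pump();
    });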
Edit: I was unable to reproduce your issue with Chrome lying to you about the response headers, at least not without your code. However, the code responsible should be this function in partial_data.cc:
// We are making multiple requests to complete the range requested by the user.
// Just assume that everything is fine and say that we are returning what was
// requested.
void PartialData::FixResponseHeaders(HttpResponseHeaders* headers,
                                     bool success) {
  if (truncated_)
    return;

  if (byte_range_.IsValid() && success) {
    headers->UpdateWithNewRange(byte_range_, resource_size_, !sparse_entry_);
    return;
  }
This code will remove the Content-Length and Content-Range headers returned by the server and replace them with ones generated from your request parameters. Given that I cannot reproduce the issue myself, the following are only guesses:
This code path seems to be used only for requests that can be satisfied from cache, so I guess that things will work correctly if you clear your cache.
The resource_size_ variable must have a wrong value in your case, larger than the actual size of the requested file. This variable is determined from the Content-Range header of the first chunk requested; maybe you have a cached server response there which indicates a larger file.
