I was looking at this MDN tutorial https://developer.mozilla.org/en-US/docs/Web/HTTP/Messages
where it says
HTTP messages are composed of textual information encoded in ASCII.
I took this to mean that HTTP can only transfer textual info, aka strings, assuming "HTTP message" here refers to the header plus the body of a response.
But later I found out that an HTTP response body can have MIME types outside of text, such as image, video, application/json, etc. Doesn't that mean HTTP can also transfer non-textual information, which contradicts what that MDN page says about HTTP messages?
I am aware of encoding methods like UTF-8 and Base64. I guess you can use Base64 encoding to transform binary data into text, which can then be sent with an application/json content type as another property of the JSON payload. But if you skip the encoding and use the correct Content-Type instead, can you just transfer the binary data as-is? I am still trying to figure this out.
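To make the Base64 idea concrete, here is roughly what I mean (a hypothetical sketch; the bytes are made up):

// binary -> text via Base64, so it can ride inside a JSON payload
var bytes = new Uint8Array([137, 80, 78, 71]); // e.g. the first bytes of a PNG
var b64 = btoa(String.fromCharCode.apply(null, bytes));
var payload = JSON.stringify({ image: b64 }); // sent as application/json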
Also, I have some experience consuming REST APIs from the front end. My impression is that you typically don't transfer binary data, e.g. images, files, or audio, with RESTful APIs; they usually serve JSON or XML as the response. I wonder why that is. Is it because REST APIs are not suitable for transferring binary data directly? What are some common practices for transferring image or audio files to the front end?
The line you quoted is talking about the start line (the request line or status line) and the headers, which use only ASCII.
The body of a request or response is an arbitrary sequence of bytes. It's mainly interpreted by the application, not by the HTTP layer, and it doesn't need to be in any particular encoding. The headers include a Content-Length field, and the client simply reads that many bytes after the header (there's also chunked encoding, which breaks the content up into chunks, but each chunk starts with a byte length, and the client simply concatenates them).
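For illustration, a chunked body looks like this on the wire; each chunk is prefixed with its size as a hex byte count, and this example reassembles to "Wikipedia":

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
0\r\n
\r\n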
In addition, HTTP includes Transfer-Encoding types that specify the encoding of the data. This includes a number of compression formats that produce binary data.
While it's possible to use textual encodings such as base64, this is not usually done in HTTP because it increases the size of the message and it's not necessary.
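For instance, here is a minimal sketch of fetching an image and reading the body as raw bytes, no base64 involved (the image URL is a placeholder):

fetch('/images/logo.png')
    .then(resp => resp.arrayBuffer())
    .then(buf => {
        // the body arrived as an arbitrary sequence of bytes
        console.log('received ' + buf.byteLength + ' raw bytes');
    });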
Summary: I would like to make a Range header request from GitHub Pages. However, in some browsers this is failing, possibly due to gzip compression issues. It works in Chrome (v74) but not in Firefox (v66), on Mac OS.
Goal: I would like to reliably make this request in all browsers, such as by forcing the response type to be encoded as text whenever a range request is made.
It is not clear to me whether this behavior is dictated by the browser, the server, or some combination of the two. Knowing where it originates could help to define a fix; a fix that works with GitHub Pages would be nice but is not mandatory.
It is also not clear to me whether this represents a bug, and if so, where (in the browser, in the spec, etc.).
Sample test case:
Possibly because this involves server-side gzip encoding, the sample test case doesn't reproduce locally. You'll need to enter these commands in the JS console at https://abought.github.io/weetabix/ to reproduce.
fetch('https://abought.github.io/weetabix/example_data/McDonald_s.csv', {headers: {range: 'bytes=1-100'}} ).then(resp => resp.text());
In Chrome, this fetches the response text. In Firefox, it gives a "decoding error".
If I omit resp.text(), Firefox can complete the request; the decoding error occurs while reading the body, not in any other code. "Copy as cURL" shows that Firefox adds a --compressed flag and Chrome does not.
Investigations
If the byte range is 0-100, the request works in FF. If the range is 1-100, it fails. This section of the file is all ASCII characters.
If I inspect the response headers (Array.from(r.headers.entries())), Firefox has an extra "content-encoding: gzip" header that I think is causing the issue.
(e.g., a slice of gzipped bytes makes no sense without the secret decoder instructions)
I tried adding 'accept-encoding': 'identity' to the fetch request, but it is a forbidden header and modifying it via code has no effect.
The specs have changed quite recently here. Here is the link to the PR.
TL;DR: they now ask the UA to append an Accept-Encoding: identity header to all Range requests.
[§5.15]
If httpRequest’s header list contains Range, then append Accept-Encoding/identity to httpRequest’s header list.
Firefox has not yet followed up here, but a bug report has been filed.
For the time being, Range requests in Firefox are indeed made against the gzipped data, and thus you must not break the integrity of those bytes (for instance, the range 0-100 starts at the beginning of the stream and is decodable in Firefox).
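As a stopgap, here is a sketch of a range request that stays decodable in Firefox by starting at byte 0, per the note above (same test URL as earlier):

fetch('https://abought.github.io/weetabix/example_data/McDonald_s.csv', {
    headers: { range: 'bytes=0-100' } // starting at 0 keeps the gzip stream intact
}).then(resp => resp.text());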
I've seen a bunch of similar questions to this get asked before, but I haven't found one that describes my current problem exactly, so here goes:
I have a page which loads a large (between 0.5 and 10 MB) JSON document via AJAX so that the client-side code can process it. Once the file is loaded, I don't have any problems that I don't expect. However, it takes a long time to download, so I tried leveraging the XHR Progress API to render a progress bar to indicate to the user that the document is loading. This worked well.
Then, in an effort to speed things up, I tried compressing the output on the server side via gzip and deflate. This worked too, with tremendous gains, however, my progress bar stopped working.
I've looked into the issue for a while and found that if a proper Content-Length header isn't sent with the requested AJAX resource, the onProgress event handler cannot function as intended because it doesn't know how far along in the download it is. When this happens, a property called lengthComputable is set to false on the event object.
This made sense, so I tried setting the header explicitly with both the uncompressed and the compressed length of the output. I can verify that the header is being sent, and I can verify that my browser knows how to decompress the content. But the onProgress handler still reports lengthComputable = false.
So my question is: is there a way to use the AJAX Progress API with gzipped/deflated content? And if so, what am I doing wrong right now?
This is how the resource appears in the Chrome Network panel, showing that compression is working:
These are the relevant request headers, showing that the request is AJAX and that Accept-Encoding is set properly:
GET /dashboard/reports/ajax/load HTTP/1.1
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
Accept: application/json, text/javascript, */*; q=0.01
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.99 Safari/537.22
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
These are the relevant response headers, showing that the Content-Length and Content-Type are being set correctly:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Encoding: deflate
Content-Type: application/json
Date: Tue, 26 Feb 2013 18:59:07 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="CAO PSA OUR"
Pragma: no-cache
Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8g PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 223879
Connection: keep-alive
For what it's worth, I've tried this on both a standard (http) and secure (https) connection, with no differences: the content loads fine in the browser, but isn't processed by the Progress API.
Per Adam's suggestion, I tried switching the server side to gzip encoding with no success or change. Here are the relevant response headers:
HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Content-Encoding: gzip
Content-Type: application/json
Date: Mon, 04 Mar 2013 22:33:19 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
P3P: CP="CAO PSA OUR"
Pragma: no-cache
Server: Apache/2.2.8 (Unix) mod_ssl/2.2.8 OpenSSL/0.9.8g PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 28250
Connection: keep-alive
Just to repeat: the content is being downloaded and decoded properly, it's just the progress API that I'm having trouble with.
Per Bertrand's request, here's the request:
$.ajax({
url: '<url snipped>',
data: {},
success: onDone,
dataType: 'json',
cache: true,
progress: onProgress || function(){}
});
And here's the onProgress event handler I'm using (it's not too crazy):
function(jqXHR, evt)
{
// yes, I know this generates Infinity sometimes
var pct = 100 * evt.position / evt.total;
// just a method that updates some styles and javascript
updateProgress(pct);
}
A slightly more elegant variation on your solution would be to set a header like 'x-decompressed-content-length' or whatever in your HTTP response with the full decompressed length of the content in bytes, and read it off the xhr object in your onProgress handler.
Your code might look something like:
request.onprogress = function (e) {
var contentLength;
if (e.lengthComputable) {
contentLength = e.total;
} else {
contentLength = parseInt(e.target.getResponseHeader('x-decompressed-content-length'), 10);
}
progressIndicator.update(e.loaded / contentLength);
};
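For completeness, a minimal sketch of how that might be wired to a plain XMLHttpRequest (progressIndicator and the URL are assumptions based on the question):

var request = new XMLHttpRequest();
request.open('GET', '/dashboard/reports/ajax/load');
request.onprogress = function (e) {
    var contentLength;
    if (e.lengthComputable) {
        contentLength = e.total;
    } else {
        // fall back to the server-provided uncompressed length
        contentLength = parseInt(e.target.getResponseHeader('x-decompressed-content-length'), 10);
    }
    progressIndicator.update(e.loaded / contentLength);
};
request.send();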
I wasn't able to solve the issue of using onProgress on the compressed content itself, but I came up with this semi-simple workaround. In a nutshell: send a HEAD request to the server at the same time as a GET request, and render the progress bar once there's enough information to do so.
function loader(onDone, onProgress, url, data)
{
    // onDone = event handler to run on successful download
    // onProgress = event handler to run during a download
    // url = url to load
    // data = extra parameters to be sent with the AJAX request
    var content_length = null;
    self.meta_xhr = $.ajax({
        url: url,
        data: data,
        dataType: 'json',
        type: 'HEAD',
        success: function(data, status, jqXHR)
        {
            content_length = parseInt(jqXHR.getResponseHeader("X-Content-Length"), 10);
        }
    });
    self.xhr = $.ajax({
        url: url,
        data: data,
        success: onDone,
        dataType: 'json',
        progress: function(jqXHR, evt)
        {
            var pct = 0;
            if (evt.lengthComputable)
            {
                pct = 100 * evt.position / evt.total;
            }
            else if (content_length != null)
            {
                pct = 100 * evt.position / content_length;
            }
            onProgress(pct);
        }
    });
}
And then to use it:
loader(function(response)
{
console.log("Content loaded! do stuff now.");
},
function(pct)
{
console.log("The content is " + pct + "% loaded.");
},
'<url here>', {});
On the server side, set the X-Content-Length header on both the GET and the HEAD requests (which should represent the uncompressed content length), and abort sending the content on the HEAD request.
In PHP, setting the header looks like:
header("X-Content-Length: ".strlen($payload));
And then abort sending the content if it's a HEAD request:
if ($_SERVER['REQUEST_METHOD'] == "HEAD")
{
exit;
}
Here's what it looks like in action:
The reason the HEAD request takes so long in the screenshot below is that the server still has to parse the file to know how long it is. That's something I can definitely improve on, and it's still a clear improvement from where it was.
Don't get stuck just because there isn't a native solution; a one-line hack can solve your problem without messing with the Apache configuration (which on some hosts is prohibited or very restricted):
PHP to the rescue:
var size = <?php echo filesize('file.json') ?>;
That's it, you probably already know the rest, but just as a reference here it is:
<script>
var progressBar = document.getElementById("p"), // a <progress> element
    client = new XMLHttpRequest(),
    size = <?php echo filesize('file.json') ?>;
progressBar.max = size;
client.open("GET", "file.json");
function loadHandler () {
    var loaded = client.responseText.length;
    progressBar.value = loaded;
}
client.onprogress = loadHandler;
client.onloadend = function(pe) {
    loadHandler();
    console.log("Success, loaded: " + client.responseText.length + " of " + size);
};
client.send();
</script>
Live example:
Another SO user thinks I am lying about the validity of this solution, so here it is live: http://nyudvik.com/zip/; it is gzipped and the real file weighs 8 MB.
Related links:
SO: Content-Length not sent when gzip compression enabled in Apache?
Apache Module mod_deflate doc
PHP filesize function doc
Try changing your server encoding to gzip.
Your request header shows three potential encodings (gzip,deflate,sdch), so the server can pick any one of those three. By the response header, we can see that your server is choosing to respond with deflate.
Gzip is an encoding format that wraps a deflate payload in additional headers and a footer (which includes the original uncompressed length), and it uses a different checksum algorithm:
Gzip at Wikipedia
Deflate has some issues. Due to legacy issues dealing with improper decoding algorithms, client implementations of deflate have to run through silly checks just to figure out which implementation they're dealing with, and unfortunately, they often still get it wrong:
Why use deflate instead of gzip for text files served by Apache?
In the case of your question, the browser probably sees a deflate file coming down the pipe and just throws up its arms and says, "When I don't even know exactly how I'll end up decoding this thing, how can you expect me to worry about getting the progress right, human?"
If you switch your server configuration so the response is gzipped (i.e., gzip shows up as the content-encoding), I'm hopeful your script works as you'd hoped/expected it would.
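For instance, since your Server header shows Apache, here is a sketch of letting mod_deflate do the compression instead of PHP (despite the module's name, it emits Content-Encoding: gzip):

<IfModule mod_deflate.c>
    # compress JSON responses; the output is sent as Content-Encoding: gzip
    AddOutputFilterByType DEFLATE application/json
</IfModule>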
We have created a library that estimates the progress and always sets lengthComputable to true.
Chrome 64 still has this issue (see Bug)
It is a JavaScript shim that you can include in your page; it fixes this issue, and you can keep using the standard new XMLHttpRequest() normally.
The JavaScript library can be found here:
https://github.com/AirConsole/xmlhttprequest-length-computable
This solution worked for me.
In the Apache configuration, I increased the deflate buffer size to around 10 MB, enough to cover the biggest file I expect to compress (one of mine went from 9.3 MB down to 3.2 MB). With the whole response fitting inside the buffer, the Content-Length header is returned instead of being omitted; when the compressed output exceeds the buffer size, Apache switches to chunked Transfer-Encoding and drops Content-Length. Refer to https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Transfer-Encoding for more about the chunked encoding header used with compression, and to https://httpd.apache.org/docs/2.4/mod/mod_deflate.html#deflatebuffersize for more about the deflate buffer size.
1- Include the following in your Apache configuration, and note that the buffer size value is in bytes.
<IfModule mod_deflate.c>
DeflateBufferSize 10000000
</IfModule>
2- Restart the Apache server.
3- Include the following in your .htaccess file to make sure the Content-Length header is exposed to JS HTTP requests.
<IfModule mod_headers.c>
Header set Access-Control-Expose-Headers "Content-Length"
</IfModule>
4- In the onDownloadProgress event, before calculating the total progress percentage, append the following to retrieve the total bytes value.
var total = e.total;
if (!e.lengthComputable) {
    // Content-Length here is the compressed size; scale it to approximate
    // the decompressed total (2.2 matched my server's compression ratio).
    total = e.target.getResponseHeader('content-length') * 2.2;
}
5- Note: I learnt by comparing that lengthComputable is not really a flag for whether the Content-Length header was omitted; in practice it is the Content-Encoding header that matters. Whenever that header is present in a file's response headers, lengthComputable is set to false, which seems to be normal behaviour per the JS HTTP request specification. The reason I multiply the compressed content-length by 2.2 is that the loaded total reported by the HTTP progress event reflects the decompressed data, not the compressed data, so scaling gives more accurate download/upload progress tracking with my server's compression level and method. You will need to tweak this factor for your own server's compression: first examine the general difference between compressed and decompressed sizes across multiple files, see whether multiplying by, e.g., 2 gets closest to the original (decompressed) file sizes, and adjust accordingly, making sure the multiplied result stays smaller than or equal to, never bigger than, the original file size. That way the loaded amount is guaranteed to reach, and most likely slightly surpass, 100% in all cases. A hacky enhancement is to cap the progress calculation at 100 so there is no need to check for overshoot, as long as the implementation still registers that 100% was reached.
In my case, this allowed me to know when each file/resource had finished loading, i.e. by checking the total as in the following, where >= is used to account for slightly surpassing 100% after multiplying the compressed total (if the percentage calculation is capped at 100, use the == operator instead) to find when each file has completed preloading. I also thought about resolving this issue at the root, by storing a fixed decompressed total for each file, i.e. the original file size, and using it while preloading files (such as the resources in my case) to calculate the progress percentage. Here is a snippet from my onProgress event handling conditions.
// Sometimes 100 is reached in the progress event more than once.
if(preloadedResources < resourcesLength && progressPercentage < 100) {
canIncreaseCounter = true;
}
if(progressPercentage >= 100 && canIncreaseCounter && preloadedResources < resourcesLength) {
preloadedResources++;
canIncreaseCounter = false;
}
Also, note that using an expected decompressed total as the fixed solution is valid in all circumstances except when you have no prior access to the files that are going to be preloaded or downloaded, and I think that is seldom the case. Most of the time we know which files we want to preload, so we can retrieve their sizes beforehand, perhaps by serving, via a PHP script, a list of sizes for the files of interest in a first HTTP request; then, in the second, preloading request, we have each file's original size. Alternatively, store the fixed decompressed sizes of the preloaded resources in an associative array in the code ahead of time, and use that for tracking loading progress.
For my tracking loading progress implementation live example refer to resources preloading in my personal website at https://zakaria.website.
Lastly, I'm not aware of any downsides to increasing the deflate buffer size, except extra load on server memory; if anyone has input on this, it would be very much appreciated.
The only solution I can think of is manually compressing the data (rather than leaving it to the server and browser), as that allows you to use the normal progress bar and should still give you considerable gains over the uncompressed version. If, for example, the system only needs to work in the latest generation of web browsers, you can zip the data on the server side (whatever language you use, I am sure there is a zip function or library) and on the client side use zip.js. If more browser support is required, you can check this SO answer for a number of compression and decompression functions (just choose one that is supported in the server-side language you're using). Overall this should be reasonably simple to implement, although it will perform worse (though probably still well) than native compression/decompression. (By the way, after giving it a bit more thought, it could in theory perform even better than the native version if you chose a compression algorithm that fits the type of data you're using and the data is sufficiently big.)
Another option would be to use a WebSocket and load the data in parts, parsing/handling each part as it is loaded (you don't need WebSockets for that, but making dozens of HTTP requests one after another can be quite a hassle). Whether this is possible depends on the specific scenario, but to me it sounds like report data is the kind of data that can be loaded in parts and doesn't have to be fully downloaded first.
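A minimal sketch of the parts idea over a WebSocket (the endpoint, message format, and handler names are all hypothetical):

var ws = new WebSocket('wss://example.com/reports'); // hypothetical endpoint
var parts = [];
ws.onmessage = function (msg) {
    // assume each message carries one self-contained JSON part
    parts.push(JSON.parse(msg.data));
    onProgress(100 * parts.length / expectedParts); // expectedParts known up front
};
ws.onclose = function () {
    onDone(parts); // hand the assembled parts to the existing success handler
};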
I do not clearly understand the issue; it should not happen, since the decompression should be done by the browser.
You may try to move away from jQuery, or hack jQuery, because $.ajax does not seem to work well with binary data:
Ref: http://blog.vjeux.com/2011/javascript/jquery-binary-ajax.html
You could try to do your own implementation of the AJAX request.
See: https://developer.mozilla.org/en-US/docs/DOM/XMLHttpRequest/Using_XMLHttpRequest#Handling_binary_data
You could try to decompress the JSON content in JavaScript (see resources in comments).
* UPDATE 2 *
The $.ajax function does not support the progress event handler, or at least it is not part of the jQuery documentation (see comment below).
Here is a way to get this handler to work, but I have never tried it myself:
http://www.dave-bond.com/blog/2010/01/JQuery-ajax-progress-HMTL5/
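For reference, a sketch of one common way to attach a progress listener with plain jQuery, overriding the xhr factory (untested; updateProgress and the URL are placeholders):

$.ajax({
    url: '/dashboard/reports/ajax/load',
    dataType: 'json',
    xhr: function () {
        var xhr = $.ajaxSettings.xhr(); // build the default XHR object
        xhr.addEventListener('progress', function (e) {
            if (e.lengthComputable) {
                updateProgress(100 * e.loaded / e.total);
            }
        }, false);
        return xhr;
    }
});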
* UPDATE 3 *
That solution uses a third-party library to extend jQuery's ajax functionality, so my suggestion does not apply.
Sending request with URL length ~ 4950 characters.
Getting the following XMLHttpRequest.responseText:
ERROR
The requested URL could not be retrieved
While trying to retrieve the URL: ##my long url##
The following error was encountered:
Invalid URL
Some aspect of the requested URL is incorrect. Possible problems:
Missing or incorrect access protocol (should be `http://'' or similar)
Missing hostname
Illegal double-escape in the URL-Path
Illegal character in hostname; underscores are not allowed
Your cache administrator is webmaster.
But when I enter the same URL in the browser, it works just fine. I checked for the possible problems listed in the response text; everything's OK.
When the number of parameters is less than ~200, the script works, so the clue must be some limit. On the other hand, there are no such settings in Apache, PHP, or the JS.
Any advice, or pointers on where I should look for a solution (some additional configs or whatever)?
Sending request with URL length ~ 4950 characters.
That is too much for Internet Explorer anyway. Also possibly for Opera, which IIRC has a limit of 4096 bytes for GET requests.
You should use POST for this amount of data.
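For example, a minimal sketch of moving the parameters out of the URL and into a POST body (the endpoint and query string are placeholders):

var xhr = new XMLHttpRequest();
xhr.open('POST', '/your/endpoint'); // placeholder URL
xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded');
// the ~4950-character parameter string moves into the request body,
// where URL length limits no longer apply
xhr.send(longQueryString);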
Maximum URL length is 2,083 characters in Internet Explorer
Apache replies with 413 Entity Too Large if the URL exceeds approximately 4000 characters (request lines are capped to 8190 bytes).
Using the LimitRequestLine directive won't help, you'll have to recompile Apache with -D DEFAULT_LIMIT_REQUEST_LINE=some huge value if you absolutely want to send large GET requests.
EDIT: Some thoughts about the ~4000 character cap: 8190 looks a lot like 8192 with two bytes reserved for the string terminator, so there's a good chance that Apache uses UCS-2 or similar to store request lines, since DEFAULT_LIMIT_REQUEST_LINE is expressed in bytes, not characters.
That would give a 4095 character cap per request line, i.e. a maximum URL length of 4079 characters (taking into account the initial GET and the final CR/LF pair), which would make sense.