In my web app we serve up all images with HTTP cache headers good for 24 hours, and also ETags. But sometimes the Javascript client app has cause to suspect that an image might have been updated. In those cases I would like to force the browser to revalidate the image cache, without actually breaking the cache.
For example, I get a user record (in JSON) which has a recent LastUpdated date on it. There are several possible reasons the LastUpdated date could have changed: the user might've changed their nickname, or joined a new board, or changed their image. So there's a good chance the image did not change, but we need to check.
I'm aware that I could re-request the image with a cache breaker appended to the URL. But that would force the image to reload whether it had changed or not, cause two entries in the browser's cache, and force me to update all my images with the new url. What I really want is to make the browser re-request the same URL, and send proper If-None-Match and/or If-Modified-Since headers in the request so that it will get a 304 if the image hasn't changed.
Is there any way to accomplish that in Javascript?
Since the image lives at the server, in a cache, the server is where the query needs to take place.
I would run a check before page load and force the server to renew that particular image in its cache if needed. The "HTML" part should always assume that all img tags etc. are correct.
The best way to renew single files in a server cache is a different beast, IMO.
This is actually rather simple to accomplish. Make an AJAX request for the image with the Cache-Control header set to max-age=0.
The server must respond with either the updated image (and a new set of cache headers), or with a 304 containing a Cache-Control header like public, max-age=123.
The 304 with a Cache-Control header is very important because it instructs the browser to update the freshness information associated with the resource.
When the request completes, remove the old <img> element from the DOM and reinsert a new one pointing to the same src.
This works because the request Cache-Control header tells the browser to ignore its cache and go to the network. When the browser gets a response back, it updates its cache – even though you don't actually do anything with the AJAX results.
Now when you remove the old element and insert a new one, the browser grabs the recently updated image directly from its cache.
In Chrome's network inspector, you see two requests: the AJAX request, then the request generated by reinserting the <img> element. However, if the server has set the Cache-Control headers properly, the second "request" is fulfilled by the browser's cache, and nothing is actually sent over the wire.
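function updateImage(el) {
    el = $(el);
    var src = el.attr('src');
    $.ajax({
        url: src,
        // Ask the browser to revalidate with the server instead of reusing its cached copy.
        headers: {
            'Cache-Control': 'max-age=0'
        },
        success: function() {
            var parent = el.parent();
            el.remove();
            // This assumes that the <img> has no siblings and no attributes/classes/etc.
            // Set src on the new <img> itself; parent.append('<img>') would return the parent.
            $('<img>').attr('src', src).appendTo(parent);
        }
    });
}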
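The same revalidation step can also be sketched with the Fetch API, whose `cache: 'no-cache'` mode asks the browser to check with the server before reusing a cached copy. This is only a sketch under assumptions: the `fetchImpl` parameter is a hypothetical hook added for testing, the image is assumed to be same-origin (or served with CORS headers), and the DOM swap is left to the approach described above.

```javascript
// Sketch only: `fetchImpl` is an injectable stand-in for the browser's fetch,
// added here (an assumption) so the function can be exercised outside a browser.
async function revalidateImage(url, fetchImpl = globalThis.fetch) {
  // cache: 'no-cache' makes the browser check with the server before reusing
  // a cached copy, sending If-None-Match / If-Modified-Since when it has validators.
  const response = await fetchImpl(url, { cache: 'no-cache' });
  // Report whether the (possibly revalidated) response is usable.
  return response.ok;
}
```

In a page this could be called as `revalidateImage(img.src)` before removing and reinserting the `<img>` element.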
I have two questions regarding HTTP cache using the Cache-Control header:
How does the browser identify which requests can be fulfilled from the existing cache? Does it check whether the endpoint matches? Requests to the same endpoint can still have a different body or config, so I don't understand how the browser knows when to use the cache and when not to, given that the request is sent within the time frame specified by max-age in the response's Cache-Control header.
I recently learned that both the request and the response can set max-age in their own Cache-Control header. I understand that the request's max-age tells the server (or any intermediate caches) how fresh a response the client is willing to accept from them. The response's max-age (or lack thereof) tells the client how long it can consider that response to be fresh. (Feel free to correct me if I am wrong.) But what would happen in this scenario:
Let's say the response has a max-age of one year, and then we send another request for the same resource with max-age set to 0. Does that make the browser ignore the cache? Or does the browser accept the cache and not send out the request?
You can get information from this specification. According to the document,
The cache entry contains the headers of the request:
As discussed above, caching servers will by default match future
requests only to requests with exactly the same headers and header
values.
This means that you get one entry in your cache every time you make exactly the same request to the server (the cache can be personal or shared, like in a proxy). In practice, for entities that only cache GET requests, the key can be the URI of the request. By the process of normalization, two very similar requests can share a cache entry. The decision to use the cached entry depends on several factors, as detailed below. The figures in the document explain this very well. Bottom line, max-age only determines freshness, not the behavior of the cache.
According to this specification, the cache is never ignored if the entry exists. Even a fresh entry can be discarded to save disk space, and a stale entry can be kept long after it has expired. The difference is that a stale entry is not served directly. In that case, the caching entity (browser, proxy, load balancer, ...) sends a freshness request to the server. The server then decides whether the cached page is still valid.
In summary, if a cached page is fresh according to max-age and whatever other modifiers are used, the caching entity decides that the cached resource will be used. If it is stale, the server decides whether the cached resource can be used.
EDIT after comment:
To understand the difference between the max-age sent by the client and the one sent by the server, we need to dig into the HTTP protocol. Section 5.2.1 says:
5.2.1. Request Cache-Control Directives
5.2.1.1. max-age
Argument syntax:
delta-seconds (see Section 1.2.1)
The "max-age" request directive indicates that the client is
unwilling to accept a response whose age is greater than the
specified number of seconds. Unless the max-stale request directive
is also present, the client is not willing to accept a stale
response.
The language seems to indicate that the server is not forced by the directive, but it is expected to honor it. In your example, this means that the client directive prevails, as it is more restrictive. The client is saying "I do not want any page cached for more than 0 seconds", and the cache is supposed to contact the origin server to fulfill that condition.
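The interaction between the two max-age values can be sketched as a small decision function. This is a simplified illustration, not how any particular browser is implemented: the function name and its parameters are hypothetical, and it ignores max-stale, heuristic freshness and other directives.

```javascript
// Simplified sketch of the freshness decision described above.
// All values are in seconds; `requestMaxAge` is undefined when the
// client sent no max-age directive. Names are illustrative assumptions.
function cacheDecision({ ageSeconds, responseMaxAge, requestMaxAge }) {
  // The entry is fresh only while its age is below the lifetime the server assigned.
  const freshForServer = ageSeconds < responseMaxAge;
  // A request max-age further limits how old a response the client accepts.
  const acceptableToClient =
    requestMaxAge === undefined || ageSeconds <= requestMaxAge;
  return freshForServer && acceptableToClient ? 'serve-from-cache' : 'revalidate';
}
```

With a one-year response lifetime and a one-hour-old entry, a request max-age of 0 turns the result from 'serve-from-cache' into 'revalidate', which matches the scenario in the question.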
I'm using the Google "Page Speed" plug-in for Firefox to access my web site.
Google's "Page Speed" reports some of the components on my page with these HTTP statuses:
200
200 (cache)
304
What I'm confused about is the difference between 200 (cache) and 304.
I've refreshed the page multiple times (but have not cleared my cache) and it always seems that my favicon.ico and a few images are status=200 (cache) while some other images are http status 304.
I don't understand why the difference.
UPDATE:
Using Google "Page Speed", I receive a "200 (cache)" for http://example.com/favicon.ico as well as http://cdn.example.com/js/ga.js
But I receive an HTTP status "304" for http://cdn.example.com/js/combined.min.js
I don't understand why two JavaScript files located in the same directory, /js/, behave differently: one returns an HTTP status 304 and the other a 200 (cache) status code.
The items with code "200 (cache)" were fulfilled directly from your browser cache, meaning that the original requests for the items were returned with headers indicating that the browser could cache them (e.g. future-dated Expires or Cache-Control: max-age headers), and that at the time you triggered the new request, those cached objects were still stored in local cache and had not yet expired.
304s, on the other hand, are the response of the server after the browser has checked if a file was modified since the last version it had cached (the answer being "no").
For most optimal web performance, you're best off setting a far-future Expires: or Cache-Control: max-age header for all assets, and then when an asset needs to be changed, changing the actual filename of the asset or appending a version string to requests for that asset. This eliminates the need for any request to be made unless the asset has definitely changed from the version in cache (no need for that 304 response). Google has more details on correct use of long-term caching.
200 (cache) means Firefox is simply using the locally cached version. This is the fastest because no request to the Web server is made.
304 means Firefox is sending an "If-Modified-Since" conditional request to the Web server. If the file has not been updated since the date sent by the browser, the Web server returns a 304 response which essentially tells Firefox to use its cached version. It is not as fast as 200 (cache) because the request is still sent to the Web server, but the server doesn't have to send the contents of the file.
To your last question, I don't know why the two JavaScript files in the same directory are returning different results.
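The conditional exchange behind a 304 can be sketched from the server's side as follows. This is a minimal sketch under assumptions: `resource` is a hypothetical record with `etag`, `lastModified` (a Date) and `body` fields, header names are lower-case as Node's http module provides them, and weak ETag (W/) handling is left out.

```javascript
// Sketch of the server-side check behind a 304 (simplified; no weak ETag handling).
// `resource` is a hypothetical record: { etag, lastModified: Date, body }.
function respondToConditionalGet(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders['if-none-match'];
  const ifModifiedSince = requestHeaders['if-modified-since'];

  let notModified = false;
  if (ifNoneMatch !== undefined) {
    // If-None-Match takes precedence over If-Modified-Since when both are sent.
    const tags = ifNoneMatch.split(',').map((tag) => tag.trim());
    notModified = tags.includes('*') || tags.includes(resource.etag);
  } else if (ifModifiedSince !== undefined) {
    const since = Date.parse(ifModifiedSince);
    // HTTP dates have one-second resolution, so compare whole seconds.
    const modified = Math.floor(resource.lastModified.getTime() / 1000) * 1000;
    notModified = !Number.isNaN(since) && modified <= since;
  }
  // 304 carries no body; the browser reuses its cached copy.
  return notModified
    ? { status: 304, body: null }
    : { status: 200, body: resource.body };
}
```

A "200 (cache)" entry never reaches this function at all, because the browser answers from its own cache without contacting the server.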
This threw me for a long time too. The first thing I'd verify is that you're not reloading the page by clicking the refresh button; that always issues conditional requests for resources and returns 304s for many of the page elements. Instead, go up to the URL bar, select the address, and hit Enter as if you had just typed in the same URL again. That gives you a better indication of what's being cached properly. This article does a great job of explaining the difference between conditional and unconditional requests and how the refresh button affects them:
http://blogs.msdn.com/b/ieinternals/archive/2010/07/08/technical-information-about-conditional-http-requests-and-the-refresh-button.aspx
HTTP 304 is "not modified". Your web server is basically telling the browser "this file hasn't changed since the last time you requested it." Whereas an HTTP 200 is telling the browser "here is a successful response" - which should be returned when it's either the first time your browser is accessing the file or the first time a modified copy is being accessed.
For more info on status codes check out http://en.wikipedia.org/wiki/List_of_HTTP_status_codes.
As for your last question (why?), I'll try to explain with what I know.
A brief explanation of those three status codes in layman's terms:
200 - success (the browser requests the file and gets it from the server)
If caching is enabled on the server:
200 (from memory cache) - the file is found in the browser's cache, so the browser does not request it from the server
304 - the browser asks the server whether its cached copy is still valid, and the server answers "not modified" without sending the file again
For some files the browser decides to ask the server, and for others it decides to read from its stored (cached) copies. Why is this? Every file has an expiry date, so:
If a file has not expired, the browser uses the cached copy (200 cache).
If a file has expired, the browser sends the server a conditional request carrying the validators (ETag or Last-Modified) of its cached copy. The server compares them with its current version of the file. If they match, the server does not resend the file and answers 304, and as per the protocol the browser uses its existing copy.
Look at this nginx configuration:
location / {
    add_header Cache-Control must-revalidate;
    expires 60;
    etag on;
    ...
}
Here the expiry is set to 60 seconds, so all static files are cached for 60 seconds. If you request a file again within 60 seconds, the browser reads it from memory (200 memory). If you request it after 60 seconds, the browser asks the server (304).
This assumes that the file has not changed after 60 seconds. If it has changed, you get a 200 (i.e., the updated file is fetched from the server).
So, if the servers are configured with different expiry and caching headers (policies), the status may differ.
In your case you are using a CDN. The main purpose of a CDN is high availability and fast delivery, so it uses multiple servers. Even though the files appear to be in the same directory, the CDN might serve your content from multiple servers, and if those servers have different configurations, the statuses can differ. Hope it helps.
I'm reading this great article on caching and there is the following there:
Validators are very important; if one isn’t present, and there isn’t
any freshness information (Expires or Cache-Control) available, caches
will not store a representation at all.
The most common validator is the time that the document last changed,
as communicated in Last-Modified header. When a cache has a
representation stored that includes a Last-Modified header, it can use
it to ask the server if the representation has changed since the last
time it was seen, with an If-Modified-Since request.
So, I'm wondering whether the browser continues to send requests (for example HEAD) for a resource even if I specified Cache-Control: max-age=3600? If it doesn't, then what's the point of this header? Is it used after the max-age time passes?
The Cache-Control: max-age=3600 header means that the browser will cache the response for up to 3600 seconds. After that time has passed it may no longer serve the response without first confirming that it is still fresh.
To do this, the browser can either:
Fetch the full resource with a normal GET request (transfers the whole response body again)
Or perform a revalidation based on an ETag (If-None-Match) or the Last-Modified header (If-Modified-Since), i.e. the client only fetches the response body if it has actually changed. This is of course only possible if the validator was present in the original response.
In short: the reason to use both max-age and a cache validator is to first cache the response for some time and then perform a bandwidth-saving revalidation to confirm the resource's freshness.
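The two-phase behaviour described above (serve from cache while fresh, then revalidate using a validator) can be sketched as follows. The `entry` shape with `storedAtMs`, `maxAgeSeconds`, `etag` and `lastModified` is a hypothetical structure for illustration, not any browser's actual cache format.

```javascript
// Sketch: decide what a cache does for a stored entry at time `nowMs`.
// `entry` is a hypothetical shape: { storedAtMs, maxAgeSeconds, etag?, lastModified? }.
function planFetch(entry, nowMs) {
  const ageSeconds = (nowMs - entry.storedAtMs) / 1000;
  if (ageSeconds < entry.maxAgeSeconds) {
    // Still fresh: no request is sent at all.
    return { action: 'use-cache', headers: {} };
  }
  // Stale: build a conditional request from whichever validators were stored.
  const headers = {};
  if (entry.etag) headers['If-None-Match'] = entry.etag;
  if (entry.lastModified) headers['If-Modified-Since'] = entry.lastModified;
  const hasValidator = Object.keys(headers).length > 0;
  // Without a validator the only option is a full GET of the resource.
  return { action: hasValidator ? 'revalidate' : 'full-get', headers };
}
```

If the server answers the conditional request with a 304, only headers travel over the wire and the stored body is reused.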
I am using the HTML5 History API to save state when AJAX requests happen, and I serve the full HTML content if the user requests the same page with a non-AJAX request.
The browser's "Re-open last closed tab" feature brings back the content of the last AJAX request without hitting the server. If the browser made a new request instead of restoring the last response, everything would work without a problem. But the browser just shows the content of the last AJAX request.
I have experienced this on Chrome 17 and Firefox 10. (I haven't tried it on IE9 because it has no support for the History API.)
What is the well-known solution for this problem?
Edit: These AJAX requests are just GET requests to the server.
It is not really possible to demonstrate this on jsfiddle.net, for a few reasons. You can reproduce it on your localhost as follows.
Make a GET request to the server, pull the JSON objects, and then push that URL into the History API like this:
history.pushState(null,null,url);
Then close that tab and use your browser's "Re-open last closed tab" feature. What do you see? The JSON response body? The browser shows it without making a request to the server, right?
The problem was caused by the HTTP response headers. The headers of the AJAX responses contained caching information, so the browser was showing the content for that URL from its cache without hitting the server.
After I removed the cache parameters from the response headers, the browser hit the server instead of bringing the content from its cache.
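As an illustration of that fix, here is a minimal sketch that marks an AJAX JSON response as non-cacheable. It assumes a Node.js server, where `res` is an `http.ServerResponse`; the original setup is not stated, and the helper name `sendUncachedJson` is hypothetical.

```javascript
// Sketch, assuming a Node.js http server; `sendUncachedJson` is a hypothetical helper.
function sendUncachedJson(res, payload) {
  res.setHeader('Content-Type', 'application/json');
  // no-store tells the browser not to keep this response, so restoring
  // a closed tab has to go back to the server for that URL.
  res.setHeader('Cache-Control', 'no-store');
  res.end(JSON.stringify(payload));
}
```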
When you reopen a closed tab, the browser is allowed to reuse the data from cache for the given URL to populate the window. Since the data in cache is from the ajax request response, that's what it uses, and you see the JSON.
So that leads to the question: Why didn't the browser use the HTML from cache when satisfying the ajax request? Browsers use different rules to determine whether to use cached content depending on what they're doing. In this case, it appears Chrome is happy to reuse it when restoring the recently-closed tab, and not when doing the ajax request.
You can correct it by telling the browser to never cache the response. Whether that's desirable depends on your use case.
For instance, inserting these at the top of your file (after the opening <?php tag, of course) makes it not happen for me:
header("Cache-Control: no-store, no-cache, must-revalidate, max-age=0");
header("Cache-Control: post-check=0, pre-check=0", false);
header("Pragma: no-cache");
It all depends which browser you are using, and which optimisations are enabled.
Google Chrome for instance will keep the page in memory, so when you hit back and it goes to a new site, or when you do re-open closed tab - it will restore the page from memory.
Older/slower browsers will just refresh anything.
Though this shouldn't really be a problem, as your javascript state should be restored as well - it should be exactly the same in every way when they re-open that page.
Many analytics and tracking tools request a 1x1 GIF image (a web bug, invisible to the user) for cross-domain event storing/processing.
Why serve this GIF image at all? Wouldn't it be more efficient to simply return an error code such as 503 Service Unavailable, or an empty file?
Update: To be clearer, I'm asking why GIF image data is served when all the required information has already been sent in the request headers. The GIF image itself does not return any useful information.
Doug's answer is pretty comprehensive; I thought I'd add in an additional note (at the OP's request, off of my comment)
Doug's answer explains why 1x1 pixel beacons are used for the purpose they are used for; I thought I'd outline a potential alternative approach, which is to use HTTP Status Code 204, No Content, for a response, and not send an image body.
204 No Content
The server has fulfilled the request
but does not need to return an
entity-body, and might want to return
updated metainformation. The response
MAY include new or updated
metainformation in the form of
entity-headers, which if present
SHOULD be associated with the
requested variant.
Basically, the server receives the request, and decides to not send a body (in this case, to not send an image). But it replies with a code to inform the agent that this was a conscious decision; basically, its just a shorter way to respond affirmatively.
From Google's Page Speed documentation:
One popular way of recording page
views in an asynchronous fashion is to
include a JavaScript snippet at the
bottom of the target page (or as an
onload event handler), that notifies a
logging server when a user loads the
page. The most common way of doing
this is to construct a request to the
server for a "beacon", and encode all
the data of interest as parameters in
the URL for the beacon resource. To
keep the HTTP response very small, a
transparent 1x1-pixel image is a good
candidate for a beacon request. A
slightly more optimal beacon would use
an HTTP 204 response ("no content")
which is marginally smaller than a 1x1
GIF.
I've never tried it, but in theory it should serve the same purpose without requiring the gif itself to be transmitted, saving you 35 bytes, in the case of Google Analytics. (In the scheme of things, unless you're Google Analytics serving many trillions of hits per day, 35 bytes is really nothing.)
You can test it with this code:
var i = new Image();
i.src = "http://httpstat.us/204";
First, I disagree with the two previous answers--neither engages the question.
The one-pixel image solves an intrinsic problem for web-based analytics apps (like Google Analytics) when working in the HTTP Protocol--how to transfer (web metrics) data from the client to the server.
The simplest of the methods described by the Protocol is the GET request. According to this Protocol method, clients initiate requests to servers for resources; servers process those requests and return appropriate responses.
For a web-based analytics app, like GA, this uni-directional scheme is bad news, because it doesn't appear to allow a server to retrieve data from a client on demand--again, all servers can do is supply resources not request them.
So what's the solution to the problem of getting data from the client back to the server? Within the HTTP context there are other Protocol methods other than GET (e.g., POST) but that's a limited option for many reasons (as evidenced by its infrequent and specialized use such as submitting form data).
If you look at a GET Request from a browser, you'll see it is composed of a Request URL and Request Headers (e.g., Referer and User-Agent Headers); the latter contain information about the client--e.g., browser type and version, browser language, operating system, etc.
Again, this is part of the Request that the client sends to the server. So the idea that motivates the one-pixel gif is for the client to send the web metrics data to the server, wrapped inside a Request Header.
But then how to get the client to Request a resource so it can be "tricked" into sending the metrics data? And how to get the client to send the actual data the server wants?
Google Analytics is a good example: the ga.js file (the large file whose download to the client is triggered by a small script in the web page) includes a few lines of code that directs the client to request a particular resource from a particular server (the GA server) and to send certain data wrapped in the Request Header.
But since the purpose of this Request is not to actually get a resource but to send data to the server, this resource should be as small as possible and it should not be visible when rendered in the web page--hence, the 1 x 1 pixel transparent gif. The size is the smallest size possible, and the format (gif) is the smallest among the image formats.
More precisely, all GA data--every single item--is assembled and packed into the Request URL's query string (everything after the '?'). But in order for that data to go from the client (where it is created) to the GA server (where it is logged and aggregated) there must be an HTTP Request. So ga.js (the Google Analytics script that the client downloads, unless it's cached, as a result of a function called when the page loads) directs the client to assemble all of the analytics data--e.g., cookies, location bar, request headers, etc.--concatenate it into a single string, and append it as a query string to a URL (http://www.google-analytics.com/__utm.gif?), and that becomes the Request URL.
It's easy to prove this using any web browser that allows you to view the HTTP Requests for the web page displayed in your browser (e.g., Safari's Web Inspector, Firefox/Chrome Firebug, etc.).
For instance, I typed a valid URL for a corporate home page into my browser's location bar, which returned that home page and displayed it in my browser (I could have chosen any web site/page that uses one of the major analytics apps: GA, Omniture, Coremetrics, etc.).
The browser I used was Safari, so I clicked Develop in the menu bar, then Show Web Inspector. On the top row of the Web Inspector, click Resources, find and click the utm.gif resource in the list of resources shown in the left-hand column, then click the Headers tab. That will show you something like this:
Request URL:http://www.google-analytics.com/__utm.gif?
utmwv=1&utmn=1520570865&
utmcs=UTF-8&
utmsr=1280x800&
utmsc=24-bit&
utmul=enus&
utmje=1&
utmfl=10.3%20r181&
Request Method:GET
Status Code:200 OK
Request Headers
User-Agent:Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/533.21.1
(KHTML, like Gecko) Version/5.0.5 Safari/533.21.1
Response Headers
Cache-Control:private, no-cache, no-cache=Set-Cookie, proxy-revalidate
Content-Length:35
Content-Type:image/gif
Date:Wed, 06 Jul 2011 21:31:28 GMT
The key points to notice are:
The Request was in fact a request for the utm.gif, as evidenced by the first line above: Request URL:http://www.google-analytics.com/__utm.gif.
The Google Analytics parameters are clearly visible in the query string appended to the Request URL: e.g., utmsr is GA's variable name for the client screen resolution, which for me shows a value of 1280x800; utmfl is the variable name for the flash version, which has a value of 10.3, etc.
The Response Header called Content-Type (sent by the server back to the client) also confirms that the resource requested and returned was a 1x1 pixel gif: Content-Type:image/gif
This general scheme for transferring data between a client and a server has been around forever; there could very well be a better way of doing this, but it's the only way I know of (that satisfies the constraints imposed by a hosted analytics service).
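The query-string packing described in this answer can be sketched as follows. This is an illustration only, not the actual ga.js code: `buildBeaconUrl` is a hypothetical helper, and the parameter names are simply the ones visible in the request shown above.

```javascript
// Sketch: pack metrics into the query string of a beacon URL.
// `buildBeaconUrl` is a hypothetical helper, not part of ga.js.
function buildBeaconUrl(baseUrl, metrics) {
  const url = new URL(baseUrl);
  for (const [name, value] of Object.entries(metrics)) {
    // searchParams handles the URL-encoding of each value.
    url.searchParams.set(name, String(value));
  }
  return url.toString();
}
```

In a browser, assigning the result to the `src` of a `new Image()` triggers the GET request that carries the data to the server.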
Some browsers may display an error icon if the resource could not be loaded. It also makes debugging/monitoring the service a little more complicated: you have to make sure that your monitoring tools treat the error as a good result.
OTOH you don't gain anything. The error message returned by the server/framework is typically bigger than the 1x1 image. This means you increase your network traffic for basically nothing.
Because such a GIF has a known presentation in a browser - it's a single pixel, period. Anything else presents a risk of visually interfering with the actual content of the page.
HTTP errors could appear as oversized frames of error text or even as a pop-up window. Some browsers may also complain if they receive empty replies.
In addition, in-page images are one of the very few data types allowed by default in all browsers. Anything else may require explicit user action to be downloaded.
This is to answer the OP's question - "why to serve GIF image data..."
Some users will put a simple img tag to call your event logging service -
<img src="http://www.example.com/logger?event_id=1234">
In this case, if you don't serve an image, the browser will show a placeholder icon that will look ugly and give the impression that your service is broken!
What I do is, look for the Accept header field. When your script is called via an img tag like this, you will see something like following in the header of the request -
Accept: image/gif, image/*
Accept-Encoding:gzip,deflate
...
When the Accept header field contains the string "image/*", I supply the image; otherwise I just reply with 204.
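That Accept-based choice can be sketched as a small function. This is a sketch under assumptions: it runs in Node.js (it uses `Buffer`), `beaconResponse` is a hypothetical name, the embedded bytes are a commonly used 35-byte 1x1 GIF, and the check matches any "image/" media range, which is slightly looser than a literal "image/*".

```javascript
// A commonly used 35-byte 1x1 GIF, base64-encoded (Node.js Buffer assumed).
const PIXEL_GIF = Buffer.from(
  'R0lGODlhAQABAIAAAP///wAAACwAAAAAAQABAAACAkQBADs=',
  'base64'
);

// Sketch: serve the pixel to image requests, 204 to everything else.
// `beaconResponse` is a hypothetical helper name.
function beaconResponse(acceptHeader) {
  const wantsImage =
    typeof acceptHeader === 'string' && acceptHeader.includes('image/');
  if (wantsImage) {
    return {
      status: 200,
      headers: {
        'Content-Type': 'image/gif',
        'Content-Length': String(PIXEL_GIF.length),
      },
      body: PIXEL_GIF,
    };
  }
  // No image expected: acknowledge the event without a body.
  return { status: 204, headers: {}, body: null };
}
```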
Well, the major reason is to attach the cookie to it, so if users go from one site to another we still have the same element to attach the cookie to.
@Maciej Perliński is basically correct, but I feel a detailed answer will be beneficial.
why 1x1 GIF and not a 204 No-Content status code?
204 No-Content enables the server to omit all response headers (Content-Type, Content-Length, Content-Encoding, Cache-Control etc...) and return an empty response body with 0 bytes (and saving a lot of unneeded bandwidth).
Browsers know to respect 204 No-Content responses, and not to expect/wait for response headers and response body.
If the server needs to set any response header (e.g. Cache-Control or a cookie), it cannot use 204 No-Content, because browsers will ignore any response header by design (according to the HTTP protocol spec).
why 1x1 GIF and not a Content-Length: 0 header with 200 OK status code?
Probably a mix of several issues, just to name a few:
legacy browsers compatibility
MIME type checks on browsers, 0 bytes is not a valid image.
200 OK with 0 bytes might not be fully supported by intermediate proxy servers and VPNs
You don't have to serve an image if you are using the Beacon API (https://w3c.github.io/beacon/) implementation method.
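A minimal sketch of that approach, using `navigator.sendBeacon`, follows. The `reportEvent` name, the endpoint and the `nav` parameter (an injectable stand-in for the browser's `navigator`, added for testing) are assumptions for illustration.

```javascript
// Sketch: send an analytics event with the Beacon API instead of an image request.
// `nav` is an injectable stand-in for the browser's navigator (an assumption for testing).
function reportEvent(endpoint, event, nav = globalThis.navigator) {
  if (nav && typeof nav.sendBeacon === 'function') {
    // sendBeacon queues a POST and returns true if the browser accepted it.
    return nav.sendBeacon(endpoint, JSON.stringify(event));
  }
  // Beacon API not available in this environment.
  return false;
}
```

The server behind the endpoint only has to acknowledge the POST; no image body is involved.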
An error code would work if you have access to the log files of your server. The purpose of serving the image is to obtain more data about the user than you normally would with a log file.