I understand how to use version numbers and force the download of updated files on my own website, but what can be done in this circumstance?
I have some small scripts I've written for public use, and about 200 different websites link my JS file on their sites. When I make an update to the file, I have to get them all to manually change the version number of the file so that they and their users re-download the latest update.
Is there anything I can do on my server, the host, that can force the other sites to re-download the latest version without anything manual on their end?
There are 2 persistent problems in computing: Cache invalidation, naming things, and off-by-one errors.
If you want clients to get new versions of a file without changing its name, then you simply have to lower the max-age you set in the caching headers, so that they check back more frequently and pick up the new version within a reasonable period of time.
That's it. End of list.
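For example, with Express's static middleware (just one way to do it; the same idea applies to any server or CDN configuration, and the paths and freshness window here are assumptions), lowering max-age looks something like this:

const express = require('express');
const app = express();

// Serve the public scripts with a short freshness window.
// Clients may use their cached copy for up to 5 minutes without
// asking the server; after that they revalidate and pick up changes.
app.use('/js', express.static('public/js', {
  maxAge: '5m',        // Cache-Control: public, max-age=300
  etag: true,          // let Express emit ETags so revalidation can be a cheap 304
  lastModified: true
}));

app.listen(3000);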
You can somewhat mitigate the effects of the increased request load by also implementing an ETag header, which the client will send back on subsequent requests and which can be used to detect whether the resource is unchanged and, if so, serve a 304 Not Modified response.
However, depending on the cost of implementing and running ETag checks you might just want to re-serve the existing resource and be done with it.
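A minimal sketch of that revalidation flow in plain Node (the file name and hashing choice are made up; any stable hash of the content works as an ETag):

const http = require('http');
const fs = require('fs');
const crypto = require('crypto');

http.createServer((req, res) => {
  const body = fs.readFileSync('./widget.js');  // hypothetical public script
  const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';

  if (req.headers['if-none-match'] === etag) {
    res.writeHead(304);                         // unchanged: no body sent
    return res.end();
  }

  res.writeHead(200, {
    'Content-Type': 'application/javascript',
    'Cache-Control': 'max-age=300',             // recheck after 5 minutes
    'ETag': etag
  });
  res.end(body);
}).listen(8080);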
Or use a CDN which should handle all the ETag nonsense for you.
I am serving an Angular app as static content in an Express server. When serving static files with Express, Express adds an ETag to the files by default, so each subsequent request first checks whether the ETag matches and, if it does, the files are not sent again. I know that a Service Worker works similarly and tries to match a hash. Does anyone know the main difference between these two approaches (caching with ETag and caching with Service Workers), and when we should use one over the other? What would be the most efficient when it comes to performance:
1. Server-side caching and serving Angular app static files
2. Implementing Angular Service Worker for caching
3. Do both 1 and 2
To give a better perspective, I'll address a third cache option as well, to clarify the differences.
Types of caching
Basically, we have three possible layers of caching, listed here in the order they are checked from the client:
1. Service Worker cache (client-side)
2. Browser cache, also known as the HTTP cache (client-side)
3. Server-side cache (CDN)
PS: Some browsers, like Chrome, have an extra memory-cache layer in front of the service worker cache.
Characteristics / differences
The service worker is the most reliable of the client-side options, since it defines its own rules for how caching is managed and provides extra capabilities and fine-grained control over exactly what is cached and how.
Browser caching is driven by HTTP headers on the asset's response (Cache-Control and Expires), but the main issue is that there are many conditions under which those headers are ignored.
For instance, I've heard that files bigger than 25 MB are normally not cached, especially on mobile, where memory is limited (I believe this is getting even stricter lately, due to the increase in mobile usage).
So between those two options, I'd always choose the Service Worker cache for reliability.
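As an illustration of that fine-grained control, a cache-first service worker handler can be as small as this (the cache name and precached URLs are assumptions, not part of any framework):

// sw.js - a minimal cache-first strategy for static assets
const CACHE = 'static-v1';

self.addEventListener('install', (event) => {
  event.waitUntil(
    caches.open(CACHE).then((cache) => cache.addAll(['/styles.css', '/app.js']))
  );
});

self.addEventListener('fetch', (event) => {
  event.respondWith(
    caches.match(event.request).then((cached) => {
      // Serve from the service worker cache when we have a copy,
      // otherwise fall through to the network and cache the result.
      if (cached) return cached;
      return fetch(event.request).then((response) => {
        const copy = response.clone();
        caches.open(CACHE).then((cache) => cache.put(event.request, copy));
        return response;
      });
    })
  );
});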
Now, moving to the 3rd option: the CDN checks the HTTP headers, looking at the ETag to decide when to bust the cache.
The idea of server-side caching is to call the origin server only when the asset is not found on the CDN.
Between the 1st and 3rd options, the main difference is that Service Workers work best for slow or failing network connections and offline use, since the cache lives on the client side; if the network is down, the service worker serves the last cached information, allowing for a smooth user experience.
Server-side caching, on the other hand, only works when we can reach the server, but at the same time the caching happens off the user's device, saving local space and reducing the application's memory consumption.
So as you can see, there's no right or wrong answer, just what works best for your use case.
Some Sources
MDN Cache
MDN HTTP caching
Great article from web.dev
Facebook study on caching duration and efficiency
Let's answer your questions:
what is the main difference between these two approaches (caching with ETag and caching with Service Workers)
Both solutions cache files; the main difference is whether you need to reach the server or can stay local:
With an ETag, the browser hits the server asking for a file and sending a hash (the ETag). Depending on the file stored on the server, the server answers either with "the file was not modified, use your local copy" via a 304 HTTP response, or with "here is a new version of that file" via a 200 HTTP response and the new file. In both cases the server decides, and the user waits for a round trip.
With the Service Worker approach, you decide locally what to do. You can write some logic to control what and when to use the local (cached) copy and when to go to the server. This is very useful for offline capabilities, since the logic runs on the client and there is no need to hit the server.
when we should use one over the other?
You can use both together. For example, you can define some logic in the service worker: if there is no connection, return the local copies; otherwise, go to the server.
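A rough sketch of that combined logic in a service worker fetch handler (network first, cached copy when offline; the runtime cache name is made up):

self.addEventListener('fetch', (event) => {
  if (event.request.method !== 'GET') return;  // only cache simple GETs
  event.respondWith(
    fetch(event.request)
      .then((response) => {
        // Online: let the server (and its ETag handling) decide,
        // and refresh the local copy for later offline use.
        const copy = response.clone();
        caches.open('runtime').then((cache) => cache.put(event.request, copy));
        return response;
      })
      .catch(() => caches.match(event.request)) // offline: fall back to the cache
  );
});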
What would be the most efficient when it comes to performance:
1. Server-side caching and serving Angular app static files
2. Implementing Angular Service Worker for caching
3. Do both 1 and 2
My recommended approach is to use both, but treat your files differently. The index.html file can change, so use the service worker for it (to cover the case where there is no internet access) and, when there is access, let the web server answer with the ETag. All the other static files (CSS and JS) should be immutable: that is, you can be sure the local copy is valid. In that case, add a hash to each file's name (so every version is a unique file) and cache them aggressively. When you release a new version of your app, you modify index.html to point to the new immutable files.
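One way to express that split in an Express server (the directory layout, port, and max-age values are assumptions; Angular's production build already emits the hashed bundle names):

const express = require('express');
const path = require('path');
const app = express();

// Hashed bundles (e.g. main.abc123.js, styles.def456.css) never change in place,
// so they can be cached aggressively and marked immutable.
app.use(express.static('dist', {
  index: false,
  immutable: true,
  maxAge: '1y'
}));

// index.html must always be revalidated so it can point at new bundles.
app.get('*', (req, res) => {
  res.sendFile(path.join(__dirname, 'dist/index.html'), {
    headers: { 'Cache-Control': 'no-cache' }  // ETag still allows a cheap 304
  });
});

app.listen(4000);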
I am building a simple web page that will run on a computer connected to a large TV, displaying some relevant information for whoever passes by.
The page will (somehow) fetch some text files that are located on an SVN server and then render them into HTML.
So I have two choices for how to do this:
Set up a cron job that periodically checks the SVN server for changes and, if there are any, updates the files from SVN and (somehow) updates the page. This has the problem of violating the Access-Control-Allow-Origin policy, since the files now exist locally; also, what is a good way to refresh a page that runs in full-screen mode?
Make the JavaScript do the whole job: set it up to periodically request the files directly from the SVN server via AJAX, check for differences, and then render the page. This somehow does not seem as elegant.
Update
The Access-Control-Allow-Origin policy turned out not to be a problem when running on a web server, since the content is then served from the same domain.
What I did in the end was a split between the two:
A cron job updates the files from SVN.
The JavaScript periodically requests the files using window.setInterval, turning on the ifModified flag on the AJAX request so the HTML is only updated if a change has occurred.
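Roughly what that polling loop can look like, assuming jQuery's $.ajax (whose ifModified option sends the If-Modified-Since header); the file name, interval, and renderPage helper are made up for illustration:

// Poll every 30 seconds; only re-render when the file actually changed.
window.setInterval(function () {
  $.ajax({
    url: '/data/status.txt',   // file kept up to date by the cron job
    ifModified: true,          // send If-Modified-Since / treat 304 as "not modified"
    success: function (data, textStatus) {
      if (textStatus !== 'notmodified') {
        renderPage(data);      // hypothetical function that updates the DOM
      }
    }
  });
}, 30000);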
On what basis do JavaScript files get cached? Say I load a file with the name 'm-script.js' from one site, and on another website I use the same name 'm-script.js' but with different contents. Will the browser fetch the new one, or just look at the name and load it from the cache? The URLs for both m-script.js files are different (obviously).
Thanks.
If the url is different the cached copy will not be used. A new request will be made and the new file will be downloaded.
There would be a huge security and usability issue with the browser if a Javascript file cached from one website was used on another.
Browsers cache files by their full URI.
This thread (How to force browser to reload cached CSS/JS files?) should help you understand.
Since nobody has mentioned it yet, there is a lot more involved in HTTP caching than just the URI. There are various headers that control the process, e.g. Cache-Control, Expires, ETag, Vary, and so on. Requesting a different URI is always guaranteed to fetch a new copy, but these headers give more control over how requests to the potentially-cached resource are issued (or not issued, or issued but receive back a 304 Not Modified, or...).
Here is a detailed document describing the process. You can also google things like "caching expires" or "caching etag" for some more specific resources.
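To make the header interplay concrete, here is a hedged sketch of issuing a conditional request by hand with fetch; normally the browser does this transparently, and the variable names here are only illustrative:

let savedEtag = null;
let savedBody = null;

async function loadResource(url) {
  // Send the last ETag we saw so the server can answer 304 if nothing changed.
  const headers = savedEtag ? { 'If-None-Match': savedEtag } : {};
  const res = await fetch(url, { headers });

  if (res.status === 304) {
    return savedBody;                 // server says our copy is still valid
  }
  savedEtag = res.headers.get('ETag');
  savedBody = await res.text();
  return savedBody;
}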
I have a really simple site that I created. I am trying to test JS caching in the browser but it doesn't seem to be working. I thought that most major browsers cached your JS file by default as long as the file name doesn't change. I have the site running in IIS 7 locally.
For my test, I have a simple JS file that does a document.write on body load. If I make a change to the JS file (change the text that document.write writes) and then save the file, I see the update when refreshing the browser. Why is this? Shouldn't I see the original output as long as the JS file name hasn't changed?
Here is the simple site I created to test.
When you refresh your browser, the browser sends a request to the server for all the resources required to display the page. If the browser has a cached version of any of the required resources, it may send an If-Modified-Since header in the request for that resource. When a server receives this header, rather than just serving up the resource, it compares the modified time of the resource to the time submitted in the If-Modified-Since header. If the resource has changed, the server will send back the resource as usual with a 200 status. But, if the resource has not changed, the server will reply with a status 304 (Not Modified), and the browser will use its cached version.
In your case, the modified date has changed, so the browser sends the new version.
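A minimal sketch of that exchange on the server side in plain Node (the file name is made up), showing the Last-Modified / If-Modified-Since comparison described above:

const http = require('http');
const fs = require('fs');

http.createServer((req, res) => {
  const stats = fs.statSync('./script.js');   // hypothetical file
  const lastModified = stats.mtime.toUTCString();
  const since = req.headers['if-modified-since'];

  if (since && new Date(since) >= new Date(lastModified)) {
    res.writeHead(304);                        // unchanged since the cached copy
    return res.end();
  }

  res.writeHead(200, {
    'Content-Type': 'application/javascript',
    'Last-Modified': lastModified
  });
  res.end(fs.readFileSync('./script.js'));
}).listen(8080);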
The best way to test caching in your browser would probably be to use Fiddler and monitor requests and responses while you navigate your site. Avoid using the refresh button in your testing, as that frequently causes the browser to request fresh copies of all resources (i.e., omitting the If-Modified-Since header).
Edit: The above may be an over-simplification of what's going on. Surely a web search will yield plenty of in-depth articles that can provide a deeper understanding of how browser caching works in each browser.
A web app I'm developing uses lots of asynchronously loaded images that are often modified over time while their URLs are preserved. There are several problems with this:
If I do not serve the images with caching explicitly disabled in the HTTP headers, the user will often receive an out-of-date image version; but if I do disable caching, server load increases substantially.
How can I take the cache control away from the browser and manually evaluate if I should use the cached image or reload it from the server?
Since there are many separate images to be loaded, I also parallelize image downloads over different hostnames (i.e. the image01.example.com, image02.example.com, but all these hostnames resolve to the same physical server). Since the NN of the hostname is generated randomly, I also get cache misses where I could have retrieved the up-to-date image from the browser cache. Should I abandon this practice and replace it with something else?
What cache control techniques and further reading material would you recommend to use?
To force a load, add a nonsense parameter to the URL
<img src='http://whatever/foo.png?x=random'>
where "random" would be something like a millisecond timestamp. Now, if what you want is to have the image be reloaded only if it's changed, then you have to make sure your server is setting up "Etag" values for the images, and that it's using appropriate expiration and "if modified since" headers. Ultimately you can't take the cache control away from the browser in any way other than your HTTP headers.
Instead of generating NN randomly, generate it from a hash of the image name. That way the same image name will always map to the same hostname, and you'll still have images distributed across them.
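A sketch of that deterministic mapping (the hostname pattern and the number of hosts are assumptions):

// Map an image name to one of four hostnames, always the same one
// for the same name, so the browser cache keeps working across page loads.
function hostFor(imageName) {
  let hash = 0;
  for (let i = 0; i < imageName.length; i++) {
    hash = (hash * 31 + imageName.charCodeAt(i)) >>> 0;  // simple string hash
  }
  const nn = String(hash % 4 + 1).padStart(2, '0');      // "01" .. "04"
  return 'image' + nn + '.example.com';
}

// e.g. 'http://' + hostFor('cat.png') + '/cat.png'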
I don't have a good suggestion but web implementation advice is abundant on the Internet, so I'd say start with Google.