Manipulating browser cache - javascript

A web app I'm developing uses lots of asynchronously loaded images that are often modified over time while their URLs are preserved. There are several problems with this:
If I do not serve the images with caching explicitly disabled in the HTTP headers, the user will often receive an out-of-date version of an image; but disabling caching substantially increases server load.
How can I take the cache control away from the browser and manually evaluate if I should use the cached image or reload it from the server?
Since there are many separate images to load, I also parallelize image downloads over different hostnames (e.g. image01.example.com, image02.example.com, where all these hostnames resolve to the same physical server). Because the NN part of the hostname is generated randomly, I also get cache misses for images that the browser cache could already have served up to date. Should I abandon this practice and replace it with something else?
What cache control techniques and further reading material would you recommend to use?

To force a load, add a nonsense parameter to the URL
<img src='http://whatever/foo.png?x=random'>
where "random" would be something like a millisecond timestamp. Now, if what you want is to have the image be reloaded only if it's changed, then you have to make sure your server is setting up "Etag" values for the images, and that it's using appropriate expiration and "if modified since" headers. Ultimately you can't take the cache control away from the browser in any way other than your HTTP headers.
Instead of generating NN randomly, generate it from a hash of the image name. That way the same image name will always map to the same hostname, and you'll still have images distributed across them.
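For example, a sketch of deterministic hostname selection; the rolling hash and the imageNN.example.com pattern are just illustrations:

    // Map an image name to one of N hostnames so the same name always resolves
    // to the same host, keeping downloads spread out without random cache misses.
    function hostnameFor(imageName, hostCount) {
      var hash = 0;
      for (var i = 0; i < imageName.length; i++) {
        hash = (hash * 31 + imageName.charCodeAt(i)) >>> 0; // simple rolling hash
      }
      var nn = String((hash % hostCount) + 1);
      if (nn.length < 2) nn = '0' + nn;                     // image01, image02, ...
      return 'image' + nn + '.example.com';
    }

    // hostnameFor('cat.png', 4) always returns the same host for 'cat.png',
    // so a repeat view can be answered from the browser cache.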
I don't have a good suggestion but web implementation advice is abundant on the Internet, so I'd say start with Google.

Related

Browser cache on files you host

I understand how to use version numbers to force the download of updated files on my own website, but what can be done in this situation:
I have some small scripts I've written for public use, and about 200 different websites link to my JS file from their pages. When I make an update to the file, I have to get them all to manually change the version number of the file so that they and their users re-download the latest update.
Is there anything I can do on my server (the host) that can force the other sites to re-download the latest version, without anything manual on their end?
There are 2 persistent problems in computing: Cache invalidation, naming things, and off-by-one errors.
If you want clients to get new versions of a file without changing the name of the file then you simply have to lower the max-age you set in the caching headers so that they check more frequently and get the new version in a reasonable period of time.
That's it. End of list.
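For instance, a minimal sketch assuming a bare Node.js server and a hypothetical widget.js file; any stack sets the same header:

    // Serve the script with a short max-age so clients re-check for a new
    // version within an hour. Adjust the value to trade freshness against load.
    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      if (req.url === '/widget.js') {                  // hypothetical file name
        res.writeHead(200, {
          'Content-Type': 'application/javascript',
          'Cache-Control': 'public, max-age=3600'      // 1 hour instead of, say, a year
        });
        res.end(fs.readFileSync('./widget.js'));
      } else {
        res.writeHead(404);
        res.end();
      }
    }).listen(8080);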
You can somewhat mitigate the effects of the increased request load by also implementing an ETag header that the client will send back on subsequent requests and can be used to detect if the resource is unchanged and optionally serve a 304 Not Modified response.
However, depending on the cost of implementing and running ETag checks you might just want to re-serve the existing resource and be done with it.
Or use a CDN which should handle all the ETag nonsense for you.
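If you do roll your own, a rough sketch of the ETag/304 handshake, again assuming Node.js and the same hypothetical widget.js:

    // Answer conditional requests with 304 Not Modified when the client's
    // If-None-Match header matches the current ETag (here, a content hash).
    const http = require('http');
    const fs = require('fs');
    const crypto = require('crypto');

    http.createServer((req, res) => {
      const body = fs.readFileSync('./widget.js');      // hypothetical file
      const etag = '"' + crypto.createHash('md5').update(body).digest('hex') + '"';

      if (req.headers['if-none-match'] === etag) {
        res.writeHead(304, { 'ETag': etag });           // nothing to resend
        res.end();
        return;
      }
      res.writeHead(200, {
        'Content-Type': 'application/javascript',
        'ETag': etag,
        'Cache-Control': 'public, max-age=3600'
      });
      res.end(body);
    }).listen(8080);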

On what basis are javascript files cached?

On what basis do JavaScript files get cached? Say I load a file with the name 'm-script.js' from one site, and on another website I use the same name 'm-script.js' but with different contents. Will the browser fetch the new one, or just look at the name and load it from the cache? The URLs for the two m-script.js files are different (obviously).
Thanks.
If the url is different the cached copy will not be used. A new request will be made and the new file will be downloaded.
There would be a huge security and usability issue with the browser if a Javascript file cached from one website was used on another.
Browsers cache files by their full URI.
This thread (How to force browser to reload cached CSS/JS files?) should help you understand.
Since nobody has mentioned it yet, there is a lot more involved in HTTP caching than just the URI. There are various headers that control the process, e.g. Cache-Control, Expires, ETag, Vary, and so on. Requesting a different URI is always guaranteed to fetch a new copy, but these headers give more control over how requests to the potentially-cached resource are issued (or not issued, or issued but receive back a 304 Not Modified, or...).
Here is a detailed document describing the process. You can also google things like "caching expires" or "caching etag" for some more specific resources.
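If you want to see which of those headers a given response actually carries, a quick same-origin check from the browser works; the path below is a placeholder:

    // Log the caching-related headers of a same-origin response.
    fetch('/m-script.js').then(function (res) {
      console.log('Cache-Control:', res.headers.get('cache-control'));
      console.log('Expires:', res.headers.get('expires'));
      console.log('ETag:', res.headers.get('etag'));
      console.log('Vary:', res.headers.get('vary'));
    });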

What should I be aware of when allowing users to upload images via a URL?

I'm working on a site where users can post notes. I'm considering allowing users to post images by providing a url to the image (ie not uploading it via a form).
However, I've learned that this can be abused: for example, a user can paste a URL that is not an image, so when the page is loaded, a GET request will be made to that URL.
I'd like to know:
1. What other malicious things could be done, and how can I stop them?
2. Is there an easy way (using just JavaScript) to check whether a URL is an image?
There are a lot of issues to be concerned with when allowing users to upload arbitrary files to a server (which this is).
First, make sure the file really is an image and can only ever be served as an image (i.e. never executed as a script).
Second, limit the image size so an oversized file can't act as a denial of service against users or the server.
Third, save the image in a secure way, so that users can't overwrite sensitive files.
Fourth, serve all uploaded content from a different origin than your primary content. Not only is this more secure, it can also be faster (no cookies).
Fifth, run the images through ImageMagick or GD to strip any unwanted data; image files can be used as containers for more than just visible pixels.
Sixth, be aware of path injection, LFI, RFI, and similar attacks.
Seventh, don't rely on the extension to tell you what type of file an image is; check this yourself (a sketch follows this list).
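As a rough server-side sketch of the first, second, and seventh points, assuming Node.js and a buffer that already holds the file; the size limit and magic-byte table are illustrative:

    // Reject anything too large or whose magic bytes don't match a known image
    // format. A first filter only, not a substitute for re-encoding with
    // ImageMagick/GD afterwards.
    const MAX_BYTES = 2 * 1024 * 1024;                   // 2 MB, pick your own limit

    const SIGNATURES = [
      { type: 'png',  bytes: [0x89, 0x50, 0x4e, 0x47] },
      { type: 'jpeg', bytes: [0xff, 0xd8, 0xff] },
      { type: 'gif',  bytes: [0x47, 0x49, 0x46, 0x38] }
    ];

    function sniffImageType(buffer) {
      if (buffer.length > MAX_BYTES) return null;        // too large
      const match = SIGNATURES.find(sig =>
        sig.bytes.every((b, i) => buffer[i] === b));
      return match ? match.type : null;                  // null = reject
    }

    // Usage: const type = sniffImageType(downloadedBuffer);
    // if (!type) { /* refuse the file */ }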
However, I've learned that this can be abused: for example, a user can paste a URL that is not an image, so when the page is loaded, a GET request will be made to that URL.
Are users going to be pasting a link to an image or uploading a file from their own computer?
What other malicious things could be done, and how can I stop them?
Let's put it this way: the users are uploading arbitrary files to your server
Is there an easy way (using just JavaScript) to check whether a URL is an image?
Not really, and it wouldn't help. You want to do server-side checking.
List of things for you to read:
http://www.scriptygoddess.com/archives/2004/10/19/gifs-that-execute-a-php-script/
There are more issues...
If you embed an unknown URL as an image in HTML, the GET request to this URL will be made either way, regardless of whether it really is an image or not.
The question is whether this could do some damage, by either
revealing to the image's server that it (and the page it is embedded in) is being looked at (you can't really avoid this without copying the image to your server)
running a script on the image's server in the user context of the current viewer (who might be logged in there, and would not do something like this voluntarily), doing something different depending on the viewer (and potentially indirectly harmful to the viewer). You could avoid this by proxying the access or copying the image to your server; a sketch of such a proxy follows this list.
or being misinterpreted by the user's browser as a script or other harmful content, doing strange things to the look of your page or even more malicious things. I think this is what most browsers try to protect against with cross-site access policies. This could actually be worse if you copied the image to your server.
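A minimal sketch of the proxying idea mentioned in the second point, assuming Node.js 18+ for the built-in fetch; a real proxy would also apply the size and type checks from the earlier answer:

    // Proxy remote images through your own server so the viewer's browser
    // never contacts the third-party host directly.
    const http = require('http');

    http.createServer(async (req, res) => {
      // Hypothetical API: /proxy?url=<encoded image URL>
      const target = new URL(req.url, 'http://localhost').searchParams.get('url');
      if (!target) { res.writeHead(400); res.end(); return; }

      const upstream = await fetch(target);
      const type = upstream.headers.get('content-type') || '';
      if (!upstream.ok || !type.startsWith('image/')) {
        res.writeHead(502); res.end(); return;           // not an image
      }
      res.writeHead(200, { 'Content-Type': type });
      res.end(Buffer.from(await upstream.arrayBuffer()));
    }).listen(8080);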

Is there a way to prevent browser from sending a specific cookie?

I'm storing some preference data in cookies. However, I just noticed that this data gets sent to the server with every request. Is there a way to prevent that from happening?
A friend suggested web storage, but that still leaves IE6/7 without a solution.
You can set cookies to be HttpOnly (so supporting browsers won't let JavaScript access them), but there is no flag that works the other way around, i.e. one that keeps a cookie on the client and stops the browser from sending it to the server.
Web storage is the ideal solution, but you'll need to fall back to cookies for legacy browsers.
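For example, a small sketch of that fallback; the helper names are made up, and the cookie branch only runs in browsers without web storage:

    // Keep preferences in web storage where available; otherwise fall back to a
    // cookie, which the browser will keep sending to the server on every request.
    function savePref(name, value) {
      if (window.localStorage) {
        localStorage.setItem(name, value);               // stays client-side
      } else {
        // Old-IE fallback: 1-year cookie (IE6/7 understand "expires", not "max-age").
        var expires = new Date(new Date().getTime() + 365 * 24 * 60 * 60 * 1000);
        document.cookie = name + '=' + encodeURIComponent(value) +
          '; expires=' + expires.toUTCString() + '; path=/';
      }
    }

    function loadPref(name) {
      if (window.localStorage) return localStorage.getItem(name);
      var match = document.cookie.match(new RegExp('(?:^|; )' + name + '=([^;]*)'));
      return match ? decodeURIComponent(match[1]) : null;
    }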
You can reduce the number of requests that include the cookies by moving some content (images and stylesheets in particular) to a different hostname and limiting the cookies to your primary hostname.
The appropriate solution is to not store a huge amount of data in a cookie in the first place. Store it on your server, and only store a reference to the information (like a row identifier from a database) in the cookie.
Nope, no way to change it. Cookie data gets sent back with every single request to the same server, including requests for static stuff like images, stylesheets and javascript.
If you want to speed up the site and minimize server bandwidth, use a different domain name - or better yet, a CDN like Rackspace Cloudfiles - for your static stuff. The cookies won't get sent to the different domain.
Good luck!

Why cache AJAX with jQuery?

I use jQuery for AJAX. My question is simple - why cache AJAX? At work and in every tutorial I read, they always say to set caching to false. What happens if you don't, will the server "store" such requests and get "clogged up"? I can find no good answer anywhere - just links telling you how to set caching to false!
It's not that the server stores requests (though they may do some caching, especially higher volume sites, like SO does for anonymous users).
The issue is that the browser will store the response it gets if instructed to (or in IE's case, even when it's not instructed to). Basically, you set cache: false if you don't want the user's browser to show stale data it fetched, say, X minutes ago.
If it helps, look at what cache: false does: it appends something like _=190237921749817243 as a query string pair. That looks random, but it's actually the current time in milliseconds, so it's always... current. This forces the browser to make the request to the server again, since it doesn't know what that query string means; the response may be different, and since it can't be sure, it has to fetch again.
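Concretely, a small sketch of both forms; the endpoint is hypothetical:

    // With cache: false, jQuery appends a _=<timestamp> parameter itself so
    // each request has a unique URL and can't be answered from the browser cache.
    $.ajax({
      url: '/api/messages',                              // hypothetical endpoint
      cache: false,
      success: function (data) { console.log(data); }
    });

    // Roughly equivalent by hand:
    $.ajax({
      url: '/api/messages?_=' + new Date().getTime(),
      success: function (data) { console.log(data); }
    });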
The server won't cache the requests, the browser will. Remember that browsers are built to display pages quickly, so they have a cache that maps URLs to the results last returned by those URLs. Ajax requests are URLs returning results, so they could also be cached.
But usually, Ajax requests are meant to do something, you don't want to skip them ever, even if they look like the same URL as a previous request.
If the browser cached Ajax requests, you'd have stale responses, and server actions being skipped.
If you don't turn it off, you'll have issues trying to figure out why your AJAX call works but your functions aren't responding as you'd like them to. Forced revalidation at the header level is probably the cleanest way to make sure the data being AJAX'd in is never served stale from the cache.
Here's a hypothetical scenario. Say you want the user to be able to click any word on your page and see a tooltip with the definition for that word. The definition is not going to change, so it's fine to cache it.
The main problem with caching requests in any kind of dynamic environment is that you'll get stale data back some of the time. And it can be unpredictable when you'll get a 'fresh' pull vs. a cached pull.
If you're pulling static content via AJAX, you could maybe leave caching on, but how sure are you that you'll never want to change that fetched content?
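If you decide some responses really are safe to cache, a per-request override like this works; the resource path and showTooltip helper are made up:

    // Let the browser reuse its cached copy for content that never changes,
    // such as the word-definition example above.
    $.ajax({
      url: '/definitions/serendipity.json',              // hypothetical static resource
      cache: true,
      dataType: 'json',
      success: function (definition) {
        showTooltip(definition);                         // hypothetical UI helper
      }
    });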
The problem is, as always, Internet Explorer. IE will usually cache the whole response. So, if you are repeatedly firing the same AJAX request, IE will only make it once and always show the first result (even though subsequent requests could return different results).
The browser caches the information, not the server. The point of using Ajax is usually that you're going to be getting information that changes. If there's a part of a website you know isn't going to change, you don't fetch it more than once (in which case caching is fine); that's the beauty of Ajax. Since you should only be dealing with information that may be changing, you want to get the new information. Therefore, you don't want the browser to cache.
For example, Gmail uses Ajax. If caching was simply left on you wouldn't see your new e-mail for quite awhile, which would be bad.
