Durham University

Computing and Information Services

Caching of web pages

Why make pages cachable?

If a cache receives a request for a page, then it can serve the page directly from the cache almost instantly, rather than waiting for the main web server to generate it and send it. For static pages (HTML files) and images, this time difference between the cached version and the uncached version is usually quite small, unless the main web server is very busy.

A dynamically generated PHP page (especially one that also queries a MySQL database), however, may take seconds to generate, while retrieving it from the cache still takes only thousandths of a second. This means that your users get much improved performance, for two reasons.

  1. Your page can be stored in the cache and served from there faster than it could be if it were regenerated each time.
  2. Fewer large requests need to reach the web server itself, so any that are still necessary can be served more quickly.

The cache will store the information according to rules based on the document's headers and contents. If it believes the document to be too old, it will check it again against the original web server. It is possible for a browser to force a cache to re-check its cached version, by sending a Pragma: no-cache header as part of its request. Most graphical browsers will do this automatically if you hold 'ctrl' or 'shift' and click the refresh button.
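A script can make the same kind of forced re-check request that a browser makes. The following is a minimal sketch only: the helper name no_cache_context_options is our own invention, and sending Cache-Control: no-cache (the HTTP/1.1 counterpart of Pragma: no-cache) alongside it is an assumption about what the caches in between will honour.

```php
<?php
// Hypothetical helper: options for a GET request that asks every cache on
// the way to re-check its copy with the original server.
function no_cache_context_options(): array
{
    return [
        'http' => [
            'method' => 'GET',
            // Pragma is the header named in the text above; Cache-Control
            // is its HTTP/1.1 counterpart, sent as well for newer caches.
            'header' => "Pragma: no-cache\r\nCache-Control: no-cache\r\n",
        ],
    ];
}

$context = stream_context_create(no_cache_context_options());

// Uncomment to perform the request (needs network access):
// $page = file_get_contents('http://www.dur.ac.uk/', false, $context);
```

The request itself is left commented out so that the sketch can be read and run without touching the network.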

Where are pages cached?

Pages may be cached in memory or on a disk by the web browser. Almost all web browsers have this capability. Also, proxies like wwwcache.dur.ac.uk cache pages requested from outside the University - other organisations and ISPs are likely to have their own cache. Finally, the address www.dur.ac.uk points to the University's 'reverse proxy' - a cache that solely caches the University web server.

Therefore, any page on the University web server will be cached at least once (at the reverse proxy) and almost certainly twice (at the browser as well). It may be cached considerably more than this if viewed from outside the University.

How to make a page cachable

Normal HTML files, images, CSS files and other documents are all cachable by default. You need do nothing special for these.

PHP scripts, on the other hand, are not cachable by default. This means that you must add caching information manually to your pages. You can do this by using PHP to modify the HTTP response headers.

Last-Modified

The Last-Modified header says when the resource in question was last changed. Browsers and caches can use it to guess when the resource needs checking again. A common setting is to check again after 10% of the document's age: if a document is 10 days old according to the Last-Modified header, they will not check again until the copy in the cache is over 1 day old. The setting used on the proxy in Durham is 50% of the document's age, or 36 hours, whichever is smaller.
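The arithmetic behind that guess can be sketched as follows. This is an illustration of the rule described above, not the code any particular cache runs; the function name heuristic_freshness_seconds and its parameters are hypothetical, and the timestamp is an arbitrary fixed value chosen for the example.

```php
<?php
// Hypothetical helper: how many seconds a cache may reuse a page before
// checking it again, judged only from its Last-Modified time.
function heuristic_freshness_seconds(
    int $last_modified,   // Unix timestamp from the Last-Modified header
    int $now,             // Unix timestamp at which the copy was fetched
    int $percent = 10,    // share of the document's age to trust the copy for
    ?int $cap = null      // optional upper limit in seconds
): int {
    $age = max(0, $now - $last_modified);      // document age in seconds
    $lifetime = intdiv($age * $percent, 100);  // that share of the age
    return $cap === null ? $lifetime : min($lifetime, $cap);
}

$now = 1700000000;                  // arbitrary fixed time for the example
$ten_days_ago = $now - 10 * 86400;

// Common setting: 10% of the age, so a 10-day-old page is trusted for 1 day.
$common = heuristic_freshness_seconds($ten_days_ago, $now);

// Durham proxy setting as described above: 50% of the age, capped at 36 hours.
$durham = heuristic_freshness_seconds($ten_days_ago, $now, 50, 36 * 3600);
```

For the 10-day-old document, the common setting gives 86400 seconds (1 day), while the Durham setting gives 129600 seconds (36 hours), because half the age would be 5 days and the cap applies.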

Expires

Since Last-Modified only allows caches to make an educated guess, you should, where possible, send an explicit 'Expires' header. This allows you to say exactly when the document should no longer be considered reliable. Once this time is reached, the cache should recheck the document and get a new copy.

Cache-Control

More sophisticated caching can be achieved using the Cache-Control header. However, for most purposes the two headers above should be sufficient.
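For those who do want to try it, a minimal sketch of a Cache-Control header is given here. The helper cache_control_header and its parameters are hypothetical; max-age gives the lifetime in seconds, 'public' allows shared caches such as proxies to store the page, and 'private' restricts it to the user's own browser cache.

```php
<?php
// Hypothetical helper: build a Cache-Control header line.
// $minutes is the lifetime of the page; $shared says whether proxies and
// other shared caches may store it, or only the user's own browser.
function cache_control_header(int $minutes, bool $shared = true): string
{
    $scope = $shared ? 'public' : 'private';
    return 'Cache-Control: ' . $scope . ', max-age=' . (60 * $minutes);
}

// Like any header, this must be sent before the first output of the script.
header(cache_control_header(10));
```

Here cache_control_header(10) produces 'Cache-Control: public, max-age=600'. Caches that understand max-age use it in preference to an Expires header sent with the same page.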

Setting headers in PHP

To set a header in PHP, you use the header() function. Headers take the form Header-Name: Header-value. The Last-Modified and Expires headers should contain a date in RFC-822 format, for example Sun, 06 Nov 1994 08:49:37 GMT.

In theory, the 'r' option to the PHP date function should do this - however, we have noticed some caches (such as Apache's mod_proxy) have trouble parsing this date, and recommend that you construct it explicitly yourself.
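One way of constructing the date explicitly is sketched below. The function name http_date is our own; the sketch assumes that a date given in GMT and ending in the literal text 'GMT' is the form caches parse most reliably, whereas date('r') uses the server's local time with a numeric offset such as +0100.

```php
<?php
// Hypothetical helper: format a Unix timestamp as a date for use in
// Last-Modified and Expires headers.
function http_date(int $timestamp): string
{
    // gmdate() formats in GMT regardless of the server's timezone; the
    // zone name is appended as plain text rather than via a format letter.
    return gmdate('D, d M Y H:i:s', $timestamp) . ' GMT';
}

$example = http_date(784111777);
```

With the timestamp above, $example is 'Sun, 06 Nov 1994 08:49:37 GMT'.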

Sample function to set headers

This sample function sets the Last-Modified time to the modification time of the PHP script file, and the Expires time to a number of minutes from now, passed in as a parameter.

As with anything that sets HTTP headers, it must be run before the first output from the script.

    function send_caching_headers($expiry_time) {
        // Find the last modified time as a Unix timestamp
        $last_modified = filemtime($_SERVER['SCRIPT_FILENAME']);
        // Set the expiry time ($expiry_time is in minutes)
        $expires = time() + (60 * $expiry_time);
        // Define the RFC date format; the zone is written as the literal
        // text GMT, since gmdate() always formats in GMT
        $format = 'D, d M Y H:i:s \G\M\T';
        // Send the headers
        header("Expires: ".gmdate($format, $expires));
        header("Last-Modified: ".gmdate($format, $last_modified));
    }

What pages can be cached?

The majority of pages can be cached if the proper headers are sent. Exceptions include:

  • Pages inside 'local' or 'password' directories.
  • Pages resulting from a POST request from a form.

Furthermore, any page that gives different results to different users (for example, depending on their cookies) should not be made cachable.
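These rules can be checked in the script before any caching headers are sent. The sketch below is deliberately cautious and rests on an assumption of ours: it treats the presence of any cookie at all as a sign that the page may differ between users. The function name may_send_caching_headers is hypothetical, and it does not cover the 'local' or 'password' directories.

```php
<?php
// Hypothetical helper: decide whether a response may be made cachable.
function may_send_caching_headers(string $method, array $cookies): bool
{
    // Pages resulting from a POST request must not be cached.
    if (strtoupper($method) === 'POST') {
        return false;
    }
    // Assumption: any cookie means the page might be specific to one user.
    if (count($cookies) > 0) {
        return false;
    }
    return true;
}

// REQUEST_METHOD is not set when running from the command line,
// so fall back to GET there.
$method = $_SERVER['REQUEST_METHOD'] ?? 'GET';

if (may_send_caching_headers($method, $_COOKIE)) {
    // Allow the page to be cached for ten minutes.
    header('Cache-Control: max-age=600');
}
```

A site that sets cookies on every page for reasons unrelated to the page content would need a narrower test than this one.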

Pages that give different results solely depending on the query string - i.e. the part of the URL after the '?' - can safely be made cachable, as the cache will store http://community.dur.ac.uk/j.bloggs/?a=1 separately from http://community.dur.ac.uk/j.bloggs/?a=2.

What if the caching is making development inconvenient?

If you're working on a web page, it may not appear to update after you save and reload it. Most web browsers have a mechanism to tell the caches to update their stored copy. In graphical browsers such as Netscape, holding 'shift' while pressing the reload button (Shift-Reload) will force the caches to get a fresh copy of the page from the original server.

In Internet Explorer, hold 'ctrl' and press the reload button instead.

Likewise, the Unix/Linux shell command 'wget -C off -O/dev/null URL' will update any server-cached copies of that URL. (More recent versions of wget replace the '-C off' option with '--no-cache'.)

There is a known problem with many versions of the Opera web browser not refreshing (even on Shift-Reload) CSS files included using @import. A workaround for this is to open the CSS file in a separate tab, and use Shift-Reload there, to get the browser to refresh its internal disk or memory cache. This problem occurs even if there are no caches between the original web server and the browser.