Why use a CDN?

A Content Delivery Network, or CDN, is essentially a system of geographically distributed web servers that serve static content, typically images, video, and other bandwidth-intensive files. This serves two purposes: it keeps your servers from having to handle those requests, and it serves those files to the end user from a low-latency server closer to the user (network-wise). Both improve the user’s perception of page and site performance. CDNs can also be extremely useful for things like streaming video or other very high-bandwidth uses.

How do CDNs work?

CDNs typically work in one of two ways. With some, you have to deploy the files to the CDN manually via FTP or a similar mechanism. Others work as a transparent proxy, automatically loading files from the source, or origin (your servers), into the CDN as users request them. The latter is preferable because you don’t need to take the CDN into consideration when building your application’s pages and referencing media, which also makes handling non-production environments less complex. It also allows the media to be reloaded from the origin based on cache expiration headers, so you don’t need to do anything special when deploying new media. However, those CDN solutions also seem to be more expensive, so it’s a trade-off you have to weigh yourself.
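For a proxying CDN to refresh media on its own, the origin has to send cache expiration headers in the first place. As a rough sketch, assuming the origin is (or sits behind) an Apache server with mod_expires available, something like the following would attach Expires and Cache-Control: max-age headers to static media. The content types and lifetimes are illustrative assumptions, not recommendations.

```apache
# Assumes the origin is an Apache server with mod_expires installed.
LoadModule expires_module modules/mod_expires.so

# Turn on generation of Expires and Cache-Control: max-age headers.
ExpiresActive On

# Illustrative lifetimes; pick values that match how often your media changes.
ExpiresByType image/png  "access plus 1 week"
ExpiresByType image/jpeg "access plus 1 week"
ExpiresByType text/css   "access plus 1 day"
```

Shorter lifetimes mean new media shows up sooner after a deployment, at the cost of more requests reaching the origin.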

Roll Your Own Apache Pseudo CDN

You can also roll a pseudo-CDN yourself using Apache. I call it a pseudo-CDN because, unlike Akamai and other large providers, you don’t get the advantages of hundreds or thousands of geographically distributed servers. You also don’t get lots of fancy math routing users’ requests to the quickest servers based on location, network congestion, and more. What you do get is transparent proxying and off-loading of request handling from your application servers.

This means you don’t have to do anything special or complex when coding your web application and your JSPs to accommodate the CDN. It also frees your application servers from handling requests for static media, large and small, so they have more CPU time available for the real dynamic processing of your web application.

Apache makes this simple by way of the mod_disk_cache module. I’d recommend avoiding mod_mem_cache. Even though it sounds like it would be the preferred caching mechanism, I have had significant problems with it and have abandoned it. If you’re using Linux (and you should be), the kernel aggressively caches recently accessed files. So when you’re using mod_disk_cache, Apache caches the files you specify on the local hard drive, and the kernel uses available RAM to keep those files in memory for rapid serving. If you plan on using mod_gzip and mod_disk_cache together, please read my post on the issues encountered using them together.
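To make the setup concrete, here is a minimal sketch of what such a front-end configuration might look like on Apache 2.2, combining mod_proxy (for the transparent proxying) with mod_disk_cache. The hostnames static.example.com and origin.example.com, the cache path, and the expiry values are all hypothetical placeholders, and this is not necessarily the exact configuration I run. On Apache 2.4 the module is named mod_cache_disk instead, so the LoadModule line would differ.

```apache
# Hypothetical pseudo-CDN front end for Apache 2.2.
# Hostnames, paths, and lifetimes below are assumptions for illustration.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule cache_module modules/mod_cache.so
LoadModule disk_cache_module modules/mod_disk_cache.so

<VirtualHost *:80>
    ServerName static.example.com

    # Act only as a reverse proxy, never as an open forward proxy.
    ProxyRequests Off
    ProxyPass / http://origin.example.com/
    ProxyPassReverse / http://origin.example.com/

    # Store cacheable responses for every URL under / on local disk.
    CacheEnable disk /
    CacheRoot /var/cache/apache2/pseudo-cdn
    CacheDirLevels 2
    CacheDirLength 1

    # Fallback lifetime (seconds) when a response carries neither an
    # expiry date nor a Last-Modified date.
    CacheDefaultExpire 3600
    # Upper bound (seconds) on how long a cached response is served
    # before Apache rechecks the origin.
    CacheMaxExpire 86400
</VirtualHost>
```

The CacheRoot directory has to be writable by the user Apache runs as. mod_disk_cache does not prune the directory by itself, so the htcacheclean utility that ships with Apache is the usual way to keep its size in check.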