The Problem: Using Apache mod_deflate and mod_disk_cache (or other mod_cache) together can create far too many cached files.
The Background: Apache is a web server with many different modules you can load in to enhance it. Two common ones are mod_deflate and mod_cache (or mod_disk_cache).
Some web browsers can’t handle gzipped content correctly, so it’s important to add some logic that sends gzipped content only to browsers that can handle it. Also, some types of files, such as images and video, are already compressed, so trying to gzip them is a waste of time and resources.
A common configuration may look like this:
<Location />
# Insert filter
SetOutputFilter DEFLATE
# Netscape 4.x has some problems…
BrowserMatch ^Mozilla/4 gzip-only-text/html
# Netscape 4.06-4.08 have some more problems
BrowserMatch ^Mozilla/4\.0[678] no-gzip
# MSIE masquerades as Netscape; turn compression off for it by default
BrowserMatch \bMSIE no-gzip
# …but IE 7 is fine, so undo the two settings above for it
BrowserMatch \bMSIE\s7 !no-gzip !gzip-only-text/html
# Don’t compress images, Flash, or video
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png|swf|flv)$ no-gzip dont-vary
# Make sure proxies don’t deliver the wrong content
Header append Vary User-Agent env=!dont-vary
</Location>
This basically says:
“For files under /, compress the response”
“Unless it’s Netscape 4.x, then only compress text/html files”
“Or, if it’s Netscape 4.06-4.08, then don’t compress any files”
“But if it’s IE, don’t compress any files”. NOTE: this differs from the common version you see floating around, which turns compression back on for IE. If a Flash SWF running within IE 6 loads content, that content can’t be compressed: IE 6 handles it fine, but for some reason Flash doesn’t. So this setting is safer. If you aren’t using Flash, feel free to change it.
“but if it’s IE7, undo the no compression settings we made before, activating compression”
“but don’t compress already compressed files like images and video”
“Set the response Vary header to User-Agent so that any upstream caching or proxying won’t cache the wrong version and send a compressed version to a browser which can’t handle it, or an uncompressed version to a browser that should have gotten the compressed file”
Confused yet? :)
mod_disk_cache lets you specify which files the web server should cache and set a cache expiration time, among other things. It’s of great value when those files are served out of a web application rather than from the local disk. For instance, if Apache is serving files from an ATG instance, mod_disk_cache lets the web server cache images, CSS, JS, videos, and so on from your WAR. There’s also a memory-based cache, mod_mem_cache, but it’s more trouble than it’s worth, and you can trust the Linux kernel to cache recently accessed files in memory anyhow.
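For reference, a minimal mod_disk_cache setup might look something like the sketch below. The cache path and the expiry times are made-up placeholders for illustration, not values from a real deployment, and the module name is the Apache 2.2 one (Apache 2.4 renamed the module to mod_cache_disk).

```apache
# Sketch only: the path and the timings below are assumed placeholders.
<IfModule mod_disk_cache.c>
    # Cache every cacheable response under / on local disk
    CacheEnable disk /
    # Directory where cached headers and bodies are stored (hypothetical path)
    CacheRoot /var/cache/apache2/mod_disk_cache
    # Layout of the subdirectory tree under CacheRoot
    CacheDirLevels 2
    CacheDirLength 1
    # Lifetime in seconds when the response carries no expiry information
    CacheDefaultExpire 3600
    # Upper bound in seconds on how long any object may stay cached
    CacheMaxExpire 86400
</IfModule>
```

Each cached URL ends up as files inside that directory tree, which is why the number of cached variants matters so much for disk usage and cleanup time.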
So this is where it gets tricky.
If a response has a Vary header set, mod_disk_cache will cache a separate version of that file for each distinct value of the request header that Vary names.
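So for a file compressed as above, a different version will be cached for each User-Agent. In theory this means that browsers which support gzip-compressed content will get the compressed content, and browsers which don’t will get the uncompressed version. In practice, nearly every browser version and platform sends a slightly different User-Agent string, so a single file can end up cached thousands of times.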
This is a problem for several reasons. Firstly, you end up using far more disk space than you really need. Secondly, you negate the kernel’s in-memory file caching: with 4,000+ versions of a single file being accessed, it can’t simply keep the two distinct files (compressed and uncompressed) in memory. Thirdly, you make cleaning out the cache much slower, since you have to delete these thousands of extra files and their containing directories.
The Solution: I’m not sure… Any ideas?