Apache Web Serving with Jaguar, Part 4
by Kevin Hemenway, coauthor of Mac OS X Hacks: 100 Industrial Strength Tips and Tricks
Editor's note: Kevin Hemenway covered a lot of ground in the first three parts of this web-serving primer, starting with the basics and moving on to topics such as CGI, SSI, PHP, and access control. In his fourth article, he takes a step back from the major features and focuses on what you, the reader, have been asking about.
Whistle a sour ditty! Trumpet a happy tune, pirouette a silly maneuver -- something magical has happened. Your boss, that proponent of Windows dedication and desire, was rather impressed with your Mac OS X web server. In fact, he commissioned the entire GatesMcFarlaneCo staff to poke around "our glorious new intranet" and see what they thought. Naturally, the feature requests and "maybe you should"s came rolling in.
In this, the fourth of the trilogy (Adams would be proud!), we're going to take a step back from the major features and explore what else you can do with a stock Apache installation. The features below can be applied to any Apache installation, and most require restarting Apache before they take effect.
Default Index Documents
In the last two articles, we talked about using Server Side Includes (SSI) and PHP, and instructed our Apache to parse .shtml files for SSI statements, and .php files for PHP code. We also quickly gave some examples of a working index.shtml, as well as an informational index.php.
Most of you (including Garrett from GatesMcFarlaneCo's accounting department) noticed that when we renamed our index.html to one of the names above (index.shtml or index.php), Apache no longer loaded that page by default. Instead, it produced an automatically generated listing of all of the files in that directory. Not only is this unfriendly for our visitors, but it can potentially be a security hazard.
Fixing this is easy. As with all of our configuration changes, we want to open the /etc/httpd/httpd.conf file in a normal text editor, like BBEdit or pico. We're looking for something called DirectoryIndex, which tells Apache what file to display when one hasn't been specified (which is the case with a URL like http://127.0.0.1/~morbus/). After searching, we should see a line similar to:
DirectoryIndex index.html
For Mac OS X, Apache has been configured to automatically display index.html files when only a directory has been supplied, as in the URL above. When we renamed our index.html to index.shtml or index.php for testing, Apache couldn't find its DirectoryIndex, and decided to spit out what it could find -- the contents of the directory itself.
We're not restricted to only one possible DirectoryIndex. We could use index.html all of the time, index.php some of the time, and perhaps insomnia caused the rather suggestive zzzdex.shtml. Apache can be told to look for all of these, in order of preference:
DirectoryIndex index.html index.php zzzdex.shtml
In this case, we're saying "Hey, if someone doesn't request a particular file in a URL, then look for index.html. If it's there, cool, display that. If not, try looking for index.php. If that's not there, try zzzdex.shtml. If that's not there, then yeah, I suppose you can automatically generate an index."
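To make that lookup order concrete, here's a minimal Python sketch of the same first-match-wins search. It is not how Apache implements DirectoryIndex internally; the function name resolve_directory_index and the temporary directory standing in for a document root are made up for illustration.

```python
import os
import tempfile
from typing import Optional, Sequence


def resolve_directory_index(directory: str, candidates: Sequence[str]) -> Optional[str]:
    """Return the first candidate filename that exists in `directory`.

    Returns None when no candidate is present, which is the point where
    the server would fall back to a generated listing (or an error).
    """
    for name in candidates:
        if os.path.isfile(os.path.join(directory, name)):
            return name
    return None


if __name__ == "__main__":
    # Hypothetical document root containing only index.php.
    with tempfile.TemporaryDirectory() as docroot:
        open(os.path.join(docroot, "index.php"), "w").close()
        order = ["index.html", "index.php", "zzzdex.shtml"]
        print(resolve_directory_index(docroot, order))  # prints: index.php
```

Because the loop stops at the first hit, a name that sits earlier in the list always wins when several index files exist side by side.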
You can add as many entries as you wish to the DirectoryIndex, but you do want to try to keep the most common filename first. If you're serving thousands of pages a second, a properly ordered DirectoryIndex will save you a tiny bit of time and processing.
Of course, our trusty Garrett thinks the automatically generated indexes are "ugly and unbecoming of the GatesMcFarlaneCo mystique." While we can certainly question the company's "mystique" (lemmings as mascots?), it's probably simpler just to turn autogeneration off. This is a simple matter of removing the word Indexes. If you do a search for this in your Apache config file, you'll happen upon:
Options Includes Indexes FollowSymLinks MultiViews
You should remember this as the line that we added Includes to when we were fiddling with SSI. By removing Indexes and restarting Apache, you're stopping the index autogeneration for the specified directory and its subdirectories (which, in this case, is anything in /Library/WebServer/Documents/).
With the above Indexes change, if Apache can't find any of the filenames listed in the DirectoryIndex, it will complain with an error like "You don't have permission to access / on this server." This may not be exactly what you wanted, either, so let's continue on with ...
Custom Error Pages
Much like ghost sites have become a standard Internet occurrence, custom error pages are also becoming status symbols. There's nothing fancy in creating an error page -- it's just a plain old HTML document that you tell Apache to display instead of its regular response.
Say we created a simple HTML page called oops.html that has a cutesy little "I can't believe it's not butter" error message. We save the file in /Library/WebServer/Documents/ and we want Apache to display this for errors instead of its default. Rip open your Apache configuration file and do a search for ErrorDocument. You'll see a large blurbage of text, in which the important lines look like:
# ErrorDocument 500 "The server made a boo boo."
# ErrorDocument 404 /missing.html
# ErrorDocument 402 http://some.other_server.com/subscription_info.html
These three commented lines demonstrate the three different methods of defining an error. In the first example, the quoted text is passed directly to the browser (you can use HTML if you wish). The second example tells Apache to display the missing.html file located in the document root (/). The final example will tell Apache to redirect the user to the external URL given on that line.
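If it helps to see the three forms side by side, here's a small Python sketch that labels an ErrorDocument line as literal text, a local file, or an external redirect. The function name classify_error_document and the labels "text", "local", and "redirect" are invented for this example; it only mimics the distinction described above and is not Apache's own parser.

```python
from typing import Tuple


def classify_error_document(line: str) -> Tuple[int, str, str]:
    """Split 'ErrorDocument <code> <target>' and label the target's form.

    Returns (status_code, kind, value), where kind is one of
    "text", "redirect", or "local" (labels made up for this sketch).
    """
    directive, code, target = line.split(None, 2)
    if directive != "ErrorDocument":
        raise ValueError("not an ErrorDocument line: " + line)
    target = target.strip()
    if target.startswith('"'):
        # Quoted text is handed straight to the browser.
        return int(code), "text", target.strip('"')
    if "://" in target:
        # A full URL means the visitor is redirected elsewhere.
        return int(code), "redirect", target
    if target.startswith("/"):
        # A leading slash is a path relative to the document root.
        return int(code), "local", target
    raise ValueError("unrecognized ErrorDocument target: " + target)


if __name__ == "__main__":
    examples = [
        '# ErrorDocument 500 "The server made a boo boo."',
        "# ErrorDocument 404 /missing.html",
        "# ErrorDocument 402 http://some.other_server.com/subscription_info.html",
    ]
    for commented in examples:
        # Drop the leading comment marker before classifying.
        print(classify_error_document(commented.lstrip("# ")))
```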
The numbers you see above, like 402, are also important. These are error codes (defined in the HTTP 1.1 RFC) that represent the reasons why the error occurred. The most common error is 404, often seen as "404 Not Found." Uncommenting the second line above would tell Apache that you want the missing.html file to be shown each time a 404 error is triggered. Likewise, error 500 is an "Internal Server Error," and often occurs when CGI scripts or other server programming goes awry.
If you recall from above, Apache will spit out a "Forbidden" error message if index autogeneration has been turned off. If we look in the RFC, we can see that the error code for "Forbidden" is 403. With this knowledge, we could configure our ErrorDocuments like so:
ErrorDocument 500 /oops-500.html
ErrorDocument 404 /oops.html
ErrorDocument 403 /oops.html
With this configuration, we're telling Apache to display oops.html for errors "404 Not Found" and "403 Forbidden," and oops-500.html for "500 Internal Server Error"s. We're leaving 402, "Payment Required," commented, since it's rarely seen in the wild.
Error documents can get pretty smart. For instance, you could send all errors off to a CGI script that would find out what the incorrect URL was, and whether the user clicked on a link from another site. You could then redirect the user to the nearest possible match, based on where they initially tried to go.
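As a rough idea of what such a script might look like, here's a Python CGI sketch. It leans on the REDIRECT_URL environment variable that Apache sets when it hands a request to a local error document, plus the standard HTTP_REFERER and HTTP_HOST CGI variables. Everything else is an assumption for illustration: the KNOWN_PAGES list, the function names, and the choice of difflib's fuzzy matching as the "nearest match" rule.

```python
#!/usr/bin/env python3
import difflib
import html
import os
import sys
from typing import Mapping, Optional, Sequence

# Hypothetical list of pages this site actually serves.
KNOWN_PAGES = ["/index.html", "/about.html", "/contact.html"]


def nearest_page(requested: str, known_pages: Sequence[str]) -> Optional[str]:
    """Return the known page most similar to the requested path, if any."""
    matches = difflib.get_close_matches(requested, known_pages, n=1, cutoff=0.6)
    return matches[0] if matches else None


def build_response(env: Mapping[str, str], known_pages: Sequence[str]) -> str:
    """Build the CGI output: a redirect to the nearest page, or a 404 page."""
    requested = env.get("REDIRECT_URL", "")  # the URL that triggered the error
    referer = env.get("HTTP_REFERER", "")    # set if the visitor followed a link
    match = nearest_page(requested, known_pages)
    if match is not None:
        host = env.get("HTTP_HOST", "127.0.0.1")
        return f"Status: 302 Found\nLocation: http://{host}{match}\n\n"
    if referer:
        source = "a link at " + html.escape(referer)
    else:
        source = "a typed or bookmarked address"
    body = (
        "<html><body><h1>Oops</h1>"
        f"<p>We couldn't find {html.escape(requested)} (you arrived via {source}).</p>"
        "</body></html>\n"
    )
    return "Status: 404 Not Found\nContent-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(build_response(os.environ, KNOWN_PAGES))
```

To try it, you'd save the script somewhere CGI execution is allowed and point an ErrorDocument 404 line at its path; a request for a mistyped /abuot.html would then be bounced to /about.html, while a request with no close match gets the plain error page.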