 Published on MacDevCenter (http://www.macdevcenter.com/)


Apache Web Serving With Mac OS X

Apache Web-Serving with Mac OS X, Part 4

by Kevin Hemenway
01/29/2002

Editor's note: Kevin Hemenway covered a lot of ground in the first three parts of this Web-serving primer, starting with the basics and moving on to topics such as CGI, SSI, PHP and access control. In his fourth article, he takes a step back from the major features and focuses on what you, the reader, have been asking about.

Whistle a sour ditty! Trumpet a happy tune, pirouette a silly maneuver -- something magical has happened. Your boss, that proponent of Windows dedication and desire, was rather impressed with your Mac OS X Web server. In fact, he commissioned the entire GatesMcFarlaneCo staff to poke around "our glorious new intranet" and see what they thought. Naturally, the feature requests and "maybe you should"s came rolling in.

In this, the fourth of the trilogy (Adams would be proud!), we're going to take a step back from the major features and explore a bit of what else you can do with a stock Apache installation. The features below can be applied to any Apache installation, and most require stopping and starting Apache before they become active.

Default Index Documents

In the last two articles, we talked about using Server Side Includes (SSI) and PHP. By doing so, we instructed our beloved Apache to parse .shtml files for SSI statements, and .php files for PHP code. We also quickly gave some examples of a working index.shtml, as well as an informational index.php.

Most of you (including Garrett from GatesMcFarlaneCo's Accounting) noticed that when we changed our index.html to one of the names above (index.shtml or index.php), Apache no longer loaded that page by default. This produced an automatically-generated listing of all of the files in that directory. Not only is this unfriendly for our visitors, but it can potentially be a security hazard.

Fixing this is easy. As with all our Apache configuration changes, we want to open the /etc/httpd/httpd.conf file in a normal text editor, like BBEdit or pico. We're looking for something called "DirectoryIndex," which tells Apache what file to display when one hasn't been specified (like "http://localhost/" or "http://127.0.0.1/~morbus/"). After searching, we should see a line similar to:

    DirectoryIndex index.html

For Mac OS X, Apache has been configured to automatically display index.html files when only a directory has been supplied, like in the URLs above. When we renamed our index.html to index.shtml or index.php for testing, Apache couldn't find its DirectoryIndex, and decided to spit out what it could find -- the contents of the directory itself.

We're not restricted to only one possible DirectoryIndex. We could use index.html all of the time, index.php some of the time, and perhaps insomnia caused the rather suggestive zzzdex.shtml. Apache can be told to look for all of these, in order of preference:

    DirectoryIndex index.html index.php zzzdex.shtml

In this case, we're saying "Hey, if someone doesn't request a particular file in a URL, then look for index.html. If it's there, cool, display that. If not, try looking for index.php. If that's not there, try zzzdex.shtml. If that's not there, then yeah, I suppose you can automatically generate an index."

You can add as many entries as you wish to the DirectoryIndex, but you do want to try to keep the most common filename first. If you're serving thousands of pages a second, a properly ordered DirectoryIndex will save you a tiny bit of time and processing.
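
As a small sketch of that ordering advice: if most directories on the intranet end up with an index.shtml, and only a few still carry index.html or index.php, the line might be arranged like this. The filenames and their relative popularity are assumptions for illustration, not a measurement of any real site:

```apache
# most common filename first, so Apache usually hits on its first try
DirectoryIndex index.shtml index.html index.php
```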

Of course, our trusty Garrett thinks the automatically-generated indexes are "ugly and unbecoming of the GatesMcFarlaneCo mystique." While we can certainly question the company's "mystique" (lemmings as a mascot?), it's probably simpler just to turn autogeneration off. This is a simple matter of removing the word "Indexes." If you do a search for this in your Apache config file, you'll happen upon:


    Options Includes Indexes FollowSymLinks MultiViews

You should remember this as the line that we added "Includes" to when we were fiddling with SSI. By removing "Indexes" and restarting Apache, you're stopping the index autogeneration for the specified directory and its subdirectories (which, in this case, is anything in /Library/WebServer/Documents).
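
For reference, this is roughly what the line would look like after the edit, assuming it started out exactly as shown above (your own Options line may carry a different set of keywords):

```apache
# "Indexes" removed: no more automatically generated directory listings
Options Includes FollowSymLinks MultiViews
```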

With the above "Indexes" change, if Apache can't find any of the filenames listed in the DirectoryIndex, it will complain with an error like "You don't have permission to access / on this server." This may not be exactly what you wanted either, so let's continue on with...

Custom Error Pages

Much like ghost sites have become a standard Internet occurrence, custom error pages are also becoming status symbols. There's nothing fancy in creating an error page -- it's just a plain old HTML document that you tell Apache to display instead of its default error page.

Say we created a simple HTML page called oops.html that has a cutesy little "I can't believe it's not butter" error message. We save the file in /Library/WebServer/Documents/ and we want Apache to display this for errors instead of its default. Rip open your Apache configuration file, and do a search for "ErrorDocument." You'll see a large blurbage of text, in which the important lines look like:


    # ErrorDocument 500 "The server made a boo boo."
    # ErrorDocument 404 /missing.html
    # ErrorDocument 402 http://some.other_server.com/subscription_info.html

These three commented lines demonstrate the three different methods of defining an error document. In the first example, the quoted text is passed directly to the browser (you can use HTML if you wish). The second example tells Apache to display the missing.html file located in the DocumentRoot (/). The final example tells Apache to redirect the user to some.other_server.com.

The numbers you see above, like 500, 404, and 402, are also important. These are error codes (defined in the HTTP 1.1 RFC) that represent the reasons why the error occurred. The most common error is 404, often seen as "404 Not Found." Uncommenting the second line above would tell Apache that you want the missing.html file to be shown each time a 404 error is triggered. Likewise, error 500 is an "Internal Server Error," and often occurs when CGI scripts or other server programming goes awry.

If you recall from above, Apache will spit out a "Forbidden" error message if index autogeneration has been turned off. If we look in the RFC, we can see that the error code for "Forbidden" is 403. With this knowledge, we could configure our ErrorDocuments like so:


    ErrorDocument 500 /oops-500.html
    ErrorDocument 404 /oops.html
    ErrorDocument 403 /oops.html

With this configuration, we're telling Apache to display oops.html for errors "404 Not Found" and "403 Forbidden", and oops-500.html for any "500 Internal Server Error." We're leaving 402, "Payment Required," commented, since it's rarely seen in the wild.

Error documents can get pretty smart. For instance, you could send all errors off to a CGI script that would find out what incorrect URL was visited, and whether the user clicked on a link from another site. You could then redirect the user to the nearest possible match, based on where they initially tried to go.
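
To make that idea a little more concrete, here's a minimal sketch of such a script, written for /bin/sh. The name oops.cgi and its home in the server's cgi-bin are made up for this example; you'd hook it up with something like "ErrorDocument 404 /cgi-bin/oops.cgi". When Apache hands an error off to a local document, it passes the originally requested path in the REDIRECT_URL environment variable, and the referring page (if the browser sent one) in HTTP_REFERER. This sketch only reports those two values; working out the "nearest possible match" is left to you.

```shell
#!/bin/sh
# oops.cgi -- hypothetical error handler; the name and location are
# assumptions for this sketch, not part of the stock Apache setup.

# escape the characters that matter in HTML, so a hostile URL can't
# inject markup into our error page. "&" goes first so the entities
# we add afterwards aren't escaped a second time.
escape_html() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' \
                           -e 's/>/\&gt;/g' -e 's/"/\&quot;/g'
}

render_oops() {
    # CGI header, a blank line, then the page itself.
    printf 'Content-type: text/html\n\n'
    printf '<p>Sorry, we could not find %s.</p>\n' \
        "$(escape_html "${REDIRECT_URL:-that page}")"

    # only mention the referring page if the browser told us about one.
    if [ -n "${HTTP_REFERER:-}" ]; then
        printf '<p>You followed a link from %s.</p>\n' \
            "$(escape_html "$HTTP_REFERER")"
    fi
}

render_oops
```

Keeping the page in a function makes it easy to try from the Terminal before wiring it into Apache: set REDIRECT_URL by hand, run the script, and read what comes out.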

User-Based Configurations

Patti. Dear, dear Patti. The cutest secretary in the world, but also a rabid collector of fax cover sheets. Being on the boss' good side has granted her the privilege of running a personal Web site, where she can share the dirt on Californian headers and Alabama footers. We didn't touch upon user-based configurations in the first three articles, but Mac OS X approaches them a bit differently than you'll find in most Apache installations.

In most installations, user-based Web serving like http://127.0.0.1/~patti/ is handled generically -- for every user on the system, be it two or two thousand, the same configuration applies. If an administrator wanted to change the capabilities of user "mimi," he'd usually have to create a specific <Directory> block within the httpd.conf file.
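
For comparison, such a one-off override on a generic installation might look something like the sketch below. The /home/mimi/public_html path assumes the common "public_html" layout for user sites, and the chosen options are purely illustrative:

```apache
# hypothetical httpd.conf addition on a non-Mac OS X box: give only
# mimi's site SSI, while every other user keeps the generic settings.
<Directory "/home/mimi/public_html">
    Options Includes MultiViews
    AllowOverride None
</Directory>
```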

Mac OS X makes this a lot easier by creating a config file for each user of the system; these files are located in /etc/httpd/users/ and take the form of username.conf. If I open patti.conf, for instance, I see:


    <Directory "/Users/patti/Sites/">
        Options Indexes MultiViews
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

Note that this looks very similar to the directory we've been modifying for our GatesMcFarlaneCo site:


    <Directory "/Library/WebServer/Documents">
        Options Includes Indexes FollowSymLinks MultiViews
        AllowOverride None
        Order deny,allow
        Deny from all
        Allow from gatesmcfarlaneco.org
    </Directory>

Because of the similarities, everything we've learned in the previous articles can also be applied to these user-specific directories. Take a look at the modified patti.conf below. It allows SSIs and CGIs, and will block access from everyone but the local machine:


    <Directory "/Users/patti/Sites/">
        Options Includes Indexes MultiViews
        AllowOverride None
        Order deny,allow
        Deny from all
        Allow from 127.0.0.1
    </Directory>

    ScriptAlias /~patti/cgi-bin/ "/Users/patti/Sites/cgi-bin/"

With the above configuration, Patti can Web serve with the best of 'em, adding message boards or discussion groups to each specimen of her faxtastic collection. By modifying only the patti.conf file, we can turn on or off features for only her directory, without affecting the main GatesMcFarlaneCo configuration.

Changing Your Configuration With .htaccess Files

As you've run through these various tweaks and twiddles of the Apache configuration file, one thing has always remained true: to make the changes active, you've had to stop and start Apache after each edit. Not only is this tedious and subject to forgetfulness, it's also avoidable with a little thing called an .htaccess file.

The .htaccess file, when enabled, allows you to control and override a large portion of the Apache configuration without having to stop and start after every change. Once you've instructed Apache to enable .htaccess control, you no longer have to be a privileged user (like an Administrator) to enact changes.

Think of .htaccess files as user-modifiable Apache configurations that only affect the directories in which they reside. Let's search through our Apache configuration file and see what we find. Our first result for .htaccess is actually a comment:


    # This controls which options the .htaccess files
    # in directories can override. Can also be "All",
    # or any combination of "Options", "FileInfo", 
    # "AuthConfig", and "Limit".

    AllowOverride None

By now, this should be old hat to you -- this "AllowOverride" directive is contained within the <Directory> block we've been messing with for the main GatesMcFarlaneCo intranet.

Since .htaccess files can override a large portion of the Apache Webserver configuration, they're incredibly powerful, but also dangerous. A foolhardy user could easily disable or misconfigure parts of their site due to an incorrect directive. As such, .htaccess files have different levels of control. One of these levels is "None" -- in other words, .htaccess files have no control over any part of the Apache configuration. They're simply ignored. You can find more information about the different levels of control in the AllowOverride documentation at the Apache site.
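
As a sketch of a middle ground between "None" and "All", the line below would let .htaccess files handle password authentication and host-based access control, but nothing else. The category names come straight from the comment above; which combination suits your users is a judgment call:

```apache
# AuthConfig covers AuthName, AuthType, AuthUserFile, and require;
# Limit covers Order, Allow, and Deny. Options and ErrorDocument
# lines in an .htaccess file would not be permitted here.
AllowOverride AuthConfig Limit
```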

For now, change the AllowOverride line to:


    AllowOverride All

This allows us to override everything available to us within our .htaccess file. In this case, we're changing the AllowOverride line for the /Library/WebServer/Documents directory. If you're looking to give your user directory .htaccess control, be forewarned -- it's not as perfect as you'd expect. You can turn on the .htaccess feature simply enough, but some directives that rely on Apache's DocumentRoot, like ErrorDocument, will fail. Sometimes, you can cheat -- in the case of ErrorDocument, you can refer to a URL instead of a local file.
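
If you did want to hand Patti .htaccess control, a minimal sketch of the change to /etc/httpd/users/patti.conf, followed by the URL-style ErrorDocument she'd use in her own .htaccess, might look like this. The oops.html filename under her Sites directory is an assumption, and 127.0.0.1 stands in for your real domain or IP:

```apache
# in /etc/httpd/users/patti.conf: let Patti's .htaccess files
# override the settings for her directory.
<Directory "/Users/patti/Sites/">
    Options Indexes MultiViews
    AllowOverride All
    Order allow,deny
    Allow from all
</Directory>

# in /Users/patti/Sites/.htaccess: refer to the error page by URL,
# since a local path would be looked up under the DocumentRoot.
ErrorDocument 404 http://127.0.0.1/~patti/oops.html
```

Keep in mind that when ErrorDocument points at a full URL, Apache redirects the browser there, so the visitor receives a redirect rather than the original 404 status.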

For one final time, stop and start the Apache Webserver. Now what?

.htaccess files are plain text files, placed in the directory in which you want them to be active. We're going to create a quick and dirty example now, so open up a text editor and save an empty .htaccess file into the /Library/WebServer/Documents directory. After you've done that, take a look at the example .htaccess file below, which has been commented for the sake of your childlike innocence:


    # override the ErrorDocument defined in our
    # main Apache configuration file. use "oops-404.shtml"
    # instead. if this .htaccess file is going to be
    # active under a user directory, this line will
    # need to be modified to something like (replaced
    # with your real domain/IP and username, of course):
    # ErrorDocument 404 http://domain.or.ip/~user/404.html

    ErrorDocument 404 /oops-404.shtml

    # hey, someone typo'd our contact page, so we'll
    # permanently redirect "contct.html" to the correct
    # filename, "contact.html". if using this under a
    # user directory, modify to "/~user/contct.html",
    # and be sure to tweak the URL appropriately.

    Redirect permanent /contct.html http://localhost/contact.html

    # RedirectMatch's are useful to do mass redirections
    # based on certain match criteria. in this
    # example, we're redirecting ALL .html files in
    # this directory to .shtml files with matching names.
    # .htaccess files are read from top to bottom, so if
    # someone mistypes "contct.html", they'll be redirected
    # to contact.html with the above line, and then
    # redirected to contact.shtml with this line. 

    RedirectMatch (.*)\.html$ $1.shtml

As mentioned, you can use most directives that you've learned throughout this series. For example, if you wanted to turn on SSI, stop Apache from autogenerating indexes, and restrict access to only people from oreilly.com, you could add the following:


    Options +Includes -Indexes
    Order deny,allow
    Deny from all
    Allow from oreilly.com

.htaccess files apply to the current directory and all of its subdirectories. If a subdirectory has its own .htaccess file, Apache applies that file after the parent's, so any directive it repeats overrides the parent's setting for that subdirectory, while directives it doesn't mention still carry over from above.
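
As a hedged illustration, suppose the main .htaccess turns index listings off, but one subdirectory (a made-up "downloads" directory here) should still show them. Each file lives in the directory it governs:

```apache
# /Library/WebServer/Documents/.htaccess
# no automatic listings here, or in any subdirectory below...
Options +Includes -Indexes

# /Library/WebServer/Documents/downloads/.htaccess (hypothetical)
# ...except this one, which switches listings back on for itself
# and its own subdirectories.
Options +Indexes
```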

Password Authentication

One of the most common uses of .htaccess files is password-protecting a directory. When protected directories are accessed, a visitor's browser will prompt for a username and password. If the visitor authenticates correctly, they're allowed in -- if not, an error 401 is triggered, and the visitor is denied.

So yes, Dan from Marketing, we did get your email (and its annoying and frequent follow-ups), and yes, we're going to password protect the "super secret ad campaign" directory you've been working oh-so-hard on (snicker, snicker, reese's pieces).

To start the process, we're first going to create the user database. This database will contain all the usernames and passwords that will be authenticated against -- they're not keyed to any specific directory, so you could use one database for three hundred users spread across two dozen directories. To create the database, get into your Terminal, and gaze blurry-eyed at the command below:


    htpasswd -c /Library/WebServer/.htpasswd dan

It's nice and innocent, right? htpasswd is the name of the utility that creates and modifies this user database of ours. The -c flag says "create this database" (and it will do so even if the file already exists, which we'll come back to in a moment). /Library/WebServer/.htpasswd is the full path to our database file, and you'll want to take special notice that it's outside Apache's DocumentRoot (which, in OS X, is defined as /Library/WebServer/Documents). Sticking the file outside the DocumentRoot ensures that no one can view this database from the Web. Finally, dan is the user that you want to add to the database. Sample output from this command is below:


    htpasswd -c /Library/WebServer/.htpasswd dan
    New password: ********
    Re-type new password: ********
    Adding password for user dan

When you add new users to an existing database file, make sure that you do not use the -c flag. Doing so will overwrite your existing file with a brand new one. Not so good, bub. Adding a user is a simple matter (note the lack of the -c flag):


    htpasswd /Library/WebServer/.htpasswd mishka
    New password: *********
    Re-type new password: *********
    Adding password for user mishka
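
Since an accidental -c is an easy way to lose a whole user database, one cautious approach is to wrap the command in a tiny shell function that only passes -c when the file doesn't exist yet. The function name add_web_user is invented for this sketch; htpasswd and its -c flag are used exactly as above:

```shell
# add_web_user FILE USER -- hypothetical helper, not part of Apache.
# creates FILE with -c only if it's missing; otherwise just adds
# (or updates) USER in the existing database.
add_web_user() {
    pwfile=$1
    user=$2
    if [ -e "$pwfile" ]; then
        htpasswd "$pwfile" "$user"
    else
        htpasswd -c "$pwfile" "$user"
    fi
}
```

With that defined in your shell, typing "add_web_user /Library/WebServer/.htpasswd mishka" prompts for a password as usual, and picks the right form of the command for you.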

If you look at /Library/WebServer/.htpasswd, you'll see the added users:


    less /Library/WebServer/.htpasswd
    dan:Vcv7xTIIW6g7U
    mishka:3c4T6IdfWweU

Next, it's really just a matter of telling Apache what directory we want to secure. Open (or create) your .htaccess file, and add the following:


    AuthName "Uber Goober Ad Campaign"
    AuthType Basic
    AuthUserFile /Library/WebServer/.htpasswd

    require valid-user

AuthName will be shown as the title or description of the password box that a visitor's browser will show, and in Apache lingo, this is called a "realm". AuthType is set to the standard "Basic" authentication (a "Digest" authentication exists, but is outside the scope of this article). AuthUserFile should be self-explanatory.

The require line deserves some discussion. With it, you can tell Apache to allow any user in the AuthUserFile access (as we've done above), or you can tell Apache to allow only certain people. In the example below, only the users "dan" and "mishka" can authenticate to realms with the name "Uber Goober Ad Campaign." Any other users in the AuthUserFile will be denied:


    require user dan mishka

Users can also be defined by groups -- for example, you could place "dan," "mishka," and "morbus" into a group called "Marketing," and "themadman," "ashcraft," and "sprocket" into a group called "Design." From there, you could restrict access by group instead of username. For these configurations and more about Digest authentication, refer to Apache's Authentication, Authorization, and Access Control docs.
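
A minimal sketch of the group approach follows, assuming a group file at /Library/WebServer/.htgroup. That path is an invention for this example; like the password database, the file is best kept outside the DocumentRoot. The group file is plain text, with one group per line:

```apache
# contents of /Library/WebServer/.htgroup (hypothetical path),
# one "groupname: members" entry per line:
#
#   Marketing: dan mishka morbus
#   Design: themadman ashcraft sprocket

# in the .htaccess file of the directory to protect:
AuthName "Uber Goober Ad Campaign"
AuthType Basic
AuthUserFile /Library/WebServer/.htpasswd
AuthGroupFile /Library/WebServer/.htgroup

require group Marketing
```

Every member of a group still needs an entry in the AuthUserFile; the group file only says who belongs where, not what their passwords are.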

Tomcat and Secure Servers

Some of the smarmier developers at GatesMcFarlaneCo (Matt and Jeff, particularly) are fans of Java servlets secured with SSL technology. I could cover those here, but Apple has already released some rather good articles on the subject over at their Internet Developer site. I heartily recommend you check out "Using mod_ssl", and "Java and Tomcat" (parts I and II).

Conclusion

A lot of rather nifty things can be done with a stock Apache install, and we've only touched on a few of the more common features above. We haven't played with how to modify the appearance of Apache's auto-indexes, how to use the mod_speling module to duplicate our spelling Redirect, or even how to set up fake VirtualHosts to more adequately mimic ISP environments.

Yet, we must move on. As we look at the list of requests for the GatesMcFarlaneCo intranet, only two or three remain, and they all involve something spooky called a "database." What is this monstrosity? What's EssQueueEll? Juan es muy guapo [1]. How do I install it, and even worse, what do I do upon success? Find out in part five of our Web Serving trilogy, available a few scant days after you start sweating with impatience.

[1] Bonus points to those who figure out the tenuous connection between this and the Hitchhiker trilogy joke.

Kevin Hemenway is the coauthor of Mac OS X Hacks, author of Spidering Hacks, and the alter ego of the pervasively strange Morbus Iff, creator of disobey.com, which bills itself as "content for the discontented."

Copyright © 2009 O'Reilly Media, Inc.