Like many free software geeks, I run a one-person Web hosting shop, a combination business, hobby, and community service. I've become accustomed to doing complex tasks not only easily, but also as cheaply as possible. Since most of the time my modest Web hosting is more hobby than business, I can't really afford to buy expensive software and hardware monitoring solutions, or to set up and manage complex ones.
I also, like many free software geeks, have a perverse, somewhat mysterious need for uptimes to be as long as possible. Even if it doesn't cost me money, I am bothered by unnecessary service interruption. There is a certain virtue that comes in doing a job excellently, even or especially if one is not doing it as one's primary vocation.
I suppose for many free software users, uptime mania is something of an occupational hazard. There is a kind of Zen-like sysadmin virtue which comes from implementing a clever, efficient, and inexpensive hack, especially if that hack increases uptime and service quality.
But sometimes things go wrong, whether from being too tired to type the proper command at the proper time, or from the rare application or system bug, or from causes entirely outside of your control. And it's not always possible or practical to sit down at the console of a remotely colocated server in order to fix the problem.
For example, my main server is colocated 75 miles from where I live in Dallas, Texas, and I can't easily drive 150 miles roundtrip to fix a problem. In fact, I've never visited the NOC where my servers are colocated, so I couldn't even find it without navigation help.
And I can't always rely on the techs who work at my colocation host; sometimes they aren't available, or are too busy, or don't know exactly what needs to be done. And some colocation facilities charge additional fees for ad hoc system work like this.
One part of the answer to this common dilemma is to install a daemon-monitoring daemon (hereafter, DMD) and to invest in a wireless sysadmin device. But the real trick is doing that within the confines of a limited budget. If you've got one or a few servers running Linux remotely colocated, especially if they're halfway across the country, where you got a great deal on bandwidth, then this two-part article series is for you.
In this first part, I describe some of the available DMDs, and I
explain how to install and configure monit, the DMD I'm
using. In the next part of the series, I explain how to use a
Palm-enabled cell phone to do remote, wireless sysadmin work from
anywhere you can make or receive cell calls. I’ll also show you how to write a
very simple DMD-message routing mailbot, using a few lines of Python,
to make sure messages from your DMD get to you when and where you need
them. (Note: the system I've used to implement and test these
solutions is a Red Hat Linux box, but as far as I know, all of these
tools would work just as well on any Linux or BSD system.)
The single most indispensable tool of remote Linux or BSD server administration is undoubtedly SSH; actually, SSH is less a tool than it is the tool which makes remote public server admin practical in the first place. As long as I can get an SSH login to my remote machine, I can usually fix most problems fairly quickly.
Recently, though, when some security problems cropped up in SSH itself, I had to spend a few hours one Friday afternoon upgrading it. Which was not a big deal until I accidentally killed the wrong process -- the SSH daemon -- effectively locking myself out of my remote box.
Suffice it to say, I convinced a sysadmin to drive to the NOC and
restart the SSH daemon on my box, after which I quickly changed the
root password. It was only then that I realized if I'd had a DMD
running, I would simply have had to wait a few minutes until it
restarted sshd for me.
Soon after that realization I started googling for "daemon
monitoring daemon"; I readily found several solutions, and I finally
chose to implement monit because it fit my situation the
best.
The first tool I evaluated, supervise, is part of Dan
Bernstein's daemontools
package. Bernstein has earned an impressive reputation for writing high quality tools, including qmail;
daemontools is no exception.
Using supervise to monitor Apache, for example, is as
simple as running:
[root@chomsky]# supervise /service/apache
supervise changes to the /service/apache directory and
invokes /service/apache/run, which it will re-invoke if
/service/apache/run dies.
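The restart-on-exit behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how supervise itself is implemented; the function name supervise_sketch, the max_starts limit, and the pause value are all assumptions I've added so that the example terminates.

```python
import subprocess
import sys
import time


def supervise_sketch(command, max_starts=3, pause=0.1):
    """Run command, starting it again each time it exits.

    Returns the list of exit codes, one per run. max_starts is a
    hypothetical safety limit so that this sketch terminates.
    """
    exit_codes = []
    for _ in range(max_starts):
        child = subprocess.Popen(command)
        exit_codes.append(child.wait())  # block until the child exits
        time.sleep(pause)                # brief pause before restarting
    return exit_codes


if __name__ == "__main__":
    # A stand-in "service" that exits immediately with status 1.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise_sketch(crashing))
```

A real supervisor would also need to handle signals and logging, which this sketch leaves out.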
daemontools includes
svstat, which reads the status information that supervise
keeps, in a binary format, about the services it is monitoring. That's a nice feature
since, as we'll see, DMDs can fill up log files quickly. Finally, you
can use svscan to direct
supervise to monitor a collection of services.
I had two, mostly non-technical problems with
daemontools. First, compared to some of the other DMDs I
found, it isn't very customizable. It does what it does well, but
that's about all that it does. I couldn't figure out, for example, how
to get supervise to send me email easily if it had to restart
Apache -- it's possible, but more trouble than I wanted to take
on. Apache is normally very stable, and I want to
know if it's being restarted often by a DMD.
Second, and this is the
more serious problem, daemontools has very specific ideas
about how services should be managed, ideas which don't jibe well with
Red Hat's approach. I'm not entirely sure Red Hat's approach is
better, but I'm stuck with it for now. If I were building a new Linux
server from scratch, I would likely use Bernstein's
daemontools, especially for supervise and
multilog. As things stand, however, I had to look elsewhere
for a solution easier to integrate with my existing system.
Jim Trocki's mon, a DMD written in Perl, is very feature-rich and takes a slightly different approach than the other DMDs I review here. It rigorously separates service monitoring into programs which test conditions according to a schedule, called monitors, and programs which invoke actions, called alerts, depending on the outcome of a monitor.
One of the nice things about mon is that, although it is
written in Perl, you can write monitors and alerts in any programming
language you prefer, plop the script or binary into the right place,
and mon will do the rest. That's nice, especially if you
prefer Python to Perl, or Java to Python, or GNU Smalltalk to anything
else. It also allows for a more active user community to contribute
alerts and monitors to mon, which is also a very useful, free
software, Unix kind of thing.
Very long lists of monitors
and alerts are
available for use with mon; so long, in fact, that it's very unlikely you'd have to write any monitors or alerts at all.
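The separation between monitors and alerts is easier to see in code. What follows is a minimal Python sketch of the idea only; it does not use mon's actual interface, it omits scheduling, and the names run_checks, monitors, and alerts are hypothetical.

```python
def run_checks(monitors, alerts):
    """Run every monitor and hand each failure to every alert.

    monitors maps a service name to a callable returning True when
    the service is healthy; alerts is a list of callables that take
    the name of a failed service.
    """
    failures = [name for name, check in monitors.items() if not check()]
    for name in failures:
        for alert in alerts:
            alert(name)
    return failures


if __name__ == "__main__":
    # Hypothetical monitors: "web" is healthy, "dns" is not.
    monitors = {"web": lambda: True, "dns": lambda: False}
    notified = []
    run_checks(monitors, [notified.append, print])
```

Because monitors never know what the alerts do, either side can be replaced without touching the other.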
Another advantage is mon's very well-done Web interface, a
live demo of which you can play with at http://mon.lycos.com/. If only the
Web interface of more free software tools were half as well done as
mon's. Web interfaces, though, are less risky to use over
intranets than over the public Web.
However, mon is too customizable, too extensible for my
use. I have rather modest expectations of a DMD, and while
mon could certainly fill the bill, its real sweet spot is
service monitoring on a large scale: dozens or hundreds of services
across dozens or hundreds of machines, including servers, routers,
network-accessible peripherals, and so on. I would not hesitate to use
mon in a large LAN or WAN context, especially given its
parallelization, asynchronous event, interservice dependency, and SNMP trap features.
Dms is a client-server DMD, like mon, and
includes a Visual Basic client front-end for use on Windows
machines. The server part appears to be Perl, and it should run on
most, if not all, free Unix variants. Even though it costs about $50
(US), which is more than I'm willing to spend, I would have evaluated
Dms in order to make some recommendation in this article. But
Dms appears to be neither free software nor open source.
I can recommend use of Dms only if you need to monitor a
free Unix variant, and you need to do so from a Windows box but cannot
use a browser as the monitor client. In that unlikely situation,
Dms may be worth the $50.
Finally, I found Jan-Henrik Haukeland's monit on
freshmeat.net. (I installed version 2.2.1 for this article; the newest
version is 2.3, the most significant change of which is the addition
of service grouping.) I chose to implement it because, unlike
supervise, it integrated easily into my existing server, and,
unlike mon, its feature set mapped very neatly onto my
requirement set. monit does just about exactly what I need a
DMD to do, and it doesn't do much else besides, which made it easier
to install, configure, and forget about.
While monit lacks some of the features of mon,
the features missing are ones I decided I could do very nicely
without. Because it is too often overlooked and undervalued, let me
say that chief among monit's virtues is its excellent
documentation, which the author conveniently provides in man form.
As for its feature set, monit runs as a daemon; can start, stop,
and restart the service daemons it monitors; can manage services
individually or in groups; logs either to its own logfile or to
syslog; has a very comfortable configuration and control syntax; can
do runtime and TCP/IP port checking and knows about the protocol of
most common service types, including HTTP, FTP, SMTP, POP, IMAP, NNTP;
can be configured to take actions depending on the stability of a
service over some time slice; will compute and monitor MD5 checksums
of service binaries; and does alert notification via
email. (monit also has a built-in Web server for remote
control, but the author does not recommend using it over the public
Web; I enthusiastically endorse that recommendation, at the very least
until monit gets Digest Authentication, as opposed to
Basic.)
monit is written in C, and installing it on my Linux box was as
simple as invoking the standard:
[root@chomsky monit-2.2.1]# ./configure; make
[root@chomsky monit-2.2.1]# make install
Once installed, you have two main tasks before you: first, you must gather the information monit needs in order to manage the
services you want it to manage; and, second, you have to configure
monit.
I decided I wanted it to monitor the daemons
which provide SSH, HTTP, and DNS on my remote box. (I have
monit watching over my RDBMS and SMTP daemons, too, but for
the purposes of this article, I'll only show configuration examples
for the first three.) I subsequently started using monit to
monitor a misbehaving Zope server, which I describe below.
As for the information you need to gather about each service,
that's basic stuff and you shouldn't have any trouble; you need start
and stop scripts and the fully qualified path name of the service PID
file. The basic runtime setup of monit is to create a
/root/.monitrc, which contains configuration information for
each daemon to be monitored, plus general configuration directives in
the prologue. monit is then invoked thus (you can change some
configuration directives via command-line switches, but I like to put
that kind of stuff into the control file, whenever possible):
[root@chomsky monit-2.2.1]# monit -c /path/to/.monitrc
A working monitrc file looks something like this:
(1) #
(2) #$Id: monit.html,v 1.1 2002/04/30 16:52:36 kclark Exp $
(3) #
(4)
(5) set daemon 300
(6) set logfile /var/log/monit
(7)
(8) check apache with pidfile /var/log/httpd/httpd.pid
(9) start = "/root/apache-start"
(10) stop = "/root/apache-stop"
(11) checksum /usr/local/bin/httpd
(12) timeout(3, 3) and alert kendall@monkeyfist.com
(13) host foo.com port 80 protocol http
(14) host bar.org port 80 protocol http
(15)
(16) check sshd with pidfile /var/run/sshd.pid
(17) start = "/root/sshd-start"
(18) stop = "/root/sshd-stop"
(19) timeout(3, 3) and alert kendall@monkeyfist.com
(20) checksum /usr/local/sbin/sshd
(21)
(22) check named with pidfile /var/run/named.pid
(23) start = "/root/named-start"
(24) stop = "/root/named-stop"
(25) checksum /usr/local/sbin/named
(26) timeout(3, 3) and alert kendall@monkeyfist.com
(27) port 53 use type udp
The first few lines are simply an RCS identifier comment; I use RCS to manage important
config files, especially since I share sysadmin duties with a fellow
geek. That way we don't step on each other's toes. Lines 5 and 6
contain general monit configuration directives; the first
tells monit how often I want it to poll each service, and the
second tells it where to write its logfile.
The other directives that
are legal anywhere in the control file include setting an SMTP server
for monit alerts, setting the port number of its built-in HTTP
server, and specifying host names allowed to use the HTTP
server, including username-password pairs. One important note:
monit uses "localhost" as the SMTP server by default; it may
make sense in some cases to set it to a secondary SMTP server, if you
have one, in case your SMTP daemon is misbehaving. Postfix
has been amazingly reliable for me, so I haven't specified another
SMTP server.
The configuration of daemons for monit to monitor is
fairly straightforward, but there are some features covered in the man page I don't
discuss here, so it's worth a look. Lines 16 through 20 tell
monit how to monitor my SSH daemon, which was my original
reason for installing a DMD.
The format of service-monitoring
statements in the control file is flexible, but monit expects
the first line to be of the form: check [service name] with pidfile
[fully qualified path name of PID file]. The start and stop
declarations are not mandatory, but monit is less useful if
it has no way to restart a daemon when it has died.
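To show why the PID file matters, here is a minimal Python sketch of a pidfile-based liveness check on a Unix system. It illustrates the general technique, not monit's own code; the function names read_pid and is_running are hypothetical.

```python
import os
import tempfile


def read_pid(pidfile):
    """Return the PID stored in pidfile, or None if it is missing or invalid."""
    try:
        with open(pidfile) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


def is_running(pidfile):
    """Return True if the process named in pidfile appears to be alive."""
    pid = read_pid(pidfile)
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending anything
    except ProcessLookupError:
        return False     # no such process
    except PermissionError:
        return True      # process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Demonstrate with a throwaway pidfile holding this script's own PID.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.pid")
        with open(path, "w") as handle:
            handle.write(str(os.getpid()))
        print(is_running(path))
```

A check like this only says that some process with that PID exists, which is one reason protocol-level tests are a useful complement.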
I ask
monit to checksum the binary, because it's free and it would
be nice to know if it is tampered with. The timeout(foo,bar) and alert
statement is very useful; it instructs monit that if the
service has to be restarted foo times within bar cycles (in my case, 3
times in 3 cycles, or 900 seconds), I want to be alerted, since that's
usually an indication something needs explicit sysadmin attention.
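The arithmetic behind timeout(3, 3) is simple: with a 300-second poll interval, 3 cycles span 3 x 300 = 900 seconds. The Python below is a sketch of that counting rule as I've described it, not monit's actual implementation; the class name RestartTracker and its method are hypothetical.

```python
from collections import deque


class RestartTracker:
    """Count restarts over the most recent poll cycles."""

    def __init__(self, max_restarts=3, cycles=3, poll_seconds=300):
        self.max_restarts = max_restarts
        # One entry per poll cycle; old cycles fall off automatically.
        self.window = deque(maxlen=cycles)
        self.window_seconds = cycles * poll_seconds

    def record_cycle(self, restarted):
        """Record one poll cycle; return True if an alert is due."""
        self.window.append(1 if restarted else 0)
        return sum(self.window) >= self.max_restarts


if __name__ == "__main__":
    tracker = RestartTracker(max_restarts=3, cycles=3, poll_seconds=300)
    print(tracker.window_seconds)
    for restarted in (True, True, True):
        print(tracker.record_cycle(restarted))
```

An occasional restart never reaches the threshold, while a service that has to be restarted on every poll trips it on the third cycle.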
Lines 13 and 14 are worth mentioning. They tell monit
not only to check that the Apache process is running but also to test Apache at the HTTP
protocol level, on a given port and for a given virtual host. In the case
that Apache stops being able to answer requests -- because, say, one
of my users has published an article everyone suddenly wants to read
-- but is still running, monit may be able to alert me more
quickly than I would otherwise be alerted.
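A protocol-level check of this kind can be sketched in Python with the standard http.client module. This is a hedged illustration of the technique rather than what monit does internally; the function name http_alive and the choice to treat any status below 500 as healthy are my assumptions.

```python
import http.client


def http_alive(host, port=80, timeout=10):
    """Return True if the server at host:port answers an HTTP request.

    http.client sends a Host header built from host, so a name-based
    virtual host can be tested by passing its hostname.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        response = conn.getresponse()
        return response.status < 500  # assumption: 5xx means unhealthy
    except (OSError, http.client.HTTPException):
        return False                  # refused, timed out, or garbled reply
    finally:
        conn.close()


if __name__ == "__main__":
    # foo.com is the placeholder virtual host from the control file above.
    print(http_alive("foo.com", 80))
```

The point is that a daemon can hold a valid PID and still fail this test, which is exactly the case a pidfile check misses.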
One of monit's most valuable bits, which wasn't
immediately apparent to me when I installed and configured it, is that
it can be run by any user on my system, which means users can use it
to monitor daemons which they are running, whether short or long
term.
I run the Zope Web application server under a special user
account, and lately it's been falling down more often than I'd like,
sometimes in the middle of the night, which means its sites are
unreachable until the next day. So I created another control file,
installed it in the Zope user account, and spawned another
monit instance to monitor Zope:
set daemon 240
set logfile /home/k/monit.log

check Zope with pidfile /home/k/Zope-2.4.0-linux2-x86/var/zProcessManager.pid
start = "/home/k/Zope-2.4.0-linux2-x86/start"
stop = "/home/k/Zope-2.4.0-linux2-x86/stop"
alert restart
timeout(3, 3) and alert kendall@monkeyfist.com
host foobarbaz.com port 8080 protocol http
In this case, I want a separate logfile, and I want monit
to check on Zope every 4 minutes, rather than every 5. This capability is useful in more than
production-service monitoring cases. For example, I'm starting work on
a WebDAV server in Python, and I expect it will be very unstable
at first. I will likely use a monit daemon to keep my
prototype WebDAV server running continuously while I iterate through
develop-debug cycles. I can set the daemon polling time very low, to
say 60 seconds, so that I never have to wait more than a minute to
retest the server, and I don't have to continually restart it by
hand.
Installing a DMD made me a better sysadmin and improved the quality
of the services I offer my users. It's also reduced the likelihood
that I'll have to drive 150 miles to restart the SSH daemon of my
remote box, which is worth the installation and configuration efforts
alone. And so far monit has fit nicely within my limited
budget.
Using a DMD also extends the range of services you may be able to offer since you can use it to provide high-quality service from software which might crash more often than you'd otherwise be able to accommodate. Finally, I've found some unexpected uses for a DMD, including easing some of the overhead of remote development work.
In the next article of this series, I will discuss the other part
of the story: how to fix problems that arise when you aren't in front
of your workstation. I'll show you how to write a very simple message
routing mailbot, which I use to make sure that, no matter where I am,
monit alerts can find me. I'll also show you how to use the
Kyocera Palm phone with the GEORDI Palm utilities to remotely
administer a misbehaving free Unix box, wirelessly, from anywhere you
can make or receive a cell phone call.
The only thing that's more geeky-cool than using SSH to fix a problem server, remotely and wirelessly, while you're watching Attack of the Clones in your favorite movie house, is doing it without breaking a poor sysadmin's limited budget.
Kendall Clark is the Managing Editor of XML.com.
Copyright © 2009 O'Reilly Media, Inc.