Sometimes the simple, cheap solution wins out over cool technology.
At my ISP, we still use a tape backup system for long-term backups but we also have two identical disk drives in each server. A RAID-1 mirror would be the obvious way to get the data onto both drives to protect against failures. But what's more common in your experience -- a hard drive failure or accidentally deleting an important file?
Instead of using RAID-1, I use a Perl script called "synchro" to synchronize the drive pairs each night. In this article, I will present the reasons I decided to do it this way, and share my script with you.
RAID can increase performance, but only under the right conditions. For best results, more than two drives and SCSI controllers are usually the way to go. In my case, we have EIDE controllers. EIDE requires that the CPU do a lot of the work in data transfer, so the CPU becomes a bottleneck. In my tests of Linux software RAID-1 with EIDE drives, the performance hit was more than we could live with, so this is not really an option for us.
RAID-0 (striping) can increase available space, but does not provide increased reliability. With RAID-0 (and RAID-4 and -5 for that matter), data is striped across multiple drives to combine several physical partitions into one larger logical one. I use Linux software RAID-0 on two 40-GB drives to create a large filesystem to hold our NNTP news cache. In this application, reliability is not an issue because it's only a cache, so even losing the entire drive pair would only make reading news slower. Performance for the cache is not an issue because the total number of people accessing the news server simultaneously is never high. RAID-4 and -5 offer redundancy but require even more CPU time to implement in software.
RAID can increase reliability. With redundant configurations provided by RAID levels greater than 0, data is spread across multiple drives so that a single drive failure does not result in loss. I've used hardware RAID controllers in the past. I'd love to use something like a Vortex SCSI-RAID controller but our ISP operates on a small budget. I have found that being able to run down to a local discount store for replacement parts is far more practical than keeping emergency spares for exotic things like hardware RAID controllers on hand.
The complexity of a RAID setup (hardware or software) also makes more demands on the ISP staff; complicated systems can be very nerve-wracking when the phones are ringing because there is no server running to pick up the modem lines!
My server has survived two drive failures. Both times, Linux started emitting warning messages days before the drives failed, so we were ready with tape backups and replacement parts. Drives can fail suddenly, but they often give you lots of warnings. This reduces our need to have a RAID-1 mirror. Just keep an eye on those log messages!
By far, our most common problem has not been hardware failure. It has been human error. Files are deleted or incorrectly modified both by our own staff and by clients, and need to be restored quickly. In this case, a RAID system will not help. A "delete" command will instantly and efficiently remove the file from both drives in a mirror. You are still left with spinning the backup tapes, which can take hours.
I try to use revision control (RCS or CVS) for all system files. This allows backing out changes as long as everyone is consistent about checking in changes. Things still sometimes slip past us and this usually does not help with clients' files.
So my goals are to keep a backup filesystem online at all times to replace files that are accidentally modified or deleted, and to have a complete drive available to deal with less common hardware failures.
The solution that I came up with for my server was to replicate data to the second drive once a day. It's like a RAID-1 mirror that takes a day to copy the files.
This approach is not perfect. With RAID-1, the files on the recovery drive would always be up to date, but this system is as good as a daily tape backup. It also does not help when deleted/changed files go unnoticed for more than a day -- they will end up disappearing from the secondary drive. Just be aware this little script is a supplement to a good tape backup scheme, not a replacement for it.
The heart of the synchro script is the rsync command. What synchro does is
automatically pass the right arguments to rsync for any of my servers,
so that I don't have to build an rsync command file for each server.
First, some terms. A partition is a slice of a hard drive and is
referred to by a device name. In Linux, the partition names for the
first IDE drive are usually /dev/hda1, /dev/hda2, and so on. For a
SCSI drive, the names are /dev/sda1, /dev/sda2, etc. A filesystem is
a formatted partition. The mount command attaches a
filesystem somewhere in the directory hierarchy, and the filesystem is
then referred to by its "mount point." For example, the filesystem located in partition
/dev/hda7 could be mounted at /home and referred to as the /home filesystem.
I refer to a filesystem or partition containing original data as the source and the place to copy it as the destination.
synchro is written in Perl; any recent version (5.x or better) should work. It calls
some system commands, including mount and optionally fsck. You
will also need the rsync command, which is often not installed by default.
If you use a popular Linux distribution, it is on your CD-ROM. You can
also obtain it from the primary FTP site.
The beauty of using rsync is that it only copies the files that have
changed. If a given filesystem does not change much over a day then it
can be thousands of times faster than using a copy or tar command.
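To illustrate why skipping unchanged files saves so much time, here is a rough Python sketch of a "has this file changed?" test. It compares only size and modification time, which is the idea behind rsync's default quick check. The function name needs_copy is hypothetical, and this is an illustration only, not rsync's actual code.

```python
import os


def needs_copy(src_path, dst_path):
    """Return True if dst_path is missing or differs from src_path in size or mtime."""
    if not os.path.exists(dst_path):
        return True
    src = os.stat(src_path)
    dst = os.stat(dst_path)
    # Compare whole seconds only, since filesystems differ in timestamp precision.
    return src.st_size != dst.st_size or int(src.st_mtime) != int(dst.st_mtime)
```

A file that passes this test is never opened or read, so a filesystem that changes little in a day costs little more than a directory walk.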
'synchro' knows about different filesystems; I have tested it with the
usual Linux
Now when anyone uses the command
As distributed, synchro assumes that both your hard drives will be
partitioned the same way. I put one drive on /dev/hda and the other
on the second controller at /dev/hdc. So, for example:
Source filesystem    Partition    Backed up in
/                    /dev/hda1    /dev/hdc1
/home                /dev/hdc7    /dev/hda7
This system makes it easy for me to find things when I need
to recover a file. If a file is removed from /home, I can use
mount to see that the /home filesystem lives in /dev/hdc7 and then say
mount /dev/hda7 /mnt/synchro to temporarily make the backup copy
available. Normally all backup filesystems are left unmounted.
I put the code that determines the destination into a subroutine
called get_dest. If you have different requirements (such as different
drives than "a" and "c"), you can change the code in lines [70-94] to
customize it.
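The mapping in the table above can be sketched in a few lines. This is a Python illustration of the idea only, not the Perl code of get_dest itself; the regular expression and the error handling are assumptions about how such a swap might be written.

```python
import re


def get_dest(source):
    """Map a partition on hda to the same-numbered partition on hdc, and vice versa."""
    match = re.fullmatch(r"/dev/hd([ac])(\d+)", source)
    if match is None:
        # Assumption: anything other than hda/hdc is treated as an error.
        raise ValueError(f"unexpected partition name: {source}")
    drive, number = match.groups()
    other = "c" if drive == "a" else "a"
    return f"/dev/hd{other}{number}"
```

For example, get_dest("/dev/hdc7") gives "/dev/hda7", matching the /home row of the table. Supporting other drive letters would mean changing the character class and the swap rule.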
You can either explicitly pass the list of filesystems in on the
command line, or you can put them in a list in lines [45-52]. By default, I
look for /boot, /, /var, and /home. The command line overrides the built-in list.
synchro uses a built-in list called "extras" mostly to exclude things
that should not be copied, such as the /dev directory. The rsync
command does not handle the /dev directory gracefully! If you tell it
to copy /dev/hda1, for example, it tries to copy the entire unformatted
partition instead of just replicating the device file. When a
filesystem name matches an "extras" entry, the right-hand part (after the
=> symbol) is added to the rsync command.
The default extras in lines [55-58] work well for all my systems.
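As a rough picture of how such a lookup table feeds the rsync command, here is a Python sketch. The contents of EXTRAS, the function name rsync_command, and the base options (-a, -x, --delete) are all assumptions made for illustration; the script's real list and options may differ.

```python
# Hypothetical extras table: filesystem name -> additional rsync arguments.
EXTRAS = {
    "/": ["--exclude=/dev/*"],  # keep rsync away from the device files
}


def rsync_command(filesystem, mount_point="/mnt/synchro", verbose=False, no_copy=False):
    """Build the rsync argument list for one filesystem."""
    # Assumed base options: archive mode, stay on one filesystem,
    # and delete destination files that no longer exist in the source.
    command = ["rsync", "-a", "-x", "--delete"]
    if no_copy:
        command.append("-n")
    if verbose:
        command.append("-v")
    # Add the right-hand side of a matching extras entry, if any.
    command += EXTRAS.get(filesystem, [])
    # A trailing slash on the source means "copy the contents of this directory".
    source = filesystem.rstrip("/") + "/"
    command += [source, mount_point.rstrip("/") + "/"]
    return command
```

With this table, only the root filesystem picks up the exclude; /home and the others get the plain command.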
I use /mnt/synchro as a temporary mount point. The script creates this
directory if it does not exist. Change line [68] if you want to use a
different location.
If you run synchro with a -h for help you will get this output:
    This script synchronizes the partitions on two hard drives.
    Usage: synchro [options] [filesystem...]
      -d   dryrun - show commands that would be run
           without performing any actions
      -f   fsck - perform fsck commands on destination
           partitions
      -h   show this message and exit
      -n   pass -n option to rsync so that it will report
           without copying files
      -v   pass -v option to rsync so it will report
           while copying
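For readers who want to see how options like these and the default filesystem list fit together, here is a small Python sketch using argparse. It mirrors the help text above and the rule that the command line overrides the built-in list. The names parse_args and DEFAULT_FILESYSTEMS are hypothetical, and the real script does this in Perl.

```python
import argparse

# Built-in list used when no filesystems are named on the command line.
DEFAULT_FILESYSTEMS = ["/boot", "/", "/var", "/home"]


def parse_args(argv):
    """Parse synchro-style options; argparse supplies -h automatically."""
    parser = argparse.ArgumentParser(
        prog="synchro",
        description="This script synchronizes the partitions on two hard drives.",
    )
    parser.add_argument("-d", dest="dryrun", action="store_true",
                        help="show commands that would be run without performing any actions")
    parser.add_argument("-f", dest="fsck", action="store_true",
                        help="perform fsck commands on destination partitions")
    parser.add_argument("-n", dest="no_copy", action="store_true",
                        help="pass -n to rsync so that it reports without copying files")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="pass -v to rsync so that it reports while copying")
    parser.add_argument("filesystems", nargs="*", metavar="filesystem")
    options = parser.parse_args(argv)
    if not options.filesystems:
        # Nothing given on the command line: fall back to the built-in list.
        options.filesystems = list(DEFAULT_FILESYSTEMS)
    return options
```

Calling parse_args(["-v", "/home"]) selects only /home, while parse_args([]) falls back to the four default filesystems.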
When I install synchro on a new system, I first run it with -d to see
what commands it will execute. If they look okay, then I run it once
manually to copy everything. Then I run it again with -v. This time, it
will report on what files if any have changed.
Because synchro will never back up the /dev files, I use a tar command
pipeline during setup to copy the /dev files. Usually this is a
one-time thing because /dev files don't normally change unless you
change your hardware. Here is the command:
    mount /dev/hdc1 /mnt/synchro
    tar cvf - /dev | (cd /mnt/synchro && tar xpf -)
After I am satisfied that it's working correctly, I put an entry into
/etc/crontab to run it once a day. I use the -f option, so that the
destination filesystems are checked every time it's run. I made fsck a
command-line option so that you aren't forced to run it if you don't
want to.
If I am about to perform major changes, such as removing an account,
sometimes I will make a copy of /home using the command-line mode,
such as
    synchro -v /home
The -v is passed on to rsync so that it will list out the files that
are changed.
Here is an outline of what synchro does. Line numbers are in brackets.
- Optionally check the destination partition with the fsck command. [132-139]
- Use rsync to synchronize content. [146-150]
That's it. Also of note in the script is the syscmd() subroutine in
lines [158-176]. All system commands are routed through here to make
it easy to run the script in "dryrun" mode. If -d is given as a
command-line argument, the command will be printed in syscmd, but not
executed.
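The dry-run idea is easy to sketch. The Python below is an illustration, not the script's Perl: the function sync_filesystem is hypothetical, the mount and umount steps are assumed from the /mnt/synchro description earlier, the rsync options are assumptions, and fsck options are omitted. Printing the command in normal mode as well as in dry-run mode is also an assumption.

```python
import subprocess


def syscmd(command, dryrun=False):
    """Print a command, then run it unless dryrun is set. Returns the exit status."""
    print(" ".join(command))
    if dryrun:
        return 0
    return subprocess.call(command)


def sync_filesystem(filesystem, dest, fsck=False, dryrun=False, mount_point="/mnt/synchro"):
    """Check (optionally), mount, synchronize, and unmount one destination partition."""
    if fsck:
        # Exit status ignored in this sketch; fsck reports repairs with nonzero codes.
        syscmd(["fsck", dest], dryrun)
    if syscmd(["mount", dest, mount_point], dryrun) != 0:
        # Never let rsync run against an unmounted mount point.
        raise RuntimeError(f"could not mount {dest} on {mount_point}")
    try:
        source = filesystem.rstrip("/") + "/"
        status = syscmd(["rsync", "-a", "-x", "--delete", source, mount_point + "/"], dryrun)
    finally:
        # Leave the backup filesystem unmounted, even if rsync failed.
        syscmd(["umount", mount_point], dryrun)
    return status
```

Because every external command goes through one function, a single flag is enough to turn the whole run into a listing of what would happen, for example sync_filesystem("/home", "/dev/hda7", fsck=True, dryrun=True).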
I will readily admit I'd love to use hardware-supported RAID-1 in
addition to this daily rsync scheme, but my tiny IT budget just
does not allow it. I've used various incarnations of this script for a
number of years now. I hope you find it useful, too.
Brian Wilson wrote most of this article while sitting in the Marin headlands overlooking the Golden Gate Bridge. He claims that bicycles and laptops and corporate downsizing definitely have their advantages.
Copyright © 2009 O'Reilly Media, Inc.