This article is the second and final installment describing my efforts to defend my systems from spam. The first article explains some necessary concepts and terminology. This article will dig more into the details of an actual implementation with my mail system. One thing to note is that I used qmail for my mail system (hence the title), but the information in here could apply to just about any email server in production today.
In the previous article, I covered the history and the protocol used by network level spam defenses but not the existing landscape of RBL Providers that supply blocklists. There are quite a few of them out there, and I needed to select one for my system. At first, I polled a few friends to see what they were doing. Most had tried a blocklist at some point. I found that some people mix various blocklists and usually don't trust them enough for their corporate machines. Some switch providers periodically since the quality isn't stable. One friend had to get new blocks of IP addresses for his service since he was in the same network block as a spammer. This form of collateral damage caused him to be very negative about the whole subject.
The biggest problem with a network level defense has been the providers. Although they all conform to the protocol, they vary wildly on what goes into their blocklist database. This is a direct result of how they came to be. Almost all of them are grassroots efforts by individuals or small groups, each with different opinions and policies about how they operate. Some groups are very private and do not disclose much about their members or their policies. Some of them have very sensitive trigger fingers and others do not. Some groups are very aggressive about adding to their lists while others are extremely lax.
Without a good policy and consistent operation, I was leery of letting anybody control my mail server. It is easy to understand why it's important to be cautious about this. No one wants to disrupt their incoming mail. If an RBL provider consistently blocks the wrong hosts, it might take a while to fix. I might have to build my own special list of exceptions to their blocklist. This seemed like the kind of work I didn't want to add to my already long list of duties.
Towards the end of my research, I had one conversation that turned out to be really interesting. I used to work with a guy named Mark Fletcher at eGroups. He had been working on the spam problem for a little while and we'd come to the same conclusions. However, he had decided that he had a new twist and he started a service called Trustic to provide it.
Fletcher decided to build a trust network for email servers on the Internet. Trust networks are becoming a popular technique for getting good results out of systems that could potentially include untrustworthy individuals (such as spammers). The best and most impressive examples of trust networks today are eBay, Google, and Advogato.
Like other trust systems, Trustic takes recommendations from registered users about IP addresses of other systems. Each user has a level of trust. Users build their trust by making accurate recommendations over time. In order for a host to become untrusted, the cumulative trust level of the recommendations has to be above a certain threshold. For example, if two well-trusted users mark a server as untrustworthy, that server would become untrusted. If a bunch of new users tried to mark a server as untrusted, it would take a large number of them. If someone abuses their trust rating, their trust level is reduced and becomes harder to raise. There are more descriptions on Trustic's web site, but that is the gist of the technique.
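To make the threshold idea concrete, here is a minimal sketch in Python. The trust values and the threshold are numbers I made up for illustration, and the function name is hypothetical. Trustic's actual weights and scoring rules are not described here, so treat this as a picture of the idea rather than their implementation.

```python
# Hypothetical threshold: the real value used by Trustic is not given in this article.
UNTRUSTED_THRESHOLD = 10.0


def is_untrusted(reporter_trust_levels, threshold=UNTRUSTED_THRESHOLD):
    """Return True when the summed trust of everyone reporting a host exceeds the threshold."""
    return sum(reporter_trust_levels) > threshold


# Two well-trusted users (made-up trust of 6.0 each) are enough to mark a host.
print(is_untrusted([6.0, 6.0]))
# Five brand-new users (made-up trust of 0.5 each) are not.
print(is_untrusted([0.5] * 5))
```

With these made-up numbers, it would take more than twenty new users to do what two established users can do, which is the property that makes the list hard to abuse.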
The trust system provides a blocklist that allows the many good users of the Internet to mark things as trusted or untrusted very quickly. Unlike a lot of other RBL providers, Trustic also added a good web interface, email reports, and other features that make this system easy to use. As a result of these policies and the web interface, it was easy to entrust email blocking to Mark's system. As an added bonus for me, he incorporated a lot of my feedback into the design before he fully launched in January of 2003.
To use his service, I logged in and got an account. The registration gave me a number to use when making an IP4R query of the system, when forwarding spam to the Trustic system, or when sending and receiving recommendations. The number allows Trustic to tailor its responses to the particular user. Each user can have his own blocklist rules. Originally, Trustic relied on the registrant's IP address for submissions or queries. This didn't work with the dynamic IP addresses assigned by most ISPs.
Now that I have a provider for blocklist information, how do I use it with my mail server? Well, before we get to that, we must go over qmail a bit in case there are any readers who aren't yet comfortable with qmail configuration. Once that little step is out of the way, it is a lot easier to understand the actual steps involved to integrate with Trustic or some other IP4R provider. In fact, it took me much more time to write this article than to install and test this setup. (If you don't run qmail, you can skip this section, but if you are curious, it shouldn't be a hard read.)
qmail, written by Dan Bernstein (DJB), follows a few simple principles
in its design. Once you understand these, understanding what I'm about to
describe becomes much easier. In order of priority, DJB wanted qmail to
be secure, extensible, reliable, and fast. One of the first design
decisions was to divide the email system into several smaller
programs. This is the same successful technique that the original Unix
architects used when building a text processing system from commands like
grep, cut, or cat. Each small
program that makes up qmail does just one particular task and usually runs
as a less-privileged user. Each tries to do the absolute minimum required
for that particular task. This keeps the programs from "becoming too many
things to too many tasks", which usually causes programs to look like
"spaghetti code".
Because the programs and code are smaller and simpler, the code is easier to both debug and audit. Also, when the programs do interact with each other, they use a simple pipeline API. Such a system allows you to insert or remove different pieces from the pipeline for your specific situation. Finally, the programs are designed to be efficient with the operating system's resources (CPU and memory). This has a nice side effect of making the system fast even on older systems.
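To summarize, qmail is built from small, single-purpose programs that run with minimal privileges, talk to each other through a simple pipeline API, and use system resources sparingly.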
As a result of qmail's design, it was an easy choice for many of the Internet services I worked at in the past. It was easy to scale the package to handle enormous mail loads. It was free and it worked on our favorite operating systems. It also won the security contest hands down compared to any other mail package. Naturally, I would end up running the system for my own personal mail system. I have been running the same code for about 5 years without recompiling. I can't remember how many times I've seen other mail systems get listed on CERT or Bugtraq. I've never had any issues with qmail, and I like that a lot. It just runs and runs.
Now that we have some of that light background on the philosophy of qmail, let's see how it expresses itself through an incoming email.
qmail, like most email systems, is all about moving email into and out
of various queues for delivery. qmail has one queue where it receives all
incoming email to be analyzed for delivery. Email is injected into that
queue via a program called qmail-queue. True to the DJB
philosophy, that program ensures that email gets into the
incoming queue safely and securely. It will not respond with a success
code unless every step went properly. It is the responsibility of other
small programs to receive mail via SMTP, QMQP, or the command line, and
then properly feed the mail to qmail-queue. For our case,
we're only concerned with the SMTP handler, since that is how systems
receive mail from the internet. The specific qmail program that handles an
SMTP transaction is qmail-smtpd.
Since qmail programs are designed to do the minimum necessary, the
qmail-smtpd program just handles the SMTP conversation on a
socket. To be clear, it will not set up the socket for listening or wait
for connections. It relies on some other program to do those things. This
is an interesting design choice because it allows us to insert other
programs to perform checks in front of qmail-smtpd.
In the past, most servers on Unix would use the
inetd daemon to set up and listen on sockets. qmail, being a
program from that era, was probably just conforming to that norm. These
days, it is rare for a server to use inetd since it is
poor at handling lots of connections or a hostile Internet. The
interesting thing here is that the older design style allows us to add new
filtering capabilities to qmail without having to change any of the
existing qmail code. So in order to solve the inetd
deficiencies and to keep the system extensible, DJB wrote another small
program called tcpserver. Its sole purpose is to perform
socket setup, filtering, and listening. It has several parameters for
setting TCP options, checking a simple list of IP addresses to block, or
performing anti-spoofing checks. When a connection does arrive,
tcpserver sets some environment variables and starts the
program specified on its command line. The main environment variable we
are concerned with is TCPREMOTEIP, which contains a string
representation of the remote host's IP address.
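The handoff described above can be sketched in a few lines of Python. This only illustrates the environment-variable part of the mechanism: the real tcpserver also hands the connected socket to the next program, which this sketch leaves out. The function name and the sample address are my own, chosen for illustration.

```python
import os
import subprocess
import sys


def run_next_program(remote_ip, argv):
    """Run argv with TCPREMOTEIP set in its environment and return what it printed."""
    env = dict(os.environ, TCPREMOTEIP=remote_ip)
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=True)
    return result.stdout


# Stand-in for the "next program": it only prints the variable it inherited.
child = [sys.executable, "-c", "import os; print(os.environ['TCPREMOTEIP'])"]
print(run_next_program("192.0.2.10", child))
```

The point is that the next program never has to know how the connection arrived. It reads what it needs from its environment, which is what makes it possible to slip another program into the chain.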
Normally, in a qmail installation, tcpserver will run
qmail-smtpd as its next program in the pipeline. However, we
want to filter our incoming SMTP connections before we accept an
email. Therefore we need something to query our Trustic account via IP4R
before we accept an email via TCP. As luck would have it, DJB wrote a
program to do just this. That program is called rblsmtpd.
The rblsmtpd program, as you'd expect, does one basic
thing. It checks the incoming connection using IP4R. If all goes well, it
runs a program specified on its command line. This is that same basic
pipeline API of handing off the socket. In order for rblsmtpd
to do its job, it checks the TCPREMOTEIP environment
variable, making an IP4R query against some system specified on the
command line. Depending on the outcome of that and the influence of a few
command line arguments, it will either end the SMTP transaction or run the
next program. In our case the next program to run will be the
qmail-smtpd.
rblsmtpd has a few command line arguments with important
semantics, so I'll cover them in detail here. (These
descriptions are straight from the documentation.) They control how your
system deals with the success or failure of an IP4R request. That success or
failure in turn determines how and when mail will be rejected. Obviously,
this is important.
Command line options for rblsmtpd:
-r base: use base as an IP4R/RBL source. An IP address a.b.c.d is listed by that source if d.c.b.a.base has a TXT record. rblsmtpd uses the contents of the TXT record as an error message for the client.
-B: (default) use a 451 error code for IP addresses listed in the RBL.
-b: use a 553 error code for IP addresses listed in the RBL.
-C: (default) handle RBL lookups in a "fail-open" mode. If an RBL lookup fails temporarily, assume that the address is not listed.
-c: handle RBL lookups in a "fail-closed" mode. If an RBL lookup fails temporarily, assume that the address is listed (but use a 451 error code even with -b).
For my system, I chose the following change to my tcpserver's startup line:
rblsmtpd -b -c -r 1234567.query.trustic.com \
/var/qmail/bin/qmail-smtpd
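To see what name actually gets looked up, here is a small Python sketch of the d.c.b.a.base rule from the option list. The function name is hypothetical. The address and the base are simply the ones that appear elsewhere in this article, and the sketch only builds the name without performing the DNS lookup.

```python
import ipaddress


def ip4r_query_name(remote_ip, base):
    """Build d.c.b.a.base from the dotted-quad address a.b.c.d."""
    # IPv4Address() rejects anything that is not a valid IPv4 address.
    octets = str(ipaddress.IPv4Address(remote_ip)).split(".")
    return ".".join(reversed(octets)) + "." + base


print(ip4r_query_name("200.163.45.155", "1234567.query.trustic.com"))
# 155.45.163.200.1234567.query.trustic.com
```

If a TXT record exists at that name, the address is listed for the account identified by the number at the front of the base.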
In order to understand the effects of these, let me give a quick primer on mail rejection or bouncing. There are really only two forms of mail bounces, in layman's terms, "Hard Bounce" and "Soft Bounce". A Hard Bounce causes the sending mail system to give up immediately on delivering an email, generating an error email or "bounce" to be delivered to the sender. A Soft Bounce causes the sending mail system to give up temporarily on delivering an email. The sending mail server may then try again after an hour or some other specified timeout. If the mail continues to get a "Soft Bounce" code from the receiving mail server for some specified time (between two days and a week), the sending mail server will give up and deliver a bounce message. In both cases, the sending mail server takes responsibility for generating or delivering the bounce.
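In SMTP terms, the two kinds of bounce map onto reply code ranges: 4xx replies are temporary failures and 5xx replies are permanent ones. The Python sketch below shows that mapping with the two codes rblsmtpd uses. The function name and the "soft"/"hard" labels are just my shorthand for this article's terms.

```python
def bounce_kind(smtp_code):
    """Classify an SMTP reply code the way a sending mail server would treat it."""
    if 400 <= smtp_code <= 499:
        return "soft"  # temporary failure: keep the message queued and retry later
    if 500 <= smtp_code <= 599:
        return "hard"  # permanent failure: give up and bounce to the sender
    return "none"      # 2xx and 3xx replies are not failures


print(bounce_kind(451))  # soft
print(bounce_kind(553))  # hard
```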
In the above configuration, I chose the -b and
-c command line parameters. With -b, I'm
choosing to cause a Hard Bounce to any email from an untrusted host. This
is important because I want senders to know immediately if there is a
problem. With the -c parameter, I'm choosing to Soft Bounce
if Trustic should happen to have any downtime. I wouldn't want to allow
spam into my system just because of a small outage and I don't mind
delaying that email. Besides, I monitor my systems so I could change this
if it ever happened for a long time. If someone were blocked, they would
know immediately that the email delivery failed.
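The interaction between the two flags is easier to see written out. The Python sketch below models the behavior described in the option list. It is not rblsmtpd's code, and the names (Lookup, reply_code, hard_reject, fail_closed) are my own.

```python
from enum import Enum


class Lookup(Enum):
    NOT_LISTED = "not listed"
    LISTED = "listed"
    TEMP_FAILURE = "temporary lookup failure"


def reply_code(lookup, hard_reject=False, fail_closed=False):
    """Return the SMTP rejection code, or None to hand off to the next program.

    hard_reject models -b (553 instead of 451); fail_closed models -c.
    """
    if lookup is Lookup.LISTED:
        return 553 if hard_reject else 451
    if lookup is Lookup.TEMP_FAILURE and fail_closed:
        return 451  # always a temporary error, even when -b is given
    return None


# The -b -c combination from my startup line:
for outcome in Lookup:
    print(outcome.value, "->", reply_code(outcome, hard_reject=True, fail_closed=True))
```

With both flags set, a listed host gets 553, a failed lookup gets 451, and everything else goes on to qmail-smtpd. With the defaults, a failed lookup would let the mail through instead.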
Finally, Trustic provides the optional TXT record that the IP4R
protocol specifies (see previous article on the IP4R protocol). The
rblsmtpd program will include that message in the SMTP error
code. The sending mail servers will then copy that SMTP error into the
bounce message to the sender. Trustic puts a URL into the message so the
sender can go to a site and read the details on why the block
occurred.
Here is an example from my logs:
553 Message rejected, please see
http://www.trustic.com/help/bounce?ip=200.163.45.155
As always, test your configuration once it is in place. You wouldn't want to lose mail for half a day because you didn't test something.
After I set the system up, not much happened. I sent recommendations and only some of them blocked spam. The service was too new to be effective. However, after about fifty users joined, things started to happen. Soon I was blocking about 8 to 10 spam messages a day without making those recommendations myself. After a few weeks, the real test came. I got hit with 1500 email attempts to my server. The attack wasn't as severe as the previous one, but it could have caused the same problems. This time, however, it produced zero load on my system. I was dropping all of the requests. I didn't even know it was occurring until I checked my logs for that day. Success!
After that, another interesting thing happened. After a while, my recommendations were having a good effect on the other mail systems using Trustic. I could see in the reports that some of my recommendations were blocking hundreds of spam emails from being delivered to other systems. It felt really good to give back to the others that helped block spam for me.
I hope these two articles have achieved their goal of providing some good coverage of network level spam defenses. From my own recent experiences, I have seen how the use of some simple, existing protocols and a trust network could become a serious deterrent to the new spam attacks. In the future, these defenses may protect us from more than just spam. I look forward to seeing more people joining in and applying these systems to their own networks.
Dru Nelson has been on the Internet since 1988. After starting an ISP in Florida, he moved to the San Francisco Bay area and has been involved with large Internet infrastructure at companies like Four11 (Yahoo Mail), eGroups (Yahoo Groups), and Plaxo. He is now the CTO and co-founder of BrightRoll.com.
Copyright © 2009 O'Reilly Media, Inc.