In many ways the testing of incoming messages was the hard part, but the coding was the simplest thanks to our use of qpsmtpd.
We ran many tests against each incoming message to determine whether it was SPAM or a valid message. Each test was performed independently of the others, and we'd skip any remaining tests once we'd concluded that a message was SPAM.
When you're dealing with a large volume of messages it makes sense to order the tests so that you spend as few resources as possible - so we'd always perform cheap tests such as DNS lookups ahead of computationally more expensive tests such as virus-scanning.
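The ordering idea can be sketched roughly as follows. This is a minimal illustration, not our actual qpsmtpd configuration: the `Check` type, the `first_failure` helper, the relative cost numbers and the stub tests are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional


class Check(NamedTuple):
    name: str
    cost: int                     # relative cost (assumed values); lower runs first
    test: Callable[[dict], bool]  # returns True when the message looks like SPAM


def first_failure(checks: List[Check], message: dict) -> Optional[str]:
    """Run checks cheapest-first and stop at the first one that flags SPAM."""
    for check in sorted(checks, key=lambda c: c.cost):
        if check.test(message):
            return check.name
    return None


ran = []  # records which stub tests actually executed


def make_stub(name: str, verdict: bool) -> Callable[[dict], bool]:
    """Build a stand-in test that logs its own execution."""
    def test(message: dict) -> bool:
        ran.append(name)
        return verdict
    return test


checks = [
    Check("virus-scan", 100, make_stub("virus-scan", False)),
    Check("dnsbl", 5, make_stub("dnsbl", True)),
    Check("reverse-dns", 1, make_stub("reverse-dns", False)),
]
result = first_failure(checks, {"ip": "192.0.2.1"})
```

In this run the expensive virus-scan stub is never called, because the cheaper DNSBL stub has already flagged the message.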
Assuming none of the tests were disabled for a given domain we'd generally perform the following tests:
We'd perform various tests at connection time to see whether the connecting host had a reverse DNS entry, and whether it looked like a residential IP address.
In a lot of cases we could determine with a pretty high degree of accuracy whether an incoming message was going to be SPAM with nothing more than the IP address of the connecting host.
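As an illustration of what a connection-time test of this kind might look like, here is a sketch that classifies a host from its IPv4 address and reverse DNS name. The heuristics (a hostname that embeds the address octets, or contains words such as "dsl" or "dynamic") are assumptions made for the example, not a record of the rules we actually used, and the `classify_host` name is hypothetical.

```python
import re
from typing import Optional

# Assumed hints that a hostname belongs to a dynamically assigned address.
DYNAMIC_HINTS = {"adsl", "dsl", "dhcp", "dialup", "dyn", "dynamic",
                 "pool", "ppp", "cable"}


def classify_host(ip: str, rdns: Optional[str]) -> str:
    """Return 'no-rdns', 'residential' or 'ok' for an IPv4 address."""
    if not rdns:
        return "no-rdns"
    host = rdns.lower()
    # Heuristic 1: every octet of the address appears as a number in the name.
    numbers = {int(n) for n in re.findall(r"\d+", host)}
    if all(int(octet) in numbers for octet in ip.split(".")):
        return "residential"
    # Heuristic 2: a label fragment matches one of the dynamic-address hints.
    words = set(re.split(r"[^a-z]+", host))
    if words & DYNAMIC_HINTS:
        return "residential"
    return "ok"
```

The reverse DNS name itself would come from a PTR lookup, which is left out here so that the sketch has no network dependency.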
Similar to the home-made IP lookup tests mentioned previously, each domain could list a number of zones against which to perform DNSBL lookups.
Each client connecting to our servers would identify itself with HELO, or EHLO, and we'd test to see whether that was valid. (See Section 5.6 for details of our specific HELO tests.)
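A DNSBL lookup is conventionally made by reversing the octets of the IPv4 address and prepending them to the zone name; if the resulting name resolves, the address is listed. The sketch below shows that construction. The zone name `bl.example.org` and the function names are placeholders, and the resolver is passed in as a parameter so the logic can be exercised without touching the network.

```python
import ipaddress
import socket
from typing import Callable, List


def dnsbl_query_names(ip: str, zones: List[str]) -> List[str]:
    """Build one DNSBL query name per zone for an IPv4 address."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return [f"{reversed_octets}.{zone}" for zone in zones]


def listed_zones(ip: str, zones: List[str],
                 resolve: Callable[[str], str] = socket.gethostbyname) -> List[str]:
    """Return the zones in which the address is listed."""
    hits = []
    for zone, name in zip(zones, dnsbl_query_names(ip, zones)):
        try:
            resolve(name)   # any answer means "listed"
        except OSError:     # socket.gaierror (no such name) is an OSError
            continue
        hits.append(zone)
    return hits


def fake_resolver(name: str) -> str:
    """Stand-in resolver: only one name is 'listed'."""
    if name == "2.0.0.127.bl.example.org":
        return "127.0.0.2"
    raise OSError("not listed")


names = dnsbl_query_names("127.0.0.2", ["bl.example.org"])
hits = listed_zones("127.0.0.2", ["bl.example.org", "other.example.net"],
                    resolve=fake_resolver)
```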
Many badly coded clients would attempt to send a complete SMTP transaction down the connection without waiting for a greeting from our server, and this was a definitive indicator of SPAM.
For each incoming connection we'd keep track of the use of unrecognised SMTP commands, and invalid protocol usage. Invalid protocol usage included specifying RCPT TO without a valid MAIL FROM.
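One way to detect such a client is to wait briefly before sending the greeting and see whether any data has already arrived. The following sketch does this with `select`; the one-second default delay and the `is_early_talker` name are assumptions for illustration. Note that a socket also becomes readable when the peer closes the connection, which a real implementation would need to distinguish.

```python
import select
import socket


def is_early_talker(conn: socket.socket, wait_seconds: float = 1.0) -> bool:
    """Return True if the client sent data before we greeted it."""
    readable, _, _ = select.select([conn], [], [], wait_seconds)
    return bool(readable)


# Demonstration using a connected socket pair in place of a real SMTP client.
server_side, client_side = socket.socketpair()
quiet = is_early_talker(server_side, wait_seconds=0.05)  # nothing sent yet
client_side.sendall(b"HELO spammer.example\r\n")
eager = is_early_talker(server_side, wait_seconds=0.5)   # data is waiting
server_side.close()
client_side.close()
```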
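The bookkeeping described above might be sketched like this. The command list, the error threshold and the `SessionTracker` class are hypothetical, and for brevity any MAIL command is treated as a valid MAIL FROM, which a real server would verify.

```python
KNOWN_COMMANDS = {"HELO", "EHLO", "MAIL", "RCPT", "DATA",
                  "RSET", "NOOP", "QUIT", "VRFY"}


class SessionTracker:
    """Count protocol errors for a single SMTP connection."""

    def __init__(self, max_errors: int = 3) -> None:
        self.max_errors = max_errors  # assumed threshold
        self.errors = 0
        self.have_sender = False

    def observe(self, line: str) -> bool:
        """Record one command line; return True once the limit is reached."""
        parts = line.split()
        verb = parts[0].upper() if parts else ""
        if verb not in KNOWN_COMMANDS:
            self.errors += 1              # unrecognised command
        elif verb == "MAIL":
            self.have_sender = True       # simplification: not validated
        elif verb == "RSET":
            self.have_sender = False
        elif verb == "RCPT" and not self.have_sender:
            self.errors += 1              # RCPT TO before MAIL FROM
        return self.errors >= self.max_errors


tracker = SessionTracker(max_errors=2)
verdicts = [tracker.observe(cmd) for cmd in
            ("EHLO client.example", "RCPT TO:<a@example.com>", "FROB")]
```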
Each domain had an associated list of valid local-parts (i.e. the part before the @ sign in the email address).
Immediately discarding mail sent to non-existent addresses was a very cheap SPAM test, which thwarted dictionary attacks with minimal overhead.
We used the spambayes package to perform Bayesian analysis and filtering of each incoming message. (See Appendix D for details of problems this ran into.)
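A check of this kind amounts to a set lookup per recipient, which is why it is so cheap. A minimal sketch, assuming the per-domain lists are held in a dictionary and that local-parts are compared case-insensitively (both assumptions, as are the names and sample data):

```python
from typing import Dict, Set

# Hypothetical per-domain lists of valid local-parts.
VALID_LOCAL_PARTS: Dict[str, Set[str]] = {
    "example.com": {"alice", "postmaster"},
}


def recipient_exists(address: str, valid: Dict[str, Set[str]]) -> bool:
    """Return True if the address names a known mailbox on a known domain."""
    local, at, domain = address.rpartition("@")
    if not at or not local:
        return False
    return local.lower() in valid.get(domain.lower(), set())
```

A dictionary attack that guesses `bob@example.com`, `carol@example.com` and so on is rejected at RCPT TO time, before any message body is transferred.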
We used the ClamAV virus scanner to detect viral content sent by email.
Similar to the use of ClamAV to detect viruses, we also examined the MIME types of incoming messages and were able to reject messages with malformed data, or containing executable attachments.
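To illustrate the attachment part of this test, the sketch below walks the MIME parts of a message with Python's standard `email` package and flags executable-looking attachments. The blocked extensions and content types are example values rather than the lists we actually used, and the detection of malformed MIME data is not shown.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Example values only; a production list would be longer.
BLOCKED_EXTENSIONS = (".exe", ".scr", ".pif", ".bat", ".com")
BLOCKED_TYPES = {"application/x-msdownload", "application/x-msdos-program"}


def has_executable_attachment(raw: bytes) -> bool:
    """Return True if any MIME part looks like an executable."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        filename = (part.get_filename() or "").lower()
        if part.get_content_type() in BLOCKED_TYPES:
            return True
        if filename.endswith(BLOCKED_EXTENSIONS):
            return True
    return False


# Build two sample messages: one plain, one carrying an .exe attachment.
plain = EmailMessage()
plain["Subject"] = "hello"
plain.set_content("Just text.")

infected = EmailMessage()
infected["Subject"] = "invoice"
infected.set_content("See attached.")
infected.add_attachment(b"MZ", maintype="application",
                        subtype="octet-stream", filename="Invoice.EXE")

plain_flagged = has_executable_attachment(plain.as_bytes())
infected_flagged = has_executable_attachment(infected.as_bytes())
```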
As described in Section 5.5 we examined the hyperlinks inside message bodies to detect SPAM.
Although we didn't perform global DKIM signature checking we did perform it upon a small list of domains known to use it, including:
gmail.com + googlemail.com
This helped prevent the spoofing and phishing attacks that were especially common against the latter two domains.
If a sender domain listed SPF records in DNS we'd validate those to help determine if a message was legitimate or not.
TODO: Document more tests. Document explicit test ordering. Document the local global tests which wouldn't be disabled. (e.g. badmailfrom, date-header, etc.)