We built our service from many existing pieces of software; the major ones are listed in Appendix H.
At a very high level, our service relied upon four key applications:
qpsmtpd is, in its own words, "a flexible SMTP daemon written in Perl".
Approximately 70% of our service implementation consisted of a set of qpsmtpd plugins which worked together to process mail. (The remaining 30% was split between the control panel and the glue code that bound the master machine to the MX-only hosts.)
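The idea of many small plugins cooperating on each message can be sketched as a chain of independent checks. This is a minimal Python illustration under stated assumptions, not qpsmtpd's API: real qpsmtpd plugins are Perl modules that hook individual SMTP stages. The `Message` type, the `HOSTED` set and both checks are hypothetical, and the sketch assumes the first rejecting check ends processing.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Message:
    sender: str
    recipient: str
    body: str


# A plugin inspects a message and returns a rejection reason, or None to pass.
Plugin = Callable[[Message], Optional[str]]

# Hypothetical set of domains this host accepts mail for.
HOSTED = {"example.org"}


def reject_unhosted_recipient(msg: Message) -> Optional[str]:
    # Hypothetical check: refuse to relay for domains we do not host.
    domain = msg.recipient.rpartition("@")[2].lower()
    return None if domain in HOSTED else "relaying denied"


def reject_keyword(msg: Message) -> Optional[str]:
    # Hypothetical content check; real SPAM tests are far more involved.
    return "spam keyword" if "viagra" in msg.body.lower() else None


def run_plugins(msg: Message, plugins: List[Plugin]) -> Optional[str]:
    """Run each plugin in order; the first rejection stops processing."""
    for plugin in plugins:
        reason = plugin(msg)
        if reason is not None:
            return reason
    return None  # every plugin passed, so the message is accepted


CHAIN: List[Plugin] = [reject_unhosted_recipient, reject_keyword]
```

Keeping each check in its own function means checks can be enabled per domain or reordered without touching the others, which is the property the plugin design is meant to provide.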
Had qpsmtpd not already existed our service would never have been created.
Exim is the MTA (mail transfer agent) we used to deliver valid, non-SPAM messages to their final destination. This is discussed in the section on delivering non-SPAM email.
Exim was chosen primarily because it was the default MTA on our platform of choice, Debian GNU/Linux, but also because it was flexible enough to deliver mail without consulting MX records. That flexibility was essential: every domain we hosted had its MX records pointing at our own servers, so a normal MX lookup would have routed the mail straight back to us.
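Delivering without MX records amounts to keeping a static table from each hosted domain to the customer's real mail server. In Exim itself this kind of routing is usually expressed with a `manualroute` router; the Python below only models the lookup. The table contents and the `next_hop` name are hypothetical.

```python
from typing import Dict

# Hypothetical routing table: hosted domain -> customer's real mail server.
# The MX records for these domains point at us, so they cannot be used here.
DESTINATIONS: Dict[str, str] = {
    "example.com": "mail.example.com",
    "example.org": "192.0.2.25",
}


def next_hop(recipient: str) -> str:
    """Return the host to deliver an accepted message to, bypassing MX."""
    _local, at, domain = recipient.rpartition("@")
    if not at:
        raise ValueError(f"not an email address: {recipient!r}")
    try:
        return DESTINATIONS[domain.lower()]
    except KeyError:
        raise ValueError(f"{domain} is not a hosted domain") from None
```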
rsync is an application which synchronises files and directories across machines efficiently.
We used rsync internally to copy the control panel data from the master machine to the satellite MX hosts, and also to synchronise the archives of rejected SPAM mail.
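Pushing control panel data from the master to each satellite can be sketched as one rsync invocation per host. The flags shown are standard rsync options; the directory paths are hypothetical, and the host list simply reuses the incomingN machine names from the DNS discussion. With `dry_run=True` the commands are printed rather than executed.

```python
import shlex
import subprocess
from typing import Iterable, List

# Satellite names as used in the DNS discussion; paths below are hypothetical.
SATELLITES = ["incoming0.m-s.com", "incoming1.m-s.com",
              "incoming2.m-s.com", "incoming3.m-s.com"]


def rsync_command(src_dir: str, host: str, dest_dir: str) -> List[str]:
    # -a: archive mode (recursive, preserves permissions and times)
    # -z: compress data in transit
    # --delete: remove files on the satellite that are gone from the master
    # The trailing slash on the source copies the directory's contents.
    return ["rsync", "-az", "--delete",
            src_dir.rstrip("/") + "/", f"{host}:{dest_dir}"]


def push_to_satellites(src_dir: str, dest_dir: str,
                       hosts: Iterable[str] = SATELLITES,
                       dry_run: bool = True) -> List[str]:
    """Push src_dir to every host; return the hosts whose sync failed."""
    failed = []
    for host in hosts:
        cmd = rsync_command(src_dir, host, dest_dir)
        if dry_run:
            print(shlex.join(cmd))
            continue
        # One unreachable satellite should not stop the others being updated.
        if subprocess.run(cmd).returncode != 0:
            failed.append(host)
    return failed
```

Returning the failed hosts, rather than aborting on the first error, suits a design where each satellite is expected to keep working on slightly stale data until the next successful sync.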
MySQL is a popular database engine, in use by many companies and individuals around the world. We used MySQL for three things:
To store user accounts and account details.
To store session details for users who logged into the online control panel.
To store an index of the rejected messages in the quarantine.
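A quarantine index needs little more than one row per rejected message, searchable by domain. The sketch below uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs anywhere; the table and column names are hypothetical, including `stored_on`, which assumes the message body itself lives on the MX host that rejected it.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS quarantine (
    id          INTEGER PRIMARY KEY,
    domain      TEXT NOT NULL,
    sender      TEXT,
    recipient   TEXT NOT NULL,
    subject     TEXT,
    received_at TEXT NOT NULL,   -- ISO-8601 timestamp
    stored_on   TEXT NOT NULL    -- MX host holding the message file
);
CREATE INDEX IF NOT EXISTS quarantine_domain
    ON quarantine (domain, received_at);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn: sqlite3.Connection, domain: str, sender: str,
              recipient: str, subject: str, received_at: str,
              stored_on: str) -> None:
    conn.execute(
        "INSERT INTO quarantine (domain, sender, recipient, subject,"
        " received_at, stored_on) VALUES (?, ?, ?, ?, ?, ?)",
        (domain, sender, recipient, subject, received_at, stored_on))


def messages_for(conn: sqlite3.Connection, domain: str) -> List[Tuple[str, str, str]]:
    """Newest-first (recipient, subject, stored_on) rows for one domain."""
    return conn.execute(
        "SELECT recipient, subject, stored_on FROM quarantine"
        " WHERE domain = ? ORDER BY received_at DESC", (domain,)).fetchall()
```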
It is worth being explicit that the list of domains hosted by our service was not stored in MySQL, and nor was the list of checks enabled for each hosted domain.
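The text does not say how the domain list and per-domain checks were stored, only that it was not in MySQL. Since control panel data was copied to the satellites with rsync, plain files are one plausible format. The sketch below assumes a hypothetical layout of one directory per domain containing a `checks` file with one check name per line; the check names in the usage are also hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def load_domains(root: Path) -> Dict[str, List[str]]:
    """Return {domain: [enabled check names]} from a hypothetical layout:
    root/<domain>/checks, one check per line, '#' starting a comment."""
    domains: Dict[str, List[str]] = {}
    for entry in sorted(root.iterdir()):
        if not entry.is_dir():
            continue  # stray files are not domains
        checks: List[str] = []
        checks_file = entry / "checks"
        if checks_file.exists():
            for line in checks_file.read_text().splitlines():
                line = line.strip()
                if line and not line.startswith("#"):
                    checks.append(line)
        domains[entry.name.lower()] = checks
    return domains
```

A layout like this lets each MX host read its configuration locally, with no database connection back to the master.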
Our service was provided by a number of hosts in different locations, operated by different hosting companies. In total we used about six machines for different purposes, but for simplicity we'll limit our documentation to the following hosts:
Looking back at the overview of our service, it should be clear that master.mail-scanning.com was its cornerstone: it hosted the control panel and was also where the online quarantine of rejected mail was stored.
To offer redundancy we ensured that multiple hosts were configured to act as satellite MX machines. These MX-only hosts were where incoming mail was routed and where the SPAM testing occurred. Each MX machine tested mail independently of every other host. If a given email was judged to be valid, it was delivered directly to its final destination. No constant link to the master machine was needed; the MX hosts were explicitly designed to operate without one.
Users would begin using our service by pointing their MX records at two names in DNS: incoming.m-s.com and backup.m-s.com.
These names would point to the actual MX machines. For example, incoming.m-s.com would resolve to the machines incoming0.m-s.com and incoming1.m-s.com, leaving backup.m-s.com to resolve to the machines incoming2.m-s.com and incoming3.m-s.com.
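From a sending server's point of view, the two names give an ordered list of machines to try. This sketch assumes hypothetical MX preference values of 10 for incoming.m-s.com and 20 for backup.m-s.com, and a hypothetical customer domain; real A records would return IP addresses, which are shown here as machine names for readability.

```python
from typing import Dict, List, Tuple

# Illustrative DNS data. The preferences (10, 20) and the customer domain are
# assumptions; real A records would hold IP addresses, not machine names.
MX_RECORDS: Dict[str, List[Tuple[int, str]]] = {
    "customer.example": [(20, "backup.m-s.com"), (10, "incoming.m-s.com")],
}
A_RECORDS: Dict[str, List[str]] = {
    "incoming.m-s.com": ["incoming0.m-s.com", "incoming1.m-s.com"],
    "backup.m-s.com": ["incoming2.m-s.com", "incoming3.m-s.com"],
}


def delivery_order(domain: str) -> List[str]:
    """Machines a sending server would try, most preferred first."""
    order: List[str] = []
    # Lower MX preference values are tried first.
    for _pref, name in sorted(MX_RECORDS[domain]):
        order.extend(A_RECORDS[name])
    return order
```

Real senders typically shuffle addresses of equal preference rather than always using the same order, which spreads load across incoming0 and incoming1; the sketch keeps the order fixed for clarity.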
In terms of risk, things were set up pretty cleanly:
If the master machine failed then users couldn't make changes to the setup of their domains, or view their quarantine.
If the master machine failed the MX machines would still continue to capture SPAM, and deliver valid mail. The service would essentially continue to run.
If a single MX machine failed then the others would continue to run, and incoming email would continue to be accepted and processed.
In short, if a single MX machine, or even a pair of them, failed, the service as a whole would be unaffected. The failure of the central master machine would have had a more significant impact on operations, but it could be re-imaged from scratch very quickly.
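The failure cases listed above can be summarised as a small function from the set of failed machines to what still works. This is an illustrative model only: it assumes any surviving MX host is enough to keep accepting mail, and that the control panel and quarantine depend solely on the master.

```python
from typing import Dict, Iterable

MASTER = "master.mail-scanning.com"
MX_HOSTS = {"incoming0.m-s.com", "incoming1.m-s.com",
            "incoming2.m-s.com", "incoming3.m-s.com"}


def service_status(failed: Iterable[str]) -> Dict[str, bool]:
    """Which parts of the service survive a given set of machine failures."""
    failed = set(failed)
    master_up = MASTER not in failed
    return {
        # Any surviving MX host can accept, test and deliver mail on its own.
        "accepting_mail": bool(MX_HOSTS - failed),
        # Domain changes and quarantine browsing need the master.
        "control_panel": master_up,
        "quarantine": master_up,
    }
```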
We could make changes to domain settings without needing the control panel host to be online, by applying them directly to each satellite MX machine via rsync or ssh.
We had an off-site backup of the quarantine data which would be no more than 24 hours out of date. We could restore this easily and then add the missing data directly from the local backups on the satellite MX hosts.
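The restore described above is a merge: start from the off-site copy, then add whatever the satellites recorded since. This sketch assumes, hypothetically, that each quarantine entry is keyed by a unique message ID, and that the backup copy wins when both sources hold the same entry.

```python
from typing import Dict, Iterable, Tuple

Entry = Dict[str, str]


def restore_quarantine(backup: Dict[str, Entry],
                       local_indexes: Iterable[Dict[str, Entry]]
                       ) -> Tuple[Dict[str, Entry], int]:
    """Start from the off-site backup, then add any entries it is missing
    from the satellites' local copies. Returns (index, number added)."""
    restored = dict(backup)
    added = 0
    for index in local_indexes:
        for msg_id, entry in index.items():
            if msg_id not in restored:
                restored[msg_id] = entry
                added += 1
    return restored, added
```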