6.2. The quarantine structure

The layout of the quarantine evolved over time, but it never needed to be terribly complex. Each rejected spam message was kept on disk as a file in a Maildir hierarchy, and a MySQL database table stored an index of them.

In most cases the only time a user interacted with rejected messages was via the quarantine, so we only needed to show the messages in date order and allow basic searching. With that in mind, our database table was as simple as it could be.

The table holding the index looked much like this:

Example 6-3. Index of quarantine contents.


CREATE TABLE `q_archive` (
  `id`        int(11) NOT NULL auto_increment,
  `domain`    int(11) NOT NULL,
  `subject`   varchar(100) default '',
  `sender`    varchar(100) default '',
  `ondate`    datetime default NULL,
  `recipient` varchar(100) default NULL,
  `filename`  varchar(100) default NULL,
  PRIMARY KEY  (`id`),
  KEY `domain` (`domain`),
  KEY `subject` (`subject`),
  KEY `sender` (`sender`),
  KEY `recipient` (`recipient`),
  KEY `idx_i_s_s` (`id`,`sender`,`subject`)
);
 

This structure was sufficient to let us search the quarantine for messages addressed to a particular recipient, or sent from a given sender address. The quarantine didn't allow message bodies to be searched, although that would have been an obvious extension.
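The kind of lookup described above can be sketched as follows. This is a minimal illustration, not the service's actual code: it uses Python's built-in sqlite3 module with a cut-down copy of the table so that it runs standalone, and the function names, the sample rows, and the reading of `domain` as a numeric domain ID are all assumptions.

```python
import sqlite3

def make_index():
    """Create an in-memory stand-in for q_archive (SQLite, not MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE q_archive (
               id        INTEGER PRIMARY KEY AUTOINCREMENT,
               domain    INTEGER NOT NULL,
               subject   TEXT DEFAULT '',
               sender    TEXT DEFAULT '',
               ondate    TEXT,
               recipient TEXT,
               filename  TEXT
           )"""
    )
    return db

def search_quarantine(db, domain, recipient=None, sender=None):
    """Return (id, ondate, sender, recipient, subject) rows, newest first.

    Hypothetical helper: filters on one domain ID and, optionally, on an
    exact recipient and/or sender address.
    """
    sql = ("SELECT id, ondate, sender, recipient, subject "
           "FROM q_archive WHERE domain = ?")
    args = [domain]
    if recipient is not None:
        sql += " AND recipient = ?"
        args.append(recipient)
    if sender is not None:
        sql += " AND sender = ?"
        args.append(sender)
    sql += " ORDER BY ondate DESC, id DESC"
    return db.execute(sql, args).fetchall()

# Invented sample data, purely for illustration.
db = make_index()
db.executemany(
    "INSERT INTO q_archive (domain, subject, sender, ondate, recipient) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Cheap pills", "spammer@example.com", "2009-03-09 10:00:00", "user@hosted.org"),
        (1, "You won", "lottery@example.net", "2009-03-10 08:30:00", "user@hosted.org"),
        (1, "Hello", "spammer@example.com", "2009-03-10 09:00:00", "other@hosted.org"),
    ],
)

for row in search_quarantine(db, 1, recipient="user@hosted.org"):
    print(row)
```

Both filters are exact matches on indexed columns, which is all the `recipient` and `sender` keys in the table are needed for; the results are ordered by date to match the way the quarantine was displayed.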

When a message was imported into the quarantine it was copied from the location it had initially been synchronized to (something like /home/secondaries/incomingN.mail-scanning.com/1-2-2009/hosted.org/new/abcdef123) to its final destination beneath /reject, where it would live until it was rotated out of the quarantine.

As with most of the archives we maintained, the rejection hierarchy was organized into subdirectories named for both the date and the domain. This made it easy to expire the quarantine while keeping a consistent 31 days of history.
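To illustrate why date-named directories make expiry simple, here is a rough sketch in Python. It is not the original implementation. It assumes the directory names are unpadded day-month-year values (the text does not state the order), and the helper names are hypothetical. Only the /reject root and the 31-day retention period come from the description above.

```python
from datetime import date, timedelta
from pathlib import PurePosixPath

REJECT_ROOT = PurePosixPath("/reject")
RETENTION_DAYS = 31  # days of history to keep

def day_dir_name(day):
    """Format a date as an unpadded D-M-YYYY name (assumed field order)."""
    return f"{day.day}-{day.month}-{day.year}"

def domain_dir(day, domain):
    """Directory that would hold one domain's rejected mail for one day."""
    return REJECT_ROOT / day_dir_name(day) / domain

def parse_day_dir(name):
    """Parse a D-M-YYYY directory name; return None if it is not one."""
    parts = name.split("-")
    if len(parts) != 3:
        return None
    try:
        d, m, y = (int(p) for p in parts)
        return date(y, m, d)
    except ValueError:
        return None

def expired_dirs(names, today, keep_days=RETENTION_DAYS):
    """Return the day-directory names that are older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        day = parse_day_dir(name)
        # Names that do not parse as dates are left alone.
        if day is not None and day < cutoff:
            expired.append(name)
    return expired

print(domain_dir(date(2009, 8, 1), "hosted.org"))
print(expired_dirs(["1-8-2009", "2-8-2009", "20-8-2009"], date(2009, 9, 2)))
```

Because a whole day's mail for every domain sits under one top-level directory, expiring it comes down to removing the directories this function returns, with no need to inspect individual messages.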

Given a message addressed to user@hosted.org we'd insert the record into the database, which would give us an ID. That ID would be a number such as 12345. From that we'd place the message into the file:

(Assuming the current date was the 10th of March 2009.)

This would leave us with a /reject hierarchy looking something like this:

Example 6-4. Sample layout.


/reject/
|-- 1-8-2009
|   |-- hosted.org
|   `-- user.org
|
|-- 2-8-2009
|   |-- hosted.org
|   `-- user.org
..
...
 

This layout made it very easy to expire old messages from disk, a job that ran once a day at around 4 AM. Once the files had been removed from disk we'd update the q_archive table by running a query such as: