Appendix B. Significant directories

As noted throughout this document, our service was largely configured and operated via a collection of flat files and directory hierarchies.

The most important directory located upon any of the m-s.com hosts was /mf - that was the prefix for all of our code, templates, and plugins.

The reason for a common prefix was partly an implementation detail: the code behind our control panel and the collection of qpsmtpd plugins were both written in Perl, and each needed to load a series of library modules which abstracted away direct use of the /srv directory.

Using a common prefix for our library code meant we could easily load the modules like so:

Example B-1. Loading our modules from a qpsmtpd plugin.


#!/usr/bin/perl -Tw

use strict;
use warnings;
use lib "/mf/lib/";

use MF::Domain::Virus;
use Qpsmtpd::Constants;

...

Similarly, our CGI applications could load the same modules:

Example B-2. Loading our modules from a CGI application.


#!/usr/bin/perl -I/mf/lib

use strict;
use warnings;
use lib "/mf/lib/";

use CGI::Carp qw(fatalsToBrowser);
use DBI;
use HTML::Template;
use MIME::Lite;
use Net::DNS;
use MailScanning::Base;
use MF::Archive;
use MF::Delivery;
use MF::User;
use Singleton::Config;

...

Having all of our code located beneath a single directory tree also made it significantly easier to deploy from its master location. The full project was deployed on each host, regardless of that host's type. So even though Apache was never running on the satellite MX machines, each one still contained a full copy of the CGI applications, the associated HTML::Template templates, and the administrative commands.

In short, there were dependencies between the modules, the plugins, and the CGI applications we created, and the simplest solution was to place all our library modules beneath a common, known prefix.

The actual layout looked like this:

Example B-3. The layout of our installed codebase.


/mf/
|-- admin                 [Administrative-only command line scripts.]
|-- bin                   [Common command line scripts.]
|-- cgi-bin               [Where our CGI instance scripts were stored.]
|-- conf                  [Configuration files for our CGI applications.]
|-- etc                   [Configuration files for exim & qpsmtpd.]
|-- htdocs                [Web root for our CGI applications.]
|   |-- css
|   |-- images
|   `-- jquery
|-- lib                   [Common prefix for our libraries.]
|   |-- Base
|   |-- MF
|   |   |-- Archive
|   |   |-- Domain
|   |   `-- User
|   |-- MailScanning      [Library prefix for the CGI applications.]
|   `-- Singleton         [Library prefix for singleton objects.]
|-- logs                  [HTTP access logs.]
|-- plugins               [Qpsmtpd plugins.]
|-- templates             [Templates for use by the CGI applications.]
|   |-- admin             [Admin-Only templates.]
|   |-- emails            [Email templates.]
|   |-- invoices          [Invoice templates.]
|   |-- include           [Snippets included, e.g. common headers.]
|   |-- pages             [Templates for different "pages".]
|   `-- quarantine        [Templates for the quarantine.]
`-- tests                 [Test cases for our libraries and plugins.]
 

This document hasn't really delved into the implementation of the control panel CGI application, because this was largely a matter of interfacing between our abstraction libraries (which dealt with domain lists and domain-specific tests) and the user. Allowing users to view their domains, and to view or modify the tests applied to each one via an attractive online control panel, was and is pretty standard CGI coding.

Although it wasn't obvious from our URL scheme, we split the implementation of the control panel application into three parts, each of which was a distinct application written using the CGI::Application framework.

The name mf was a historical artifact from before our service was launched commercially. At that time it was just used for personal friends as a means of consolidating the SPAM filtering setup I'd managed for too many people individually. The site/service was hosted at mf.steve.org.uk then, and "mf" was a simple abbreviation for "mail filtering".

It should be clear at this point that when this document has talked about testing for files beneath /srv/hosted.org/checks, neither the plugin code nor the CGI applications ever actually ran a stat() directly. Instead all code worked indirectly via an MF::Domain abstraction object, or a subclass such as MF::Domain::DNSBL.

(In mail-scanning-lite we do stat() directly, but that's just for clarity.)
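
The indirection described above can be illustrated with a minimal sketch. The real MF::Domain interface is not reproduced in this document, so the package name Sketch::Domain, its constructor arguments, the checks_dir and has_check methods, and the "dnsbl" check name are all hypothetical. The only detail taken from the text is that tests are represented by files beneath /srv/hosted.org/checks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for MF::Domain; the real module's API is not
# shown in this document.
package Sketch::Domain;

use File::Spec;

# Construct an object for one hosted domain.  The "root" argument is an
# assumption, added so the sketch can be pointed at a scratch directory;
# it defaults to /srv.
sub new {
    my ( $class, %args ) = @_;
    my $self = {
        domain => $args{domain},
        root   => defined $args{root} ? $args{root} : '/srv',
    };
    return bless $self, $class;
}

# Path of the directory holding the per-domain test files.
sub checks_dir {
    my ($self) = @_;
    return File::Spec->catdir( $self->{root}, $self->{domain}, 'checks' );
}

# Returns 1 if a file for the named test exists, 0 otherwise.
sub has_check {
    my ( $self, $name ) = @_;
    my $path = File::Spec->catfile( $self->checks_dir, $name );
    return -e $path ? 1 : 0;
}

package main;

# Callers ask the object instead of testing a path themselves.
my $domain = Sketch::Domain->new( domain => 'hosted.org' );
print "dnsbl enabled\n" if $domain->has_check('dnsbl');
```

Keeping the path construction in one place means a plugin or CGI application never needs to know where the files live; if the on-disk layout changed, only the abstraction would need to be updated.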

B.1. Significant directories on the master host

The following directories were of particular importance to the operation of the service on the master machine:

/srv

This directory contained the master copy of all configuration settings for each hosted domain.

/home/spam

This directory contained the global Bayesian spam databases for each hosted domain. Any changes that were made due to training had to be made upon the master machine, and these would be synced to the slave MX machines every few hours.

/reject

This directory contained the actual email messages which were rejected by the satellite hosts.

In order to allow easy expiration of messages we organised the contents by date. A sample directory tree would look like this:

Example B-4. Sample layout.


/reject/
|-- 1-8-2009
|   |-- hosted.org
|   `-- user.org
|
|-- 2-8-2009
|   |-- hosted.org
|   `-- user.org
..
...
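
Because each top-level directory name encodes a date, expiry reduces to parsing the name and comparing it with a cut-off. The sketch below is an illustration only: this document does not describe the real expiry script, so the function names dir_to_epoch and expired_dirs, the day-month-year reading of names such as 1-8-2009, and the 30-day retention period are all assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert a directory name such as "1-8-2009" into an epoch time (midnight
# UTC).  Reading the name as day-month-year is an assumption.  Returns
# undef for names that are not valid dates.
sub dir_to_epoch {
    my ($name) = @_;
    my ( $day, $month, $year ) = $name =~ /^(\d{1,2})-(\d{1,2})-(\d{4})$/
      or return;

    # timegm() dies on impossible dates (e.g. 31-2-2009), so trap that.
    my $epoch = eval { timegm( 0, 0, 0, $day, $month - 1, $year ) };
    return $epoch;
}

# Given the current time, a retention period in days, and a list of
# directory names, return the names dated before the cut-off.
sub expired_dirs {
    my ( $now, $max_age_days, @names ) = @_;
    my $cutoff = $now - $max_age_days * 24 * 60 * 60;
    return grep {
        my $epoch = dir_to_epoch($_);
        defined $epoch && $epoch < $cutoff;
    } @names;
}

# Example: list what would be expired beneath /reject, keeping 30 days
# (a hypothetical retention period).  Nothing is deleted here.
my $root = '/reject';
if ( opendir( my $dh, $root ) ) {
    my @names = grep { !/^\./ } readdir($dh);
    closedir($dh);
    print "$root/$_\n" for expired_dirs( time, 30, @names );
}
```

Since a whole day's rejected mail for every domain lives under one directory, removing old messages is a matter of removing the directories this returns, with no need to inspect individual messages.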
 

B.1.1. Significant directories on the satellites

In addition to copies of the /srv and /home/spam directories replicated from the master host, the satellites also had some important directories of their own:

/home/good

This directory contained a copy of each message which was judged to be non-SPAM and was forwarded on to end users.

/home/rejected

This directory contained a copy of each message which was judged to be SPAM and was rejected.

The contents of these two directories were pulled by the master machine on a regular basis, so that the messages could be counted and the SPAM ones imported into the quarantine. The way that these directories were combined is discussed in Chapter 6.