Mail Filtering on Kluge.Net


Kluge.net's mail server has various ways of filtering mail that may be of use to you, even if you're not a user on kluge.net. The "(non-)optional" tags are for kluge.net users: some things are handled by the mail server itself, while others are handled by each user's delivery configuration (procmail).

I've tried to put the filters that have a low occurrence of false positives at the SMTP level. This allows the mail server to bounce incoming spam as early as possible. Other filters are available for mail which makes it through the first-level filtering.


Sendmail-based filtering

On kluge.net, these filters are mandatory since they occur on the mail server itself.

Sendmail's accessdb (This is no longer in use)
Accessdb allows sendmail to reject mail based on domain (domain.com), IP network (10.*.*.*), email address (user@domain.com), or the user portion of an email address (user@).

The kluge.net accessdb is available for perusal: accessdb source file (Note: The file size is 11745.73 KB, Last updated today at 07:16pm.)

We reject mail for various reasons (spam, spam relay, spam relay without a reverse DNS lookup for the server, etc.). If you are on this list and feel that you shouldn't be, please send mail to the postmaster explaining why you feel this way.


Procmail-based filtering

All of these filters are optional and depend on each user account setting up procmail. Doing this is simple on kluge.net: just create a ".procmailrc" file in your home directory. To find out the format of the file and to see examples, please read the man pages for "procmail", "procmailrc", and "procmailex". There is also useful information on the procmail website.

Note: For the filtering rules below, you probably don't want to send the messages to /dev/null (i.e. the bit bucket, erasing the message) automatically. No matter how good the filter system is, it will likely produce false positives and you will lose non-spam messages. It is typically recommended to have the suspected spam saved off into a mailbox that gets checked periodically. The false positives can then easily be found and handled appropriately.

Use SpamAssassin to match likely spam (Optional, but HIGHLY RECOMMENDED)

SpamAssassin is a powerful filtering tool that will, by default, run hundreds of checks against a message and assign a weighted score for each check that matches. At the end of the checks, if the calculated score is higher than a configured value, the message is considered to be spam and can be handled as you wish.

Configuring procmail to use SpamAssassin is fairly simple:

  1. In your home directory, create a file called ".procmailrc"
  2. Edit that file and insert the following section:
    # Check this message via spamassassin
    :0fW
    | /usr/bin/spamc -f
    
    # Return the exit code if we error out
    EXITCODE=$?
    
  3. Run "razor-admin -create", which will configure your account for using Razor. Once that command has finished, edit ~/.razor/razor-agents.conf and change the debug level to 0. That way Razor won't log everything it does and use up space in your home directory. (You can optionally symlink ~/.razor/razor-agents.log to /dev/null, but it's up to you.)

Any incoming messages that are likely to be spam will still be delivered to your INBOX, but will be marked up (the subject will be changed, among other things) so that you can spot them easily. You can be more creative with the procmail rules as well (filtering the spam into a different folder, etc.), but that's beyond the scope of this webpage. Please check the man page for procmailrc via "man procmailrc". You can also send mail to felicity for help.

By default, SpamAssassin adds several headers (X-Spam.*) and a section at the top of the message body to indicate why the message was considered spam. The message subject will also be modified. If you want to change this or other behaviour, you can create a configuration file for yourself. To do this, edit the ~/.spamassassin/user_prefs file (it should be created for you after setting up the SpamAssassin procmail rules; if not, run "spamassassin -h", which should generate the appropriate files for you). The main source of information is the Mail::SpamAssassin::Conf man page, which you can access via "man Mail::SpamAssassin::Conf". There are also man pages for spamassassin and spamc to give you more help. You can also find information on the SpamAssassin website.

BTW: On kluge.net, one of the default checks is to use Vipul's Razor which checks a distributed database to see whether or not the scanned message was reported as spam by other people. If you receive spam on kluge.net, please bounce (NOT forward) the message to spam@kluge.net, which will take the message and insert it into the Razor system.

BTW2: SpamAssassin on kluge.net will also, by default, check the servers that relayed the message against various DNS blacklists (IP "databases") and flag the mail if a server is listed. We utilize several different blacklists to help flag the large amount of spam going through our server.

If you don't want these checks run on your mail, you can shut them off by adding "skip_rbl_checks 1" to your SpamAssassin user_prefs file.

BTW3: If legitimate mail is being flagged as spam (typically newsletters and non-spam bulk email), you can easily add entries to your user_prefs file ($HOME/.spamassassin/user_prefs) to whitelist those addresses. The syntax is described in the Mail::SpamAssassin::Conf man page and shown in the default user_prefs file.

Bounce mail that has a high percentage of 8-bit characters (Optional)
If you don't know how to read Chinese, Korean, or another language that uses 8-bit characters, you may be interested in the procmail rule available from Walter Dnes. By default, it checks the contents of each mail message, and if more than 5% of the characters are 8-bit, it triggers the recipe and you can send the message elsewhere.
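
To make the 5% threshold concrete, here is a minimal Python sketch of the same idea. It is not Walter Dnes's recipe; the function names and the way the threshold is passed are assumptions made up for illustration.

```python
def high_bit_ratio(body: bytes) -> float:
    """Fraction of bytes in the body that have the high (8th) bit set."""
    if not body:
        return 0.0
    high = sum(1 for b in body if b >= 0x80)
    return high / len(body)


def looks_8bit_heavy(body: bytes, threshold: float = 0.05) -> bool:
    """True when more than `threshold` of the body is 8-bit data."""
    return high_bit_ratio(body) > threshold
```

Note that a check like this only sees raw 8-bit bytes. A body that has been base64 or quoted-printable encoded is plain 7-bit ASCII on the wire, so it would pass unless it is decoded first.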

Bounce emails with fake IPs in the "Received:" headers (Optional)
Some spammers try to confuse people about where the mail has actually come from. The "Received:" headers list every mail system the message has passed through, so spammers add spurious and misleading "Received:" headers. This procmail rule catches messages with obviously fake lines and files them in the spam-work mailbox:

# Received lines from IPs that aren't valid (first octet 256 or higher) are spam.
:0
* ^Received:.*\[(25[6-9]|2[6-9][0-9]|[3-9][0-9][0-9])\.
spam-work
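
Before trusting a pattern like this with real mail, it can help to try it against a few sample headers. The sketch below is an approximate Python translation of the condition above, not something procmail runs; the function name and the sample addresses are hypothetical. It uses re.IGNORECASE because procmail conditions are case-insensitive by default.

```python
import re

# Same expression as the procmail condition, anchored per header line.
FAKE_RECEIVED = re.compile(
    r"^Received:.*\[(25[6-9]|2[6-9][0-9]|[3-9][0-9][0-9])\.",
    re.IGNORECASE | re.MULTILINE,
)


def has_fake_received(headers: str) -> bool:
    """True if any Received: line shows a bracketed IP whose first octet is 256-999."""
    return FAKE_RECEIVED.search(headers) is not None
```

A header such as "Received: from x (unknown [311.22.33.44])" matches, while one showing [192.0.2.10] or [255.1.1.1] does not, since the pattern only starts at 256.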

Bounce emails with all numeral user ids (Optional)
Most sites don't have users that are all integers ([0-9]+, like 0950967@yahoo.com). You can use the following procmail rule to bounce those messages. Note: The SpamAssassin filter will do this as well by default.

# \d+@domain.com is invalid (or not usual) at most domains...
:0
* ^From [0-9][0-9]*@
spam-work
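
The same kind of dry run works for this rule. The Python below is only an approximation of the procmail condition, with a hypothetical function name. One detail it makes visible: the pattern tests the envelope "From " line at the top of the message (no colon), not the "From:" header.

```python
import re

# Envelope sender made up entirely of digits, e.g. "From 0950967@yahoo.com ..."
NUMERIC_SENDER = re.compile(r"^From [0-9][0-9]*@", re.IGNORECASE | re.MULTILINE)


def has_numeric_sender(headers: str) -> bool:
    """True if the envelope From line has an all-digit user portion."""
    return NUMERIC_SENDER.search(headers) is not None
```

An address like bob42@example.com is left alone, because the digits have to make up the whole user portion.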

Removing the HTML portion of an email w/ HTML alternate copy (Optional)
The standard for Internet email is ASCII text. However, a lot of mail readers now have a "feature" which can send out your mail in both text and HTML, or just HTML. While some people consider this to be cute, others (myself included) are annoyed by this behavior since we have to parse through MIME headers and random crap in the message before we get to the actual text of the message (not to mention that the HTML typically more than doubles the size of the message).

I found the strip_html script by Randal Schwartz (SysAdmin Mag. Vol 10, Num 4, P41) which procmail can use as a filter to remove the HTML alternate section of these mails and leave only the text/plain section. To use this script, add the following to your ".procmailrc" file:

# If a message is MIME and under 100k, remove the HTML version if necessary.
:0 fw
* ^Content-type:.*boundary
* <102400
|/usr/local/bin/strip_html.pl

Mail that is filtered with this script will have the following text at the bottom of the message so you know it's been filtered: "[[HTML alternate version deleted]]".

If you want to use this script on another box, make sure you have the prerequisites installed:

Perl
IO::Stringy
MIME::Tools
MIME::Base64
MIME::QuotedPrint	# Part of MIME::Base64
Mail::Header    	# MailTools
Mail::Internet
Mail::Field
libnet          	# MailTools requires libnet (Net::SMTP, etc.)
File::Spec		# Standard w/ Perl 5?
File::Path
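
For readers who just want to see the idea of "keep the text/plain part, drop the HTML alternate", here is a rough sketch using Python's standard email package. It is not the strip_html script and does not rewrite the message the way that script does; the function name and the sample message are hypothetical.

```python
from email import message_from_bytes, policy


def plain_text_alternative(raw: bytes):
    """Return the text/plain body of a message, or None if it has none."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return part.get_content() if part is not None else None


# Hypothetical two-part message: the same greeting as text and as HTML.
SAMPLE = b"""\
From: someone@example.com
Subject: hello
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="XX"

--XX
Content-Type: text/plain; charset="us-ascii"

Hello in plain text.
--XX
Content-Type: text/html; charset="us-ascii"

<html><body><p>Hello in <b>HTML</b>.</p></body></html>
--XX--
"""
```

Calling plain_text_alternative(SAMPLE) hands back only the plain-text greeting. For a message that has no text/plain part at all, it returns None.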

Bounce emails that are HTML-only (Optional)
Much like the script/procmail rule above, I tend to bounce all HTML-only mails since they're probably spam (or from someone that I probably don't care to hear from):

# Any HTML-only mail gets bounced...
:0
* ^Content-type:[ ]*text/html
spam-work
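
To see which messages this condition would catch, here is an approximate Python translation (hypothetical function name, not part of the kluge.net setup). It is applied to the top-level headers only, which is also what procmail checks by default, so a multipart message that merely contains an HTML part is not matched.

```python
import re

# Top-level Content-Type of text/html; case-insensitive like procmail's default.
HTML_ONLY = re.compile(r"^Content-type:[ ]*text/html", re.IGNORECASE | re.MULTILINE)


def is_html_only(headers: str) -> bool:
    """True if the message's own Content-Type header is text/html."""
    return HTML_ONLY.search(headers) is not None
```

In this translation, "[ ]*" allows only spaces after the colon, so a header that uses a tab there would slip through.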

Another option is to convert the HTML-only message into text using a text-based browser (I use lynx by default). To do this, set up the following procmail rule, which uses the strip_html2txt script written by Theo Van Dinter:

# If a message is HTML-only and under 100k, convert to text.
:0 fw
* ^Content-type:[ ]*text/html
* <102400
|/usr/local/bin/strip_html2txt.pl

Mail that is filtered with this script will have the following text at the top of the message so you know it's been converted: "[[HTML-only version converted]]".

See the above instructions for strip_html.pl if you want to run this script on another machine.



By: Theo Van Dinter, © 2001-2009.