I've tried to put the filters that have a low occurrence of false positives at the SMTP level. This allows the mail server to bounce incoming spam as early as possible. Other filters are available for mail that makes it through the first-level filtering.
The kluge.net accessdb is available for perusal: accessdb source file (note: the file size is 11745.73 KB; last updated today at 07:16pm).
We reject mail for various reasons (spam, spam relays, spam relays whose server has no reverse DNS entry, etc.). If you are on this list and feel that you shouldn't be, please send mail to the postmaster explaining why.
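To show how an access database entry can turn into an SMTP-level rejection, here is a minimal Python sketch. It assumes a simplified, sendmail-style table with one `key ACTION` pair per line, where a domain is checked first and then each of its parent domains. The table contents and the helper names are hypothetical. This is not the real kluge.net accessdb, and it is not sendmail's actual lookup code.

```python
# Hypothetical, simplified sendmail-style access table: "key ACTION" per line.
ACCESS_TABLE = """
# comments and blank lines are ignored
spammer.example.com   REJECT
example.net           REJECT
friend.example.net    OK
"""


def parse_access_table(text):
    """Return a dict mapping lowercase keys to their actions."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, action = line.split(None, 1)
        table[key.lower()] = action.strip()
    return table


def lookup_action(table, domain):
    """Check the full domain, then each parent domain (a.b.c -> b.c -> c)."""
    labels = domain.lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in table:
            return table[candidate]
    return None  # no entry: accept at SMTP time and let later filters decide


table = parse_access_table(ACCESS_TABLE)
print(lookup_action(table, "mail.spammer.example.com"))
print(lookup_action(table, "friend.example.net"))
```

Checking the most specific name first lets an `OK` entry for one host override a `REJECT` entry for its parent domain.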
All of these filters are optional and depend on each user account setting up procmail. Doing this is simple on kluge.net: just create a ".procmailrc" file in your home directory. To find out the format of the file and see examples, read the man pages for "procmail", "procmailrc", and "procmailex". There is also useful information on the procmail website.
Note: For the filtering rules below, you probably don't want to send the messages to /dev/null (i.e., the bit bucket, which erases the message) automatically. No matter how good the filter system is, it will likely produce false positives, and you will lose non-spam messages. It is typically recommended to save suspected spam into a mailbox that gets checked periodically. The false positives can then easily be found and handled appropriately.
Spamassassin is a powerful filtering tool that will, by default, run hundreds of checks against a message and assign a weighted score for each check that matches. At the end of the checks, if the calculated score is higher than a configured value, the message is considered to be spam and can be handled as you wish.
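The weighted-score idea can be sketched in a few lines of Python. The rule names, weights, and the 5.0 threshold below are made up for illustration. Spamassassin's real rules, scores, and configuration are much richer than this.

```python
# Hypothetical rules: each name maps to (weight, test function).
RULES = {
    "SUBJ_ALL_CAPS": (1.5, lambda msg: msg["subject"].isupper()),
    "HTML_ONLY": (2.0, lambda msg: msg["content_type"].startswith("text/html")),
    "FROM_NUMERIC": (2.5, lambda msg: msg["from"].split("@")[0].isdigit()),
}

THRESHOLD = 5.0  # assumed cut-off for this sketch


def score_message(msg):
    """Sum the weights of every rule that matches; return (score, hits)."""
    hits = [name for name, (weight, test) in RULES.items() if test(msg)]
    score = sum(RULES[name][0] for name in hits)
    return score, hits


def is_spam(msg):
    """A message is spam when its total score is higher than the threshold."""
    score, _ = score_message(msg)
    return score > THRESHOLD


spam = {"subject": "FREE MONEY", "content_type": "text/html", "from": "12345@example.com"}
ham = {"subject": "Lunch tomorrow?", "content_type": "text/plain", "from": "bob@example.com"}
print(score_message(spam), is_spam(spam), is_spam(ham))
```

No single rule is enough to flag a message here. Only the combination of several weak signals pushes the total over the threshold.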
Configuring procmail to use Spamassassin is fairly simple:
# Check this message via spamassassin
:0fW
| /usr/bin/spamc -f

# Return the exit code if we error out
EXITCODE=$?
Any incoming messages that are likely to be spam will still be delivered to your INBOX. They will be marked up, however, so that you can spot them easily (among other things, the subject will be changed). You can be more creative with the procmail rules (for example, by filing spam into a different folder), but that's beyond the scope of this webpage. Please check the procmailrc man page via "man procmailrc". You can also send mail to felicity for help.
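The folder-routing decision itself is simple, and a minimal Python sketch of it follows. It assumes spamc has already added its markup and that the X-Spam-Status header begins with "Yes" for spam. The folder names "spam-work" and "INBOX" are just examples. In procmail you would express the same test as a condition on the X-Spam-Status header.

```python
from email import message_from_string


def choose_folder(raw_message, spam_folder="spam-work"):
    """Pick a destination folder from the X-Spam-Status header.

    Assumes the message has already been through spamc and that the
    header starts with "Yes" for spam (e.g. "X-Spam-Status: Yes, ...").
    """
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status", "")
    if status.strip().lower().startswith("yes"):
        return spam_folder
    return "INBOX"


print(choose_folder("X-Spam-Status: Yes, score=7.2 required=5.0\nSubject: hi\n\nbody\n"))
```

Messages that lack the header fall through to INBOX. A failure in the scanning step therefore never hides mail.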
By default, Spamassassin adds several headers (X-Spam.*) and a section at the top of the message body that indicates why the message was considered spam. The message subject will also be modified. If you want to change this or other behaviour, you can create a configuration file for yourself by editing the ~/.spamassassin/user_prefs file. (It should be created for you after you set up the Spamassassin procmail rules. If it isn't, run "spamassassin -h", which should generate the appropriate files for you.) The main source of information is the Mail::SpamAssassin::Conf man page, which you can access via "man Mail::SpamAssassin::Conf". There are also man pages for spamassassin and spamc. You can also find information on the Spamassassin website.
BTW: On kluge.net, one of the default checks uses Vipul's Razor, which checks a distributed database to see whether the scanned message has been reported as spam by other people. If you receive spam on kluge.net, please bounce (NOT forward) the message to spam@kluge.net, which will insert it into the Razor system.
BTW2: By default, Spamassassin on kluge.net will also check the servers that relayed the message against various DNS blacklists (IP "databases") and flag the mail if a server is listed. We use several different blacklists to help flag the large amount of spam going through our server.
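A DNS blacklist lookup works by reversing the IPv4 octets, appending the blacklist's zone, and then asking DNS whether that name exists. The Python sketch below shows that mechanism. The zone name "dnsbl.example.org" is a placeholder and is not one of the lists used on kluge.net. The sketch also treats any A record as "listed", which ignores the specific return codes that some lists use.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name used to look up an IPv4 address in a DNSBL zone.

    The octets are reversed and the zone appended, so 192.0.2.99 checked
    against dnsbl.example.org becomes 99.2.0.192.dnsbl.example.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for invalid input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the zone resolves the query name (i.e. the IP is listed)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:  # NXDOMAIN and similar: not listed
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```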
BTW3: If mail that isn't spam is being flagged (typically newsletters and other legitimate bulk email), you can easily whitelist those addresses by adding entries to your user_prefs file ($HOME/.spamassassin/user_prefs). The syntax is described in the Mail::SpamAssassin::Conf man page and in the default user_prefs file.
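Whitelist entries in user_prefs use the `whitelist_from` directive with glob-style addresses such as `*@example.org`. The Python sketch below assumes simple, case-insensitive glob matching to show how such entries could be applied. The sample user_prefs contents are hypothetical, and Spamassassin's own matching may differ in detail.

```python
import fnmatch

# Hypothetical user_prefs contents using glob-style whitelist_from entries.
USER_PREFS = """
# whitelist a single newsletter sender and a whole domain
whitelist_from news@lists.example.com
whitelist_from *@example.org
"""


def load_whitelist(prefs_text):
    """Collect the (lowercased) patterns from every whitelist_from line."""
    patterns = []
    for line in prefs_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "whitelist_from":
            patterns.extend(p.lower() for p in parts[1:])
    return patterns


def is_whitelisted(sender, patterns):
    """Case-insensitive glob match of the sender address against the patterns."""
    sender = sender.lower()
    return any(fnmatch.fnmatchcase(sender, p) for p in patterns)


patterns = load_whitelist(USER_PREFS)
print(is_whitelisted("News@Lists.Example.com", patterns))
```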
# Received lines from IPs that aren't valid (256+.x.x.x) are spam.
:0
* ^Received:.*\[(25[6-9]|2[6-9][0-9]|[3-9][0-9][0-9])\.
spam-work
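This recipe's regular expression can be tried outside procmail. The Python sketch below uses the same pattern, made case-insensitive because procmail matches case-insensitively by default. Note that the pattern only inspects the first octet and only catches three-digit values from 256 to 999. A four-digit value such as 1000 slips through.

```python
import re

# Same pattern as the procmail condition: "[" followed by a first octet
# of 256-999 and a dot, on a Received: header line.
BOGUS_RECEIVED_IP = re.compile(
    r"^Received:.*\[(25[6-9]|2[6-9][0-9]|[3-9][0-9][0-9])\.",
    re.IGNORECASE | re.MULTILINE,
)


def has_bogus_received_ip(headers):
    """True if any Received: line carries an impossible first IP octet."""
    return BOGUS_RECEIVED_IP.search(headers) is not None


print(has_bogus_received_ip("Received: from evil (unknown [300.1.2.3]) by mx\n"))
```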
# \d+@domain.com is invalid (or unusual) at most domains...
:0
* ^From [0-9][0-9]*@
spam-work
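The condition `^From [0-9][0-9]*@` applies to the mailbox "From " envelope line (note the space), not to the "From:" header. The Python sketch below uses an equivalent pattern to show which lines it catches.

```python
import re

# Envelope "From " line whose local part is all digits, e.g. "From 12345@...".
NUMERIC_SENDER = re.compile(r"^From [0-9][0-9]*@", re.IGNORECASE | re.MULTILINE)


def has_numeric_envelope_sender(headers):
    """True if the envelope sender's local part consists only of digits."""
    return NUMERIC_SENDER.search(headers) is not None


print(has_numeric_envelope_sender("From 12345@example.com Mon Jan  1 00:00:00 2001\n"))
```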
I found the strip_html script by Randal Schwartz (SysAdmin Mag. Vol 10, Num 4, P41), which procmail can use as a filter to remove the HTML alternate section of multipart mails and leave only the text/plain section. To use this script, add the following to your ".procmailrc" file:
# If a message is MIME and under 100k, remove the HTML version if necessary.
:0 fw
* ^Content-type:.*boundary
* <102400
|/usr/local/bin/strip_html.pl

Mail that is filtered with this script will have the following text at the bottom of the message so you know it's been filtered: "[[HTML alternate version deleted]]".
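The idea behind strip_html.pl can be sketched with Python's standard email package. The sketch walks the MIME tree and drops text/html parts from each multipart/alternative section, as long as a non-HTML alternative remains. This is a swapped-in technique rather than a port of Randal Schwartz's Perl script, and it does not add the "[[HTML alternate version deleted]]" marker. The sample message is made up.

```python
from email import message_from_string, policy


def strip_html_alternative(raw_message):
    """Remove text/html parts from every multipart/alternative section."""
    msg = message_from_string(raw_message, policy=policy.default)
    # Snapshot the parts first so payloads can be rewritten safely.
    for part in list(msg.walk()):
        if part.get_content_type() != "multipart/alternative":
            continue
        payload = part.get_payload()
        kept = [p for p in payload if p.get_content_type() != "text/html"]
        # Only rewrite if something was removed and a non-HTML part remains.
        if kept and len(kept) < len(payload):
            part.set_payload(kept)
    return msg.as_string()


SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "Subject: test\n"
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "plain body\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>html body</p>\n"
    "--XYZ--\n"
)

print(strip_html_alternative(SAMPLE))
```

Keeping the HTML part when it is the only alternative avoids producing an empty message. That case is left to the HTML-only handling.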
If you want to use this script on another box, make sure you have the prerequisites installed:
Perl
IO::Stringy
MIME::Tools
MIME::Base64
MIME::QuotedPrint   # Part of MIME::Base64
Mail::Header        # MailTools
Mail::Internet
Mail::Field
libnet              # MailTools requires libnet (Net::SMTP, etc.)
File::Spec          # Standard w/ Perl 5?
File::Path
# Any HTML-only mail gets filed into spam-work...
:0
* ^Content-type:[ ]*text/html
spam-work
Another option is to convert the HTML-only message into text using a text-based browser (I use lynx by default). To do this, set up the following procmail rule, which uses the strip_html2txt script written by Theo Van Dinter:
# If a message is HTML-only and under 100k, convert it to text.
:0 fw
* ^Content-type:[ ]*text/html
* <102400
|/usr/local/bin/strip_html2txt.pl

Mail that is filtered with this script will have the following text at the top of the message so you know it's been converted: "[[HTML-only version converted]]".
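To illustrate the conversion step without lynx, here is a rough Python sketch built on the standard library's html.parser. It is a much cruder substitute for a text browser and is not how strip_html2txt.pl works. It skips script and style content, breaks lines at a few block-level tags, collapses whitespace, and borrows the script's marker text for the first line.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text content, skipping <script> and <style> blocks."""

    BLOCK_TAGS = {"p", "br", "div", "tr", "li", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.chunks.append("\n")  # start block-level content on a new line

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def html_to_text(html):
    """Return a rough plain-text rendering of an HTML body."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    text = "\n".join(line for line in lines if line)
    return "[[HTML-only version converted]]\n" + text


print(html_to_text("<p>Hello &amp; welcome</p><p>Second   line</p>"))
```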
See the above instructions for strip_html.pl if you want to run this script on another machine.