SPAM Filtering

I get a lot of SPAM. I’ve had the same e-mail address for 10 years, and I don’t hide it.

In general, I’m very happy with a combination of spamassassin running on the server, and OS X Mail.app’s SPAM filtering on the client. In order to avoid losing false positives I have a Junk folder (I use IMAP). Spamassassin re-writes the subject lines of the e-mails to be prefixed with “[fusion_builder_container hundred_percent=”yes” overflow=”visible”][fusion_builder_row][fusion_builder_column type=”1_1″ background_position=”left top” background_color=”” border_size=”” border_color=”” border_style=”solid” spacing=”yes” background_image=”” background_repeat=”no-repeat” padding=”” margin_top=”0px” margin_bottom=”0px” class=”” id=”” animation_type=”” animation_speed=”0.3″ animation_direction=”left” hide_on_mobile=”no” center_content=”no” min_height=”none”][SPAM]”. Mail.app sorts those messages into the Junk folder and marks them as read. Just like it does with the messages it determines are SPAM.

The problem with this, is that until Mail.app checks my inbox, all that SPAM is sitting there, in my inbox. This shows up on my iPhone, and webmail. Lately, I’ve been working from coffeeshops, outside, the kitchen, etc… with the net result being that my laptop is spending more and more time sleeping (hence: not running Mail.app). So my iPhone alerts me that I have 20 new e-mail, but they’re all SPAM.

So I decided to see if I could get spamassassin to not just mark SPAM, but also file it away in the Junk folder. While spamassassin can’t do this, procmail can.

I added this to my user’s .~/procmailrc file:

# Mark spam as read
:0
* ^X-Spam-Status: Yes
{
	:0 fhw
	| formail -I"Status: RO"

	:0:
	mail/Junk
}

after my existing spamassassin invocation:

# Run everything through spamassassin
:0fw
| /usr/bin/spamassassin

What that is, is a conditional rule, based on the Spam-Status header being set to Yes (which is set by spamassassin). It then executes two actions. The first uses formail to mark the e-mail as read. The second moves the mail into the Junk folder (I use mbox – if you use maildir you need to change this action to a slightly more complex one which you can Google for).

This works nicely. Now the SPAM found by spamassassin is marked as read, and moved into my Junk folder on the server, instead of waiting for Mail.app to do that.

However, once I got this working, the number of e-mails which slip by spamassassin to be caught by Mail.app, began to bother me. With the old system, it really didn’t matter who caught the SPAM, as long as it was caught. With the new system, any SPAM not caught by spamassassin, ended up polluting my inbox.

I discovered a couple of things. First, I installed razor and pyzor to help with scoring. I also increased the spamassassin scores of some ED drug rules in my spamassassin user_prefs:

score DRUG_ED_CAPS 15.00
score DRUGS_ERECTILE 10.00
score DRUG_ED_COMBO 10.00
score VIA_GAP_GRA 10.00
score NO_PRESCRIPTION 10.00

This helped, but by testing on individual items of spam which were being missed by spamassassin (culled from my Junk box, without the [SPAM} subject addition i.e. those caught by Mail.app), using the following test command:

spamassassin -t -D < /tmp/spam

Where /tmp/spam is a file containing the raw message text from a single spam e-mail.

I discovered that the auto-whitelist (a misnomer, it's actually an automatic scoring system designed to allow past history to average out any score spikes from the same sender), was pushing the SPAM score DOWN on many of these e-mails. Often past the spamassassin threshold, so they were mistakenly considered HAM instead of SPAM.

While the AWL can do some odd things, at least on my box it's clearly broken. Testing with a new SPAM mail, where the first run had zero input from the AWL rules, and the SPAM ended up with a SPAM score of 20 (which is definitely SPAM), I found that immediate subsequent runs against the SAME mail, had the AWL contributing a -6.9 score, against the positive 20 SPAM score. Clearly, that's wrong. Why it was doing that, I dont''know, so I just turned it off.

Again, in my spamassassin, user_prefs:

use_auto_whitelist 0

All is well. So far 100% of SPAM has been caught by spamassassin, on the server, tagged, marked as read, and moved into the Junk folder. With no false positives or false negatives.

So I'm happy.[/fusion_builder_column][/fusion_builder_row][/fusion_builder_container]

Comments

Leave a Reply Cancel reply