A Bayesian Filter Killer

For the non-geeks in the audience this morning, Bayesian Filtering is what keeps your inbox as spam-free as it is. A relatively simple non-technical definition of Bayesian Filtering* is here:

Particular words have particular probabilities of occurring in spam email and in legitimate email. For instance, most email users will frequently encounter the word [common drug] in spam email, but will seldom see it in other email. The filter doesn't know these probabilities in advance, and must first be trained so it can build them up. To train the filter, the user must manually indicate whether a new email is spam or not. For all words in each training email, the filter will adjust the probabilities that each word will appear in spam or legitimate email in its database. For instance, Bayesian spam filters will typically have learned a very high spam probability for the words “[common drug]” and “[financial transaction]”, but a very low spam probability for words seen only in legitimate email, such as the names of friends and family members.

Alright, everyone got it? Basically there are words that generally show up in your legitimate email and words that almost never do. The filter learns which are which and then treats each new email accordingly.
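For the geeks in the audience, here is a minimal sketch of that train-then-score loop. It is a toy naive Bayes filter, not how any particular mail client does it; the class name `TinyBayesFilter`, the whitespace tokenizer, and the add-one smoothing are all my own assumptions for illustration.

```python
import math
from collections import Counter


class TinyBayesFilter:
    """Toy word-probability spam filter (illustrative only)."""

    def __init__(self):
        self.spam_words = Counter()  # word -> number of spam messages containing it
        self.ham_words = Counter()   # word -> number of legitimate messages containing it
        self.spam_msgs = 0
        self.ham_msgs = 0

    def train(self, text, is_spam):
        # The user labels a message; each distinct word's count is updated.
        words = set(text.lower().split())
        if is_spam:
            self.spam_words.update(words)
            self.spam_msgs += 1
        else:
            self.ham_words.update(words)
            self.ham_msgs += 1

    def word_spam_prob(self, word):
        # Add-one smoothing keeps the result strictly between 0 and 1,
        # so unseen words come out neutral rather than breaking the math.
        p_word_in_spam = (self.spam_words[word] + 1) / (self.spam_msgs + 2)
        p_word_in_ham = (self.ham_words[word] + 1) / (self.ham_msgs + 2)
        return p_word_in_spam / (p_word_in_spam + p_word_in_ham)

    def spam_score(self, text):
        # Combine per-word probabilities by summing log-odds,
        # then squash the total back into a 0..1 score.
        log_odds = 0.0
        for word in set(text.lower().split()):
            p = self.word_spam_prob(word)
            log_odds += math.log(p) - math.log(1 - p)
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayesFilter()
f.train("cheap pills buy now", is_spam=True)
f.train("buy pills online", is_spam=True)
f.train("design patterns meeting tomorrow", is_spam=False)
f.train("project management notes", is_spam=False)

spammy = f.spam_score("buy pills")
friendly = f.spam_score("design patterns")
```

With only four training messages the numbers are crude, but `spammy` lands well above 0.5 and `friendly` well below it, which is the whole idea.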

But what happens if someone can poison the filter?

I don't mean the usual trick of padding a message with whole lists of random words. I mean injecting a bit of intelligence into the system and choosing from a dictionary of words that are known (or strongly believed) to be good. I know this is probably a few orders of magnitude more effort, but stop and think about it for a second.
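To make the difference concrete, here is a rough sketch comparing random padding with targeted padding. The per-word probabilities in `WORD_SPAM_PROB` are numbers I made up to stand in for a trained filter, and the log-odds combination is just one simple way to score a message; real filters differ in how they select and combine tokens.

```python
import math

# Hypothetical learned spam probabilities for one blogger's filter.
# These values are invented for illustration, not measured.
WORD_SPAM_PROB = {
    "cheap": 0.90, "pills": 0.99, "enhance": 0.95,
    "design": 0.05, "patterns": 0.04, "project": 0.08, "management": 0.10,
}
UNKNOWN_PROB = 0.5  # assumption: words the filter has never seen are neutral


def spam_score(words):
    # Sum the log-odds of each word, then convert back to a 0..1 score.
    log_odds = 0.0
    for word in words:
        p = WORD_SPAM_PROB.get(word, UNKNOWN_PROB)
        log_odds += math.log(p) - math.log(1 - p)
    return 1 / (1 + math.exp(-log_odds))


spam_body = ["cheap", "pills", "enhance"]
random_padding = ["zebra", "quartz", "umbrella", "lantern"]
targeted_padding = ["design", "patterns", "project", "management"]

plain = spam_score(spam_body)
with_random = spam_score(spam_body + random_padding)
with_targeted = spam_score(spam_body + targeted_padding)
```

In this toy setup the random words change nothing, because the filter has no opinion about them. The four words it already trusts are enough to drag the same spam payload below 0.5.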

Now realize that it may be out there.

No, I'm not kidding. At the time of this writing (Sunday evening), I'm sitting here looking at an email which is obviously spam but made it past all my filters beautifully. It's split into two columns. The right column is all about how to “enhance” myself… but the left colum… the left column talks about Design Patterns, “reinventing the wheel”, project management, and a variety of other terms and phrases that show up on this site and/or Codesnipers. For years, I've had a number of email keywords that trigger other actions (normally sorting or forwarding, sometimes other things), but this could completely blow that out of the water.

I hope that this is just a bizarre coincidence that I never see again. The level of effort involved in making a per-target dictionary has to be enormous, but most bloggers seem like likely targets for this one.

Thoughts?

* Interestingly enough, I didn't realize that Paul Graham was one of the people who brought it to the public's attention. I guess you learn something new every day.
