
Saturday, 08/17/02

Wesley Felter points to this immensely clever Paul Graham essay on Bayesian approaches to spam management:

But it doesn't mean much to be able to filter out most present-day spam, because spam evolves. Indeed, most antispam techniques so far have been like pesticides that do nothing more than create a new, resistant strain of bugs.

I'm more hopeful about Bayesian filters, because they evolve with the spam. So as spammers start using "c0ck" instead of "cock" to evade simple-minded spam filters based on individual words, Bayesian filters automatically notice. Indeed, "c0ck" is far more damning evidence than "cock", and Bayesian filters know precisely how much more.

That'd fit nicely into Mail::Audit. The remarkable thing, which Graham doesn't address, is that this approach should work just as well for all email filtering. You start to file mail from your Aunt Judy into a particular mailbox, the filters notice. You sign up to a new mailing list and put those messages in a particular place, the filters notice. The filters don't know the significance of the Aunt Judy corpus and the spam corpus, but they know the difference, and they take different action.
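
A rough sketch of that idea, to make it concrete. This is not Graham's algorithm (his essay combines per-word spam probabilities in its own way) and it is not Mail::Audit code. It is a plain multinomial naive Bayes classifier with add-one smoothing, written in Python, where each mailbox is just another class. The names FolderFilter, tokenize, train and classify, the folder names, and the sample messages are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


class FolderFilter:
    """Hypothetical multi-folder filter: one naive Bayes class per mailbox."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # folder -> token -> count
        self.message_counts = Counter()          # folder -> messages filed
        self.vocabulary = set()                  # every token seen in training

    def train(self, folder, text):
        """Record that the user filed this message into this folder."""
        tokens = tokenize(text)
        self.word_counts[folder].update(tokens)
        self.message_counts[folder] += 1
        self.vocabulary.update(tokens)

    def classify(self, text):
        """Return the most probable folder, or None if nothing was trained."""
        tokens = tokenize(text)
        total_messages = sum(self.message_counts.values())
        # One extra slot stands in for tokens never seen during training.
        smoothing_slots = len(self.vocabulary) + 1
        best_folder, best_score = None, -math.inf
        for folder, filed in self.message_counts.items():
            counts = self.word_counts[folder]
            total_tokens = sum(counts.values())
            # Log prior: how often mail lands in this folder at all.
            score = math.log(filed / total_messages)
            # Log likelihood of each token, with add-one smoothing.
            for token in tokens:
                score += math.log(
                    (counts[token] + 1) / (total_tokens + smoothing_slots)
                )
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder


# Illustrative training data: the filter only ever sees folder labels.
mail_filter = FolderFilter()
mail_filter.train("aunt-judy", "Hi dear, the garden tomatoes are ripe, come visit Sunday")
mail_filter.train("aunt-judy", "Your uncle says hello, call me about the family dinner")
mail_filter.train("perl-list", "patch for the module test suite fails on perl 5.8")
mail_filter.train("perl-list", "new module release on CPAN, please test the patch")
mail_filter.train("spam", "c0ck pills free free click now guaranteed")
mail_filter.train("spam", "click now for free money guaranteed offer")

suggested_folder = mail_filter.classify("family dinner on Sunday")
```

Nothing in the code treats the spam folder specially. It is one label among several, which is the point: the filter learns the difference between corpora without knowing what any of them mean.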

If this works for spam filtering, it's going to put every other email filtering strategy right out of business. 08:55PM

