Mailing List Archive: Re: fighting spam paper & links / naive bayes / anybody ?

[REBOL] Re: fighting spam paper & links / naive bayes / anybody ?

From: brett:codeconscious at: 26-Aug-2002 11:30


> Someone posted that on an IOS server and I think Brett Handley has taken an
> initial crack at it. Maybe he'll jump in here.

*jump* *stumble* *ahem*

Yes I had a go. I naively tried to translate the LISP code (without knowing
LISP) from the Paul Graham article into some REBOL code and just put it
together into something that ran. I did not give a lot of thought to making a
nicely structured and fast solution - I just wanted to understand what was
going on.  I ran it on a small set of spam and good emails - and it worked
beautifully until I realised that my logic was different to Paul's. :^)
Then I fixed it and it didn't work so good :^(

Paul Graham quoted 4000 messages; I only worked with a couple of hundred
good emails and 14 bad (all I've kept), so with such a small sample size it is
likely that my tests of the filter will be suspect.

It would be nice if a LISP-knowledgeable person could check that my
implementation of the logic reasonably follows Paul's.
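
As a point of comparison for anyone checking the logic, here is a minimal
Python sketch of the per-word and combined probability steps as described in
Paul Graham's "A Plan for Spam". It is not the REBOL script mentioned in this
post. The function names are hypothetical, and the constants (good counts
doubled, a minimum of 5 occurrences, clamping to the range 0.01 to 0.99) are
taken from a reading of that article and should be checked against it.

```python
def word_probability(good_count, bad_count, ngood, nbad):
    """Spam probability for one token, or None if it was seen too rarely.

    good_count / bad_count: occurrences of the token in each corpus.
    ngood / nbad: number of messages in each corpus (assumed > 0).
    """
    g = 2 * good_count            # good counts doubled to bias against false positives
    b = bad_count
    if g + b < 5:                 # too few sightings to say anything useful
        return None
    bad_ratio = min(1.0, b / nbad)
    good_ratio = min(1.0, g / ngood)
    p = bad_ratio / (good_ratio + bad_ratio)
    return max(0.01, min(0.99, p))  # never fully certain either way


def combined_probability(probs):
    """Combine per-token probabilities into one score for the message."""
    prod = 1.0
    inverse_prod = 1.0
    for p in probs:
        prod *= p
        inverse_prod *= 1.0 - p
    return prod / (prod + inverse_prod)
```

With only 14 bad messages, a token seen in every one of them and in no good
message is clamped to 0.99, while a token seen once in each corpus falls below
the minimum-occurrence cutoff and is ignored.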

I've uploaded my prototype script to my site at the address below. Be
warned: it is not thoroughly tested, and I'm certainly not letting it be the
final arbiter of my email just yet:

    http://www.codeconscious.com/rebol/mlscripts/spam-filter.r

Regards,
Brett.