[REBOL] Re: fighting spam paper & links / naive bayes / anybody ?
From: gchiu:compkarori at: 15-Sep-2002 23:19
On Mon, 26 Aug 2002 11:30:30 +1000
"Brett Handley" <[brett--codeconscious--com]> wrote:
>I've uploaded my prototype script on to my site at the address
>below, be warned it is not thoroughly tested and I'm certainly
>not letting it be final arbiter of my email just yet:
>
> http://www.codeconscious.com/rebol/mlscripts/spam-filter.r
>
I've taken Brett's code from the IOS server (I'm not sure
it's the same as the one above) and created a "web
service" out of it, just so that you can see what it does:
http://207.8.27.211/spam/index.html
Just paste a complete email, with all its headers, into the
box and "test" it to see whether it is considered spam or
not.
The database I'm using was built from 2597 good emails and
876 spam emails.
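
To show how corpus sizes like these feed into a score, here is a
minimal Python sketch of a per-token estimate in the style of Paul
Graham's "A Plan for Spam". It is an assumption that Brett's script
works this way; the function names, the 0.4 default for rare tokens
and the clamping bounds are taken from that essay's scheme, not
from the script itself.

```python
NGOOD = 2597  # good messages in the corpus described above
NBAD = 876    # spam messages in the corpus described above


def token_spam_probability(good_count, bad_count, ngood=NGOOD, nbad=NBAD):
    """Estimate how strongly one token suggests spam (hypothetical helper).

    good_count / bad_count are the number of times the token was seen
    in the good and spam corpora respectively.
    """
    g = 2 * good_count  # good counts are doubled to bias against false positives
    b = bad_count
    if g + b < 5:
        # Too rare to trust; treated here like an unseen token (assumption).
        return 0.4
    good_ratio = min(1.0, g / ngood)
    bad_ratio = min(1.0, b / nbad)
    p = bad_ratio / (good_ratio + bad_ratio)
    # Clamp so that no single token can be absolutely decisive.
    return max(0.01, min(0.99, p))


def combine(probabilities):
    """Combine per-token estimates under the naive independence assumption."""
    prod_spam = 1.0
    prod_ham = 1.0
    for p in probabilities:
        prod_spam *= p
        prod_ham *= 1.0 - p
    return prod_spam / (prod_spam + prod_ham)
```

A message would then be scored by calling combine() on the estimates
of its most telling tokens and comparing the result with a threshold
of your choosing.
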
At the moment it does not update itself, i.e. it does not
learn, as I still have to consider issues such as file
locking. What I would like to do is tokenise the email
locally and send just the tokens to the web service
(perhaps via SOAP or a Rugby service). The trouble is, I
don't know whether what I consider spam is what others
consider spam.
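
The idea of tokenising locally and sending only the tokens could
look roughly like the Python sketch below. Everything in it is an
assumption for illustration: the token pattern, the tokenise() and
build_payload() helpers, and the use of JSON as a stand-in for
whatever SOAP or Rugby message the service would really accept. No
network call is made.

```python
import json
import re
from collections import Counter
from email import message_from_string

# Letters, digits, dollar signs, apostrophes and dashes form tokens (assumption).
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenise(raw_email):
    """Return lower-cased tokens from the headers and text parts of an email."""
    msg = message_from_string(raw_email)
    chunks = [f"{name}: {value}" for name, value in msg.items()]
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload:
                # latin-1 maps every byte to a character, so this cannot fail.
                chunks.append(payload.decode("latin-1"))
    tokens = []
    for chunk in chunks:
        for token in TOKEN_RE.findall(chunk):
            if not token.isdigit():  # drop tokens that are all digits
                tokens.append(token.lower())
    return tokens


def build_payload(raw_email):
    """Build the body that would be sent to the service (hypothetical format)."""
    counts = Counter(tokenise(raw_email))
    return json.dumps({"tokens": dict(counts)}, sort_keys=True)
```

Only token counts would leave the local machine this way, not the
message text itself, and the service would no longer need to parse
mail.
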
I would be interested to see what results people get.
--
Graham Chiu