D-Spam: 50% false positives...

Smirk · Dec 30, 2006

Just wanted to let you know that after setting up my email accounts according to the dspam setup instructions, roughly 50% of my email is getting flagged as spam.

These are typically newsletter subscriptions and subscribed merchant advertisements (I have added many *@domain.com entries to the spam whitelist, but they are still getting trapped).

Just thought you should know.

Mark

Yash · Dec 30, 2006

That is typically an issue with most spam filters. It's hard to distinguish newsletters and advertisements from regular spam.

If you whitelist those email addresses in hsphere, they shouldn't get trapped. If they are, please open a ticket.

Stephen · Dec 30, 2006

You can also let us know how they need to be trained. This is a beta period because we are still tuning what the mail filter treats as spam and ham.

However, newsletters are a real pain, and we had major problems at postini and mailwise with newsletters. I'd say only 2-3% made it through; the others stayed quarantined. Newsletters can have a lot of "spam-like" content even when they are not spam, with paid ads and the like.

SubSpace · Dec 30, 2006

Newsletters often get marked as spam not only because of the content, but also due to the way that some RBLs work.

People who receive legit newsletters they subscribed to get tired of receiving them, and start clicking the "spam" button in their webmail interface instead of unsubscribing (which should be possible for any legit newsletter). As a result, the servers responsible for sending the newsletters and/or the URLs mentioned in the newsletters get listed on RBLs.
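For anyone wondering what "listed on an RBL" means mechanically: DNS-based blocklists are queried by reversing the octets of the sending server's IPv4 address and prepending them to the list's zone name. A rough sketch of building that query name is below. The zone `dnsbl.example.org` is a made-up placeholder, not a real list, and the sketch only builds the name without doing any DNS lookup.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name that would be looked up for an IPv4 address.

    The zone is supplied by the caller; "dnsbl.example.org" below is a
    hypothetical placeholder, not a real blocklist.
    """
    # Validate the address; raises ValueError for anything that is not IPv4.
    addr = ipaddress.IPv4Address(ip)
    # Reverse the octets: 192.0.2.10 becomes 10.2.0.192
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

# 192.0.2.10 is a documentation-only address, used here as an example.
print(dnsbl_query_name("192.0.2.10", "dnsbl.example.org"))
```

If that name resolves, the server is listed; if the lookup fails, it is not. The filter never looks at the message content for this check, which is why a perfectly legit newsletter can get caught.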

If you really want to receive a newsletter, whitelisting is the safest way to go. And for the love of God, don't report legit newsletters as spam, but use the unsubscribe link provided. Contrary to the unsubscribe links in spam, they actually work.

Of course, the latter won't really help you unless the rest of the world sees the light as well.. yeah right 8)

tanmaya · Jan 2, 2007

Smirk said:
These are typically newsletter subscriptions and subscribed merchant advertisements (I have added many *@domain.com entries to the spam whitelist, but they are still getting trapped).

Have a look at the "Reply-To" header of these messages and see if it matches your whitelist.
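If you want to check a trapped message quickly, something like the sketch below lists the sender-type headers and shows which ones match a wildcard whitelist. It is only an illustration: I am assuming the whitelist behaves like simple `*@domain.com` glob patterns, which may not be how hsphere or dspam actually matches, and the sample message and addresses are made up.

```python
from email import message_from_string
from email.utils import getaddresses
from fnmatch import fnmatch

# Hypothetical whitelist entries, in the "*@domain.com" style from the post.
WHITELIST = ["*@domain.com"]

# Headers that can each carry a different sender address.
SENDER_HEADERS = ("From", "Reply-To", "Sender", "Return-Path")

def sender_addresses(raw_message):
    """Return {header: [addresses]} for each sender-type header present."""
    msg = message_from_string(raw_message)
    found = {}
    for header in SENDER_HEADERS:
        values = msg.get_all(header, [])
        addrs = [addr.lower() for _, addr in getaddresses(values) if addr]
        if addrs:
            found[header] = addrs
    return found

def whitelist_report(raw_message, whitelist=WHITELIST):
    """Return {header: True/False}; True if every address matches a pattern."""
    report = {}
    for header, addrs in sender_addresses(raw_message).items():
        report[header] = all(
            any(fnmatch(addr, pattern.lower()) for pattern in whitelist)
            for addr in addrs
        )
    return report

# Made-up newsletter: From is whitelisted, Reply-To points at a mailer host.
SAMPLE = (
    "From: Shop News <news@domain.com>\n"
    "Reply-To: bounce-123@mailer.example.net\n"
    "Subject: Weekly offers\n"
    "\n"
    "Hello\n"
)
print(whitelist_report(SAMPLE))
```

In the sample, From matches but Reply-To does not, which is the kind of mismatch worth looking for in the messages that got trapped.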

CyberSpy · Jan 6, 2007

Stephen said:
You can also let us know how they need to be trained. This is a beta period because we are still tuning what the mail filter treats as spam and ham.

However, newsletters are a real pain, and we had major problems at postini and mailwise with newsletters. I'd say only 2-3% made it through; the others stayed quarantined. Newsletters can have a lot of "spam-like" content even when they are not spam, with paid ads and the like.

Stephen,

Trying to spot spam using long lists of rules is very prone to this problem, unless the rule set is trained for the specific person. For example, I use, very successfully, a solution called POPFile (POPFile: POPFileDocumentationProject), which is a Perl-based POP3 proxy server with a Bayesian filter. I've trained it, and it's quite accurate.

This doesn't work for ISPs though, as everyone has a different idea of what spam is - newsletters being a good example.
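To show why per-person training matters, here is a toy word-count Bayesian classifier. It is a bare-bones naive Bayes sketch for illustration only, not the algorithm POPFile or dspam actually uses, and the class name, users and training messages are all invented.

```python
import math
import re
from collections import Counter

class TinyBayes:
    """Toy naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        self.words[label].update(self.tokens(text))
        self.docs[label] += 1

    def classify(self, text):
        vocab = set(self.words["spam"]) | set(self.words["ham"])
        total_docs = sum(self.docs.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed log prior for the class.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total = sum(self.words[label].values())
            for tok in self.tokens(text):
                # Smoothed log likelihood of each word given the class.
                count = self.words[label][tok]
                score += math.log((count + 1) / (total + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)

newsletter = "weekly newsletter with special offers from the shop"

# Hypothetical user who wants the newsletter.
alice = TinyBayes()
alice.train(newsletter, "ham")
alice.train("cheap pills buy now", "spam")

# Hypothetical user who considers the same newsletter junk.
bob = TinyBayes()
bob.train(newsletter, "spam")
bob.train("meeting notes for the project", "ham")

msg = "weekly newsletter special discount"
print(alice.classify(msg), bob.classify(msg))
```

The same incoming message comes out as ham for one user and spam for the other, purely because of what each of them trained. A single shared model for a whole ISP cannot satisfy both.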

One really good solution to this though is BrightMail (recently taken over by Symantec - see Symantec Brightmail AntiSpam: Overview - Symantec Corp.). It works by setting up 'honeypot' email addresses, which attract REAL spam, but as they are not used for any real email, no genuine mailing lists send them mail. The system then uses the signatures of the spam it receives to spot spam in the mailboxes it scans.
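The honeypot idea can be sketched in a few lines. To be clear, this is my own simplified illustration: I don't know how BrightMail actually computes its signatures, so a plain hash of the whitespace-normalized body stands in for whatever it really does, and the class name and addresses are made up.

```python
import hashlib
import re

def signature(body):
    """Stand-in signature: SHA-256 of the lowercased, whitespace-collapsed body."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class HoneypotFilter:
    """Learns spam signatures only from mail sent to honeypot addresses."""

    def __init__(self, honeypot_addresses):
        self.honeypots = {addr.lower() for addr in honeypot_addresses}
        self.spam_signatures = set()

    def observe(self, recipient, body):
        # Nobody legitimately mails a honeypot, so anything arriving is spam.
        if recipient.lower() in self.honeypots:
            self.spam_signatures.add(signature(body))

    def is_spam(self, body):
        return signature(body) in self.spam_signatures

# Hypothetical honeypot address and messages.
filt = HoneypotFilter(["trap@example.com"])
filt.observe("trap@example.com", "Buy   CHEAP pills\nnow")
filt.observe("real.user@example.com", "Your weekly newsletter")

print(filt.is_spam("buy cheap pills now"))
print(filt.is_spam("Your weekly newsletter"))
```

Because the newsletter never arrived at a honeypot, it never gets a signature, which is where the low false positive rate comes from. An exact hash like this would be trivially dodged by spammers changing a word, so a real product must use something fuzzier.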

It does have a few false negatives (some spam sneaks through, but very little - certainly far less than your current solution!!) but more importantly, it does claim to only mark 1 genuine email in a million as spam - 99.9999% accurate.

I'd recommend you have a good look at this system, as it really is effective.

Regards

Adam

Stephen · Jan 6, 2007

Adam,

Have a look at what dspam is. It just needs to be trained on what is spam and ham to be accurate, and it uses much more advanced filtering than even POPFile.

The simple fact is what is spam to one may be ham to another.
