Recording of all DSPAM False-Positives. (Sticky this?)

antic · Nov 11, 2007

Given that DSPAM is still marking false-positives, and for seemingly ridiculous reasons, perhaps this thread can be used to keep a record of it, for us and for Jodo, so if DSPAM ever needs to be re-trained, it's all here.

If you receive email marked as spam by DSPAM, please post the email header lines here which start with "X-DSPAM".
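
If it helps, here is a minimal sketch for pulling those lines out of a saved message with Python's standard `email` module. The function name `dspam_headers` and the sample message are made up for illustration; only the "X-DSPAM" prefix comes from this thread.

```python
from email import message_from_string

def dspam_headers(raw_message: str) -> list[tuple[str, str]]:
    """Return (name, value) pairs for every header whose name starts with X-DSPAM."""
    msg = message_from_string(raw_message)
    return [(name, value) for name, value in msg.items()
            if name.lower().startswith("x-dspam")]

# Hypothetical sample message; the folded X-DSPAM-Factors header spans three lines.
sample = (
    "Subject: hello\n"
    "X-DSPAM-Result: Spam\n"
    "X-DSPAM-Confidence: 0.5000\n"
    "X-DSPAM-Probability: 0.9900\n"
    "X-DSPAM-Factors: 15,\n"
    "\tJackie, 0.99000,\n"
    "\tto+James, 0.99000\n"
    "\n"
    "Body text.\n"
)

for name, value in dspam_headers(sample):
    print(f"{name}: {value}")
```

Continuation lines of the factors header stay attached to it, so the whole list can be pasted in one go.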

It would perhaps be useful if this thread was made a sticky for the sake of creating a definitive list.

antic · Nov 11, 2007

A bunch of seemingly random words like "Jackie" and "to+James"..? ?(

Code:

X-DSPAM-Result: Spam
X-DSPAM-Processed: Sun Oct 28 18:00:18 2007
X-DSPAM-Confidence: 0.5000
X-DSPAM-Probability: 0.9900
X-DSPAM-Signature: 47250672305471825385791
X-DSPAM-User: [my email address]
X-DSPAM-Factors: 15,
	X-OriginalArrivalTime*Oct, 0.99000,
	Foundations, 0.99000,
	Bernie, 0.01000,
	X-OriginalArrivalTime*Oct+2007, 0.99000,
	<mark, 0.01000,
	Received*dspam2.m****here.biz+(Postfix), 0.01000,
	Received*by+dspam2.m****here.biz, 0.01000,
	Date*12+0900, 0.99000,
	to+David, 0.01000,
	Jackie, 0.99000,
	to+James, 0.99000,
	to+Steve, 0.01000,
	of+Grant, 0.99000,
	au>, 0.01000,
	mp, 0.99000

antic · Nov 11, 2007

Another bunch of weird triggers. This has "smarttags" in it, from an MS Outlook email.

Code:

X-OriginalArrivalTime: 09 Nov 2007 08:37:45.0145 (UTC) FILETIME=[CCD6F290:01C822AB]
X-DSPAM-Result: Spam
X-DSPAM-Processed: Fri Nov  9 03:37:51 2007
X-DSPAM-Confidence: 0.5000
X-DSPAM-Probability: 0.9900
X-DSPAM-Signature: 47341c5f206238364880209
X-DSPAM-User: *********
X-DSPAM-Factors: 15,
	Narrow", 0.99000,
	Received*9+Nov, 0.01000,
	name="address"/>, 0.99000,
	name="Street"/>+<o, 0.99000,
	family+Arial'>Please, 0.01000,
	smarttags"+name="Street"/>, 0.99000,
	State+w, 0.01000,
	Received*0900, 0.99000,
	Date*9+Nov, 0.01000,
	Date*39+0900, 0.99000,
	size=2+color=black, 0.01000,
	title="mailto, 0.01000,
	name="Street"/>, 0.99000,
	smarttags"+name="address"/>, 0.99000,
	Arial'>Please, 0.01000

antic · Nov 11, 2007

More DSPAM weirdness. Phrases like "walk+and", "myself+Do", "it+deserves", "&nbsp+Well"...

Why does DSPAM pick on little things like this? It's all part of common language.

Code:

X-DSPAM-Result: Spam
X-DSPAM-Processed: Sat Nov  3 04:34:10 2007
X-DSPAM-Confidence: 0.5144
X-DSPAM-Probability: 0.9767
X-DSPAM-Signature: 472c3282227806332716315
X-DSPAM-User: ***********
X-DSPAM-Factors: 15,
	Subject+Re, 0.00427,
	got+here, 0.01000,
	that+It's, 0.01000,
	walk+and, 0.99000,
	26+PM, 0.01000,
	it+deserves, 0.99000,
	up+going, 0.01000,
	Why+did, 0.01000,
	myself+Do, 0.99000,
	&nbsp+Well, 0.99000

antic · Nov 11, 2007

Another false positive based on everyday language.

Code:

X-DSPAM-Result: Spam
X-DSPAM-Processed: Sat Nov  3 04:29:02 2007
X-DSPAM-Confidence: 0.5158
X-DSPAM-Probability: 0.9917
X-DSPAM-Signature: 472c314e227801031421134
X-DSPAM-User: *********
X-DSPAM-Factors: 15,
	up+Not, 0.99174,
	you+even, 0.99000,
	or+down, 0.99000,
	hoped+that, 0.99000,
	really+great, 0.01000
	jokes+and, 0.99000,
	even+used, 0.01000,
	never+give, 0.99000,
	me+do, 0.01000,
	just+then, 0.99000,
	us+I, 0.99000,
	at+Yahoo!, 0.01000,

antic · Nov 18, 2007

Here's another false positive:

Code:

X-DSPAM-Result: Spam
X-DSPAM-Processed: Thu Nov 15 00:58:45 2007
X-DSPAM-Confidence: 0.4981
X-DSPAM-Probability: 0.9997
X-DSPAM-Signature: 473be01598211269054297
X-DSPAM-User: ***********
X-DSPAM-Factors: 15,
	Narrow", 0.99000,
	name="address"/>, 0.99000,
	name="Street"/>+<o, 0.99000,
	smarttags"+name="Street"/>, 0.99000,
	State+w, 0.01000,
	Date*42+0900, 0.99000,
	should+really, 0.01000,
	size=2+color=black, 0.01000,
	title="mailto, 0.01000,
	name="Street"/>, 0.99000,
	smarttags"+name="address"/>, 0.99000,
	"+title=", 0.99000,
	au>, 0.01000,
	Received*dspam2.m****here.biz+(Postfix), 0.05315,
	Received*by+dspam2.m****here.biz, 0.05315

tanmaya · Nov 21, 2007

Certainly can't be made a sticky. You are only posting the false positives, and the situation will change as dspam is trained further. The proper way would have been to post them in a ticket on weekends.
Please note that with learning systems like this, it may suit some clients more than it suits you. So if you continue to see such FPs repeatedly from some senders, it may be better to just whitelist them.

antic · Nov 21, 2007

Ok, thanks Tanmaya. I assumed these kinds of FP's would be bothering most people as they don't seem to make sense at first glance.

jph · Dec 4, 2007

Here's one that's crazy and typical. About 4.9 of the score is because THE DATE IS CORRECT on the message. (5 of the "factors" flag the fact that the correct date is found in the message headers).

One is because the message contains the name Kate (horrors!). One because it contains the abbreviation for Massachusetts (MA).

...And 2 of the factors are because the name of the dspam server itself was found in the headers!!

DSPAM NEEDS TO BE FIXED. I get constant complaints about bouncebacks.

X-DSPAM-Result: Spam
X-DSPAM-Processed: Mon Dec 3 13:59:20 2007
X-DSPAM-Confidence: 0.5269
X-DSPAM-Probability: 0.9998
X-DSPAM-Signature: 47545208106471110564442
X-DSPAM-User: xxxxxxxxxxxxxxxxxxxxxx
X-DSPAM-Factors: 15,
Date*3+Dec, 0.99000,
Date*Mon+3, 0.99000,
DomainKey-Signature*h=Received, 0.99000,
Received*Mon+3, 0.99000,
Subject*talk, 0.01000,
down+an, 0.99000,
20+at, 0.01000,
on+December, 0.97607,
Kate, 0.02775,
MA, 0.04681,
at+10, 0.05870,
Received*3+Dec, 0.92508,
Received*03+Dec, 0.89980,
Received*dspam2.m****here.biz, 0.10092,
Received*by+dspam2.m****here.biz, 0.10092

bro · Dec 17, 2007

Dspam certainly seems to be creating a lot more false positives in the last couple of days. I'm getting mail marked spam that has never been before, even mail from our own websites.

This one gave it 3 points for having the right date!
Very poor....

Content analysis details: (7.1 points, 4.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
7.0 DSPAM_SPAM "dspam marked spam"
0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
...
imeOLE: Produced By Microsoft MimeOLE V6.00.3790.4073
Message-ID:...
X-OriginalArrivalTime: 17 Dec 2007 20:52:10.0298 (UTC) FILETIME=[B16A31A0:01C840EE]
X-DSPAM-Result: Spam
X-DSPAM-Processed: Mon Dec 17 15:52:09 2007
X-DSPAM-Confidence: 0.5281
X-DSPAM-Probability: 0.9998
X-DSPAM-User: ....
X-DSPAM-Factors: 15,
Received*17+Dec, 0.99272, <-correct date
Date*17+Dec, 0.99200, <-- correct date again
Date*Mon+17, 0.99000, <-- correct date again!
request+will, 0.99000, <-- part of the english language
Received*jillc, 0.01000,
Cc*juno.com>, 0.99000, <-- wow, copied. must be spam.
Received*dspam2.m****here.biz, 0.10092, <- sent by our own mail server to a regular email address in the normal way
Received*by+dspam2.m****here.biz, 0.10092,
Received*dspam2.m****here.biz+(Postfix), 0.10092,
Date*15+52, 0.11089, <-what?
Rental, 0.87946, <-- yes, it's a rental site
invoice, 0.12869, <-- how unusual
Received*mail, 0.13593, <- um
Initial, 0.13820,
$27, 0.85493 <-- and the invoice mentioned money....
X-DSPAM-Result: Innocent <---huh?
X-DSPAM-Processed: Mon Dec 17 15:52:10 2007
X-DSPAM-Confidence: 0.5603
X-DSPAM-Probability: 0.0000
X-DSPAM-Factors: 15,
Received*17+Dec, 0.99272,
Date*17+Dec, 0.99200,
Received-SPF*client+ip=204.14.107.93, 0.01000,
Date*Mon+17, 0.99000,
Received*(mail.m****here.biz+[204.14.107.1]), 0.01000,
Received*mail.m****here.biz+(mail.m****here.biz, 0.01000,
Received*(dspam2.m****here.biz, 0.01000,
X-Virus-Scan*ClamAV, 0.01000,
Received*localhost+(dspam2.m****here.biz, 0.01000,
Received*(dspam2.m****here.biz+[127.0.0.1]), 0.01000,
X-Virus-Scan*by+ClamAV, 0.01000,
request+will, 0.99000,
Received*(204.14.107.93), 0.01000,
Received*(mail.m****here.biz, 0.01000,
Received*jillc, 0.01000

snooper · Dec 17, 2007

I too am getting LOADS more FPs in the last few days.
It's quite annoying, I must say.

bro · Dec 18, 2007

"size=2>+<font, 0.99000,
&nbsp+by, 0.99000,
face=Verdana><font, 0.99000,"

3 points for using size 2 verdana and a space ?

This is seriously broken. If it classes emails as spam because they use the most common font and size, then it's useless.

cdog · Dec 18, 2007

At a customer's request I've had to turn off the spam filter for all of their mailboxes due to these kinds of false positives.

Will the dspam still catch their legitimate emails anyway?

nzkiwi · Dec 18, 2007

cdog said:
Will the dspam still catch their legitimate emails anyway?

As I understand it, the actual filtering is done with Spam Assassin, using the dspam tags as part of its evaluation. So if Spam filtering is disabled, all mail will pass through. However, I've been wrong before.....
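
A rough sketch of that scoring idea, using the numbers from bro's report above (7.0 for DSPAM_SPAM, 0.1 for RDNS_NONE, 4.0 required). The function name `is_spam` and the "score at or above the threshold" comparison are my assumptions about how the threshold is applied, not something confirmed for this setup.

```python
REQUIRED = 4.0  # "4.0 required" in bro's report

def is_spam(hits: dict[str, float], required: float = REQUIRED) -> bool:
    """Sum the points of every rule that fired and compare with the threshold."""
    return sum(hits.values()) >= required

with_dspam = {"DSPAM_SPAM": 7.0, "RDNS_NONE": 0.1}
without_dspam = {"RDNS_NONE": 0.1}

print(is_spam(with_dspam))     # DSPAM verdict alone pushes it over 4.0
print(is_spam(without_dspam))  # same message without the DSPAM rule
```

On those numbers the DSPAM rule by itself is worth more than the whole threshold, which would explain why a DSPAM false positive ends up flagged no matter what the other rules say.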

nzkiwi · Dec 18, 2007

I've just noticed that my mail appears to have no dspam tags. Here's a typical spam header:

Code:

Return-Path: <[email protected]>
Delivered-To: x
Received: (qmail 27044 invoked by uid 399); 18 Dec 2007 13:14:19 -0000
Received: from unknown (HELO ?151.60.21.131?) (151.60.21.131)
  by gw-mail4.m****here.biz with ESMTP; 18 Dec 2007 13:14:19 -0000
Received-SPF: none (gw-mail4.m****here.biz: domain at jmm74.mail.yale.edu does not designate permitted sender hosts)
	identity=mailfrom; client-ip=151.60.21.131;
	envelope-from=<[email protected]>;
Received: from fmpckb7bzon0rt9 ([115.180.2.61] helo=fmpckb7bzon0rt9)
	by [151.60.21.131] ( sendmail 8.13.3/8.13.1) with esmtpa id 1wUzlj-000GCT-Fm
	for x; Tue, 18 Dec 2007 14:14:22 +0100
Message-ID: <[email protected]>
Date: Tue, 18 Dec 2007 14:14:04 +0100
From: "Liora pinocci" <[email protected]>
User-Agent: Thunderbird 2.0.0.0 (Windows/20070326)
MIME-Version: 1.0
To: x
Subject: kassman
Content-Type: multipart/mixed;
 boundary="------------ComodoEmailScanner060606"
Content-Transfer-Encoding: 7bit

The last message with dspam tags was back on 19-Nov-2007. Is dspam not working with mail4?

tanmaya · Dec 19, 2007

cdog said:
Will the dspam still catch their legitimate emails anyway?

You are part of our new cluster, where we have yet to launch dspam.

nzkiwi said:
The last message with dspam tags was back on 19-Nov-2007. Is dspam not working with mail4?

You are right. We still haven't turned it on for mail4 since the dspam1 incident.
dspam1 has already been rebuilt and should soon be serving mail4.

Others, we are aware that dspam hasn't been doing well in the past few days.
The reason is that the spam corpus we fed to dspam included mails from spammers dated in the future (including dates in December). Other than that, please don't look at individual words or patterns. It is the whole combination of patterns, plus the overall score these patterns form together, that leads dspam to call a mail spam.
If you wish to know more about it, please read here:
http://en.wikipedia.org/wiki/Bayesian_filtering
http://www.paulgraham.com/spam.html
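
To illustrate the "whole combination" point, here is a minimal sketch of the combining rule described in the Paul Graham essay linked above: the product of the token probabilities divided by that product plus the product of their complements. This is only an assumption about the general approach; the algorithm actually configured on the dspam servers may differ. The function name `combine` is made up, and the input values are the 15 factors from jph's Dec 4 post.

```python
import math

def combine(probabilities):
    """Combine per-token spam probabilities into one score (Graham-style naive Bayes).

    Each probability must be strictly between 0 and 1.
    """
    # Work in log space so long token lists don't underflow.
    log_spam = sum(math.log(p) for p in probabilities)
    log_ham = sum(math.log(1.0 - p) for p in probabilities)
    # Equivalent to prod(p) / (prod(p) + prod(1 - p)).
    return 1.0 / (1.0 + math.exp(log_ham - log_spam))

# The 15 factor values from jph's Dec 4 post, in the order listed.
factors = [
    0.99000, 0.99000, 0.99000, 0.99000, 0.01000,
    0.99000, 0.01000, 0.97607, 0.02775, 0.04681,
    0.05870, 0.92508, 0.89980, 0.10092, 0.10092,
]

print(f"combined score: {combine(factors):.4f}")

# One strongly "spammy" and one strongly "innocent" token cancel each other out.
print(f"0.99 vs 0.01:   {combine([0.99, 0.01]):.4f}")
```

Under this rule no single token decides the outcome: tokens near 0.01 pull the score down just as hard as tokens near 0.99 push it up, and the verdict depends on which side outweighs the other across all 15.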

Lastly, while we normally train dspam only on weekends, we have re-enabled dspam training data collection. Please feel free to submit any false positives received in the last 48 hours via a support ticket.
Please create only a single support ticket and attach all false positives (incorrectly marked emails) to this ticket. We will not consider HTML newsletters.
