Recording of all DSPAM False-Positives. (Sticky this?)

antic

Perch
Given that DSPAM is still marking false-positives, and for seemingly ridiculous reasons, perhaps this thread can be used to keep a record of it, for us and for Jodo, so if DSPAM ever needs to be re-trained, it's all here.

If you receive email marked as spam by DSPAM, please post the email header lines here which start with "X-DSPAM".

It would perhaps be useful if this thread was made a sticky for the sake of creating a definitive list.
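
For anyone who would rather not pick the lines out by hand, here is a minimal sketch that pulls every header starting with "X-DSPAM" out of a saved message using Python's standard `email` module. The function name `dspam_headers` and the sample message are made up for illustration; adapt them to however you export your mail.

```python
from email import message_from_string


def dspam_headers(raw_message):
    """Return (name, value) pairs for every header whose name starts with X-DSPAM."""
    msg = message_from_string(raw_message)
    return [
        (name, value)
        for name, value in msg.items()
        if name.upper().startswith("X-DSPAM")
    ]


# Hypothetical, shortened message used only to show the call.
sample = (
    "Subject: test\n"
    "X-DSPAM-Result: Spam\n"
    "X-DSPAM-Confidence: 0.5000\n"
    "X-DSPAM-Factors: 2,\n"
    "\tJackie, 0.99000,\n"
    "\tto+James, 0.99000\n"
    "\n"
    "body text\n"
)

for name, value in dspam_headers(sample):
    print(f"{name}: {value}")
```

Folded headers such as X-DSPAM-Factors come back as a single value with the continuation lines still in it, so the whole factor list can be pasted in one go.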
 
A bunch of seemingly random words like "Jackie" and "to+James"..? ?(

Code:
X-DSPAM-Result: Spam
X-DSPAM-Processed: Sun Oct 28 18:00:18 2007
X-DSPAM-Confidence: 0.5000
X-DSPAM-Probability: 0.9900
X-DSPAM-Signature: 47250672305471825385791
X-DSPAM-User: [my email address]
X-DSPAM-Factors: 15,
	X-OriginalArrivalTime*Oct, 0.99000,
	Foundations, 0.99000,
	Bernie, 0.01000,
	X-OriginalArrivalTime*Oct+2007, 0.99000,
	<mark, 0.01000,
	Received*dspam2.m****here.biz+(Postfix), 0.01000,
	Received*by+dspam2.m****here.biz, 0.01000,
	Date*12+0900, 0.99000,
	to+David, 0.01000,
	Jackie, 0.99000,
	to+James, 0.99000,
	to+Steve, 0.01000,
	of+Grant, 0.99000,
	au>, 0.01000,
	mp, 0.99000
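
To make lists like the one above easier to compare, here is a rough sketch that splits an X-DSPAM-Factors value into (token, probability) pairs and counts how many lean towards spam. It assumes the layout seen in this thread (a count on the first line, then one "token, probability" entry per line); `parse_factors` and the shortened sample are made up for the example.

```python
def parse_factors(value):
    """Split an X-DSPAM-Factors value into (token, probability) pairs."""
    pairs = []
    lines = [line.strip().rstrip(",") for line in value.splitlines()]
    for line in lines[1:]:  # the first line only holds the factor count
        if not line:
            continue
        # Split on the last ", " so the token itself is left untouched.
        token, _, probability = line.rpartition(", ")
        pairs.append((token, float(probability)))
    return pairs


# Shortened sample taken from the listing above.
sample_factors = (
    "15,\n"
    "\tFoundations, 0.99000,\n"
    "\tBernie, 0.01000,\n"
    "\tReceived*by+dspam2.m****here.biz, 0.01000,\n"
    "\tJackie, 0.99000"
)

pairs = parse_factors(sample_factors)
spammy = [token for token, p in pairs if p >= 0.5]
innocent = [token for token, p in pairs if p < 0.5]
print(len(spammy), "spam-leaning,", len(innocent), "innocent-leaning")
```

Run on the full listing above, the same tally gives 8 factors at 0.99 against 7 at 0.01.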
 
Another bunch of weird triggers. This has "smarttags" in it, from an MS Outlook email.

Code:
X-OriginalArrivalTime: 09 Nov 2007 08:37:45.0145 (UTC) FILETIME=[CCD6F290:01C822AB]
X-DSPAM-Result: Spam
X-DSPAM-Processed: Fri Nov  9 03:37:51 2007
X-DSPAM-Confidence: 0.5000
X-DSPAM-Probability: 0.9900
X-DSPAM-Signature: 47341c5f206238364880209
X-DSPAM-User: *********
X-DSPAM-Factors: 15,
	Narrow", 0.99000,
	Received*9+Nov, 0.01000,
	name="address"/>, 0.99000,
	name="Street"/>+<o, 0.99000,
	family+Arial'>Please, 0.01000,
	smarttags"+name="Street"/>, 0.99000,
	State+w, 0.01000,
	Received*0900, 0.99000,
	Date*9+Nov, 0.01000,
	Date*39+0900, 0.99000,
	size=2+color=black, 0.01000,
	title="mailto, 0.01000,
	name="Street"/>, 0.99000,
	smarttags"+name="address"/>, 0.99000,
	Arial'>Please, 0.01000
 
More DSPAM weirdness. Phrases like "walk+and", "myself+Do", "it+deserves", "&nbsp+Well"...

Why does DSPAM pick on little things like this? It's all part of common language.

Code:
X-DSPAM-Result: Spam
X-DSPAM-Processed: Sat Nov  3 04:34:10 2007
X-DSPAM-Confidence: 0.5144
X-DSPAM-Probability: 0.9767
X-DSPAM-Signature: 472c3282227806332716315
X-DSPAM-User: ***********
X-DSPAM-Factors: 15,
	Subject+Re, 0.00427,
	got+here, 0.01000,
	that+It's, 0.01000,
	walk+and, 0.99000,
	26+PM, 0.01000,
	it+deserves, 0.99000,
	up+going, 0.01000,
	Why+did, 0.01000,
	myself+Do, 0.99000,
	&nbsp+Well, 0.99000
 
Another false positive based on everyday language.

Code:
X-DSPAM-Result: Spam
X-DSPAM-Processed: Sat Nov  3 04:29:02 2007
X-DSPAM-Confidence: 0.5158
X-DSPAM-Probability: 0.9917
X-DSPAM-Signature: 472c314e227801031421134
X-DSPAM-User: *********
X-DSPAM-Factors: 15,
	up+Not, 0.99174,
	you+even, 0.99000,
	or+down, 0.99000,
	hoped+that, 0.99000,
	really+great, 0.01000
	jokes+and, 0.99000,
	even+used, 0.01000,
	never+give, 0.99000,
	me+do, 0.01000,
	just+then, 0.99000,
	us+I, 0.99000,
	at+Yahoo!, 0.01000,
 
Here's another false positive:

Code:
X-DSPAM-Result: Spam
X-DSPAM-Processed: Thu Nov 15 00:58:45 2007
X-DSPAM-Confidence: 0.4981
X-DSPAM-Probability: 0.9997
X-DSPAM-Signature: 473be01598211269054297
X-DSPAM-User: ***********
X-DSPAM-Factors: 15,
	Narrow", 0.99000,
	name="address"/>, 0.99000,
	name="Street"/>+<o, 0.99000,
	smarttags"+name="Street"/>, 0.99000,
	State+w, 0.01000,
	Date*42+0900, 0.99000,
	should+really, 0.01000,
	size=2+color=black, 0.01000,
	title="mailto, 0.01000,
	name="Street"/>, 0.99000,
	smarttags"+name="address"/>, 0.99000,
	"+title=", 0.99000,
	au>, 0.01000,
	Received*dspam2.m****here.biz+(Postfix), 0.05315,
	Received*by+dspam2.m****here.biz, 0.05315
 
This certainly can't be made a sticky. You are only posting the false positives, and the situation will change as dspam is trained further. The proper way would have been to post them in a ticket on weekends.
Please note that with learning systems like this, it may suit some clients better than it suits you. So if you continue to see such FPs repeatedly from certain senders, it may be better simply to whitelist them.
 
OK, thanks Tanmaya. I assumed these kinds of FPs would be bothering most people, as they don't seem to make sense at first glance.
 
Here's one that's crazy and typical. About 4.9 of the score is because THE DATE IS CORRECT on the message. (5 of the “factors” flag the fact that the correct date is found in the message headers).

One is because the message contains the name Kate (horrors!). One because it contains the abbreviation for Massachusetts (MA).

...And 2 of the factors are because the name of the dspam server itself was found in the headers!!

DSPAM NEEDS TO BE FIXED. I get constant complaints about bouncebacks.


X-DSPAM-Result: Spam
X-DSPAM-Processed: Mon Dec 3 13:59:20 2007
X-DSPAM-Confidence: 0.5269
X-DSPAM-Probability: 0.9998
X-DSPAM-Signature: 47545208106471110564442
X-DSPAM-User: xxxxxxxxxxxxxxxxxxxxxx
X-DSPAM-Factors: 15,
	Date*3+Dec, 0.99000,
	Date*Mon+3, 0.99000,
	DomainKey-Signature*h=Received, 0.99000,
	Received*Mon+3, 0.99000,
	Subject*talk, 0.01000,
	down+an, 0.99000,
	20+at, 0.01000,
	on+December, 0.97607,
	Kate, 0.02775,
	MA, 0.04681,
	at+10, 0.05870,
	Received*3+Dec, 0.92508,
	Received*03+Dec, 0.89980,
	Received*dspam2.m****here.biz, 0.10092,
	Received*by+dspam2.m****here.biz, 0.10092
 
Dspam certainly seems to have been creating a lot more false positives in the last couple of days. I'm getting mail marked as spam that never was before, even mail from our own websites.

This one gave it 3 points for having the right date!
Very poor....


Content analysis details: (7.1 points, 4.0 required)

pts rule name description
---- ---------------------- --------------------------------------------------
7.0 DSPAM_SPAM "dspam marked spam"
0.1 RDNS_NONE Delivered to trusted network by a host with no rDNS
...
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.4073
Message-ID:...
X-OriginalArrivalTime: 17 Dec 2007 20:52:10.0298 (UTC) FILETIME=[B16A31A0:01C840EE]
X-DSPAM-Result: Spam
X-DSPAM-Processed: Mon Dec 17 15:52:09 2007
X-DSPAM-Confidence: 0.5281
X-DSPAM-Probability: 0.9998
X-DSPAM-User: ....
X-DSPAM-Factors: 15,
Received*17+Dec, 0.99272, <-correct date
Date*17+Dec, 0.99200, <-- correct date again
Date*Mon+17, 0.99000, <-- correct date again!
request+will, 0.99000, <-- part of the english language
Received*jillc, 0.01000,
Cc*juno.com>, 0.99000, <-- wow, copied. must be spam.
Received*dspam2.m****here.biz, 0.10092, <- sent by our own mail server to a regular email address in the normal way
Received*by+dspam2.m****here.biz, 0.10092,
Received*dspam2.m****here.biz+(Postfix), 0.10092,
Date*15+52, 0.11089, <-what?
Rental, 0.87946, <-- yes, it's a rental site
invoice, 0.12869, <-- how unusual
Received*mail, 0.13593, <- um
Initial, 0.13820,
$27, 0.85493 <-- and the invoice mentioned money....
X-DSPAM-Result: Innocent <---huh?
X-DSPAM-Processed: Mon Dec 17 15:52:10 2007
X-DSPAM-Confidence: 0.5603
X-DSPAM-Probability: 0.0000
X-DSPAM-Factors: 15,
Received*17+Dec, 0.99272,
Date*17+Dec, 0.99200,
Received-SPF*client+ip=204.14.107.93, 0.01000,
Date*Mon+17, 0.99000,
Received*(mail.m****here.biz+[204.14.107.1]), 0.01000,
Received*mail.m****here.biz+(mail.m****here.biz, 0.01000,
Received*(dspam2.m****here.biz, 0.01000,
X-Virus-Scan*ClamAV, 0.01000,
Received*localhost+(dspam2.m****here.biz, 0.01000,
Received*(dspam2.m****here.biz+[127.0.0.1]), 0.01000,
X-Virus-Scan*by+ClamAV, 0.01000,
request+will, 0.99000,
Received*(204.14.107.93), 0.01000,
Received*(mail.m****here.biz, 0.01000,
Received*jillc, 0.01000
 
"size=2>+<font, 0.99000,
&nbsp+by, 0.99000,
face=Verdana><font, 0.99000,"

3 points for using size 2 Verdana and a space?

This is seriously broken. If it classes emails as spam because they use the most common font and size, then it's useless.
 
At a customer's request I've had to turn off the spam filter for all of their mailboxes due to these kinds of false positives.

Will the dspam still catch their legitimate emails anyway?
 
Will the dspam still catch their legitimate emails anyway?
As I understand it, the actual filtering is done by SpamAssassin, which uses the dspam tags as part of its evaluation. So if spam filtering is disabled, all mail will pass through. However, I've been wrong before..... :D
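
Going by the SpamAssassin report posted earlier in this thread (7.0 points for DSPAM_SPAM, 0.1 for RDNS_NONE, 4.0 required), the arithmetic can be sketched like this. It is only a toy model of summing rule scores against a threshold, not SpamAssassin's actual code, and `is_spam` is a made-up name.

```python
REQUIRED_SCORE = 4.0  # "4.0 required" in the posted report

# Rule scores copied from the posted report.
rule_scores = {"DSPAM_SPAM": 7.0, "RDNS_NONE": 0.1}


def is_spam(hits, required=REQUIRED_SCORE):
    """Treat a message as spam once the summed rule scores reach the threshold."""
    return sum(hits.values()) >= required


print(is_spam(rule_scores))                # both rules hit
print(is_spam({"DSPAM_SPAM": 7.0}))        # dspam verdict alone
print(is_spam({"RDNS_NONE": 0.1}))         # same mail without the dspam verdict
```

With those scores, the dspam verdict on its own is already above the threshold, which would explain why a dspam false positive turns straight into a message marked as spam.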
 
I've just noticed that my mail appears to have no dspam tags. Here's a typical spam header:
Code:
Return-Path: <[email protected]>
Delivered-To: x
Received: (qmail 27044 invoked by uid 399); 18 Dec 2007 13:14:19 -0000
Received: from unknown (HELO ?151.60.21.131?) (151.60.21.131)
  by gw-mail4.m****here.biz with ESMTP; 18 Dec 2007 13:14:19 -0000
Received-SPF: none (gw-mail4.m****here.biz: domain at jmm74.mail.yale.edu does not designate permitted sender hosts)
	identity=mailfrom; client-ip=151.60.21.131;
	envelope-from=<[email protected]>;
Received: from fmpckb7bzon0rt9 ([115.180.2.61] helo=fmpckb7bzon0rt9)
	by [151.60.21.131] ( sendmail 8.13.3/8.13.1) with esmtpa id 1wUzlj-000GCT-Fm
	for x; Tue, 18 Dec 2007 14:14:22 +0100
Message-ID: <[email protected]>
Date: Tue, 18 Dec 2007 14:14:04 +0100
From: "Liora pinocci" <[email protected]>
User-Agent: Thunderbird 2.0.0.0 (Windows/20070326)
MIME-Version: 1.0
To: x
Subject: kassman
Content-Type: multipart/mixed;
 boundary="------------ComodoEmailScanner060606"
Content-Transfer-Encoding: 7bit
The last message with dspam tags was back on 19-Nov-2007. Is dspam not working with mail4?
 
Will the dspam still catch their legitimate emails anyway?
You are part of our new cluster, where we have yet to launch dspam.

The last message with dspam tags was back on 19-Nov-2007. Is dspam not working with mail4?
You are right. We still haven't turned it on for mail4 since the dspam1 incident.
dspam1 has already been rebuilt and should soon be serving mail4.

To everyone else: we are aware that dspam hasn't been doing well in the past few days.
The reason is that the spam corpus we fed to dspam included mail from spammers with dates in the future (including dates in December). Other than that, please don't look at individual words or patterns. Dspam calls a mail spam based on the whole combination of patterns and the overall score they produce together.
If you wish to know more about it, please read here:
http://en.wikipedia.org/wiki/Bayesian_filtering
http://www.paulgraham.com/spam.html
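
As a rough illustration of how individual factors combine into one score, here is a minimal sketch of the combining formula from the Paul Graham essay linked above, applied to the fifteen factor values from the 3 Dec message posted earlier in this thread. It is only an assumption that the dspam servers here combine factors exactly this way; `combine` and `factors_3_dec` are names made up for the example.

```python
from math import prod


def combine(probabilities):
    """Graham-style naive Bayes combination of per-token spam probabilities."""
    spam = prod(probabilities)
    ham = prod(1.0 - p for p in probabilities)
    return spam / (spam + ham)


# Factor values copied from the 3 Dec message earlier in the thread.
factors_3_dec = [
    0.99000, 0.99000, 0.99000, 0.99000, 0.01000,
    0.99000, 0.01000, 0.97607, 0.02775, 0.04681,
    0.05870, 0.92508, 0.89980, 0.10092, 0.10092,
]

print(f"{combine(factors_3_dec):.4f}")

# One 0.99 factor and one 0.01 factor cancel each other out.
print(f"{combine([0.99, 0.01]):.4f}")
```

For these fifteen values the result rounds to 0.9998, the same figure as the X-DSPAM-Probability on that message. A spam-leaning factor and an innocent-leaning one of equal strength cancel out, so the verdict depends on the balance of all the factors and not on any one token such as the date or a name.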

Lastly, while we normally train dspam only on weekends, we have re-enabled dspam training data collection. Please feel free to submit any false positives received in the last 48 hours via a support ticket.
Please create only a single support ticket and attach all false positives (incorrectly marked emails) to it. We will not consider HTML newsletters.
 