- Package: lists.debian.org
- Source: lists.debian.org
- Submitter: Santiago Vila
- Date: 2025-02-08 08:09:01 UTC
- Severity: wishlist
- Tags:
This is a summary of the spam and legitimate messages I received
during the last month from the Debian lists I'm subscribed to,
ordered by spamassassin score:
                   Legitimate   Spam   % of spam
Below 0                  5884      5       0.08%
Between 0 and 1           689    123      15.15%
Between 1 and 2           236    227      49.03%
Between 2 and 3           150    174      53.70%
Between 3 and 4            20    164      89.13%
Total                    6979    693       9.03%
As you can see, the higher the spamassassin score, the greater the
probability that a message is spam (I do not expect this to change
in the near future, even if we use better filtering methods).
This suggests that a very simple but effective way to get rid of a lot
of spam would be to moderate messages over a certain threshold.
For example, by moderating messages having a spamassassin score over
2.0, we would have to moderate only 2.4% of all legitimate traffic,
but we would get rid of 48.8% of all the current spam, i.e. for very
little cost, we could get a very high benefit.
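The two percentages above can be checked with a short script. This is only a sketch: it hard-codes the counts from the table above, and the name `moderation_tradeoff` and the bucket layout are illustrative assumptions, not part of any existing listmaster tooling.

```python
# Counts copied from the table above: (lower bound of score bucket, legitimate, spam).
# The "Below 0" bucket has no lower bound, so it is represented with -inf.
BUCKETS = [
    (float("-inf"), 5884, 5),
    (0.0, 689, 123),
    (1.0, 236, 227),
    (2.0, 150, 174),
    (3.0, 20, 164),
]


def moderation_tradeoff(buckets, threshold):
    """Return (share of legitimate mail held, share of spam held) when
    every bucket whose lower bound is >= threshold goes to moderation."""
    total_ham = sum(ham for _, ham, _ in buckets)
    total_spam = sum(spam for _, _, spam in buckets)
    held_ham = sum(ham for low, ham, _ in buckets if low >= threshold)
    held_spam = sum(spam for low, _, spam in buckets if low >= threshold)
    return held_ham / total_ham, held_spam / total_spam


ham_share, spam_share = moderation_tradeoff(BUCKETS, 2.0)
print(f"legitimate mail held: {ham_share:.1%}")  # 2.4%
print(f"spam held: {spam_share:.1%}")            # 48.8%
```

Calling the same function with other thresholds (1.0 or 3.0, for instance) shows how the moderators' workload trades off against the spam that would be stopped.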
Additionally, if we assume that most (if not all) of the spam comes
from non-subscribers, we could let messages from subscribers pass
(assuming they aren't caught by spamassassin, that is) and moderate
only messages coming from non-subscribers. This way moderation would
be even easier.
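The rule proposed here could be sketched roughly as follows. The function `route`, the `Action` names, and the `reject_at` cutoff of 4.0 (the score at which spamassassin is assumed to block a message already) are assumptions made for illustration; only the 2.0 moderation threshold comes from the proposal itself.

```python
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    MODERATE = "moderate"
    REJECT = "reject"


def route(score, is_subscriber, moderate_at=2.0, reject_at=4.0):
    """Decide what to do with one message.

    moderate_at is the 2.0 threshold from the proposal; reject_at is an
    assumed cutoff at which spamassassin already blocks the message.
    """
    if score >= reject_at:
        return Action.REJECT    # caught by spamassassin regardless of sender
    if is_subscriber:
        return Action.DELIVER   # subscribers pass without moderation
    if score > moderate_at:
        return Action.MODERATE  # non-subscriber in the grey range
    return Action.DELIVER       # low-scoring non-subscriber mail passes


print(route(2.5, is_subscriber=False))  # Action.MODERATE
print(route(2.5, is_subscriber=True))   # Action.DELIVER
```

Only non-subscriber mail in the grey range reaches a moderator, which is what keeps the workload small.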
I proposed this on debian-devel about two months ago and there were
people willing to moderate lists in this way, so I think this would work.
As a side effect, we would not have to worry so much about some
spamassassin scores. For example, we could assign NO_REAL_NAME its
original score of 1.285 (as is usual for a standard spamassassin
installation), sparing list subscribers a lot of spam, and moderators
could take care of the extremely low number of false positives we
would get in the range from 2.0 to 4.0.
Thanks.
Hello.
These are the statistics for the approximate amount of ham/spam I
received in January, ordered by spamassassin score:
                    Ham   Spam   % of spam
Below 0            5704     14       0.24%
Between 0 and 1     625     27       4.14%
Between 1 and 2     278     62      18.24%
Between 2 and 3     172     83      32.55%
Between 3 and 4      20     44      68.75%
Total              6799    230       3.27%
Comments:
* Compared to last month, spam is now approximately 1/3 of what it
used to be. Congratulations.
* There is an increasing number of bogus virus warnings, which I have
excluded from these figures. This is a problem that should also be
addressed.
* Even if we are now using more effective anti-spam filters, it's
still true that as the spamassassin score increases, so does the
probability of the messages being spam; so I still suggest that we
start moderating messages having a high spamassassin score.
Thanks.
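As a rough check of the comments above, the two tables in this report can be compared directly. The sketch below hard-codes both sets of counts; the names `PREVIOUS`, `JANUARY`, `totals` and `held_shares` are illustrative assumptions.

```python
# (lower bound of score bucket, ham, spam), copied from the two tables in this report.
PREVIOUS = [
    (float("-inf"), 5884, 5),
    (0.0, 689, 123),
    (1.0, 236, 227),
    (2.0, 150, 174),
    (3.0, 20, 164),
]
JANUARY = [
    (float("-inf"), 5704, 14),
    (0.0, 625, 27),
    (1.0, 278, 62),
    (2.0, 172, 83),
    (3.0, 20, 44),
]


def totals(buckets):
    """Return (total ham, total spam) over all score buckets."""
    return sum(h for _, h, _ in buckets), sum(s for _, _, s in buckets)


def held_shares(buckets, threshold):
    """Share of ham and of spam in buckets whose lower bound is >= threshold."""
    ham, spam = totals(buckets)
    held_ham = sum(h for low, h, _ in buckets if low >= threshold)
    held_spam = sum(s for low, _, s in buckets if low >= threshold)
    return held_ham / ham, held_spam / spam


_, prev_spam = totals(PREVIOUS)
_, jan_spam = totals(JANUARY)
print(f"January spam / previous spam: {jan_spam / prev_spam:.2f}")  # 0.33

ham_share, spam_share = held_shares(JANUARY, 2.0)
print(f"ham held at 2.0: {ham_share:.1%}")    # 2.8%
print(f"spam held at 2.0: {spam_share:.1%}")  # 55.2%
```

On the January figures, moderating scores over 2.0 would still hold under 3% of legitimate mail while catching more than half of the remaining spam.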
Hi Santiago,

I'm assuming ham means good messages; I haven't heard that terminology before. Thanks for those figures; I'll be sure to use them in the next update we send out.

I agree. Now that I'm back I'll try to work on these some more. Currently I am thinking of soliciting moderators for each list and bouncing messages scoring over 3.5 to them for approval. I'm working on the SA tags and moderating the Chinese lists first, though.

Regards,
Anand
If you do implement moderation of posts by non-subscribers, I think it would be a good idea to have some sort of whitelist mechanism for subscribers who regularly post from an address other than their subscription address.

I'm one such person. My From header says liw@iki.fi. IKI is a non-commercial forwarding service that has been in operation since 1995 and has a high probability of staying in operation for the next decade, at least. One way to ensure this is to avoid flooding the mail server with unnecessary mail. For example, I do not subscribe to mailing lists via liw@iki.fi, since mailing list subscriptions are easy to change. Thus, whenever I point liw@iki.fi to another mailbox, all my personal, regular mail goes there, and then I change my mailing list subscriptions as well.

Other people have other reasons for having the subscriber address and the From header differ. For example, it seems to be somewhat popular to subscribe to lists with addresses of the form liw+listname@example.com, to ease filtering.

Thus, if Debian lists become moderated for posts from non-subscribers, a whitelist feature would be much appreciated by me and my ilk, and probably also by the moderators, since it would lessen their workload.
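One way such a check might look is sketched below. It is purely hypothetical: the helpers `canonical` and `is_known_sender` and the two example sets are invented for illustration and do not describe any existing listmaster code. Lower-casing the whole address and stripping a `+tag` suffix are simplifying assumptions.

```python
def canonical(address):
    """Lower-case an address and drop a '+tag' suffix from the local part.

    Simplification: local parts are treated as case-insensitive, and the
    input is assumed to contain an '@'.
    """
    local, _, domain = address.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def is_known_sender(from_address, subscribers, whitelist):
    """True if the From address matches a subscriber or a whitelisted alias."""
    known = {canonical(a) for a in subscribers} | {canonical(a) for a in whitelist}
    return canonical(from_address) in known


# Hypothetical data: subscribed with a filtered address, posting from another.
subscribers = {"liw+listname@example.com"}
whitelist = {"liw@iki.fi"}

print(is_known_sender("liw@iki.fi", subscribers, whitelist))           # True
print(is_known_sender("liw@example.com", subscribers, whitelist))      # True
print(is_known_sender("someone@example.org", subscribers, whitelist))  # False
```

A sender recognised this way would skip moderation, covering both the forwarding-address case and the plus-address case described above.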
A whitelist mechanism is described in Bug #175477. The report includes an implementation.
I'd only extend the courtesy to subscribers if the SA score of their messages was < -2 or so. We definitely don't want spammers to subscribe to the whitelist and then freely and easily spam every list.
Hi, You wrote: I was just wondering, can I hassle you to post your current statistics? :) BTW, while we're at stats, there's a ~ 1.6% false positive rate on spamassassinated messages with score between 4.5 and 10 in listmaster mail (this includes smartlist-generated mail such as failed subscription requests). Unfortunately I didn't make a note of their exact scores, but IIRC they were all well under 7. I wonder if there's a similar rate in the 4 to 5 range...
The reason I temporarily stopped giving you statistics is that June was extremely bad. Among the mail I received there were 463 spam messages and 5961 legitimate ones: 463/(463+5961) = 7.2% spam.

There will be better statistics at the end of this month, but I wish you would seriously consider using some good DNSBLs on murphy for the cases where spamd fails as it did in June, so that we do not rely only on spamassassin to stop spam. Using some good DNSBLs on murphy would also remove some of the spam sent to @debian.org accounts: many spammers prefer murphy or gluck over master as their MX for debian.org just because they think a non-primary MX will have fewer anti-spam controls.
Hi,
in the meantime we reworked the filtering mechanism, so we now use
amavis with spamassassin and we have a whitelist.
Moderation of some 'grey' articles would be a nice thing, but
as that would need moderators and an implementation, this is a wontfix
for now.
As this bug is rather old, I am closing it now; feel free to reopen it
with some words about your opinion of the current situation.
Yours,
Cord, Debian Listmaster of the day
reopen 175744
thanks

If you are still using spamassassin, then it is almost certain that the initial message in this report still holds: the higher the spamicity of a message, the greater the probability that the message is spam.

Of course this proposal would need moderators, but there are already volunteers who report "this message is spam" in the list archives. So it is not moderators that we lack, but the willingness to code an implementation. Why not tag this bug "help" instead of "wontfix", then?

I thought it was pretty obvious that it's better to stop the spam before it's sent than to remove it from the list archives after it has been distributed to thousands of people. So if you are campaigning for people to click the "report spam" button, please start thinking about this proposal of moderating messages, as it would be a much better way to avoid spam.

Thanks.
Hello! You (Santiago Vila) wrote:
The 'report as spam' users are unqualified; they are search engines
and other Joe Random list archive users.
The real reviewers (who have to be DDs) currently aren't numerous
enough to do this in the needed quantity and at the needed speed.
There would be a need to review ~100 postings every day at the moment.
So from my point of view this is a wontfix, because I don't see that
we would get reliable and fast moderation, and I also don't see anyone
who would implement it.
Yours,
Cord, Debian Listmaster of the day