#632438 popularity-contest: a way to exclude certain packages

#632438#5
Date:
2011-07-02 08:43:34 UTC
From:
To:
It would be nice if I could configure popularity-contest to ignore
certain packages when collecting data.

I would like to use meta-packages to manage the list of installed
packages on my machines, but I don't want to leak the names of those
packages since their names will probably be based on my hostnames, which
are already public, meaning that people could easily find out which
packages I have installed.

The first one should be enough for me but multiple ways of excluding
packages might be useful for other folks:

      * individual packages: foo bar
      * package globs: foo-*
      * package regexes: foo.*
      * based on the Origin in the Release file of the mirror
      * based on an option in the sources.list:

deb [popcon=no] http://ftp.debian.org/debian/ stable main contrib non-free
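
The first three options in the list above could be sketched as a single
filter. This is only an illustration, not popularity-contest code: the
function name `is_excluded` and the three rule lists are hypothetical, and
the regular expressions are assumed to match the whole package name.

```python
import fnmatch
import re


def is_excluded(package, names=(), globs=(), regexes=()):
    """Return True if `package` matches any exclusion rule.

    All three rule kinds are hypothetical configuration values:
    exact names, shell-style globs, and regular expressions that
    must match the whole package name.
    """
    if package in names:
        return True
    if any(fnmatch.fnmatchcase(package, pattern) for pattern in globs):
        return True
    return any(re.fullmatch(pattern, package) for pattern in regexes)


installed = ["bash", "foo", "foo-server", "foobar", "host1-meta"]
reported = [
    p for p in installed
    if not is_excluded(p, names=["foo"], globs=["foo-*"], regexes=[r"host\d+-meta"])
]
print(reported)
```

Note that the glob `foo-*` and the regex `foo.*` from the list are not
equivalent: the regex would also exclude `foobar`, the glob would not.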

#632438#10
Date:
2012-10-29 08:57:55 UTC
From:
To:
I think the problem is worse than Paul Wise outlines. The package
description claims anonymity. This is only true if it cannot be
trivially defeated.

The common use case for equivs is to create a package based on the
hostname. Fortunately, popcon gives us numbers[1]. So about 8% of the
submitters are using equivs. (Some machines will use packages generated
using equivs without actually having installed equivs.) Let's assume
that half of them employ a metapackage based on the hostname. The
hostname is kind of public. It occurs in message-ids, bug reports, etc.
So using this scheme we can almost trivially deanonymize 4% of the
users.

Another case is looking at packages whose versions are newer than sid or
experimental. Most likely the machine owner is the maintainer or an
uploader. This also works for mentors and for them probably even better,
because their packages tend to wait for a long time until being
uploaded. A quick grep on the maintainer field shows about 2000
different maintainer addresses. Let's guess every fourth maintainer is
using popcon and can be deanonymized using this technique.
Another 0.5%.
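
The estimate above can be retraced as a short calculation. Every fraction
is a guess taken from this message; the total number of submitters is not
stated anywhere in it and is only inferred here from the 0.5% figure.

```python
# Figures quoted in the message above; every fraction is the author's guess.
equivs_share = 0.08          # share of submitters with equivs installed
hostname_fraction = 0.5      # assumed share naming a metapackage after the host
via_metapackage = equivs_share * hostname_fraction

maintainer_addresses = 2000  # distinct Maintainer addresses found by grep
popcon_fraction = 0.25       # "every fourth maintainer"
maintainers_exposed = maintainer_addresses * popcon_fraction

# The message states 0.5% for this group, which implies a submitter count
# of roughly 100,000. That total is inferred, not quoted.
implied_submitters = maintainers_exposed / 0.005

print(f"metapackage route: {via_metapackage:.1%}")
print(f"maintainers exposed: {maintainers_exposed:.0f}")
print(f"implied submitters: {implied_submitters:.0f}")
```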

These numbers are low for the general population but still alarming. The risk of
being deanonymized is way higher for maintainers or developers unless
they are aware of the problem and work around[2] it or simply remove
popcon.

Please remove the false anonymity claim until this is fixed as it leads
users into wrong beliefs. I therefore suggest upgrading severity to
rc-ness.

Imo the default for popcon should be only listing packages that
originate from Debian. Everything else is none of our business.

Unfortunately I cannot provide a solution or patch. For instance the
Origin field (in dpkg-query --showformat) does not help here. An option
might be to use aptitude search '~i ~ODebian' -F '%p'. (Thanks Paul!)
This would introduce a dependency on aptitude.

Helmut

[1] http://qa.debian.org/popcon.php?package=equivs
[2] http://bonedaddy.net/pabs3/log/2012/10/29/thoughts-on-debian-testing/

#632438#18
Date:
2012-10-29 09:13:38 UTC
From:
To:
I strongly disagree with this. The unknown packages index of popcon is
one of the most useful parts of it. It is useful for sorting RFPs by number
of existing users on wnpp.debian.net. It is useful because all the
derivatives other than Ubuntu are currently submitting to popcon.d.o;
many of them include new packages that we might want to package, and it
would be a good idea to gauge popularity before doing so. They also
reveal the extent to which Debian is not meeting the needs of our users
as well as the extent to which Debian users use external non-free
packages. IMO restricting the package set needs to be an explicit choice
on the part of the user.

#632438#23
Date:
2013-05-05 12:57:12 UTC
From:
To:
I agree with the risk of deanonymization; however, you have to look at the
consequence: we only publish aggregated results, not individual reports, so this
only leaks whether someone is reporting or not. It does not leak the
full list of packages or the popcon UUID.

Cheers,

#632438#36
Date:
2013-05-08 16:07:36 UTC
From:
To:
You are missing a few pieces. There is a general principle of not
collecting data that you don't need.

Believe it or not, the popcon server may be compromised at a future
time. We can defend now by not even collecting data that is not needed.

What about the actual data transfer? It usually works via http or smtp.
Anyone sniffing the traffic can learn a lot from those little extra
packages not to be found in the archive. Of course the traffic could be
encrypted. Making it harmless is another viable option though.

Finally I did find a number of corporate packages in popcon already.
Packages that clearly belong to a particular institution or company. Now
you learn that said institution uses Debian and popcon from the publicly
visible popcon reports.

Sorry, but given these issues I currently recommend not using popcon to
people who ask me.

Helmut

#632438#44
Date:
2013-05-09 06:39:22 UTC
From:
To:
Quoting Helmut Grohne (helmut@subdivi.de):


This discussion starts to annoy me, to say the least.

Could the ultra-paranoid people please propose patches instead of telling
the popcon maintainer what he should do without helping him do so?

I feel like my hair has been cut in four pieces many many many times
since I read popcon PTS....and my bike has been painted in different
colours a few gazillion times. But I haven't seen many proposed
patches.

#632438#49
Date:
2013-05-09 10:10:23 UTC
From:
To:
I completely agree with that, but if you look at the list of bug reports, you
will see half of them ask for more information to be reported, and the other
half for less. So my only viable option is to keep the status
quo. This at least has the benefit of providing consistency and does not require
users to make a new security/privacy decision with each new popcon release.

Yes, there is a plan to encrypt traffic. It now mainly depends on whether Debian is
willing to "pay" for the extra CPU time spent decrypting the reports.

Could you give me some pointers to such packages (even privately if you prefer)?
I have been considering allowing some packages to opt out of popcon.

If you deal with people with strict security/privacy requirements, you are correct
to do so. I would do the same.

Cheers,

#632438#57
Date:
2014-07-11 23:17:54 UTC
From:
To:
These two bugs seem identical:

  #632438: popularity-contest: a way to exclude certain packages
  #681721: popularity-contest: option to limit the list of packages sended to popcon

Hopefully you'll agree.

live well,
  vagrant

#632438#68
Date:
2014-07-12 18:46:34 UTC
From:
To:
It is not me who should agree but the submitters, which you did not CC.

I am sure you meant well, but as far as I am concerned this is a distraction.
There is too much diverging discussion to handle them as a single report.

What I am more interested in is the proposed use cases for the feature
and whether this is something popularity-contest should support.

I offered 'X-Popcon-report: no' but the reporters do not seem interested.

Cheers,

#632438#76
Date:
2020-09-20 04:12:56 UTC
From:
To:
Hi,

I've attached a simple PoC patch to exclude certain packages.
The patch is generated against 1.70.

There may be a better spec or implementation, but it works for me.

#632438#83
Date:
2020-09-20 09:04:18 UTC
From:
To:
Could you explain why you want to exclude some packages?
I am concerned this will skew the statistics if used indiscriminately.

Instead I would suggest to add a new dpkg control field 'X-Popcon: private' and
have popularity-contest skip packages having this field.

This way users will be able to create private packages that will never
register even on misconfigured hosts.
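
A minimal sketch of how such a skip could work, assuming the marker shows
up verbatim as a field in dpkg-status-style stanzas. The helper names
`parse_stanzas` and `reportable` and the sample data are hypothetical, and
this is not how popularity-contest itself is implemented.

```python
def parse_stanzas(text):
    """Split dpkg-status-style text into a list of {field: value} dicts."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Continuation lines start with whitespace; they are skipped
            # because only single-line fields matter here.
            if line[:1].isspace() or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def reportable(stanzas, field="X-Popcon", private_value="private"):
    """Return names of packages that do not carry the private marker."""
    return [
        s["Package"] for s in stanzas
        if "Package" in s and s.get(field, "").lower() != private_value
    ]


# Illustrative sample only; the field spelling follows the proposal above.
SAMPLE = """\
Package: bash
Status: install ok installed

Package: host1-meta
Status: install ok installed
X-Popcon: private
"""

print(reportable(parse_stanzas(SAMPLE)))
```

Because the marker travels with the package, a host needs no local
configuration for the package to stay out of its report.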

Cheers,
Bill

#632438#88
Date:
2020-09-20 09:21:17 UTC
From:
To:
[Bill Allombert]

The use case was explained by Paul Wise in the opening of this issue.

This sounds like a good solution to the use case from Paul, about
personal metapackages with hostname in their name.

#632438#93
Date:
2020-09-20 09:41:23 UTC
From:
To:
Indeed, and I suggested the control field solution already, but so far nobody
said they were interested in this feature, so I did not implement it.

Thanks, this is encouraging!

Cheers,

#632438#98
Date:
2020-09-20 11:08:35 UTC
From:
To:
Hi,

I want to exclude installed packages from 3rd-party vendors, such as
those configured in /etc/apt/sources.list.d/*.list (some may be
proprietary software).
If the packages were under my control, I could add 'X-Popcon: private',
but I cannot do that for every update of a 3rd-party vendor's package.

So I think that BOTH approaches are appropriate.

1. Add a (popularity-contest) configuration option to exclude specific
   packages (such as 3rd-party packages).
   I've attached a PoC patch.
2. Use 'X-Popcon: private' for private packages.


Regards,

#632438#103
Date:
2020-09-20 11:41:53 UTC
From:
To:
OK, but what is your purpose in excluding them from popcon?

Cheers,

#632438#108
Date:
2020-09-20 14:56:02 UTC
From:
To:
Personally I'm excluding packages created by mk-build-deps as well as
the metapackages I create for my own systems, using a simple `grep -v`
in the popcon submission cron job.

The patch posted by Kentaro is not sufficient for my use-case, which
relies on excluding packages via regex instead of full package name.

A dpkg control field isn't going to be useful for users who want to exclude packages
from repos that they do not control or packages built by mk-build-deps
or other tools that do not allow adding extra dpkg control fields.
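
The `grep -v` approach mentioned above could be approximated like this.
The sketch assumes each report entry has the form
`<atime> <ctime> <package> <path>` and that header and footer lines start
with a token containing `POPULARITY-CONTEST`; the function name
`filter_report`, the sample report and the `-build-deps`/`-meta` naming
pattern are all illustrative assumptions.

```python
import re


def filter_report(lines, pattern):
    """Drop report entries whose package name matches `pattern`.

    Assumes each entry looks like '<atime> <ctime> <package> <path> ...'
    and that header/footer lines contain 'POPULARITY-CONTEST'.
    """
    regex = re.compile(pattern)
    kept = []
    for line in lines:
        fields = line.split()
        is_entry = len(fields) >= 3 and "POPULARITY-CONTEST" not in fields[0]
        if is_entry and regex.search(fields[2]):
            continue
        kept.append(line)
    return kept


# Made-up report; the ID and timestamps are placeholders.
report = [
    "POPULARITY-CONTEST-0 TIME:0 ID:0 ARCH:amd64 POPCONVER:1.70 VENDOR:Debian",
    "1600000000 1500000000 bash /bin/bash",
    "0 0 host1-meta <NOFILES>",
    "0 0 foo-build-deps <NOFILES>",
    "END-POPULARITY-CONTEST-0 TIME:0",
]
filtered = filter_report(report, r"(-build-deps|-meta)$")
print("\n".join(filtered))
```

Unlike a plain `grep -v`, this matches only the package field, so a
pattern cannot accidentally hit a file path on the same line.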

#632438#113
Date:
2020-09-20 17:53:51 UTC
From:
To:
But again, why would they want that? The only things popcon reports
are the package names (which would be public anyway) and some timing
data. If they do not trust popcon anonymization, then it is safer to
disable popcon entirely.

It is a given that users can mess with popcon reports in any way they want.
However, randomly hiding packages from popcon reports is not something
that should be sanctioned by the popularity-contest package.

I suppose mk-build-deps and other tools could then be updated to
support this feature. This is not really an objection. In fact it
would make this much easier.

Cheers,

#632438#118
Date:
2020-09-23 12:23:13 UTC
From:
To:
Personally, I think that popcon data from 3rd-party packages
is just noise because it has nothing to do with Debian.
Therefore, I think it is better to exclude it.

But this is my personal opinion, so I don't mean to force others
to do so. I'm happy if I have an option to exclude them.

#632438#123
Date:
2020-09-23 14:18:11 UTC
From:
To:
[Kentaro Hayashi]

I've used these lists in the past to find packages we should have a look
at for inclusion in Debian, so I do not believe it is noise.

#632438#128
Date:
2020-09-23 14:33:54 UTC
From:
To:
Not always. Sometimes that points to packages that are missing in Debian
and should be packaged.

But as soon as a single system reports a package name, it appears in the
statistics. So unless everyone sets up popcon to discard it, there is the
same amount of noise with less accurate statistics.
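
A toy aggregation illustrates the point. The host lists and the package
name `vendor-app` are made up, and the real server-side processing is
certainly more involved than this sketch.

```python
from collections import Counter


def aggregate(reports):
    """Count how many hosts report each package name."""
    counts = Counter()
    for packages in reports:
        counts.update(set(packages))
    return counts


hosts = [["bash", "vendor-app"], ["bash", "vendor-app"], ["bash", "vendor-app"]]
full = aggregate(hosts)

# Two of the three hosts filter out the third-party package before submitting.
partly_filtered = aggregate([["bash"], ["bash"], ["bash", "vendor-app"]])

print(full["vendor-app"], partly_filtered["vendor-app"])
```

The name still shows up in the aggregated result, only with a count that
no longer reflects the number of installations.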

One other option would be for popularity-contest to detect third-party packages, but
this is difficult to do client-side. However, this is done server-side;
see <https://popcon.debian.org/unknown/by_inst.gz>

Cheers,

#632438#133
Date:
2020-11-08 13:10:10 UTC
From:
To:
On Wed, 23 Sep 2020 16:33:54 +0200 Bill Allombert <ballombe@debian.org> wrote:
snip

I had missed this point of view.

As you mentioned, it may be meaningless unless everyone sets up popcon
to discard it.
So, to keep the statistics accurate, it may be a bad idea to filter them.


Regards,

#632438#138
Date:
2020-11-13 16:33:47 UTC
From:
To:
Thanks for getting back to us!

So how about the proposal to use a dpkg field to identify packages that
need filtering?

Cheers,

#632438#139
Date:
2022-07-18 22:50:46 UTC
From:
To:
We believe that the bug you reported is fixed in the latest version of
popularity-contest, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 681721@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Bill Allombert <ballombe@debian.org> (supplier of updated popularity-contest package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Mon, 18 Jul 2022 12:57:02 +0200
Source: popularity-contest
Architecture: source
Version: 1.74
Distribution: unstable
Urgency: medium
Maintainer: Popularity Contest Developers <debian-popcon@lists.debian.org>
Changed-By: Bill Allombert <ballombe@debian.org>
Closes: 681721 999319 1001956
Changes:
 popularity-contest (1.74) unstable; urgency=medium
 .
   * debian/rules: add missing targets. Closes: #999319 Thanks Lucas Nussbaum
   * debian-popcon.gpg: use new submission key.
     The key is back to RSA4096. Closes: #1001956.
   * popularity-contest: New feature: skip private packages that declare
     XB-Popcon-Reports: no in debian/control. This is to be used solely
     by tools that generate packages with unique names, to avoid the
     unique name to leak. Closes: #681721.
   * debian/control:
     - Build-Depends: debhelper-compat (= 13)
Checksums-Sha1:
 f2feaf044d0ec3602a00430ad2fb6980dd234d02 1731 popularity-contest_1.74.dsc
 2f16cc110a6ae92e8b99d13a22c2e4790c2b21d4 79544 popularity-contest_1.74.tar.xz
 33acc558ea1b20156279208b5126792f7f31d881 5624 popularity-contest_1.74_source.buildinfo
Checksums-Sha256:
 15652667dbeb527326b0420cd9a7a2c024c6e7e7d99fc0d298341298aacd5599 1731 popularity-contest_1.74.dsc
 4b2d7db55a84d100c1b5995a881971cf604eeb3a6d9562cc9570e8caed035069 79544 popularity-contest_1.74.tar.xz
 66ffa05b6eeced877eec44ddac05b3f2bb9caaf9b6b8f44959f358679fffff7e 5624 popularity-contest_1.74_source.buildinfo
Files:
 d2e880f2c63eb86a3fcbcf78571a97da 1731 misc optional popularity-contest_1.74.dsc
 0ba66137a2ae1b90dd8a0371b5772665 79544 misc optional popularity-contest_1.74.tar.xz
 4ef62e75641f36e4d8bc757d51a9e7b9 5624 misc optional popularity-contest_1.74_source.buildinfo