- Package: popularity-contest
- Source: popularity-contest
- Submitter: Paul Wise
- Date: 2022-07-18 22:54:08 UTC
- Severity: wishlist
- Tags:
It would be nice if I could configure popularity-contest to ignore
certain packages when collecting data.
I would like to use meta-packages to manage the list of installed
packages on my machines, but I don't want to leak the names of those
packages since their names will probably be based on my hostnames, which
are already public, meaning that people could easily find out which
packages I have installed.
The first one should be enough for me but multiple ways of excluding
packages might be useful for other folks:
* individual packages: foo bar
* package globs: foo-*
* package regexes: foo.*
* based on the Origin in the Release file of the mirror
* based on an option in the sources.list:
deb [popcon=no] http://ftp.debian.org/debian/ stable main contrib non-free
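The first three exclusion styles listed above (exact names, globs, regexes) can be sketched as a small filter. This is only an illustration of the matching rules, not popularity-contest code: the function names `is_excluded` and `filter_packages` and the sample package list are made up for the example.

```python
import fnmatch
import re


def is_excluded(package, names=(), globs=(), regexes=()):
    """Return True if the package name matches any exclusion rule.

    All three rule sets are hypothetical configuration values.
    """
    if package in names:
        return True
    # fnmatchcase keeps matching case-sensitive on every platform.
    if any(fnmatch.fnmatchcase(package, g) for g in globs):
        return True
    # fullmatch anchors the regex, so "foo.*" does not also hide "libfoo".
    return any(re.fullmatch(r, package) for r in regexes)


def filter_packages(packages, **rules):
    """Keep only the packages that no exclusion rule matches."""
    return [p for p in packages if not is_excluded(p, **rules)]


# Made-up list of installed packages.
installed = ["foo", "bar", "foo-host1", "foobar", "libfoo", "bash"]

print(filter_packages(installed, names={"foo", "bar"}, globs=["foo-*"]))
print(filter_packages(installed, regexes=["foo.*"]))
```

Anchoring the regex with `fullmatch` is a design choice for the sketch; an unanchored search would exclude more packages than the glob form of the same rule.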
I think the problem is worse than Paul Wise outlines. The package description claims anonymity. This is only true if it cannot be trivially defeated.
The common use case for equivs is to create a package based on the hostname. Fortunately, popcon gives us numbers[1]: about 8% of the submitters are using equivs. (Some machines will use packages generated using equivs without actually having installed equivs.) Let's assume that half of them employ a metapackage based on the hostname. The hostname is more or less public: it occurs in message-ids, bug reports, etc. So using this scheme we can almost trivially deanonymize 4% of the users.
Another case is looking at packages whose versions are newer than sid or experimental. Most likely the machine owner is the maintainer or an uploader. This also works for mentors, and probably even better for them, because their packages tend to wait a long time before being uploaded. A quick grep on the maintainer field shows about 2000 different maintainer addresses. Let's guess that every fourth maintainer is using popcon and can be deanonymized using this technique. Another 0.5%.
These numbers are low in general but still alarming. The risk of being deanonymized is much higher for maintainers or developers unless they are aware of the problem and work around[2] it or simply remove popcon.
Please remove the false anonymity claim until this is fixed, as it leads users into wrong beliefs. I therefore suggest upgrading severity to rc-ness. IMO the default for popcon should be to list only packages that originate from Debian. Everything else is none of our business.
Unfortunately I cannot provide a solution or patch. For instance, the Origin field (in dpkg-query --showformat) does not help here. An option might be to use aptitude search '~i ~ODebian' -F '%p'. (Thanks Paul!) This would introduce a dependency on aptitude.
Helmut
[1] http://qa.debian.org/popcon.php?package=equivs
[2] http://bonedaddy.net/pabs3/log/2012/10/29/thoughts-on-debian-testing/
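The estimate in the message above can be retraced with a few lines of arithmetic. The inputs are the figures Helmut quotes or guesses; the size of the submitter base is not given in the message and is only inferred here from the 0.5% figure, so treat it as an assumption.

```python
# Figures quoted or guessed in the message above.
equivs_share = 0.08          # share of submitters reporting equivs [1]
hostname_fraction = 0.5      # assumed: half name a metapackage after the host
maintainer_addresses = 2000  # from the grep on the maintainer field
popcon_fraction = 0.25       # guessed: every fourth maintainer runs popcon

# Share of submitters exposed through hostname-based metapackages.
metapackage_share = equivs_share * hostname_fraction

# Number of maintainers exposed through newer-than-sid package versions.
exposed_maintainers = maintainer_addresses * popcon_fraction

# Inference, not stated in the message: for that many maintainers to be
# 0.5% of all submitters, the submitter base has to be of this size.
quoted_maintainer_share = 0.005
implied_submitters = exposed_maintainers / quoted_maintainer_share

print(f"via hostname metapackages: {metapackage_share:.1%}")
print(f"maintainers exposed:       {exposed_maintainers:.0f}")
print(f"implied submitter base:    {implied_submitters:,.0f}")
```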
I strongly disagree with this. The unknown packages index of popcon is one of the most useful parts of it. It is useful for sorting RFPs by number of existing users on wnpp.debian.net. It is useful because all the derivatives other than Ubuntu are currently submitting to popcon.d.o; many of them include new packages that we might want to package, and it would be a good idea to gauge popularity before doing so. The unknown packages also reveal the extent to which Debian is not meeting the needs of our users, as well as the extent to which Debian users use external non-free packages. IMO restricting the package set needs to be an explicit choice on the part of the user.
I agree with the risk of deanonymization, but you have to look at the consequence: we only publish aggregated results, not individual reports. So this only leaks whether someone is reporting or not; it does not leak the full list of packages or the popcon UUID. Cheers,
You are missing a few pieces. There is a general principle of not collecting data that you don't need. Believe it or not, the popcon server may be compromised at some future time. We can defend against that now by not collecting data that is not needed. What about the actual data transfer? It usually works via http or smtp. Anyone sniffing the traffic can learn a lot from those little extra packages that are not to be found in the archive. Of course the traffic could be encrypted. Rendering it harmless is another viable option though. Finally, I did find a number of corporate packages in popcon already: packages that clearly belong to a particular institution or company. So from the publicly visible popcon reports you learn that said institution uses Debian and popcon. Sorry, but given these issues I currently recommend against using popcon to people who ask me. Helmut
Quoting Helmut Grohne (helmut@subdivi.de):
This discussion starts to annoy me, to say the least. Could ultra-paranoid people please propose patches instead of telling the popcon maintainer what he should do without helping him do so? I feel like my hair has been cut in four pieces many many many times since I read popcon PTS....and my bike has been painted in different colours a few gazillion times. But I haven't seen many proposed patches.
I completely agree with that, but if you look at the list of bug reports, you will see that half of them ask for more information to be reported, and the other half for less. So my only viable option is to keep the status quo. This at least has the benefit of providing consistency and does not require users to make a new security/privacy decision with each new popcon release.
Yes, there is a plan to encrypt traffic. For now it mainly depends on whether Debian is willing to "pay" for the extra CPU time spent decrypting the reports.
Could you give me some pointers to such packages (even privately if you prefer)? I have been considering allowing some packages to opt out of popcon.
If you deal with people with strict security/privacy requirements, you are correct to do so. I would do the same.
Cheers,
These two bugs seem identical:
#632438: popularity-contest: a way to exclude certain packages
#681721: popularity-contest: option to limit the list of packages sended to popcon
Hopefully you'll agree.
live well, vagrant
It is not me who should agree but the submitters, whom you did not CC. I am sure you meant well, but as far as I am concerned this is a distraction. There is too much diverging discussion to handle them as a single report. What I am more interested in is what the proposed use cases for the feature are and whether this is something popularity-contest should support. I offered 'X-Popcon-report: no' but the reporters do not seem interested. Cheers,
Hi, I've attached a simple PoC patch to exclude certain packages. The patch is generated against 1.70. There may be a better spec or implementation, but it just works for me.
Could you explain why you want to exclude some packages? I am concerned this will skew the statistics if used indiscriminately. Instead I would suggest adding a new dpkg control field 'X-Popcon: private' and having popularity-contest skip packages that have this field. This way users will be able to create private packages that will never register, even on misconfigured hosts. Cheers, Bill
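The control-field approach can be sketched as a filter over package stanzas. This is not the popularity-contest implementation. It assumes the opt-out shows up as a `Popcon-Reports: no` field in the installed package's control data, which is how a field declared as `XB-Popcon-Reports: no` in debian/control (the name used in the 1.74 changelog later in this log) appears in the binary package. The sample stanzas, the package name `host1-config` and the function names are made up.

```python
# Made-up excerpt in the style of a dpkg status file.
SAMPLE_STATUS = """\
Package: bash
Status: install ok installed
Version: 5.2-1

Package: host1-config
Status: install ok installed
Version: 1.0
Popcon-Reports: no
"""


def parse_stanzas(text):
    """Split control-file text into dicts, one per blank-line-separated stanza."""
    stanzas = []
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            # Continuation lines start with whitespace; ignore them here.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            # Control field names are case-insensitive, so normalise them.
            fields[key.strip().lower()] = value.strip()
        if fields:
            stanzas.append(fields)
    return stanzas


def reportable(stanzas, field="popcon-reports", private_value="no"):
    """Return names of packages that do not carry the opt-out field."""
    return [
        s["package"]
        for s in stanzas
        if "package" in s and s.get(field, "").lower() != private_value
    ]


print(reportable(parse_stanzas(SAMPLE_STATUS)))
```

Passing `field="x-popcon", private_value="private"` would model the field name proposed in the message above instead.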
[Bill Allombert]
The use case was explained by Paul Wise in the opening of this issue.
This sounds like a good solution to the use case from Paul, about personal metapackages with the hostname in their name.
Indeed, and I suggested the control field solution already, but so far nobody answered that they were interested in this feature, so I did not implement it. Thanks, this is encouraging! Cheers,
Hi, I want to exclude installed packages from 3rd party vendors, such as those configured in /etc/apt/sources.list.d/*.list (some may be proprietary software). If the packages are under my control, I could add 'X-Popcon: private', but I cannot do that for every update of a 3rd party vendor's package, I think. So I think that both approaches are appropriate:
1. Add a (popularity-contest) configuration option to exclude specific packages (such as 3rd party packages). I've attached a PoC patch.
2. Use 'X-Popcon: private' for private packages.
Regards,
OK, but what is your purpose in excluding them from popcon ? Cheers,
Personally I'm excluding packages created by mk-build-deps as well as the metapackages I create for my own systems, using a simple `grep -v` in the popcon submission cron job. The patch posted by Kentaro is not sufficient for my use-case, which relies on excluding packages via regex instead of full package name. This isn't going to be useful for users who want to exclude packages from repos that they do not control or packages built by mk-build-deps or other tools that do not allow adding extra dpkg control fields.
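A regex filter in the spirit of the `grep -v` workaround can be sketched as follows. It assumes the report is plain text with a header line, a footer line, and one entry per package of the form `<atime> <ctime> <package> <file>`; the sample report, its ID value, the package names and the function name `drop_matching` are made up for illustration. Unlike a plain `grep -v`, the sketch matches only the package-name column, so a pattern cannot accidentally remove the header or footer.

```python
import re

# Made-up report in the assumed layout; the ID is a placeholder.
SAMPLE_REPORT = """\
POPULARITY-CONTEST-0 TIME:914183330 ID:0123456789abcdef ARCH:amd64 POPCONVER:1.70
914183330 909868335 grep /bin/fgrep
0 0 myhost-meta <NOFILES>
0 0 foo-build-deps <NOFILES>
END-POPULARITY-CONTEST-0 TIME:914183335
"""


def drop_matching(report, pattern):
    """Remove entries whose package name matches the regex; keep other lines."""
    rx = re.compile(pattern)
    kept = []
    for line in report.splitlines():
        fields = line.split()
        # Entries start with two numeric timestamps; header/footer do not.
        is_entry = (
            len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit()
        )
        if is_entry and rx.search(fields[2]):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hide mk-build-deps style packages and host-named metapackages.
print(drop_matching(SAMPLE_REPORT, r"-build-deps$|^myhost-"), end="")
```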
But again, why would they want that? The only things popcon reports are the package names (which would be public anyway) and some timing data. If they do not trust popcon anonymization, then it is safer to disable popcon entirely. It is a given that users can mess with popcon reports in any way they want. However, randomly hiding packages from the popcon report is not something that should be sanctioned by the popularity-contest package. I suppose mk-build-deps and other tools could then be updated to support this feature. This is not really an objection. In fact it would make this much easier. Cheers,
Personally, I think that popcon data from 3rd party packages is just noise because it has nothing to do with Debian. Therefore, I think it is better to exclude it. But this is my personal opinion, so I don't mean to force others to do so. I'm happy if I have an option to exclude them.
[Kentaro Hayashi] I've used these lists in the past to find packages we should have a look at for inclusion in Debian, so I do not believe it is noise.
Not always. Sometimes that points to packages that are missing in Debian and should be packaged. But as soon as a single system reports a package name, it appears in the statistics. So unless everyone sets up popcon to discard it, there is the same amount of noise with less accurate statistics. One other option would be for popularity-contest to detect third-party packages, but this is difficult to do client-side. However, this is done server-side, see <https://popcon.debian.org/unknown/by_inst.gz> Cheers,
On Wed, 23 Sep 2020 16:33:54 +0200 Bill Allombert <ballombe@debian.org> wrote: snip
I had missed this point of view. As you mentioned, filtering may be meaningless unless everyone sets up popcon to discard these packages. So, to keep the statistics accurate, it may be a bad idea to filter them. Regards,
Thanks for getting back to us! So how about the proposal to use a dpkg field to identify packages that need filtering? Cheers,
We believe that the bug you reported is fixed in the latest version of
popularity-contest, which is due to be installed in the Debian FTP archive.
A summary of the changes between this version and the previous one is
attached.
Thank you for reporting the bug, which will now be closed. If you
have further comments please address them to 681721@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.
Debian distribution maintenance software
pp.
Bill Allombert <ballombe@debian.org> (supplier of updated popularity-contest package)
(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Mon, 18 Jul 2022 12:57:02 +0200
Source: popularity-contest
Architecture: source
Version: 1.74
Distribution: unstable
Urgency: medium
Maintainer: Popularity Contest Developers <debian-popcon@lists.debian.org>
Changed-By: Bill Allombert <ballombe@debian.org>
Closes: 681721 999319 1001956
Changes:
 popularity-contest (1.74) unstable; urgency=medium
 .
   * debian/rules: add missing targets. Closes: #999319 Thanks Lucas Nussbaum
   * debian-popcon.gpg: use new submission key.
     The key is back to RSA4096. Closes: #1001956.
   * popularity-contest: New feature: skip private packages that declare
     XB-Popcon-Reports: no in debian/control. This is to be used solely
     by tools that generate packages with unique names, to avoid the
     unique name to leak. Closes: #681721.
   * debian/control:
     - Build-Depends: debhelper-compat (= 13)
Checksums-Sha1:
f2feaf044d0ec3602a00430ad2fb6980dd234d02 1731 popularity-contest_1.74.dsc
2f16cc110a6ae92e8b99d13a22c2e4790c2b21d4 79544 popularity-contest_1.74.tar.xz
33acc558ea1b20156279208b5126792f7f31d881 5624 popularity-contest_1.74_source.buildinfo
Checksums-Sha256:
15652667dbeb527326b0420cd9a7a2c024c6e7e7d99fc0d298341298aacd5599 1731 popularity-contest_1.74.dsc
4b2d7db55a84d100c1b5995a881971cf604eeb3a6d9562cc9570e8caed035069 79544 popularity-contest_1.74.tar.xz
66ffa05b6eeced877eec44ddac05b3f2bb9caaf9b6b8f44959f358679fffff7e 5624 popularity-contest_1.74_source.buildinfo
Files:
d2e880f2c63eb86a3fcbcf78571a97da 1731 misc optional popularity-contest_1.74.dsc
0ba66137a2ae1b90dd8a0371b5772665 79544 misc optional popularity-contest_1.74.tar.xz
4ef62e75641f36e4d8bc757d51a9e7b9 5624 misc optional popularity-contest_1.74_source.buildinfo
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEQgKOpASi6dgKxFMUjw58K0Ui44cFAmLV4kcACgkQjw58K0Ui
44d/AhAAn6llSJyy0tMa6rUwrkDr53oECOyUcs+SKT6M9ztVFohAw/NiHB+Ja/ho
gYgeQq4P9KAoFwPAK/sHQL7gp3ZHKxOlZd4w2ZPCmDKu1Mzocs0tQ7y5TRutyOzc
QVgyIWFBZQ/z/aERmy7+WD5+F7/h4+ZxGjMNDOgKVuSPs4b5pZeQTf7xI3/NTEk5
PzNAS+7SiNhQTsrcldtVmBjgMTxSG8tiyfbKYNjmEs78PFpxqc49wudp5ftwXEf7
VWKYTsnFlUSF3oKhz2HEvEw7ulL9C21yEEMbZFIHvXSAVH3mBrS6W+wzKvQTcoCS
ETeJGm0Leg4+Y/RetE4FXynRMemhgqWqX7NJof6NYvw+CgM/bhQnnh5qzO/b+Ed+
obZrEzCXcPRtN0VTPcHOZjYBEaFrYumn2RRvfOVhP4geJ9Oqfxw8alecEvePQ7KV
QDpYKN6bbAO7LAHksRnRuoE195YTK8pxjZpdXOGZQRydXJ7o9euoGqG/5ACVU7MB
uf6VRuCt22t9gFIktSwMaAlqWrmbF1jU2+uZsmcr54JkY3kvk52L6GfcmCgQxf1x
z7YuTBcNZ+ax7aKtYn0VQAYQLiG9brUHGYaPGEQMH2D+Iaq4vTAgOEpwZIoTcKVp
rYHmDH7FjWmaq6fwX+fFVFC1W0foBDQ9tTbk5IFm40MODs2Phfk=
=Ny6J
-----END PGP SIGNATURE-----