* Package name    : popcon-stats-data
  Version         : 0.20211114
* URL or Web page : https://popcon.debian.org/
* License         : Public Domain (data)
  Description     : Debian's Popularity Contest statistics

---

The shipped data would let package managers show the popularity of packages, which could let users make more informed decisions when choosing between packages to install. I don't believe this will change the Vim vs. Emacs battle, but when I looked for a DICOM viewer I found a crazy number of programs of varying quality, and knowing which ones were the most widely used would have sped up picking a good one.

Ideally the stats would be shipped in a format from which APT and other package managers could efficiently look up the percentage of Debian systems on which a particular binary package is used.

Cheers,
Balint
[Forwarded to and CCing the debian-popcon mailing list] ... This package would be very Debian-specific and would give the wrong data when installed on Ubuntu. I think a better approach would be to ship this data in the Debian apt repository metadata, either in the Packages files or in Popularity files in the dists/ dir (similar to the Contents files used by apt-file), so that the data is directly available to apt clients such as aptitude. This way Ubuntu and other derivatives could ship popularity data for their users too.
I note that debtags.debian.org uses this approach: data is gathered on the site, then uploaded to ftp-master, which integrates the data and distributes it via the Packages files. So it should work if the FTP Team and Popcon teams are willing to support the idea.
The popularity of packages is heavily skewed by how the distribution is structured, in particular by the set of packages installed by default, so alas it is not always an indication of user preferences... Cheers,
What is the idea exactly? Several questions come to mind: How often is the popcon data going to be refreshed? Which exact set of data is going to be used? Cheers,
Bálint's idea was to ship popcon data in a popcon-stats-data package in the Debian archive. I suggested instead shipping it in the apt metadata present in the Packages files.

I would assume it would be refreshed with the same frequency as the existing data on the popcon.d.o website. Anything faster than that would just be refreshing unchanged data. Anything slower than that would be providing outdated data. Outdated data is fine though, so maybe weekly.

https://qa.debian.org/popcon.php?package=iotop

    Package: iotop
    Popcon: 30314 7962 21197 1143 12

If I massage the by_inst file into the same format as this, I calculate that the extra Popcon fields would add 3.7 MB to the Packages files, and that data would change often, making the apt update process slower. So the data should probably go into new files instead, and there should be a config file snippet to enable downloading them, a tool to query and index them, and a way for apt clients to get that data.

Since the Debian repository splits the metadata by suite and component, these new statistics should probably do the same. So the raw popcon submissions would need to be individually mapped to a suite based on the popcon version in the submission, and then each item in the submission attributed to that suite/component. For popcon versions that don't match a suite: if they match a known Debian version, attribute them to the next highest suite; discard submissions with popcon versions that were never in Debian, or maybe attribute them to the relevant vendor separately. Popcon submissions that don't have Debian as the vendor should probably be discarded, or maybe attributed to the relevant vendor separately.
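To make the "massage by_inst into the same format" step concrete, here is a minimal sketch in Python. It assumes that by_inst rows are whitespace-separated as rank, name, inst, vote, old, recent, no-files, followed by the maintainer in parentheses, and that the five Popcon numbers are in that same order (for the iotop example, the first number is the sum of the other four). The function name `by_inst_to_stanzas` is made up for illustration, and the rank and maintainer in the sample row are placeholders, not real data.

```python
def by_inst_to_stanzas(lines):
    """Convert by_inst-style rows into Packages-like stanzas.

    Assumed row layout (not verified against the live file):
    rank name inst vote old recent no-files (maintainer)
    """
    stanzas = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, '#' comment headers and '-----' separator rows.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        fields = line.split()
        # Data rows start with a numeric rank; this also skips a trailing
        # summary row that does not start with a number.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        name = fields[1]
        inst, vote, old, recent, no_files = (int(f) for f in fields[2:7])
        stanzas.append(
            f"Package: {name}\n"
            f"Popcon: {inst} {vote} {old} {recent} {no_files}\n"
        )
    return stanzas


# Hypothetical sample: the rank (1234) and maintainer are placeholders;
# the five counts are the iotop numbers quoted above.
sample = [
    "#rank name inst vote old recent no-files (maintainer)",
    "1234  iotop  30314  7962 21197  1143    12 (Some Maintainer)",
    "----------------------------------------------------------",
]

if __name__ == "__main__":
    print("\n".join(by_inst_to_stanzas(sample)))
```

Summing the length of the generated Popcon lines over the whole file would be one way to re-check a size estimate like the 3.7 MB figure.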
So the idea is to have a Popcon file for each suite? Let's say bookworm is released today. What will bookworm/Popcon contain? We release a new popularity-contest package. What will sid/Popcon contain? The package migrates to testing. What will testing/Popcon contain?

As I understand it, the metadata for stable is only updated with point releases. Would that be the same for stable/Popcon? I still do not quite see how this would work...

We do not want to provide data generated from a very small subset of reports, for accuracy and privacy reasons. The current all-popcon-result.gz/stable-popcon-result.gz split is a middle ground between competing constraints.

Why not instead write a tool to download all-popcon-result.gz or stable-popcon-result.gz when needed, and cache them? This can then be processed by a tool that makes suggestions.

Cheers,
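A download-and-cache tool of the kind suggested above could be sketched roughly as follows, in Python. This is only an illustration under assumptions: the function name `fetch_cached`, its parameters, and the one-week default (taken from the "maybe weekly" remark earlier in the thread) are all hypothetical, and the URL of the results file is left to the caller rather than guessed here.

```python
import os
import time
import urllib.request

ONE_WEEK = 7 * 24 * 3600  # assumed refresh interval, in seconds


def fetch_cached(url, cache_path, max_age=ONE_WEEK, download=None):
    """Return cache_path, re-downloading url if the cached copy is
    missing or older than max_age seconds.

    `download` is an optional callable taking a URL and returning bytes;
    it exists so the network access can be replaced (e.g. in tests).
    """
    if download is None:
        def download(u):
            with urllib.request.urlopen(u) as resp:
                return resp.read()

    try:
        age = time.time() - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cached copy yet

    if age is None or age > max_age:
        data = download(url)
        # Write to a temporary file first so an interrupted download
        # never leaves a truncated cache file behind.
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, cache_path)
    return cache_path
```

The cached file could then be opened with `gzip.open()` by whatever tool makes the suggestions; parsing is deliberately left out here because it depends on the exact layout of the results file.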