#1014431 popularity-contest: automatically create hostid if not specified in popularity-contest.conf
- Package: popularity-contest
- Source: popularity-contest
- Submitter: Ansgar
- Date: 2025-02-27 14:21:02 UTC
- Severity: wishlist
It would be nice if the configuration file did not have to contain host-specific data (MY_HOSTID). If no MY_HOSTID is specified, popularity-contest could for example generate an application-specific MY_HOSTID from the machine-id as described for libsystemd's `sd_id128_get_machine_app_specific` function (i.e., HMAC-SHA256 of an application ID for popularity-contest keyed by the machine-id). Ansgar
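As a rough illustration of the derivation proposed above, here is a minimal Python sketch. It is not popcon code: `POPCON_APP_ID` is a made-up placeholder (a real application ID would be generated once and hard-coded), and `derive_hostid` and `read_machine_id` are hypothetical helper names. Truncating the HMAC to 128 bits and setting the UUID v4 marker bits follows my understanding of what libsystemd does, so treat that part as an assumption.

```python
import hashlib
import hmac
import uuid

# Hypothetical application ID for popularity-contest (placeholder value,
# not an ID assigned by anyone).
POPCON_APP_ID = uuid.UUID("7a1f0c2e5b3d4e6f8a9b0c1d2e3f4a5b")


def read_machine_id(path: str = "/etc/machine-id") -> str:
    """Return the machine ID as 32 hex characters, per machine-id(5)."""
    with open(path, encoding="ascii") as f:
        return f.read().strip()


def derive_hostid(machine_id_hex: str, app_id: uuid.UUID = POPCON_APP_ID) -> str:
    """Derive an application-specific host ID from the machine ID.

    HMAC-SHA256 of the application ID, keyed by the machine ID, truncated
    to 128 bits so the result has the same shape as a 32-hex-digit
    MY_HOSTID. The raw machine ID cannot be recovered from the result.
    """
    machine_id = bytes.fromhex(machine_id_hex.strip())
    if len(machine_id) != 16:
        raise ValueError("machine ID must be exactly 128 bits (32 hex digits)")
    digest = hmac.new(machine_id, app_id.bytes, hashlib.sha256).digest()
    raw = bytearray(digest[:16])
    # Assumption: mark the result as a version 4, variant 1 UUID, as
    # libsystemd is understood to do for application-specific IDs.
    raw[6] = (raw[6] & 0x0F) | 0x40
    raw[8] = (raw[8] & 0x3F) | 0x80
    return raw.hex()
```

On a host with a machine ID, `derive_hostid(read_machine_id())` returns the same value on every run, so the same popularity-contest.conf could be shipped to every machine while each one still reports under its own ID.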
Hello Ansgar, What is the life cycle of the machine-id? How would that work while fulfilling all the goals of MY_HOSTID? What would be the advantage? Cheers,
From man:machine-id(5):

+---
| The machine ID is usually generated from a random source during
| system installation or first boot and stays constant for all
| subsequent boots. Optionally, for stateless systems, it is generated
| during runtime during early boot if necessary.
+---

That looks like it fulfills what I guess popcon needs. (But please do not use the machine-id directly, but hashed as described above.)

No host-specific configuration would be required for popcon. One can just install an identical popularity-contest.conf on several machines instead of having to deal with a host-specific setting (MY_HOSTID). Ansgar
Almost. Is it possible to detect stateless systems so that they do not report to popcon? Cheers,
[Ansgar] Is the use case mirrored installations, thin clients with identical disks, cloud installations, or what?
Does popcon detect such systems currently? I don't think anything changes for such systems with the proposed change. I'm not sure how one would reliably detect such systems; after all, depending on one's definition, any VM in a cloud may be such a system (depending on how it is used). Ansgar
It's easier to handle with configuration management if one just has to ship the same configuration file to all systems. One could have logic that edits an existing config file while taking care not to change the MY_HOSTID, or have something generate the MY_HOSTID value (e.g., have the configuration management compute a MY_HOSTID derived from the machine-id), but having popcon do so itself seems like a nicer solution. Ansgar
Currently, if a system does not have a valid MY_HOSTID, it will not report to the popcon server. If two systems have identical MY_HOSTID values, the popcon server considers them one and the same, and only the last received report is kept.

There are two issues we want to avoid:

1/ System images: a master image is generated and used for hundreds of hosts. All systems will have the same package list and probably the same usage pattern. The cost to popcon of receiving hundreds of identical reports is far higher than the benefit it provides to the dataset, and it biases the data toward the system image's package selection.

2/ Randomly changing MY_HOSTID: the time-to-live of a MY_HOSTID is 20 days. That means that if a system gets a new random MY_HOSTID once a day, it will be counted 20 times by the server.

Of course 1/ and 2/ can happen at the same time: if a single system image is used to generate short-lived stateless systems, it can easily generate a thousand identical reports. There is nothing that prevents users from setting up systems in a way that floods popcon, but at least we should not make it the default behavior. Cheers,
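The counting behaviour described in this message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the popcon server code: `count_hosts` is a hypothetical function, reports are reduced to `(day, hostid)` pairs, and the exact expiry boundary (an ID counts while its last report is less than 20 days old) is a guess.

```python
from datetime import date, timedelta

TTL_DAYS = 20  # time-to-live of a MY_HOSTID, as stated in the thread


def count_hosts(reports, today):
    """Count host IDs the server would still consider alive.

    reports: iterable of (day, hostid) pairs. Only the most recent
    report per hostid is kept, and an ID expires TTL_DAYS after it.
    """
    last_seen = {}
    for day, hostid in reports:
        # A later report with the same ID replaces the earlier one.
        if hostid not in last_seen or day > last_seen[hostid]:
            last_seen[hostid] = day
    return sum(1 for day in last_seen.values()
               if (today - day).days < TTL_DAYS)


today = date(2025, 1, 30)
days = [today - timedelta(days=n) for n in range(30)]

# One host with a stable ID reporting daily for 30 days.
stable = [(d, "stable-id") for d in days]

# One host that gets a fresh ID every day for 30 days.
rotating = [(d, f"id-{d.isoformat()}") for d in days]

# 100 hosts cloned from one image, all sharing the same ID.
cloned = [(today, "image-id") for _ in range(100)]
```

With these inputs the stable host and the 100 clones each count as a single system, while the one host with a daily-changing ID is counted 20 times, which matches the two failure modes listed above.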
The paragraph you quote says 'optionally', so maybe it is possible to know whether this option is in effect? Cheers,