#1014431 popularity-contest: automatically create hostid if not specified in popularity-contest.conf

#1014431#5
Date: 2022-07-05 22:05:49 UTC
It would be nice if host-specific data (MY_HOSTID) were not required in
the configuration file.

If no MY_HOSTID is specified, popularity-contest could, for example,
generate an application-specific MY_HOSTID from the machine-id as
described for libsystemd's `sd_id128_get_machine_app_specific`
function (i.e., an HMAC-SHA256 of an application ID for
popularity-contest, keyed by the machine-id).
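A minimal Python sketch of that derivation follows. It assumes /etc/machine-id holds the 32 lowercase hex digits documented in machine-id(5), and that MY_HOSTID accepts 32 hex digits. POPCON_APP_ID is a hypothetical placeholder, because neither popcon nor systemd defines an application ID for popularity-contest. The truncation to 128 bits and the UUID version/variant bits reflect my reading of what `sd_id128_get_machine_app_specific` does. Popcon does not require either step.

```python
import hashlib
import hmac

# HYPOTHETICAL application ID for popularity-contest; no such ID exists today.
# A real implementation would pick one random 128-bit value and hard-code it.
POPCON_APP_ID = bytes.fromhex("0123456789abcdef0123456789abcdef")


def read_machine_id(path: str = "/etc/machine-id") -> bytes:
    """Return the 16-byte machine ID (the file holds 32 hex digits)."""
    with open(path, encoding="ascii") as f:
        return bytes.fromhex(f.read().strip())


def app_specific_id(machine_id: bytes, app_id: bytes) -> str:
    """Derive a per-application ID in the style of sd_id128_get_machine_app_specific."""
    if len(machine_id) != 16 or len(app_id) != 16:
        raise ValueError("machine ID and application ID must both be 128 bits")
    # HMAC-SHA256 of the application ID, keyed by the machine ID;
    # keep only the first 128 bits of the 256-bit digest.
    digest = bytearray(hmac.new(machine_id, app_id, hashlib.sha256).digest()[:16])
    # Mark the result as a random (version 4, RFC 4122 variant) UUID.
    digest[6] = (digest[6] & 0x0F) | 0x40
    digest[8] = (digest[8] & 0x3F) | 0x80
    return digest.hex()


# Hypothetical usage when MY_HOSTID is absent from popularity-contest.conf:
# MY_HOSTID = app_specific_id(read_machine_id(), POPCON_APP_ID)
```

The result is stable for as long as the machine-id is stable. Because the HMAC is one-way, the value sent to the popcon server does not reveal the machine-id itself, and it differs from IDs that other applications derive with their own application IDs.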

Ansgar

#1014431#10
Date: 2022-07-13 16:26:07 UTC
Hello Ansgar,

What is the life cycle of the machine-id? How would that work while
fulfilling all the goals of MY_HOSTID? What would be the advantage?

Cheers,

#1014431#15
Date: 2022-07-14 16:29:55 UTC
From man:machine-id(5):

+---
| The machine ID is usually generated from a random source during
| system installation or first boot and stays constant for all
| subsequent boots. Optionally, for stateless systems, it is generated
| during runtime during early boot if necessary.
+---

That looks like it fulfills what I guess popcon needs.

(But please do not use the machine-id directly; hash it as described
above.)

No host-specific configuration would be required for popcon. One can
just install an identical popularity-contest.conf on several machines
instead of having to deal with a host-specific setting (MY_HOSTID).

Ansgar

#1014431#20
Date: 2022-07-14 17:59:31 UTC
Almost. Is it possible to detect stateless systems so that they do not
report to popcon?

Cheers,

#1014431#25
Date: 2022-07-14 18:01:32 UTC
[Ansgar]

Is the use case mirrored installations, thin clients with identical
disks, cloud installations, or what?

#1014431#30
Date: 2022-07-16 08:18:11 UTC
Does popcon detect such systems currently? I don't think anything
changes for such systems with the proposed change.

I'm not sure how one would reliably detect such systems; after all,
by some definitions any VM in a cloud may be such a system, depending
on how it is used.

Ansgar

#1014431#35
Date: 2022-07-16 08:21:15 UTC
It's easier to handle with configuration management if one just has to
ship the same configuration file to all systems.

One could have logic that edits an existing config file and takes care
not to change the MY_HOSTID, or have something else generate the
MY_HOSTID value (e.g., have the configuration management compute a
MY_HOSTID derived from the machine-id), but having popcon do so itself
seems like a nicer solution.

Ansgar

#1014431#40
Date: 2022-07-16 13:00:48 UTC
Currently, if a system does not have a valid MY_HOSTID, it does not
report to the popcon server. If two systems have the same MY_HOSTID,
the popcon server treats them as one and the same system and keeps only
the most recently received report.

There are two issues we want to avoid:
1/ system images: a master image is generated and used for hundreds of
hosts. All systems will have the same package list and probably the
same usage pattern. The cost to popcon of receiving hundreds of
identical reports is far higher than the benefit they provide to the
dataset, and they bias the data toward the system image's package
selection.

2/ randomly changing MY_HOSTID: the time-to-live of a MY_HOSTID is 20
days. That means that if a system gets a new random MY_HOSTID once a
day, it will be counted 20 times by the server.

Of course 1/ and 2/ can happen at the same time: if a single system
image is used to generate short-lived stateless systems, it can easily
produce thousands of identical reports.
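The two server-side effects above can be illustrated with a toy model in Python. It assumes reports are keyed only by MY_HOSTID, that the most recent report for a given MY_HOSTID wins, and that an entry stops counting 20 days after its last report. `live_submissions` is a hypothetical helper, not code from the popcon server.

```python
TTL_DAYS = 20  # assumed lifetime of a MY_HOSTID entry on the server


def live_submissions(reports, today):
    """Count the hosts a simplified popcon server would count on `today`.

    `reports` is a list of (day, hostid) pairs. Reports that share a
    MY_HOSTID collapse into one entry holding the most recent day, and
    entries last reported TTL_DAYS or more days ago have expired.
    """
    latest = {}
    for day, hostid in reports:
        if day <= today:
            latest[hostid] = max(day, latest.get(hostid, day))
    return sum(1 for day in latest.values() if today - day < TTL_DAYS)


# 2/: one machine that picks a fresh random MY_HOSTID every day for 30 days
daily_new_id = [(d, f"random-{d}") for d in range(30)]
print(live_submissions(daily_new_id, today=29))  # 20

# 1/: a hundred hosts cloned from one image, all sharing a MY_HOSTID
cloned = [(29, "same-id")] * 100
print(live_submissions(cloned, today=29))  # 1
```

In this model, a single machine that picks a fresh MY_HOSTID each day is counted 20 times. A hundred clones sharing one MY_HOSTID collapse into a single entry. A stateless image that generates a random ID on every boot combines the two effects.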

Nothing prevents users from setting up systems in a way that floods
popcon, but at least we should not make it the default behavior.

Cheers,

#1014431#45
Date: 2025-02-27 14:16:41 UTC
The paragraph you quote says 'optionally', so maybe it is possible to
know whether this option is in effect?

Cheers,