Hi,

During a rebuild of all packages in sid, your package failed to build
on amd64.

Relevant part (hopefully):

The full build log is available from:
http://qa-logs.debian.net/2023/09/25/krb5_1.20.1-4_unstable.log

All bugs filed during this archive rebuild are listed at:
https://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=ftbfs-20230925;users=lucas@debian.org
or:
https://udd.debian.org/bugs/?release=na&merged=ign&fnewerval=7&flastmodval=7&fusertag=only&fusertagtag=ftbfs-20230925&fusertaguser=lucas@debian.org&allbugs=1&cseverity=1&ctags=1&caffected=1#results

A list of current common problems and possible solutions is available at
http://wiki.debian.org/qa.debian.org/FTBFS . You're welcome to contribute!

If you reassign this bug to another package, please mark it as 'affects'-ing
this package. See https://www.debian.org/Bugs/server-control#affects

If you fail to reproduce this, please provide a build log and diff it with mine
so that we can identify if something relevant changed in the meantime.
control: severity -1 normal
Lucas> Hi,
Lucas> During a rebuild of all packages in sid, your package failed
Lucas> to build on amd64.
Lucas> Relevant part (hopefully):
So, according to the build log, the make check failed because it could
not contact a KDC on the local system. The krb5 build is more sensitive
to the environment in which it is built, thanks to #1017763. In
particular, I believe it will require that getaddrinfo(gethostname())
work, and that the resulting address is a valid address for contacting the local host.
It will also require that a service can bind to that address and be
contacted from within the build. In addition, it requires various
capabilities related to access to the keyring; I find that debspawn's
containers do not have sufficient capabilities to run make check.
It appears all these attributes are satisfied by the build hosts.
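The first two requirements can be checked on their own with a short Python sketch: getaddrinfo(gethostname()) must work, and a service must be able to bind to the result and be contacted there. `check_local_service` is a hypothetical helper that is not part of krb5 or k5test. It only exercises TCP on an ephemeral port, while the real test KDC may also use UDP on its configured ports. It does not test the keyring capabilities.

```python
import socket


def check_local_service(host=None, timeout=2.0):
    """Hypothetical preflight check (not part of krb5 or k5test).

    Resolves `host` (default: socket.gethostname()) with getaddrinfo, then,
    for each TCP address returned, binds a listener on an ephemeral port and
    connects to it. Returns a list of (address, ok, detail) tuples.
    """
    if host is None:
        host = socket.gethostname()
    try:
        infos = socket.getaddrinfo(host, 0, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # The failure mode that ping reports as "Name or service not known".
        return [(host, False, f"getaddrinfo failed: {exc}")]
    results = []
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as server:
                server.bind(sockaddr)      # port 0: the kernel picks a free port
                server.listen(1)
                bound = server.getsockname()
                with socket.socket(family, socktype, proto) as client:
                    client.settimeout(timeout)
                    client.connect(bound)  # completes via the listen backlog
            results.append((sockaddr[0], True, f"port {bound[1]} reachable"))
        except OSError as exc:
            results.append((sockaddr[0], False, str(exc)))
    return results


if __name__ == "__main__":
    for address, ok, detail in check_local_service():
        print("OK  " if ok else "FAIL", address, detail)
```

Run it both outside and inside the build chroot. A FAIL that shows up in only one of them points to an environment difference rather than to krb5.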
So, unless I'm violating some written policy somewhere, my claim is that
this is all good.
That said, I realize this is an area where things are underspecified,
and I'd be happy to engage with debian-policy or the TC to further
refine what builds are allowed to do if you think that something krb5 is
doing is not reasonable.
I suspect this is a case where your build environment does not mirror
the buildds enough for the tests to succeed, but I'm leaving the bug
open for your input.
On 26/9/23 at 16:18, Sam Hartman wrote:

This could simply be a race condition. I've seen many packages fail their
tests randomly because of that. In many cases the problem has been solved
by adding a "sleep 1" somewhere (for example, after the service is started
and before the tests try to use the service).

You mention a number of ways in which the environment of Lucas's
autobuilders "might" differ from the official buildds. However, you seem
to be automatically assuming that the failure is due to one of the things
you enumerated, i.e. a "difference of build environments". But such a
thing is not necessarily true.

My theory here is that this failure happens randomly, and it may happen
randomly on the buildds as well. But in the same way that we can't tell
that a coin is biased by throwing it only once, we should not consider a
package OK just because it built OK once or twice on the buildds.

I think you forgot Policy 4.2, which is also "written policy somewhere".
This is what it says:

  If build-time dependencies are specified, it must be possible to build
  the package and produce working binaries on a system with only essential
  and build-essential packages installed and also those required to satisfy
  the build-time relationships (including any implied relationships).

In other words: it has to work.

I don't think debian-policy or the TC need to be involved in this. If the
tests contain a race condition which could be avoided by simply adding a
sleep (or some other easy fix), then that's all we need to do.

The end user must be able to rebuild packages as well. Here, Lucas is
merely doing what any end user could do at home, following certain
standards, of course. We surely would not tell an end user who tries to
build a package without success "your system does not mirror
buildd.debian.org well enough". We would point at whatever the user is
doing wrong with respect to existing policies. However, if the user is not
doing anything wrong, then it's definitely our fault.

Thanks.
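Santiago's coin argument can be made quantitative. Suppose, hypothetically, that each build fails independently with probability p. Then n consecutive clean builds happen with probability (1 - p)^n. The sketch below computes that probability and the smallest n for which an unbroken run of clean builds would be unlikely (at most 5%) if the true failure rate were p. The independence assumption and the example rates are illustrative, not measured.

```python
import math


def prob_all_pass(p_fail, n):
    """Probability that n independent builds all succeed."""
    return (1.0 - p_fail) ** n


def builds_needed(p_fail, confidence=0.95):
    """Smallest n such that (1 - p_fail) ** n <= 1 - confidence.

    Put another way: after n clean builds, a true failure rate of p_fail
    would have produced at least one failure with probability >= confidence.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    for p in (0.5, 0.1, 0.01):
        print(f"p={p}: two clean builds happen {prob_all_pass(p, 2):.0%} "
              f"of the time; {builds_needed(p)} clean builds needed for 95%")
```

For p = 0.1, two clean builds still happen 81% of the time, and 29 clean builds in a row are needed before that rate could be ruled out at the 95% level.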
Santiago> This could be simply a race condition.
Santiago> I've seen many packages fail their tests randomly
Santiago> because of that.
It could be a race, but given what I know of the tests, I doubt it is.
Take a look at _start_daemon in util/k5test.py.
In particular, it waits for a particular string to be printed before
declaring the service running.
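The readiness-check pattern can be sketched as follows. With a check like this, a fixed `sleep 1` should not be needed. `start_daemon_and_wait` is a hypothetical helper and not the actual k5test `_start_daemon`. It assumes that the daemon prints its sentinel only once it is ready to serve requests. In the verbose log elsewhere in this bug, krb5kdc prints `krb5kdc: starting...`.

```python
import queue
import subprocess
import sys
import threading
import time


def start_daemon_and_wait(args, sentinel, timeout=10.0):
    """Start a daemon and return only after it prints `sentinel`.

    Hypothetical helper illustrating the readiness check; it is not the
    real k5test._start_daemon.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()

    def pump():
        # Forward each output line to the queue; None marks end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            proc.kill()
            proc.wait()
            raise TimeoutError(f"{args[0]} never printed {sentinel!r}")
        try:
            line = lines.get(timeout=remaining)
        except queue.Empty:
            continue  # loop back so the deadline check above fires
        if line is None:
            proc.wait()
            raise RuntimeError(f"{args[0]} exited before printing {sentinel!r}")
        if sentinel in line:
            return proc  # ready: the caller may now run kinit and friends


if __name__ == "__main__":
    # Stand-in daemon: prints a readiness line, then keeps running.
    fake = [sys.executable, "-c",
            "import time; print('krb5kdc: starting...', flush=True); time.sleep(30)"]
    daemon = start_daemon_and_wait(fake, "starting...")
    print("daemon ready, pid", daemon.pid)
    daemon.kill()
    daemon.wait()
```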
Santiago> However, you seem to be automatically assuming that the
Santiago> failure is due to some of those things you enumerated,
Santiago> i.e. a "difference of build environments".
Santiago> But such a thing is not necessarily true.
Agreed.
Based on a long history with the package and the tests in question I am
making some assumptions.
You're absolutely right that these assumptions may end up being
inaccurate.
Certainly if I do see failures on a buildd that suggest a race, I will
be very concerned.
If there is a race on the buildds, I think that will become clear over
time.
>> So, unless I'm violating some written policy somewhere, my claim
>> is that this is all good.
Santiago> I think you forgot Policy 4.2, which is also "written
Santiago> policy somewhere". This is what it says:
I'm aware of policy 4.2.
I'm also aware that there has been some disagreement over the years
about what that means for things beyond packaged dependencies.
In particular, how that relates to available memory, cpu, etc.
And for example related to where builds can write, etc, etc.
And some of that has been resolved, and I suspect some issues have not.
It would not surprise me if some corner cases surrounding what
builds can do with regard to the network on the local system are
ambiguous.
We all agree that builds cannot access the internet, but beyond that I
think there is ambiguity.
But yes, if this ends up being a race, I will absolutely be interested
in fixing the race or disabling the tests.
Hi,

As an additional data point, I can still reproduce this failure.

Lucas
Lucas> Hi,
Lucas> As an additional data point, I can still reproduce this
Lucas> failure.
So, my understanding is that so far for you it always fails, and the
evidence so far suggests that it generally (or always, but I am not sure
we have long enough evidence for that) succeeds on the builds.
What is your environment like?
Chroot? Container? If any namespaces are involved, how do you build the
namespaces, and what non-default capability settings do you have on top
of the defaults of the containerization software you use?
Here is more verbose output:

/tmp/krb5-1.20.1/build/lib/krb5/ccache# PYTHONPATH=/tmp/krb5-1.20.1/src/util /usr/bin/python3 ../../../../src/lib/krb5/ccache/t_cccol.py -v
*** [1] Executing: /tmp/krb5-1.20.1/build/clients/klist/klist -c KEYRING:process:abcd
klist: Credentials cache keyring 'process:abcd:abcd' not found
*** [1] Completed with return code 1
*** [2] Executing: ./t_cccol DIR:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir/cc
*** [2] Completed with return code 0
*** [3] Executing: keyctl list @s
1 key in keyring:
862920460: --alswrv 0 65534 keyring: _uid.0
*** [3] Completed with return code 0
*** [4] Executing: keyctl list @s
1 key in keyring:
862920460: --alswrv 0 65534 keyring: _uid.0
*** [4] Completed with return code 0
*** [5] Executing: keyctl list @u
keyring is empty
*** [5] Completed with return code 0
*** [6] Executing: ./t_cccol KEYRING:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [6] Completed with return code 0
*** [7] Executing: keyctl list @s
2 keys in keyring:
862920460: --alswrv 0 65534 keyring: _uid.0
924062711: --alswrv 0 0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [7] Completed with return code 0
*** [8] Executing: keyctl search @s keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
924062711
*** [8] Completed with return code 0
*** [9] Executing: keyctl unlink 924062711 @s
*** [9] Completed with return code 0
*** [10] Executing: ./t_cccol KEYRING:legacy:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [10] Completed with return code 0
*** [11] Executing: keyctl list @s
2 keys in keyring:
862920460: --alswrv 0 65534 keyring: _uid.0
554948115: --alswrv 0 0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [11] Completed with return code 0
*** [12] Executing: keyctl search @s keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
554948115
*** [12] Completed with return code 0
*** [13] Executing: keyctl unlink 554948115 @s
*** [13] Completed with return code 0
*** [14] Executing: ./t_cccol KEYRING:session:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [14] Completed with return code 0
*** [15] Executing: keyctl list @s
2 keys in keyring:
862920460: --alswrv 0 65534 keyring: _uid.0
1042415883: --alswrv 0 0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [15] Completed with return code 0
*** [16] Executing: keyctl search @s keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
1042415883
*** [16] Completed with return code 0
*** [17] Executing: keyctl unlink 1042415883 @s
*** [17] Completed with return code 0
*** [18] Executing: ./t_cccol KEYRING:user:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [18] Completed with return code 0
*** [19] Executing: keyctl list @u
1 key in keyring:
80739909: --alswrv 0 0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [19] Completed with return code 0
*** [20] Executing: keyctl search @u keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
80739909
*** [20] Completed with return code 0
*** [21] Executing: keyctl unlink 80739909 @u
*** [21] Completed with return code 0
*** [22] Executing: ./t_cccol KEYRING:process:abcd
*** [22] Completed with return code 0
*** [23] Executing: ./t_cccol KEYRING:thread:abcd
*** [23] Completed with return code 0
*** [24] Executing: /tmp/krb5-1.20.1/build/kadmin/dbutil/kdb5_util create -s -P master
Initializing database '/var/lib/krb5kdc/principal' for realm 'KRBTEST.COM', master key name 'K/M@KRBTEST.COM'
*** [24] Completed with return code 0
*** [25] Executing: /tmp/krb5-1.20.1/build/kadmin/cli/kadmin.local addprinc -pw user119606 user@KRBTEST.COM
*** [25] Completed with return code 0
*** [26] Executing: /tmp/krb5-1.20.1/build/kadmin/cli/kadmin.local addprinc -pw admin119606 user/admin@KRBTEST.COM
*** [26] Completed with return code 0
*** [27] Starting: /tmp/krb5-1.20.1/build/kdc/krb5kdc -n
krb5kdc: starting...
*** [27] Started with pid 119633
*** [28] Executing: /tmp/krb5-1.20.1/build/clients/kinit/kinit user@KRBTEST.COM
kinit: Cannot contact any KDC for realm 'KRBTEST.COM' while getting initial credentials
*** [28] Completed with return code 1
*** Failure: /tmp/krb5-1.20.1/build/clients/kinit/kinit failed with code 1.
*** Last command (#28): /tmp/krb5-1.20.1/build/clients/kinit/kinit user@KRBTEST.COM
*** Output of last command:
kinit: Cannot contact any KDC for realm 'KRBTEST.COM' while getting initial credentials
Use --debug=NUM to run a command under a debugger.
Use --stop-after=NUM to stop after a daemon is started in order to attach to it with a debugger.
Use --help to see other options.

I can reproduce it locally on my laptop, with similar versions/setup
(bookworm, sbuild+schroot).
I cannot reproduce it on a Grid5000 node running Debian 12 (outside of
sbuild/schroot). I also cannot reproduce it on the same node, inside
sbuild/schroot.

Lucas
Lucas> On 26/10/23 at 07:45 -0600, Sam Hartman wrote:
>> >>>>> "Lucas" == Lucas Nussbaum <lucas@debian.org> writes:
Lucas> Hi,
>>
Lucas> As an additional data point, I can still reproduce this
Lucas> failure.
>>
>> So, my understanding is that so far for you it always fails, and
>> the evidence so far suggests that it generally (or always, but I
>> am not sure we have long enough evidence for that) succeeds on
>> the builds.
>>
>> What is your environment like? Chroot? Container? If any
>> namespaces are involved, how do you build the namespaces, and
>> what non-default capability settings do you have on top of the
>> defaults of containerization software you use.
Lucas> It's a standard sbuild setup, on an AWS EC2 VM.
Yep, it's some sort of DNS issue. A kind developer gave me access to a
similarly configured machine on which I can reproduce the problem.
Outside the chroot:
PING ip-10-84-234-64(ip-10-84-234-64 (fe80::816:edff:fe35:ded1%ens5)) 56 data bytes
64 bytes from ip-10-84-234-64 (fe80::816:edff:fe35:ded1%ens5): icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from ip-10-84-234-64 (fe80::816:edff:fe35:ded1%ens5): icmp_seq=2 ttl=64 time=0.035 ms
^C
--- ip-10-84-234-64 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.022/0.028/0.035/0.006 ms
Inside the chroot:
(bookworm)hartmans@ip-10-84-234-64:~$ ping ip-10-84-234-64
ping: ip-10-84-234-64: Name or service not known
The test framework uses gethostname() to find the name to use to contact
the KDC.
kinit is trying to look up ip-10-84-234-64 and contact a KDC on that
address.
But because that name is not resolvable within the chroot, the test
fails.
The configuration difference between inside the chroot and outside the
chroot appears to be the use of systemd-resolved.
In particular libnss-resolve is installed on the host.
Resolved can resolve ip-10-84-234-64,
but the EC2 nameserver referenced by resolv.conf in the chroot cannot
resolve that name without qualification.
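The host/chroot difference described here shows up on the `hosts:` line of /etc/nsswitch.conf. Installing libnss-resolve typically adds a `resolve` source there, which sends lookups to systemd-resolved ahead of plain DNS. The sketch below extracts the host lookup sources so the two environments can be compared. `host_sources` is a hypothetical helper. The sample lines in the usage example are illustrative and are not copied from the machines discussed in this bug.

```python
import re


def host_sources(nsswitch_text):
    """Return the NSS sources on the 'hosts:' line, in lookup order.

    Bracketed action items such as [!UNAVAIL=return] are dropped and
    '#' comments are ignored. Returns [] if there is no hosts line.
    """
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line.startswith("hosts:"):
            continue
        rest = re.sub(r"\[[^\]]*\]", " ", line[len("hosts:"):])
        return rest.split()
    return []


if __name__ == "__main__":
    host = "hosts: files resolve [!UNAVAIL=return] dns myhostname\n"
    chroot = "hosts: files dns\n"
    print("host uses resolved:", "resolve" in host_sources(host))
    print("chroot uses resolved:", "resolve" in host_sources(chroot))
```

In practice, pass the contents of /etc/nsswitch.conf from outside and from inside the chroot.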
I'm not entirely sure what the right fix is. Krb5 tests really do want
a name that they can use to refer to the local machine (and it would be
significant work to get that name from a source other than
gethostname()).
Possibilities include:
* Use 127.0.0.53 inside the chroot and let systemd-resolved resolve the
name.
* Find some way to detect this situation and bypass the tests.
* Include the appropriate search domain in the chroot's resolv.conf so
that EC2 can resolve the machine's name.
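The third option works because of how the stub resolver expands short names. The sketch below is a simplified model of the search-list rules described in resolv.conf(5). A name with fewer dots than `ndots` (default 1) is tried with each search domain appended, and only after that is it tried as-is. A name with a trailing dot is never expanded. So a bare `ip-10-84-234-64` reaches the nameserver in qualified form only if resolv.conf has a suitable `search` line. The domain `example.internal` is a placeholder. The real EC2 suffix depends on the region and VPC setup. Other resolver options are ignored here.

```python
import re


def parse_resolv_conf(text):
    """Extract the search list and the ndots option from resolv.conf text."""
    search, ndots = [], 1  # resolv.conf(5): ndots defaults to 1
    for raw in text.splitlines():
        fields = re.split(r"[#;]", raw, maxsplit=1)[0].split()
        if not fields:
            continue
        if fields[0] in ("search", "domain"):
            search = fields[1:]  # the last search/domain line wins
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def candidate_names(name, search, ndots=1):
    """Approximate the order in which a stub resolver tries names."""
    if name.endswith("."):
        return [name]  # fully qualified: the search list is not used
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


if __name__ == "__main__":
    conf = "nameserver 10.0.0.2\nsearch example.internal\n"
    search, ndots = parse_resolv_conf(conf)
    print(candidate_names("ip-10-84-234-64", search, ndots))
```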
On 27/10/23 at 19:36, Sam Hartman wrote:

Hi.

The /etc/hosts file for the host machine was like this:

127.0.0.1 localhost
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

and I have always wondered what the minimal file is that is OK for
building packages.

In my own test rebuilds, the machines have a minimalistic /etc/hosts file
which at least assigns an IP to the hostname:

127.0.0.1 localhost
192.168.122.101 thehostname

which I guess helps to avoid problems like this. So, I wonder if it would
be feasible to do something similar in the mass rebuilds done by Lucas.

[...] inside the chroot makes the non-FQDN ping work again. Certainly not
because any service is started, as services are disabled inside the
chroot, as usual:

All runlevel operations denied by policy

but maybe because of additional support files that allow the chroot to
communicate with the DNS in the host machine. Would it make sense to add
systemd-resolved as a build-dependency?

In either case, I don't quite understand why the test suite needs to
resolve the hostname to obtain the IP to connect to, since connecting to
the outside world during a package build is forbidden by policy. In the
context of a test suite, should 127.0.0.1 not be enough, or does it have
to be a different IP?

Thanks.
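Santiago's suggestion can be checked mechanically. The hypothetical helper `hosts_addresses` below parses /etc/hosts-style text, where each line holds an address followed by a canonical name and aliases and `#` starts a comment. It returns the addresses mapped to a given hostname, and the test uses the two sample files quoted in this message. An entry by itself may not be enough for krb5's tests: the address also has to be one that a test KDC can bind to and be reached on.

```python
def hosts_addresses(hosts_text, hostname):
    """Return the addresses an /etc/hosts text maps to `hostname`.

    Each non-comment line is 'address canonical_name [aliases...]';
    names are compared case-insensitively.
    """
    wanted = hostname.lower()
    found = []
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) < 2:
            continue  # blank, comment-only, or malformed line
        address, names = fields[0], fields[1:]
        if wanted in (n.lower() for n in names):
            found.append(address)
    return found


if __name__ == "__main__":
    import socket
    with open("/etc/hosts") as f:
        name = socket.gethostname()
        print(name, "->", hosts_addresses(f.read(), name) or "no entry")
```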
Hi,

I changed my setup slightly to use a hostname that resolves, and can
confirm this does not fail anymore.

I'll let you decide whether you want to keep this bug open (with a lower
severity) or not. I'm fine with either.

Lucas
On 29/10/23 at 21:14, Lucas Nussbaum wrote:

Hi.

Since this build failure has been solved by having a "sane" /etc/hosts,
and debian-installer itself always tries to make sure that there is a
line in /etc/hosts for the hostname, I personally no longer consider this
a bug in krb5. I would recommend that Sam close it without doing anything
else.

Thanks.