#1052863 krb5: FTBFS: dh_auto_test: error: cd build && make -j1 check "TESTSUITEFLAGS=-j1 --verbose" VERBOSE=1 returned exit code 2

#1052863#5
Date:
2023-09-26 13:18:48 UTC
Hi,

During a rebuild of all packages in sid, your package failed to build
on amd64.


Relevant part (hopefully):
The full build log is available from:
http://qa-logs.debian.net/2023/09/25/krb5_1.20.1-4_unstable.log

All bugs filed during this archive rebuild are listed at:
https://bugs.debian.org/cgi-bin/pkgreport.cgi?tag=ftbfs-20230925;users=lucas@debian.org
or:
https://udd.debian.org/bugs/?release=na&merged=ign&fnewerval=7&flastmodval=7&fusertag=only&fusertagtag=ftbfs-20230925&fusertaguser=lucas@debian.org&allbugs=1&cseverity=1&ctags=1&caffected=1#results

A list of current common problems and possible solutions is available at
http://wiki.debian.org/qa.debian.org/FTBFS . You're welcome to contribute!

If you reassign this bug to another package, please mark it as 'affects'-ing
this package. See https://www.debian.org/Bugs/server-control#affects

If you fail to reproduce this, please provide a build log and diff it with mine
so that we can identify if something relevant changed in the meantime.

#1052863#10
Date:
2023-09-26 14:18:43 UTC
control: severity -1 normal

    Lucas> Hi,

    Lucas> During a rebuild of all packages in sid, your package failed
    Lucas> to build on amd64.


    Lucas> Relevant part (hopefully):


So, according to the build log, make check failed because it could
not contact a KDC on the local system.  The krb5 build is more sensitive
to the environment in which it is built thanks to #1017763.  In
particular, I believe it requires that getaddrinfo(gethostname())
work, and that the resulting address be a valid address for contacting
the local host.
It also requires that a service can bind to that address and be
contacted from within the build.  In addition, it requires various
capabilities related to access to the keyring; I find that debspawn's
containers do not have sufficient capabilities to run make check.
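As a rough illustration of the network requirements described above, the following Python sketch checks whether a name resolves via getaddrinfo() and whether a TCP listener bound to each resolved address can be reached from the same process. It is a hypothetical diagnostic, not part of the krb5 test suite; the function name `usable_local_addresses` is invented for illustration, and it does not cover the keyring capabilities also mentioned.

```python
import socket


def usable_local_addresses(host=None):
    """Return the resolved addresses of `host` (default: gethostname())
    on which a TCP listener can be bound and then connected to.

    Hypothetical diagnostic helper; not part of krb5 or k5test.py.
    """
    if host is None:
        host = socket.gethostname()
    try:
        infos = socket.getaddrinfo(host, 0, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # The name does not resolve at all.
        return []
    usable = []
    for family, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as server:
                server.bind(sockaddr)  # port 0: the kernel picks a free port
                server.listen(1)
                with socket.socket(family, socktype, proto) as client:
                    client.settimeout(2)
                    # getsockname() returns the bound address with the real port.
                    client.connect(server.getsockname())
            usable.append(sockaddr[0])
        except OSError:
            # Resolved, but cannot be bound or reached locally.
            continue
    return usable
```

Calling `usable_local_addresses()` with no argument tests the machine's own hostname; an empty list inside a build chroot, combined with a non-empty list on the host, would point at the kind of environment difference discussed in this thread.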

It appears all these attributes are satisfied by the build hosts.
So, unless I'm violating some written policy somewhere, my claim is that
this is all good.
That said, I realize this is an area where things are underspecified,
and I'd be happy to engage with debian-policy or the TC to further
refine what builds are allowed to do if you think that something krb5 is
doing is not reasonable.

I suspect this is a case where your build environment does not mirror
the buildds enough for the tests to succeed, but I'm leaving the bug
open for your input.

#1052863#17
Date:
2023-09-26 15:58:19 UTC
On 26/9/23 at 16:18, Sam Hartman wrote:

This could be simply a race condition.

I've seen many packages fail their tests randomly because of that.

In many cases the problem has been solved by adding a "sleep 1"
somewhere (for example, after the service is started and before
the tests try to use the service).

You mention a number of things in which the environment of Lucas's
autobuilders "might" differ from the official buildds.

However, you seem to be automatically assuming that the failure
is due to some of those things you enumerated, i.e. a "difference
of build environments".

But that is not necessarily true.

My theory here is that this failure happens randomly,
and it may happen randomly in the buildds as well.

But the same way we can't tell that a coin is biased by
throwing it only once, we should not consider that a package
is ok just because it built ok once or twice in the buildds.

I think you forgot Policy 4.2, which is also "written policy somewhere".
This is what it says:

If build-time dependencies are specified, it must be possible to build the package
and produce working binaries on a system with only essential and build-essential
packages installed and also those required to satisfy the build-time relationships
(including any implied relationships).

In other words: It has to work.

I don't think debian-policy or the TC need to be involved in this.

If the tests contain a race condition which could be avoided by
simply adding a sleep (or some other easy fix), then that's all that
we need to do.

The end user must be able to rebuild packages as well. Here, Lucas
is merely doing what any end user could do at home, of course
following certain standards.

We surely would not tell an end user who tries to build a package
without success "your system does not mirror buildd.debian.org
well enough". We would surely point at whatever the user is
doing wrong regarding existing policies. However, if the user
is not doing anything wrong, then it's definitely our fault.

Thanks.

#1052863#22
Date:
2023-09-26 16:55:22 UTC

    Santiago> This could be simply a race condition.

    Santiago> I've seen many packages fail their tests randomly
    Santiago> because of that.

It could be a race, but given what I know of the tests, I doubt it is.
Take a look at _start_daemon in util/k5test.py.
In particular, it waits for a specific string to be printed before
declaring the service running.
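The approach described here, waiting for the daemon to announce that it is ready rather than sleeping for a fixed interval, can be sketched in Python as follows. This is an illustrative sketch, not the actual code of `_start_daemon` in util/k5test.py; the helper name `start_and_wait` and its error handling are assumptions.

```python
import subprocess


def start_and_wait(args, sentinel):
    """Start a daemon and block until `sentinel` appears in its output.

    Hypothetical helper: unlike a fixed "sleep 1", it returns as soon as
    the daemon reports readiness, and fails loudly if the daemon exits
    without ever doing so.
    """
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    seen = []
    for line in proc.stdout:
        seen.append(line)
        if sentinel in line:
            # Daemon is up. The caller must keep draining proc.stdout
            # (or close it) and eventually stop the process.
            return proc
    # EOF without the sentinel: the daemon died during startup.
    proc.wait()
    proc.stdout.close()
    raise RuntimeError(
        f"{args[0]} exited with {proc.returncode} before printing "
        f"{sentinel!r}; output was: {''.join(seen)!r}"
    )
```

A real harness would also want a timeout for a daemon that hangs without printing anything; this sketch omits it for brevity.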


    Santiago> However, you seem to be automatically assuming that the
    Santiago> failure is due to some of those things you enumerated,
    Santiago> i.e. a "difference of build environments".

    Santiago> But that is not necessarily true.

Agreed.
Based on a long history with the package and the tests in question, I am
making some assumptions.
You're absolutely right that these assumptions may end up being
inaccurate.
Certainly if I do see failures on a buildd that suggest a race, I will
be very concerned.
If there is a race on the buildds, I think that will become clear over
time.

    >> So, unless I'm violating some written policy somewhere, my claim
    >> is that this is all good.

    Santiago> I think you forgot Policy 4.2, which is also "written
    Santiago> policy somewhere".  This is what it says:

I'm aware of policy 4.2.
I'm also aware that there has been some disagreement over the years
about what that means for things beyond packaged dependencies.
In particular, how that relates to available memory, cpu, etc.
And, for example, to where builds can write.
Some of that has been resolved, and I suspect some issues have not.
It would not surprise me if some corner cases surrounding what
builds can do with the network on the local system are
ambiguous.
We all agree that builds cannot access the internet, but beyond that I
think there is ambiguity.

But yes, if this ends up being a race, I will absolutely be interested
in fixing the race or disabling the tests.

#1052863#27
Date:
2023-10-26 07:38:13 UTC
Hi,

As an additional data point, I can still reproduce this failure.

Lucas

#1052863#32
Date:
2023-10-26 13:45:35 UTC
    Lucas> Hi,

    Lucas> As an additional data point, I can still reproduce this
    Lucas> failure.

So, my understanding is that so far for you it always fails, and the
evidence so far suggests that it generally (or always, but I am not sure
we have long enough evidence for that) succeeds on the buildds.

What is your environment like?
Chroot? Container?  If any namespaces are involved, how do you build the
namespaces, and what non-default capability settings do you have on top
of the defaults of the containerization software you use?

#1052863#37
Date:
2023-10-26 17:27:02 UTC
Here is more verbose output:
/tmp/krb5-1.20.1/build/lib/krb5/ccache# PYTHONPATH=/tmp/krb5-1.20.1/src/util /usr/bin/python3 ../../../../src/lib/krb5/ccache/t_cccol.py -v
*** [1] Executing: /tmp/krb5-1.20.1/build/clients/klist/klist -c KEYRING:process:abcd
klist: Credentials cache keyring 'process:abcd:abcd' not found
*** [1] Completed with return code 1
*** [2] Executing: ./t_cccol DIR:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir/cc
*** [2] Completed with return code 0
*** [3] Executing: keyctl list @s
1 key in keyring:
862920460: --alswrv     0 65534 keyring: _uid.0
*** [3] Completed with return code 0
*** [4] Executing: keyctl list @s
1 key in keyring:
862920460: --alswrv     0 65534 keyring: _uid.0
*** [4] Completed with return code 0
*** [5] Executing: keyctl list @u
keyring is empty
*** [5] Completed with return code 0
*** [6] Executing: ./t_cccol KEYRING:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [6] Completed with return code 0
*** [7] Executing: keyctl list @s
2 keys in keyring:
862920460: --alswrv     0 65534 keyring: _uid.0
924062711: --alswrv     0     0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [7] Completed with return code 0
*** [8] Executing: keyctl search @s keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
924062711
*** [8] Completed with return code 0
*** [9] Executing: keyctl unlink 924062711 @s
*** [9] Completed with return code 0
*** [10] Executing: ./t_cccol KEYRING:legacy:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [10] Completed with return code 0
*** [11] Executing: keyctl list @s
2 keys in keyring:
862920460: --alswrv     0 65534 keyring: _uid.0
554948115: --alswrv     0     0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [11] Completed with return code 0
*** [12] Executing: keyctl search @s keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
554948115
*** [12] Completed with return code 0
*** [13] Executing: keyctl unlink 554948115 @s
*** [13] Completed with return code 0
*** [14] Executing: ./t_cccol KEYRING:session:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [14] Completed with return code 0
*** [15] Executing: keyctl list @s
2 keys in keyring:
862920460: --alswrv     0 65534 keyring: _uid.0
1042415883: --alswrv     0     0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [15] Completed with return code 0
*** [16] Executing: keyctl search @s keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
1042415883
*** [16] Completed with return code 0
*** [17] Executing: keyctl unlink 1042415883 @s
*** [17] Completed with return code 0
*** [18] Executing: ./t_cccol KEYRING:user:/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [18] Completed with return code 0
*** [19] Executing: keyctl list @u
1 key in keyring:
 80739909: --alswrv     0     0 keyring: _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
*** [19] Completed with return code 0
*** [20] Executing: keyctl search @u keyring _krb_/tmp/krb5-1.20.1/build/lib/krb5/ccache/testdir
80739909
*** [20] Completed with return code 0
*** [21] Executing: keyctl unlink 80739909 @u
*** [21] Completed with return code 0
*** [22] Executing: ./t_cccol KEYRING:process:abcd
*** [22] Completed with return code 0
*** [23] Executing: ./t_cccol KEYRING:thread:abcd
*** [23] Completed with return code 0
*** [24] Executing: /tmp/krb5-1.20.1/build/kadmin/dbutil/kdb5_util create -s -P master
Initializing database '/var/lib/krb5kdc/principal' for realm 'KRBTEST.COM',
master key name 'K/M@KRBTEST.COM'
*** [24] Completed with return code 0
*** [25] Executing: /tmp/krb5-1.20.1/build/kadmin/cli/kadmin.local addprinc -pw user119606 user@KRBTEST.COM
*** [25] Completed with return code 0
*** [26] Executing: /tmp/krb5-1.20.1/build/kadmin/cli/kadmin.local addprinc -pw admin119606 user/admin@KRBTEST.COM
*** [26] Completed with return code 0
*** [27] Starting: /tmp/krb5-1.20.1/build/kdc/krb5kdc -n
krb5kdc: starting...
*** [27] Started with pid 119633
*** [28] Executing: /tmp/krb5-1.20.1/build/clients/kinit/kinit user@KRBTEST.COM
kinit: Cannot contact any KDC for realm 'KRBTEST.COM' while getting initial credentials
*** [28] Completed with return code 1
*** Failure: /tmp/krb5-1.20.1/build/clients/kinit/kinit failed with code 1.
*** Last command (#28): /tmp/krb5-1.20.1/build/clients/kinit/kinit user@KRBTEST.COM
*** Output of last command:
kinit: Cannot contact any KDC for realm 'KRBTEST.COM' while getting initial credentials
Use --debug=NUM to run a command under a debugger.  Use
--stop-after=NUM to stop after a daemon is started in order to
attach to it with a debugger.  Use --help to see other
options.

I can reproduce it locally on my laptop, with similar versions/setup
(bookworm, sbuild+schroot).

I cannot reproduce it on a Grid5000 node running Debian 12 (outside of
sbuild/schroot), nor on the same node inside sbuild/schroot.

Lucas

#1052863#42
Date:
2023-10-27 17:36:15 UTC
    Lucas> On 26/10/23 at 07:45 -0600, Sam Hartman wrote:
    >> >>>>> "Lucas" == Lucas Nussbaum <lucas@debian.org> writes:
    Lucas> Hi,
    >>
    Lucas> As an additional data point, I can still reproduce this
    Lucas> failure.
    >>
    >> So, my understanding is that so far for you it always fails, and
    >> the evidence so far suggests that it generally (or always, but I
    >> am not sure we have long enough evidence for that) succeeds on
    >> the buildds.
    >>
    >> What is your environment like?  Chroot? Container?  If any
    >> namespaces are involved, how do you build the namespaces, and
    >> what non-default capability settings do you have on top of the
    >> defaults of the containerization software you use?

    Lucas> It's a standard sbuild setup, on an AWS EC2 VM.

Yep, it's some sort of DNS issue.  A kind developer gave me access to a
similarly configured machine on which I can reproduce the problem.

Outside the chroot:

PING ip-10-84-234-64(ip-10-84-234-64 (fe80::816:edff:fe35:ded1%ens5)) 56 data bytes
64 bytes from ip-10-84-234-64 (fe80::816:edff:fe35:ded1%ens5): icmp_seq=1 ttl=64 time=0.022 ms
64 bytes from ip-10-84-234-64 (fe80::816:edff:fe35:ded1%ens5): icmp_seq=2 ttl=64 time=0.035 ms
^C
--- ip-10-84-234-64 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1010ms
rtt min/avg/max/mdev = 0.022/0.028/0.035/0.006 ms


Inside the chroot:

(bookworm)hartmans@ip-10-84-234-64:~$ ping ip-10-84-234-64
ping: ip-10-84-234-64: Name or service not known

The test framework uses gethostname() to find the name to use to contact
the KDC.
kinit is trying to look up ip-10-84-234-64 and contact a KDC on that
address.
But because that name is not resolvable within the chroot, the test
fails.

The configuration difference between inside the chroot and outside the
chroot appears to be the use of systemd-resolved.
In particular, libnss-resolve is installed on the host.
systemd-resolved can resolve ip-10-84-234-64,
but the EC2 nameserver referenced by resolv.conf in the chroot cannot
resolve that name without qualification.

I'm not entirely sure what the right fix is.  The krb5 tests really do want
a name that they can use to refer to the local machine (and it would be
significant work to get that name from a source other than
gethostname()).

Possibilities include:

* Use 127.0.0.53 as the nameserver inside the chroot and let
  systemd-resolved resolve the name.

* Find some way to detect this situation and bypass the tests.

* Include the appropriate search domain in the chroot's resolv.conf so
  that the EC2 nameserver can resolve the machine's name.

#1052863#47
Date:
2023-10-27 18:08:46 UTC
On 27/10/23 at 19:36, Sam Hartman wrote:

Hi. The /etc/hosts file for the host machine was like this:

127.0.0.1	localhost
::1		localhost ip6-localhost ip6-loopback
ff02::1		ip6-allnodes
ff02::2		ip6-allrouters

and I have always wondered what the minimal file
suitable for building packages would be.

In my own test rebuilds, the machines have a minimalistic
/etc/hosts file which at least assigns an IP to
the hostname:

127.0.0.1	localhost
192.168.122.101	thehostname

which I guess helps avoid problems like this.

So, I wonder if it would be feasible to do something similar
in the mass rebuilds done by Lucas.
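The check implied here, whether the machine's hostname has its own line in /etc/hosts, can be sketched in Python as below. The function names `hosts_has_entry` and `check_local_hosts` are hypothetical, and a missing entry only causes this failure if no other NSS source (DNS, systemd-resolved) can resolve the name either, as was the case in this bug.

```python
import socket


def hosts_has_entry(hosts_text, hostname):
    """Return True if `hostname` appears as a name in hosts(5)-format text."""
    wanted = hostname.lower()  # host names are case-insensitive
    for line in hosts_text.splitlines():
        # '#' starts a comment in hosts(5).
        line = line.split("#", 1)[0]
        fields = line.split()
        # First field is the address; the rest are the canonical name and aliases.
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            return True
    return False


def check_local_hosts(path="/etc/hosts"):
    """Check the running machine's hostname against its hosts file."""
    with open(path, encoding="utf-8") as f:
        return hosts_has_entry(f.read(), socket.gethostname())
```

Applied to the two files quoted above, the first one (localhost and IPv6 entries only) has no entry for the machine's name, while the second one maps `thehostname` to 192.168.122.101.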

inside the chroot makes the non-FQDN ping work again.

Certainly not because any service is started, as services
are disabled inside the chroot, as usual:

   All runlevel operations denied by policy

but maybe because of additional support files that allow
the chroot to communicate with the DNS resolver on the host machine.

Would it make sense to add systemd-resolved as a build-dependency?

In any case, I don't quite understand the need to resolve the hostname
to obtain the IP address the test suite should connect to, as connecting
to the outside world during a package build is forbidden by policy.
In the context of a test suite, shouldn't 127.0.0.1 be enough, or does
it have to be a different IP?

Thanks.

#1052863#52
Date:
2023-10-29 20:14:11 UTC
Hi,

I changed my setup slightly to use a hostname that resolves, and can
confirm this does not fail anymore.

I'll let you decide whether you want to keep this bug open (with a lower
severity) or not. I'm fine with either.

Lucas

#1052863#57
Date:
2023-12-15 11:27:05 UTC
On 29/10/23 at 21:14, Lucas Nussbaum wrote:

Hi. Since this build failure has been solved by having a "sane" /etc/hosts,
and debian-installer itself always tries to make sure that there is
a line in /etc/hosts for the hostname, I personally no longer consider this
a bug in krb5, so I would recommend that Sam close it without
doing anything else.

Thanks.