#1059995 pdns: flaky autopkgtest (host dependent): pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable

#1059995#5
Date:
2024-01-04 13:42:59 UTC
From:
To:
Dear maintainer(s),

I looked at the results of the autopkgtest of your package and noticed
that it regularly fails. The failures seem related to the host that runs
the test: ci-worker13 is a beefy machine [1] and tests seem to fail
consistently there, while the other amd64 workers are much more moderate
[2] and tests pass there.

Because the unstable-to-testing migration software now blocks on
regressions in testing, flaky tests, i.e. tests that flip between
passing and failing without changes to the list of installed packages,
are causing people unrelated to your package to spend time on these
tests.

Don't hesitate to reach out if you need help and some more information
from our infrastructure.

Paul

[1] https://metal.equinix.com/product/servers/m3-large/
[2] https://aws.amazon.com/ec2/instance-types/m5/

https://ci.debian.net/packages/p/pdns/testing/amd64/

https://ci.debian.net/data/autopkgtest/testing/amd64/p/pdns/41325109/log.gz

268s + service pdns restart
269s Job for pdns.service failed because the control process exited with error code.
269s See "systemctl status pdns.service" and "journalctl -xeu pdns.service" for details.
269s + journalctl _SYSTEMD_UNIT=pdns.service -n 10 --no-pager
269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:20 ci-359-77591125 (s_server)[3766]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:21 ci-359-77591125 (s_server)[3852]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:21 ci-359-77591125 (s_server)[3852]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:23 ci-359-77591125 (s_server)[3876]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:23 ci-359-77591125 (s_server)[3876]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:24 ci-359-77591125 (s_server)[3886]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:24 ci-359-77591125 (s_server)[3886]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s Dec 25 16:13:25 ci-359-77591125 (s_server)[3915]: pdns.service: Failed to set up IPC namespacing: Resource temporarily unavailable
269s Dec 25 16:13:25 ci-359-77591125 (s_server)[3915]: pdns.service: Failed at step NAMESPACE spawning /usr/sbin/pdns_server: Resource temporarily unavailable
269s ++ mktemp
269s + TMPFILE=/tmp/tmp.jah1Y5TJIa
269s + trap cleanup EXIT
269s + tee /tmp/tmp.jah1Y5TJIa
269s + sdig 127.0.0.1 53 smoke.pgsql.example.org A
279s Fatal: Timeout waiting for data
279s + grep -c '127\.0\.0\.222' /tmp/tmp.jah1Y5TJIa
279s 0
279s + echo smoke.pgsql.example.org could not be resolved
279s smoke.pgsql.example.org could not be resolved
279s + exit 1
279s + cleanup

#1059995#10
Date:
2024-01-04 14:08:58 UTC
From:
To:
It would seem that the host runs out of IPC space?
Does it run more tests in parallel than other workers, or so?

I wouldn't know what to do about this; it's not really under the
control of src:pdns.

Chris

#1059995#15
Date:
2024-01-04 14:37:21 UTC
From:
To:
Hi,

What is IPC space? And when does a host run out of it? As I said, this
is one of our most powerful hosts, so I would expect it to run out of
things last.

Yes, this host (like most of our hosts, but a bit more) runs multiple
lxc-based debci workers.

Well, maybe check for it and fail gracefully? Or, as of a couple of
days ago, if qemu VMs don't run out of IPC space, we could always run
the tests in qemu.
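
One way to read "check for it and fail gracefully" is to probe IPC
namespace creation before starting the service. The following is a
minimal sketch, assuming util-linux's unshare is available; the function
names are hypothetical and not taken from the pdns test suite. Note that
unshare also fails for unprivileged callers, and because the failure is
intermittent, a successful probe does not guarantee that systemd's later
attempt will succeed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a new IPC namespace can be created.
# The probe command is injectable so the logic can be exercised without
# root; with no arguments it runs `unshare --ipc true` (util-linux).
ipc_namespace_works() {
    if [ "$#" -eq 0 ]; then
        set -- unshare --ipc true
    fi
    "$@" >/dev/null 2>&1
}

# Hypothetical helper: map the probe result to a test exit status.
# Prints 0 to continue, or 77 to request a skip (which autopkgtest only
# honours when the test carries the skippable restriction).
ipc_probe_status() {
    if ipc_namespace_works "$@"; then
        echo 0
    else
        echo 77
    fi
}
```

A test script could then start with something like
`status=$(ipc_probe_status); [ "$status" -eq 0 ] || exit "$status"`.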

Paul

#1059995#20
Date:
2024-01-04 16:28:00 UTC
From:
To:
https://manpages.debian.org/bookworm/manpages/sysvipc.7.en.html
https://manpages.debian.org/bookworm/manpages/ipc_namespaces.7.en.html
anything special, the limits are probably shared with the whole
host.
kernel.shmmax and kernel.msgmax are, I think, the limits (but I'm not
entirely sure).
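
To compare these two sysctls between workers, they can be read straight
from procfs. This is a minimal sketch; the helper names are hypothetical,
and whether these particular limits are involved in the failure is, as
said above, not certain.

```shell
#!/bin/sh
# Hypothetical helper: print one kernel.* IPC sysctl by reading procfs.
# $1 is the sysctl name without the "kernel." prefix (e.g. shmmax).
# $2 optionally overrides the /proc/sys/kernel directory (for testing).
read_ipc_limit() {
    dir="${2:-/proc/sys/kernel}"
    if [ -r "$dir/$1" ]; then
        cat "$dir/$1"
    else
        echo "unavailable"
    fi
}

# Print the two limits mentioned above as name=value pairs.
print_ipc_limits() {
    for name in shmmax msgmax; do
        echo "kernel.$name=$(read_ipc_limit "$name" "${1:-}")"
    done
}
```

Running `print_ipc_limits` on the host and again inside an lxc container
would show whether both see the same values.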

But how? systemd sets up the IPC namespace.

I imagine a fully separated VM would not run out of IPC space,
indeed.

Chris

#1059995#25
Date:
2024-01-04 17:14:49 UTC
From:
To:
Hi,

Can you figure out decent numbers for these? Below I printed the output
of lsipc and AFAICT SHMMAX is already pretty big ;) (and the same on all
our hosts, which is also true for MSGMAX).

On the other hand, $(ipcs -a) doesn't show anything on the host, not
even if I let it run in a while-loop (1 second interval) while I
schedule the test of pdns. So, could this be a bug in systemd (which you
claim below should be handling this), or is this just not really
supported in lxc and do you need a full VM? Because it works elsewhere,
it feels more like a bug to me, and it would not be the first instance
where code fails to properly handle 64 cores or 256GB of RAM.
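
The polling described above could look roughly like the following
sketch; poll_command is a hypothetical name. One caveat: System V IPC
objects are per IPC namespace, so ipcs run on the host only lists objects
in the host's own namespace, not those created inside containers or
inside a unit's private IPC namespace.

```shell
#!/bin/sh
# Hypothetical helper: run a command a fixed number of times with a pause
# between runs, printing a header line before each run.
# $1 = number of runs, $2 = pause in seconds, remaining args = command.
poll_command() {
    count="$1"
    interval="$2"
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "run $i:"
        "$@"
        i=$((i + 1))
        # Pause between runs, but not after the last one.
        if [ "$i" -le "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `poll_command 60 1 ipcs -a` samples once per second for
about a minute.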

exit with 77 when you detect problems and add the skippable restriction.

I just ran the test in qemu on ci-worker13 and it PASSed.

Paul

root@ci-worker13:~# lsipc
RESOURCE DESCRIPTION                                              LIMIT USED  USE%
MSGMNI   Number of message queues                                 32000    0 0.00%
MSGMAX   Max size of message (bytes)                                 8K    -     -
MSGMNB   Default max size of queue (bytes)                          16K    -     -
SHMMNI   Shared memory segments                                    4096    0 0.00%
SHMALL   Shared memory pages                       18446744073692774399    0 0.00%
SHMMAX   Max size of shared memory segment (bytes)                  16E    -     -
SHMMIN   Min size of shared memory segment (bytes)                   1B    -     -
SEMMNI   Number of semaphore identifiers                          32000    0 0.00%
SEMMNS   Total number of semaphores                          1024000000    0 0.00%
SEMMSL   Max semaphores per semaphore set.                        32000    -     -
SEMOPM   Max number of operations per semop(2)                      500    -     -
SEMVMX   Semaphore max value                                      32767    -     -

#1059995#30
Date:
2024-01-12 11:36:04 UTC
From:
To:
Hi,

can you confirm two additional things please:

1) this happens only on the large host?

2) this does not or does happen with other packages also requesting
the same settings from systemd, e.g. dnsdist or pdns-recursor?
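
To find other units that request the same kind of setting, one could
scan their unit files. The sketch below assumes the relevant directives
are PrivateIPC= or IPCNamespacePath=; the thread does not name the
setting, so this is an assumption. It is also a simplification: it looks
at a single file and ignores drop-ins and later overrides.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the unit file in $1 enables PrivateIPC=
# or sets IPCNamespacePath=. Which directive pdns.service actually uses
# is an assumption; check the shipped unit to confirm.
unit_requests_ipc_namespace() {
    if grep -Eiq '^[[:space:]]*PrivateIPC[[:space:]]*=[[:space:]]*(yes|true|on|1)[[:space:]]*$' "$1"; then
        return 0
    fi
    if grep -Eq '^[[:space:]]*IPCNamespacePath[[:space:]]*=[[:space:]]*[^[:space:]]' "$1"; then
        return 0
    fi
    return 1
}
```

It could be applied to the output of `systemctl cat pdns.service` saved
to a file, and likewise for pdns-recursor or dnsdist.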

Chris

#1059995#35
Date:
2024-01-12 19:02:53 UTC
From:
To:
Hi,

https://ci.debian.net/packages/p/pdns/testing/s390x/41650331/

Seems it happens on our s390x host too (which has 10 debci workers
running in parallel).

https://ci.debian.net/packages/d/dnsdist/ -> Page not found.

pdns-recursor seems to be flaky as well on amd64 and all passing tests
were on one of the smaller hosts. pdns-recursor passes on s390x though.

Paul

#1059995#40
Date:
2024-01-21 16:00:05 UTC
From:
To:
For now I've added the exit 77 hack in the pdns tests, but this is
quite unsatisfying.
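
For illustration, an exit 77 workaround of this kind could be sketched
as below. This is not the code that went into the package; the function
names are hypothetical, and the match strings are the journal messages
quoted earlier in this report.

```shell
#!/bin/sh
# Hypothetical helper: read journal text on stdin and succeed if it shows
# the namespace setup failure seen in the failing test log.
journal_shows_namespace_failure() {
    grep -Eq 'Failed to set up IPC namespacing|Failed at step NAMESPACE'
}

# Hypothetical wrapper: restart a unit (e.g. pdns.service). On failure,
# inspect its recent journal and return 77 (skip) if the namespace error
# is present, or 1 for any other failure.
restart_or_skip() {
    unit="$1"
    if systemctl restart "$unit"; then
        return 0
    fi
    if journalctl _SYSTEMD_UNIT="$unit" -n 10 --no-pager \
            | journal_shows_namespace_failure; then
        echo "IPC namespacing unavailable for $unit, skipping" >&2
        return 77
    fi
    return 1
}
```

A test would call `restart_or_skip pdns.service || exit $?` in place of
the plain restart.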

I've opened an issue with systemd upstream, maybe someone there has
any insight: https://github.com/systemd/systemd/issues/31037

Chris

#1059995#45
Date:
2024-01-21 15:58:22 UTC
From:
To:
We believe that the bug you reported is fixed in the latest version of
pdns, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1059995@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Chris Hofstaedtler <zeha@debian.org> (supplier of updated pdns package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Sun, 21 Jan 2024 12:11:54 +0100
Source: pdns
Architecture: source
Version: 4.8.3-3
Distribution: unstable
Urgency: medium
Maintainer: pdns packagers <pdns@packages.debian.org>
Changed-By: Chris Hofstaedtler <zeha@debian.org>
Closes: 1059995
Changes:
 pdns (4.8.3-3) unstable; urgency=medium
 .
   * tests: Abort if IPC namespaces do not work (Closes: #1059995)
Checksums-Sha1:
 8c2bccfdaa7d5cd7df2f56f350d4f227bba3a10a 3628 pdns_4.8.3-3.dsc
 d147b0d0266ef6d023bf57bf76b078a0244ebfba 46680 pdns_4.8.3-3.debian.tar.xz
 981ede89dd308b2365578212e54a66e8e869b6a0 23977 pdns_4.8.3-3_arm64.buildinfo
Checksums-Sha256:
 cef2b8f66c6e1d11c8c37c71b37fdd771289f163da38aefc9fa40a452b83054b 3628 pdns_4.8.3-3.dsc
 d8c886849592c63333edea3862a2ae1822b1d48d3c16df95be6882494b1b3ee9 46680 pdns_4.8.3-3.debian.tar.xz
 fb8465f38df8c52a8296e3df14f8d3baa01224cb275b1a555c211650bae06bad 23977 pdns_4.8.3-3_arm64.buildinfo
Files:
 e2d3325bc3c02f459a4c4a34c5c3eace 3628 net optional pdns_4.8.3-3.dsc
 236d73448237fe0b97442acbd08a9212 46680 net optional pdns_4.8.3-3.debian.tar.xz
 09d4ef90c99aa450830547b3c99b0c89 23977 net optional pdns_4.8.3-3_arm64.buildinfo

#1059995#50
Date:
2024-01-26 09:21:19 UTC
From:
To:
Hi Paul,

* Paul Gevers <elbrus@debian.org> [240104 18:14]:

Likely, but it is probably in systemd or in lxc or in apparmor or
elsewhere.

I see this "works", but now the tests fail after one try on the
problematic worker and then are never retried. Can this please be
fixed?

Thanks,
Chris

#1059995#55
Date:
2024-01-26 10:31:37 UTC
From:
To:
clone 1059995 -1
reopen -1
reassign -1 systemd
found -1 systemd/254.3-1
forwarded -1 https://github.com/systemd/systemd/issues/31037
thanks

Dear systemd Packagers,

Paul Gevers noted that src:pdns's autopkgtests fail every so often
on a large amd64 debci worker and on s390x workers. Apparently a
similar problem can be seen in src:pdns-recursor's debci runs.

As there is no pdns(-recursor) code running at this point, this
seems to be a problem somewhere in the space of systemd <> lxc <>
apparmor <> kernel.

I've opened a bug with systemd upstream, unfortunately with very
little info as I don't know how to provide additional info from
within a debci run. Help with providing additional info would be
very welcome.

Thanks,
Chris

#1059995#62
Date:
2024-01-26 21:25:37 UTC
From:
To:
Hi zeha,

What do you have in mind? I think you need to wait until issue 166 [1]
is fixed, which I guess isn't going to happen soon.

Paul

[1] https://salsa.debian.org/ci-team/debci/-/issues/166

#1059995#67
Date:
2024-01-30 07:04:58 UTC
From:
To:
* Paul Gevers <elbrus@debian.org> [240126 22:25]:

166 seems like an option, or auto-retry on a different worker, if
that's possible?

Chris

#1059995#72
Date:
2024-02-25 20:05:07 UTC
From:
To:
Hi,

The issue (or at least some issue) seems to be kernel related. Due to
issues with the backports kernel on arm64, we had to revert to the
bookworm kernel and now pdns fails on arm64 too. On ppc64el and riscv64
the test has passed for the last two months; both run a newer kernel
(backports or even sid). However, s390x also runs a backports kernel and
the issue still exists there.

Paul
By the way, if you want to use "exit 77" when conditions are not met,
you also need to set the skippable restriction on those tests, otherwise
the exit code is used like any other.
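
A quick way to check whether a test control file already carries that
restriction is sketched below. The helper name is hypothetical, and the
parsing is simplified: it only looks at the first line of each
Restrictions field and ignores continuation lines.

```shell
#!/bin/sh
# Hypothetical lint helper: succeed if any Restrictions: field in the
# debian/tests/control file given as $1 lists "skippable".
control_has_skippable() {
    sed -n 's/^[Rr]estrictions:[[:space:]]*//p' "$1" \
        | tr ',' ' ' \
        | tr -s '[:space:]' '\n' \
        | grep -qx 'skippable'
}
```

Example: `control_has_skippable debian/tests/control || echo "exit 77
would count as a failure"`.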

#1059995#77
Date:
2024-02-26 09:39:04 UTC
From:
To:
We believe that the bug you reported is fixed in the latest version of
pdns, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1059995@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Chris Hofstaedtler <zeha@debian.org> (supplier of updated pdns package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Mon, 26 Feb 2024 02:41:58 +0100
Source: pdns
Architecture: source
Version: 4.8.3-4
Distribution: unstable
Urgency: medium
Maintainer: pdns packagers <pdns@packages.debian.org>
Changed-By: Chris Hofstaedtler <zeha@debian.org>
Closes: 1059995
Changes:
 pdns (4.8.3-4) unstable; urgency=medium
 .
   * tests: mark tests as skippable for exit code 77 (Closes: #1059995)
Checksums-Sha1:
 e49ae0e2bd3ca08ce5ee23bc6a49237d56da608d 3628 pdns_4.8.3-4.dsc
 2ff861b5ff0f7739dbde4034599d6ea86a7d0a83 46692 pdns_4.8.3-4.debian.tar.xz
 6319296b7bcf4d9b4ee2e91057e305d6bebc3111 24501 pdns_4.8.3-4_arm64.buildinfo
Checksums-Sha256:
 fa328a3df9e85c2069b4d3b1ec39ac5cce1369ed284aa9067b73efdb108d8ed8 3628 pdns_4.8.3-4.dsc
 e7da8f9266178d78ffcf73c88f996ff5d2afcccefbb13bf7ad27356ffe3b9b31 46692 pdns_4.8.3-4.debian.tar.xz
 4ece8c33f3a8f86d8f87143bf21ab70c7a28e7d3680d702e838fb391d87b4c76 24501 pdns_4.8.3-4_arm64.buildinfo
Files:
 6fb6d41166e6081b634b8a127649de22 3628 net optional pdns_4.8.3-4.dsc
 1dea1ba3ff90f0d0f70e8c6ad77e6300 46692 net optional pdns_4.8.3-4.debian.tar.xz
 090c7f71e74e251f4ee06072169acf46 24501 net optional pdns_4.8.3-4_arm64.buildinfo