#1131809 dracut: ppc64el autopkgtest are flaky and take 7 hours per run

#1131809#5
Date:
2026-03-24 21:51:53 UTC
From:
To:
Dear Maintainer(s),

The dracut autopkgtests on ppc64el are flaky, and many runs fail:

https://ci.debian.net/packages/d/dracut/testing/ppc64el/

What's worse is that each run is ~7 hours, so it takes a full day just
for a couple of retries. This blocks other packages from migrating.

If this is not possible to fix, please consider disabling autopkgtest
on ppc64el, or marking the test suite as flaky. Thanks.

#1131809#10
Date:
2026-03-27 22:04:22 UTC
From:
To:
Hi,

I spent a lot of time addressing the flakiness of the dracut tests
(races, timeouts, etc.). The flakiness should be fixed in dracut 110-7
(there is one more timeout fix for test 41 in 110-8).

There are three test failures with 110-7:

1) 72-nbd: kernel BUG at arch/powerpc/include/asm/interrupt.h:355
2) 70-iscsi: I/O error, dev sdb, sector 69122 op 0x0:(READ)
3) 71-iscsi-multi: timed out (last log line: "Run /init as init process")

So failures 1 and 2 look kernel-related. Failure 3 does not look
like dracut is the culprit.

#1131809#15
Date:
2026-05-13 18:12:21 UTC
From:
To:
On Fri, 27 Mar 2026 23:04:22 +0100 Benjamin Drung <bdrung@debian.org> wrote:

We discussed this offline, so I'm following up here to summarise that
conversation from my point of view.

I suggested that some of the flakiness here may be the fault of QEMU
rather than anything else.  Case 1 looks like an "impossible" assertion
failure and the other 2 like flaky (emulated) storage devices.

I thought it would make sense to run a reduced test suite on
architectures where we don't expect the CI runners to have access to
KVM.  This would mitigate the slowness and (somewhat) the bugginess of
software-only QEMU.  Since we do have KVM on x86 (if I understood
correctly) the full test suite would still get run there.
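
As a rough illustration of that idea, the selection could look like the
sketch below. It is not taken from the dracut packaging: the helper names
(kvm_usable, select_tests) and the contents of the reduced set are
assumptions made for illustration. Only the test names themselves appear
in this bug report.

```python
import os

# Hypothetical reduced set for runners without KVM; the choice of tests
# here is an assumption, not something decided in this bug report.
REDUCED_TESTS = {"10-basic", "80-getarg", "81-skipcpio"}


def kvm_usable(path="/dev/kvm"):
    """Return True if the KVM device node exists and is readable and writable."""
    return os.path.exists(path) and os.access(path, os.R_OK | os.W_OK)


def select_tests(all_tests, have_kvm):
    """Return the tests to run, keeping the original order."""
    if have_kvm:
        # With hardware virtualisation, run everything.
        return list(all_tests)
    # Software-only QEMU: fall back to the (hypothetical) reduced set.
    return [t for t in all_tests if t in REDUCED_TESTS]


if __name__ == "__main__":
    tests = ["10-basic", "20-storage", "70-iscsi", "72-nbd", "80-getarg"]
    print(select_tests(tests, kvm_usable()))
```

Checking the device node rather than the architecture name would let the
full suite run on any runner that does have KVM, whatever its
architecture.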

Ben.

#1131809#20
Date:
2026-06-27 09:58:02 UTC
From:
To:
Hi,

I just had a look at the loong64 results. I haven't spotted a successful
run yet; most runs take more than 8 hours and hit the global
autopkgtest timeout of 100000 seconds.

For now, I've added dracut/loong64 to the reject_list to save resources
(the tests get triggered often).

Paul

E.g.
https://ci.debian.net/packages/d/dracut/testing/loong64/72513201/

31849s autopkgtest [07:41:46]: @@@@@@@@@@@@@@@@@@@@ summary
31849s 12-uefi              SKIP Test lists explicitly supported architectures, but the current architecture loong64 isn't listed.
31849s hint-testsuite-triggers SKIP unknown restriction hint-testsuite-triggers
31849s 12-uefi              SKIP Test lists explicitly supported architectures, but the current architecture loong64 isn't listed.
31849s hint-testsuite-triggers SKIP unknown restriction hint-testsuite-triggers
31849s lsinitrd             PASS (superficial)
31849s 10-basic             FAIL non-zero exit status 124
31849s 13-sysroot           FAIL non-zero exit status 124
31849s 14-hooks             FAIL non-zero exit status 124
31849s 40-systemd           FAIL non-zero exit status 124
31849s 42-systemd-initrd    FAIL non-zero exit status 124
31849s 43-kernel-install    FAIL non-zero exit status 124
31849s 80-getarg            PASS
31849s 81-skipcpio          PASS
31849s 11-usr-mount         FAIL non-zero exit status 124
31849s 20-storage           FAIL non-zero exit status 124
31849s 26-enc-raid-lvm      FAIL non-zero exit status 124
31849s 21-overlayfs         FAIL non-zero exit status 124
31849s 30-dmsquash          FAIL non-zero exit status 124
31849s 31-livenet           FAIL non-zero exit status 124
31849s 41-full-systemd      FAIL non-zero exit status 124
31849s 44-drivers           FAIL non-zero exit status 124
31849s 45-systemd-import    FAIL non-zero exit status 124
31849s 46-systemd-sysext    FAIL non-zero exit status 124
31849s 50-network           FAIL non-zero exit status 124
31849s 60-nfs               FAIL timed out
31849s 70-iscsi             SKIP global timeout exceeded
31849s 71-iscsi-multi       SKIP global timeout exceeded
31849s 72-nbd               SKIP global timeout exceeded
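
A summary like the one above can be tallied with a small script. This is
only a sketch: the regular expression is an assumption based on the lines
quoted here, not on a documented autopkgtest log format. Exit status 124
is what GNU timeout returns when the command it runs times out, so the
sketch treats those failures as probable timeouts. That is an
interpretation, not something the log states directly.

```python
import re
from collections import Counter

# Matches lines such as "31849s 10-basic   FAIL non-zero exit status 124".
# The pattern is inferred from the quoted summary and is an assumption.
SUMMARY_RE = re.compile(r"^\d+s\s+(\S+)\s+(PASS|FAIL|SKIP|FLAKY)\b\s*(.*)$")


def parse_summary(lines):
    """Return a list of (test, status, detail) tuples from summary lines."""
    results = []
    for line in lines:
        match = SUMMARY_RE.match(line.strip())
        if match:
            results.append((match.group(1), match.group(2), match.group(3)))
    return results


def count_by_status(results):
    """Count how many tests ended in each status."""
    return Counter(status for _, status, _ in results)


def probable_timeouts(results):
    """Return failed tests whose detail suggests a timeout."""
    return [
        test
        for test, status, detail in results
        if status == "FAIL"
        and (detail.endswith("exit status 124") or detail == "timed out")
    ]


if __name__ == "__main__":
    sample = [
        "31849s lsinitrd             PASS (superficial)",
        "31849s 10-basic             FAIL non-zero exit status 124",
        "31849s 60-nfs               FAIL timed out",
        "31849s 70-iscsi             SKIP global timeout exceeded",
    ]
    parsed = parse_summary(sample)
    print(count_by_status(parsed))
    print(probable_timeouts(parsed))
```

Lines that do not start with a timestamp and a test name, such as wrapped
continuation lines, are skipped by the parser.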

#1131809#25
Date:
2026-06-29 11:12:25 UTC
From:
To:
The long execution time is caused by the individual tests running into
the test-configured timeout. I just uploaded dracut 110-4, which contains
a patch to use edk2 firmware for loong64 tests. This should stop the
tests from running into the timeouts.