#1020500 glibc: flaky autopkgtest on armel: multiple different failures

#1020500#5
Date:
2022-09-22 09:19:57 UTC
From:
To:
Dear maintainer(s),

I looked at the results of the autopkgtest of your package. I noticed
that it regularly fails on armel while testing if other packages can
migrate. A retry (or retry of retry) passes, so it doesn't seem related
to those packages.

Because the unstable-to-testing migration software now blocks on
regressions in testing, flaky tests, i.e. tests that flip between
passing and failing without changes to the list of installed packages,
are causing people unrelated to your package to spend time on these
tests. I now looked at it because both gcc-11 and gcc-12 showed up as
regressing the glibc autopkgtest.

Don't hesitate to reach out if you need help or more information
from our infrastructure.

Paul

https://ci.debian.net/packages/g/glibc/testing/armel/

https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/23501044/log.gz


https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26322757/log.gz

nptl/tst-rwlock9
[...]
Timed out: killed the child process
Termination time: 2022-09-22T07:41:04.502168635
Last write to standard output: 2022-09-22T07:28:34.991525943


https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26218800/log.gz
https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26223226/log.gz
https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26322746/log.gz
----------
FAIL: rt/tst-cpuclock2-time64
original exit status 1
live thread clock ffb6e90e resolution 0.000000001
live thread before sleep => 0.000254800
self thread before sleep => 0.000728320
live thread after sleep => 0.473986200
self thread after sleep => 0.001080840
clock_nanosleep on process slept 97739240 (outside reasonable range)
----------


https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/25779292/log.gz

/bin/bash testdata/gen-XT5.sh >
/tmp/autopkgtest-lxc.pjd0aipn/downtmp/build.Ui1/src/build-tree/armel-libc/timezone/testdata/XT5.tmp
/bin/bash: line 1:
/tmp/autopkgtest-lxc.pjd0aipn/downtmp/build.Ui1/src/build-tree/armel-libc/timezone/testdata/XT5.tmp:
No such file or directory

#1020500#10
Date:
2022-10-07 18:55:58 UTC
From:
To:
Hi,

Please find my answers (and questions) for each test below.

I have not been able to reproduce this bug after 1M tests on
amdahl.d.o, an RPI3 (running an arm64 kernel) and a STM32MP1 board
(armhf). Would it be possible to give more details, like any
corresponding dmesg entry, to have a better idea of the issue?

[...] 1/2500 on average. I have tracked it down to this bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=24774

It appears to be fixed by this patch that didn't seem to attract a lot
of interest:
https://sourceware.org/pipermail/libc-alpha/2021-September/131546.html

I just reviewed and tested it, so let's see if it gets merged soon:
https://sourceware.org/pipermail/libc-alpha/2021-September/131546.html

I also can't reproduce this one after 100000 tests on amdahl.d.o, an
RPI3 (running an arm64 kernel) and a STM32MP1 board (armhf). According
to upstream, this test is known to fail on heavily loaded hosts
as it relies on wall time. Is that the case for the debci workers? Do
they have dedicated CPUs to run their tests? Are the armel workers
different from the others?

Nevertheless, the part of the test that relies on wall time has been
removed upstream, so this should be considered fixed in glibc
2.35, which is now in testing:
https://sourceware.org/git/?p=glibc.git;a=commit;h=f3c6c190388bb445568cfbf190a0942fc3c28553
https://sourceware.org/git/?p=glibc.git;a=commit;h=62db87ab24f9ca483f97f5e52ea92445f6a63c6f
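
As a rough illustration of why a wall-time check is fragile on a loaded
host, here is a minimal Python sketch. It is not the glibc test, and the
helper name measure_sleep is made up for this example. The wall-clock
duration of a sleep has no upper bound when the scheduler is busy, while
the CPU time consumed by the sleeping process stays small:

```python
import time


def measure_sleep(requested_s):
    """Sleep and return (wall-clock seconds elapsed, CPU seconds consumed)."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    time.sleep(requested_s)
    wall = time.monotonic() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    requested = 0.1
    wall, cpu = measure_sleep(requested)
    # The overshoot depends on how busy the host is, so any fixed
    # tolerance on it can be exceeded when many jobs share the CPUs.
    print(f"requested={requested:.3f}s wall={wall:.3f}s "
          f"overshoot={wall - requested:.3f}s cpu={cpu:.6f}s")
```

On an idle machine the overshoot is usually tiny; on a worker shared
with many parallel jobs it can grow large enough to trip a fixed
tolerance.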

Regards
Aurelien

#1020500#15
Date:
2022-10-07 19:14:49 UTC
From:
To:
Hi Aurelien,

Thanks for your thorough testing.

First off, we have recently changed our setup for armel and armhf
testing. The real host is the same, but instead of one VM for armel
where we ran 10 debci workers in parallel, we now have smaller VMs with
only 4 parallel debci workers per VM. Maybe this changes some of the
metrics.

I'll try to have a look if I spot this again. The original dmesg is gone
by now.

Yes, and as mentioned above we changed it too. But as said, we ran a lot
of parallel workers, so they could be heavily loaded. We also have an
amd64 host that runs lots of parallel workers, and so does s390x, but
maybe they are a bit better specced than the armel VM was.

That's good to hear.

So, let's see over the coming time whether things have changed (hopefully for the better).

Paul

#1020500#20
Date:
2022-10-11 21:48:15 UTC
From:
To:
Hi Paul,

A small update on this bug. Now that glibc 2.35-3 has migrated to testing,
the only unsolved issue is this one:

Cheers
Aurelien

#1020500#25
Date:
2025-01-11 08:51:44 UTC
From:
To:
Hi Aurelien,

On Tue, 11 Oct 2022 23:48:15 +0200 Aurelien Jarno <aurelien@aurel32.net> wrote:
armel looks much better now (maybe acceptable). This is the one I see now:
https://ci.debian.net/packages/g/glibc/testing/armel/55087086/

3777s FAIL: posix/tst-waitid
3777s original exit status 1
3777s tst-waitid.c:73: numeric comparison failure
3777s    left: 0 (0x0); from: siginfo.si_status
3777s   right: 19 (0x13); from: status
3777s error: 1 test failures
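
For context, the failed comparison expected si_status to be 19, which is
SIGSTOP on Linux for armel and amd64, so the test presumably checks the
status reported for a stopped child. A minimal Python sketch of that
waitid() pattern follows. It only illustrates the system call and does
not reproduce the logic of tst-waitid.c; the helper name is made up.

```python
import os
import signal


def stop_child_and_wait():
    """Fork a child that stops itself; return waitid() info for the stop."""
    pid = os.fork()
    if pid == 0:
        # Child: stop ourselves, then exit once the parent continues us.
        os.kill(os.getpid(), signal.SIGSTOP)
        os._exit(0)
    # Parent: block until the child has entered the stopped state.
    info = os.waitid(os.P_PID, pid, os.WSTOPPED)
    # Resume the child and reap it so that no zombie is left behind.
    os.kill(pid, signal.SIGCONT)
    os.waitid(os.P_PID, pid, os.WEXITED)
    return info


if __name__ == "__main__":
    info = stop_child_and_wait()
    # For a stopped child, si_status is expected to hold the stop signal.
    print("si_status:", info.si_status, "SIGSTOP:", int(signal.SIGSTOP))
```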


But amd64 also has one (although the failure rate is somewhat acceptable):
https://ci.debian.net/packages/g/glibc/testing/amd64/55596911/

2980s check for check_libc failed

Paul

#1020500#30
Date:
2025-02-08 14:22:34 UTC
From:
To:
Hi Paul,

I have been able to reproduce it, also on other architectures, but only
on a heavily loaded system, which I believe is the case for the debci
runners. In my tests, this happens around once every 8000 runs under
such load. I'll try to debug that more.
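
To put such a rate in perspective, here is a small Python sketch of the
arithmetic. It assumes that runs fail independently with a fixed
probability, which is a simplification since the real rate depends on
load. The function names are made up for this example.

```python
import math


def p_at_least_one_failure(rate, runs):
    """Probability of seeing one or more failures in `runs` independent runs."""
    return 1.0 - (1.0 - rate) ** runs


def runs_needed(rate, confidence):
    """Smallest number of runs that shows a failure with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


if __name__ == "__main__":
    rate = 1 / 8000  # failure rate observed on a heavily loaded system
    print(f"P(failure within 8000 runs) = {p_at_least_one_failure(rate, 8000):.3f}")
    print(f"runs for 95% confidence     = {runs_needed(rate, 0.95)}")
```

Under that assumption, 8000 runs catch the failure only about 63% of the
time, and roughly three times as many runs are needed to reach 95%.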

Unfortunately the middle of the log is missing (the failures are
reported per flavour), so it is difficult to know which test fails. I
have pushed a patch to repeat a summary of the failures of each flavour
at the end of the test log; it will be in the next glibc upload.
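
Until then, the failures that are still visible in a log can be
collected with something like the following Python sketch. This is not
the patch mentioned above, only an illustration that assumes the
"FAIL: <test>" line format and the optional "NNNNs" timestamp prefix
seen in the excerpts quoted in this bug; the function name is made up.

```python
import re

# Matches e.g. "3777s FAIL: posix/tst-waitid" or "FAIL: rt/tst-cpuclock2-time64".
FAIL_RE = re.compile(r"^(?:\d+s\s+)?FAIL:\s+(\S+)")


def failed_tests(log_text):
    """Return the names of failed tests in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = FAIL_RE.match(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen


if __name__ == "__main__":
    sample = (
        "3777s FAIL: posix/tst-waitid\n"
        "3777s original exit status 1\n"
        "FAIL: rt/tst-cpuclock2-time64\n"
    )
    for name in failed_tests(sample):
        print(name)
```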

Regards
Aurelien