Dear maintainer(s),

I looked at the results of the autopkgtest of your package. I noticed that it regularly fails on armel while testing if other packages can migrate. A retry (or retry of retry) passes, so it doesn't seem related to those packages.

Because the unstable-to-testing migration software now blocks on regressions in testing, flaky tests, i.e. tests that flip between passing and failing without changes to the list of installed packages, are causing people unrelated to your package to spend time on these tests. I now looked at it because both gcc-11 and gcc-12 showed up as regressing the glibc autopkgtest.

Don't hesitate to reach out if you need help and some more information from our infrastructure.

Paul

https://ci.debian.net/packages/g/glibc/testing/armel/

https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/23501044/log.gz
https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26322757/log.gz

nptl/tst-rwlock9
[...]
Timed out: killed the child process
Termination time: 2022-09-22T07:41:04.502168635
Last write to standard output: 2022-09-22T07:28:34.991525943

https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26218800/log.gz
https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26223226/log.gz
https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/26322746/log.gz

----------
FAIL: rt/tst-cpuclock2-time64
original exit status 1
live thread clock ffb6e90e resolution 0.000000001
live thread before sleep => 0.000254800
self thread before sleep => 0.000728320
live thread after sleep => 0.473986200
self thread after sleep => 0.001080840
clock_nanosleep on process slept 97739240 (outside reasonable range)
----------

https://ci.debian.net/data/autopkgtest/testing/armel/g/glibc/25779292/log.gz

/bin/bash testdata/gen-XT5.sh > /tmp/autopkgtest-lxc.pjd0aipn/downtmp/build.Ui1/src/build-tree/armel-libc/timezone/testdata/XT5.tmp
/bin/bash: line 1: /tmp/autopkgtest-lxc.pjd0aipn/downtmp/build.Ui1/src/build-tree/armel-libc/timezone/testdata/XT5.tmp: No such file or directory
Hi,

Please find my answers (and questions) for each test below.

I have not been able to reproduce this bug after 1M tests on amdahl.d.o, an RPI3 (running an arm64 kernel) and a STM32MP1 board (armhf). Would it be possible to give more details, like any corresponding dmesg entry, to have a better idea of the issue?

1/2500 on average. I have tracked it down to this bug:
https://sourceware.org/bugzilla/show_bug.cgi?id=24774

It appears to be fixed by this patch, which didn't seem to attract a lot of interest:
https://sourceware.org/pipermail/libc-alpha/2021-September/131546.html

I just reviewed and tested it, so let's see if it gets merged soon:
https://sourceware.org/pipermail/libc-alpha/2021-September/131546.html

I also can't reproduce this one after 100000 tests on amdahl.d.o, an RPI3 (running an arm64 kernel) and a STM32MP1 board (armhf). According to upstream, this test is known to fail on heavily loaded hosts as it relies on wall time. Is that the case for the debci workers? Do they have dedicated CPUs to run their tests? Are the armel workers different from the others?

Nevertheless, the part of the test that relies on wall time has been removed upstream, so this should be considered fixed in glibc 2.35, which is now in testing:
https://sourceware.org/git/?p=glibc.git;a=commit;h=f3c6c190388bb445568cfbf190a0942fc3c28553
https://sourceware.org/git/?p=glibc.git;a=commit;h=62db87ab24f9ca483f97f5e52ea92445f6a63c6f

Regards
Aurelien
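For anyone who wants to estimate a failure rate such as the 1/2500 mentioned above, a reproduction loop can be sketched as follows. This is only a minimal illustration, not the procedure actually used for the tests in this thread: the function name `count_failures` and its parameters are hypothetical, and the command to run is whatever test binary you have built locally.

```python
import subprocess


def count_failures(cmd, runs, timeout=None):
    """Run `cmd` `runs` times and return how many runs failed.

    A run counts as a failure if it exits with a non-zero status or
    does not finish within `timeout` seconds (hypothetical helper).
    """
    failures = 0
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
            if result.returncode != 0:
                failures += 1
        except subprocess.TimeoutExpired:
            # A hung test is counted the same way as a failed one.
            failures += 1
    return failures
```

Calling `count_failures(["./some-test"], 100000, timeout=60)` (with a placeholder path) returns the number of failed runs. Since the failures discussed here seem to depend on machine load, a run on an idle host may well report zero even when the test is flaky on the CI workers.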
Hi Aurelien,

Thanks for your thorough testing.

First off, we have recently changed our setup for armel and armhf testing. The real host is the same, but instead of one VM for armel where we ran 10 debci workers in parallel, we now have smaller VMs with only 4 parallel debci workers per VM. Maybe this changes some of the metrics.

I'll try to have a look if I spot this again. The original dmesg is gone by now.

Yes, and as mentioned above we changed it too. But as said, we ran a lot of parallel workers, so they could be heavily loaded. We also have an amd64 host that runs lots of parallel workers, and so does s390x, but maybe they are a bit better spec-ed than the armel VM was.

That's good to hear.

So, let's see over the coming time whether things have changed (hopefully for the better).

Paul
Hi Paul,

A small update on this bug. Now that glibc 2.35-3 migrated to testing, the only unsolved issue is that one:

Cheers
Aurelien
Hi Aurelien,

On Tue, 11 Oct 2022 23:48:15 +0200 Aurelien Jarno <aurelien@aurel32.net> wrote:

armel looks much better now (maybe acceptable). This is the one I see now:
https://ci.debian.net/packages/g/glibc/testing/armel/55087086/

3777s FAIL: posix/tst-waitid
3777s original exit status 1
3777s tst-waitid.c:73: numeric comparison failure
3777s left: 0 (0x0); from: siginfo.si_status
3777s right: 19 (0x13); from: status
3777s error: 1 test failures

But amd64 also has one (although the failure rate is somewhat acceptable):
https://ci.debian.net/packages/g/glibc/testing/amd64/55596911/

2980s check for check_libc failed

Paul
Hi Paul,

I have been able to reproduce it, also on other architectures, but only on a heavily loaded system, which I believe is the case for the debci runners. In my tests, this happens around once every 8000 runs on a heavily loaded system. I'll try to debug that more.

Unfortunately the middle of the log is missing (the failures are reported per flavour), so it is difficult to know which test fails. I have pushed a patch to repeat a summary of the failures of each flavour at the end of the test log; it will be in the next glibc upload.

Regards
Aurelien
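Until that upload is available, the failing tests can be pulled out of a complete log by hand. The sketch below is not the patch mentioned above, just a rough illustration. It assumes the log uses `FAIL: <test name>` lines, optionally prefixed by an elapsed-time stamp such as `3777s `, as in the excerpts quoted earlier in this thread; the function name `failed_tests` is hypothetical.

```python
import re

# Matches lines such as "FAIL: rt/tst-cpuclock2-time64" and
# "3777s FAIL: posix/tst-waitid" (the elapsed-time prefix is optional).
FAIL_LINE = re.compile(r"^(?:\d+s )?FAIL: (\S+)")


def failed_tests(log_text):
    """Return the names of failed tests, in order of first appearance."""
    seen = []
    for line in log_text.splitlines():
        match = FAIL_LINE.match(line)
        # Skip duplicates so a test failing in several flavours is listed once.
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

This only helps when the relevant part of the log is present. It cannot recover failures from a log whose middle is missing, which is the problem the summary at the end of the log is meant to solve.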