Hi, this bug tracks progress with regards to occassional FTBFS (about 5-10% of all buildd builds across different architectures) due to a hanging test suite. They're difficult to reproduce outside the buildds, even with pbuilder. I ran the test suite 200 times in a row inside a non-interactive pbuilder session and was not able to trigger the issue. Related postings on the upstream mailing list (which is upstream's bug tracker): http://www.zsh.org/mla/workers/2014/msg00789.html ff. http://www.zsh.org/mla/workers/2014/msg00847.html http://www.zsh.org/mla/workers/2014/msg00851.html http://www.zsh.org/mla/workers/2014/msg00863.html (first case reproduced by upstream) Builds where later builds succeeded again, either giveback or another upload: https://buildd.debian.org/status/logs.php?pkg=zsh&arch=amd64 https://buildd.debian.org/status/logs.php?pkg=zsh&arch=kfreebsd-amd64 There's a small possibility that former test suite hangs on hurd-i386 are a similar issue. But they happened so often, that we completely disabled the test suite on hurd-i386 for now, as it's no release architecture: https://buildd.debian.org/status/logs.php?pkg=zsh&arch=hurd-i386 The recent (but workarounded) FTBFS on s390x (https://buildd.debian.org/status/logs.php?pkg=zsh&arch=s390x) are a different issue and are tracked in this gcc-4.9 bug report: https://bugs.debian.org/759870
Hi, Axel Beckert wrote: JFTR: armhf today is another such case, it built successfully after one giveback (Thanks Ivo!): https://buildd.debian.org/status/logs.php?pkg=zsh&arch=armhf Regards, Axel
Hi Axel Beckert wrote: Plus the one on armhf. (Ivo was so kind to do the giveback.) There's now a tracking bug for this issue: https://bugs.debian.org/760061 (Cc'ed) In the meanwhile the mipsel failed with a hanging test suite, too. So please giveback zsh on mipsel, too: gb zsh_5.0.6-1 . mipsel Regards, Axel
Hi again, Axel Beckert wrote: Then again, this issue comes up more often on kfreebsd-amd64 (4x over the past 3 uploads and 2 givebacks) than on any other architecture (maximum once per architecture so far over the past 3 uploads). Raising the severity because it surely can't go on like this forever, especially not with the current failure rate on kfreebsd-amd64. Investigation so far suggests that not the tests itself hang, but the script parsing and executing them. It'd be nice if kfreebsd-amd64 could be given back a second time, in addition to mipsel: gb zsh_5.0.6-1 . mipsel kfreebsd-amd64 Regards, Axel
Hi we currently experience occassional hangs of the script which executes the zsh test suite. It happens across all architectures and approximately in 5-10% of all builds on the buildds -- but nearly never outside (even with pbuilder) which makes it very difficult to debug. Upstream is aware of it. See https://bugs.debian.org/760061 (Cc'ed) for details. The previous 5.0.6-1 upload was affected on three architectures, 5.0.6-2 only on a single one so far (armhf; arm64 still being in needs-build state). So please giveback zsh on armhf: gb zsh_5.0.6-2 . armhf TIA! Regards, Axel
Hi again, Thanks for the confirmation that this is likely not debian-specific. I though still don't understand why this happens way more often on the buildds than outside of the buildds. Since we ran into the issue again with the upload one week ago, and we have to do another upload again before the upcoming freeze for Debian's next stable release, I did another run of building the zsh package again and again (without connected terminal by using "ssh localhost" without "-t" to get closer to the buildd environment) until I run into the issue. And this time I had some luck after something like 70 to 100 successful package builds in one row, but interestingly, with different results as Bart: ~ # ps auxwwwf | fgrep -B10 -A2 Src/zsh abe 18988 0.0 0.1 41920 9216 ? SNs 01:53 0:00 \_ /usr/bin/perl /usr/bin/debuild -uc -us -B -j5 abe 19029 0.0 0.0 8964 1388 ? SN 01:53 0:00 \_ tee ../zsh_5.0.6-2_amd64.build abe 19030 0.0 0.1 41788 11232 ? SN 01:53 0:00 \_ /usr/bin/perl /usr/bin/dpkg-buildpackage -rfakeroot -D -us -uc -i -j3 -B -j5 abe 13060 0.0 0.0 22632 3220 ? SN 01:54 0:00 \_ /bin/bash /usr/bin/fakeroot debian/rules binary-arch abe 13075 0.0 0.0 13444 2424 ? SN 01:54 0:00 \_ /usr/bin/make -f debian/rules binary-arch abe 22865 0.0 0.0 8476 808 ? SN 01:55 0:00 \_ /bin/sh -c HOME="/home/abe/zsh/zsh/obj-static/testhome" dh_auto_test -B obj-static --parallel || true abe 22866 0.0 0.1 47996 13312 ? SN 01:55 0:00 \_ /usr/bin/perl -w /usr/bin/dh_auto_test -B obj-static --parallel abe 22875 0.0 0.0 13444 2464 ? SN 01:55 0:00 \_ make -j5 test abe 22876 0.0 0.0 8476 828 ? SN 01:55 0:00 \_ /bin/sh -c cd Test ; make check abe 22877 0.0 0.0 13444 2284 ? SN 01:55 0:00 \_ make check abe 22879 0.0 0.0 8476 924 ? SN 01:55 0:00 \_ /bin/sh -c if ZTST_testlist="`for f in ../../Test/*.ztst; \ do echo $f; done`" \ ZTST_srcdir="../../Test" \ ZTST_exe=../Src/zsh \ ../Src/zsh +Z -f ../../Test/runtests.zsh; then \ stat=0; \ else \ stat=1; \ fi; \ sleep 1; \ rm -rf Modules .zcompdump; \ exit $stat abe 22881 0.0 0.0 7428 2120 ? SN 01:55 0:00 \_ ../Src/zsh +Z -f ../../Test/runtests.zsh abe 23664 0.0 0.0 8084 2812 ? SN 01:55 0:00 \_ ../Src/zsh +Z -f ../../Test/ztst.zsh ../../Test/A05execution.ztst This means the issue occurred while running the test suite against the static build of the test suite. I'll tomorrow check the other failed builds if that was the case there, too. If so, I'll disable the test suite for the static build completely and hope that's the fix for now. (We currently ignore the result for the static build anyways since C02cond.ztst always fails there. See some other mail from me several months if not a year ago.) So what's different to Bart's case are the backtraces, latest children first: ~ # gdb -p 23664 GNU gdb (Debian 7.7.1+dfsg-3) 7.7.1 [...] Attaching to process 23664 Reading symbols from /home/abe/zsh/zsh/obj-static/Src/zsh...done. 0x000000000054f0eb in __lll_lock_wait_private () (gdb) bt #0 0x000000000054f0eb in __lll_lock_wait_private () #1 0x000000000050d551 in _L_lock_3682 () #2 0x0000000000508b1b in _int_free () #3 0x00000000004349cd in freejob (jn=jn@entry=0x131e430, deleting=deleting@entry=1) at ../../Src/jobs.c:1240 #4 0x0000000000433578 in deletejob (jn=jn@entry=0x131e430, disowning=disowning@entry=0) at ../../Src/jobs.c:1295 #5 0x00000000004339ba in printjob (jn=jn@entry=0x131e430, lng=0, synch=synch@entry=0) at ../../Src/jobs.c:1146 #6 0x00000000004348d0 in update_job (jn=0x131e430) at ../../Src/jobs.c:565 #7 0x00000000004636e1 in wait_for_processes () at ../../Src/signals.c:515 #8 0x0000000000463df5 in zhandler (sig=17) at ../../Src/signals.c:594 #9 <signal handler called> #10 0x0000000000509038 in _int_free () #11 0x0000000000417f92 in execshfunc (shf=0x1374c10, args=0x7f847b7567c8) at ../../Src/exec.c:4434 #12 0x00000000004184b0 in execshfunc (args=0x7f847b7567c8, shf=0x1374c10) at ../../Src/exec.c:4310 #13 execfuncdef (state=0x7fff6eb724f0, do_exec=<optimized out>) at ../../Src/exec.c:4319 #14 0x0000000000415cdd in execsimple (state=state@entry=0x7fff6eb724f0) at ../../Src/exec.c:1120 #15 0x000000000041cfb0 in execlist (state=state@entry=0x7fff6eb724f0, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1259 #16 0x000000000041d540 in execode (p=p@entry=0x1339300, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0, context=context@entry=0x59b98b "shfunc") at ../../Src/exec.c:1070 #17 0x0000000000417610 in runshfunc (prog=0x1339300, wrap=0x0, name=0x7f847b75b5e0 "ZTST_execchunk") at ../../Src/exec.c:4895 #18 0x0000000000417bd6 in doshfunc (shfunc=shfunc@entry=0x13390d0, doshargs=doshargs@entry=0x7f847b75b498, noreturnval=noreturnval@entry=0) at ../../Src/exec.c:4775 #19 0x0000000000417f80 in execshfunc (shf=0x13390d0, args=0x7f847b75b498) at ../../Src/exec.c:4432 #20 0x000000000041b7d9 in execshfunc (args=<optimized out>, shf=<optimized out>) at ../../Src/exec.c:4400 #21 execcmd (state=0x7fff6eb73720, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3269 #22 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb73720, pcode=pcode@entry=5507, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #23 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb73720, slcode=<optimized out>, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #24 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb73720, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #25 0x000000000043d471 in execif (state=0x7fff6eb73720, do_exec=0) at ../../Src/loop.c:525 #26 0x000000000041ac06 in execcmd (state=0x7fff6eb73720, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3232 #27 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb73720, pcode=pcode@entry=5123, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #28 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb73720, slcode=<optimized out>, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #29 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb73720, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #30 0x000000000043d0bd in execwhile (state=0x7fff6eb73720, do_exec=<optimized out>) at ../../Src/loop.c:420 #31 0x000000000041ac06 in execcmd (state=0x7fff6eb73720, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3232 #32 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb73720, pcode=pcode@entry=323, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #33 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb73720, slcode=<optimized out>, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #34 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb73720, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #35 0x000000000041d540 in execode (p=p@entry=0x1339440, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0, context=context@entry=0x59b98b "shfunc") at ../../Src/exec.c:1070 #36 0x0000000000417610 in runshfunc (prog=0x1339440, wrap=0x0, name=0x7f847b75a910 "ZTST_test") at ../../Src/exec.c:4895 #37 0x0000000000417bd6 in doshfunc (shfunc=shfunc@entry=0x1338820, doshargs=doshargs@entry=0x7f847b75a8b0, noreturnval=noreturnval@entry=0) at ../../Src/exec.c:4775 #38 0x0000000000417f80 in execshfunc (shf=0x1338820, args=0x7f847b75a8b0) at ../../Src/exec.c:4432 #39 0x000000000041b7d9 in execshfunc (args=<optimized out>, shf=<optimized out>) at ../../Src/exec.c:4400 #40 execcmd (state=0x7fff6eb74980, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=2, last1=0) at ../../Src/exec.c:3269 #41 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb74980, pcode=pcode@entry=32515, how=how@entry=2, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #42 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb74980, slcode=<optimized out>, how=how@entry=2, last1=0) at ../../Src/exec.c:1478 #43 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb74980, dont_change_job=dont_change_job@entry=1, exiting=0) at ../../Src/exec.c:1261 #44 0x000000000043d5e7 in execcase (state=0x7fff6eb74980, do_exec=0) at ../../Src/loop.c:603 #45 0x000000000041ac06 in execcmd (state=0x7fff6eb74980, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=18, last1=0) at ../../Src/exec.c:3232 #46 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb74980, pcode=pcode@entry=31491, how=how@entry=18, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #47 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb74980, slcode=<optimized out>, how=how@entry=18, last1=0) at ../../Src/exec.c:1478 #48 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb74980, dont_change_job=dont_change_job@entry=1, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #49 0x000000000043d0bd in execwhile (state=0x7fff6eb74980, do_exec=<optimized out>) at ../../Src/loop.c:420 #50 0x000000000041ac06 in execcmd (state=0x7fff6eb74980, input=input@entry=0, output=20401232, output@entry=0, how=20401232, how@entry=18, last1=0) at ../../Src/exec.c:3232 #51 0x000000000041bcfe in execpline2 (state=state@entry=0x7fff6eb74980, pcode=pcode@entry=31427, how=how@entry=18, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #52 0x000000000041c0c6 in execpline (state=state@entry=0x7fff6eb74980, slcode=<optimized out>, how=how@entry=18, last1=0) at ../../Src/exec.c:1478 #53 0x000000000041d271 in execlist (state=state@entry=0x7fff6eb74980, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #54 0x000000000041d540 in execode (p=p@entry=0x7f847b75a230, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0, context=context@entry=0x59cf89 "toplevel") at ../../Src/exec.c:1070 #55 0x000000000042e00c in loop (toplevel=toplevel@entry=1, justonce=justonce@entry=0) at ../../Src/init.c:185 #56 0x00000000004313b6 in zsh_main (argc=<optimized out>, argv=<optimized out>) at ../../Src/init.c:1625 #57 0x00000000004f1840 in __libc_start_main () #58 0x0000000000401c87 in _start () ~ # gdb -p 22881 GNU gdb (Debian 7.7.1+dfsg-3) 7.7.1 [...] Attaching to process 22881 Reading symbols from /home/abe/zsh/zsh/obj-static/Src/zsh...done. 0x00000000004f8746 in sigsuspend () (gdb) bt #0 0x00000000004f8746 in sigsuspend () #1 0x0000000000463296 in signal_suspend (sig=sig@entry=17, wait_cmd=wait_cmd@entry=0) at ../../Src/signals.c:375 #2 0x0000000000434e46 in zwaitjob (job=<optimized out>, wait_cmd=0) at ../../Src/jobs.c:1454 #3 0x0000000000435527 in waitjobs () at ../../Src/jobs.c:1499 #4 0x000000000041c6eb in execpline (state=state@entry=0x7fffd409efc0, slcode=<optimized out>, how=<optimized out>, how@entry=2, last1=0) at ../../Src/exec.c:1554 #5 0x000000000041d271 in execlist (state=state@entry=0x7fffd409efc0, dont_change_job=dont_change_job@entry=1, exiting=0) at ../../Src/exec.c:1261 #6 0x000000000043c581 in execfor (state=0x7fffd409efc0, do_exec=0) at ../../Src/loop.c:164 #7 0x000000000041ac06 in execcmd (state=0x7fffd409efc0, input=8, input@entry=0, output=output@entry=0, how=-1, how@entry=18, last1=368) at ../../Src/exec.c:3232 #8 0x000000000041bcfe in execpline2 (state=state@entry=0x7fffd409efc0, pcode=pcode@entry=771, how=how@entry=18, input=0, output=0, last1=last1@entry=0) at ../../Src/exec.c:1691 #9 0x000000000041c0c6 in execpline (state=state@entry=0x7fffd409efc0, slcode=<optimized out>, how=how@entry=18, last1=0) at ../../Src/exec.c:1478 #10 0x000000000041d271 in execlist (state=state@entry=0x7fffd409efc0, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0) at ../../Src/exec.c:1261 #11 0x000000000041d540 in execode (p=p@entry=0x7fbe3410a948, dont_change_job=dont_change_job@entry=0, exiting=exiting@entry=0, context=context@entry=0x59cf89 "toplevel") at ../../Src/exec.c:1070 #12 0x000000000042e00c in loop (toplevel=toplevel@entry=1, justonce=justonce@entry=0) at ../../Src/init.c:185 #13 0x00000000004313b6 in zsh_main (argc=<optimized out>, argv=<optimized out>) at ../../Src/init.c:1625 #14 0x00000000004f1840 in __libc_start_main () #15 0x0000000000401c87 in _start () Hope this helps! For now the process is still there and I can still do some more investigation -- if it survives my Thinkpad going to Suspend-to-RAM when I go to work in a few hours. Kind regards, Axel
Hi again, Axel Beckert wrote: I was too curious, so I checked all buildd logs with hanging test suites immediately after my last mail. We seem to have either a bunch of different cases or (IMHO more likely) rather non-deterministic places where in the test-suite it happens (which again would be more in line with Bart's findings a month ago). 1) Hangs in the static build at ../../Test/A04redirect.ztst: starting. https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=armhf&ver=5.0.6-1&stamp=1409479814 https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=armhf&ver=5.0.6-2&stamp=1411273633 2) Hangs in the static build at This test takes 5 seconds to fail... https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=kfreebsd-amd64&ver=5.0.6-1&stamp=1409578643 This is the type I have currently here for debugging. It's on amd64 Linux, i.e. not Debian GNU/kFreeBSD on amd64 as the log mentioned above. 3) Hangs in the static build at ../../Test/A05execution.ztst: starting. https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=mipsel&ver=5.0.6-1&stamp=1409529387 4) Hangs in the non-static build at ../../Test/X02zlevi.ztst: starting. https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=amd64&ver=5.0.5-dev-3-1&stamp=1409099297 5) Hangs in the non-static build at This test takes 5 seconds to fail... https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=kfreebsd-amd64&ver=5.0.5-dev-2-1&stamp=1407974756 https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=kfreebsd-amd64&ver=5.0.5-dev-2-1&stamp=1407936338 6) Hangs in the non-static build at ../../Test/A05execution.ztst: starting. https://buildd.debian.org/status/fetch.php?pkg=zsh&arch=kfreebsd-amd64&ver=5.0.6-1&stamp=1409502508 So it's clearly not always the test suite of the static build. But I plan to disable the test suite run on the statically build zsh for now anyways to reduce the probability to run into that issue -- as we ignore its result currently. Additionally, I've got some more information about the hanging process in case that helps: Console output: HOME="/home/abe/zsh/zsh/obj-static/testhome" dh_auto_test -B obj-static --parallel || true make[1]: Entering directory '/home/abe/zsh/zsh/obj-static' cd Test ; make check make[2]: Entering directory '/home/abe/zsh/zsh/obj-static/Test' if test -n ""; then \ cd .. && DESTDIR= \ make MODDIR=`pwd`/Test/Modules install.modules > /dev/null; \ fi if ZTST_testlist="`for f in ../../Test/*.ztst; \ do echo $f; done`" \ ZTST_srcdir="../../Test" \ ZTST_exe=../Src/zsh \ ../Src/zsh +Z -f ../../Test/runtests.zsh; then \ stat=0; \ else \ stat=1; \ fi; \ sleep 1; \ rm -rf Modules .zcompdump; \ exit $stat ../../Test/A01grammar.ztst: starting. This test hangs the shell when it fails... ../../Test/A01grammar.ztst: all tests successful. ../../Test/A02alias.ztst: starting. This test hangs the shell when it fails... ../../Test/A02alias.ztst: all tests successful. ../../Test/A03quoting.ztst: starting. ../../Test/A03quoting.ztst: all tests successful. ../../Test/A04redirect.ztst: starting. ../../Test/A04redirect.ztst: all tests successful. ../../Test/A05execution.ztst: starting. Unable to change MONITOR option This test takes 5 seconds to fail... <Hangs here> Open files, again latest child first: ~ # lsof -p 23664 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME zsh 23664 abe cwd DIR 253,3 4096 1053779 /home/abe/zsh/zsh/obj-static/Test/command.tmp zsh 23664 abe rtd DIR 253,1 4096 2 / zsh 23664 abe txt REG 253,3 5567456 1053778 /home/abe/zsh/zsh/obj-static/Src/zsh zsh 23664 abe mem REG 253,1 151984 942005 /usr/lib/locale/C.UTF-8/LC_CTYPE zsh 23664 abe mem REG 253,1 50 942004 /usr/lib/locale/C.UTF-8/LC_NUMERIC zsh 23664 abe mem REG 253,1 2454 942993 /usr/lib/locale/C.UTF-8/LC_TIME zsh 23664 abe mem REG 253,1 1501202 941630 /usr/lib/locale/C.UTF-8/LC_COLLATE zsh 23664 abe mem REG 253,1 270 942576 /usr/lib/locale/C.UTF-8/LC_MONETARY zsh 23664 abe mem REG 253,1 48 942574 /usr/lib/locale/C.UTF-8/LC_MESSAGES/SYS_LC_MESSAGES zsh 23664 abe mem REG 253,1 34 941654 /usr/lib/locale/C.UTF-8/LC_PAPER zsh 23664 abe mem REG 253,1 62 938215 /usr/lib/locale/C.UTF-8/LC_NAME zsh 23664 abe mem REG 253,1 131 938217 /usr/lib/locale/C.UTF-8/LC_ADDRESS zsh 23664 abe mem REG 253,1 47 942270 /usr/lib/locale/C.UTF-8/LC_TELEPHONE zsh 23664 abe mem REG 253,1 23 942273 /usr/lib/locale/C.UTF-8/LC_MEASUREMENT zsh 23664 abe mem REG 253,1 26258 1010059 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache zsh 23664 abe mem REG 253,1 168 942571 /usr/lib/locale/C.UTF-8/LC_IDENTIFICATION zsh 23664 abe mem REG 253,1 3124784 930019 /usr/lib/locale/locale-archive zsh 23664 abe 0r REG 0,32 0 36967738 /tmp/zsh.ztst.in.23664 zsh 23664 abe 1w REG 0,32 9 36958604 /tmp/zsh.ztst.tout.23664 zsh 23664 abe 2w REG 0,32 0 36958605 /tmp/zsh.ztst.terr.23664 zsh 23664 abe 10r REG 253,3 14300 2101965 /home/abe/zsh/zsh/Test/ztst.zsh zsh 23664 abe 11w FIFO 0,8 0t0 36921442 pipe zsh 23664 abe 12r REG 253,3 4802 2101923 /home/abe/zsh/zsh/Test/A05execution.ztst zsh 23664 abe 13r FIFO 0,8 0t0 36920973 pipe zsh 23664 abe 14w FIFO 0,8 0t0 36921442 pipe zsh 23664 abe 15w FIFO 0,8 0t0 36921442 pipe zsh 23664 abe 16r FIFO 0,8 0t0 36968504 pipe zsh 23664 abe 19w FIFO 0,8 0t0 36968505 pipe ~ # lsof -p 22881 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME zsh 22881 abe cwd DIR 253,3 4096 1053452 /home/abe/zsh/zsh/obj-static/Test zsh 22881 abe rtd DIR 253,1 4096 2 / zsh 22881 abe txt REG 253,3 5567456 1053778 /home/abe/zsh/zsh/obj-static/Src/zsh zsh 22881 abe mem REG 253,1 151984 942005 /usr/lib/locale/C.UTF-8/LC_CTYPE zsh 22881 abe mem REG 253,1 50 942004 /usr/lib/locale/C.UTF-8/LC_NUMERIC zsh 22881 abe mem REG 253,1 2454 942993 /usr/lib/locale/C.UTF-8/LC_TIME zsh 22881 abe mem REG 253,1 1501202 941630 /usr/lib/locale/C.UTF-8/LC_COLLATE zsh 22881 abe mem REG 253,1 270 942576 /usr/lib/locale/C.UTF-8/LC_MONETARY zsh 22881 abe mem REG 253,1 48 942574 /usr/lib/locale/C.UTF-8/LC_MESSAGES/SYS_LC_MESSAGES zsh 22881 abe mem REG 253,1 34 941654 /usr/lib/locale/C.UTF-8/LC_PAPER zsh 22881 abe mem REG 253,1 62 938215 /usr/lib/locale/C.UTF-8/LC_NAME zsh 22881 abe mem REG 253,1 131 938217 /usr/lib/locale/C.UTF-8/LC_ADDRESS zsh 22881 abe mem REG 253,1 47 942270 /usr/lib/locale/C.UTF-8/LC_TELEPHONE zsh 22881 abe mem REG 253,1 23 942273 /usr/lib/locale/C.UTF-8/LC_MEASUREMENT zsh 22881 abe mem REG 253,1 26258 1010059 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache zsh 22881 abe mem REG 253,1 168 942571 /usr/lib/locale/C.UTF-8/LC_IDENTIFICATION zsh 22881 abe mem REG 253,1 3124784 930019 /usr/lib/locale/locale-archive zsh 22881 abe 0r FIFO 0,8 0t0 36920973 pipe zsh 22881 abe 1w FIFO 0,8 0t0 36921442 pipe zsh 22881 abe 2w FIFO 0,8 0t0 36921442 pipe zsh 22881 abe 10r REG 253,3 758 2101964 /home/abe/zsh/zsh/Test/runtests.zsh Contents of the open files in /tmp/: ~ # head -100 /tmp/zsh.ztst.* ==> /tmp/zsh.ztst.err.23664 <== ==> /tmp/zsh.ztst.in.23664 <== ==> /tmp/zsh.ztst.out.23664 <== 1 2 done ==> /tmp/zsh.ztst.terr.23664 <== ==> /tmp/zsh.ztst.tout.23664 <== 1 2 done HTH. Kind regards, Axel
On Sep 29, 3:51am, Axel Beckert wrote: } } 1) Hangs in the static build at } } ../../Test/A04redirect.ztst: starting. This doesn't mean very much; none of the tests in A04 produce any output (unless they fail) so it could be stuck anywhere. To refine, the "make check" need to be run with ZTST_verbose=1 in the environment. See Test/ztst.zsh for more. } 2) Hangs in the static build at } } This test takes 5 seconds to fail... This is the only one to which my previous discussions apply. I've since reproducesd this several times, never intentionally but it always happens immediately after recompiling the binaries. } 3) Hangs in the static build at } } ../../Test/A05execution.ztst: starting. Again this could be in any test, if possible retry with ZTST_verbose=1 } 4) Hangs in the non-static build at } } ../../Test/X02zlevi.ztst: starting. Once more, anywhere. X02 manipulates zpty terminals to emulate an interactive session, so is a likely place for race conditions. } 5) Hangs in the non-static build at } } This test takes 5 seconds to fail... See (2). } 6) Hangs in the non-static build at } } ../../Test/A05execution.ztst: starting. See (3).
Hi Bart, thanks for the feedback. Bart Schaefer wrote: Will do. Let's see how quick I manage to reproduce the issue again. Kind regards, Axel
Hi again, Axel Beckert wrote: It actually were 118 successful builds in a row before the first test suite run was hanging. Axel Beckert wrote: I wasn't able to reproduce the issue since I did that. I'm now at 237 builds with ZTST_verbose=1 in a row without hangs. The latest 10 builds even ran the test suite 10 times in a row per build. I'll let the builds continue for at least a few more days to check if I can still reproduce the issue. But I have a slight hope that enabling verbose output avoids some race condition which is triggering this issue occasionally. If setting ZTST_verbose=1 really avoids the issue, this may already help to get rid of the build failures as I plan to also run the official package builds with ZTST_verbose=1 -- mainly to get some more details where they fail exactly. Regards, Axel
On Sep 30, 7:40pm, Axel Beckert wrote: } } But I have a slight hope that enabling verbose output avoids some race } condition which is triggering this issue occasionally. That's quite likely. Also the patch in zsh-workers/33298 may resolve some deadlocks caused by signals interrupting memory management.
Hi Bart, Bart Schaefer wrote: Perfect. :-) JFTR for those not reading zsh-workers, Bart refers to this mail: http://www.zsh.org/mla/workers/2014/msg01086.html Yep, saw that. Already wondered if that's related when I saw Vincent's bug report (http://www.zsh.org/mla/workers/2014/msg01083.html). And I was surely happy to see a fix for it in such a short time. Thanks! I'll likely cherry-pick it for the Debian package. Regards, Axel