#1108341 ck: FTBFS randomly: build never ends

Package:
src:ck
Source:
src:ck
Submitter:
Santiago Vila
Date:
2025-12-01 07:55:02 UTC
Severity:
normal
Tags:
#1108341#5
Date:
2025-06-26 11:53:25 UTC
Dear maintainer:

During a rebuild of all packages in unstable, your package failed to build:
--------------------------------------------------------------------------------
[...]
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
62929123 1 62929123
Terminated
make[4]: *** [Makefile:14: check] Error 143
make[4]: Leaving directory '/<<PKGBUILDDIR>>/regressions/ck_epoch/validate'
--------------------------------------------------------------------------------

Notes:

- The build does not always fail, but when it does (approximately 10% of the
time here), it never ends and it has to be terminated by hand (as seen above).

- I believe, but can no longer check, that this is more likely to happen
when using the trixie kernel.

- I've put a collection of build logs here:

https://people.debian.org/~sanvila/build-logs/202506/

- I have no idea why this happens, but as always, I can offer
a VM to test (with the caveat that the failure happens randomly).
(Please contact me privately for details).

Thanks.

#1108341#12
Date:
2025-11-29 04:18:45 UTC
We believe that the bug you reported is fixed in the latest version of
ck, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1108341@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Daniel Baumann <daniel@debian.org> (supplier of updated ck package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Sat, 29 Nov 2025 04:19:16 +0100
Source: ck
Architecture: source
Version: 0.7.2-7
Distribution: sid
Urgency: medium
Maintainer: Daniel Baumann <daniel@debian.org>
Changed-By: Daniel Baumann <daniel@debian.org>
Closes: 1108341
Changes:
 ck (0.7.2-7) sid; urgency=medium
 .
   * Harmonizing upstream urls.
   * Adding upstream metadata.
   * Removing rules-requires-root, not needed anymore.
   * Updating homepage field.
   * Removing build-depends pre-trixie.
   * Tidying rules file.
   * Running test-suite single threaded on all architectures (Closes:
     #1108341).
   * Updating watch file to version 5.
Checksums-Sha1:
 69d6ad50ea52e8ea7134e21858fc861ff6bd596d 1242 ck_0.7.2-7.dsc
 60418c9948431c5b999b745c6e6a155e0781c449 5568 ck_0.7.2-7.debian.tar.xz
 70563735601170bd27c062c5adaf2edec0eb1689 5454 ck_0.7.2-7_amd64.buildinfo
Checksums-Sha256:
 97ae766573bfd9dc890d3cd998ce2f523a2dc3326ec21e416633bb56f76b34ff 1242 ck_0.7.2-7.dsc
 446d1271fa85338154b4c0fb00f3596e8b68513a776caf94fef8419fdb33ef45 5568 ck_0.7.2-7.debian.tar.xz
 bf63e783781557e181deb36d69230562fa55c81cead9f9bc8d3438e7d9dd4bc1 5454 ck_0.7.2-7_amd64.buildinfo
Files:
 8c05a18f708fcd391ed8459c82f435e1 1242 libs optional ck_0.7.2-7.dsc
 d93e6c438bb170bfde9607afa9c501d5 5568 libs optional ck_0.7.2-7.debian.tar.xz
 5077e49a7595c28f648463f1a240ecd9 5454 libs optional ck_0.7.2-7_amd64.buildinfo
-----BEGIN PGP SIGNATURE-----

iHUEARYKAB0WIQQmmGg4gLaoSj0ERgL7tPDoCoAiLwUCaSpvYwAKCRD7tPDoCoAi
L5lNAQCCRXITMqH/tTmjSoDa34ViTn104q2vRfCot/0XY9C5fQD9FT2e4iRG6aTD
NQHMXyUFaFK/7uqYX8/fgU6KfwH8gg4=
=eGUo
-----END PGP SIGNATURE-----

#1108341#17
Date:
2025-11-29 12:46:53 UTC
reopen 1108341
thanks

Sorry for the reopening, but I believe this is not fixed yet.

This package used to FTBFS in my setup, on machines with 2 CPUs,
approximately 10% of the time. The build hangs in such a way
that it has to be killed by hand. Not even sbuild's built-in
timeout mechanism works.

Now I've tried to build 0.7.2-7 a lot of times (25) and I got two hangs,
so the failure rate remains more or less the same, and the bug I
initially reported ("random hang on machines with 2 CPUs") is not fixed.
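
For anyone who wants to measure the hang rate themselves, here is a
minimal sketch of the kind of loop that gives such numbers. It assumes
GNU coreutils timeout(1); the function name count_hangs and the limits
are made up for illustration, and this is not the exact setup I use.

```shell
#!/bin/sh
# Hypothetical helper: run a command RUNS times, each under a hard time
# limit, and report how many runs had to be stopped by timeout(1).
# Usage: count_hangs RUNS LIMIT_SECONDS COMMAND [ARGS...]
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        rc=0
        # Send TERM after LIMIT seconds, then KILL 5 seconds later if the
        # command is still alive.
        timeout -k 5 "$limit" "$@" >/dev/null 2>&1 || rc=$?
        # GNU timeout exits with 124 when the limit expired, or 137
        # (128+9) when it had to resort to KILL. Note that 137 can also
        # mean the command was killed by something else.
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs/$runs"
}

# Example (illustrative): 25 builds with a one hour limit each,
# run from the unpacked source tree.
# count_hangs 25 3600 dpkg-buildpackage -us -uc -b
```

The output has the form "hangs/runs", so two hangs in 25 attempts would
be printed as 2/25. Killing the outer command may still leave stray test
processes behind, so this only counts hangs, it does not clean them up.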


Unfortunately, there is more: The previous version used to build
flawlessly on machines with 1 CPU, but now this is what happens
when I try the same:
----[ Testing barrier....
make[4]: Entering directory '/<<PKGBUILDDIR>>/regressions/ck_barrier/validate'
rc=0;                                                   \
for d in barrier_centralized barrier_combining barrier_dissemination barrier_tournament barrier_mcs ; do \
        echo $d;                                        \
        ./$d 2 1 1 || rc=1;                     \
done;                                                   \
exit $rc
barrier_centralized
Creating threads (barrier)...done
Waiting for threads to finish correctness regression...WARNING: Could not affine thread: Invalid argument
E: Build killed with signal TERM after 10 minutes of inactivity


I tried 25 times and it failed 25 times. If you want to reproduce
this, the easy way is to set GRUB_CMDLINE_LINUX="nr_cpus=1" in the grub
configuration, but if you prefer a VM, contact me privately and I will
gladly provide one.
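
A minimal sketch of the grub route, assuming a standard Debian install
where the setting lives in /etc/default/grub. The commands are shown as
comments because they need root and a reboot.

```shell
# Fragment of /etc/default/grub: limit the kernel to a single CPU.
GRUB_CMDLINE_LINUX="nr_cpus=1"

# Then, as root, regenerate the grub configuration and reboot:
#   update-grub
#   reboot
# After the reboot, check how many CPUs the system sees:
#   nproc
```

Once nproc reports a single CPU, the build can be retried in that
environment.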

In summary, the bug I reported is not fixed, and now the package always
fails on machines with 1 CPU. I would say that version -6 was better.

Thanks.

#1108341#26
Date:
2025-12-01 07:53:16 UTC
Hi Santiago,

no worries, thanks for re-testing it.

this seems to be a really weird issue; I could never reproduce it on
either my notebook or my desktop, where I have done all my ck uploads.

It is important to fix (just to emphasize that). However, I'm a bit
lost on the "real-world" impact.

Let's focus on amd64 for now, since that's where you noticed the problem.
On the buildds, the package always built fine on the first attempt. Is
this more of a 'make packages better by testing different things'
case (I remember the "doesn't build with cpu=1 MBF" some years ago), or
is it actually breaking something?

I'm not trying to argue the problem away, just trying to understand the
impact. If it's a "doesn't build reliably under non-default conditions",
maybe we could just skip the testsuite if nproc < 4 or so.
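
A rough sketch of that idea in plain shell, just to make it concrete.
The function name should_run_tests and the default threshold of 4 are
assumptions for illustration; how it would be wired into d/rules is not
worked out here.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the testsuite should run, given
# the number of available CPUs and a minimum (default 4).
# Usage: should_run_tests CPUS [MIN]
should_run_tests() {
    cpus=$1
    min=${2:-4}
    if [ "$cpus" -lt "$min" ]; then
        echo skip
    else
        echo run
    fi
}

# Possible use in a build script (illustrative):
# if [ "$(should_run_tests "$(nproc)")" = run ]; then
#     make check
# fi
```

With this, a machine with 1 or 2 CPUs would skip the tests and a machine
with 4 or more would run them.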

wow, if I summarize correctly:

   * 0.7.2-6 on cpu=1 sets automatically tests=1, works better than -7
   * 0.7.2-7 on cpu=1 sets statically test=1, works worse than -6

Since the test=1 handling is in d/rules rather than upstream, I can't
see any logic behind this behaviour.

I uploaded -8 yesterday, which is identical to -6.

Regards,
Daniel