#979656 llvm-toolchain-7: Test suite hangs the autobuilder on single-CPU systems

#979656#5
Date:
2016-03-25 23:36:36 UTC
From:
To:
Dear maintainer:

I tried to build this package with "dpkg-buildpackage -A"
(i.e. only architecture-independent packages), and it failed.

In fact, I tried twice.

The first time this is what happened:
--------------------------------------------------------------------------------
[...]
-------------------------------------------------------------------------------- [...]
#979656#10
Date:
2016-07-14 13:53:41 UTC
From:
To:
retitle 819278 llvm-toolchain-3.8: Build not reliable
severity 819278 normal
user sanvila@debian.org
usertags 819278 - binary-indep
thanks

This bug needs to be investigated, but it does not seem to be the
typical binary-indep bug so I'm removing it from my list of
binary-indep bugs for now.

Thanks.

#979656#19
Date:
2016-07-18 22:02:12 UTC
From:
To:
severity 819278 serious
thanks

Well, I have tried to build this package four times already, and it always
failed, so it's not that it "does not build reliably", but instead
it "reliably does not build", or FTBFS for short.

In every case I was using sbuild triggered by a cron job.

In the first two cases the build was done in a KVM virtual machine.
In the last two cases the build was done in an Azure virtual machine.

In the last case I disabled eatmydata and it also failed, so eatmydata
is not the cause of the failure.

I attach the four build logs in full.

Feel free to ask anything which may help you to reproduce the bug.

Thanks.

#979656#26
Date:
2016-07-19 06:44:07 UTC
From:
To:
severity 819278 normal
thanks

Hello,

On 19/07/2016 at 00:02, Santiago Vila wrote:
a bug upstream to fix it.

However, I don't agree with the severity.
I never experienced such issues on the official build infra (
https://buildd.debian.org/status/package.php?p=llvm-toolchain-3.8 )
or the upstream CI: http://llvm-jenkins.debian.net/

Sylvestre

#979656#33
Date:
2016-07-19 08:32:46 UTC
From:
To:
I fully agree, and I'd love to provide the test which fails, but the
build process is completely silent about the tests.

In fact, back in March I wrote this in this bugreport:

Any progress on that?

Do you need another different bug report for that?

Or maybe would it be helpful if we consider *this* bug report the one
about the build process being silent, so that I only report the bug
about build failing when we know the test which fails?

Please tell me what you think.

Well, if I can't build the program, then I can't modify it. If I can't
modify it, the program is non-free for me, for all practical purposes.

Would this not deserve "important" severity at least?

Also, the theory that a package needs to FTBFS in the official autobuilder
for a bug to be serious has its flaws:

Imagine a package which rolls a die and decides to fail according to
the result. The fact that it builds ok in the official autobuilder
then just means we were lucky that time, not that the package
is ok. Package building should be deterministic, not something which
happens only in the official autobuilders by chance.

The previous example is not just a thought experiment. In the past I
have found some packages which fail randomly:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=817033

Thanks.

#979656#38
Date:
2016-07-19 10:20:42 UTC
From:
To:
On 19/07/2016 at 10:32, Santiago Vila wrote:
Looking at the list of processes should be enough.

Having a test case with only llvm would help upstream too.
No, I haven't found the time to work on this and probably won't, as it doesn't occur much.
However, I would be happy to apply a patch for this.
If you can help, sure.
Here, we should not list all the tests (too many of them). We should instead dump the list of tests being executed when the testsuite is killed.
I stand by my initial decision. I am building the LLVM toolchain twice a day on 2 architectures for 8 Debian/Ubuntu releases + maintain Debian packages.
I think the process is reliable enough for Debian.

However, as I said, I am happy to apply any patch to help you in your goals.
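
The idea above of dumping the tests being executed when the testsuite is killed can be sketched roughly as follows. This is only an illustration and not lit code: `RunningTests`, `dump` and `install_dump_handler` are hypothetical names, and it assumes the build daemon sends SIGTERM (which can be caught) before it resorts to SIGKILL (which cannot).

```python
import signal
import sys
import threading


class RunningTests:
    """Thread-safe record of tests that have started but not finished."""

    def __init__(self):
        # RLock, not Lock: the signal handler runs in the main thread and
        # may interrupt it while it already holds this lock.
        self._lock = threading.RLock()
        self._names = set()

    def start(self, name):
        with self._lock:
            self._names.add(name)

    def finish(self, name):
        with self._lock:
            self._names.discard(name)

    def snapshot(self):
        with self._lock:
            return sorted(self._names)


def dump(running, stream=sys.stderr):
    """Print the tests still in flight, one per line."""
    names = running.snapshot()
    print("Tests still running: %d" % len(names), file=stream)
    for name in names:
        print("  " + name, file=stream)
    stream.flush()


def install_dump_handler(running, signum=signal.SIGTERM):
    """On `signum`, dump the in-flight tests and exit with 128 + signum."""
    def handler(received, frame):
        dump(running)
        sys.exit(128 + received)
    signal.signal(signum, handler)
```

A runner would call `start(name)` just before launching each test and `finish(name)` when it returns, so the build log stays short unless the run is actually killed.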

Sylvestre

#979656#43
Date:
2016-07-19 12:09:16 UTC
From:
To:
Maybe, but for that to happen I have to be "present" during the build,
so to speak, and the build takes at least five hours on the fastest machine
I had. Looking at a build log in real time is not a very entertaining
TV program.

It would be much better if the build process printed something on
stdout during the tests.

Hmm. Why not? The building of a package should not be a black box.

That would be the natural thing. Small packages have small build logs.
Large packages have large build logs. Packages that are not very large
but have a lot of tests have large build logs as well.

We take that for granted and accept it as a normal thing. It's not a
problem.

But even if a large build log were a problem, we have a bigger problem
as a result, which is that we don't know what's happening at all.
mind if I ask some simple questions.

For example: I look at debian/rules and I don't see anything which
deliberately disables build log verbosity.

Is this lack of verbosity in the test suite a Debian change or is it
an upstream thing?

Thanks.

#979656#48
Date:
2016-07-19 12:16:27 UTC
From:
To:
On 19/07/2016 at 14:09, Santiago Vila wrote:
indeed ;)
because the current test execution is working fine (except for your issue of course).
When a test is failing, I only get what I need and I don't have to dig for the actual error.
This is upstream.
Only failing tests are displayed verbosely.
I am not making any changes for this in llvm or clang (I am (was?) doing it for lldb but this is not your issue here).


Sylvestre

#979656#53
Date:
2016-07-19 14:02:14 UTC
From:
To:
I see.

Such an algorithm (showing the test that failed, only when one of them fails)
is ok when the result of a test is either pass or fail.

In real life, a test may pass or fail, but it may also make the build hang.
In that last case we don't know which test made the build hang,
precisely because the build hung, and this is why I would
suggest that everybody (upstreams and maintainers) enable verbose tests.
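
To make the point concrete, here is a minimal sketch of a runner that treats a hang as a third outcome. It is not how lit works; `run_tests_verbosely` and the default time limit are made up for illustration. Each test name is printed before the test starts, so if the whole build is killed the last "RUN" line in the log names the culprit.

```python
import subprocess
import sys


def run_tests_verbosely(tests, timeout_seconds=600):
    """Run each (name, argv) pair, announcing it *before* it starts.

    Returns a dict mapping test name to "PASS", "FAIL" or "TIMEOUT".
    """
    results = {}
    for name, argv in tests:
        # Print and flush first: if the test hangs, its name is already
        # the last line of the build log.
        print("RUN     " + name, flush=True)
        try:
            completed = subprocess.run(argv, timeout=timeout_seconds)
            status = "PASS" if completed.returncode == 0 else "FAIL"
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the child before raising this.
            status = "TIMEOUT"
        results[name] = status
        print("%-7s %s" % (status, name), flush=True)
    return results
```

With a per-test time limit, a hanging test becomes an ordinary reported failure instead of a silent build that the autobuilder has to kill.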

Thanks.

#979656#58
Date:
2016-07-19 16:12:51 UTC
From:
To:
tags 819278 help
thanks

I would welcome any help to reproduce this.

Some theories about why this might fail:


* Lack of memory. Thanks to a cron job looking at /proc/meminfo, I know
that llvm-toolchain-3.8 needs at least 4600 M of memory to build. Last
two times I tried I had only 3500 M and 8000 M of swap.

If this were the case, I would expect an "Out of memory" error,
but not a hang.


* Using only 1 CPU. My virtual machines have only one CPU most of the
time. Since I collect measurements of the time required to build
packages, I tell sbuild to define parallel=1 in DEB_BUILD_OPTIONS,
so that all measurements are on the same scale.

In the past, I found at least one case where a package built fine with
2 CPUs and FTBFS with only 1 CPU. It happened to be a bug in the
Makefiles.


* I build with "dpkg-buildpackage -A". It could still be that the test
suite expects some program to be there which was not created in the
build-indep target and hangs as a result.


* Some bug in sbuild which nobody has reported yet. I'm using sbuild 0.68
from jessie-backports because version 0.69 has a bug that makes the
required disk space figures very wrong.
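
For anybody who wants to repeat the memory measurement from the first theory, this is a rough sketch of what a script reading /proc/meminfo could look like. It is not the script actually used here; `parse_meminfo` and `used_mib` are illustrative names, and "used" is approximated as RAM minus free, buffers and cache, plus swap in use.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integers.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def used_mib(meminfo):
    """RAM plus swap in use, in MiB, by a simple approximation."""
    ram = (meminfo["MemTotal"] - meminfo["MemFree"]
           - meminfo.get("Buffers", 0) - meminfo.get("Cached", 0))
    swap = meminfo["SwapTotal"] - meminfo["SwapFree"]
    return (ram + swap) // 1024
```

A cron job could feed the contents of /proc/meminfo to these functions once a minute during the build and keep the maximum value seen.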


Thanks.

#979656#65
Date:
2018-10-31 05:43:06 UTC
From:
To:
Dear submitter,

as the package llvm-toolchain-3.8 has just been removed from the Debian archive
unstable we hereby close the associated bug reports.  We are sorry
that we couldn't deal with your issue properly.

For details on the removal, please see https://bugs.debian.org/873331

The version of this package that was in Debian prior to this removal
can still be found using http://snapshot.debian.org/.

This message was generated automatically; if you believe that there is
a problem with it please contact the archive administrators by mailing
ftpmaster@ftp-master.debian.org.

Debian distribution maintenance software
pp.
Scott Kitterman (the ftpmaster behind the curtain)

#979656#74
Date:
2019-03-04 17:11:02 UTC
From:
To:
Hello Sylvestre.

Packages llvm-toolchain-6.0 and llvm-toolchain-7 in buster show the
same behaviour in my building environment as llvm-toolchain-3.8, but
before reopening/cloning this bug it may be worth commenting here first.

Back in 2016, you said:
so weird which is almost impossible to believe that it could happen.

For example, if somebody reports that "ls" segfaults, one would
probably think that it's a problem in the user's machine, like
filesystem corruption, memory corruption, or something alike,
because otherwise a lot of people would notice.

However, continuous testing and jenkins only help to discover bugs
which happen in a similar environment. So, for example, if you always
build llvm on multi-cpu machines and it happens that it does not build
ok on single-cpu machines, the fact that you build llvm twice a day on
several different architectures will not help to discover that kind
of bug.

And that's precisely my theory here: I believe llvm build has never
been tested in a single-CPU system.

So: Would it be possible for you to try building the package on a
single-cpu system at least once? (I can provide a virtual machine for
you if setting up such a system is a burden for you).
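
Short of a real single-CPU machine, the following sketch approximates one on a multi-CPU Linux host. It is an assumption-laden illustration, not a tested reproduction recipe: `pick_single_cpu`, `single_cpu_env` and `run_on_single_cpu` are hypothetical helpers, the build is pinned to one CPU with `os.sched_setaffinity`, and parallel=1 is set in DEB_BUILD_OPTIONS as in my sbuild setup.

```python
import os
import subprocess


def pick_single_cpu(allowed):
    """Return a one-element set holding the lowest CPU id in `allowed`."""
    return {min(allowed)}


def single_cpu_env(base_env):
    """Copy `base_env`, forcing parallel=1 in DEB_BUILD_OPTIONS."""
    env = dict(base_env)
    options = [opt for opt in env.get("DEB_BUILD_OPTIONS", "").split()
               if not opt.startswith("parallel=")]
    options.append("parallel=1")
    env["DEB_BUILD_OPTIONS"] = " ".join(options)
    return env


def run_on_single_cpu(argv):
    """Run `argv` pinned to one CPU (Linux only) and return its exit code."""
    cpus = pick_single_cpu(os.sched_getaffinity(0))

    def pin():
        # Runs in the child just before exec; the affinity mask is
        # inherited by every process the build spawns later.
        os.sched_setaffinity(0, cpus)

    return subprocess.call(argv, env=single_cpu_env(os.environ),
                           preexec_fn=pin)
```

For example, `run_on_single_cpu(["dpkg-buildpackage", "-A"])` from the unpacked source tree. Pinning is not identical to a one-CPU virtual machine: tools that count installed CPUs rather than allowed ones will still see more than one.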

(If my suspicion is true, the subject ("build not reliable") would fit.
I would call a build "reliable" when its success does not
depend on things not in debian/control, i.e. when it builds ok on both
slow and fast computers, regardless of vendor or number of CPU cores.
This is what I would expect from an Operating System which prides itself
on being called Universal, and it is also the spirit of Debian Policy
when it says that "it must be possible to build the package when
build-essential and build-dependencies are installed". We don't have a
control field to require more than one CPU).

(Note: If you are wondering why I'm using single-CPU systems for my QA
work: They are usually cheaper and more efficient per €. In my experience,
multi-cpu is only "better" when you don't have to pay for it).

Sorry for the long email.

Thanks a lot.

#979656#83
Date:
2021-01-09 18:54:39 UTC
From:
To:
severity 819278 serious
clone 819278 -1
clone 819278 -2
reassign -1 llvm-toolchain-6.0
reassign -2 llvm-toolchain-7
retitle -1 llvm-toolchain-6.0: Test suite hangs the autobuilder on single-CPU systems
retitle -2 llvm-toolchain-7: Test suite hangs the autobuilder on single-CPU systems
thanks

I'm going to submit patches next; they will just disable the "LLD"
tests, which are the ones that are known to fail.

Note: I'm changing the severity according to Policy 4.2:

 If build-time dependencies are specified, it must be possible to build
 the package and produce working binaries on a system with only
 essential and build-essential packages installed and also those
 required to satisfy the build-time relationships (including any
 implied relationships).

Thanks.

#979656#102
Date:
2021-01-09 19:03:28 UTC
From:
To:
tags 979656 + patch
thanks

Dear LLVM maintainers:

Please apply the following patch for Debian 10.8.

Thanks.

#979656#113
Date:
2021-01-09 19:25:06 UTC
From:
To:
in interactive mode (i.e. dpkg-buildpackage, not sbuild).

On the machine where I didn't apply the patch, this is what was shown
on screen at the moment of the hang:


[100%] Running lld test suite
cd /home/sanvila/llvm-toolchain-7-7.0.1/build-llvm/tools/clang/stage2-bins/tools/lld/test && /usr/bin/python2.7 /home/sanvila/llvm-toolchain-7-7.0.1/build-llvm/tools/clang/stage2-bins/./bin/llvm-lit -sv --param lld_site_config=/home/sanvila/llvm-toolchain-7-7.0.1/build-llvm/tools/clang/stage2-bins/tools/lld/test/lit.site.cfg --param lld_unit_site_config=/home/sanvila/llvm-toolchain-7-7.0.1/build-llvm/tools/clang/stage2-bins/tools/lld/test/Unit/lit.site.cfg /home/sanvila/llvm-toolchain-7-7.0.1/build-llvm/tools/clang/stage2-bins/tools/lld/test
-- Testing: 1808 tests, 1 threads --
92% [========================================================================------] ETA: 00:00:10
lld :: mach-o/write-final-sections.yaml


I agree that this is for historical interest only, as I guess that
nobody here wants to investigate the failure, and everybody here
probably agrees that disabling the tests is a much better course of
action.

Thanks.

#979656#118
Date:
2025-06-16 11:10:04 UTC
From:
To:
close 979656 1:7.0.1-12+rm
thanks

Hi. I'm closing this one because the package does not exist anymore
in any supported distribution.

( Disabling the tests in the cases where we know for sure that
   they hang the machine would have been an acceptable outcome,
   but now it's too late )

Thanks.