#1019742 reprotest: add a variation that sets DEB_BUILD_OPTIONS=nocheck

#1019742#5
Date:
2022-09-14 14:00:04 UTC
From:
To:
I suggest adding a 'nocheck' variation, that sets DEB_BUILD_OPTIONS=nocheck
during the build, and enabling it by default.

The reason for doing so is that one could imagine that a package produces
differing results depending upon whether the tests were run or not, but also
(given that the tests will have passed during the normal build) the tests
failing during the varied build seems unlikely to be identifying faults that are
worth fixing, and so is just a waste of cycles.

This idea is prompted by `busybox` where the tests fail in the varied scenario,
despite the fact that the package is reproducible.

Here they are failing:

https://salsa.debian.org/installer-team/busybox/-/jobs/3227197

  (among other things, du produces weird results when the `fileordering`
   variation is active, claiming the 1MB directory is 2MB so the tests fail, so
   the varied package is not produced, so we don't get to see that it was
   reproducible:
https://salsa.debian.org/installer-team/busybox/-/blob/master/testsuite/du/du-m-works
   )


I found a couple of ways of making the issue go away:

  1) disabling the 'fileordering' variation, thus:

https://salsa.debian.org/installer-team/busybox/-/commit/17387890c73388e1f56a6ae9fbc79783095b4e86

https://salsa.debian.org/installer-team/busybox/-/jobs/3233259

  2) telling the package to skip the tests when doing the variations:

https://salsa.debian.org/installer-team/busybox/-/commit/5260442e8ceea220fa36bdda169978d15108f781

    which is setting this in the salsa-ci.yml:
      SALSA_CI_REPROTEST_ARGS: --variations=environment.variables+=DEB_BUILD_OPTIONS=nocheck

https://salsa.debian.org/installer-team/busybox/-/jobs/3235476


Option 2) is what I'm suggesting making into a default variation.
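
A rough sketch of what such a variation would do to the build environment. This is an illustration only: `with_nocheck` is a hypothetical helper, not part of reprotest. DEB_BUILD_OPTIONS is a space-separated list, so the sketch appends `nocheck` rather than overwriting options that are already set (such as `parallel=4`).

```python
def with_nocheck(env):
    """Return a copy of env with 'nocheck' present in DEB_BUILD_OPTIONS.

    Hypothetical helper for illustration; not reprotest's implementation.
    """
    new_env = dict(env)
    # DEB_BUILD_OPTIONS is a space-separated list of flags.
    options = new_env.get("DEB_BUILD_OPTIONS", "").split()
    if "nocheck" not in options:
        options.append("nocheck")
    new_env["DEB_BUILD_OPTIONS"] = " ".join(options)
    return new_env
```

The input mapping is left untouched, so the unvaried build can keep using the original environment.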

If nothing else it will speed up testing of packages with extensive test suites.

Cheers, Phil.

#1019742#10
Date:
2022-09-14 14:31:39 UTC
From:
To:
Hi Phil,
[...]
[...]

as discussed in RL, /me likes. :) Please provide an MR including documentation
updates as needed. And thank you very much for using and improving reprotest!

#1019742#15
Date:
2022-09-14 14:54:30 UTC
From:
To:
Sounds reasonable!

Less sure...

Indeed!

How do you know whether the bugs it is identifying are worth fixing? It
could also identify non-deterministic failures, or failures triggered by
specific build environment configurations...

Consistently? Then, maybe the test needs to be improved?

I think it's a valuable feature, but I'm not entirely sure whether it
should be default or not...


live well,
  vagrant

#1019742#20
Date:
2022-09-14 16:08:56 UTC
From:
To:
Vagrant Cascadian <vagrant@reproducible-builds.org> writes:

The point is that if the package is reproducible, then the fact that its
tests fail when run in a weird environment (that may never be found in
the wild) seems rather likely to be finding errors in the tests rather
than errors in the program that gets shipped.

Even if busybox's du really does have a bug where it miscounts the sizes
of files when run under the fileordering variation, I'm not sure that
breaking the ability to confirm that the package is reproducible is
justified in order to find that bug.

I'm afraid I've not yet managed to work out what's behind the
mis-counting, but my first guess is that it's more likely to be
something in the fuse system presenting the data than in du's counting
of it.

Of course, if the package is not reproducible, the tests may well fail
because the package ends up containing new bugs that are only present in
the variant-built package, but then it's also going to show up as
non-reproducible, so does that really make a difference?

Cheers, Phil.

#1019742#25
Date:
2022-09-14 17:13:45 UTC
From:
To:
Fair point!

True, though it may make things harder to verify reproducibility in
practice, especially if it is a fairly "normal" variation that triggers
the issue...


It is a balancing act...

I guess I'd be fine with the defaults to go either way, but it would be
important to be able to enable or disable however this gets implemented.


live well,
  vagrant

#1019742#30
Date:
2022-09-15 10:14:05 UTC
From:
To:
Vagrant Cascadian <vagrant@reproducible-builds.org> writes:

Absolutely.  That's why I was requesting that it be a variation in its
own right, since that should allow one to specify:

#1019742#33
Date:
2022-09-16 09:44:16 UTC
From:
To:
The variations are defined in reprotest/build.py (each variation is a
function) and listed in VARIATIONS.
Then __init__.py will pull them in with some magic.

Overall I don't consider this format particularly programmer-friendly.
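
For readers unfamiliar with that layout, here is a simplified sketch of the general "one function per variation, listed in a registry" pattern. It is not reprotest's actual code: the function signatures, the `vary_*` names and `apply_variations` are all assumptions made up for illustration.

```python
def vary_timezone(env):
    # Hypothetical variation: build in an unusual timezone.
    return {**env, "TZ": "GMT-14"}


def vary_nocheck(env):
    # Hypothetical variation: add 'nocheck' to the space-separated options.
    options = env.get("DEB_BUILD_OPTIONS", "").split()
    if "nocheck" not in options:
        options.append("nocheck")
    return {**env, "DEB_BUILD_OPTIONS": " ".join(options)}


# Registry mapping variation names to the functions that apply them.
VARIATIONS = {
    "timezone": vary_timezone,
    "nocheck": vary_nocheck,
}


def apply_variations(env, enabled):
    """Apply each registered variation whose name is in `enabled`."""
    for name, func in VARIATIONS.items():
        if name in enabled:
            env = func(env)
    return env
```

With this shape, a variation in its own right can be enabled or disabled by name, independently of the others.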

#1019742#38
Date:
2024-12-14 20:47:47 UTC
From:
To:
Should we merge #786644 and #1019742?  Or should we consider #1019742 to
be "have the option" and #786644 to be "enable it by default"?

I'm willing to try implementing this, if we agree that having it is a
good idea.  Maybe use _PROFILES rather than _OPTIONS and allow it to be
more general than just nocheck?

There is a way to vary nocheck but continue to check for test failures
in the less-standard environment: make the *first* build the nocheck
build.  (timezone is precedent that the first build is allowed to do
non-default things.  No strong opinion on whether we actually want that.)

On whether nocheck even should always mean no change to output:

DEB_BUILD_PROFILES=nocheck (which requires also setting
DEB_BUILD_OPTIONS=nocheck) *is* defined (at
https://wiki.debian.org/BuildProfileSpec#Registered_profile_names ;
Policy doesn't describe build profiles at all, #757760) as not changing
the build result.  (Packages that ship test results/tools and want to
make this optional are probably supposed to use
DEB_BUILD_PROFILES=noinsttest instead.)
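
The requirement that the nocheck profile is accompanied by the nocheck option can be checked mechanically. A minimal sketch, assuming both variables are space-separated lists; `nocheck_settings_consistent` is a hypothetical name, not an existing tool:

```python
def nocheck_settings_consistent(env):
    """Return True unless the nocheck profile is set without the nocheck option.

    Hypothetical helper for illustration only.
    """
    profiles = env.get("DEB_BUILD_PROFILES", "").split()
    options = env.get("DEB_BUILD_OPTIONS", "").split()
    # The profile requires the option; the option alone is acceptable.
    return "nocheck" not in profiles or "nocheck" in options
```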

However, I'm not aware of any systematic attempts to check this, and
hence, I don't know how often it is violated in practice.  There appears
to have been at least one mass rebuild with nocheck, but it only seems
to have filed bugs for packages that failed to build *at all* with
nocheck (e.g. #1086765).

Hence, I suggest implementing the option but not initially making it the
default, doing a large (maybe archive-wide) run with it enabled, and
only then deciding whether to enable it by default.

On the time/resource savings:

I suggest that this run also record the duration of each build, to
provide an estimate of the time/resource saving.
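
Recording the duration could be as simple as wrapping each build command in a timer. A sketch under assumptions: `timed_run` is a hypothetical helper, and the command in the example is a stand-in rather than a real package build.

```python
import subprocess
import sys
import time


def timed_run(command, env=None):
    """Run command and return (exit code, wall-clock seconds)."""
    start = time.monotonic()
    result = subprocess.run(command, env=env, check=False)
    return result.returncode, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in command; a real run would invoke the package build instead.
    code, seconds = timed_run([sys.executable, "-c", "pass"])
    print(f"exit={code} duration={seconds:.2f}s")
```

Comparing the durations of the with-tests and nocheck builds of the same package would then give the per-package saving.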

Users that care about resource use may already be making *both* builds
nocheck, e.g. some of my packages have SALSA_CI_REPROTEST_ARGS:
"--append-build-command=-Pnocheck" (Salsa reprotest defaults to a 1hr
time limit).  Hence, any implementation of this should probably avoid
breaking such existing mechanisms.

#1019742#43
Date:
2024-12-15 16:00:24 UTC
From:
To:
As the submitter of #786644, it was certainly meant as performing the
testing by default not merely having it as an option.

Allowing other variations also is useful. For instance,
pkg.unbound.libonly also is supposed to be reproducible in the sense
that any package that is being produced matches the ones that are
produced when building without build profiles, but the set of packages
produced ends up being smaller. So generally, being able to verify
reproducibility of a particular profile is a useful feature separately.

Sounds ok to me.

Confirmed.

I note that a nocheck ftbfs is considered release-critical by the
release team, because the autoremover disregards dependencies annotated
<!nocheck> and expects that you can at any time remove such dependencies
by reducing testing of the package. If the package ftbfs with nocheck,
that property no longer holds and testing no longer is self-contained.

That said, nobody is testing this systematically, so I guesstimate
something between 50 and 500 packages that ftbfs in such a way in
unstable right now. Reproducibility likely degrades in even more cases
as the compiler flags may change and they influence the buildid. Since
cross building uses nocheck, I have been filing a couple of nocheck
ftbfs bugs maybe one to two per month on average.

Reasonable.

Good idea.

So thanks for working on these old bugs!

Helmut

#1019742#48
Date:
2024-12-18 10:51:28 UTC
From:
To:
Rebecca N. Palmer wrote:

I like the idea of varying nocheck, or at least exploring the concept.

From personal experience, I think it will actually cause a
surprisingly large number of packages to become unreproducible. Many
many packages generate stuff during test runs which then gets
installed into a binary package. I can only reliably come across these
when the output is non-deterministic, but given how many instances of
this there are, I suspect there are a lot more packages that generate
*deterministic* stuff.

Just a small thing regarding the two bugs you suggest merging:

#786644 is filed against jenkins.debian.org (ie. the service powering
tests.reproducible-builds.org), whilst #1019742 is filed against the
reprotest package. It is not actually that obvious, but
tests.reproducible-builds.org does not use reprotest to do its variations,
so it is "technically" correct that they are different bugs.

They, of course, could still be merged, or be used in the "add the
option" and "enabled by default" schema as you suggest. But just to
remind anyone following these bugs that reprotest is technically a
different Thing from tests.reproducible-builds.org.


Regards,

#1019742#53
Date:
2024-12-18 12:47:43 UTC
From:
To:
same here.

I suspect the same.

indeed. and thanks for pointing this out.

and then there is https://reproduce.debian.net now, which
aims to reproduce Debian binary packages distributed via deb.debian.org
and which has become almost trivial to set up, as documented in
https://reproduce.debian.net/rebuilderd-setup.html currently.

Once it has become *really* trivial to set up, that is once rebuilderd
has been packaged as well and available in trixie, setting up a
rebuilderd instance will be trivial and configuring it to rebuild
Debian with nocheck should also be trivial then. And then, there
will be hard data showing how many packages cannot be reproduced when
rebuilding with nocheck while they can without nocheck.
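
Given per-package results from two such runs, the tally described above is a small set computation. A sketch with made-up data structures: each argument maps a package name to True if it reproduced in that run, and `broken_only_with_nocheck` is a hypothetical name.

```python
def broken_only_with_nocheck(normal, nocheck):
    """List packages reproducible in the normal run but not in the nocheck run.

    Packages missing from the nocheck results are not counted.
    """
    return sorted(
        pkg
        for pkg, reproduced in normal.items()
        if reproduced and nocheck.get(pkg) is False
    )
```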

The machine specs to rebuild Debian on amd64 in 4-6 weeks should roughly be
a dedicated system with a modern CPU with 64 GB RAM and 500 GB disk space.

rust-rebuilderd-common, rust-rebuild-ctl and rust-rebuilderd-worker are
already packaged and available in trixie. rust-rebuilderd is not there yet,
but installing with git clone and make install is also rather trivial (and
documented in above URL). So basically anyone could set this up *now*.
And I'd also be glad to help!

.oO( all I want for xmas is a rebuilderd setup for every Debian arch. ;)

#1019742#58
Date:
2024-12-21 09:45:19 UTC
From:
To:
I take those to be a yes to wanting this option to *exist*, though if it
already exists in rebuilderd then it might make more sense to use that.

(It would probably also make sense to use rebuilderd instead of
reprotest on Salsa to save resources, i.e. requiring 1 extra build
instead of 2.)

Helmut Grohne wrote:
 > Reproducibility likely degrades in even more cases
 > as the compiler flags may change and they influence the buildid.
Chris Lamb wrote:
 > Many many packages generate stuff during test runs which then gets
 > installed into a binary package.

If either of these turns out to be very common, it might make more sense
to change the policy from "nocheck must not change anything at all in
the binary packages" to "nocheck must not remove any functionality from
the binary packages (including testing tools - use noinsttest for
that)", but we probably want to gather more data before actually deciding.

#1019742#63
Date:
2025-04-16 18:27:20 UTC
From:
To:
For the record:

For packages that ship test tools (automated tests, manual tests or
examples to be run at a later time), and want to make them optional,
yes they are supposed to use DEB_BUILD_PROFILES=noinsttest instead of
nocheck. For example, gtk4 and libsdl3 implement this correctly, to the
best of my ability; so do dbus and glib2.0, which have more complicated
interactions with other build profiles like nocheck.

For packages that ship test *results*, the recommendation is "don't"
(because in practice it's very rare for them to be reproducible)
and this is out-of-scope for DEB_BUILD_PROFILES=noinsttest.
<https://wiki.debian.org/BuildArtifacts> and
<https://salsa.debian.org/debian/sbuild/-/merge_requests/14> are a
possible solution to that, which I should pick up and revisit at some
point (it's on my list but my list is very long).

     smcv

#1019742#68
Date:
2025-04-16 18:44:12 UTC
From:
To:
I missed that question, probably because the answer is "no, they have been
filed against different packages and thus usages:
#786644 is about tests.reproducible-builds.org/debian while
#1019742 is about src:reprotest"