I suggest adding a 'nocheck' variation that sets DEB_BUILD_OPTIONS=nocheck during the varied build, and enabling it by default.

The reasons for doing so: one could imagine a package producing different results depending on whether or not the tests were run. Also, given that the tests will already have passed during the normal build, tests failing during the varied build seem unlikely to be identifying faults that are worth fixing, so running them there is just a waste of cycles.

This idea is prompted by `busybox`, where the tests fail in the varied scenario despite the fact that the package is reproducible. Here they are failing:

https://salsa.debian.org/installer-team/busybox/-/jobs/3227197

(Among other things, du produces weird results when the `fileordering` variation is active, claiming the 1MB directory is 2MB. So the tests fail, so the varied package is not produced, so we don't get to see that it was reproducible: https://salsa.debian.org/installer-team/busybox/-/blob/master/testsuite/du/du-m-works )

I found a couple of ways of making the issue go away:

1) disabling the 'fileordering' variation, thus:
https://salsa.debian.org/installer-team/busybox/-/commit/17387890c73388e1f56a6ae9fbc79783095b4e86
https://salsa.debian.org/installer-team/busybox/-/jobs/3233259

2) telling the package to skip the tests when doing the variations:
https://salsa.debian.org/installer-team/busybox/-/commit/5260442e8ceea220fa36bdda169978d15108f781
which sets this in the salsa-ci.yml:

  SALSA_CI_REPROTEST_ARGS: --variations=environment.variables+=DEB_BUILD_OPTIONS=nocheck

https://salsa.debian.org/installer-team/busybox/-/jobs/3235476

Option 2) is what I'm suggesting making into a default variation. If nothing else, it will speed up testing of packages with extensive test suites.

Cheers, Phil.
Hi Phil, [...] [...] as discussed in RL, /me likes. :) Please provide an MR, including documentation updates as needed. And thank you very much for using and improving reprotest!
Sounds reasonable!

Less sure...

Indeed! How do you know whether the bugs it is identifying are worth fixing? It could also identify non-deterministic failures, or failures triggered by specific build environment configurations...

Consistently? Then maybe the test needs to be improved?

I think it's a valuable feature, but I'm not entirely sure whether it should be the default or not...

live well, vagrant
Vagrant Cascadian <vagrant@reproducible-builds.org> writes:

The point is that if the package is reproducible, then the fact that its tests fail when run in a weird environment (one that may never be found in the wild) seems rather likely to be finding errors in the tests rather than errors in the program that gets shipped.

Even if busybox's du really does have a bug where it miscounts the sizes of files when run under the fileordering variation, I'm not sure that breaking the ability to confirm that the package is reproducible is justified in order to find that bug. I'm afraid I've not yet managed to work out what's behind the miscounting, but my first guess is that it's more likely to be something in the FUSE system presenting the data than in du's counting of it.

Of course, if the package is not reproducible, the tests may well fail because the package ends up containing new bugs that are only present in the variant-built package, but then it's also going to show up as non-reproducible, so does that really make a difference?

Cheers, Phil.
Fair point!

True, though it may make it harder to verify reproducibility in practice, especially if it is a fairly "normal" variation that triggers the issue... It is a balancing act...

I guess I'd be fine with the default going either way, but it would be important to be able to enable or disable it, however this gets implemented.

live well, vagrant
Vagrant Cascadian <vagrant@reproducible-builds.org> writes: Absolutely. That's why I was requesting that it be a variation in its own right, since that should allow one to specify:
The variations are defined in reprotest/build.py (each variation is a function) and listed in VARIATIONS. Then __init__.py will pull them in with some magic. Overall I don't consider this format particularly programmer-friendly.
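To make the "each variation is a function, listed in VARIATIONS" structure concrete, here is a simplified Python model of it with the proposed nocheck variation added. It is not reprotest's actual code or signatures: the real functions in reprotest/build.py operate on reprotest's own build object, whereas this sketch uses a plain environment dict, and `apply_variations` and the TZ values are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable

# Hypothetical model: a variation takes the build environment plus a flag
# saying whether this is the varied (experiment) build, and returns the
# environment to use for that build.
Env = Dict[str, str]
Variation = Callable[[Env, bool], Env]


def timezone(env: Env, vary: bool) -> Env:
    """Illustrative variation: the two builds use different timezones."""
    new_env = dict(env)
    new_env["TZ"] = "GMT-14" if vary else "GMT+12"
    return new_env


def nocheck(env: Env, vary: bool) -> Env:
    """Proposed variation: add nocheck to DEB_BUILD_OPTIONS in the varied build only."""
    new_env = dict(env)
    if vary:
        # DEB_BUILD_OPTIONS is a space-separated list; keep existing options.
        options = new_env.get("DEB_BUILD_OPTIONS", "").split()
        if "nocheck" not in options:
            options.append("nocheck")
        new_env["DEB_BUILD_OPTIONS"] = " ".join(options)
    return new_env


# Registry of every known variation, keyed by function name, in a fixed order.
VARIATIONS: Dict[str, Variation] = {f.__name__: f for f in (timezone, nocheck)}


def apply_variations(env: Env, enabled: Iterable[str], vary: bool) -> Env:
    """Apply each enabled variation in registry order."""
    wanted = set(enabled)
    unknown = wanted - set(VARIATIONS)
    if unknown:
        raise ValueError(f"unknown variations: {sorted(unknown)}")
    for name, func in VARIATIONS.items():
        if name in wanted:
            env = func(env, vary)
    return env
```

With this shape, adding a variation is one new function plus one registry entry; disabling it is just leaving its name out of `enabled`.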
Should we merge #786644 and #1019742? Or should we consider #1019742 to be "have the option" and #786644 to be "enable it by default"?

I'm willing to try implementing this, if we agree that having it is a good idea. Maybe use _PROFILES rather than _OPTIONS and allow it to be more general than just nocheck?

There is a way to vary nocheck but continue to check for test failures in the less-standard environment: make the *first* build the nocheck build. (The timezone variation is a precedent for the first build being allowed to do non-default things. No strong opinion on whether we actually want that.)

On whether nocheck should even always mean no change to the output: DEB_BUILD_PROFILES=nocheck (which also requires setting DEB_BUILD_OPTIONS=nocheck) *is* defined (at https://wiki.debian.org/BuildProfileSpec#Registered_profile_names ; Policy doesn't describe build profiles at all, #757760) as not changing the build result. (Packages that ship test results/tools and want to make this optional are probably supposed to use DEB_BUILD_PROFILES=noinsttest instead.) However, I'm not aware of any systematic attempts to check this, and hence I don't know how often it is violated in practice. There appears to have been at least one mass rebuild with nocheck, but it only seems to have filed bugs for packages that failed to build *at all* with nocheck (e.g. #1086765). Hence, I suggest implementing the option but not initially making it the default, doing a large (maybe archive-wide) run with it enabled, and only then deciding whether to enable it by default.

On the time/resource savings: I suggest that this run also record the duration of each build, to provide an estimate of the time/resource saving. Users who care about resource use may already be making *both* builds nocheck; e.g. some of my packages have SALSA_CI_REPROTEST_ARGS: "--append-build-command=-Pnocheck" (Salsa reprotest defaults to a 1-hour time limit).
Hence, any implementation of this should probably avoid breaking such existing mechanisms.
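The "make the *first* build the nocheck build" idea and the requirement not to break setups that already pass -Pnocheck can be sketched together. This is a minimal Python sketch, not reprotest's API: `build_envs`, `nocheck_env` and `already_nocheck` are hypothetical names, and it assumes for illustration that `build_args` holds the extra dpkg-buildpackage arguments (such as those added via --append-build-command) already split into words. It recognizes the dpkg-buildpackage forms -Pprofile[,...] and --build-profiles=profile[,...], and sets the nocheck option alongside the nocheck profile, as the profile spec requires.

```python
from typing import Dict, Mapping, Sequence, Tuple

Env = Dict[str, str]


def _add_word(value: str, word: str) -> str:
    """Add `word` to a space-separated list unless it is already there."""
    words = value.split()
    if word not in words:
        words.append(word)
    return " ".join(words)


def nocheck_env(env: Mapping[str, str]) -> Env:
    """Enable the nocheck profile, plus the nocheck option it requires."""
    new_env = dict(env)
    new_env["DEB_BUILD_PROFILES"] = _add_word(new_env.get("DEB_BUILD_PROFILES", ""), "nocheck")
    new_env["DEB_BUILD_OPTIONS"] = _add_word(new_env.get("DEB_BUILD_OPTIONS", ""), "nocheck")
    return new_env


def already_nocheck(env: Mapping[str, str], build_args: Sequence[str]) -> bool:
    """True if both builds already skip tests, e.g. via -Pnocheck."""
    if "nocheck" in env.get("DEB_BUILD_OPTIONS", "").split():
        return True
    profiles = env.get("DEB_BUILD_PROFILES", "").split()
    for arg in build_args:
        if arg.startswith("--build-profiles="):
            profiles.extend(arg.split("=", 1)[1].split(","))
        elif arg.startswith("-P"):
            profiles.extend(arg[2:].split(","))
    return "nocheck" in profiles


def build_envs(base: Mapping[str, str], build_args: Sequence[str],
               nocheck_first: bool) -> Tuple[Env, Env]:
    """Return (control, experiment) environments for the two builds."""
    control, experiment = dict(base), dict(base)
    if already_nocheck(base, build_args):
        # Both builds already skip tests: varying nocheck would be a no-op,
        # so leave the existing setup untouched.
        return control, experiment
    if nocheck_first:
        # The control build skips tests; the experiment build (with the
        # less-standard variations) still runs them, so failures stay visible.
        control = nocheck_env(control)
    else:
        experiment = nocheck_env(experiment)
    return control, experiment
```

Returning early when nocheck is already in effect is one way to keep existing `-Pnocheck` setups working; raising an error instead would be the stricter alternative.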
As the submitter of #786644: it was certainly meant as performing the testing by default, not merely having it as an option.

Allowing other variations is also useful. For instance, pkg.unbound.libonly is also supposed to be reproducible, in the sense that any package that is produced matches the one produced when building without build profiles, but the set of packages produced ends up being smaller. So, more generally, being able to verify the reproducibility of a particular profile is a useful feature in its own right.

Sounds ok to me.

Confirmed. I note that a nocheck FTBFS is considered release-critical by the release team, because the autoremover disregards dependencies annotated <!nocheck> and expects that you can at any time remove such dependencies by reducing the testing of the package. If the package FTBFS with nocheck, that property no longer holds and testing is no longer self-contained. That said, nobody is testing this systematically, so I guesstimate that somewhere between 50 and 500 packages in unstable fail to build in this way right now. Reproducibility likely degrades in even more cases, as the compiler flags may change and they influence the build-id. Since cross building uses nocheck, I have been filing nocheck FTBFS bugs, maybe one to two per month on average.

Reasonable.

Good idea.

So thanks for working on these old bugs!
Helmut
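Why a nocheck FTBFS matters for the autoremover: with the nocheck profile active, every build dependency annotated <!nocheck> drops out. Per the BuildProfileSpec, a restriction formula such as `<!nocheck> <stage1>` is a disjunction of restriction lists, each list being a conjunction of terms, and `!name` means "profile not active". A simplified Python sketch of that evaluation; the function names are hypothetical, and it does not handle alternatives ("a | b") or evaluate architecture restrictions.

```python
import re
from typing import Iterable, List, Set

_VERSION_CONSTRAINT = re.compile(r"\([^)]*\)")
_RESTRICTION_LIST = re.compile(r"<([^<>]*)>")


def formula_holds(restriction_lists: List[str], active: Set[str]) -> bool:
    """Evaluate a restriction formula given as the contents of its <...> lists.

    True if ANY list is true; a list is true if ALL of its terms are true;
    "!name" is true when profile `name` is NOT active. No restriction at
    all means the dependency always applies.
    """
    if not restriction_lists:
        return True
    return any(
        all((term[1:] not in active) if term.startswith("!") else (term in active)
            for term in terms.split())
        for terms in restriction_lists
    )


def effective_build_depends(field: str, active_profiles: Iterable[str]) -> List[str]:
    """Names of the Build-Depends entries that apply under the given profiles."""
    active = set(active_profiles)
    names = []
    for entry in field.split(","):
        # Drop version constraints first, since "(<< 2)" also contains "<".
        entry = _VERSION_CONSTRAINT.sub("", entry).strip()
        if not entry:
            continue
        if formula_holds(_RESTRICTION_LIST.findall(entry), active):
            names.append(entry.split()[0])
    return names
```

For example, with `field = "debhelper-compat (= 13), python3-pytest <!nocheck>"`, building with the nocheck profile leaves only debhelper-compat, which is exactly the reduced dependency set the package must still build with.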
Rebecca N. Palmer wrote:

I like the idea of varying nocheck, or at least exploring the concept. From personal experience, I think it will actually cause a surprisingly large number of packages to become unreproducible. Many, many packages generate stuff during test runs which then gets installed into a binary package. I can only reliably come across these when the output is non-deterministic, but given how many instances of this there are, I suspect there are a lot more packages that generate *deterministic* stuff.

Just a small thing regarding the two bugs you suggest merging: #786644 is filed against jenkins.debian.org (i.e. the service powering tests.reproducible-builds.org), whilst #1019742 is filed against the reprotest package. It is not actually that obvious, but tests.reproducible-builds.org does not use reprotest to do its variations, so it is "technically" correct that they are different bugs. They could, of course, still be merged, or be used in the "add the option" and "enable by default" schema you suggest. But just to remind anyone following these bugs that reprotest is technically a different Thing from tests.reproducible-builds.org.

Regards,
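A first-pass check for "test output installed into a binary package" is to unpack the binary packages from a normal build and from a nocheck build (for example with `dpkg-deb -x`) into two directories and compare the trees. diffoscope is the proper tool for explaining *how* files differ; this Python sketch, with hypothetical function names, only lists *which* regular files differ and ignores symlinks and file metadata such as permissions.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def tree_digests(root: Path) -> Dict[str, str]:
    """Map each regular file under root (relative POSIX path) to its SHA-256."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(root).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(tree_a: Path, tree_b: Path) -> List[str]:
    """Files present in only one tree, or present in both with different contents."""
    a, b = tree_digests(tree_a), tree_digests(tree_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

A file that exists only in the tree built with tests (say, a shipped test log) shows up here even when its contents are perfectly deterministic, which is the case that is otherwise hard to spot.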
Same here.

I suspect the same.

Indeed, and thanks for pointing this out.

And then there is https://reproduce.debian.net now, which aims to reproduce the Debian binary packages distributed via deb.debian.org and which has become almost trivial to set up, as currently documented in https://reproduce.debian.net/rebuilderd-setup.html

Once it has become *really* trivial to set up, that is, once rebuilderd has been packaged as well and is available in trixie, setting up a rebuilderd instance will be trivial, and configuring it to rebuild Debian with nocheck should then also be trivial. And then there will be hard data showing how many packages cannot be reproduced when rebuilding with nocheck while they can be without nocheck.

The machine specs to rebuild Debian on amd64 in 4-6 weeks should roughly be a dedicated system with a modern CPU, 64 GB of RAM and 500 GB of disk space.

rust-rebuilderd-common, rust-rebuild-ctl and rust-rebuilderd-worker are already packaged and available in trixie. rust-rebuilderd is not there yet, but installing it with git clone and make install is also rather trivial (and documented at the above URL). So basically anyone could set this up *now*. And I'd also be glad to help!

.oO( all I want for xmas is a rebuilderd setup for every Debian arch. ;)
I take those to be a yes to wanting this option to *exist*, though if it already exists in rebuilderd, then it might make more sense to use that. (It would probably also make sense to use rebuilderd instead of reprotest on Salsa to save resources, i.e. requiring 1 extra build instead of 2.)

Helmut Grohne wrote:
> Reproducibility likely degrades in even more cases as the compiler flags may change and they influence the buildid.

Chris Lamb wrote:
> Many many packages generate stuff during test runs which then gets installed into a binary package.

If either of these turns out to be very common, it might make more sense to change the policy from "nocheck must not change anything at all in the binary packages" to "nocheck must not remove any functionality from the binary packages (including testing tools - use noinsttest for that)", but we probably want to gather more data before actually deciding.
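Once a mass run exists, the two numbers the thread asks for (packages that only become unreproducible under nocheck, and the build time nocheck saves) fall out of a simple per-package summary. This is a sketch assuming a hypothetical per-package record, `RunResult`, that such a run might produce; it is not rebuilderd's or reprotest's output format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class RunResult:
    """Hypothetical per-package record from a mass rebuild."""
    package: str
    reproducible_default: bool   # rebuild with tests reproduced the archive build
    reproducible_nocheck: bool   # rebuild with nocheck reproduced it
    seconds_default: float       # wall-clock time of the rebuild with tests
    seconds_nocheck: float       # wall-clock time of the nocheck rebuild


def summarize(results: Iterable[RunResult]) -> Tuple[List[str], float]:
    """Return (packages only unreproducible under nocheck, fraction of time saved)."""
    results = list(results)
    regressions = sorted(
        r.package for r in results
        if r.reproducible_default and not r.reproducible_nocheck
    )
    total = sum(r.seconds_default for r in results)
    saved = sum(r.seconds_default - r.seconds_nocheck for r in results)
    return regressions, (saved / total if total else 0.0)
```

Packages that are unreproducible either way are deliberately not counted as regressions, since nocheck is not what breaks them.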
For the record:
For packages that ship test tools (automated tests, manual tests or
examples to be run at a later time), and want to make them optional,
yes they are supposed to use DEB_BUILD_PROFILES=noinsttest instead of
nocheck. For example, gtk4 and libsdl3 implement this correctly, to the
best of my ability; so do dbus and glib2.0, which have more complicated
interactions with other build profiles like nocheck.
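The split described here (nocheck skips *running* the build-time tests; the noinsttest profile drops the packages that *ship* test tools) can be modelled as two independent switches. This is a simplified Python sketch with hypothetical names; real packages such as gtk4 or glib2.0 express this in debian/rules and with Build-Profiles fields in debian/control, and, as noted above, can have more complicated interactions between profiles.

```python
from typing import Mapping, NamedTuple


class Plan(NamedTuple):
    run_build_time_tests: bool           # controlled by DEB_BUILD_OPTIONS=nocheck
    build_installed_tests_package: bool  # controlled by the noinsttest profile


def plan_build(env: Mapping[str, str]) -> Plan:
    """Decide what the build should do, treating the two switches independently."""
    options = env.get("DEB_BUILD_OPTIONS", "").split()
    profiles = env.get("DEB_BUILD_PROFILES", "").split()
    return Plan(
        run_build_time_tests="nocheck" not in options,
        build_installed_tests_package="noinsttest" not in profiles,
    )
```

Under this model a nocheck build still produces the installed-tests package, so its contents stay the same as in a normal build, which is the property the nocheck profile is meant to guarantee.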
For packages that ship test *results*, the recommendation is "don't"
(because in practice it's very rare for them to be reproducible)
and this is out-of-scope for DEB_BUILD_PROFILES=noinsttest.
<https://wiki.debian.org/BuildArtifacts> and
<https://salsa.debian.org/debian/sbuild/-/merge_requests/14> are a
possible solution to that, which I should pick up and revisit at some
point (it's on my list but my list is very long).
smcv
I missed that question, probably because the answer is "no, they have been filed against different packages and thus usages: #786644 is about tests.reproducible-builds.org/debian while #1019742 is about src:reprotest"