#1011268 release.debian.org: proposes autoremoving every package(?) when nvidia-graphics-drivers-tesla-470 is RC-buggy

#1011268#5
Date:
2022-05-19 10:37:07 UTC
dkms and nvidia-graphics-drivers-tesla-470 currently have RC bug #1010884
triggering autoremovals (it was closed today, so maybe it will cease to be
relevant soon).

I would expect the only packages affected by this to be packages that
specifically depend on dkms or on the NVIDIA proprietary driver.

However, looking at
https://udd.debian.org/cgi-bin/autoremovals.cgi
it looks as though literally every package is up for autoremoval:

Santiago Vila <sanvila@debian.org>
   base-files: buggy deps nvidia-graphics-drivers-tesla-470, flagged for removal in 36.8 days

This seems very wrong. I'm fairly sure base-files doesn't depend on
graphics drivers, and in any case the NVIDIA proprietary driver is in
non-free, so by policy no package in main can possibly depend on it.

It might be better if autoremovals were evaluated something like this:

1. work out what autoremovals would take place if main was the only
   archive area that existed;
2. then, do the same for the union of main, contrib and non-free, but
   ignore attempts to autoremove additional packages from main during this
   phase

That would mean that a RC-buggy main package could trigger autoremovals
from main, contrib and/or non-free, but RC-buggy contrib/non-free packages
could only trigger autoremovals from contrib/non-free.
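
A minimal sketch of the two-phase evaluation proposed above, assuming a
simplified model in which every dependency is a hard one and `rdeps` maps a
package to the packages that depend on it. The function names and the
`hypothetical-contrib-tool` package are made up for illustration; this is not
how the UDD autoremoval script is actually structured.

```python
from collections import deque


def removal_closure(seeds, rdeps, allowed):
    """Return every package in `allowed` that is dragged out by `seeds`.

    `rdeps` maps a package name to the packages that depend on it.
    """
    removed = {pkg for pkg in seeds if pkg in allowed}
    queue = deque(removed)
    while queue:
        pkg = queue.popleft()
        for rdep in rdeps.get(pkg, ()):
            if rdep in allowed and rdep not in removed:
                removed.add(rdep)
                queue.append(rdep)
    return removed


def two_phase_autoremovals(rc_buggy, rdeps, area):
    """Evaluate autoremovals for main first, then for all archive areas."""
    main = {pkg for pkg, a in area.items() if a == "main"}
    non_main = set(area) - main
    # Phase 1: pretend main is the only archive area that exists.
    phase1 = removal_closure(rc_buggy, rdeps, main)
    # Phase 2: all areas, but main may not lose anything beyond phase 1.
    phase2 = removal_closure(set(rc_buggy) | phase1, rdeps, phase1 | non_main)
    return phase1, phase2


# Toy data: the edge from the non-free driver to base-files is the bogus one.
area = {
    "base-files": "main",
    "dkms": "main",
    "nvidia-graphics-drivers-tesla-470": "non-free",
    "hypothetical-contrib-tool": "contrib",
}
rdeps = {
    "nvidia-graphics-drivers-tesla-470": [
        "base-files",
        "hypothetical-contrib-tool",
    ],
}
phase1, phase2 = two_phase_autoremovals(
    {"nvidia-graphics-drivers-tesla-470"}, rdeps, area
)
```

With this toy data, phase 1 removes nothing and phase 2 removes only the
non-free driver and the contrib package, so base-files survives even though a
(wrong) reverse-dependency edge points at it.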

    smcv

#1011268#10
Date:
2022-05-19 11:58:48 UTC
Control: reassign -1 qa.debian.org

It's a great surprise to me that this happened. I *suspect* it's due to
the fact that the bug you mentioned is assigned to both a key
package and a non-key package. As one can imagine, autoremovals are
based on bugs in testing. The bug in testing affects a non-key package,
but I wouldn't be surprised if, once the script takes the bug number in
(for a non-key package), it isn't aware of the BTS feature that a bug can
be assigned to multiple packages and so doesn't handle this case.

With the understanding above, I suspect it's just because of the removal
of dkms, which is part of the key package set for $(reasons).

Autoremoval is supposed to only remove non-key packages and completely
ignore key packages. That's obviously not what happened, so a clear bug.

Technically the autoremoval is calculated by udd, hence reassigning there.

Paul

#1011268#17
Date:
2022-05-25 20:09:56 UTC
Hi Simon,

Turns out that the issue is back due to another RC bug for
nvidia-graphics-drivers-tesla-470, but this time there's no two-package
bug involved.

While I haven't figured out exactly where the bug lies (I mean, which
lines of code), I think it's important to note that
src:nvidia-settings (in main) is building a bin:nvidia-settings in
contrib. This is allowed by policy, but I think this is another case
(apart from the two-package bugs) that the autoremoval script doesn't
cover well. I think we don't want bin:nvidia-settings to be a
key (binary) package, even though we want to allow src:nvidia-settings
to be a key (source) package. This would be a new concept in the key
package definition. As chance has it, some Release Team members have
already been discussing internally whether to view the key package set
differently, because there are more potential boundaries to draw in the
current set. (As an example of what I am thinking of, in most cases I
would happily trade a grave bug against an FTBFS bug due to missing
<nocheck> or <nodoc> dependencies, by removing the package containing
said grave bug from testing.) In the current case, we would have to
allow non-installability for the contrib binary package (which currently
needs RT intervention to enable the removal).

Currently, all dependencies of binaries built by a source key package
are treated as binaries built by key (source) packages. In this case the
source of nvidia-alternative becomes a key (source) package too, that is
nvidia-graphics-drivers, which even lives in non-free.

I also *suspect* that somewhere there's a "Provides" in play, as
otherwise I would have expected nvidia-graphics-drivers-tesla-470 to
already be a key package (and thus not removed), but I can't find
it yet. For a "Provides", the key package calculation just takes the
first provider it finds (if it's not already provided by something in
the set found so far).
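
The closure described in the two paragraphs above could be sketched roughly
as follows. This is an assumption-laden illustration, not the real key
package script: the function name, the data layout, and the names
`virtual-thing`, `provider-a` and `provider-b` are all hypothetical. The only
relation taken from this message is that src:nvidia-settings pulls in the
source of nvidia-alternative.

```python
def key_sources(seeds, binaries_of, source_of, depends, providers):
    """Grow a set of key source packages by following binary dependencies.

    A dependency on a real binary pulls in its source. A dependency on a
    virtual name pulls in the source of the first provider found, unless
    a provider built by an already-key source exists.
    """
    key = set()
    todo = list(seeds)
    while todo:
        src = todo.pop()
        if src in key:
            continue
        key.add(src)
        for binary in binaries_of.get(src, ()):
            for dep in depends.get(binary, ()):
                if dep in source_of:
                    # Real binary package: its source becomes key.
                    todo.append(source_of[dep])
                    continue
                candidates = providers.get(dep, [])
                if any(source_of[c] in key for c in candidates):
                    # Already provided by something in the set so far.
                    continue
                if candidates:
                    # Otherwise just take the first provider.
                    todo.append(source_of[candidates[0]])
    return key


binaries_of = {
    "nvidia-settings": ["nvidia-settings"],
    "nvidia-graphics-drivers": ["nvidia-alternative"],
    "provider-a-src": ["provider-a"],
    "provider-b-src": ["provider-b"],
}
source_of = {
    "nvidia-settings": "nvidia-settings",
    "nvidia-alternative": "nvidia-graphics-drivers",
    "provider-a": "provider-a-src",
    "provider-b": "provider-b-src",
}
depends = {
    "nvidia-settings": ["nvidia-alternative"],
    "nvidia-alternative": ["virtual-thing"],  # hypothetical virtual name
}
providers = {"virtual-thing": ["provider-a", "provider-b"]}

key = key_sources(
    ["nvidia-settings"], binaries_of, source_of, depends, providers
)
```

In this toy run only the first provider's source ends up in the key set; the
second provider stays non-key, which is the kind of asymmetry suspected above.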

So, unless my hunch about the "Provides" is correct and we can find a
better way to determine the Providing key package, to fix this bug I
think that we need to implement fixes in several places:
1) the key package algorithm needs to ignore binary packages built for
contrib by key source packages when iterating.
2) autoremoval needs to be made aware that it shouldn't remove sources
in main if a binary built by it lives in contrib and would otherwise be
3) britney (the migration software) needs to be aware somehow that this
non-installability is OK-ish.

On the other hand, I prefer to consider this problem as part of the
bigger picture that ginggs and I are looking at regarding the current
key package definition and its use.

Paul

#1011268#24
Date:
2022-05-26 21:01:17 UTC
This sounds like a bad idea to me.

Then the package cannot easily be built in the release.
This screws both derivatives that rebuild our distribution
and our security team.

Additionally, <nodoc> permits both different package contents and
a different set of binary packages to be built (empty -doc packages
might not be built).

Except for this bug in the autoremoval calculation, I do not see why it
would be relevant where in main/contrib/non-free the packages are.

A buggy key package in non-free is not really different from a buggy key
package in main, and the latter is the far more common problem.

cu
Adrian

#1011268#29
Date:
2022-06-05 18:25:39 UTC
Hi,

Neither the key package script nor the autoremoval script cares about
components, so I currently don't think the issue lies there.
bin:nvidia-kernel-dkms (non-free, built from src:nvidia-graphics-drivers
in non-free) Depends (on amd64 only) on nvidia-firmware-470.103.01, which
is Provided by bin:nvidia-kernel-support (built from
src:nvidia-graphics-drivers) *and* by bin:nvidia-tesla-470-kernel-dkms
(built from src:nvidia-graphics-drivers-tesla-470).

In _calculate_rdeps there's a check for whether a real package
provides the Provides; if so, it prefers that one over the others. For
purely virtual provides, it takes all of the providers. I believe that's
where it goes wrong: as can be seen above, a key package now becomes a
reverse Depends of a non-key package.
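
A rough sketch of the behaviour described above, using the package names from
this message. The function below is a simplified stand-in written for
illustration; it is not the actual _calculate_rdeps code, and its signature
and data layout are assumptions.

```python
from collections import defaultdict


def calculate_rdeps_sketch(depends, real_packages, providers):
    """Build a reverse-dependency map.

    If a real package carries the depended-on name, only that package
    gets the reverse dependency. For a purely virtual name, every
    provider gets it.
    """
    rdeps = defaultdict(set)
    for pkg, deps in depends.items():
        for dep in deps:
            if dep in real_packages:
                rdeps[dep].add(pkg)
            else:
                for provider in providers.get(dep, ()):
                    rdeps[provider].add(pkg)
    return rdeps


depends = {"nvidia-kernel-dkms": ["nvidia-firmware-470.103.01"]}
providers = {
    "nvidia-firmware-470.103.01": [
        "nvidia-kernel-support",
        "nvidia-tesla-470-kernel-dkms",
    ],
}
real_packages = {
    "nvidia-kernel-dkms",
    "nvidia-kernel-support",
    "nvidia-tesla-470-kernel-dkms",
}

rdeps = calculate_rdeps_sketch(depends, real_packages, providers)
```

Because nvidia-firmware-470.103.01 is purely virtual, nvidia-kernel-dkms is
recorded as a reverse dependency of both providers, including the one built
from src:nvidia-graphics-drivers-tesla-470.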

There are a couple of ideas:
- in the rdeps step, only add non-key packages, as we're not going to
remove key packages (although the Release Team is contemplating doing
that in manually curated cases)
- if the source also builds a binary that's serving the Provides, don't
add it (as there's always a binary that Provides as long as the source
is there, so there's no gain in the check)
- if there's more than one provider, don't add the rdep relation as
there's a reasonable chance that not all of them are going to be removed
at the same time (let's not do this, it sounds fragile).
- during the *use* of the rdeps info, do something smart
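
As an illustration of the second idea in the list above (skip the relation
when the depending package's own source also builds a provider), a hedged
sketch follows. The function name and data layout are assumptions made for
this example and say nothing about how the script was eventually changed.

```python
from collections import defaultdict


def rdeps_skipping_own_providers(depends, source_of, providers):
    """Reverse-dependency map that ignores self-satisfied virtual deps.

    If any provider of a virtual name is built from the same source as
    the depending package, no reverse dependency is recorded at all.
    """
    rdeps = defaultdict(set)
    for pkg, deps in depends.items():
        for dep in deps:
            if dep in source_of:
                # Real package: record the relation as usual.
                rdeps[dep].add(pkg)
                continue
            candidates = providers.get(dep, ())
            if any(source_of[c] == source_of[pkg] for c in candidates):
                # The source satisfies its own Provides; nothing to add.
                continue
            for provider in candidates:
                rdeps[provider].add(pkg)
    return rdeps


source_of = {
    "nvidia-kernel-dkms": "nvidia-graphics-drivers",
    "nvidia-kernel-support": "nvidia-graphics-drivers",
    "nvidia-tesla-470-kernel-dkms": "nvidia-graphics-drivers-tesla-470",
}
depends = {"nvidia-kernel-dkms": ["nvidia-firmware-470.103.01"]}
providers = {
    "nvidia-firmware-470.103.01": [
        "nvidia-kernel-support",
        "nvidia-tesla-470-kernel-dkms",
    ],
}

filtered = rdeps_skipping_own_providers(depends, source_of, providers)
```

With the relations quoted earlier in this message, nvidia-kernel-dkms no
longer shows up as a reverse dependency of the tesla-470 package, since
src:nvidia-graphics-drivers builds its own provider.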

Paul

#1011268#34
Date:
2022-06-05 21:18:02 UTC
dependency (or first alternative dep) on a non-unique virtual package at
all. Apt doesn't like that since it cannot rank the choices. (Even if in
this case there might be only one valid solution.)

I think that virtual package has grown two uses over time:
1.) a Provides/Conflicts relation to prevent co-installation of some
corresponding packages from src:nvidia-graphics-drivers and
src:nvidia-graphics-drivers-tesla-470 if the upstream version matches to
prevent file conflicts.
2.) a Provides/Depends relation to ensure the correct version of the
firmware file is available.

I'll think about that and fix it in src:nvidia*, but that shouldn't stop
you from improving the scripts. ;-)

Andreas

#1011268#39
Date:
2022-06-09 11:05:33 UTC
Hi Simon,
the to-be-removed packages shared a Provides that wasn't provided by any
real package of the same name. I have updated the autoremoval script and
untagged the offending bugs this morning. All seems to be fine now.

Paul

For reference:
https://salsa.debian.org/qa/udd/-/commit/78adf38dd2303688047e135d7e96e180b2163f26