#1011268 release.debian.org: proposes autoremoving every package(?) when nvidia-graphics-drivers-tesla-470 is RC-buggy
- Package: qa.debian.org
- Source: qa.debian.org
- Submitter: Simon McVittie
- Date: 2022-06-09 11:09:04 UTC
- Severity: normal
dkms and nvidia-graphics-drivers-tesla-470 currently have RC bug #1010884 triggering autoremovals (it was closed today, so maybe it will cease to be relevant soon). I would expect the only packages affected by this to be packages that specifically depend on dkms or on the NVIDIA proprietary driver.

However, looking at https://udd.debian.org/cgi-bin/autoremovals.cgi it looks as though literally every package is up for autoremoval:

    Santiago Vila <sanvila@debian.org>
       base-files: buggy deps nvidia-graphics-drivers-tesla-470, flagged for removal in 36.8 days

This seems very wrong. I'm fairly sure base-files doesn't depend on graphics drivers, and in any case the NVIDIA proprietary driver is in non-free, so by policy no package in main can possibly depend on it.

It might be better if autoremovals were evaluated something like this:

1. work out what autoremovals would take place if main was the only archive area that existed;
2. then, do the same for the union of main, contrib and non-free, but ignore attempts to autoremove additional packages from main during this phase

That would mean that a RC-buggy main package could trigger autoremovals from main, contrib and/or non-free, but RC-buggy contrib/non-free packages could only trigger autoremovals from contrib/non-free.

smcv
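The two-phase evaluation proposed in the message above could look roughly like the following. This is only a minimal sketch and not the UDD autoremoval script: the `Package` class, `removal_set` and `two_phase_autoremovals` are hypothetical names, dependencies are plain package names without alternatives or Provides, and key packages are ignored entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    # Hypothetical, heavily simplified package record.
    name: str
    area: str                       # "main", "contrib" or "non-free"
    depends: frozenset = frozenset()
    rc_buggy: bool = False


def removal_set(packages, protected=frozenset()):
    """RC-buggy packages plus everything that transitively depends on them.

    Packages named in `protected` are never added to the result.
    """
    removed = {p.name for p in packages
               if p.rc_buggy and p.name not in protected}
    changed = True
    while changed:                  # iterate until a fixed point is reached
        changed = False
        for p in packages:
            if p.name in removed or p.name in protected:
                continue
            if p.depends & removed:
                removed.add(p.name)
                changed = True
    return removed


def two_phase_autoremovals(packages):
    # Phase 1: pretend main is the only archive area that exists.
    main_only = [p for p in packages if p.area == "main"]
    phase1 = removal_set(main_only)
    # Phase 2: consider all areas, but do not let this phase
    # remove any additional package from main.
    protected = frozenset(p.name for p in main_only) - phase1
    return removal_set(packages, protected=protected)
```

With this shape, an RC-buggy package in non-free can only ever pull in packages from contrib and non-free, even if a spurious dependency edge from a main package to it shows up in the data.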
Control: reassign -1 qa.debian.org

It's a great surprise to me that this happened. I *suspect* it's due to the fact that the bug you mentioned is assigned to both a key package and a non-key package. As one can imagine, autoremovals are based on bugs in testing. The bug in testing affects a non-key package, but I wouldn't be surprised if, once it takes the bug number in (for a non-key package), it isn't aware of the BTS feature that a bug can be assigned to more than one package and doesn't handle this case.

With the understanding above, I suspect it's just because of the removal of dkms, which is part of the key package set for $(reasons). Autoremoval is supposed to only remove non-key packages and completely ignore key packages. That's obviously not what happened, so a clear bug.

Technically the autoremoval is calculated by udd, hence reassigning there.

Paul
Hi Simon,

Turns out that the issue is back due to another RC bug for nvidia-graphics-drivers-tesla-470, but this time there's no two-package bug involved.

While I haven't figured out exactly where the bug lies (I mean, which lines of code), I think it's important to note that src:nvidia-settings (in main) is building a bin:nvidia-settings in contrib. This is allowed by policy, but I think this is another case (apart from the two-package bugs) that the autoremoval script doesn't cover ideally. I think we don't want bin:nvidia-settings to be a key (binary) package, even though we want to allow src:nvidia-settings to be a key (source) package. This would be a new concept in the key package definition.

As chance would have it, some Release Team members have already been discussing internally whether to view the key package set differently, because there are more potential boundaries to draw in the current set. (As an example of what I am thinking of, in most cases I would happily trade a grave bug against an FTBFS bug due to missing <nocheck> or <nodoc> dependencies (by removing the package containing said grave bug from testing).) In the current case, we would have to allow non-installability for the contrib binary package (which currently needs RT intervention to enable the removal).

Currently, all dependencies of binaries built by a source key package are treated as binaries built by key (source) packages. In this case the source of nvidia-alternative becomes a key (source) package too, that is nvidia-graphics-drivers, which even lives in non-free.

I also *suspect* that somewhere there's a "Provides" in play, as otherwise I would have expected nvidia-graphics-drivers-tesla-470 to already be a key package (and thus not removed), but I can't find it yet. "Provides" in the key package calculation is just the first it finds (if it's not already provided by something in the set found so far).
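To illustrate the key package behaviour described above (dependencies of binaries built by key sources pull in further key sources, and a virtual package is satisfied by the first provider found unless something already in the set provides it), here is a minimal sketch. It is not the actual key package script: `key_package_closure` and its dictionary arguments are hypothetical, and versions, architectures and alternatives are left out.

```python
def key_package_closure(seeds, builds, depends, provides):
    """Return the set of key source packages reachable from `seeds`.

    builds:   source name  -> list of binaries it builds
    depends:  binary name  -> list of dependency names
    provides: virtual name -> ordered list of binaries providing it
    """
    source_of = {b: s for s, bins in builds.items() for b in bins}
    key_sources = set()
    todo = list(seeds)
    while todo:
        src = todo.pop()
        if src in key_sources:
            continue
        key_sources.add(src)
        for binary in builds.get(src, []):
            for dep in depends.get(binary, []):
                if dep in source_of:
                    target = dep                    # a real package
                else:
                    candidates = provides.get(dep, [])
                    # Already provided by something in the set found so far?
                    if any(source_of.get(c) in key_sources for c in candidates):
                        continue
                    if not candidates:
                        continue
                    target = candidates[0]          # just the first one found
                target_source = source_of.get(target)
                if target_source is not None:
                    todo.append(target_source)
    return key_sources
```

In this sketch the result depends on traversal order and on the order of the provider list, which is the kind of fragility the "Provides" remark hints at: a source that only provides a virtual package may or may not end up in the key set.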
So, unless my hunch about the "Provides" is correct and we can find a better way to determine the Providing key package, to fix this bug I think that we need to implement fixes in several places:

1) the key package algorithm needs to ignore binary packages built for contrib by key source packages when iterating.
2) autoremoval needs to be made aware that it shouldn't remove sources in main if a binary built by it lives in contrib and would otherwise be
3) britney (the migration software) needs to be aware somehow that this non-installability is OK-ish.

On the other hand, I prefer to consider this problem in the bigger picture of the issues that ginggs and I are having with the current key package definition and its use.

Paul
This sounds like a bad idea to me.

Then the package cannot easily be built in the release. This screws both derivatives that rebuild our distribution and our security team.

Additionally, <nodoc> permits both different package contents and a different set of binary packages to be built (empty -doc packages might not be built).

Except for this bug in the autoremoval calculation, I do not see why it would be relevant where in main/contrib/non-free the packages are. A buggy key package in non-free is not really different from a buggy key package in main, and the latter is the far more common problem.

cu
Adrian
Hi,

Neither the key package script nor the autoremoval script cares about components, so I currently don't think the issue lies there.

bin:nvidia-kernel-dkms (non-free, built from src:nvidia-graphics-drivers in non-free) Depends (on amd64 only) on nvidia-firmware-470.103.01, which is Provided by bin:nvidia-kernel-support (built from src:nvidia-graphics-drivers) *and* by bin:nvidia-tesla-470-kernel-dkms (built from src:nvidia-graphics-drivers-tesla-470).

In _calculate_rdeps there's a check whether there's a real package that provides the Provides, and it prefers that over others; in case of virtual provides it takes all of them. I believe that's where it goes wrong: as can be seen above, a key package now becomes the reverse Depends of a non-key package.

There are a couple of ideas:

- in the rdeps step, only add non-key packages, as we're not going to remove key packages (although the Release Team is contemplating doing that in manually curated cases)
- if the source also builds a binary that's serving the Provides, don't add it (as there's always a binary that Provides as long as the source is there, so there's no gain in the check)
- if there's more than one provider, don't add the rdep relation, as there's a reasonable chance that not all of them are going to be removed at the same time (let's not do this, it sounds fragile).
- during the *use* of the rdeps info, do something smart

Paul
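The failure mode described in the message above, and the first of the listed ideas, can be sketched as follows. This is an illustration only, not the real `_calculate_rdeps` from UDD: `resolve_dependency`, `sketch_rdeps` and their arguments are hypothetical, and the package relations in the usage lines are the ones quoted in the message.

```python
from collections import defaultdict


def resolve_dependency(dep, real_packages, provides):
    """Prefer a real package of that name; otherwise take all providers."""
    if dep in real_packages:
        return {dep}
    return set(provides.get(dep, ()))


def sketch_rdeps(depends, real_packages, provides, key_packages=frozenset()):
    """Map each package to the set of packages that depend on it.

    Packages listed in `key_packages` are never recorded as reverse
    dependencies (the "only add non-key packages" idea).
    """
    rdeps = defaultdict(set)
    for pkg, deps in depends.items():
        if pkg in key_packages:
            continue
        for dep in deps:
            for target in resolve_dependency(dep, real_packages, provides):
                rdeps[target].add(pkg)
    return dict(rdeps)


depends = {"nvidia-kernel-dkms": ["nvidia-firmware-470.103.01"]}
provides = {"nvidia-firmware-470.103.01": ["nvidia-kernel-support",
                                           "nvidia-tesla-470-kernel-dkms"]}
real = {"nvidia-kernel-dkms", "nvidia-kernel-support",
        "nvidia-tesla-470-kernel-dkms"}

# Without filtering, the key package becomes a reverse dependency of the
# non-key tesla-470 binary through the shared virtual package.
unfiltered = sketch_rdeps(depends, real, provides)
# With the first idea applied, no such relation is recorded.
filtered = sketch_rdeps(depends, real, provides,
                        key_packages={"nvidia-kernel-dkms"})
```

Filtering at the rdeps step is the simplest of the listed options to express, but it would also hide key packages from any manually curated removals, which the message notes the Release Team is contemplating.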
dependency (or first alternative dep) on a non-unique virtual package at all. Apt doesn't like that since it cannot rank the choices. (Even if in this case there might be only one valid solution.)

I think that virtual package has grown two uses over time:

1.) a Provides/Conflicts relation to prevent co-installation of some corresponding packages from src:nvidia-graphics-drivers and src:nvidia-graphics-drivers-tesla-470 if the upstream version matches to prevent file conflicts.
2.) a Provides/Depends relation to ensure the correct version of the firmware file is available.

I'll think about that and fix it in src:nvidia*, but that shouldn't stop you from improving the scripts. ;-)

Andreas
Hi Simon,

the to-be-removed packages shared a Provides that wasn't provided by any real package of the same name. I have updated the autoremoval script and untagged the offending bugs this morning. All seems to be fine now.

Paul

For reference:
https://salsa.debian.org/qa/udd/-/commit/78adf38dd2303688047e135d7e96e180b2163f26