- Package: release.debian.org
- Source: release.debian.org
- Submitter: Lucas Nussbaum
- Date: 2026-03-01 21:03:24 UTC
- Severity: normal
Hi,

I'd like to open a discussion about using https://debaudit.debian.net data to gate testing migrations. The service is quite new, so this would likely need a long testing period, but I guess that it's probably better to open the discussion early.

In short, debaudit currently includes two "checkers" (orig-check and git2dsc). orig-check ensures that the orig tarball in Debian matches upstream's. git2dsc ensures that the Vcs-Git repository matches the Debian source package.

I think that it would make sense to block migration when packages regress (similar to what was implemented for reproducible builds, if I remember correctly) -- that would allow packages that fail in testing to still migrate, while still gradually improving the overall status.

The current regressions can be browsed on:
https://debaudit.debian.net/orig-check/regressions/forky
and
https://debaudit.debian.net/git2dsc/regressions/forky

There are currently 14 source packages that would be blocked because of orig-check regressions, and 32 because of git2dsc regressions. I'm not aware of false positives (= AFAIK, regressions are real problems that should be fixed).

Data is refreshed every hour, and there's a JSON export with the relevant data: https://debaudit.debian.net/results.json

Example:

  "aspell-ml": {
    "versions": [
      /* ... details about each known version ... */
    ],
    "migration_status": {
      "orig-check": {
        "can_migrate": false,
        "sid": {
          "diagnostic": "220 - uscan failed -- network error",
          "version": "0.04-1-11",
          "dsc_sha256": "4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8",
          "url": "/orig-check/result/4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8"
        },
        "testing": {
          "diagnostic": "800 - identical after tarball normalization",
          "version": "0.04-1-10",
          "dsc_sha256": "2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e",
          "url": "/orig-check/result/2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e"
        }
      },
      "git2dsc": {
        "can_migrate": false,
        "sid": {
          "diagnostic": "700 - generated dsc differs",
          "version": "0.04-1-11",
          "dsc_sha256": "4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8",
          "url": "/git2dsc/result/4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8"
        },
        "testing": {
          "diagnostic": "910 - git-generated dsc identical to archive dsc after normalization",
          "version": "0.04-1-10",
          "dsc_sha256": "2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e",
          "url": "/git2dsc/result/2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e"
        }
      }
    }
  },

Best,

Lucas
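For readers who want to consume the export, here is a minimal sketch of listing the packages a given checker would block. It assumes every entry in results.json follows the shape of the aspell-ml example in the message above (a top-level object keyed by source package name); the function name `blocked_packages` and the trimmed inline sample are illustrative and not part of debaudit.

```python
import json

# Trimmed copy of the aspell-ml example; only the fields used below are kept.
SAMPLE = json.loads("""
{
  "aspell-ml": {
    "versions": [],
    "migration_status": {
      "orig-check": {
        "can_migrate": false,
        "sid": {"diagnostic": "220 - uscan failed -- network error",
                "version": "0.04-1-11"},
        "testing": {"diagnostic": "800 - identical after tarball normalization",
                    "version": "0.04-1-10"}
      },
      "git2dsc": {
        "can_migrate": false,
        "sid": {"diagnostic": "700 - generated dsc differs",
                "version": "0.04-1-11"},
        "testing": {"diagnostic": "910 - git-generated dsc identical to archive dsc after normalization",
                    "version": "0.04-1-10"}
      }
    }
  }
}
""")


def blocked_packages(results, checker):
    """Map each package blocked by `checker` to (testing diagnostic, sid diagnostic).

    Assumption: a missing checker entry or a missing "can_migrate" key
    means the package is not blocked.
    """
    blocked = {}
    for package, entry in results.items():
        status = entry.get("migration_status", {}).get(checker)
        if status is None or status.get("can_migrate", True):
            continue
        testing = status.get("testing", {}).get("diagnostic")
        sid = status.get("sid", {}).get("diagnostic")
        blocked[package] = (testing, sid)
    return blocked


if __name__ == "__main__":
    for name, (testing, sid) in blocked_packages(SAMPLE, "git2dsc").items():
        print(f"{name}: testing={testing!r} sid={sid!r}")
```

To run it against the live export, the `SAMPLE` object would be replaced by the parsed contents of results.json; the sketch deliberately does no network access.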
- 700 - generated dsc differs
  │ -asn1c deb devel unknown arch=any
  │ - asn1c-doc deb doc unknown arch=all
  │ +asn1c deb devel optional arch=any
  │ + asn1c-doc deb doc optional arch=all

That looks like a false positive?

- 290 - uscan did not produce an orig tarball with matching name

With the same tarball, caused by
  * update watch file to version 5

IMHO it would be bad if people were forced to upload "new" upstream tarballs for the same upstream code when debian/watch changed.

- Sid result: 220 - uscan failed -- network error

Testing migration shouldn't depend on network connectivity to the upstream location.

How will debaudit handle it when an upstream location adds a JavaScript Challenge that is not yet supported by uscan?

cu
Adrian
Hi,

What happens here is:
* asn1c (source) was built on a system with dpkg-dev 1.22.6ubuntu6.5, according to https://buildinfos.debian.net/ftp-master.debian.org/buildinfo/2026/01/30/asn1c_0.9.28+dfsg-6_source.buildinfo
* asn1c dropped its Priority field with that upload
* dpkg-dev issued 'unknown' in that case before dpkg-dev 1.22.12 (uploaded in January 2025)

What should we do in that case? I think that it makes sense to require that source uploads are prepared with a toolchain that generates packages identical to our current toolchain. But there's a normalization step I could use to ignore those differences if needed.

I'm not sure. If there's a change in debian/watch that causes a change in the used upstream tarball, maybe it's worth uploading a fake new upstream version?

Assuming it's a transient problem, the maintainer could request a retry on debaudit. Also, debaudit already has a retry policy, which could be made more aggressive for regressions if that's considered necessary.

debaudit just uses uscan, so it's a wider problem. Note that some upstream locations, like GitHub, already do quite aggressive rate limiting (which is fine for debaudit because it doesn't perform that many requests). Maybe that rate limiting alone will be enough for them to avoid adding JS-based challenges?

Lucas
That wouldn't be an easy requirement. Many maintainers (including myself) are not running unstable on the machine where uploads are made from, and there is a diversity of workflows for creating and signing source packages. Running "dpkg-buildpackage -S -nc" on a stable system to create and sign a source package for unstable would not be an exotic setup. I haven't checked how tag2upload is set up, but this is the kind of production service where it would make sense not to have to worry about toolchain breakages in unstable.

The fundamental issue is that debian/watch is for downloading the next upstream version. A maintainer might update debian/watch today to download xz instead of gz when upstream announces there will only be xz in the future. The same goes for the corresponding upstream signing key: we should not ship and trust the revoked one that created the signature in our archive for the current version a decade ago, but the one that might sign the next release. I won't touch debian/watch when doing an NMU for an RC bug, even when it points to a long-gone upstream location. Your check might make sense on the version where a new upstream tarball is uploaded (which might have been an upload to experimental), but when an upload didn't touch the upstream tarball then it shouldn't be checked.

That's true, but it would move that from a minor annoyance to a blocker for testing migration.

Perhaps for GitHub, but the number of distinct upstream locations in debian/watch is likely 4 digit. salsa.debian.org is an example of an upstream location used in debian/watch of some packages where a JS-based challenge is already in use today; minicom and postgresql-debversion are examples of packages where uscan can no longer download the upstream source of the package in Debian due to that.

cu
Adrian
Do you mean *blocking* migration, or do you mean *delaying* migration?
I'm not a release team member, but I think we should be very careful
with what we completely block from migrating: if an issue blocks
migration, we're effectively saying it's "worse than RC".
I think a key difference between reproducible builds and orig-check is
that reproducible builds usually only regress as a result of changes to
either the package under test or one of its build-dependencies, whereas
downloading software from upstream can easily regress for upstream
reasons that are outside our control.
And, a key difference between reproducible builds and both of these
tools is that reproducible builds are very careful to "pin" the versions
of relevant packages in use (via the extensive work that has been done
on generating and retaining .buildinfo), whereas these two checks seem
to be using the latest dpkg-source and the latest uscan from unstable?
It's no longer visible because it has migrated, but at the time I first
looked at this, glib2.0 was showing up as having a git2dsc regression.
The diff appeared to be that the tarball had been generated with a
different version of dpkg, which produced a different (but equally
valid!) representation of build-profiles: one .dsc had profile:v1 (added
in dpkg 1.23.0) and the other did not.
But I happen to know that for glib2.0/2.86.3-5, what's in git should
correspond exactly to what ended up in the archive, because I uploaded
it using tag2upload, which has that as one of its key design
constraints. So either tag2upload is helping to carry out an extremely
subtle attack (unlikely), or a wrong assumption is being made somewhere.
I assume that the difference is that tag2upload runs in a stable
container (or possibly in a not-quite-bleeding-edge sid container) when
it generates its .dsc, therefore uses an older dpkg-source than the one
that git2dsc uses to attempt to reproduce the package? And I think that
is, and should be, considered to be valid: it should be equally OK for a
maintainer to generate a .dsc on a stable or testing system, as long as
they build and test binaries on an unstable
machine/VM/container/chroot/thing.
I've seen comments elsewhere saying that it should be considered to be a
bug if a source package can't be successfully built by the latest
dpkg-source, and I think that's valid (albeit possibly a bug in dpkg
rather than in the source package, if dpkg regresses); but that isn't
the same as saying that it's a bug if *the specific source package that
was uploaded* wasn't generated by the latest dpkg-source.
A possible way to mitigate that class of false positives would be if
git2dsc did this:
- take the .dsc from the archive, A
- take the .dsc that it produces from git, B
- unpack A
- repack A with the same dpkg-source version that it used for B, to get C
- compare B with C
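
The steps above can be sketched roughly as follows. This is only an illustration under assumptions, not how git2dsc is implemented: the helper names are hypothetical, the only external commands used are `dpkg-source -x` (extract) and `dpkg-source -b` (build), and it assumes the files referenced by A (orig tarballs and so on) sit next to A in `workdir` so that the rebuild can find them.

```python
import hashlib
import subprocess
from pathlib import Path


def repack_commands(archive_dsc, unpack_dir):
    """Return the two dpkg-source invocations that turn A into C."""
    return [
        # Unpack A (the .dsc from the archive) into unpack_dir.
        ["dpkg-source", "-x", str(archive_dsc), str(unpack_dir)],
        # Repack the unpacked tree with the locally installed dpkg-source,
        # i.e. the same version that produced B from git.
        ["dpkg-source", "-b", str(unpack_dir)],
    ]


def repack(archive_dsc, unpack_dir, workdir):
    """Run the repack steps inside workdir (requires dpkg-dev to be installed)."""
    for argv in repack_commands(archive_dsc, unpack_dir):
        subprocess.run(argv, cwd=workdir, check=True)


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_content(path_b, path_c):
    """Compare B (generated from git) with C (A repacked locally)."""
    return sha256_of(path_b) == sha256_of(path_c)
```

Because B and C are produced by the same dpkg-source, differences that come only from the uploader's older toolchain (such as the profile:v1 case described above) should no longer show up in the comparison.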
And, as with other "every package in git" initiatives like dgit and
tag2upload, I think that if we are pushing to get every package in git
with upstream source code included, we will need a mechanism to make
exceptions for very large data packages like 0ad-data, where this
treatment probably doesn't make sense and would cost a disproportionate
amount of processing time.
smcv
Currently, yes, debaudit is using the latest dpkg and uscan from unstable. The "normalization" step tries to suppress differences that were introduced in recent dpkg versions.

That's a good idea indeed. I will do that, it's actually cleaner than the current normalization step.

Lucas
Note that in that case, dpkg-dev/stable would have been fine.

I suppose that things will converge to a state where some URLs will be blocked by JS-based challenges, and some others will not (but remain subject to rate-limiting). I suppose that the URLs typically used by uscan will fall in the second category.

Lucas
This has been implemented, and relevant packages have been reprocessed. https://debaudit.debian.net/git2dsc/regressions/forky is up to date. Lucas
Hi Lucas,

Thank you for providing that service, I think it's a great asset for Debian. However, I've been thinking about this for a while, and I'm not convinced it's right to include this in the migration checks in the near future.

Neither tool really improves what's in the archive. It seems to me that this is better exposed on tracker and similar pages.

I recall Adrian referred to the same, but uscan and d/watch have been made for a different purpose. Using it like this feels like it needs to be accepted by the Debian community first. Incidentally (as an example), one [1] of the very few packages that I maintain regressed very recently because I switched some time ago to a git-based version because upstream appeared to have stopped making releases and suddenly they did. I uploaded as 1.11-1 but expect +git... versions soon again. I didn't care to fix the watch file, also because I feared I would forget again. I don't want to block migrations for cases like this. I also don't like to depend on real external sources (i.e. services not managed by the Debian community).

Also here, I don't want to treat a mismatch, as Simon put it, worse than RC. I *think* what he means is that if an upload to unstable fixes an RC bug, then blocking migration on this would prevent the RC bug from being fixed in testing. I would immediately unblock the migration if I'd noticed. Similarly, we currently treat out-of-sync for more than 30 days as RC; I consider being out of sync much worse than having the repository not up to date (which at this moment feels like nice-to-have at best to me). So while a delay might be acceptable, a block would not.

But, what's more, the bug isn't even in what's uploaded, but elsewhere. It can even be fixed elsewhere (e.g. by pushing the missing pieces). I think influencing migration for that is wrong.

We're close with reproducible builds, but not there yet (it's only for-info). It's been a process of years.

That's a lot. Normally with new policies the numbers should be better than that.

This is my opinion as a Release Team member at this moment. If the rest of the team thinks it's nice to have, I wouldn't veto even if I could.

Paul

[1] https://debaudit.debian.net/orig-check/ type in viking (is there a direct link?)
Lucas> Hi, I'd like to open a discussion about using
Lucas> https://debaudit.debian.net data to gate testing
Lucas> migrations. The service is quite new, so this would likely
Lucas> need a long testing period, but I guess that it's probably
Lucas> better to open the discussion early.
Lucas> In short, debaudit currently includes two "checkers"
Lucas> (orig-check and git2dsc). orig-check ensures that the orig
Lucas> tarball in Debian matches upstream's. git2dsc ensures that
Lucas> the Vcs-Git repository matches the Debian source package.
Lucas> I think that it would make sense to block migration when
Lucas> packages regress (similar to what was implemented for
Lucas> reproducible builds, if I remember correctly) -- that would
Lucas> allow packages that fail in testing to still migrate, while
Lucas> still gradually improving the overall status.
I'm not convinced we have a consensus that using upstream tarballs is
the best practice.
I agree that we have a consensus that it is a best practice, but I think
that for example repacking an upstream git tag is also a best practice
or other workflows that are more git centered.
I don't want to have to get release team permission to get a migration
when I move from one of these best practices to another. I absolutely
support blocking migrations on unintended regressions if the false
positive rate is lower, but I think maintainers should have a way to
indicate a workflow change.
Hi,

Workflows that use git tags and result in tree-same orig tarballs (not bit-identical orig tarballs) are supported by orig-check (but result in "800" diagnostic, not "900" diagnostic), because orig-check tries to "normalize" tarballs before comparing them again if the initial comparison shows that they are not bit-identical.

Relevant examples of packages using Git-centric workflows are
ktls-utils: https://debaudit.debian.net/orig-check/result/4b519e53da98f6b0ba1ad8b82b83579779b66b57c4f630a8ecb26711255e4d3b
and c-evo-dh: https://debaudit.debian.net/orig-check/result/a6fe40a68cbf7715b3c294961fc5537cadf36ebdeea531dd43497c5132951e9d

I don't think it's a problem in practice. Since orig-check can distinguish between the various cases, we could decide which state transitions require release team approval or not. For example, the current implementation allows moving from bit-identical to tree-same. We could also allow moving from "reproducible orig tarball" to "no watch file so no way to check orig tarball".

As usual, the problem is drawing a line between things that are most likely unintended regressions and require manual review/acknowledgement, and things that are most likely intended...

One case that is difficult to deal with is workflow changes (even minor) outside of new upstream versions. For example, among the current orig-check regressions, there's:

- vite: https://orig-check.debian.net/orig-check/result/4506d8769158822444ef829488c2e3107a3844a5d3cad85ef062214d9808ba07
  In 1.4-6, the maintainer updated debian/watch to use the tag-based tarball generated by gitlab, while it previously used the manually-generated "release asset" tarball. Since that was done outside of a new upstream release, the tarball used for 1.4-6 does not match the method described in 1.4-6's debian/watch, but rather the method described in 1.4-1's debian/watch.

- wine: https://orig-check.debian.net/orig-check/result/a2f1e53637b4a36179b6b24a825e7a1d94ebf0c6666fc8c9729d0e060584d8c2
  In 10.0~repack-12, debian/copyright was modified to exclude some files from the repacked orig tarball. But since there wasn't a new upstream version, the orig tarball was not updated.

One could argue that the above cases should be fixed by uploading a fake new upstream version (e.g. with a +ds suffix). Others could argue that we don't care and it will be fixed anyway when there is a new upstream version.

Lucas
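
The idea of deciding which state transitions need manual acknowledgement could be expressed as a small policy table. The sketch below is purely illustrative: the state names, the `ALLOWED_TRANSITIONS` set and the `needs_review` function are assumptions made for this example and do not come from orig-check. Only the bit-identical to tree-same transition is described in this message as currently allowed; the "no-watch-file" entries encode the possible extension mentioned in the message.

```python
# States a package can be in for orig-check (hypothetical names).
PASSING = {"bit-identical", "tree-same"}

# Transitions from a passing state to another state that do not need review.
ALLOWED_TRANSITIONS = {
    ("bit-identical", "tree-same"),      # allowed by the current implementation
    ("bit-identical", "no-watch-file"),  # assumption: possible future allowance
    ("tree-same", "no-watch-file"),      # assumption: possible future allowance
}


def needs_review(testing_state, sid_state):
    """Return True when the testing -> sid transition looks like a regression."""
    if testing_state not in PASSING:
        # Already failing in testing: not a regression, migration proceeds.
        return False
    if sid_state in PASSING:
        # Still passing, or explicitly tolerated below.
        return (testing_state, sid_state) not in ALLOWED_TRANSITIONS and (
            testing_state == "tree-same" and sid_state == "bit-identical"
        ) and False
    return (testing_state, sid_state) not in ALLOWED_TRANSITIONS
```

With such a table, where to draw the line becomes a data question: adding or removing a pair changes the policy without touching the comparison logic.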
Such "regressions" are not rare, given the number of packages in unstable. Paul gave an example of an upstream where no releases were expected with debian/watch following git commits - but then there was a one-off tarball release that is now in unstable. It is also very common that debian/watch follows only stable releases, and it wouldn't be uncommon if a maintainer anyhow decides to upload an RC version to unstable. A maintainer could also use debian/watch on tarballs only for being notified about new releases, with the sources in Debian being created in a different way like from a git tree with submodules in a workflow without tarballs. debian/watch is simply meant for a different purpose. Problems around exclusions could be avoided by only checking whether every file in the Debian sources is also present and identical in the upstream sources. cu Adrian
I don't think that's true. Of course the historical/original purpose of debian/watch is the detection of new upstream releases, but it has also been used to generate orig tarballs for a long time. This was first used only to download new upstream versions, but even its use to download a specific version is quite common if you look at https://codesearch.debian.net/search?q=uscan.*--download-%28current-%29%3Fversion&literal=0

Maybe we need an opt-out mechanism for packages to signal that they only use debian/watch to detect new upstream versions, and that the generation of orig tarballs is done using a different mechanism. But in general, I think that it's desirable for Debian to reinforce the link between upstream-provided sources and the orig tarball, and that debian/watch is a suitable interface to do that.

Good idea, thanks. I improved orig-check to check for inclusion of tarballs into each other (as in: the tarball generated by uscan is fully included in the tarball in the archive), resulting in splitting the results into these diagnostics:

700 - tarballs not identical (934 packages in sid)
710 - generated tarball fully included in archive tarball (227)
720 - archive tarball fully included in generated tarball (534)
730 - mixed subset relationships between tarballs (4)

(details on https://orig-check.debian.net/orig-check/statistics )

Lucas
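
As an illustration of the inclusion check behind the 700/710/720/730 diagnostics, here is a minimal sketch. It is not orig-check's code. It assumes each tarball has already been reduced to a mapping from member path to content digest, and it guesses that "mixed subset relationships" (730) refers to source packages with several orig tarballs whose inclusion goes in different directions; both the representation and that interpretation are assumptions.

```python
def relation(generated, archive):
    """Relation between two tarballs given as {member path: content digest}."""
    gen_items = set(generated.items())
    arch_items = set(archive.items())
    if gen_items == arch_items:
        return "identical"
    if gen_items < arch_items:
        return "generated-in-archive"
    if arch_items < gen_items:
        return "archive-in-generated"
    return "different"


def diagnostic(pairs):
    """Aggregate per-tarball relations into one code (None means all identical).

    `pairs` is a list of (generated, archive) mappings, one per orig tarball.
    """
    relations = {relation(gen, arch) for gen, arch in pairs} - {"identical"}
    if not relations:
        return None
    if "different" in relations:
        return 700  # tarballs not identical, no inclusion either way
    if relations == {"generated-in-archive"}:
        return 710
    if relations == {"archive-in-generated"}:
        return 720
    return 730      # assumption: subsets in both directions across tarballs
```

A file counts as included only when both its path and its digest match, so a file that exists on both sides with different contents yields 700 rather than one of the subset codes.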
"downloading the next version" and "how the current version was downloaded" are identical most of the time, but not always. When you start with wanting to block testing migration, then every obscure corner case suddenly becomes relevant. What exactly are the use case(s) you have in mind? Some upstreams provide only a tarball for the latest version, so everything you are doing might work fine for testing migration but it would never be possible to verify it later for a version of that package in stable. Upstream locations disappearing is not rare. Upstream might later provide a tarball with the same name but different contents - every obscure or stupid issue imaginable has already happened to some package at some point in time. AFAIR there was some talk from reproducible folks about comparing whether different distributions have the same sources, I could see possible use cases in that area. cu Adrian
Hi Adrian, Note that some of the cases you mention would not be affected by blocking migrations on *regressions*, since they would not be regressions compared to the current state in testing. (if it's already failing for the version in testing, for example because the upstream only provides a tarball for the latest version) And some other cases which are real regressions are probably worth being checked manually, such as if the upstream updated the tarball with a different content. When I made the suggestion of blocking migration on regressions, it was obvious to me that this would come with a way to ignore some corner cases manually (similarly to how it's done for other tests used for gating migrations). When I look at the current regressions compared to forky, I see lots of minor maintainer mistakes such as forgetting to push git tags: https://debaudit.debian.net/upstream2orig/regressions/forky https://debaudit.debian.net/git2dsc/regressions/forky https://debaudit.debian.net/git2orig/regressions/forky Those are probably worth detecting systematically and fixing. Maybe I should try to report severity:minor bugs for those problems for some time (after analyzing each of them), to see what kind of feedback I get. Lucas