#1126685 release.debian.org: Using debaudit.debian.net data to gate testing migrations

#1126685#5
Date:
2026-01-30 16:27:25 UTC
From:
To:
Hi,

I'd like to open a discussion about using https://debaudit.debian.net
data to gate testing migrations. The service is quite new, so this would
likely need a long testing period, but I guess that it's probably better
to open the discussion early.

In short, debaudit currently includes two "checkers" (orig-check and
git2dsc).  orig-check ensures that the orig tarball in Debian matches
upstream's.  git2dsc ensures that the Vcs-Git repository matches the
Debian source package.

I think that it would make sense to block migration when packages
regress (similar to what was implemented for reproducible builds, if I
remember correctly) -- that would allow packages that fail in testing to
still migrate, while still gradually improving the overall status.

The current regressions can be browsed on:
https://debaudit.debian.net/orig-check/regressions/forky
and
https://debaudit.debian.net/git2dsc/regressions/forky

There are currently 14 source packages that would be blocked because of
orig-check regressions, and 32 because of git2dsc regressions.

I'm not aware of false positives (= AFAIK, regressions are real problems
that should be fixed).

Data is refreshed every hour, and there's a JSON export with the
relevant data: https://debaudit.debian.net/results.json

Example:
  "aspell-ml": {
      "versions": [
        /* ... details about each known version ... */
      ],
      "migration_status": {
          "orig-check": {
              "can_migrate": false,
              "sid": {
                  "diagnostic": "220 - uscan failed -- network error",
                  "version": "0.04-1-11",
                  "dsc_sha256": "4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8",
                  "url": "/orig-check/result/4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8"
              },
              "testing": {
                  "diagnostic": "800 - identical after tarball normalization",
                  "version": "0.04-1-10",
                  "dsc_sha256": "2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e",
                  "url": "/orig-check/result/2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e"
              }
          },
          "git2dsc": {
              "can_migrate": false,
              "sid": {
                  "diagnostic": "700 - generated dsc differs",
                  "version": "0.04-1-11",
                  "dsc_sha256": "4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8",
                  "url": "/git2dsc/result/4d1a4a2890eeb834dcdae96e5b80d8c3059cef021db5eb03d894bee8bed8e0e8"
              },
              "testing": {
                  "diagnostic": "910 - git-generated dsc identical to archive dsc after normalization",
                  "version": "0.04-1-10",
                  "dsc_sha256": "2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e",
                  "url": "/git2dsc/result/2662500adb5b4e355e545b42e271dec5ae7b18d28db99b2df8883a15ea33232e"
              }
          }
      }
  },
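
For illustration only, here is a minimal sketch of how a consumer of this
export could list the packages each checker would block. It assumes the top
level of results.json is an object keyed by source package name, with the
"migration_status" layout shown in the example above. The function name
blocked_packages and the "hypothetical-ok" entry are made up for the sketch,
and the data is passed in already parsed rather than fetched.

```python
import json


def blocked_packages(results):
    """Map each checker name to the sorted list of packages it would block.

    Assumes `results` is the parsed export: a dict keyed by source package,
    each entry holding a "migration_status" dict keyed by checker name.
    """
    blocked = {}
    for package, data in results.items():
        for checker, status in data.get("migration_status", {}).items():
            # A missing "can_migrate" is treated as "not blocked" (assumption).
            if status.get("can_migrate", True) is False:
                blocked.setdefault(checker, []).append(package)
    return {checker: sorted(pkgs) for checker, pkgs in blocked.items()}


# Trimmed-down sample in the shape of the example above.
sample = json.loads("""
{
  "aspell-ml": {
    "versions": [],
    "migration_status": {
      "orig-check": {"can_migrate": false},
      "git2dsc": {"can_migrate": false}
    }
  },
  "hypothetical-ok": {
    "versions": [],
    "migration_status": {
      "orig-check": {"can_migrate": true}
    }
  }
}
""")
print(blocked_packages(sample))
```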

Best,

Lucas

#1126685#10
Date:
2026-01-31 11:39:17 UTC
From:
To:
- 700 - generated dsc differs
│ -asn1c deb devel unknown arch=any
│ - asn1c-doc deb doc unknown arch=all
│ +asn1c deb devel optional arch=any
│ + asn1c-doc deb doc optional arch=all
That looks like a false positive?

- 290 - uscan did not produce an orig tarball with matching name
With the same tarball, caused by
  * update watch file to version 5
IMHO it would be bad if people were forced to upload "new" upstream
tarballs for the same upstream code when debian/watch changed.

- Sid result: 220 - uscan failed -- network error
Testing migration shouldn't depend on network connectivity to the
upstream location.

How will debaudit handle it when an upstream location adds a JavaScript
Challenge that is not yet supported by uscan?

cu
Adrian

#1126685#15
Date:
2026-01-31 17:31:05 UTC
From:
To:
Hi,

What happens here is:

* asn1c (source) was built on a system with dpkg-dev 1.22.6ubuntu6.5,
  according to
https://buildinfos.debian.net/ftp-master.debian.org/buildinfo/2026/01/30/asn1c_0.9.28+dfsg-6_source.buildinfo
* asn1c dropped its Priority field with that upload
* dpkg-dev issued 'unknown' in that case before dpkg-dev 1.22.12
  (uploaded in January 2025)

What should we do in that case? I think that it makes sense to require
that source uploads are prepared with a toolchain that generates
packages identical to our current toolchain.

But there's a normalization step I could use to ignore those
differences if needed.

I'm not sure. If there's a change in debian/watch that causes a change
in the used upstream tarball, maybe it's worth uploading a fake new
upstream version?

Assuming it's a transient problem, the maintainer could request a retry
on debaudit. Also debaudit already has a retry policy, that could be
made more aggressive for regressions if that's considered necessary.

debaudit just uses uscan, so it's a wider problem.

Note that some upstream locations, like GitHub, already do quite
aggressive rate limiting (which is fine for debaudit because it doesn't
perform that many requests). Maybe that rate limiting alone will be enough
to avoid adding JS-based challenges?

Lucas

#1126685#20
Date:
2026-02-01 11:28:28 UTC
From:
To:
That wouldn't be an easy requirement.

Many maintainers (including myself) are not running unstable on the
machine where uploads are made from, and there is a diversity of
workflows for creating and signing source packages.

Running "dpkg-buildpackage -S -nc" on a stable system to create and sign
a source package for unstable would not be an exotic setup.

I haven't checked how tag2upload is set up, but this is the kind of
production service where it would make sense not having to worry
about toolchain breakages in unstable.

The fundamental issue is that debian/watch is for downloading the next
upstream version.

A maintainer might update debian/watch today for downloading xz instead
of gz when upstream announces there will only be xz in the future.

Same for the corresponding upstream signing key, we should not ship and
trust the revoked one that created the signature in our archive for the
current version a decade ago but the one that might sign the next release.

I won't touch debian/watch when doing an NMU for an RC bug, even when it
points to a long gone upstream location.

Your check might make sense on the version where a new upstream tarball
is uploaded (which might have been an upload to experimental), but when
an upload didn't touch the upstream tarball then it shouldn't be checked.

That's true, but it would move that from a minor annoyance to a blocker
for testing migration.

Perhaps for GitHub, but the number of distinct upstream locations in
debian/watch is likely 4 digit.

salsa.debian.org is an example of an upstream location used in debian/watch
of some packages where a JS-based challenge is already in use today,
minicom and postgresql-debversion are examples of packages where uscan
can no longer download the upstream source of the package in Debian
due to that.

cu
Adrian

#1126685#25
Date:
2026-02-01 12:08:54 UTC
From:
To:
Do you mean *blocking* migration, or do you mean *delaying* migration?
I'm not a release team member, but I think we should be very careful
with what we completely block from migrating: if an issue blocks
migration, we're effectively saying it's "worse than RC".

I think a key difference between reproducible builds and orig-check is
that reproducible builds usually only regress as a result of changes to
either the package under test or one of its build-dependencies, whereas
downloading software from upstream can easily regress for upstream
reasons that are outside our control.

And, a key difference between reproducible builds and both of these
tools is that reproducible builds are very careful to "pin" the versions
of relevant packages in use (via the extensive work that has been done
on generating and retaining .buildinfo), whereas these two checks seem
to be using the latest dpkg-source and the latest uscan from unstable?

It's no longer visible because it has migrated, but at the time I first
looked at this, glib2.0 was showing up as having a git2dsc regression.
The diff appeared to be that the tarball had been generated with a
different version of dpkg, which produced a different (but equally
valid!) representation of build-profiles: one .dsc had profile:v1 (added
in dpkg 1.23.0) and the other did not.

But I happen to know that for glib2.0/2.86.3-5, what's in git should
correspond exactly to what ended up in the archive, because I uploaded
it using tag2upload, which has that as one of its key design
constraints. So either tag2upload is helping to carry out an extremely
subtle attack (unlikely), or a wrong assumption is being made somewhere.

I assume that the difference is that tag2upload runs in a stable
container (or possibly in a not-quite-bleeding-edge sid container) when
it generates its .dsc, therefore uses an older dpkg-source than the one
that git2dsc uses to attempt to reproduce the package? And I think that
is, and should be, considered to be valid: it should be equally OK for a
maintainer to generate a .dsc on a stable or testing system, as long as
they build and test binaries on an unstable
machine/VM/container/chroot/thing.

I've seen comments elsewhere saying that it should be considered to be a
bug if a source package can't be successfully built by the latest
dpkg-source, and I think that's valid (albeit possibly a bug in dpkg
rather than in the source package, if dpkg regresses); but that isn't
the same as saying that it's a bug if *the specific source package that
was uploaded* wasn't generated by the latest dpkg-source.

A possible way to mitigate that class of false positives would be if
git2dsc did this:

- take the .dsc from the archive, A
- take the .dsc that it produces from git, B
- unpack A
- repack A with the same dpkg-source version that it used for B, to get C
- compare B with C
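
The steps above could be sketched roughly as follows. This is not git2dsc's
actual implementation: the helper names (dsc_checksums, repack, same_source)
are hypothetical, the files listed in A are assumed to sit next to the .dsc,
and treating "identical" as "same Checksums-Sha256 entries" is an assumption.
Only the plain `dpkg-source -x` and `dpkg-source -b` invocations are used.

```python
import pathlib
import shutil
import subprocess
import tempfile


def dsc_checksums(dsc_text):
    """Return {filename: (sha256, size)} from a .dsc's Checksums-Sha256 field."""
    entries, in_field = {}, False
    for line in dsc_text.splitlines():
        if line.startswith("Checksums-Sha256:"):
            in_field = True
        elif in_field and line.startswith(" "):
            digest, size, name = line.split()
            entries[name] = (digest, int(size))
        elif in_field:
            break  # the next field starts here
    return entries


def repack(dsc_path):
    """Unpack the archive .dsc (A) and rebuild it locally; return C's text."""
    dsc_path = pathlib.Path(dsc_path)
    with tempfile.TemporaryDirectory() as scratch:
        # Copy A and the files it lists into a scratch directory.
        shutil.copy(dsc_path, scratch)
        for name in dsc_checksums(dsc_path.read_text(encoding="utf-8")):
            shutil.copy(dsc_path.parent / name, scratch)
        subprocess.run(["dpkg-source", "-x", dsc_path.name, "unpacked"],
                       cwd=scratch, check=True)
        # Rebuilding is expected to rewrite the copied .dsc using the local
        # dpkg-source, i.e. the same version that produced B.
        subprocess.run(["dpkg-source", "-b", "unpacked"],
                       cwd=scratch, check=True)
        rebuilt = pathlib.Path(scratch) / dsc_path.name
        return rebuilt.read_text(encoding="utf-8")


def same_source(b_text, c_text):
    """Compare B (built from git) with C (A repacked) by file checksums."""
    return dsc_checksums(b_text) == dsc_checksums(c_text)
```

Usage would be along the lines of same_source(b_text, repack(path_to_A)).
Comparing only checksums leaves out the other .dsc fields, so a real
implementation would probably want to compare those as well.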

And, as with other "every package in git" initiatives like dgit and
tag2upload, I think that if we are pushing to get every package in git
with upstream source code included, we will need a mechanism to make
exceptions for very large data packages like 0ad-data, where this
treatment probably doesn't make sense and would cost a disproportionate
amount of processing time.

     smcv

#1126685#30
Date:
2026-02-01 16:19:59 UTC
From:
To:
Currently, yes, debaudit is using the latest dpkg and uscan from
unstable.  The "normalization" step tries to suppress differences that were
introduced in recent dpkg versions.

That's a good idea indeed. I will do that, it's actually cleaner than
the current normalization step.

Lucas

#1126685#35
Date:
2026-02-01 13:05:59 UTC
From:
To:
Note that in that case, dpkg-dev/stable would have been fine.

I suppose that things will converge to a state where some URLs will be
blocked by JS-based challenges, and some others will not (but remain
subject to rate-limiting). I suppose that the URLs typically used by
uscan will fall in the second category.

Lucas

#1126685#40
Date:
2026-02-02 06:39:36 UTC
From:
To:
This has been implemented, and relevant packages have been reprocessed.

https://debaudit.debian.net/git2dsc/regressions/forky is up to date.

Lucas

#1126685#45
Date:
2026-02-03 21:08:06 UTC
From:
To:
Hi Lucas,


Thank you for providing that service, I think it's a great asset for
Debian. However, I've been thinking about this for a while, and I'm not
convinced it's right to include this in the migration checks in the near
future. Both tools don't really improve what's in the archive. It seems
to me that this is better exposed on tracker and similar pages.


I recall Adrian referred to the same, but uscan and d/watch have been
made for a different purpose. Using it like this feels like it needs to
be accepted by the Debian community first. Incidentally (as an example),
one [1] of the very few packages that I maintain regressed very recently
because I switched some time ago to a git based version because upstream
appeared to have stopped making releases and suddenly they did. I
uploaded as 1.11-1 but expect +git... versions soon again. I didn't care
to fix the watch file, also because I feared I would forget again. I
don't want to block migrations for cases like this.

I also don't like to depend on real external sources (i.e. services not
managed by the Debian community).


Also here, I don't want to treat a mismatch, as Simon put it, worse than
RC. I *think* what he means is that if an upload to unstable fixes an RC
bug, then blocking migration on this would prevent the RC bug from being
fixed in testing. I would immediately unblock the migration if I
noticed. Similarly, we currently treat out-of-sync for more than 30 days
as RC; I consider being out of sync much worse than having the repository
not be up to date (which at this moment feels like nice-to-have at
best to me). So while a delay might be acceptable, a block would not.
But, what's more, the bug isn't even in what's uploaded, but elsewhere.
It can even be fixed elsewhere (e.g. by pushing the missing pieces). I
think influencing migration for that is wrong.


We're close with reproducible builds, but not there yet (it's only
for-info). It's been a process of years.


That's a lot. Normally with new policies the numbers should be better
than that.

This is my opinion as a Release Team member at this moment. If the rest
of the team thinks it's nice to have, I wouldn't veto even if I could.

Paul

[1] https://debaudit.debian.net/orig-check/ type in viking (is there a
direct link?)

#1126685#50
Date:
2026-02-06 00:44:09 UTC
From:
To:
    Lucas> Hi, I'd like to open a discussion about using
    Lucas> https://debaudit.debian.net data to gate testing
    Lucas> migrations. The service is quite new, so this would likely
    Lucas> need a long testing period, but I guess that it's probably
    Lucas> better to open the discussion early.

    Lucas> In short, debaudit currently includes two "checkers"
    Lucas> (orig-check and git2dsc).  orig-check ensures that the orig
    Lucas> tarball in Debian matches upstream's.  git2dsc ensures that
    Lucas> the Vcs-Git repository matches the Debian source package.

    Lucas> I think that it would make sense to block migration when
    Lucas> packages regress (similar to what was implemented for
    Lucas> reproducible builds, if I remember correctly) -- that would
    Lucas> allow packages that fail in testing to still migrate, while
    Lucas> still gradually improving the overall status.

I'm not convinced we have a consensus that using upstream tarballs is
the best practice.
I agree that we have a consensus that it is a best practice, but I think
that for example repacking an upstream git tag is also a best practice
or other workflows that are more git centered.

I don't want to have to get release team permission to get a migration
when I move from one of these best practices to another. I absolutely
support blocking migrations on unintended regressions if the false
positive rate is lower, but I think maintainers should have a way to
indicate a workflow change.

#1126685#60
Date:
2026-02-06 08:40:51 UTC
From:
To:
Hi,
Workflows that use git tags and result in tree-same orig tarballs (not
bit-identical orig tarballs) are supported by orig-check (but result in
"800" diagnostic, not "900" diagnostic), because orig-check tries to
"normalize" tarballs before comparing them again if the initial
comparison shows that they are not bit-identical.
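
As a rough illustration of the distinction between bit-identical and
tree-same, the sketch below compares two tarballs first byte for byte and
then by file tree. The exact normalization orig-check applies is not
described here, so the rules used (strip the top-level directory, ignore
timestamps, ownership and compression, look at regular files only) are
assumptions, and the names tree_of and compare are hypothetical.

```python
import hashlib
import io
import pathlib
import tarfile


def tree_of(tar_bytes):
    """Map each regular file's path to the SHA-256 of its content.

    The top-level directory is stripped, and metadata such as mtime or
    ownership is ignored. Directories and symlinks are skipped for brevity.
    """
    tree = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # "pkg-1.0/src/a.c" -> "src/a.c"; a top-level file keeps its name.
            parts = pathlib.PurePosixPath(member.name).parts
            path = "/".join(parts[1:] or parts)
            content = tar.extractfile(member).read()
            tree[path] = hashlib.sha256(content).hexdigest()
    return tree


def compare(a_bytes, b_bytes):
    """Return "bit-identical", "tree-same" or "different"."""
    if a_bytes == b_bytes:
        return "bit-identical"
    if tree_of(a_bytes) == tree_of(b_bytes):
        return "tree-same"
    return "different"
```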

Relevant examples of packages using Git-centric workflows are ktls-utils:
https://debaudit.debian.net/orig-check/result/4b519e53da98f6b0ba1ad8b82b83579779b66b57c4f630a8ecb26711255e4d3b
and c-evo-dh:
https://debaudit.debian.net/orig-check/result/a6fe40a68cbf7715b3c294961fc5537cadf36ebdeea531dd43497c5132951e9d

I don't think it's a problem in practice. Since orig-check can distinguish
between the various cases, we could decide which state transitions
require release team approval or not. For example, the current
implementation allows moving from bit-identical to tree-same.
We could also allow moving from "reproducible orig tarball" to "no watch
file so no way to check orig tarball".
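
To make the idea of state transitions concrete, here is a minimal sketch of
such a policy. It is not the current implementation: the set of codes that
count as verified (800 and 900) and the `allowed` parameter for extra
accepted transitions are assumptions made for illustration, and the names
code_of and needs_review are hypothetical.

```python
def code_of(diagnostic):
    """Extract the numeric code from a diagnostic such as "800 - identical ..."."""
    return int(diagnostic.split(" - ", 1)[0])


# Assumption for illustration: these orig-check codes count as "verified"
# (900 for bit-identical, 800 for identical after normalization).
VERIFIED = frozenset({800, 900})


def needs_review(testing_diag, sid_diag, allowed=frozenset()):
    """Return True if the testing -> sid transition looks like a regression.

    `allowed` holds extra (testing_code, sid_code) pairs that are accepted
    without manual review.
    """
    before, after = code_of(testing_diag), code_of(sid_diag)
    if before not in VERIFIED:
        return False  # already failing in testing, so not a regression
    if after in VERIFIED:
        return False  # e.g. 900 -> 800, bit-identical to tree-same
    return (before, after) not in allowed
```

For example, needs_review("800 - identical after tarball normalization",
"220 - uscan failed -- network error") returns True, while the same call
with allowed={(800, 220)} returns False.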

As usual, the problem is drawing a line between things that are most
likely unintended regressions and require manual review/acknowledgement,
and things that are most likely intended...

One case that is difficult to deal with is workflow changes (even minor)
outside of new upstream versions.  For example, among the current
orig-check regressions, there's:
- vite: https://orig-check.debian.net/orig-check/result/4506d8769158822444ef829488c2e3107a3844a5d3cad85ef062214d9808ba07
  In 1.4-6, the maintainer updated debian/watch to use the tag-based tarball
  generated by gitlab, while it previously used the manually-generated
  "release asset" tarball. Since that was done outside of a new upstream
  release, the tarball used for 1.4-6 does not match the method
  described in 1.4-6's debian/watch, but rather the method described in
  1.4-1's debian/watch.
- wine: https://orig-check.debian.net/orig-check/result/a2f1e53637b4a36179b6b24a825e7a1d94ebf0c6666fc8c9729d0e060584d8c2
  In 10.0~repack-12, debian/copyright was modified to exclude some files
  from the repacked orig tarball. But since there wasn't a new upstream
  version, the orig tarball was not updated.
One could argue that the above cases should be fixed by uploading a fake
new upstream version (e.g. with a +ds suffix). Others could argue that
we don't care and it will be fixed anyway when there is a new
upstream version.

Lucas

#1126685#65
Date:
2026-02-06 11:32:16 UTC
From:
To:
Such "regressions" are not rare, given the number of packages in unstable.

Paul gave an example of an upstream where no releases were expected with
debian/watch following git commits - but then there was a one-off tarball
release that is now in unstable.

It is also very common that debian/watch follows only stable releases,
and it wouldn't be uncommon if a maintainer anyhow decides to upload
an RC version to unstable.

A maintainer could also use debian/watch on tarballs only for being
notified about new releases, with the sources in Debian being created
in a different way like from a git tree with submodules in a workflow
without tarballs.

debian/watch is simply meant for a different purpose.

Problems around exclusions could be avoided by only checking whether
every file in the Debian sources is also present and identical in the
upstream sources.

cu
Adrian

#1126685#70
Date:
2026-02-11 12:26:48 UTC
From:
To:
I don't think that's true. Of course the historical/original purpose of
debian/watch is the detection of new upstream releases, but it has also
been used to generate orig tarballs for a long time. This was first used
only to download new upstream versions, but even its use to download a
specific version is quite common if you look at
https://codesearch.debian.net/search?q=uscan.*--download-%28current-%29%3Fversion&literal=0

Maybe we need an opt-out mechanism for packages to signal that they only
use debian/watch to detect new upstream versions, and that the
generation of orig tarballs is done using a different mechanism.

But in general, I think that it's desirable for Debian to reinforce the
link between upstream-provided sources and the orig tarball, and that
debian/watch is a suitable interface to do that.

Good idea, thanks. I improved orig-check to check for inclusion of
tarballs into each other (as in: the tarball generated by uscan is fully
included in the tarball in the archive), resulting in splitting the
results into those diagnostics:
700 - tarballs not identical (934 packages in sid)
710 - generated tarball fully included in archive tarball (227)
720 - archive tarball fully included in generated tarball (534)
730 - mixed subset relationships between tarballs (4)

(details on https://orig-check.debian.net/orig-check/statistics )
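
The split between these diagnostics can be sketched as set inclusion over
(path, digest) pairs. This is only an illustration under assumptions: each
tarball is represented as a mapping from member path to content digest (how
that mapping is built is left out), the function name classify is
hypothetical, and 730 is not modelled because its exact definition is not
spelled out here.

```python
def classify(generated, archive):
    """Classify two {path: digest} mappings.

    `generated` describes the tarball produced by uscan, `archive` the orig
    tarball in the archive. A file counts as included only if both its path
    and its digest match.
    """
    gen, arc = set(generated.items()), set(archive.items())
    if gen == arc:
        return "identical"
    if gen < arc:
        return "710 - generated tarball fully included in archive tarball"
    if arc < gen:
        return "720 - archive tarball fully included in generated tarball"
    return "700 - tarballs not identical"
```

For instance, an archive tarball from which some files were excluded during
repacking, with everything else unchanged, would land in 720.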

Lucas

#1126685#75
Date:
2026-02-27 16:03:11 UTC
From:
To:
"downloading the next version" and "how the current version was
downloaded" are identical most of the time, but not always.

When you start with wanting to block testing migration, then every
obscure corner case suddenly becomes relevant.

What exactly are the use case(s) you have in mind?

Some upstreams provide only a tarball for the latest version, so
everything you are doing might work fine for testing migration but
it would never be possible to verify it later for a version of that
package in stable.

Upstream locations disappearing is not rare.

Upstream might later provide a tarball with the same name but different
contents - every obscure or stupid issue imaginable has already happened
to some package at some point in time.

AFAIR there was some talk from reproducible folks about comparing
whether different distributions have the same sources, I could see
possible use cases in that area.

cu
Adrian

#1126685#80
Date:
2026-03-01 19:37:14 UTC
From:
To:
Hi Adrian,

Note that some of the cases you mention would not be affected by
blocking migrations on *regressions*, since they would not be
regressions compared to the current state in testing. (if it's already
failing for the version in testing, for example because the upstream
only provides a tarball for the latest version)

And some other cases which are real regressions are probably worth being
checked manually, such as if the upstream updated the tarball with a
different content.

When I made the suggestion of blocking migration on regressions, it was
obvious to me that this would come with a way to ignore some corner
cases manually (similarly to how it's done for other tests used for
gating migrations).

When I look at the current regressions compared to forky, I see lots of
minor maintainer mistakes such as forgetting to push git tags:
https://debaudit.debian.net/upstream2orig/regressions/forky
https://debaudit.debian.net/git2dsc/regressions/forky
https://debaudit.debian.net/git2orig/regressions/forky
Those are probably worth detecting systematically and fixing.

Maybe I should try to report severity:minor bugs for those problems for
some time (after analyzing each of them), to see what kind of feedback I
get.

Lucas