#1106071 wanted: tag2upload support for pristine-tar

#1106071#5
Date:
2025-05-19 10:16:52 UTC
From:
To:
tl;dr:
  tag2upload ought to support, but not recommend or encourage,
  pristine-tar.  But I'm probably not the person to implement it.

Desirability of pristine-tar
----------------------------

Currently, tag2upload doesn't support pristine-tar.  For
new-upstream-version uploads, it will use `git-deborig` which is a
thin wrapper around `git-archive`.  After #1105862 it will try to
detect when the user was trying to use pristine-tar, and fail.

pristine-tar's purpose is to mitigate some of the inconvenience of the
doctrine that Debian should base its work on, and redistribute,
upstream tarballs.  Personally, I think that doctrine is obsolete,
even harmful, for a large majority of upstreams.  Also, pristine-tar
is something of a hack and doesn't always work.

So my personal view is that pristine-tar is largely pointless
complexity to support an inferior workflow - indeed, a workflow that
exposes us to greater upstream supply chain risk since upstream
tarballs are less trustworthy than upstream git.

However, a key goal of tag2upload (and indeed my whole git transition
project) is to try to meet people where they are - and that includes
supporting partial transitions from tarballs+patches to git.  I think
pristine-tar falls into this category.

Therefore I think tag2upload *should* support pristine-tar.

But we should definitely recommend against it, and not put any
barriers in the way of people who don't use pristine-tar.


Implementation
--------------

I have almost never used pristine-tar and I don't intend to adopt it
now.  I don't really know how it works - what git refs it uses, what
the contents are, what invariants it preserves, and so on.  I think
the design and implementation would have to be done by someone who
does understand these things (and can explain them to me).

I think the ingredients (and skills needed) would be:

 * Some new metadata item(s) in the please-upload tag, including
   details of precisely which pristine-tar git objects are to be used,
   and maybe what refs they are to be fetched from if that's not
   obvious.  (Security and correctness design; pristine-tar.)

 * Recheck the code in git-debpush that does pristine-tar detection,
   which we are currently adding as part of #1105862 (which is just to
   detect use of pristine-tar and *reject*, to avoid mistakes).
   If we're going to use it to control the output, rather than merely
   as a safety catch against mistakes, it needs to be reliable.
   (Security and correctness design; pristine-tar; bash.)

 * Code in git-debpush to check that the pristine-tar information is
   consistent with the rest of the git information.  In particular, we
   must check that the tarball implied by pristine-tar is treesame to
   the upstream tag.  IDK if this is true by pristine-tar's design.
   (Security and correctness design; pristine-tar; bash.)

 * Given the design, code in dgit-repos-server to parse the new tag
   metadata, fetch the pristine-tar objects (easy) and run
   pristine-tar (probably also easy).  (Perl; pristine-tar; help from
   tag2upload authors.)

 * Change in tag2upload-service-manager to tolerate but ignore the new
   critical metadata item in the tag.  (Rust; easy.)

 * Test cases in dgit.git.  (Bash; Perl; pristine-tar.  Help wrestling
   the test suite from the src:dgit maintainers.)


References
----------

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1105862
    git-debpush check to detect and fail if user wanted pristine-tar

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891033
    request for dgit to use pristine-tar automatically


Anyone who is interested in working on this should please get in
touch.

Ian.

#1106071#10
Date:
2025-05-21 12:34:54 UTC
From:
To:
Hello,

I agree.

Based on my understanding of how it works, which isn't great, these
notes cover what's needed.

#1106071#15
Date:
2025-07-18 10:05:22 UTC
From:
To:
FTR I am currently in the process of moving the details of the
orig-obtaining code from tag2upload_obtain_origs in dgit-repos-server
(which runs in the oracle) to a new script in dgit.deb which will run
on the builder.

This will make these two bugs easier, but means coding for them
should probably wait until I'm done.

Ian.

#1106071#20
Date:
2025-07-18 23:42:49 UTC
From:
To:
Several people have expressed interest in helping out with various
tag2upload work, including pristine-tar support and other issues.
This is great!

I feel we should have some kind of mini-coordination-bof.  For those
who are here at Debconf, I suggest we use some of the post-lunch break
tomorrow (Saturday).  We'll meet somewhere at 13:30.  (I'll
investigate tomorrow morning and send another email with a definite
location.)

This will be a short get-to-know-each-other, with maybe some
handwaving and organisational stuff.  We may have Sean on IRC but I
don't know his schedule - I had to pick a time, and Saturday 13:30 is
it.

This is very exciting and I look forward to (trying to) put names to
faces (perhaps again, sorry!).  Also, I mined some email @debian
addresses from the internet; sorry if I got those wrong.  I'll ping
the affected folks on IRC too.

Ian.

PS I will be leaving promptly at 14:00 because I need to attend Fabian
Grünbichler's Rust talk.

#1106071#25
Date:
2025-07-19 08:39:45 UTC
From:
To:
I wrote:

I need to leave a bit earlier, so I suggest we meet at 13:15.

In terms of rooms, I haven't managed to book anywhere.  But I think we
will be able to make use of the noisy hacklab for this.  (The noisy
hacklab is near the BoF room.)

I hope to see some of you in the Noisy Hacklab, at *13:15*.

We will be done by no later than 13:45 so you'll have some gap before
talks start.

Ian.

#1106071#30
Date:
2025-07-19 09:41:57 UTC
From:
To:
Ian Jackson writes ("Bug#1106071: tag2upload contributors coordination mini-bof"):

The Content Team suggest we might use the BoF Room, so we will be
there unless someone else has grabbed it.

Ian.

#1106071#35
Date:
2025-07-19 15:15:01 UTC
From:
To:
Ian Jackson writes ("Bug#1106071: Moving t2u orig handling"):

This is now done and the result is in dgit.git mainline and also
deployed in production.

The orig handling is now in a script `tag2upload-obtain-origs`
(which can be found at the root of the source tree, and is shipped in
dgit.deb).

Ian.

#1106071#40
Date:
2025-07-21 17:32:08 UTC
From:
To:
Hi!

Here's an initial draft of the pristine-tar support. More than something
intended to be merged in, this is just a way to get some feedback,
trying to understand if I'm on the right track.

I did two things:

1. Added a new pristine-tar=commit_id metadata field to the signed tag
   generated by git-debpush
2. If a pristine-tar commitid is present, the git repo is hard-reset to
   it, then the tarball name is obtained from the tree, and
   `pristine-tar checkout` is invoked to generate the tarball.

I believe this is safe security-wise because the commit id represented
the expected status of the pristine-tar branch on the developer's
machine is signed at the time of the upload. If for some reason the
branch gets in-between the upload, and the expected commit is lost,
things will just fail instead of generating a "wrong" tarball. Do you
see flaws in this reasoning?

Note: I did not test that my implementation is actually correct and
working. I did not add any new test either. I believe there are no
regressions in old tests but I'm testing this on a plane and didn't look
too much into the log output.

Let me know what you think! Bye :)
---
 git-debpush             | 12 +++++++++++-
 infra/dgit-repos-server |  5 ++++-
 tag2upload-obtain-origs | 23 ++++++++++++++++++++++-
 3 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/git-debpush b/git-debpush
index e3a4ba39..1782513e 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,16 @@ if $upstream; then
     to_push+=("$upstream_tag")
 fi

+# TODO: pristine-tar
+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
+    pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+fi
+
+
 #**** Useful sanity checks ****

 #---- UNRELEASED suite
@@ -837,7 +847,7 @@ fi
 tagmessage="$source release $version for $target

 [dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source version=$version$upstream_info$pristine_tar_info]
 "

 git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..e797123b 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server
@@ -1304,7 +1304,7 @@ our ($t2u_email_noreply, $t2u_email_noreply_addr, $t2u_email_reply_to,
      @t2u_email_copies, $t2u_jid, $t2u_url, $t2u_putative_package);
 our ($t2u_tagger, $t2u_tagger_addr, $t2u_timeout);
 our ($t2u_signing_keyid);
-our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt);
+our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt, $t2u_pristinetar);

 sub t2u_dgit_cmd () {
     (
@@ -1840,6 +1840,8 @@ sub tag2upload_parsetag ($) {
 	    $package = $1;
 	} elsif (s/^version=(\S+) //) {
 	    $tagversion = $1;
+	} elsif (s/^pristine-tar=(\w+) //) {
+	    $t2u_pristinetar = $1;
 	} else {
 	    return 0;
 	}
@@ -2030,6 +2032,7 @@ END
         "v=$version",
         "s=$suite",
         "u=$t2u_upstreamc",
+        "pristinetar=$t2u_pristinetar",
     );
     flush EMAIL_REPORT or confess $!;
     open STDOUT, ">& EMAIL_REPORT" or confess $!;
diff --git a/tag2upload-obtain-origs b/tag2upload-obtain-origs
index 016fa655..0453c2de 100755
--- a/tag2upload-obtain-origs
+++ b/tag2upload-obtain-origs
@@ -16,6 +16,7 @@
 # optional settings:
 #
 #     bpd                          defaults to ../bpd
+#     pristinetar=PRISTINE-TAR-COMMITID

 set -eu -o pipefail
 shopt -s inherit_errexit # #514862, wtf
@@ -96,7 +97,27 @@ case "$rc" in
 	;;
 esac

#1106071#45
Date:
2025-07-21 21:26:44 UTC
From:
To:
I noticed I made some typos in my previous message. Sorry, I'm tired.

I meant "representing" instead of "represented".

I meant "the branch gets **modified** in-between the upload **and the
t2u service build**".

#1106071#50
Date:
2025-07-22 15:20:51 UTC
From:
To:
Here there should probably also be a `mv "$tarball" ..`, as pristine-tar
checkout generates the tarball in the current working directory.

#1106071#55
Date:
2025-07-24 22:00:29 UTC
From:
To:
Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support"):

Thanks.   This seems to be going in the right direction.  Very
exciting!

I have a bunch of comments about various details etc.  Please don't be
discouraged by my pickiness.  It's no reflection on the quality of
your work.  Getting feedback like this from me is entirely normal and
expected :-).

This seems right to me.

Should this be a "critical extension" in X.509 terminology?

This would be the first such (--quilt ought to have been but is
grandfathered).  But suppose someone hypothetically somehow sends this
option to a t2u service where this support has not yet been deployed
(or has been withdrawn!): what should happen?  Probably it should
fail.

So how about
  [dgit ... !pristine-tar=... ...]
or
  [dgit ... +pristine-tar=... ...]
or some such?

The new metadata item will need to be documented in tag2upload(5).

By "the tarball name is obtained from the tree" you mean that the orig
tarball name is obtained from the pristine-tar commit's tree.

I think this is correct from my understanding of pristine-tar.
Subject to what I write below about pristine-tar vs HEAD.

Right.
...

Don't we want to do this only if the current upstream version actually
has any information in pristine-tar ?  If not, then the user didn't
import the tarball, presumably ?

What happens if another maintainer did this new upstream version, and
our user hasn't got that up-to-date pristine-tar branch yet?  I think
ideally we'd use the orig from the archive, but there is a risk of
lossage if the uploads come too close together - see #1109130.

I don't know what those might be.  I guess we could leave them for now
and see what things go wrong in reality.  Ie, I don't think this is a
blocker.

We might want to linewrap this a bit more.  That line is getting quite
long.  I guess we could put all the upstream and pristine-tar info in
a separate line maybe?

This part looks correct.  Except:

I think this option should be called "pristine_tar".

I don't think this is right, is it ?  We want to rewind the local
pristine-tar branch to that commit, not the HEAD.  I'm assuming
pristine-tar implicitly uses refs/heads/pristine-tar rather than HEAD.

We should definitely have a test case simulating an attack where the
salsa pristine-tar branch has got strange stuff in it.

We definitely don't want it to be random!  Let's call this an error
for now.

If this turns out to be a problem in practice, and users don't want to
just git rm the unwanted info from their pristine-tar branch, then I
guess we'll need to add the tarball name to the tag or something.

If you're using git-ls-tree then you need to use the nul-separated
output format I think.  Urgh.  Feel free to switch implementation
language.

This part of the code needs to be super-careful and cope with all
weird and unexpected git objects.  For that matter, we should probably
try to defend pristine-tar from "weird stuff" so we should probably
check that the tarball metadata file is indeed a *file*.
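
For illustration only, here is a minimal bash sketch of the kind of
defensive parsing described above.  It reads `git ls-tree -z <commit>`
output (NUL-terminated records of the form "<mode> <type> <object>",
then a tab, then the path) and insists on exactly one .id entry that is
a regular file.  The function name `find_orig_id` and its error
messages are hypothetical and are not taken from dgit; the assumption
that pristine-tar names its metadata `<tarball>.id` follows the
filenames discussed in this bug.

```bash
# Hypothetical helper (not from dgit).  Reads `git ls-tree -z` output on
# stdin and prints the name of the single .id file belonging to the orig
# tarball whose name starts with "$1" (i.e. "<source>_<upstream-version>").
# Fails if there is no such entry, more than one, or it is not a
# regular file.
find_orig_id() {
    local prefix="$1" record meta name mode type found=''
    while IFS= read -r -d '' record; do
        meta="${record%%$'\t'*}"     # "<mode> <type> <object>"
        name="${record#*$'\t'}"      # path; may contain odd characters
        case "$name" in
            "$prefix".orig.tar.*.id) ;;
            *) continue ;;
        esac
        if [ -n "$found" ]; then
            echo "more than one .id file for $prefix" >&2
            return 1
        fi
        read -r mode type _ <<<"$meta"
        # Regular files are blobs with mode 100644 or 100755;
        # symlinks are blobs too, but with mode 120000.
        case "$type/$mode" in
            blob/100*) ;;
            *) echo "$name is not a regular file" >&2; return 1 ;;
        esac
        found="$name"
    done
    [ -n "$found" ] || { echo "no .id file for $prefix" >&2; return 1; }
    printf '%s\n' "$found"
}
```

It would be fed with something like
`git ls-tree -z "$commit" | find_orig_id "${source}_${uversion}"`,
where `$commit` is the pristine-tar commit named in the tag.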

Ian.

#1106071#60
Date:
2025-07-25 10:50:05 UTC
From:
To:
Nice to hear!

Don't worry at all! Please be as picky as you can be. I'm not afraid of
negative feedback :)

Yeah, it'd make sense. If I have git-debpush adding pristine-tar
metadata, I'm probably expecting the t2u service to use it. But maybe
it'd make sense to make all extensions critical by default? I can't
think of a scenario where git-debpush would add something which the t2u
service could ignore.

I like "!" more, as "+" to me has more of a "cumulative" feel. And, in
case we go for critical by default, I'd use "?" for
a non-critical/optional field.

I'm not super familiar with Git terminology, but yes.

Great!

Yes, makes perfect sense. I have a couple of packages where
a pristine-tar branch is present but outdated, so this would break all
of them actually :)

Maybe in the t2u service we should check if pristine-tar data exists in
the repo and error out if the pushed tag didn't contain pristine-tar
metadata? Because in any case, user's local builds would still be using
a different orig than the intended one, so you're effectively testing
your package against the wrong upstream code.

You mean like:

    [dgit distro=$distro split$quilt_mode_text]
    [dgit please-upload source=$source version=$version]
    [dgit $upstream_info$pristine_tar_info]

If there's no functional difference, why not.

Doesn't the tag2upload-obtain-origs script check that the option name
only contains [0-9a-z-] characters? I could add an underscore there.

Yes, that's what I wanted to do. Didn't know about the existence of `git
update-ref` (which is what I'm supposed to use here, right?). The tool
also has a `--no-deref` option which makes me kinda nervous (having
heard of all the security issues related to symlinks) - maybe it'd make
sense to use that too?

And yes, pristine-tar looks for that ref specifically.

Yes.

Got it. For the record, `gbp export-orig` generates the tarball
mentioned in the most recent pristine-tar branch commit message. If the
messages do not contain the expected tarball version, it proceeds
assuming gzip, and if that fails, generates a new tarball with git
archive and commits it into the pristine-tar branch. Which is probably
less sensible than using a random one :)

Yes, but how? If there are two tarballs of the same version in
pristine-tar, the user might not even realise that. Which of the two
should be added? Should the user be asked about that by git-debpush?
It's a case which should actually never happen in practice
(pristine-tar's objective is being able to recreate the upstream
tarball, and usually there's just one of them).

Maybe we can make git-deborig check for this too, and error out while
creating the tag?

Just in case, which languages are acceptable?

Sure! I haven't been *too* careful while writing this initial patch, but
I'll do better as we finalize it.

Thanks for the review! Will update the code and send a v2. Bye :)

#1106071#65
Date:
2025-07-25 22:47:35 UTC
From:
To:
Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support"):

We've already documented (and implemented in our three existing
parsers) that unknown extensions with the current syntax should be
ignored.  So I don't think we should change this now.

Let's go with !.
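
As a rough illustration of the rule being agreed here (unknown ordinary
items are ignored, unknown items starting with "!" must cause failure),
a parser's item check might look like the bash sketch below.  The
function name `check_items` and the list of items it claims to
understand are hypothetical; this is not the code in any of the three
existing parsers.

```bash
# Hypothetical helper (not from dgit).  Takes the items of one
# "[dgit please-upload ...]" line as separate arguments.
# Prints "ok" if every critical item is understood, otherwise prints
# "reject: <item-name>" and returns non-zero.
check_items() {
    local item key
    for item in "$@"; do
        key="${item%%=*}"
        case "$key" in
            source|version|upstream|upstream-tag|'!pristine-tar')
                ;;                    # understood by this imaginary parser
            '!'*)
                echo "reject: $key"   # unknown critical item: must fail
                return 1
                ;;
            *)
                ;;                    # unknown ordinary item: ignored
        esac
    done
    echo ok
}
```

With this, a service that predates pristine-tar support (and so lacks
the `'!pristine-tar'` pattern) would reject the tag rather than
silently build a different orig.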

I think even if #1109130 happens, we can make the upload *fail* rather
than use the wrong orig.

But, I suggest we could deal with this by warning the user (with a
failed check) if the upstream pristine-tar branch is ahead of their
own.

Yes.  No, there isn't.

Huh.  That makes no sense because it tries to make them into shell
variables!  This seems like a bug in the script.

We completely control the local ref namespace, because we use a
`git fetch` with refspecs, rather than `git clone`.  So there won't be
unexpected symrefs.

Blimey.

That would be fine.  In general, things that will definitely cause the
upload to fail should be detected locally if that's reasonably
feasible.

I think I would say Perl is fine but we probably don't want Python
without a compelling reason.

You can assume bash.  The set of things installed there is already
much bigger than essential.

Great.  I look forward to reading it.

Ian.

#1106071#70
Date:
2025-07-26 06:48:36 UTC
From:
To:
Hello,

There is already some pristine-tar stuff in git-debpush, the "Intent to
use pristine-tar for this upload" check.  I think that your changes
should be integrated with that.  I.e., the conditions under which we
currently fail because we think the user wants to use pristine-tar,
should be precisely the conditions in which pristine_tar= gets added to
the tag.

As for everything else, I'll assume Ian's review probably covered it,
and I'll take a closer look at your v2.  Thank you for helping us with
this.

#1106071#75
Date:
2025-07-26 12:12:32 UTC
From:
To:
---
Ok! Round two.

Here's a summary of the changes from v1. For git-debpush:

- The pristine-tar checking code is now only run if this is a non-native
  package (i.e., "if $upstream").
- The upstream version is used instead of the Debian-revised one.
- Unlike the old pristine-tar check, the code is not run just
  for the first (i.e., -1 or -0.1) revision, but for any upload. This
  way, the t2u service can potentially handle the case where
  a pristine-tar upload was intended, but no orig is available in the
  archive yet. Please let me know if this makes sense or not!
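
For illustration, a small bash sketch of how the orig tarball name
prefix can be derived from the full Debian version.  The patch below
uses `${version%-*}`; this sketch additionally strips an epoch, since
epochs do not appear in orig tarball filenames.  Whether git-debpush
needs that here is an assumption, and the helper names
`upstream_version` and `orig_prefix` are hypothetical.

```bash
# Hypothetical helpers (not from git-debpush).

# Upstream part of a non-native Debian version string.
upstream_version() {
    local v="$1"
    v="${v#*:}"                  # drop an epoch, if there is one
    printf '%s\n' "${v%-*}"      # drop the Debian revision (after last "-")
}

# Prefix shared by all orig tarball names for this source and version.
orig_prefix() {
    printf '%s_%s.orig.tar.\n' "$1" "$(upstream_version "$2")"
}
```

For example, `orig_prefix foo 1:2.0-1` yields `foo_2.0.orig.tar.`,
which is the string one would look for in the pristine-tar tree.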

For tag2upload-obtain-origs:

- Code is a bit more carefully written (using nul terminated command
  output when possible). This also applies to the git-debpush script.
- The pristinetar option has been renamed to pristine_tar, and the
  option keys glob has been changed to accept underscores instead of
  dashes.
- The process will fail if there is more than one pristine-tar orig.
- It is now checked that pristine-tar metadata is a regular file
  (according to git ls-tree)
- git update-ref is now used to rewind the pristine-tar branch, instead
  of git reset --hard.

I did not add the "critical extension" stuff yet. Also, what should we
do about the signature files which pristine-tar can optionally store and
retrieve?

 git-debpush             | 41 ++++++++++++++++++++++++++---------------
 infra/dgit-repos-server |  7 ++++++-
 tag2upload-obtain-origs | 38 ++++++++++++++++++++++++++++++++++++--
 3 files changed, 68 insertions(+), 18 deletions(-)

diff --git a/git-debpush b/git-debpush
index e3a4ba39..78e42fb9 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,30 @@ if $upstream; then
     to_push+=("$upstream_tag")
 fi

+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if $upstream; then
+    uversion="${version%-*}"
+
+    if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
+        pristine_tar_tarballs=$(git ls-tree -z --name-only -- 'refs/heads/pristine-tar' \
+            | grep -zF -- "${source}_${uversion}.orig.tar." \
+            | grep -zc -- "\.id$")
+
+        if [ "$pristine_tar_tarballs" -gt 1 ]; then
+            fail 'more than one pristine-tar orig'
+        fi
+
+        # If there's no tarball, the user probably stopped using pristine-tar a
+        # while ago, but didn't delete the branch. Just ignore it.
+        if [ "$pristine_tar_tarballs" -eq 1 ]; then
+            pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+        fi
+    fi
+fi
+
 #**** Useful sanity checks ****

 #---- UNRELEASED suite
@@ -522,20 +546,6 @@ case "$branch" in
         fi
 esac

-#---- Intent to use pristine-tar for this upload
-
-case "$version" in
-    *"-1"|*"-0.1")
-	uversion="${version%-*}"
-	if $upstream && type pristine-tar >/dev/null 2>/dev/null \
-		&& pristine-tar list \
-		    | grep -q "^${source}_${uversion}"'\.orig\.tar\.'
-	then
-	    fail_check pristine-tar \
- "pristine-tar data present for $uversion, but this will be ignored (#1106071)"
-	fi
-esac
-
 #---- Submodules

 # Per gitmodules(7) "FORMS", .gitmodules is always present at the
@@ -837,7 +847,8 @@ fi
 tagmessage="$source release $version for $target

 [dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source version=$version]
+${upstream_info:+[dgit $upstream_info$pristine_tar_info]}
 "

 git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..96058d56 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server
@@ -1304,7 +1304,7 @@ our ($t2u_email_noreply, $t2u_email_noreply_addr, $t2u_email_reply_to,
      @t2u_email_copies, $t2u_jid, $t2u_url, $t2u_putative_package);
 our ($t2u_tagger, $t2u_tagger_addr, $t2u_timeout);
 our ($t2u_signing_keyid);
-our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt);
+our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt, $t2u_pristine_tar);

 sub t2u_dgit_cmd () {
     (
@@ -1840,6 +1840,8 @@ sub tag2upload_parsetag ($) {
 	    $package = $1;
 	} elsif (s/^version=(\S+) //) {
 	    $tagversion = $1;
+	} elsif (s/^pristine-tar=(\w+) //) {
+	    $t2u_pristine_tar = $1;
 	} else {
 	    return 0;
 	}
@@ -2031,6 +2033,9 @@ END
         "s=$suite",
         "u=$t2u_upstreamc",
     );
+    if (length $t2u_pristine_tar) {
+	push(@obtain_origs, "pristine_tar=$t2u_pristine_tar")
+    }
     flush EMAIL_REPORT or confess $!;
     open STDOUT, ">& EMAIL_REPORT" or confess $!;
     t2u_b_run_fetch_cmd_errok 'work', @obtain_origs;
diff --git a/tag2upload-obtain-origs b/tag2upload-obtain-origs
index 016fa655..73a23bea 100755
--- a/tag2upload-obtain-origs
+++ b/tag2upload-obtain-origs
@@ -16,6 +16,7 @@
 # optional settings:
 #
 #     bpd                          defaults to ../bpd
+#     pristine_tar=PRISTINE-TAR-COMMITID

 set -eu -o pipefail
 shopt -s inherit_errexit # #514862, wtf
@@ -32,7 +33,7 @@ while [ $# != 0 ]; do
 	    k="${1%%=*}"
 	    v="${1#*=}"
 	    case "$k" in
-		*[^0-9a-z-]*) fail "bad syntax for setting" ;;
+		*[^0-9a-z_]*) fail "bad syntax for setting" ;;
 		*)
 		    eval "s_$k=\$v"
 		    ;;
@@ -96,7 +97,40 @@ case "$rc" in
 	;;
 esac

#1106071#80
Date:
2025-07-26 13:31:24 UTC
From:
To:
Hello,

Thanks.  I've included some inline comments below.

I think it would be helpful to work on the spec in tag2upload(5) before
continuing too much with code.  It'll make it easier to keep the three
of us on the same page.

ITYM your new code, right?
The old check already had both these properties.

It might make sense, I'm not sure yet.  Can you describe a concrete
example that would lead to this being helpful?

Can you explain why you've put this in at this point in the script?  I
think that maybe it should go later, after all the sanity checks.

I take it you switched from invoking pristine-tar itself to calling
git-ls-tree in order to use NUL termination?  If so, maybe we should
make that change first to the existing check.  Perhaps you could prepare
an MR to that effect.

Generally we avoid parentheses on builtin operators and use poetry
style, so

    push @obtain_origs, "pristine_tar=$t2u_pristine_tar"
      if $t2u_pristine_tar;

Do you think we could extract them and include them in the upload?
I think we can verify them by using the upstream key embedded in the
source package, right?  And if that verification fails we should
probably abort the upload -- maintainers who choose to use tarball
signatures had better make sure they verify.

#1106071#85
Date:
2025-07-26 13:56:32 UTC
From:
To:
Hi Sean,

Ok :)
Documentation work has arrived earlier than I had anticipated...

Yes. I copied them from the old check :)

This is meant as a way to handle the potential issue described by Ian in
<26754.44285.166852.764186@chiark.greenend.org.uk>. Unless I have
misunderstood the issue, of course!

If one does a -2 upload and the archive does not have the orig yet, and
t2u has a reference to the pristine-tar branch, it can (safely?)
re-create the tarball as it would be bit-by-bit identical to the already
uploaded one. Does it make sense?

No, I cannot explain that :)

There's no reason really; I just tried to put everything
pristine-tar related in the same place. Thinking about it, these checks
can only go before obtaining the pristine_tar_info, because I cannot
reasonably get the pristine-tar info before first making sure there's
just one orig.

Kind of. I first wrote the checks in tag2upload-obtain-origs using plain
shell and git, and then simply copied them back here. This was before
looking at the existing pristine-tar check. But yes, pristine-tar does
not use nul termination.

You mean I should write a separate patch for the check and submit it
independently from this patch? I'd like to finish this patch in
a reasonable time, so maybe it doesn't make sense to fixup a local check
which is going to be removed soon anyway?

Thanks! Also, the code mixes tabs and spaces for indentation; looking at
the diff here made me remember that. Not my fault!

Yes, that'd be the correct thing to do. What I wasn't sure about is
whether t2u should just checkout the signature and upload it to the
archive, verify it as well, or not do anything at all. In other words:
do we have to do verification here, or does it happen after sending
everything to the archive with dput? In that case, would it make sense
to duplicate the verification?

Thanks for the review, Sean!

#1106071#90
Date:
2025-07-26 23:15:08 UTC
From:
To:
Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):

Yes, I very much agree.  Spec work on the protocol ought to precede
the code.

I'm pretty sure Andrea has this right.

If we have pristine-tar support we should engage it whenever we are
doing upstream handling (ie for non-native source formats) and this
ought not to depend on the version number.

The -1 etc. thing is just a guess really.  That's fine for a check,
but it's not fine for functional code.

Of course the pristine-tar codepath is entitled to decide that
pristine-tar ought not to be applied to this upload, using its
pristine-tar-specific knowledge.

Does pristine-tar convey signature files?  If so we should definitely
support them.

I don't think I agree that we ought to be doing signature verification
that isn't related to our functioning.

In particular, this means that if we were to accept an upload with no
signature file, we should accept an upload with a signature file we
can't verify for whatever reason, since we are obviously not relying
on the signature if we don't mind it being totally absent.

I know that this is not standard in Debian tooling but I find the
approach of Debian's tooling contrary to reasonable cryptographic
protocol design.

I guess a signature you can't verify might be a failed check, but,
really, is a maintainer *actually* going to get as far as git-debpush
without having discovered the signature doesn't verify in their local
environment?  They've probably *run* the upstream code by then.

Or to put it another way, a signature that won't verify probably means
that the upload was previously done by another maintainer or on
another system where the right public key *was* available, but it's
not available here and now.  It doesn't seem to me that it is likely
to mean "this is an attack and we should stop" or anything like that.


Questions like the ones we discuss above are examples of reasons why
it is a good idea to nail down the spec before writing code.  Deciding
on correct behaviour in advance saves rework (and rework is extra
effort and often leads to additional confusion and additional bugs).

Ian.

#1106071#95
Date:
2025-07-27 14:11:19 UTC
From:
To:
Hi again!

I tried to add to the tag2upload.5 manpage the pristine-tar handling
design outlined in our discussions, which is inline below. Still, I have
a few questions:

What should we do with that upstream commit metadata? pristine-tar does
not need that, since it'll generate the tarball from the git tree id
stored in source_version.orig.tar.id. Still, we might want to make sure
that the pristine-tar tree corresponds to the one of the upstream commit
id. I don't know how useful this would be though, since the delta may
contain additional file additions and removals. Also, what should we do
with such tarballs whose contents are not identical to the git tree?

In the text below, I assume that:

- We want to verify equality of upstreamc's tree and the one used by
  pristine-tar.
- We allow binary deltas (i.e., the .delta file) to contain
  modifications to files stored in the referenced tree, such as the
  addition of configure scripts.

Here it is:

=item C<pristine-tar>=COMMITID

Identifies the state of the pristine-tar branch at the time of push, if
present and containing data related to the current upstream version.

If this metadata item is present, the C<upstream> and C<upstream-tag>
items must be present too. The tag2upload service will ensure that the
tree contained in the .id file of the pristine-tar branch will
correspond to the tree referenced by the commit id contained in the
C<upstream> metadata item.

If the pristine-tar branch contains a signature file, this will be
published together with the orig tarball, and no signature verification
will be performed.

#1106071#100
Date:
2025-07-28 18:41:02 UTC
From:
To:
Hello,

Right, I see, thank you.

I would suggest it should go in the section marked "Gather git history
information".

What I was thinking is that changing to use git(1) instead of
pristine-tar(1) is a logically distinct change from changing from a
check to embedding pristine-tar info in the tag.  So they should be
separate commits anyway, and we'd want to run the full test suite
against both of them.  While we are still discussing design you could
get the first change out of the way with a MR now.

Yeah, this is our inconsistent use of Emacs, sorry about that.
Just don't worry about it.

Do you mean whether dak does any verification?  I don't know.

#1106071#105
Date:
2025-07-28 18:44:18 UTC
From:
To:
Hello,

Sorry, but which metadata is that?

Trying to read your patch, I think the fact I don't use pristine-tar is
really showing.  Is the .id file defined somewhere?  Is your knowledge
of the pristine-tar branch contents from reading a spec, or empirical?

Glad we have someone who knows it better working on it.

#1106071#110
Date:
2025-07-28 18:58:05 UTC
From:
To:
Hi,

Will do!

Put this way, it makes sense. Will send another patch soon.

Great :)

I think that for the time being, publishing the signature without extra
processing is the most appropriate solution. We always have time to
revise it if needed.

#1106071#115
Date:
2025-07-28 19:14:24 UTC
From:
To:
Hello,

ACK.

#1106071#120
Date:
2025-07-28 19:19:28 UTC
From:
To:
Andrea Pappacoda writes ("Re: Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):

I think we decided it should start with !.

I will add something to the spec about critical extensions starting
with !.

This is kind of in the wrong mood.  It reads like a description of
when git-debpush should include it.  It ought instead to be a
specification of what the meaning of the item is.

Something like

  Names a commit containing pristine-tar metadata.

  The commit must contain SOMETHING LIKE exactly one .id file with
  SOME PROPERTIES OR OTHER.  The .id file MUST SATISFY SOME
  CONDITIONS THAT I DON'T UNDERSTAND.

  The tag must also contain an C<upstream> item, and the tree named in
  the .id file must be identical to that of the C<upstream> commit.

  The pristine-tar commit may contain a SOMEHOW IDENTIFIABLE signature
  file.  The signature file MUST SATISFY REASONABLE CONDITIONS SUCH AS
  ITS FILENAME BEING SANE.  The signature file will then be published
  together with the orig tarball.  The signature file is treated as
  pure data by the service (so will not be verified or even format
  checked).

  If an orig tarball needs to be (re)generated, the service will use
  pristine-tar, using precisely the metadata in the .id file.  The
  service will check that the generated tarball MATCHES THE HASH IN
  THE .ID FILE and that its contained tree is identical to SOMETHING.

  The named pristine-tar commit must be reachable from the
  C<pristine-tar> branch in the repository.
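
For illustration only, the reachability requirement could be checked
along these lines.  This is a minimal sketch: commit_on_branch is a
hypothetical name, and the default ref refs/heads/pristine-tar is an
assumption about where the branch lives in the clone being examined.

```shell
#!/bin/sh
# Sketch only: commit_on_branch is a hypothetical helper, not part of
# git-debpush or tag2upload.
# Succeeds if COMMIT is the tip of BRANCH or one of its ancestors.
commit_on_branch() {
    commit=$1
    # Assumption: the branch is a local head unless the caller says otherwise.
    branch=${2:-refs/heads/pristine-tar}
    git merge-base --is-ancestor "$commit" "$branch"
}
```

A commit that exists in the repository but is not part of the
pristine-tar branch history makes the function return non-zero.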

Ian.

#1106071#125
Date:
2025-07-28 19:21:22 UTC
From:
To:
It's likely that I didn't explain myself correctly. I meant the existing
upstream= and upstream-tag= metadata fields which git-debpush already
uses. The pristine-tar tool does not need those to generate a tarball,
but I believe it's still useful to include them alongside the
pristine-tar= metadata field to compare the pristine-tar tree to the
tree of the git commit contained in the upstream= metadata field.

Hope it's clearer now! If not, here's some code which should express my
intent less ambiguously than in English.

    pristine_tar_tree_id=$(git cat-file -- blob "${s_pristine_tar}:${tarball}.id")
    upstream_commit_tree_id=$(git rev-parse --verify --end-of-options "${s_u}^{tree}")
    if [ "$pristine_tar_tree_id" != "$upstream_commit_tree_id" ]; then
        fail 'pristine-tar tree id differs from the upstream commit one'
    fi

Kind of both. The pristine-tar(1) manpage says, under the `pristine-tar
commit _tarball_ _upstream_` section:

So yes, pristine-tar specifies that it stores the tree id somewhere. It
does not explicitly say where (well, not in the manpages), but it does
store that tree id inside a file named as the input tarball with ".id"
appended (as shown in its source code). This is not configurable, and
pristine-tar also looks for such file when running `pristine-tar
checkout`, so it cannot change really, otherwise new pristine-tar
versions would be unable to extract old tarballs, which defeats the
purpose of the tool.

The ".delta" file is explicitly mentioned in the manpage, just below the
paragraph I quoted before.

Thanks!

#1106071#130
Date:
2025-07-29 08:27:38 UTC
From:
To:
Hello,

Thanks.  We don't want to depend on the pristine-tar field for anything
other than obtaining the orig.tar, so we would definitely want to keep
the upstream= and upstream-tag= fields no matter what.

Thanks, I understand now.

#1106071#135
Date:
2025-08-02 10:20:58 UTC
From:
To:
Hi Andrea,

Have you had a chance to look at the following?

#1106071#140
Date:
2025-08-02 17:34:56 UTC
From:
To:
Hi Sean!

Sorry, I thought I had replied already. Thanks for the reminder :)

The branch must contain exactly one .id file per upstream release. Its
name should correspond to the name of the orig tarball, with the ".id"
suffix. The file must be a regular file.
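
To make the "regular file" requirement concrete, here is a rough
sketch of how it could be checked with git alone.  check_id_file is a
hypothetical name, and treating both 100644 and 100755 as "regular" is
my assumption, not something the spec text says.

```shell
#!/bin/sh
# Sketch only: check_id_file is a hypothetical helper.
# Succeeds if "<tarball>.id" exists at the top level of COMMIT and is a
# regular file (git mode 100644 or 100755), not a symlink or submodule.
check_id_file() {
    commit=$1 tarball=$2
    # ls-tree prints "<mode> <type> <hash><TAB><path>" for a matching entry
    # and nothing at all when the path is absent.
    entry=$(git ls-tree "$commit" -- "$tarball.id") || return 1
    case $entry in
        "100644 blob "*|"100755 blob "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Because the lookup is by exact path, there can be at most one such
entry per tarball name in a given commit.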

Yes.

In practise, pristine-tar always stores the signature file as
"orig_name.asc". So I think we could just specify this requirement here.

Yes.

I'm not sure I get this part, but if you meant what I understood, then
it's wrong. The .id file does not contain the hash of the tarball, it
contains a single line which corresponds to the tree id, as mentioned
above. I'm honestly not sure where the hash verification happens, but *I
believe* it's part of the reconstruction when pristine-gz and co are run,
thanks to information stored in the .delta (VCDIFF) file.

Yes.

One question remains unanswered. Should we allow .delta files modifying
the tarball contents (i.e., do we want to allow generating tarballs
which have different contents than the git tree)?

#1106071#145
Date:
2025-08-02 19:17:10 UTC
From:
To:
I've come back from a party and am a bit tipsy so I will read this
properly later, but:

Thanks for engaging with these questions!

Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):

I think in principle it might be a .sig.

So the .id contains the tree (git tree object) which uniquely
identifies the *contents* of the tarball.

But how does the pristine-tar information specify the precise hash of
the tarball itself?  Does the .delta file say what the output hash is
supposed to be ?

I don't think I fully understand the implications.  My default
position is that the answer should be "no" unless one of us *does*
understand the implications :-).

Regards,
Ian.

#1106071#150
Date:
2025-08-03 10:27:28 UTC
From:
To:
I'm replying to your email after a small party too, but at least I have
slept a couple of hours :)

Maybe yes, but regardless of the input signature filename, pristine-tar
always stores the signature in Git with a name of orig.asc. Also,
doesn't dpkg-source look for .asc files only?

Yes, but see below.

Yes, I've checked now and the .delta contains the expected SHA256 hash.

One case where the tarball differs from the git tree is when the orig
tarball contains empty dirs, which are not representable in Git. As an
example:

    $ tar -xvzf mypackage_1.0.orig.tar.gz
    mypackage/
    mypackage/file.txt
    mypackage/empty_dir/

    $ cd mypackage

    $ git init -b upstream/latest

    $ git add --all

    $ git commit -m init

    $ pristine-tar commit ../mypackage_1.0.orig.tar.gz upstream/latest

    $ git show pristine-tar:mypackage_1.0.orig.tar.gz.id | xargs git show
    tree 385d33e969fefd23b8efaca69c1d2db507ce0daf

    file.txt

    $ rm ../mypackage_1.0.orig.tar.gz

    $ pristine-tar --debug checkout mypackage_1.0.orig.tar.gz
    pristine-tar: set subdir to mypackage
    pristine-tar: subdir is mypackage
    pristine-tar: mypackage/empty_dir/ is listed in the manifest but may not be present in the source directory
    pristine-tar: creating missing mypackage/empty_dir/
    pristine-tar: doing full tree sweep to catch missing files
    pristine-tar: successfully generated mypackage_1.0.orig.tar.gz

    $ tar -tzf mypackage_1.0.orig.tar.gz
    mypackage/
    mypackage/file.txt
    mypackage/empty_dir/

Another example, which may illustrate the "unexpected" results this
could lead to, is the following. Here, the tarball is created with
a file containing "evil" content, while in the upstream/latest branch
only the "good" content is stored. Upon tarball checkout, the good
content gets replaced with the evil one:

    $ mkdir repo

    $ echo evil > repo/file.txt

    $ tar -czf repo_1.0.orig.tar.gz repo

    $ echo good > repo/file.txt

    $ cd repo

    $ git init -b upstream/latest

    $ git add --all

    $ git commit -m init

    $ pristine-tar commit ../repo_1.0.orig.tar.gz upstream/latest

    $ git show pristine-tar:repo_1.0.orig.tar.gz.id
    ca1cc63dd18610bc64a150397556d33e850a61e8

    $ git rev-parse --verify --end-of-options 'upstream/latest^{tree}'
    ca1cc63dd18610bc64a150397556d33e850a61e8

    $ git show ca1cc63dd18610bc64a150397556d33e850a61e8:file.txt
    good

    $ rm ../repo_1.0.orig.tar.gz

    $ pristine-tar checkout repo_1.0.orig.tar.gz

    $ tar -xvzf repo_1.0.orig.tar.gz
    repo/
    repo/file.txt

    $ cat repo/file.txt
    evil

Even though both the pristine-tar .id file and the upstream/latest
branch point to the same tree id, the binary .delta contains
modifications to file.txt which change the contents from "good" (stored
in the git tree) to "evil" upon orig checkout.

Even though this example is artificial (the tarball contents are usually
committed to version control after it has been downloaded, not before),
it would still theoretically be possible for a malicious maintainer to
sneak a backdoor in (like in the xz backdoor case, but with the extra
step of also having a Debian maintainer collaborate). So I'm inclined to
say "sorry, no, this is too dangerous".

It is also true that this is currently allowed in regular Salsa repos,
so allowing this would not really make the situation worse.

The thing is: how do we disallow this? I'm not aware of any pristine-tar
switch which makes it fail when such .delta file performing file content
modifications exists. Do we have to perform our own checking *after* the
tarball is checked out, by e.g. extracting it again on top of the
upstream commit tree and making sure no differences exist? Hacky but may
work.
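
To make that idea concrete, here is a minimal sketch of such a check.
It is not how pristine-tar or tag2upload do anything today:
tarball_is_treesame is a hypothetical name, I assume GNU tar
(--strip-components) and a tarball with a single top-level directory,
and it hashes the extracted files with a throwaway index rather than
extracting on top of a checkout.  git does not record empty
directories, so they do not affect the result.

```shell
#!/bin/sh
# Sketch only: tarball_is_treesame is a hypothetical helper, not part of
# pristine-tar, dgit or tag2upload.  Run it from inside the repository.
# Succeeds if the files in TARBALL hash to the same git tree as UPSTREAM.
tarball_is_treesame() {
    tarball=$1 upstream=$2
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    want=$(git rev-parse --verify "${upstream}^{tree}") || return 1
    scratch=$(mktemp -d) || return 1
    mkdir "$scratch/tree"
    # Assumption: the tarball has one top-level directory, which we strip.
    if tar -xf "$tarball" -C "$scratch/tree" --strip-components=1; then
        got=$(
            cd "$scratch/tree" || exit 1
            # A throwaway index and work tree keep the real ones untouched;
            # blob and tree objects are still written to the object store.
            export GIT_DIR="$git_dir" GIT_WORK_TREE="$scratch/tree"
            export GIT_INDEX_FILE="$scratch/index"
            # --force so that .gitignore files shipped in the tarball
            # cannot hide files from the comparison.
            git add --all --force . && git write-tree
        )
    else
        got=
    fi
    rm -rf "$scratch"
    [ -n "$got" ] && [ "$got" = "$want" ]
}
```

With the "evil" tarball above, the tree written from the extracted
files would differ from the upstream commit's tree, so the function
would fail.  Caveat: gitattributes or autocrlf settings can alter what
gets hashed, so a real implementation would need to pin those down.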

Let me know! Bye :)

#1106071#155
Date:
2025-08-03 13:13:39 UTC
From:
To:
Hello,

Okay, could you rewrite this part, then?

It might be a good time to open an MR adding the latest version of your
text to tag2upload(5).

ISTM that we should allow this as otherwise we would not be supporting
many pristine-tar users.

#1106071#160
Date:
2025-08-03 13:15:20 UTC
From:
To:
Hello,

I thought that the .delta files were mostly to cover, for example, the
tarball containing autotools-generated files that aren't in git?
Isn't that a key use case?

#1106071#165
Date:
2025-08-03 13:31:04 UTC
From:
To:
Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):

Not according to Colin in the "want Jia Tan option" bug,
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109423#15

Empty directories are a corner case but git will consider them
treesame so if we do the check in git all will be well.

Ian.

#1106071#170
Date:
2025-08-03 13:51:12 UTC
From:
To:
Hello,

Ah, right, thanks.  This is confusing, huh?  Then indeed, let's not
support them.

#1106071#175
Date:
2025-08-03 13:59:05 UTC
From:
To:
The main undeniable advantage of pristine-tar is regenerating a tarball
which is bit-by-bit identical to the upstream one, without having to
keep the actual tarball around. This is useful for source
reproducibility use cases (ignoring that Git is better for this anyway).

I argue that containing autotools-generated files is not the main use
case because in the usual git-buildpackage workflow you actually import
the tarballs into git, so the Debian git tree has the autotools stuff as
well.

When one uses a mixed upstream git + tarballs gbp workflow, the tarball
contents get applied as a new commit on top of the upstream git tag.
So, even there, the contents of the tarball match the contents of the
git tree pointed to by the upstream/latest branch (minus stuff like
empty dirs).

So, we can just say: if you want to use pristine-tar, make sure to
commit the tarball's contents to the upstream/latest branch (gbp does
this by default anyway).

Note: here I use "upstream/latest" to refer to the branch containing the
upstream code to be used for package builds. It could have a different
name, of course, but that's what DEP14 recommends.

#1106071#180
Date:
2025-08-03 14:02:43 UTC
From:
To:
Yes, exactly. This is what I tried to explain in my previous message,
but Colin has done so way better :)

#1106071#185
Date:
2025-08-03 15:05:18 UTC
From:
To:
---
This patch adds the pristine-tar item to the tag2upload spec. It's based
on Ian's suggested text, with some clarifications. Should be almost
ready.

 tag2upload.5.pod | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tag2upload.5.pod b/tag2upload.5.pod
index 7206fb4e..f934d210 100644
--- a/tag2upload.5.pod
+++ b/tag2upload.5.pod
@@ -139,6 +139,32 @@ With C<baredebian> quilt modes, this option is mandatory.
 specifies a native source package format,
 or if the targeted archive already contains a suitable orig.)

+=item C<!pristine-tar>=COMMITID
+
+Names a commit containing pristine-tar metadata.
+
+The commit must contain exactly one .id file and one .delta for the current
+upstream release, and their names must correspond to the name of the orig
+tarball, with ".id" and ".delta" appended, respectively.  They must be
+regular files.
+
+The tag must also contain an C<upstream> item, and the tree named in the .id
+file must be identical to that of the C<upstream> commit.
+
+The pristine-tar commit may contain a signature file.  The signature file
+name must correspond to the name of the orig tarball, with ".asc" appended.
+The signature file will then be published together with the orig tarball.
+The signature file is treated as pure data by the service (so will not be
+verified or even format checked).
+
+If an orig tarball needs to be (re)generated, the service will use
+pristine-tar, using precisely the metadata in the aforementioned files.  The
+service will check that the generated tarball is treesame to the tree named
+in the .id file.
+
+The named pristine-tar commit must be reachable from the C<pristine-tar>
+branch in the repository.
+
 =item C<--quilt=QUILT-MODE>

 Specifies the git tree format in use,

#1106071#190
Date:
2025-08-03 16:45:52 UTC
From:
To:
Hello,

Thanks for confirming, Andrea -- we're on the same page.

#1106071#195
Date:
2025-08-04 10:31:02 UTC
From:
To:
Hello,

Can we say that the .delta file must represent an empty change (or
equivalent), since we're not going to support actual deltas?

I think your later text implies this but it would be easier to read if
we said something here too.

Maybe "tree object" or even "git tree object" for readability.

#1106071#200
Date:
2025-08-04 17:25:49 UTC
From:
To:
Hi Sean,

Well, a .delta file represents a binary diff, so an empty binary diff
would result in no modifications to the tar file at all. We want to
support deltas which change things like the stored order of files in the
tarball, or empty directories. So I don't really know how to say this
other than "the resulting tarball must be treesame to the named git
tree".

Maybe something like "the .delta file must not contain changes to the
tarball contents, except for empty directories".

Makes sense.

#1106071#205
Date:
2025-08-04 17:29:59 UTC
From:
To:
Andrea Pappacoda writes ("Bug#1106071: [PATCH dgit v1] tag2upload: add pristine-tar metadata item"):

I think

this phrase is perfect, for a spec.  It is very precise and says
exactly what we mean.

It would be worse to say the same thing again in different words.  In
specs that can lead to ambiguity if one of the descriptions can be
interpreted differently.

Ian.

#1106071#210
Date:
2025-08-04 17:31:31 UTC
From:
To:
Ian Jackson writes ("Re: Bug#1106071: [PATCH dgit v1] tag2upload: add pristine-tar metadata item"):

Just after writing this I had a thought.  If there are things in the
working tree that aren't files or directories or symlinks, what does
git do ?

For our treesame check to be meaningful, we need it to fail, I think.

Since I think we don't want to permit tarballs that contain device
files, sockets, or whatever.

Ian.

#1106071#215
Date:
2025-08-04 17:41:56 UTC
From:
To:
Yeah, git seems to completely ignore device files (and tar does not seem
to support sockets?). So should we explicitly state that only empty
directories are allowed? If so, how do we check that?

#1106071#220
Date:
2025-08-04 17:45:01 UTC
From:
To:
Andrea Pappacoda writes ("Re: Bug#1106071: [PATCH dgit v1] tag2upload: add pristine-tar metadata item"):

Hngh.  (I bet tar does support sockets.  It certainly supports fifos.)

I can't think of a better way than comparing the output of
git ls-files with the output of find \! -type d -print0.
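
A rough sketch of that comparison, for illustration only.
untracked_non_directories is a hypothetical name; it assumes it is run
at the top of a work tree whose contents have just been added to the
index, and it converts NULs to newlines for simplicity, so it would be
confused by filenames containing newlines.

```shell
#!/bin/sh
# Sketch only: untracked_non_directories is a hypothetical helper.
# Prints every non-directory present on disk that git is not tracking,
# for example fifos, sockets or device nodes, which git add skips.
untracked_non_directories() {
    tracked=$(mktemp) || return 1
    on_disk=$(mktemp) || return 1
    # -z avoids git's quoting of unusual path names.
    git ls-files -z | tr '\0' '\n' | LC_ALL=C sort >"$tracked"
    # Skip the repository metadata, list everything that is not a directory.
    find . -path ./.git -prune -o ! -type d -print |
        sed 's|^\./||' | LC_ALL=C sort >"$on_disk"
    # comm -13 keeps only the lines unique to the second file.
    LC_ALL=C comm -13 "$tracked" "$on_disk"
    rm -f "$tracked" "$on_disk"
}
```

Empty output would mean everything on disk other than directories is
tracked; any line of output would be a reason to fail.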

Ian.

#1106071#225
Date:
2025-08-04 21:36:20 UTC
From:
To:
Hello,

Right, okay.  Then I agree with Ian that what you already say is enough.

#1106071#230
Date:
2025-08-04 21:38:07 UTC
From:
To:
Hello,

We encountered a similar problem when writing mini-git-tag-fsck.
I think the right thing to do is to fail, indeed.

#1106071#235
Date:
2025-08-04 21:46:51 UTC
From:
To:
Okay so, for v2 should I:

1. Replace "The service will check that the generated tarball is treesame
   to the tree named in the .id file" with "the resulting tarball must
   be treesame to the named git tree".
2. That, but also add ", except for empty directories", implying that
   anything else isn't allowed.
3. Something else?

#1106071#240
Date:
2025-08-05 09:52:54 UTC
From:
To:
Hello,

I think this is everything.  Maybe you could make an MR?

#1106071#245
Date:
2025-08-05 10:05:56 UTC
From:
To:
Submitted as https://salsa.debian.org/dgit-team/dgit/-/merge_requests/264

#1106071#250
Date:
2025-08-16 19:29:47 UTC
From:
To:
Hi!

First of all thanks to Ian and Andrea for working on this tirelessly
since DebConf. For my packages having support for pristine-tar and
using real original tarballs is important and I am looking forward to
seeing https://salsa.debian.org/dgit-team/dgit/-/merge_requests/264
finished.

I have one question about the design: How will this behave with
repackaged sources?

For example in Godot we use d/copyright Files-Excluded to tell uscan
to repackage the upstream tarball, with a resulting pristine-tar
branch commit and filenames like this:

commit 3aefb8ae3866d43ca1ecd9e58872c2b1cf5c7f39 (HEAD -> pristine-tar)
Author: Travis Wrightsman <travis@wrightsman.org>
Date:   Wed Jul 30 21:28:23 2025 +0200

    pristine-tar data for godot_4.4.1+ds.orig.tar.xz

diff --git a/godot_4.4.1+ds.orig.tar.xz.delta b/godot_4.4.1+ds.orig.tar.xz.delta
new file mode 100644
index 00000000000..e888d17a93b
Binary files /dev/null and b/godot_4.4.1+ds.orig.tar.xz.delta differ
diff --git a/godot_4.4.1+ds.orig.tar.xz.id b/godot_4.4.1+ds.orig.tar.xz.id
new file mode 100644
index 00000000000..5ddcd1b0750
--- /dev/null
+++ b/godot_4.4.1+ds.orig.tar.xz.id
@@ -0,0 +1 @@
+2fddcc20d38b7f802ff624d597436262e28a1058


It is of course debatable if pristine-tar + repackaging makes sense
anymore, as we intentionally break the supply-chain "seal" and the
upstream tarball signature can't be used to verify the tarball in
Debian anymore. I would also accept the outcome that in case of
repacking, tag2upload would opt out from using pristine-tar. But some
could argue that it should be used anyway for consistency due to how
uscan and git-buildpackage are expected to have a certain branch
layout and contents, and pristine-tar would at least ensure that
people working on the package will have the same source (before
upload, when getting sources from git is still relevant).

What are your thoughts on repackaging?

#1106071#255
Date:
2025-08-16 19:50:33 UTC
From:
To:
Hi Otto!

I wouldn't say that I've worked tirelessly on this, but thanks :)

I think it would behave exactly like non-repackaged origs. The
upstream/<num> tag would contain the repackaged code, and the
pristine-tar data would contain the usual binary diff. pristine-tar does
not "know" that the tarball was repackaged, and tag2upload doesn't care
either.

Hope it makes sense to you! If you have any doubt, just ask.

Yeah, to me it doesn't make much sense. But I do use pristine-tar with
some packages with repacked sources. I shouldn't, I'll probably stop,
but I'm currently doing so.

Bye :)

#1106071#260
Date:
2025-08-16 19:53:11 UTC
From:
To:
Otto Kekäläinen writes ("Bug#1106071: wanted: tag2upload support for pristine-tar"):

Thanks for bringing this up.  However, as you can see from the start
of this report, we wouldn't generally recommend using pristine-tar
anyway.  We're intending to support it because it's a thing some of
our users will expect.

When the upstream source code is repacked, pristine-tar makes even less
sense.  In those situations, we are making our own tarball anyway.

Our dgit-maint-*(7) workflow manpages give information on maintaining
filtered git branches.  I don't think uscan is a particularly good way
of doing this filtering.  It thinks about things in a very tarball
way.

But I don't think any of this is particularly relevant for this bug.
git-debpush and tag2upload should use pristine-tar data if it is
available.

If you would like to discuss this further, please file a new bug.
I think it's important to keep *this* bug for details of the behaviour
of git-debpush and tag2upload's pristine-tar implementation. [1]

Ian.

[1] For the avoidance of any doubt:

If you disagree, and still think that this has implications for
pristine-tar support in tag2upload, please *still file a new bug*.

So, speaking as a maintainer of the src:dgit package, please do not
post further messages on this topic to *this* bug.  I don't want to
see it derailed with a discussion about uscan and/or tarball
repackaging.

#1106071#265
Date:
2025-08-16 19:57:17 UTC
From:
To:
Yes, this makes sense. The upstream signature verification chain will
no longer be intact, but anyone auditing the supply-chain will see
the file and the +ds or +ds1 suffix and easily figure out that the
file intentionally isn't exactly the same anymore. In this case the
pristine-tar feature is kind of moot, but it is still more
"transparent" to a potential auditor to let them find the file in the
same places in the git history as they expect but with a +ds1 suffix,
than to not find it or find a file with original filename but
non-original contents and confuse them on where it got changed and
why.