- Package:
- dgit-infrastructure
- Source:
- dgit-infrastructure
- Submitter:
- Ian Jackson
- Date:
- 2025-08-16 19:58:48 UTC
- Severity:
- normal
tl;dr: tag2upload ought to support, but not recommend or encourage,
pristine-tar. But I'm probably not the person to implement it.

Desirability of pristine-tar
----------------------------

Currently, tag2upload doesn't support pristine-tar. For
new-upstream-version uploads, it will use `git-deborig` which is a thin
wrapper around `git-archive`. After #1105862 it will try to detect when
the user was trying to use pristine-tar, and fail.

pristine-tar's purpose is to mitigate some of the inconvenience of the
doctrine that Debian should base its work on, and redistribute, upstream
tarballs.

Personally, I think that doctrine is obsolete, even harmful, for a large
majority of upstreams. Also, pristine-tar is something of a hack and
doesn't always work. So my personal view is that pristine-tar is largely
pointless complexity to support an inferior workflow - indeed, a
workflow that exposes us to greater upstream supply chain risk since
upstream tarballs are less trustworthy than upstream git.

However, a key goal of tag2upload (and indeed my whole git transition
project) is to try to meet people where they are - and that includes
supporting partial transitions from tarballs+patches to git. I think
pristine-tar falls into this category.

Therefore I think tag2upload *should* support pristine-tar. But we
should definitely recommend against it, and not put any barriers in the
way of people who don't use pristine-tar.

Implementation
--------------

I have almost never used pristine-tar and I don't intend to adopt it
now. I don't really know how it works - what git refs it uses, what the
contents are, what invariants it preserves, and so on.

I think the design and implementation would have to be done by someone
who does understand these things (and can explain them to me).

I think the ingredients (and skills needed) would be:

 * Some new metadata item(s) in the please-upload tag, including details
   of precisely which pristine-tar git objects are to be used, and maybe
   what refs they are to be fetched from if that's not obvious.
   (Security and correctness design; pristine-tar.)

 * Recheck the code in git-debpush that does pristine-tar detection,
   which we are currently adding as part of #1105862 (which is just to
   detect use of pristine-tar and *reject*, to avoid mistakes). If
   we're going to use it to control the output, rather than merely as a
   safety catch against mistakes, it needs to be reliable.
   (Security and correctness design; pristine-tar; bash.)

 * Code in git-debpush to check that the pristine-tar information is
   consistent with the rest of the git information. In particular, we
   must check that the tarball implied by pristine-tar is treesame to
   the upstream tag. IDK if this is true by pristine-tar's design.
   (Security and correctness design; pristine-tar; bash.)

 * Given the design, code in dgit-repos-server to parse the new tag
   metadata, fetch the pristine-tar objects (easy) and run pristine-tar
   (probably also easy).
   (Perl; pristine-tar; help from tag2upload authors.)

 * Change in tag2upload-service-manager to tolerate but ignore the new
   critical metadata item in the tag. (Rust; easy.)

 * Test cases in dgit.git. (Bash; Perl; pristine-tar. Help wrestling
   the test suite from the src:dgit maintainers.)

References
----------

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1105862
  git-debpush check to detect and fail if user wanted pristine-tar

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=891033
  request for dgit to use pristine-tar automatically

Anyone who is interested in working on this should please get in touch.

Ian.
Hello, I agree. Based on my understanding of how it works, which isn't great, these notes cover what's needed.
FTR I am currently in the process of moving the details of the orig-obtaining code from tag2upload_obtain_origs in dgit-repos-server (which runs in the oracle) to a new script in dgit.deb which will run on the builder. This will make these two bugs easier, but means coding for them should probably wait until I'm done. Ian.
Several people have expressed interest in helping out with various
tag2upload work, including pristine-tar support and other issues. This
is great!

I feel we should have some kind of mini-coordination-bof.

For those who are here at DebConf, I suggest we use some of the
post-lunch break tomorrow (Saturday). We'll meet somewhere at 13:30.
(I'll investigate tomorrow morning and send another email with a
definite location.)

This will be a short get-to-know-each-other, with maybe some handwaving
and organisational stuff. We may have Sean on IRC but I don't know his
schedule - I had to pick a time, and Saturday 13:30 is it.

This is very exciting and I look forward to (trying to) put names to
faces (perhaps again, sorry!).

Also, I mined some @debian email addresses from the internet; sorry if I
got those wrong. I'll ping the affected folks on IRC too.

Ian.

PS I will be leaving promptly at 14:00 because I need to attend
Fabian Grünbichler's Rust talk.
I wrote: I need to leave a bit earlier, so I suggest we meet at 13:15. In terms of rooms, I haven't managed to book anywhere. But I think we will be able to make use of the noisy hacklab for this. (The noisy hacklab is near the BoF room.) I hope to see some of you in the Noisy Hacklab, at *13:15*. We will be done by no later than 13:45 so you'll have some gap before talks start. Ian.
Ian Jackson writes ("Bug#1106071: tag2upload contributors coordination mini-bof"):
The Content Team suggest we might use the BoF Room, so we will be
there unless someone else has grabbed it.
Ian.
Ian Jackson writes ("Bug#1106071: Moving t2u orig handling"):
This is now done and the result is in dgit.git mainline and also
deployed in production.
The orig handling is now in a script `tag2upload-obtain-origs`
(which can be found at the root of the source tree, and is shipped in
dgit.deb).
Ian.
Hi!

Here's an initial draft of the pristine-tar support. More than something
intended to be merged in, this is just a way to get some feedback,
trying to understand if I'm on the right track.

I did two things:

1. Added a new pristine-tar=commit_id metadata field to the signed tag
   generated by git-debpush
2. If a pristine-tar commitid is present, the git repo is hard-reset to
   it, then the tarball name is obtained from the tree, and
   `pristine-tar checkout` is invoked to generate the tarball.

I believe this is safe security-wise because the commit id represented
the expected status of the pristine-tar branch on the developer's
machine is signed at the time of the upload. If for some reason the
branch gets in-between the upload, and the expected commit is lost,
things will just fail instead of generating a "wrong" tarball. Do you
see flaws in this reasoning?

Note: I did not test that my implementation is actually correct and
working. I did not add any new tests either. I believe there are no
regressions in old tests but I'm testing this on a plane and didn't look
too much into the log output.

Let me know what you think!
Bye :)
---
git-debpush | 12 +++++++++++-
infra/dgit-repos-server | 5 ++++-
tag2upload-obtain-origs | 23 ++++++++++++++++++++++-
3 files changed, 37 insertions(+), 3 deletions(-)
diff --git a/git-debpush b/git-debpush
index e3a4ba39..1782513e 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,16 @@ if $upstream; then
to_push+=("$upstream_tag")
fi
+# TODO: pristine-tar
+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
+ pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+fi
+
+
#**** Useful sanity checks ****
#---- UNRELEASED suite
@@ -837,7 +847,7 @@ fi
tagmessage="$source release $version for $target
[dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source version=$version$upstream_info$pristine_tar_info]
"
git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..e797123b 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server
@@ -1304,7 +1304,7 @@ our ($t2u_email_noreply, $t2u_email_noreply_addr, $t2u_email_reply_to,
@t2u_email_copies, $t2u_jid, $t2u_url, $t2u_putative_package);
our ($t2u_tagger, $t2u_tagger_addr, $t2u_timeout);
our ($t2u_signing_keyid);
-our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt);
+our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt, $t2u_pristinetar);
sub t2u_dgit_cmd () {
(
@@ -1840,6 +1840,8 @@ sub tag2upload_parsetag ($) {
$package = $1;
} elsif (s/^version=(\S+) //) {
$tagversion = $1;
+ } elsif (s/^pristine-tar=(\w+) //) {
+ $t2u_pristinetar = $1;
} else {
return 0;
}
@@ -2030,6 +2032,7 @@ END
"v=$version",
"s=$suite",
"u=$t2u_upstreamc",
+ "pristinetar=$t2u_pristinetar",
);
flush EMAIL_REPORT or confess $!;
open STDOUT, ">& EMAIL_REPORT" or confess $!;
diff --git a/tag2upload-obtain-origs b/tag2upload-obtain-origs
index 016fa655..0453c2de 100755
--- a/tag2upload-obtain-origs
+++ b/tag2upload-obtain-origs
@@ -16,6 +16,7 @@
# optional settings:
#
# bpd defaults to ../bpd
+# pristinetar=PRISTINE-TAR-COMMITID
set -eu -o pipefail
shopt -s inherit_errexit # #514862, wtf
@@ -96,7 +97,27 @@ case "$rc" in
;;
esac
I noticed I made some typos in my previous message. Sorry, I'm tired. I meant "representing" instead of "represented". I meant "the branch gets **modified** in-between the upload **and the t2u service build**".
Here there should probably also be a `mv "$tarball" ..`, as pristine-tar checkout generates the tarball in the current working directory.
Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support"):
Thanks. This seems to be going in the right direction. Very
exciting!
I have a bunch of comments about various details etc. Please don't be
discouraged by my pickiness. It's no reflection on the quality of
your work. Getting feedback like this from me is entirely normal and
expected :-).
This seems right to me.
Should this be a "critical extension" in X.509 terminology?
This would be the first such, (--quilt ought to have been but is
grandfathered) but I think probably if someone hypothetically somehow
sends this option to a t2u service where this support has not yet been
deployed (or has been withdrawn!) what should happen? Probably it
should fail.
So how about
[dgit ... !pristine-tar=... ...]
or
[dgit ... +pristine-tar=... ...]
or some such?
The new metadata item will need to be documented in tag2upload(5).
By "the tarball name is obtained from the tree" you mean that the orig
tarball name is obtained from the pristine-tar commit's tree.
I think this is correct from my understanding of pristine-tar.
Subject to what I write below about pristine-tar vs HEAD.
Right.
...
Don't we want to do this only if the current upstream version actually
has any information in pristine-tar ? If not, then the user didn't
import the tarball, presumably ?
What happens if another maintainer did this new upstream version, and
our user hasn't got that up-to-date pristine-tar branch yet? I think
ideally we'd use the orig from the archive, but there is a risk of
lossage if the uploads come too close together - see #1109130.
I don't know what those might be. I guess we could leave them for now
and see what things go wrong in reality. Ie, I don't think this is a
blocker.
We might want to linewrap this a bit more. That line is getting quite
long. I guess we could put all the upstream and pristine-tar info in
a separate line maybe?
This part looks correct. Except:
I think this option should be called "pristine_tar".
I don't think this is right, is it ? We want to rewind the local
pristine-tar branch to that commit, not the HEAD. I'm assuming
pristine-tar implicitly uses refs/heads/pristine-tar rather than HEAD.
We should definitely have a test case simulating an attack where the
salsa pristine-tar branch has got strange stuff in it.
We definitely don't want it to be random! Let's call this an error
for now.
If this turns out to be a problem in practice, and users don't want to
just git rm the unwanted info from their pristine-tar branch, then I
guess we'll need to add the tarball name to the tag or something.
If you're using git-ls-tree then you need to use the nul-separated
output format I think. Urgh. Feel free to switch implementation
language.
This part of the code needs to be super-careful and cope with all
weird and unexpected git objects. For that matter, we should probably
try to defend pristine-tar from "weird stuff" so we should probably
check that the tarball metadata file is indeed a *file*.
Ian.
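
A minimal sketch of the checks asked for above (NUL-separated `git ls-tree`
output, exactly one metadata entry, and that entry being a regular file).
The helper name `count_orig_ids` and its exit-status convention are
hypothetical and not part of dgit; it reads the output of
`git ls-tree -z COMMIT` on stdin so that odd path names cannot confuse the
parsing.

```bash
#!/bin/bash
# Sketch only: count_orig_ids is a hypothetical helper, not dgit code.
# stdin:  output of `git ls-tree -z <pristine-tar commit>`, i.e. records of
#         the form "<mode> <type> <object>\t<path>" terminated by NUL.
# stdout: number of regular-file entries named
#         "<source>_<uversion>.orig.tar.<ext>.id".
# Returns 2 if a matching name exists but is not a regular file.
count_orig_ids () {
    local source=$1 uversion=$2
    local prefix="${source}_${uversion}.orig.tar."
    local record meta name mode type count=0
    while IFS= read -r -d '' record; do
        meta=${record%%$'\t'*}     # "<mode> <type> <object>"
        name=${record#*$'\t'}      # path; may contain spaces or newlines
        case "$name" in
            "$prefix"*.id) ;;      # candidate metadata file
            *) continue ;;
        esac
        read -r mode type _ <<<"$meta"
        case "$mode $type" in
            "100644 blob"|"100755 blob") count=$((count + 1)) ;;
            *) return 2 ;;         # symlink, submodule, tree, ...
        esac
    done
    printf '%s\n' "$count"
}
```

It might be called as
`git ls-tree -z "$commit" | count_orig_ids "$source" "$uversion"`, with the
caller treating any result other than 1 as an error.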
Nice to hear!
Don't worry at all! Please be as picky as you can be. I'm not afraid of
negative feedback :)
Yeah, it'd make sense. If I have git-debpush adding pristine-tar
metadata, I'm probably expecting the t2u service to use it. But maybe
it'd make sense to make all extensions critical by default? I can't
think of a scenario where git-debpush would add something which the t2u
service could ignore.
I like "!" more, as "+" to me has more of a "cumulative" feel. And, in
case we go for critical by default, I'd use "?" for
a non-critical/optional field.
I'm not super familiar with Git terminology, but yes.
Great!
Yes, makes perfect sense. I have a couple of packages where
a pristine-tar branch is present but outdated, so this would break all
of them actually :)
Maybe in the t2u service we should check if pristine-tar data exists in
the repo and error out if the pushed tag didn't contain pristine-tar
metadata? Because in any case, user's local builds would still be using
a different orig than the intended one, so you're effectively testing
your package against the wrong upstream code.
You mean like:
[dgit distro=$distro split$quilt_mode_text]
[dgit please-upload source=$source version=$version]
[dgit $upstream_info$pristine_tar_info]
If there's no functional difference, why not.
Doesn't the tag2upload-obtain-origs script check that the option name
only contains [0-9a-z-] characters? I could add an underscore there.
Yes, that's what I wanted to do. Didn't know about the existence of `git
update-ref` (which is what I'm supposed to use here, right?). The tool
also has a `--no-deref` option which makes me kinda nervous (having
heard of all the security issues related to symlinks) - maybe it'd make
sense to use that too?
And yes, pristine-tar looks for that ref specifically.
Yes.
Got it. For the record, `gbp export-orig` generates the tarball
mentioned in the most recent pristine-tar branch commit message. If the
messages do not contain the expected tarball version, it proceeds
assuming gzip, and if that fails, generates a new tarball with git
archive and commits it into the pristine-tar branch. Which is probably
less sensible than using a random one :)
Yes, but how? If there are two tarballs of the same version in
pristine-tar, the user might not even realise that. Which of the two
should be added? Should the user be asked about that by git-debpush?
It's a case which should actually never happen in practice
(pristine-tar's objective is being able to recreate the upstream
tarball, and usually there's just one of them).
Maybe we can make git-deborig check for this too, and error out while
creating the tag?
Just in case, which languages are acceptable?
Sure! I haven't been *too* careful while writing this initial patch, but
I'll do better as we finalize it.
Thanks for the review! Will update the code and send a v2. Bye :)
Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v1] tag2upload: add pristine-tar support"):
We've already documented (and implemented in our three existing
parsers) that unknown extensions with the current syntax should be
ignored. So I don't think we should change this now.
Let's go with !.
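
A minimal sketch of the rule as stated here, assuming that the items inside
one `[dgit ...]` line have already been split on spaces: unknown items with
the existing syntax are ignored, while an unknown item whose name starts
with `!` makes processing fail. The helper name `check_items` and its
calling convention are hypothetical; none of the three existing parsers is
written this way.

```bash
#!/bin/bash
# Sketch only: check_items is a hypothetical helper, not dgit code.
# $1:      space-separated list of item names this parser understands
# $2...:   the items from one "[dgit ...]" line, e.g. source=foo
# Returns 1 if an item marked critical with "!" is not understood.
check_items () {
    local known=" $1 "; shift
    local item key
    for item in "$@"; do
        key=${item%%=*}
        case "$key" in
            '!'*)
                key=${key:1}       # strip the leading "!"
                if [[ "$known" != *" $key "* ]]; then
                    echo "unsupported critical item: $key" >&2
                    return 1
                fi
                ;;
            *)
                # Unknown items with the current syntax are ignored.
                ;;
        esac
    done
    return 0
}
```

For example, `check_items "source version" source=foo version=1.0-1
'!pristine-tar=COMMITID'` fails, which is the behaviour wanted from a
service where pristine-tar support has not been deployed.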
I think even if #1109130 happens, we can make the upload *fail* rather
than use the wrong orig.
But, I suggest we could deal with this by warning the user (with a
failed check) if the upstream pristine-tar branch is ahead of their
own.
Yes. No, there isn't.
Huh. That makes no sense because it tries to make them into shell
variables! This seems like a bug in the script.
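
A minimal sketch of the point about shell variables, assuming settings
arrive as KEY=VALUE arguments as in tag2upload-obtain-origs: a key
containing a dash can never become part of a variable name, so the key
check needs to allow underscores and reject dashes. The helper name
`set_setting` is hypothetical, and using `printf -v` in place of `eval` is
a substitution made for this sketch, not what the script does.

```bash
#!/bin/bash
# Sketch only: set_setting is a hypothetical stand-in for the settings
# loop in tag2upload-obtain-origs.
# $1: one KEY=VALUE argument; on success the value is stored in s_KEY.
set_setting () {
    case "$1" in
        *=*) ;;
        *) echo "bad syntax for setting" >&2; return 1 ;;
    esac
    local k="${1%%=*}" v="${1#*=}"
    case "$k" in
        ''|*[^0-9a-z_]*)
            # Dashes land here: "s_pristine-tar" is not a valid name.
            echo "bad syntax for setting" >&2
            return 1
            ;;
    esac
    printf -v "s_$k" '%s' "$v"     # assigns without re-parsing the value
}
```

With this, `set_setting pristine_tar=COMMITID` sets `s_pristine_tar`, while
`set_setting pristine-tar=COMMITID` is rejected up front.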
We completely control the local ref namespace, because we use a
`git fetch` with refspecs, rather than `git clone`. So there won't be
unexpected symrefs.
Blimey.
That would be fine. In general, things that will definitely cause the
upload to fail should be detected locally if that's reasonably
feasible.
I think I would say Perl is fine but we probably don't want Python
without a compelling reason.
You can assume bash. The set of things installed there is already
much bigger than essential.
Great. I look forward to reading it.
Ian.
Hello, There is already some pristine-tar stuff in git-debpush, the "Intent to use pristine-tar for this upload" check. I think that your changes should be integrated with that. I.e., the conditions under which we currently fail because we think the user wants to use pristine-tar, should be precisely the conditions in which pristine_tar= gets added to the tag. As for everything else, I'll assume Ian's review probably covered it, and I'll take a closer look at your v2. Thank you for helping us with this.
---
Ok! Round two.
Here's a summary of the changes from v1. For git-debpush:
- The pristine-tar checking code is now only run if this is a non-native
package (i.e., "if $upstream").
- The upstream version is used instead of the Debian-revised one.
- Differently from the old pristine-tar check, the code is not run just
for the first (i.e., -1 or -0.1) revision, but for any upload. This
way, the t2u service can potentially handle the case where
a pristine-tar upload was intended, but no orig is available in the
archive yet. Please let me know if this makes sense or not!
For tag2upload-obtain-origs:
- Code is a bit more carefully written (using nul terminated command
output when possible). This also applies to the git-debpush script.
- The pristinetar option has been renamed to pristine_tar, and the
option keys glob has been changed to accept underscores instead of
dashes.
- The process will fail if there is more than one pristine-tar orig.
- It is now checked that pristine-tar metadata is a regular file
(according to git ls-tree)
- git update-ref is now used to rewind the pristine-tar branch, instead
of git reset --hard.
I did not add the "critical extension" stuff yet. Also, what should we
do about the signature files which pristine-tar can optionally store and
retrieve?
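
A minimal sketch of the "upstream version instead of the Debian-revised
one" point from the git-debpush list above. The helper name `orig_prefix`
is hypothetical. Dropping the Debian revision matches what the patch does;
also dropping an epoch is an assumption of this sketch (epochs do not
appear in orig tarball file names) and is not something the patch below
does.

```bash
#!/bin/bash
# Sketch only: orig_prefix is a hypothetical helper, not dgit code.
# Prints the file name prefix shared by the orig tarball and its
# pristine-tar metadata for a non-native package.
orig_prefix () {
    local source=$1 version=$2
    local uversion=${version%-*}   # drop the Debian revision (last "-...")
    uversion=${uversion#*:}        # assumption: drop any epoch as well
    printf '%s_%s.orig.tar.\n' "$source" "$uversion"
}
```

The `%-*` expansion removes only the last hyphenated component, so an
upstream version that itself contains a hyphen is kept intact.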
git-debpush | 41 ++++++++++++++++++++++++++---------------
infra/dgit-repos-server | 7 ++++++-
tag2upload-obtain-origs | 38 ++++++++++++++++++++++++++++++++++++--
3 files changed, 68 insertions(+), 18 deletions(-)
diff --git a/git-debpush b/git-debpush
index e3a4ba39..78e42fb9 100755
--- a/git-debpush
+++ b/git-debpush
@@ -457,6 +457,30 @@ if $upstream; then
to_push+=("$upstream_tag")
fi
+# I obtain the commit ID at the time of the upload, so that I can be sure that
+# the tag2upload service generates the tarball with the expected pristine-tar
+# branch state
+pristine_tar_info=''
+if $upstream; then
+ uversion="${version%-*}"
+
+ if pristine_tar_commit=$(git rev-parse --verify --quiet 'refs/heads/pristine-tar'); then
+ pristine_tar_tarballs=$(git ls-tree -z --name-only -- 'refs/heads/pristine-tar' \
+ | grep -zF -- "${source}_${uversion}.orig.tar." \
+ | grep -zc -- "\.id$")
+
+ if [ "$pristine_tar_tarballs" -gt 1 ]; then
+ fail 'more than one pristine-tar orig'
+ fi
+
+ # If there's no tarball, the user probably stopped using pristine-tar a
+ # while ago, but didn't delete the branch. Just ignore it.
+ if [ "$pristine_tar_tarballs" -eq 1 ]; then
+ pristine_tar_info=" pristine-tar=$pristine_tar_commit"
+ fi
+ fi
+fi
+
#**** Useful sanity checks ****
#---- UNRELEASED suite
@@ -522,20 +546,6 @@ case "$branch" in
fi
esac
-#---- Intent to use pristine-tar for this upload
-
-case "$version" in
- *"-1"|*"-0.1")
- uversion="${version%-*}"
- if $upstream && type pristine-tar >/dev/null 2>/dev/null \
- && pristine-tar list \
- | grep -q "^${source}_${uversion}"'\.orig\.tar\.'
- then
- fail_check pristine-tar \
- "pristine-tar data present for $uversion, but this will be ignored (#1106071)"
- fi
-esac
-
#---- Submodules
# Per gitmodules(7) "FORMS", .gitmodules is always present at the
@@ -837,7 +847,8 @@ fi
tagmessage="$source release $version for $target
[dgit distro=$distro split$quilt_mode_text]
-[dgit please-upload source=$source version=$version$upstream_info]
+[dgit please-upload source=$source version=$version]
+${upstream_info:+[dgit $upstream_info$pristine_tar_info]}
"
git_tag_main_opts_args=(-m "$tagmessage" "$debian_tag" "$branch_commit")
diff --git a/infra/dgit-repos-server b/infra/dgit-repos-server
index f6a3716c..96058d56 100755
--- a/infra/dgit-repos-server
+++ b/infra/dgit-repos-server
@@ -1304,7 +1304,7 @@ our ($t2u_email_noreply, $t2u_email_noreply_addr, $t2u_email_reply_to,
@t2u_email_copies, $t2u_jid, $t2u_url, $t2u_putative_package);
our ($t2u_tagger, $t2u_tagger_addr, $t2u_timeout);
our ($t2u_signing_keyid);
-our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt);
+our ($t2u_upstreamc, $t2u_upstreamt, $t2u_quilt, $t2u_pristine_tar);
sub t2u_dgit_cmd () {
(
@@ -1840,6 +1840,8 @@ sub tag2upload_parsetag ($) {
$package = $1;
} elsif (s/^version=(\S+) //) {
$tagversion = $1;
+ } elsif (s/^pristine-tar=(\w+) //) {
+ $t2u_pristine_tar = $1;
} else {
return 0;
}
@@ -2031,6 +2033,9 @@ END
"s=$suite",
"u=$t2u_upstreamc",
);
+ if (length $t2u_pristine_tar) {
+ push(@obtain_origs, "pristine_tar=$t2u_pristine_tar")
+ }
flush EMAIL_REPORT or confess $!;
open STDOUT, ">& EMAIL_REPORT" or confess $!;
t2u_b_run_fetch_cmd_errok 'work', @obtain_origs;
diff --git a/tag2upload-obtain-origs b/tag2upload-obtain-origs
index 016fa655..73a23bea 100755
--- a/tag2upload-obtain-origs
+++ b/tag2upload-obtain-origs
@@ -16,6 +16,7 @@
# optional settings:
#
# bpd defaults to ../bpd
+# pristine_tar=PRISTINE-TAR-COMMITID
set -eu -o pipefail
shopt -s inherit_errexit # #514862, wtf
@@ -32,7 +33,7 @@ while [ $# != 0 ]; do
k="${1%%=*}"
v="${1#*=}"
case "$k" in
- *[^0-9a-z-]*) fail "bad syntax for setting" ;;
+ *[^0-9a-z_]*) fail "bad syntax for setting" ;;
*)
eval "s_$k=\$v"
;;
@@ -96,7 +97,40 @@ case "$rc" in
;;
esac
Hello,
Thanks. I've included some inline comments below.
I think it would be helpful to work on the spec in tag2upload(5) before
continuing too much with code. It'll make it easier to keep the three
of us on the same page.
ITYM your new code, right?
The old check already had both these properties.
It might make sense, I'm not sure yet. Can you describe a concrete
example that would lead to this being helpful?
Can you explain why you've put this in at this point in the script? I
think that maybe it should go later, after all the sanity checks.
I take it you switched from invoking pristine-tar itself to calling
git-ls-tree in order to use NUL termination? If so, maybe we should
make that change first to the existing check. Perhaps you could prepare
an MR to that effect.
Generally we avoid parentheses on builtin operators and use poetry
style, so
push @obtain_origs, "pristine_tar=$t2u_pristine_tar"
if $t2u_pristine_tar;
Do you think we could extract them and include them in the upload?
I think we can verify them by using the upstream key embedded in the
source package, right? And if that verification fails we should
probably abort the upload -- maintainers who choose to use tarball
signatures had better make sure they verify.
Hi Sean,

Ok :) Documentation work has arrived earlier than I had anticipated...

Yes. I copied them from the old check :)

This is meant as a way to handle the potential issue described by Ian in
<26754.44285.166852.764186@chiark.greenend.org.uk>. Unless I have
misunderstood the issue, of course! If one does a -2 upload and the
archive does not have the orig yet, and t2u has a reference to the
pristine-tar branch, it can (safely?) re-create the tarball as it would
be bit-for-bit identical to the already uploaded one. Does it make sense?

No, I cannot explain that :) There's no real reason, I just tried to put
everything pristine-tar related in the same place. Thinking about it,
these checks can only go before obtaining the pristine_tar_info, because
I cannot reasonably get the pristine-tar info before first making sure
there's just one orig.

Kind of. I first wrote the checks in tag2upload-obtain-origs using plain
shell and git, and then simply copied them back here. This was before
looking at the existing pristine-tar check. But yes, pristine-tar does
not use nul termination.

You mean I should write a separate patch for the check and submit it
independently from this patch? I'd like to finish this patch in a
reasonable time, so maybe it doesn't make sense to fix up a local check
which is going to be removed soon anyway?

Thanks! Also, the code mixes tabs and spaces for indentation; looking at
the diff here made me remember that. Not my fault!

Yes, that'd be the correct thing to do. What I wasn't sure about is
whether t2u should just check out the signature and upload it to the
archive, verify it as well, or not do anything at all. In other words:
do we have to do verification here, or does it happen after sending
everything to the archive with dput? In that case, would it make sense
to duplicate the verification?

Thanks for the review, Sean!
Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):
Yes, I very much agree. Spec work on the protocol ought to precede
the code.
I'm pretty sure Andrea has this right.
If we have pristine-tar support we should engage it whenever we are
doing upstream handling (ie for non-native source formats) and this
ought not to depend on the version number.
The -1 etc. thing is just a guess really. That's fine for a check,
but it's not fine for functional code.
Of course the pristine-tar codepath is entitled to decide that
pristine-tar ought not to be applied to this upload, using its
pristine-tar-specific knowledge.
Does pristine-tar convey signature files? If so we should definitely
support them.
I don't think I agree that we ought to be doing signature verification
that isn't related to our functioning.
In particular, this means that if we were to accept an upload with no
signature file, we should accept an upload with a signature file we
can't verify for whatever reason. Since we are obviously not relying
on the signature if we don't mind if it's totally absent.
I know that this is not standard in Debian tooling but I find the
approach of Debian's tooling contrary to reasonable cryptographic
protocol design.
I guess a signature you can't verify might be a failed check, but,
really, is a maintainer *actually* going to get as far as git-debpush
without having discovered the signature doesn't verify in their local
environment? They've probably *run* the upstream code by then.
Or to put it another way, a signature that won't verify probably means
that the upload was previously done by another maintainer or on
another system where the right public key *was* available, but it's
not available here and now. It doesn't seem to me that it is likely
to mean "this is an attack and we should stop" or anything like that.
Questions like the ones we discuss above are examples of reasons why
it is a good idea to nail down the spec before writing code. Deciding
on correct behaviour in advance saves rework (and rework is extra
effort and often leads to additional confusion and additional bugs).
Ian.
Hi again!

I tried to add to the tag2upload.5 manpage the pristine-tar handling
design outlined in our discussions, which is inline below. Still, I have
a few questions:

What should we do with that upstream commit metadata? pristine-tar does
not need that, since it'll generate the tarball from the git tree id
stored in source_version.orig.tar.id. Still, we might want to make sure
that the pristine-tar tree corresponds to the one of the upstream commit
id. I don't know how useful this would be though, since the delta may
contain additional file additions and removals.

Also, what should we do with such tarballs whose contents are not
identical to the git tree?

In the text below, I assume that:

- We want to verify equality of upstreamc's tree and the one used by
  pristine-tar.
- We allow binary deltas (i.e., the .delta file) to contain
  modifications to files stored in the referenced tree, such as the
  addition of configure scripts.

Here it is:

=item C<pristine-tar>=COMMITID

Identifies the state of the pristine-tar branch at the time of push, if
present and containing data related to the current upstream version.

If this metadata item is present, the C<upstream> and C<upstream-tag>
items must be present too. The tag2upload service will ensure that the
tree contained in the .id file of the pristine-tar branch will
correspond to the tree referenced by the commit id contained in the
C<upstream> metadata item.

If the pristine-tar branch contains a signature file, this will be
published together with the orig tarball, and no signature verification
will be performed.
Hello,

Right, I see, thank you.

I would suggest it should go in the section marked "Gather git history
information".

What I was thinking is that changing to use git(1) instead of
pristine-tar(1) is a logically distinct change from changing from a
check to embedding pristine-tar info in the tag. So they should be
separate commits anyway, and we'd want to run the full test suite
against both of them. While we are still discussing design you could get
the first change out of the way with an MR now.

Yeah, this is our inconsistent use of Emacs, sorry about that. Just
don't worry about it.

Do you mean whether dak does any verification? I don't know.
Hello, Sorry, but which metadata is that? Trying to read your patch, I think the fact I don't use pristine-tar is really showing. Is the .id file defined somewhere? Is your knowledge of the pristine-tar branch contents from reading a spec, or empirical? Glad we have someone who knows it better working on it.
Hi, Will do! Put this way, it makes sense. Will send another patch soon. Great :) I think that for the time being, publishing the signature without extra processing is the most appropriate solution. We always have time to revise it if needed.
Hello, ACK.
Andrea Pappacoda writes ("Re: Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):
I think we decided it should start with !.
I will add something to the spec about critical extensions starting
with !.
This is kind of in the wrong mood. It reads like a description of
when git-debpush should include it. It ought instead to be a
specification of what the meaning of the item is.
Something like
Names a commit containing pristine-tar metadata.
The commit must contain SOMETHING LIKE exactly one .id file with
SOME PROPERTIES OR OTHER. The .id file MUST SATISFY SOME
CONDITIONS THAT I DON'T UNDERSTAND.
The tag must also contain an C<upstream> item, and the tree named in
the .id file must be identical to that of the C<upstream> commit.
The pristine-tar commit may contain a SOMEHOW IDENTIFIABLE signature
file. The signature file MUST SATISFY REASONABLE CONDITIONS SUCH AS
ITS FILENAME BEING SANE. The signature file will then be published
together with the orig tarball. The signature file is treated as
pure data by the service (so will not be verified or even format
checked).
If an orig tarball needs to be (re)generated, the service will use
pristine-tar, using precisely the metadata in the .id file. The
service will check that the generated tarball MATCHES THE HASH IN
THE .ID FILE and that its contained tree is identical to SOMETHING.
The named pristine-tar commit must be reachable from the
C<pristine-tar> branch in the repository.
Ian.
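The reachability requirement in the draft text above could be checked with plain git. This is only a sketch: the function name commit_reachable_from_branch is hypothetical and not part of git-debpush or dgit-repos-server, and it assumes the branch is a local one named pristine-tar.

```shell
#!/bin/sh
# Hypothetical helper (not part of dgit): succeed only if COMMIT is
# reachable from BRANCH. BRANCH defaults to "pristine-tar", matching
# the draft wording; a local branch of that name is assumed.
commit_reachable_from_branch () {
    commit=$1
    branch=${2:-pristine-tar}
    # merge-base --is-ancestor exits 0 when COMMIT is an ancestor of
    # (or identical to) the branch tip, and non-zero otherwise.
    git merge-base --is-ancestor "$commit" "refs/heads/$branch"
}
```

A caller would pass the commit named in the tag and fail the upload on a non-zero exit status.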
It's likely that I didn't explain myself correctly. I meant the existing
upstream= and upstream-tag= metadata fields which git-debpush already
uses. The pristine-tar tool does not need those to generate a tarball,
but I believe it's still useful to include them alongside the
pristine-tar= metadata field to compare the pristine-tar tree to the
tree of the git commit contained in the upstream= metadata field.
Hope it's clearer now! If not, here's some code which should express my
intent less ambiguously than in English.
pristine_tar_tree_id=$(git cat-file -- blob "${s_pristine_tar}:${tarball}.id")
upstream_commit_tree_id=$(git rev-parse --verify --end-of-options "${s_u}^{tree}")
if [ "$pristine_tar_tree_id" != "$upstream_commit_tree_id" ]; then
fail 'pristine-tar tree id differs from the upstream commit one'
fi
Kind of both. The pristine-tar(1) manpage says, under the `pristine-tar
commit _tarball_ _upstream_` section:
So yes, pristine-tar specifies that it stores the tree id somewhere. It
does not explicitly say where (well, not in the manpages), but it does
store that tree id inside a file named as the input tarball with ".id"
appended (as shown in its source code). This is not configurable, and
pristine-tar also looks for such a file when running `pristine-tar
checkout`, so it cannot really change, otherwise new pristine-tar
versions would be unable to extract old tarballs, which would defeat the
purpose of the tool.
The ".delta" file is explicitly mentioned in the manpage, just below the
paragraph I quoted before.
Thanks!
Hello, Thanks. We don't want to depend on the pristine-tar field for anything other than obtaining the orig.tar, so we would definitely want to keep the upstream= and upstream-tag= fields no matter what. Thanks, I understand now.
Hi Andrea, Have you had a chance to look at the following?
Hi Sean! Sorry, I thought I had replied already. Thanks for the reminder :)
The branch must contain exactly one .id file per upstream release. Its name should correspond to the name of the orig tarball, with the ".id" suffix. The file must be a regular file.
Yes. In practice, pristine-tar always stores the signature file as "orig_name.asc". So I think we could just specify this requirement here.
Yes.
I'm not sure I get this part, but if you meant what I understood, then it's wrong. The .id file does not contain the hash of the tarball; it contains a single line which corresponds to the tree id, as mentioned above. I'm honestly not sure where the hash verification happens, but *I believe* it's part of the reconstruction when pristine-gz and co. are run, thanks to information stored in the .delta (VCDIFF) file.
Yes.
One question remains unanswered. Should we allow .delta files modifying the tarball contents (i.e., do we want to allow generating tarballs which have different contents than the git tree)?
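The file-naming requirements stated at the start of this message could be checked along these lines. This is a sketch only: the function names are hypothetical, and treating both 100644 and 100755 as "regular file" is an assumption, since the thread does not say whether an executable bit should be accepted.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE exists at the top level of
# COMMIT's tree and is a regular file (not a symlink, submodule or
# directory). Accepting mode 100755 is an assumption.
is_regular_blob () {
    commit=$1
    file=$2
    # ls-tree prints "<mode> <type> <object>TAB<path>", or nothing at
    # all if the path is absent from the tree.
    mode=$(git ls-tree "$commit" -- "$file" | cut -d' ' -f1)
    case $mode in
        100644|100755) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: require both TARBALL.id and TARBALL.delta.
check_pristine_tar_files () {
    commit=$1
    tarball=$2
    is_regular_blob "$commit" "$tarball.id" &&
    is_regular_blob "$commit" "$tarball.delta"
}
```

For example, `check_pristine_tar_files "$s_pristine_tar" "$tarball"` would use the same variables as the snippet earlier in the thread.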
I've come back from a party and am a bit tipsy so I will read this
properly later, but:
Thanks for engaging with these questions!
Andrea Pappacoda writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):
I think in principle it might be a .sig.
So the .id contains the tree (git tree object) which uniquely
identifies the *contents* of the tarball.
But how does the pristine-tar information specify the precise hash of
the tarball itself? Does the .delta file say what the output hash is
supposed to be ?
I don't think I fully understand the implications. My default
position is that the answer should be "no" unless one of us *does*
understand the implications :-).
Regards,
Ian.
I'm replying to your email after a small party too, but at least I have
slept a couple of hours :)
Maybe yes, but regardless of the input signature filename, pristine-tar
always stores the signature in Git with a name of orig.asc. Also,
doesn't dpkg-source look for .asc files only?
Yes, but see below.
Yes, I've checked now and the .delta contains the expected SHA256 hash.
the orig tarball contains empty dirs, which are not representable in
Git. As an example:
$ tar -xvzf mypackage_1.0.orig.tar.gz
mypackage/
mypackage/file.txt
mypackage/empty_dir/
$ cd mypackage
$ git init -b upstream/latest
$ git add --all
$ git commit -m init
$ git show pristine-tar:mypackage_1.0.orig.tar.gz.id | xargs git show
tree 385d33e969fefd23b8efaca69c1d2db507ce0daf
file.txt
$ pristine-tar commit ../mypackage_1.0.orig.tar.gz upstream/latest
$ rm ../mypackage_1.0.orig.tar.gz
$ pristine-tar --debug checkout mypackage_1.0.orig.tar.gz
pristine-tar: set subdir to mypackage
pristine-tar: subdir is mypackage
pristine-tar: mypackage/empty_dir/ is listed in the manifest but may not be present in the source directory
pristine-tar: creating missing mypackage/empty_dir/
pristine-tar: doing full tree sweep to catch missing files
pristine-tar: successfully generated mypackage_1.0.orig.tar.gz
$ tar -tzf mypackage_1.0.orig.tar.gz
mypackage/
mypackage/file.txt
mypackage/empty_dir/
A different example, which may illustrate the "unexpected" results
this could lead to, is the following. Here, the tarball is created with
a file containing "evil" content, while in the upstream/latest branch
only the "good" content is stored. Upon tarball checkout, the good
content gets replaced with the evil one:
$ mkdir repo
$ echo evil > repo/file.txt
$ tar -czf repo_1.0.orig.tar.gz repo
$ echo good > repo/file.txt
$ cd repo
$ git init -b upstream/latest
$ git add --all
$ git commit -m init
$ pristine-tar commit ../repo_1.0.orig.tar.gz upstream/latest
$ git show pristine-tar:repo_1.0.orig.tar.gz.id
ca1cc63dd18610bc64a150397556d33e850a61e8
$ git rev-parse --verify --end-of-options 'upstream/latest^{tree}'
ca1cc63dd18610bc64a150397556d33e850a61e8
$ git show ca1cc63dd18610bc64a150397556d33e850a61e8:file.txt
good
$ rm ../repo_1.0.orig.tar.gz
$ pristine-tar checkout repo_1.0.orig.tar.gz
$ tar -xvzf repo_1.0.orig.tar.gz
repo/
repo/file.txt
$ cat repo/file.txt
evil
Even though both the pristine-tar .id file and the upstream/latest
branch point to the same tree id, the binary .delta contains
modifications to file.txt which change the contents from "good" (stored
in the git tree) to "evil" upon orig checkout.
Even though this example is artificial (the tarball contents are usually
committed to version control after it has been downloaded, not before),
it would still theoretically be possible for a malicious maintainer to
sneak a backdoor in (like in the xz backdoor case, but with the extra
step of also having a Debian maintainer collaborate). So I'm inclined to
say "sorry, no, this is too dangerous".
It is also true that this is currently allowed in regular Salsa repos,
so allowing this would not really make the situation worse.
The thing is: how do we disallow this? I'm not aware of any pristine-tar
switch which makes it fail when such a .delta file performing file content
modifications exists. Do we have to perform our own checking *after* the
tarball is checked out, by e.g. extracting it again on top of the
upstream commit tree and making sure no differences exist? Hacky but may
work.
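A post-checkout check of the kind described above might look like the following. It is a sketch under several assumptions: the function names are hypothetical, the tarball is assumed to have a single top-level directory, and the tree id is computed with a throwaway index rather than by extracting on top of a checkout. Note that `git add` here writes blob objects into the repository's object store, and that git does not record empty directories, so those do not affect the comparison.

```shell
#!/bin/sh
# Hypothetical helper: print the git tree id of TARBALL's contents.
# Must be run from inside the repository holding the upstream commit.
tarball_tree_id () {
    tarball=$1
    git_dir=$(git rev-parse --absolute-git-dir) || return 1
    scratch=$(mktemp -d) || return 1
    mkdir "$scratch/tree"
    # Assumption: one top-level directory, as in a typical orig tarball.
    tar -xf "$tarball" -C "$scratch/tree" --strip-components=1 || return 1
    (
        # A temporary index and work tree keep the real ones untouched.
        export GIT_DIR="$git_dir"
        export GIT_WORK_TREE="$scratch/tree"
        export GIT_INDEX_FILE="$scratch/index"
        cd "$scratch/tree" &&
        # --force so that ignore rules cannot hide tarball contents.
        git add --all --force . &&
        git write-tree
    )
    status=$?
    rm -rf "$scratch"
    return $status
}

# Hypothetical helper: succeed only if TARBALL is treesame to REV.
tarball_treesame_to () {
    have=$(tarball_tree_id "$1") || return 1
    want=$(git rev-parse --verify --end-of-options "$2^{tree}") || return 1
    [ "$have" = "$want" ]
}
```

With the "evil" example above, `tarball_treesame_to repo_1.0.orig.tar.gz upstream/latest` would be expected to fail, because file.txt in the regenerated tarball differs from the blob in the upstream tree.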
Let me know! Bye :)
Hello, Okay, could you rewrite this part, then? It might be a good time to open an MR adding the latest version of your text to tag2upload(5). ISTM that we should allow this as otherwise we would not be supporting many pristine-tar users.
Hello, I thought that the .delta files were mostly to cover, for example, the tarball containing autotools-generated files that aren't in git? Isn't that a key use case?
Sean Whitton writes ("Bug#1106071: [RFC PATCH dgit v2] tag2upload: add pristine-tar support"):
Not according to Colin in the "want Jia Tan option" bug,
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1109423#15
Empty directories are a corner case but git will consider them
treesame so if we do the check in git all will be well.
Ian.
Hello, Ah, right, thanks. This is confusing, huh? Then indeed, let's not support them.
The main undeniable advantage of pristine-tar is regenerating a tarball which is bit-for-bit identical to the upstream one, without having to keep the actual tarball around. This is useful for source reproducibility use cases (ignoring that Git is better for this anyway). I argue that containing autotools-generated files is not the main use case because in the usual git-buildpackage workflow you actually import the tarballs into git, so the Debian git tree has the autotools stuff as well. When one uses a mixed upstream git + tarballs gbp workflow, the tarball contents get applied as a new commit on top of the upstream git tag. So, even there, the contents of the tarball match the contents of the git tree pointed to by the upstream/latest branch (minus stuff like empty dirs). So, we can just say: if you want to use pristine-tar, make sure to commit the tarball's contents to the upstream/latest branch (gbp does this by default anyway). Note: here I use "upstream/latest" to refer to the branch containing the upstream code to be used for package builds. It could have a different name, of course, but that's what DEP14 recommends.
Yes, exactly. This is what I tried to explain in my previous message, but Colin has done so way better :)
---
This patch adds the pristine-tar item to the tag2upload spec. It's based on
Ian's suggested text, with some clarifications. Should be almost ready.

 tag2upload.5.pod | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/tag2upload.5.pod b/tag2upload.5.pod
index 7206fb4e..f934d210 100644
--- a/tag2upload.5.pod
+++ b/tag2upload.5.pod
@@ -139,6 +139,32 @@ With C<baredebian> quilt modes, this option is mandatory.
 specifies a native source package format, or if the targeted archive
 already contains a suitable orig.)
 
+=item C<!pristine-tar>=COMMITID
+
+Names a commit containing pristine-tar metadata.
+
+The commit must contain exactly one .id file and one .delta for the current
+upstream release, and their names must correspond to the name of the orig
+tarball, with ".id" and ".delta" appended, respectively. They must be
+regular files.
+
+The tag must also contain an C<upstream> item, and the tree named in the .id
+file must be identical to that of the C<upstream> commit.
+
+The pristine-tar commit may contain a signature file. The signature file
+name must correspond to the name of the orig tarball, with ".asc" appended.
+The signature file will then be published together with the orig tarball.
+The signature file is treated as pure data by the service (so will not be
+verified or even format checked).
+
+If an orig tarball needs to be (re)generated, the service will use
+pristine-tar, using precisely the metadata in the aforementioned files. The
+service will check that the generated tarball is treesame to the tree named
+in the .id file.
+
+The named prstine-tar commit must be reachable from the C<pristine-tar>
+branch in the repository.
+
 =item C<--quilt=QUILT-MODE>
 
 Specifies the git tree format in use,
Hello, Thanks for confirming, Andrea -- we're on the same page.
Hello, Can we say that the .delta file must represent an empty change (or equivalent), since we're not going to support actual deltas? I think your later text implies this but it would be easier to read if we said something here too. Maybe "tree object" or even "git tree object" for readability.
Hi Sean, Well, a .delta file represents a binary diff, so an empty binary diff would result in no modifications to the tar file at all. We want to support deltas which change things like the stored order of files in the tarball, or empty directories. So I don't really know how to say this other than "the resulting tarball must be treesame to the named git tree". Maybe something like "the .delta file must not contain changes to the tarball contents, except for empty directories". Makes sense.
Andrea Pappacoda writes ("Bug#1106071: [PATCH dgit v1] tag2upload: add pristine-tar metadata item"):
I think
this phrase is perfect, for a spec. It is very precise and says
exactly what we mean.
It would be worse to say the same thing again in different words. In
specs that can lead to ambiguity if one of the descriptions can be
interpreted differently.
Ian.
Ian Jackson writes ("Re: Bug#1106071: [PATCH dgit v1] tag2upload: add pristine-tar metadata item"):
Just after writing this I had a thought. If there are things in the
working tree that aren't files or directories or symlinks, what does
git do ?
For our treesame check to be meaningful, we need it to fail, I think.
Since I think we don't want to permit tarballs that contain device
files, sockets, or whatever.
Ian.
Yeah, git seems to completely ignore device files (and tar does not seem to support sockets?). So should we explicitly state that only empty directories are allowed? If so, how do we check that?
Andrea Pappacoda writes ("Re: Bug#1106071: [PATCH dgit v1] tag2upload: add pristine-tar metadata item"):
Hngh. (I bet tar does support sockets. It certainly supports fifos.)
I can't think of a better way than comparing the output of
git ls-files with the output of find \! -type d -print0.
Ian.
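A variation on this idea is sketched below. Rather than comparing two listings, it looks directly for entries of a kind that the thread says must be rejected (fifos, sockets, device files). The function names are hypothetical, and the directory is assumed to be a freshly extracted tarball.

```shell
#!/bin/sh
# Hypothetical helper: print every entry under DIR that is neither a
# regular file, a symlink, nor a directory.
list_special_entries () {
    find "$1" ! -type f ! -type l ! -type d -print
}

# Hypothetical helper: succeed only if DIR contains no such entries.
reject_special_entries () {
    special=$(list_special_entries "$1")
    [ -z "$special" ]
}
```

Empty directories pass this check, which matches the position elsewhere in the thread that they are an acceptable difference between tarball and git tree.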
Hello, Right, okay. Then I agree with Ian that what you already say is enough.
Hello, We encountered a similar problem when writing mini-git-tag-fsck. I think the right thing to do is to fail, indeed.
Okay so, for v2 should I:
1. Replace "The service will check that the generated tarball is treesame to the tree named in the .id file" with "the resulting tarball must be treesame to the named git tree".
2. Do that, but also add ", except for empty directories", implying that anything else isn't allowed.
3. Something else?
Hello, I think this is everything. Maybe you could make an MR?
Submitted as https://salsa.debian.org/dgit-team/dgit/-/merge_requests/264
Hi!
First of all thanks to Ian and Andrea for working on this tirelessly
since DebConf. For my packages having support for pristine-tar and
using real original tarballs is important and I am looking forward to
seeing https://salsa.debian.org/dgit-team/dgit/-/merge_requests/264
finished.
I have one question about the design: How will this behave with
repackaged sources?
For example in Godot we use d/copyright Files-Excluded to tell uscan
to repackage the upstream tarball, with a resulting pristine-tar
branch commit and filenames like this:
commit 3aefb8ae3866d43ca1ecd9e58872c2b1cf5c7f39 (HEAD -> pristine-tar)
Author: Travis Wrightsman <travis@wrightsman.org>
Date: Wed Jul 30 21:28:23 2025 +0200
pristine-tar data for godot_4.4.1+ds.orig.tar.xz
diff --git a/godot_4.4.1+ds.orig.tar.xz.delta b/godot_4.4.1+ds.orig.tar.xz.delta
new file mode 100644
index 00000000000..e888d17a93b
Binary files /dev/null and b/godot_4.4.1+ds.orig.tar.xz.delta differ
diff --git a/godot_4.4.1+ds.orig.tar.xz.id b/godot_4.4.1+ds.orig.tar.xz.id
new file mode 100644
index 00000000000..5ddcd1b0750
--- /dev/null
+++ b/godot_4.4.1+ds.orig.tar.xz.id
@@ -0,0 +1 @@
+2fddcc20d38b7f802ff624d597436262e28a1058
It is of course debatable if pristine-tar + repackaging makes sense
anymore, as we intentionally break the supply-chain "seal" and the
upstream tarball signature can't be used to verify the tarball in
Debian anymore. I would also accept the outcome that in case of
repacking, tag2upload would opt out from using pristine-tar. But some
could argue that it should be used anyway for consistency due to how
uscan and git-buildpackage are expected to have a certain branch
layout and contents, and pristine-tar would at least ensure that
people working on the package will have the same source (before
upload, when getting sources from git is still relevant).
What are your thoughts on repackaging?
Hi Otto! I wouldn't say that I've worked tirelessly on this, but thanks :)
I think it would behave exactly like non-repackaged origs. The upstream/<num> tag would contain the repackaged code, and the pristine-tar data would contain the usual binary diff. pristine-tar does not "know" that the tarball was repackaged, and tag2upload doesn't care either. Hope it makes sense to you! If you have any doubt, just ask.
Yeah, to me it doesn't make much sense. But I do currently use pristine-tar in some packages with repacked sources. I shouldn't, and I'll probably stop.
Bye :)
Otto Kekäläinen writes ("Bug#1106071: wanted: tag2upload support for pristine-tar"):
Thanks for bringing this up. However, as you can see from the start
of this report, we wouldn't generally recommend using pristine-tar
anyway. We're intending to support it because it's a thing some of
our users will expect.
When the upstream source code is repacked, pristine-tar makes even less
sense. In those situations, we are making our own tarball anyway.
Our dgit-maint-*(7) workflow manpages give information on maintaining
filtered git branches. I don't think uscan is a particularly good way
of doing this filtering. It thinks about things in a very tarball
way.
But I don't think any of this is particularly relevant for this bug.
git-debpush and tag2upload should use pristine-tar data if it is
available.
If you would like to discuss this further, please file a new bug.
I think it's important to keep *this* bug for details of the behaviour
of git-debpush and tag2upload's pristine-tar implementation. [1]
Ian.
[1] For the avoidance of any doubt:
If you disagree, and still think that this has implications for
pristine-tar support in tag2upload, please *still file a new bug*.
So, speaking as a maintainer of the src:dgit package, please do not
post further messages on this topic to *this* bug. I don't want to
see it derailed with a discussion about uscan and/or tarball
repackaging.
Yes, this makes sense. The upstream signature verification chain will no longer be intact, but anyone auditing the supply chain will see the file and the +ds or +ds1 suffix and easily figure out that the file intentionally isn't exactly the same anymore. In this case the pristine-tar feature is kind of moot, but it is still more "transparent" to a potential auditor to let them find the file in the same places in the git history as they expect, but with a +ds1 suffix, than for them not to find it, or to find a file with the original filename but non-original contents and be confused about where it got changed and why.