- Package:
- devscripts
- Source:
- devscripts
- Description:
- scripts to make the life of a Debian Package maintainer easier
- Submitter:
- Hans-Christoph Steiner
- Date:
- 2025-03-23 15:51:02 UTC
- Severity:
- wishlist
Whenever mk-origtargz repacks a zipball, it should zero out the timestamps in the tar format so that the process produces the same tarball every time it runs. This can be done using tar's --mtime= flag. Additionally, it would be very useful if mk-origtargz itself had an --mtime option that forced the tarball to be repacked using the given date, e.g. --mtime="Wed Oct 28 10:12:27 2015 -0700". Here's an example of how to do that in Perl: https://stackoverflow.com/a/16728218 This gets us ever closer to the goals of reproducible builds, where we can guarantee that, given the same original source code, the resulting binaries are always the same. For more on that topic: https://reproducible-builds.org/
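To make the request concrete, here is a minimal Python sketch of the idea. It is not how mk-origtargz works (that is Perl and calls tar); the function names `repack_with_mtime` and `parse_date` are made up for illustration, and the sketch also resets owner fields, which goes beyond what this report asks for.

```python
import io
import os
import tarfile
from datetime import datetime


def parse_date(text: str) -> int:
    """Turn a date like the one in this report into a Unix timestamp."""
    return int(datetime.strptime(text, "%a %b %d %H:%M:%S %Y %z").timestamp())


def repack_with_mtime(src_dir: str, mtime: int) -> bytes:
    """Return an uncompressed tar of src_dir with every mtime forced to `mtime`."""

    def normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
        info.mtime = mtime             # same role as tar's --mtime=
        info.uid = info.gid = 0        # assumption: also hide who did the repack
        info.uname = info.gname = ""
        return info

    buf = io.BytesIO()
    # Pin the archive format; Python's default has changed between releases.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # TarFile.add() recurses in sorted order (Python 3.7 and later).
        arcname = os.path.basename(os.path.normpath(src_dir))
        tar.add(src_dir, arcname=arcname, filter=normalize)
    return buf.getvalue()
```

Calling `repack_with_mtime(tree, parse_date("Wed Oct 28 10:12:27 2015 -0700"))` twice on the same tree should give identical bytes, even if the files on disk were touched in between.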
Hi, This is an "important wishlist" :-) Let's read the date from the top entry of debian/changelog and set mtime as described here. Currently, mk-origtargz calls gzip with "-n". xz and bzip2 do not seem to have such an option, so we set no flag for them. None of these is guaranteed to produce the same result, since compression output seems to be architecture-dependent (at least for gzip). If you know of any options to improve the REPRODUCIBILITY of gzip/xz/bzip2, let us know. Osamu
Hi, Second thought ... uscan/mk-origtargz/uupdate are not run during the binary package building process. Does reproducible builds aim to create the source package in a reproducible way? If reproducible builds is aiming for binary build reproducibility, changing the behavior of uscan/mk-origtargz/uupdate has no impact. ... Why do you need this? unzip preserves file timestamps inside the zip archive. Am I right? Is this something we need to do for repacking of tar.gz? Yah, if it is needed. Well ... it is simpler than this as long as we know what date to set. Just run tar with the --mtime option in the code with the reference file or date string. Regards, Osamu
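For reference, the effect of gzip's "-n" can be sketched in Python as below. This only shows that the gzip header then carries no timestamp and no file name; it does not address the concern that the compressed stream itself may differ between zlib versions or architectures. The helper name `gzip_reproducibly` is an assumption for this sketch.

```python
import gzip


def gzip_reproducibly(data: bytes, level: int = 9) -> bytes:
    """Compress without embedding a timestamp, similar in spirit to `gzip -n`."""
    # mtime=0 leaves the 4-byte MTIME header field zeroed (Python 3.8+);
    # gzip.compress() never stores an original file name.
    return gzip.compress(data, compresslevel=level, mtime=0)
```

Two calls on the same input in the same environment return the same bytes, and bytes 4 to 7 of the result (the MTIME field) are zero.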
Osamu Aoki: We want to have the whole process able to be inspected, including the process of making the source tarballs. But yes, binary reproducibility is more important. In this case, it is pretty easy to make reproducible source tarballs, so I think it's worth doing. I believe unzip will preserve the timestamps. As long as mk-origtargz has an --mtime option, then we can use the most appropriate date. For example, with Android SDK packages, we can get the git commit date of the release, since upstream does not post release tarballs, only git tags. It is this use case that made me want mk-origtargz to support --mtime. .hc
Hi, Please also remember that reproducing upstream content, including the file timestamps, is an important factor. So why do you wish to overwrite mtime? Does upstream release a zipball with a different timestamp every time a user requests a download? Please be concrete about the needs, with an actual example package, so we are not expanding on fantasy. If we add features, we need to add an infinite number of them unless there is a strong case which makes the addition useful. Does the Android SDK zipball have random timestamps inside it? Regards, Osamu
Osamu Aoki: Yes, Google's http://googlesource.com website provides nice .tgz download links for every commit, but those tarballs are different every time. http://googlesource.com uses the current date/time as the timestamp each time you download it. The timestamp is the most likely variation when producing source tarballs from git/etc. .hc
Hi, OK. This is deprecated source but now you are talking about not just zipballs but tarballs. OK, so your feature request is to have such an option not just for zip but for all archives. That makes some sense to me now. Good night. I need some time to think. Osamu
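A rough sketch of how the git commit date could be turned into a value for such an option. The helper names are hypothetical; `git log -1 --format=%ct` prints the committer date as a Unix timestamp, and GNU tar accepts `@SECONDS` as a date string.

```python
import subprocess


def parse_commit_epoch(git_output: str) -> int:
    """Parse the output of `git log -1 --format=%ct`."""
    return int(git_output.strip())


def commit_epoch(repo: str, rev: str = "HEAD") -> int:
    """Ask git for the committer date of `rev` as seconds since the epoch."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "-1", "--format=%ct", rev],
        check=True, capture_output=True, text=True,
    )
    return parse_commit_epoch(result.stdout)


def tar_mtime_arg(epoch: int) -> str:
    """Build the argument for GNU tar, which accepts @SECONDS as a date."""
    return f"--mtime=@{epoch}"
```

For a release tag one would call something like `tar_mtime_arg(commit_epoch("/path/to/clone", "some-tag"))`, where both the path and the tag name are placeholders.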
user reproducible-builds@lists.alioth.debian.org usertags 807270 toolchain thanks Hey all, Adding a Reproducible Builds usertag and pinging the ML -- I hadn't spotted this wishlist bug before. Best wishes,
That parameter was explicitly added for reproducibility here: https://salsa.debian.org/kernel-team/linux/-/commit/ea024852d4 The Debian kernel team switched from their own 'genorig.py' script to using ``uscan``, which IIUC invokes mk-origtargz here: https://salsa.debian.org/kernel-team/linux/-/commit/55243dbd8d6842f But I want to use my local clone of the upstream kernel instead of downloading ~250MB each time, so I want to restore that 'genorig.py' script for myself, but still get identical results. The sha256sums of the uncompressed tar archives are identical and diff-ing the extracted orig.tar.xz archives showed no difference at all. So I went looking what could be the reason why the sha256sums of the orig.tar.xz files were different. And that's when I found the first mentioned commit. And reproducibility is good, so it seems best if mk-origtargz is improved to produce reproducible results. So a +1 on this feature request from me. Cheers, Diederik
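The comparison described above (same uncompressed tar, different .tar.xz) can be sketched like this. The function name `digests` is made up, and the two xz presets in the usage note are only a stand-in for "something differed in the compression step"; they are not claimed to be what differed in the kernel case.

```python
import hashlib
import lzma


def digests(xz_bytes: bytes) -> tuple:
    """Return (sha256 of the .xz file, sha256 of the decompressed tar stream)."""
    container = hashlib.sha256(xz_bytes).hexdigest()
    payload = hashlib.sha256(lzma.decompress(xz_bytes)).hexdigest()
    return container, payload
```

If two orig.tar.xz files give different first values but the same second value, the tar content is identical and only the compression layer differs, for example because `lzma.compress(data, preset=0)` and `lzma.compress(data, preset=9)` were used.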
+1 on reproducible tarballs. I've been spending way too much time to achieve this for 'make dist' tarballs of a couple of projects (libtasn1, libidn2, inetutils, ...). It is not a simple matter. Modification time of files is used by 'make' for dependency rebuild ordering and may also end up as timestamps inside files. "Diederik de Haas" <didi.debian@cknow.org> writes: Here is one resource to read for more hints: https://www.gnu.org/software/tar/manual/html_node/Reproducibility.html /Simon
sure, +1, patches welcome! :) \o/
Holger Levsen <holger@layer-acht.org> writes: Attached starting point, thoughts? https://salsa.debian.org/debian/devscripts/-/merge_requests/490 The patch needs review/improvement from those more familiar with mk-origtargz and the debian/tests/ framework. My main argument is that solving this is harder than it looks, and I fear that solving the general problem here may actually be infeasible. It helps to realize this; otherwise one may think that solving this is just a matter of adding the right parameters (which is what the patch attempts to do). While we could continue to patch things, how about a bigger question: why do we re-create tarballs? I guess there are many different use-cases, but I believe some of them are symptoms of some bigger problem. The solution in those use-cases isn't to improve the reproducibility of tarball re-creation, it is to avoid creating our own tarballs. Maybe some use-cases really do require us to re-create tarballs, and maybe in those particular cases designing a solution to the --mtime concern is feasible. For those wanting to understand why solving the --mtime concern is a hard problem, here is a partial helper tool to aid with this: https://lists.gnu.org/archive/html/bug-gnulib/2025-02/msg00166.html I dislike all that complexity though, so for some upstream projects (libtasn1, libidn2, inetutils, ...) I am using a heavy hammer like this:

TAR_OPTIONS += --mode=go+u,go-w --mtime=$(abs_top_srcdir)/NEWS

mtime-NEWS-to-git-HEAD:
	$(AM_V_GEN)if test -e $(srcdir)/.git \
		&& command -v git > /dev/null; then \
		touch -m -t "$$(git log -1 --format=%cd --date=format-local:%Y%m%d%H%M.%S)" $(srcdir)/NEWS; \
	fi

We could do the same in Debian, replacing NEWS with the last timestamp of debian/changelog, but it is important to remember that this is an ugly workaround rather than a solution. Solving it like this will lead to other problems.
Solving it properly requires going to the root cause of the problem, which is what Bruno is chasing in that e-mail thread. /Simon
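If the debian/changelog timestamp were used as suggested in this thread, extracting it could look roughly like the sketch below. It assumes the usual trailer line format (" -- Name <email>  date", with two spaces before the date) and the function name `changelog_epoch` is hypothetical; a real implementation would more likely reuse dpkg's own changelog parser.

```python
import re
from email.utils import parsedate_to_datetime

# Trailer line of a changelog entry: " -- Maintainer <email>  RFC 2822 date"
TRAILER = re.compile(r"^ -- .*>  (?P<date>.+)$")


def changelog_epoch(changelog_text: str) -> int:
    """Return the timestamp of the top (first) changelog entry."""
    for line in changelog_text.splitlines():
        match = TRAILER.match(line)
        if match:
            return int(parsedate_to_datetime(match.group("date")).timestamp())
    raise ValueError("no changelog trailer line found")
```

The first trailer line found belongs to the newest entry, so its date is the one that would be handed to tar's --mtime.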
I had made some comments on the MR, but I think it's useful to keep it
all together, so I'll redo that here, at the end of the message.
... having looked into this a bit more, I agree. (more later)
I consider that out of scope for this bug, so I won't comment on that.
rules and (especially if everyone uses that) consistency.
There is one 'problem' though: it only supports git (for now?).
The ``genorig.py`` script stored the orig_date like this:
    orig_date = time.strftime("%a, %d %b %Y %H:%M:%S +0000",
                              time.gmtime(
                                  os.stat(os.path.join(self.dir, self.orig, 'Makefile'))
                                  .st_mtime))
And then orig_date is used to set the --mtime parameter to tar.
That ``genorig.py`` script also had a useful comment:
# exclude_files() will change dir mtimes. Capture the original
# release time so we can apply it to the final tarball.
I don't really care which date format is used, but I do care that it's
used consistently. And if the archive is repackaged or not, the mtime
should be the same (which was the whole idea behind storing orig_date).
Similarly it shouldn't matter if the archive is created via ``uscan`` or
via a call to ``mk-origtargz`` directly.
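The ordering issue behind that comment can be shown with a small sketch: capture the reference date first, then prune. The function names and the choice of 'Makefile' as default reference mirror the ``genorig.py`` excerpt, but this is an illustration, not that script's code.

```python
import os
import time


def capture_orig_date(tree: str, reference: str = "Makefile") -> str:
    """Format the mtime of a reference file the way genorig.py did."""
    st = os.stat(os.path.join(tree, reference))
    return time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime(st.st_mtime))


def prune(tree: str, excluded: list) -> None:
    """Remove excluded files; each removal updates the parent directory's mtime."""
    for rel in excluded:
        os.remove(os.path.join(tree, rel))
```

Calling `capture_orig_date()` before `prune()` keeps a stable date to pass to tar's --mtime, whereas the directory mtimes in the tree no longer reflect the upstream release once files have been removed.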
It's indeed a(n ugly) workaround but I do think it's useful; having each
package declare which upstream file to use sounds like a very bad idea.
into too many rabbit holes, I'll settle for a decent one ;-)
And now for a review of the patch/MR itself:
First of all: thanks for a proper commit message :-)
to go with ustar and not go with any of the other archive formats.
I usually put links at the bottom of the commit message which can be
used for background/further reading, but the commit message itself
should contain all the information needed.
https://www.gnu.org/software/tar/manual/html_section/Formats.html
says about ustar: "Archive format defined by POSIX.1-1988 and later."
which I think is a really good argument (I like standards).
I also see 'posix' as archive format:
"The format defined by POSIX.1-2001 and later."
"This archive format will be the default format for future versions of
GNU tar."
POSIX.1-2001 doesn't sound too recent, so why not go with that?
There may be very valid reasons, but please describe why you chose NOT
to go with that.
That can then be used in the future to re-evaluate that choice.
btw: Is this what you mean by 'pax'?
The serverfault page describes it as POSIX.1-2001, but the Formats page
doesn't have the word 'pax'.
The upstream tar git repo does have a 'paxutils' submodule, not to be
confused with the 'pax-utils' Debian package.
Then there's also a 'pax' Debian package "Portable Archive Interchange
(cpio, pax, tar)" which sounds useful (?), but its package description
has this: "This is the MirBSD paxtar implementation supporting the
formats ... old tar, and ustar, but not the format known as pax yet" :-O
[ continuing with 'Formats' ]
"The default format for GNU tar is defined at compilation time. You may
check it by running tar --help, and examining the last lines of its
output. Usually, GNU tar is configured to create archives in ‘gnu’
format, however, a future version will switch to ‘posix’."
```sh
diederik@bagend:~$ tar --version
tar (GNU tar) 1.35
diederik@bagend:~$ tar --help | tail -n3
*This* tar defaults to:
--format=gnu -f- -b20 --quoting-style=escape --rmt-command=/usr/sbin/rmt
--rsh-command=/usr/bin/rsh
```
The ``gnu`` archive format description has:
"Format used by GNU tar versions up to 1.13.25."
I didn't see a format specification in ``debian/rules``, so it seems the
default is still ``gnu``?
It sounds to me that ``ustar`` or ``posix`` are better than ``gnu``, but
is it wise if the Debian tar package uses a different archive format
(by default) than what mk-origtargz does/will do?
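One practical difference between the formats is what they can represent. The sketch below uses Python's tarfile module (not GNU tar, so details may differ) to show that a single path component longer than 100 characters cannot be stored in ``ustar``, while ``gnu`` and ``posix``/pax handle it through extension headers. The helper names `pack` and `first_name` are made up.

```python
import io
import tarfile


def pack(name: str, fmt: int) -> bytes:
    """Write one empty member called `name` using the given tar format."""
    info = tarfile.TarInfo(name)
    info.mtime = 0
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        tar.addfile(info)
    return buf.getvalue()


def first_name(blob: bytes) -> str:
    """Read back the name of the first member of an archive."""
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        return tar.getnames()[0]
```

With `long_name = "x" * 150`, `pack(long_name, tarfile.USTAR_FORMAT)` raises ValueError, while `tarfile.GNU_FORMAT` and `tarfile.PAX_FORMAT` both round-trip the name. So switching formats is not only a reproducibility question; it also decides which upstream trees can be packed at all.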
The Debian tar maintainer had its last upload 4 years ago and I haven't
found any upload by its official 'uploader' ... ever, so CC-ing them
didn't sound too useful.
And then I stopped myself from going into more rabbit holes ...
Excellent.
Idem.
Why? Does this change the permissions of the files in the archive?
If so, then that sounds like a bad idea.
If it is useful, then the reasoning for doing that should be documented
with an optional link to the Guix page (?) that you used for its
justification.
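Assuming the option being questioned here is the ``--mode=go+u,go-w`` setting from the Makefile fragment quoted earlier in this thread (the quoted patch line is not visible here), its effect on the permission bits can be sketched as follows. This models the chmod-style symbolic mode only; `normalize_mode` is a hypothetical name.

```python
def normalize_mode(mode: int) -> int:
    """Apply the symbolic mode go+u,go-w to a numeric permission value."""
    user = (mode >> 6) & 0o7
    mode |= (user << 3) | user   # go+u: group and other gain the owner's bits
    return mode & ~0o022         # go-w: then drop write for group and other
```

So yes, under that assumption the recorded permissions do change: 0600 and 0664 both become 0644, and 0700 and 0750 both become 0755. The point of such a setting is that the archive no longer depends on the umask of whoever unpacked the sources.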
Without it, this patch won't close bug 807270, but referencing that bug
in this patch seems *very* useful.
And I want to reiterate that "exclude_files() will change dir mtimes",
which IIUC makes things NOT reproducible.
You're using a mix of tabs and spaces above; please use only spaces to
match the rest of the file.
Cheers,
Diederik
Archive format selection:
  pax      POSIX 1003.1-2001 (pax) format
  posix    same as pax
Cheers, Diederik
Forgot to mention this, but please also add a link to https://www.gnu.org/software/tar/manual/html_node/Reproducibility.html which you shared in your mail before the patch and is *really* useful! Cheers, Diederik