Hi!

While working on the “reproducible builds” effort [1], we have noticed that perl could not be built reproducibly.

The attached patches will fix that with our current experimental framework. I hope the description of each patch is enough to understand their purpose.

[1]: https://wiki.debian.org/ReproducibleBuilds
Thanks, this is awesome! I only had a quick look so just a couple of notes and questions for now.

Is this because of the date header in manpages? Setting the POD_MAN_DATE environment variable could/should suffice for that, I think. See debian/patches/fixes/pod_man_reproducible_date.diff

I expect this needs to be made configurable for upstream to accept it. Also, it might be safer to replace __DATE__ and __TIME__ with some placeholders rather than dropping them, at least until this is upstreamed. There might well be some crazy things parsing 'perl -V' output or something like that which could choke if the lines are left out altogether.

and later

I assume the first is the correct one.

Not sure the 'touch' part belongs in gen-patchlevel, which currently just prints to STDOUT. But I can see it would be nice to pick up the mtime while reading the patches anyway. I wonder if we could/should use the changelog date instead, though.

The whole thing of writing $patchlevel_date into perlbug to see how old this perl is feels weird...
Niko Tyni:

This is needed to have reproducible mtimes in data.tar and control.tar. This is done right before calling dpkg-source.

I went ahead with removing the values because there were already #ifdefs. But maybe the value of cf_time should be passed through `-D` or something similar. I'm not sure what the best way is.

Oops. The latter is the one to pick. 'D' is incompatible with 'u'.

I believe this is a matter of taste. :)

Thanks for having a look,
Ah, right. Sorry about that.
A few more notes:
- the build system also embeds information about the build host, at
least the kernel version and hostname. Those need to be stripped too.
From 'perl -V':
osname=linux, osvers=3.16.0-4-amd64, archname=x86_64-linux-gnu-thread-multi
uname='linux estella 3.16.0-4-amd64 #1 smp debian 3.16.7-ckt2-1 (2014-12-08) x86_64 gnulinux '
I assume varying uname et al. isn't actively tested yet?
- I would expect some of the generated manual pages to embed the build
date, at least for patched modules like Net::SMTP. Are builds from
different days compared currently and/or are you setting POD_MAN_DATE
externally? (see #759405)
- I don't think 0003-Allow-cf_time-to-be-set-externally is needed,
as config.over can override cf_time without it AFAICS.
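
As a rough illustration of the config.over route mentioned in the last point: Configure loads config.over as shell, if the file exists, so a fragment along these lines could pin cf_time to a fixed value. This is only a sketch. The helper name `fixed_cf_time` and the choice of epoch 0 are assumptions made for the example, and it relies on GNU date's `-d @SECONDS` syntax.

```shell
# Hypothetical helper: render a fixed timestamp in the format `date` prints
# in the C locale, independent of the build host's clock and timezone.
fixed_cf_time() {
    LC_ALL=C TZ=UTC0 date -u -d "@$1"
}

# In a config.over fragment, this would replace the build-time value.
# Epoch 0 is an arbitrary placeholder chosen for this sketch.
cf_time=$(fixed_cf_time 0)
echo "cf_time='$cf_time'"
```

A real package would more likely derive the timestamp from something stable such as the changelog date rather than a constant.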
Sorry I'm a bit slow with this... :)
Hi Niko,

[...]

no, not yet. and varying hostname is only tested since last week, so most packages have not yet been tested for that. (but will be in a few months.)

are there ways to "properly" fake uname or do I really need to set up something in qemu to test^wsimulate builds under different kernels?

cheers,
Holger
A quick search indicates that there's no separate namespace for other uname(2) information than the host name and domain name. This suggests that something like http://www.bstern.org/libuname/ is needed. I'm not aware of anything in Debian already that does that. Time for an RFP maybe :)
Hi Niko,
it builds fine but doesn't work:
jenkins@jenkins:~/u/libuname-1.0.0$ make
gcc -Wall -Werror -O2 -fPIC -c -o libuname.o libuname.c
if [ "`uname -s`" = "SunOS" ]; then \
ld -G -dy -z text -Qn -o libuname.so libuname.o; \
else \
ld -shared -fPIC -o libuname.so libuname.o; \
fi
jenkins@jenkins:~/u/libuname-1.0.0$ LD_PRELOAD=$PWD/libuname.so LIBUNAME='Linux;bar;2.6.15;#1;Mon Feb 37 22:33:44 UTC 2006;i686;unknown' uname -a
uname: symbol lookup error: /var/lib/jenkins/u/libuname-1.0.0/libuname.so: undefined symbol: dlsym
cheers,
Holger
This is resolved by the attached patch.
Hi!

Here's an update after rebasing my patches on 5.20.2-4.

Niko Tyni:

We do now test it by calling `linux64 --uname-2.6`. It will make the version look like 2.6.56-4.

And indeed, this is an issue. The kernel version shows in Config.pm (`osvers`) and Config_heavy.pl (`osvers`). The full uname is shown in Config_heavy.pl (in a comment, and in `myuname`), in CORE/config.h (in a comment, in `OSVERS`), and in the binaries.

I'm not sure what the best answer is here. Always use 2.6.42? As in Debian we can't really know which version of the kernel the package is going to be used with, it should stay compatible with older kernels as much as possible.

Another issue that surfaced now that we are doing timezone variations is that LOCALTIME_MIN and LOCALTIME_MAX get different values depending on the value of the TZ environment variable. This shows in CORE/config.h, in Config_heavy.pl, and in the binaries.

If I read it right, `sLOCALTIME_min` and `sLOCALTIME_max` can be overridden from `Configure`. The minimum I had on my amd64 system is with TZ=UTC-24, -62167305600. The maximum is with TZ=UTC and is 67768036191590399.

It feels like a bug to have something that can be configured through an environment variable on a running system affect what gets encoded in the binary.
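
To make the timezone dependence concrete: the minimum reported above for TZ=UTC-24 is exactly one day (86400 seconds) below the epoch value of 0000-01-01T00:00:00 UTC, which is consistent with the 24-hour offset in that TZ setting. The sketch below only checks this arithmetic. The year-0 constant and the variable names are introduced for illustration and do not come from Configure.

```shell
# Value reported in the thread for TZ=UTC-24 on amd64.
localtime_min_utc_minus_24=-62167305600

# Seconds since the epoch for 0000-01-01T00:00:00 UTC in the proleptic
# Gregorian calendar (an assumption of this sketch, not taken from Configure).
year0_utc=-62167219200

# Difference between the two values, in seconds and in hours.
shift_seconds=$(( year0_utc - localtime_min_utc_minus_24 ))
echo "shift: ${shift_seconds}s ($(( shift_seconds / 3600 ))h)"
```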
Hello,

Thanks for the update! I noticed that you didn't include your rebased patches as attachments, however. We've now uploaded perl 5.22.0~rc2-2 to experimental, and that will be a good base on which to forward patches upstream, so if you were able to do one more rebasing that'd be excellent.

Cheers,
Dominic.
clone 774422 -1
retitle -1 perl: build timezone affects LOCALTIME_{MIN,MAX}
severity -1 normal
thanks
Thanks. I had a look at this and will try to get a reproducible 5.22
package into experimental soonish. It looks like the only thing that
needs upstream source changes (as opposed to configuration) is the
__DATE__/__TIME__ stuff. I understand the 'ar D' patch isn't necessary
anymore since binutils was changed.
I'll discuss at least the __DATE__ part upstream, but I think disabling
it at this phase should be good enough.
maybe we shouldn't care about those at this point.
I suspect the uname (stored as $Config{myuname}) doesn't matter much:
codesearch.debian.net only finds libcrypt-openssl-x509-perl using it
(and even that should probably use $^O instead, which gives the runtime
OS name instead of the build time one.)
As for osvers, which has many more hits, I think it should be good enough
to hardcode a version that approximates a ~current Debian stable kernel.
My current candidate for an override in config.debian is this monstrosity:
myhostname=localhost
case "$osname" in
    linux)
        osvers=3.16.0
        osdesc="#1 smp debian $osvers"
        os=gnulinux
        ;;
    gnu)
        osvers=0.6
        osdesc="gnu-mach"
        os=gnu
        ;;
    gnukfreebsd)
        osvers=9.0
        osdesc="#0"
        os=gnukfreebsd
        ;;
esac
if [ -n "$osdesc" ]; then
    machine_uname=$(uname -m | tr '[A-Z]' '[a-z]' | sed -e "s,['/],,g")
    myuname="$osname $myhostname $osvers $osdesc $machine_uname $os "
fi
which probably is too much work for little gain.
Not sure if "leaking" uname -m output is appropriate, but making
that constant between architectures doesn't feel right either.
This feels like a bug to me too, and should be handled separately.
I'm cloning this and will export TZ=UTC in debian/rules, at least
for now.
Control: found -1 5.30.0-8

The TZ=UTC part was accidentally dropped in the build system debhelper conversion for 5.30 packaging. This resulted in a reproducibility regression that Holger pointed out to me on IRC (thanks!).

I'll re-instate TZ=UTC in 5.30.0-9 or so, but clearly the underlying issue remains.
Hi!

Just noticed this change from the changelog. :) UTC is not really a proper timezone specification, the format requires an offset, so here it would be UTC0 (see «man timezone»).

Thanks,
Guillem
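
For illustration, the POSIX form of the setting is an abbreviation followed by an offset, so UTC with a zero offset is written UTC0. The sketch below assumes GNU date (for the `-d @SECONDS` option), and the helper name `show_epoch` is made up for the example. It also shows that the sign in this format is inverted relative to common usage: `UTC-2` means two hours ahead of UTC.

```shell
# Hypothetical helper: format a Unix timestamp under a given TZ value.
show_epoch() {
    TZ="$1" date -d "@$2" '+%Y-%m-%d %H:%M:%S'
}

show_epoch UTC0 0     # explicit zero offset, POSIX form
show_epoch UTC-2 0    # offset sign is inverted: two hours east of UTC
```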
Oh! Thanks for the note. This is probably a very common misconception. I think the reproducible builds docs have advised setting TZ=UTC in the past, and I see https://reproducible-builds.org/docs/timezones/ mentions it currently. Also, codesearch.debian.net reports 95 packages matching TZ=UTC but only two match TZ=UTC[0-9]. Time for a mass bug filing? :)
Hi Niko,
I'm struggling to see the practical problem with having the timezone
vary LOCALTIME_{MIN,MAX} (other than reproducibility, which AIUI has
already been addressed). I don't agree with the starting point that
an environment variable shouldn't be able to influence the contents
of the binary (this is clearly a very common and necessary pattern).
Could you elaborate on your reasoning for keeping this bug open?
Thanks
Dominic
Control: submitter -1 !
Control: severity -1 minor
Control: tag -1 upstream
I'm not aware of any practical problems here. I suspect nothing
uses $Config{sLOCALTIME_max} et al.
Reproducibility has been addressed in a Debian-specific way. Ideally,
it would be fixed upstream so that the build result would be reproducible
regardless of the build timezone (which we are currently overriding.)
I think it depends on the environment variable and its main purpose.
Something like BUILD_BZIP2 does and should influence the result, that's
what it's there for. But what's the use for encoding the local timezone
into the binaries? Binaries can be copied between hosts in different time
zones (our buildd results certainly are), users connect to hosts from
different time zones, and even hosts (think laptops) can move between
time zones.
I don't really mind closing this, it's just a minor detail and I obviously
haven't got around to doing anything about it so far. But I do think
the current TZ=UTC solution is more a workaround than a fix.
I'm updating the metadata at least, feel free to close if you're not
convinced :)