glib2.0 fails to build from source on amd64. This happens both on the buildd https://buildd.debian.org/status/fetch.php?pkg=glib2.0&arch=amd64&ver=2.54.2-2&stamp=1513222982&raw=0 and on the reproducible-builds infrastructure https://tests.reproducible-builds.org/debian/logs/unstable/amd64/glib2.0_2.54.2-2.build2.log.gz. The reproducible-builds infrastructure sees more failures, though (including gwakeup and gmenumodel). It seems unlikely that this will fix itself.
Helmut
Control: clone -1 -2
Control: clone -1 -3
Control: clone -1 -4
Control: clone -1 -5
Control: retitle -1 glib2.0: FTBFS on amd64 buildd: gdbus-peer test: assertion 'source->ref_count > 0' failed
Control: retitle -2 glib2.0: sometimes FTBFS on reproducible-builds: gwakeup, gwakeup-fallback tests terminated by SIGALRM
Control: found -2 2.50.3-2
Control: found -2 2.54.1-1
Control: retitle -3 glib2.0: sometimes FTBFS on reproducible-builds: gdbus-threading test: assertion failed (elapsed_msec < 8000): (8220 < 8000)
Control: found -3 2.50.3-2
Control: retitle -4 glib2.0: sometimes FTBFS on reproducible-builds: gmenumodel test: assertion failed (items_changed_count == 1): (0 == 1)
Control: retitle -5 glib2.0: sometimes FTBFS on reproducible-builds: tar: ./usr/share/locale/en_??/LC_MESSAGES/glib20.mo/: Cannot savedir: Not a directory
Control: found -5 2.54.1-1
I don't think these have the same root cause. r-b doesn't seem to get the assertion failure seen on the buildd, but it does get different test failures. Please use separate bugs for what appear to be separate issues. Let's use #884654 for the failure on the buildd, and its clones for the failures on reproducible-builds.
All of these might need to be downgraded to non-RC if they can't be reproduced elsewhere or understood, but I'll leave them all RC for now. We are unlikely to get anywhere with most of these test failures unless someone can reproduce them in an environment that leaves core dumps, or at least captures backtraces.
The failure on the buildd is new, although I think I might have seen it before in local testing (but never reproducibly). It's probably some rarely-hit race condition.
Clone -2 is not new. It has also happened in earlier versions (more often for gwakeup-fallback):
https://tests.reproducible-builds.org/debian/rbuild/stretch/amd64/glib2.0_2.50.3-2.rbuild.log
https://tests.reproducible-builds.org/debian/rbuild/buster/amd64/glib2.0_2.54.1-1.rbuild.log
https://tests.reproducible-builds.org/debian/logs/buster/armhf/glib2.0_2.54.1-1.build2.log.gz
Presumably it doesn't happen on the real buildds because the reproducible-builds workers are more heavily loaded, have more or fewer CPUs, or differ in some other way. It isn't the build vs. build2 variation, because this test failure has been seen in both.
Clone -3 can also be seen in https://tests.reproducible-builds.org/debian/rbuild/buster/amd64/glib2.0_2.54.1-1.rbuild.log
Clone -4 might be new, or just rare.
Clone -5 is a build failure inside dpkg-deb (run by dh_builddeb), not a test failure. I don't know what is going on there, and it doesn't seem particularly likely to be a GLib bug: GLib just puts files in a tree like any other package, so I'm not sure how it would trigger this particular failure. It can be seen in these logs:
https://tests.reproducible-builds.org/debian/rbuild/buster/i386/glib2.0_2.54.1-1.rbuild.log
https://tests.reproducible-builds.org/debian/rbuild/unstable/armhf/glib2.0_2.54.2-2.rbuild.log
These are not build2 logs, so we presumably can't blame disorderfs either.
smcv
tar and dpkg maintainers: does this look at all familiar to you, or
can you think of anything that GLib might be doing strangely with its
translations that would somehow make tar think it needed to treat the
regular file glib20.mo as a directory? It's an ordinary GNU gettext .gmo
file, with nothing GLib-specific that I'm aware of, and in particular
File::StripNondeterminism was able to open and rewrite it like a
regular file.
This is on the reproducible-builds infrastructure, so if there are any
oddities implied by that, they apply here (for example I think it's a
tmpfs - although I've been able to build GLib in a large tmpfs on my
laptop without problems).
I don't know whether it's significant or just coincidence that the two
languages affected in the failing builds that I've seen are the only two
of the form en_??.
Unfortunately this is pbuilder, not sbuild, so the log doesn't list the
versions of tar and dpkg used.
The most relevant bits of the (armhf + all) build log (the i386 + all
failure is similar, but en_GB is the one that fails there):
make[4]: Entering directory '/build/1st/glib2.0-2.54.2/debian/build/deb/po'
mkdir -p /build/1st/glib2.0-2.54.2/debian/tmp/usr/share; \
catalogs='af.gmo am.gmo an.gmo ar.gmo as.gmo ast.gmo az.gmo be.gmo be@latin.gmo bg.gmo bn.gmo bn_IN.gmo bs.gmo ca.gmo ca@valencia.gmo cs.gmo cy.gmo da.gmo de.gmo dz.gmo el.gmo en_CA.gmo en_GB.gmo en@shaw.gmo eo.gmo es.gmo et.gmo eu.gmo fa.gmo fi.gmo fr.gmo fur.gmo ga.gmo gd.gmo gl.gmo gu.gmo he.gmo hi.gmo hr.gmo hu.gmo hy.gmo id.gmo is.gmo it.gmo ja.gmo ka.gmo kk.gmo kn.gmo ko.gmo ku.gmo lt.gmo lv.gmo mai.gmo mg.gmo mk.gmo ml.gmo mn.gmo mr.gmo ms.gmo nb.gmo nds.gmo ne.gmo nl.gmo nn.gmo oc.gmo or.gmo pa.gmo pl.gmo ps.gmo pt.gmo pt_BR.gmo ro.gmo ru.gmo rw.gmo si.gmo sk.gmo sl.gmo sq.gmo sr.gmo sr@latin.gmo sr@ije.gmo sv.gmo ta.gmo te.gmo tg.gmo th.gmo tl.gmo tr.gmo ug.gmo tt.gmo uk.gmo vi.gmo wa.gmo xh.gmo yi.gmo zh_CN.gmo zh_HK.gmo zh_TW.gmo'; \
for cat in $catalogs; do \
cat=`basename $cat`; \
case "$cat" in \
*.gmo) destdir=/usr/share/locale;; \
*) destdir=/usr/lib/arm-linux-gnueabihf/locale;; \
esac; \
lang=`echo $cat | sed 's/\.gmo$//'`; \
dir=/build/1st/glib2.0-2.54.2/debian/tmp$destdir/$lang/LC_MESSAGES; \
mkdir -p $dir; \
if test -r $cat; then \
/usr/bin/install -c -m 644 $cat $dir/glib20.mo; \
echo "installing $cat as $dir/glib20.mo"; \
else \
/usr/bin/install -c -m 644 ../../../../po/$cat $dir/glib20.mo; \
echo "installing ../../../../po/$cat as" \
"$dir/glib20.mo"; \
fi; \
...
installing ../../../../po/af.gmo as /build/1st/glib2.0-2.54.2/debian/tmp/usr/share/locale/af/LC_MESSAGES/glib20.mo
... and the same for many more languages ...
dh_strip_nondeterminism
...
Normalized debian/libglib2.0-data/usr/share/locale/dz/LC_MESSAGES/glib20.mo
... and the same for many more languages ...
...
dh_builddeb
...
dpkg-deb: building package 'libglib2.0-data' in '../libglib2.0-data_2.54.2-2_all.deb'.
tar: ./usr/share/locale/en_CA/LC_MESSAGES/glib20.mo/: Cannot savedir: Not a directory
tar: Exiting with failure status due to previous errors
...
dpkg-deb: error: tar -cf subprocess returned error exit status 2
dh_builddeb: dpkg-deb --build debian/libglib2.0-data .. returned exit code 2
Thanks,
smcv
Control: reassign 884662 jenkins.debian.org
Control: retitle 884662 reproducible-builds.org: regular files sometimes treated as directories
Control: severity 884662 normal
Control: user jenkins.debian.org@packages.debian.org
Control: usertags 884662 = reproducible
This seems to be a symptom of some more general problem on the reproducible-builds builders. I would guess it's either the (FUSE?) filesystem, or an LD_PRELOAD hack that intercepts stat(), like fakeroot does. Build logs indicate that install(1) sometimes treats regular files as directories. This leaves the build incomplete and unreproducible even in cases where it doesn't FTBFS:
https://tests.reproducible-builds.org/debian/rb-pkg/unstable/i386/glib2.0.html
--- b1/build.log 2017-12-21 23:47:26.259266661 +0000
+++ b2/build.log 2017-12-22 00:01:34.693734003 +0000
@@ -14556,6 +14571,7 @@
 /usr/bin/install -c -m 644 ./html/api-index-full.html
 /usr/bin/install -c -m 644 ./html/ch01s02.html
 /usr/bin/install -c -m 644 ./html/chapter-gobject.html
+/usr/bin/install: omitting directory './html/chapter-gobject.html'
 /usr/bin/install -c -m 644 ./html/chapter-gtype.html
 /usr/bin/install -c -m 644 ./html/chapter-intro.html
 /usr/bin/install -c -m 644 ./html/chapter-signal.html
@@ -14599,7 +14615,6 @@
 /usr/bin/install -c -m 644 ./html/left.png
 /usr/bin/install -c -m 644 ./html/pr01.html
 /usr/bin/install -c -m 644 ./html/pt01.html
-/usr/bin/install: omitting directory './html/pt01.html'
 /usr/bin/install -c -m 644 ./html/pt02.html
 /usr/bin/install -c -m 644 ./html/pt03.html
 /usr/bin/install -c -m 644 ./html/right-insensitive.png
(In a correct build, all of those files are regular files produced by gtk-doc, and none are directories or symlinks.)
I am not aware of any reason why gtk-doc would produce ./html/chapter-gobject.html or ./html/pt01.html as a regular file in one build and as a directory in another. I've also never seen this happen on the production buildds. So I suspect that the filesystem or an LD_PRELOAD used on the reproducible builders is non-deterministically returning incorrect stat() results, which make install(1) and tar(1) do the wrong thing. What filesystem is used for the build, and which LD_PRELOAD hacks are applied?
The next upload of glib2.0 is going to have Rules-Requires-Root: no. That will mitigate this if it's a problem with the implementation of fakeroot used on these builders. I'm not going to upload it until after Christmas unless glib2.0 develops a new RC bug, because we need to let the current version migrate so it won't block the rest of GNOME.
Regards,
smcv
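One way to gather more data on the builders is to check, from inside the build itself, whether any path that should be a regular file is currently reported as a directory. The sketch below is a hypothetical helper in POSIX sh. The name find_suspect_dirs is illustrative and not an existing tool. It relies on the shell's own test -d, which goes through stat(). When run in the same fakeroot (or other LD_PRELOAD) session as the build, it should therefore get the same answer that install(1) and tar(1) would get.

```sh
#!/bin/sh
# Hypothetical helper, not part of glib2.0's packaging: print every path
# under a tree whose name suggests a regular file (by suffix) but which
# stat() currently reports as a directory. To catch a lie told by an
# LD_PRELOAD faker, run it inside the same session as the build (for
# example from debian/rules); a fresh shell would get the real answer.
set -eu

find_suspect_dirs() {
    root=$1
    shift
    for suffix in "$@"; do
        find "$root" -name "*$suffix" | while IFS= read -r path; do
            # test -d calls stat(), which an LD_PRELOAD faker can intercept
            if [ -d "$path" ]; then
                printf '%s\n' "$path"
            fi
        done
    done
}

# Self-contained demo on a scratch tree with one deliberately wrong entry.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir -p "$tmp/html/pt01.html"      # should have been a regular file
: > "$tmp/html/pt02.html"           # a regular file, as expected
suspects=$(find_suspect_dirs "$tmp" .html .mo)
printf '%s\n' "$suspects"
```

Matching by suffix is crude. In this bug, though, the affected names (glib20.mo and the gtk-doc *.html pages) should only ever be regular files.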
Hi Simon,
we use pbuilder with eatmydata on tmpfs (on amd64, i386 and arm64; only on armel do we build on ext3), iirc (and I believe I do ;). We don't use disorderfs, although we want to.
That can't be the only thing that might be intercepting stat(), because
one of the various implementations of fakeroot is also visible in glib2.0
build logs. Do you know which one it is - original fakeroot, fakeroot-ng,
pseudo, proot (probably not that one since it doesn't register itself
as a fakeroot alternative), or something else?
smcv
Actually, we are using tmpfs on amd64 and arm64, and there we don't use eatmydata, whereas i386 and armhf build on regular ext[34] file systems with eatmydata.
It's regular fakeroot (from the 'fakeroot' package).
The pbuilder version we use is the one in stretch, so it doesn't support R³ (Rules-Requires-Root) yet. I do plan on backporting it to stretch-bpo, but until then adding R³ won't have any effect.
Hi!
Nothing I've seen before, no. I tried invoking tar with some paths via -T (which is what dpkg-deb uses), with a final ‘/’ on a filename, and that did not get handled like a directory. I also skimmed tar's code, since tar is what fails here, and didn't see anything obvious. So, without further analysis, this does smell like a problem in the repro environment: one of the nested fakers, the filesystem or similar, or perhaps a combination of those.
Thanks,
Guillem
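Guillem's check can be re-created roughly as follows. This is a sketch under several assumptions. It assumes GNU tar and a plain newline-separated name list read with -T. It also uses a scratch tree that mimics the failing glib20.mo entry, and dpkg-deb's real invocation passes further options. If tar took the trailing ‘/’ to mean a directory, it would try to read the path as one and fail with "Cannot savedir", as in the failing log.

```sh
#!/bin/sh
# Sketch of the check described above, under assumptions: GNU tar, and a
# plain newline-separated -T list (dpkg-deb's real invocation differs).
# The path mimics the failing entry in the glib2.0 log.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# A regular .mo file in a scratch tree.
mkdir -p "$work/tree/LC_MESSAGES"
printf 'catalog data\n' > "$work/tree/LC_MESSAGES/glib20.mo"

# Name it with a trailing '/' in the list that tar reads with -T.
printf '%s\n' './LC_MESSAGES/glib20.mo/' > "$work/list"

# Keep tar's diagnostics; a non-zero exit is information here, not fatal.
(cd "$work/tree" && tar -cf "$work/out.tar" --no-recursion -T "$work/list") \
    2> "$work/err" || true

# "Cannot savedir" would mean tar believed the path was a directory.
if grep -q 'Cannot savedir' "$work/err"; then
    verdict='handled like a directory'
else
    verdict='not handled like a directory'
fi
echo "$verdict"
```

tar may drop the slash or fail to stat the path. Either way it should not reach the directory-reading code that produces "Cannot savedir", so that error implies stat() claimed glib20.mo was a directory.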
build logs:
https://buildd.debian.org/status/fetch.php?pkg=pgagent&arch=alpha&ver=4.0.0-7&stamp=1602009768&raw=0
Dropping cluster 13/regress ...
### End 13 installcheck ###
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
dh_strip_nondeterminism -a
dh_compress -a
dh_fixperms -a
dh_missing -a
dh_dwz -a -a
Can't opendir(debian/pgagent/usr/share/doc/pgagent/changelog.Debian.gz): Not a directory at /usr/bin/dh_dwz line 119.
https://buildd.debian.org/status/fetch.php?pkg=pgagent&arch=sparc64&ver=4.0.0-7&stamp=1601991893&raw=0
It has also been seen on amd64.
PG extension packages have started to run tests at build time, which is done via pg_virtualenv. When pg_virtualenv creates the temporary PostgreSQL server instance, it unsets LD_PRELOAD internally. That seems to be what confuses the "dh" and dpkg-buildpackage processes that share the same fakeroot instance.
jwilk did some debugging (thanks!) and came up with this simple recipe:
$ fakeroot sh -c 'mkdir foo; env -u LD_PRELOAD rmdir foo; touch bar; stat bar | grep directory'
Size: 0 Blocks: 0 IO Block: 4096 directory
So, if "foo" is removed without fakeroot knowing, the "bar" file is reported as a directory. Presumably fakeroot's daemon still has an entry for foo's inode number and applies it to bar once bar reuses that inode. (It doesn't reproduce for me; it depends on inode numbers being recycled and similar.)
jwilk also noted that glib2.0's debian/rules unsets LD_PRELOAD for the test suite too, which strengthens the evidence that fakeroot is to blame. I'll look into moving the PG extension packages to `R³: no`, which seems to be the best option in my case.
Christoph
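The following toy model in POSIX sh shows how a removal that fakeroot never hears about can turn a later plain file into a "directory". It assumes, as the recipe above suggests, that fakeroot's daemon keeps faked metadata keyed by inode number and applies a stored entry whenever stat() hits that inode. The real faked also keys on the device number. It also assumes that creating a plain file adds no daemon entry. All function names and inode numbers are hypothetical. This models the behaviour and is not fakeroot's code.

```sh
#!/bin/sh
# Toy model of the failure mode, NOT fakeroot's real code: a daemon that
# remembers faked metadata keyed by inode number and is never told about a
# removal done with LD_PRELOAD unset. Inode numbers here are made up.
set -eu

next_ino=1001

alloc_inode() {             # simplistic allocator: hand out next_ino
    ino=$next_ino
    next_ino=$((next_ino + 1))
}

fake_mkdir() {              # mkdir under fakeroot: the daemon records it
    alloc_inode
    eval "ino_of_$1=$ino real_$ino=directory faked_$ino=directory"
}

unfaked_rmdir() {           # env -u LD_PRELOAD rmdir: the daemon never hears
    eval "i=\$ino_of_$1"
    unset "ino_of_$1" "real_$i"
    next_ino=$i             # the freed inode number is reused next
}

fake_touch() {              # assumed: a new plain file gets no daemon entry
    alloc_inode
    eval "ino_of_$1=$ino real_$ino='regular file'"
}

fake_stat_type() {          # stat under fakeroot: a remembered entry wins
    eval "i=\$ino_of_$1"
    eval "printf '%s\n' \"\${faked_$i:-\$real_$i}\""
}

# Mirrors: fakeroot sh -c 'mkdir foo; env -u LD_PRELOAD rmdir foo; touch bar; stat bar'
fake_mkdir foo
unfaked_rmdir foo
fake_touch bar
eval "echo \"on disk, bar is a \$real_$ino_of_bar\""
echo "stat under the faker says: $(fake_stat_type bar)"
```

In this model the bug only shows when the freed inode number is handed out again. That matches the observation that it depends on inode recycling and shows up more on busy builders.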
Control: reassign -1 fakeroot 1.22-1
Control: retitle -1 fakeroot: regular files sometimes treated as directories when they are removed without fakeroot knowing
Control: affects -1 jenkins.debian.org
Yay, at least we can't say it's only an r-b infrastructure problem! \o/ Thank you for providing such a simple reproducer! The fact that it depends on inode number recycling also explains why we saw it more often on tests.r-b.o. There we have many more concurrent builds, and therefore more writes. So I'm finally reassigning this bug to fakeroot. Thanks to all of you involved in debugging the matter!
FTR, we did that because it confuses unit tests that talk to each other
via D-Bus (I think the in-band communication based on geteuid() was faked
and said the client was uid 0, but the SCM_CREDENTIALS message still told
the truth and said it was uid > 0, making authentication fail). If it
avoids other fakeroot issues then that's just a happy coincidence.
smcv
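On Linux, the mismatch smcv describes can be observed by comparing what geteuid() returns with what the kernel records for the process. In the minimal sketch below, id -u stands in for the geteuid() call that fakeroot's LD_PRELOAD library can fake. The effective uid in /proc/self/status stands in for the kernel-side credentials that SCM_CREDENTIALS carries. Run normally, the two should match. Run under fakeroot, they should differ.

```sh
#!/bin/sh
# Linux-only sketch: compare the effective uid reported by geteuid() (via
# id -u, which fakeroot's LD_PRELOAD library can fake) with the effective
# uid the kernel records for the process (Uid: line, 3rd field). Kernel-side
# credentials such as SCM_CREDENTIALS reflect the second value.
set -eu

libc_euid=$(id -u)
kernel_euid=$(awk '/^Uid:/ { print $3 }' /proc/self/status)

echo "geteuid(): $libc_euid"
echo "kernel:    $kernel_euid"
if [ "$libc_euid" = "$kernel_euid" ]; then
    echo "consistent"
else
    echo "mismatch: credentials are being faked"
fi
```

For example, running it as fakeroot sh ./uidcheck.sh would be expected to report geteuid() as 0 while the kernel still reports the invoking user's uid. The file name uidcheck.sh is hypothetical.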
Re: Mattia Rizzolo
Thanks! FWIW, there is another issue that looks similar but is, afaict, not related to fakeroot, because it persists with RRR:no. Debhelper thinks some symlinks in debian/ are executable files even when they point to plain files:
https://salsa.debian.org/salsa-ci-team/pipeline/-/issues/177
https://salsa.debian.org/postgresql/postgresql/-/jobs/1340632
If anyone has an idea that would connect the missing dots, I'd be very happy.
Christoph
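The logs don't make clear which check debhelper uses to decide that these symlinks are executables. A first step could be to record what a plain shell sees at the same point in the build. The sketch below is a hypothetical diagnostic (report_symlinks is an illustrative name, not a debhelper tool). For every symlink under a tree, it prints the link target and whether test -x, which follows the link, considers the target executable.

```sh
#!/bin/sh
# Hypothetical diagnostic for the symlink symptom, not a debhelper tool:
# for every symlink under a tree, print its target and whether test -x
# (which follows the link) considers the target executable.
set -eu

report_symlinks() {
    find "$1" -type l | while IFS= read -r link; do
        if [ -x "$link" ]; then
            state=executable
        else
            state=not-executable
        fi
        printf '%s -> %s (%s)\n' "$link" "$(readlink "$link")" "$state"
    done
}

# Demo: a symlink to a plain, non-executable file.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
: > "$tmp/plain"
chmod 644 "$tmp/plain"
ln -s plain "$tmp/link"
report=$(report_symlinks "$tmp")
printf '%s\n' "$report"
```

If the shell reports not-executable where debhelper concluded otherwise, that would point at debhelper's own logic rather than at the filesystem.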