- Package: src:nocache
- Source: nocache
- Submitter: Santiago Vila
- Date: 2024-06-04 11:48:03 UTC
- Severity: normal
- Tags:
Dear maintainer:

I tried to build this package in sid but it failed:

--------------------------------------------------------------------------------
[...]
debian/rules build-arch
dh build-arch
dh_update_autotools_config -a
dh_autoreconf -a
dh_auto_configure -a
dh_auto_build -a
make -j1 "INSTALL=install --strip-program=true"
make[1]: Entering directory '/<<PKGBUILDDIR>>'
cc -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -o cachedel cachedel.c
cc -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -o cachestats cachestats.c
cc -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -fPIC -c -o nocache.o nocache.c
cc -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -fPIC -c -o fcntl_helpers.o fcntl_helpers.c
cc -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -fPIC -c -o pageinfo.o pageinfo.c
cc -g -O2 -fdebug-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -Wdate-time -D_FORTIFY_SOURCE=2 -Wl,-z,relro -Wl,-z,now -pthread -shared -Wl,-soname,nocache.so -o nocache.so nocache.o fcntl_helpers.o pageinfo.o -ldl
sed 's!##libdir##!$(dirname "$0")!' <nocache.in >nocache
chmod a+x nocache
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
debian/rules override_dh_auto_test
make[1]: Entering directory '/<<PKGBUILDDIR>>'
## #916415
timeout 11 ./nocache apt show coreutils 1>>/dev/null
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

make[1]: *** [debian/rules:21: override_dh_auto_test] Error 124
make[1]: Leaving directory '/<<PKGBUILDDIR>>'
make: *** [debian/rules:10: build-arch] Error 2
dpkg-buildpackage: error: debian/rules build-arch subprocess returned exit status 2
--------------------------------------------------------------------------------

To be sure, I have tried to build the package 151 times on 8 different machines and it failed 151 times. Here are the full build logs:

https://people.debian.org/~sanvila/build-logs/nocache/

A very similar failure happened here on mipsel, a release architecture:

https://buildd.debian.org/status/fetch.php?pkg=nocache&arch=mipsel&ver=1.1-1&stamp=1546582253&raw=0

If you need help to reproduce this, please say so; I would gladly offer access to a system where this seems to happen all the time.

Thanks.
tags 918316 + patch
thanks

The patch below works for me:

--- a/debian/rules
+++ b/debian/rules
@@ -18,5 +18,5 @@ override_dh_auto_test:
 ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
 # -NOCACHE_NR_FADVISE=2 dh_auto_test -v
 ## #916415
- timeout 11 ./nocache apt show coreutils 1>>/dev/null
+ timeout 60 ./nocache apt show coreutils 1>>/dev/null
 endif

Note: I don't quite understand the purpose of the timeout. Is it really useful or required to set a timeout at all? Normally sbuild (the autobuilder program used by the build daemons) already has a built-in timeout mechanism that prevents the autobuilder from getting stuck forever, and judging by build logs from reproducible builds, I believe pbuilder also has a timeout by default.

Thanks.
I get a different error here:
,----
| ## #916415
| timeout 11 ./nocache apt show coreutils 1>>/dev/null
| apt: nocache.c:148: init_mutexes: Assertion `fds_lock != NULL' failed.
| Aborted
| make[1]: *** [debian/rules:21: override_dh_auto_test] Error 134
`----
Increasing the timeout to 60 as you suggested does not help.
Cheers,
Sven
Hello,

Bug #918316 in nocache reported by you has been fixed in the Git repository and is awaiting an upload. You can see the commit message below and you can check the diff of the fix at:

https://salsa.debian.org/debian/nocache/commit/4cc35e3d2042b7f80bec7f31f7ed4d1fef329c75

------------------------------------------------------------------------
rules: increase test timeout (Closes: #918316).

Thanks, Santiago Vila.
------------------------------------------------------------------------

(this message was generated automatically)
-- 
Greetings

https://bugs.debian.org/918316
Hi Santiago,

Thanks for the patch. I see, this issue is environment-specific and seems to fail on slow(er) machines like MIPS. In this case it is _necessary_.

As you may have noticed from the comment, this is a regression test for #916415. The timeout is required because the process never exits (it hangs) when the test fails; the timeout aborts that particular test if and when it fails. It is better to fail quickly (within a minute) than to needlessly occupy the builder for an hour.

-- 
A man does what he must - in spite of personal consequences, in spite of obstacles and dangers and pressures - and that is the basis of all human morality.
        -- Winston Churchill
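The regression-test pattern described above (abort a hung command after a deadline instead of letting the builder wait) can be sketched in C. This is a minimal illustration, not how coreutils implements `timeout`; `run_with_timeout` is a hypothetical helper, and returning 124 on expiry mirrors the exit status of GNU `timeout` that shows up as "Error 124" in the build log.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper (not part of nocache or coreutils): run argv as a
 * child process and wait at most about `seconds` for it. Returns the
 * child's exit status, 128 + signal number if a signal killed it, 124 if
 * the deadline expired (the status GNU timeout uses), or -1 on error. */
static int run_with_timeout(char *const argv[], unsigned seconds)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);                     /* exec failed */
    }

    const struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    long waited_ms = 0;                 /* approximate: counts sleeps only */
    int status;

    for (;;) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)                   /* child finished on its own */
            return WIFEXITED(status) ? WEXITSTATUS(status)
                                     : 128 + WTERMSIG(status);
        if (r < 0)
            return -1;
        if (waited_ms >= (long)seconds * 1000) {
            /* Deadline hit: treat the test as hung and fail fast. */
            kill(pid, SIGTERM);
            waitpid(pid, &status, 0);
            return 124;
        }
        nanosleep(&tick, NULL);
        waited_ms += 10;
    }
}
```

For example, `run_with_timeout((char *[]){ "sleep", "5", NULL }, 1)` returns 124 after roughly one second. The sketch only sends SIGTERM; a child that ignores it would still block the final `waitpid`, which is the case GNU `timeout -k` exists for.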
The exit code suggests that APT is not happy, hence the timeout has nothing to do with it, so I suspect this is unrelated to "nocache". Can you reproduce it manually with "apt show coreutils"? Also, on which architecture is this?

Thanks.

-- 
You have to start with the truth. The truth is the only way that we can get anywhere. Because any decision-making that is based upon lies or ignorance can't lead to a good conclusion.
        -- Julian Assange, 2010
Control: clone -1 -2
Control: retitle -2 nocache.c:148: init_mutexes: Assertion `fds_lock != NULL' failed.
Control: severity -2 normal
That is unrelated to Santiago's problem, and I should have reported it
separately. Creating a new clone now, will follow up when I have the
cloned bug's number.
Cheers,
Sven
[Following up on the cloned bug 918464 and dropping Santiago from CC.]
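> Exit code suggests that APT is not happy hence timeout have nothing
> to do with that so I suspect this is unrelated to "nocache".

ITYM it has nothing to do with timeout. As the failed assertion comes
from nocache.c, it certainly has to do with nocache. ;-)

> Can you reproduce manually by "apt show coreutils"?

No, but with any program under nocache, e.g. "nocache true".

> Also, on which architecture is this?

Plain amd64.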
The good news is that I seem to have found the explanation for the
failed assertion. In line 147 of nocache.c we have
    fds_lock = malloc(max_fds * sizeof(*fds_lock));
and malloc obviously returned NULL. With a debug printf statement I
found out that max_fds == 1073741816; with sizeof(*fds_lock) == 40, that
is a request for about 40 GiB, so it is not too surprising that malloc failed.
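To make the arithmetic explicit: 1073741816 × 40 bytes is 42,949,672,640 bytes, just under 40 GiB, whereas the 1048576 entries seen after a fresh boot need 41,943,040 bytes (40 MiB). Below is a small C sketch of that size computation with the overflow check a careful caller would do before calling malloc; `fd_table_bytes` is a hypothetical name, and 40 is simply the `sizeof(*fds_lock)` value measured here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed for a table with one entry per
 * possible file descriptor. Returns 0 when max_fds * elem_size would
 * overflow size_t, so the caller can refuse instead of wrapping around. */
static size_t fd_table_bytes(size_t max_fds, size_t elem_size)
{
    if (elem_size != 0 && max_fds > SIZE_MAX / elem_size)
        return 0;
    return max_fds * elem_size;
}
```

On 64-bit amd64 the 40 GiB product fits in size_t, so the overflow check alone does not help; the allocation itself simply fails, which is what the `fds_lock != NULL` assertion caught.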
Why is max_fds so high? In the systemd changelog I found the
following:
,----
| systemd (240-2) unstable; urgency=medium
|
| * Don't bump fs.nr_open in PID 1.
| In v240, systemd bumped fs.nr_open in PID 1 to the highest possible
| value. Processes that are spawned directly by systemd, will have
| RLIMIT_NOFILE be set to 512K (hard).
| pam_limits in Debian defaults to "set_all", i.e. for limits which are
| not explicitly configured in /etc/security/limits.conf, the value from
| PID 1 is taken, which means for login sessions, RLIMIT_NOFILE is set to
| the highest possible value instead of 512K. Not every software is able
| to deal with such an RLIMIT_NOFILE properly.
| While this is arguably a questionable default in Debian's pam_limit,
| work around this problem by not bumping fs.nr_open in PID 1.
| (Closes: #917167)
|
| -- Michael Biebl <biebl@debian.org> Thu, 27 Dec 2018 14:03:57 +0100
`----
And this sid system has an uptime of 13 days, so it was booted with systemd
240-1, which explains the high RLIMIT_NOFILE. On a freshly booted
laptop, I get max_fds == 1048576 instead, and obviously malloc'ing 40
Megabytes rather than 40 Gigabytes of RAM is easily possible.
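To check which situation a given session is in, one can inspect the inherited RLIMIT_NOFILE. The sketch below is hypothetical diagnostic code, not part of nocache; it assumes, as the `ulimit -Hn` workaround in this report suggests, that max_fds follows the hard limit (`rlim_max`), and it reuses the 40-byte entry size measured above.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical diagnostic: print the soft and hard RLIMIT_NOFILE of the
 * current process and the size of a 40-byte-per-fd table sized from the
 * hard limit (assumption: that is what max_fds is derived from).
 * Returns 0 on success, -1 if getrlimit fails. */
static int report_nofile(FILE *out)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    fprintf(out, "soft limit: %llu\n", (unsigned long long)rl.rlim_cur);
    if (rl.rlim_max == RLIM_INFINITY) {
        fprintf(out, "hard limit: unlimited\n");
        return 0;
    }
    fprintf(out, "hard limit: %llu (table of 40-byte entries: %.1f MiB)\n",
            (unsigned long long)rl.rlim_max,
            (double)rl.rlim_max * 40.0 / (1024.0 * 1024.0));
    return 0;
}
```

Under that assumption, a hard limit of 1048576 would be reported as a 40 MiB table, while a session that inherited the limit from a systemd 240-1 boot would show a table of roughly 40 GiB.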
I guess I should reboot in the near future. Feel free to close the bug
if you think that dealing with a too high value of RLIMIT_NOFILE is not
possible for nocache.
Cheers,
Sven
Hi,

Following a full-upgrade on two Debian Sid hosts of mine on 2024-06-02 around 21:55 UTC, I have just stumbled upon this issue. It matches the explanation provided by Sven and can be worked around by lowering the hard NOFILE rlimit, e.g.:

  ulimit -Hn 10000

However, the fact that nocache, a program typically used to leave global memory usage untouched, triggers an out-of-memory failure is particularly ironic. It would be nice if this could be fixed, either in nocache itself or by adjusting the default rlimits.
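The `ulimit -Hn 10000` workaround can also be applied from C, for example in a small wrapper that runs before nocache and then execs it, since resource limits are inherited across exec. This is a sketch assuming only lowering is wanted (an unprivileged process cannot raise its hard limit back); `cap_nofile` is a hypothetical helper and 10000 is just the value from the example above.

```c
#include <sys/resource.h>

/* Hypothetical helper: lower the hard (and, if needed, soft) RLIMIT_NOFILE
 * to at most `cap`, like `ulimit -Hn cap`. Never raises a limit.
 * Returns 0 on success, -1 on failure (errno set by getrlimit/setrlimit). */
static int cap_nofile(rlim_t cap)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_max > cap)
        rl.rlim_max = cap;
    if (rl.rlim_cur > rl.rlim_max)      /* soft may not exceed hard */
        rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

A wrapper would call `cap_nofile(10000)` and then `execvp` nocache with its original arguments, so that any per-fd table nocache sizes from the limit stays small.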
Addendum: this issue was seemingly fixed upstream: https://github.com/Feh/nocache/commit/7451e161997d4282dd6b66fd1514b5b157b41f8a Therefore, this bug could be fixed by packaging nocache v1.2, tagged two years ago.