- Package:
- dracut-install
- Source:
- dracut-install
- Description:
- dracut is an event driven initramfs infrastructure (dracut-install)
- Submitter:
- Tj
- Date:
- 2025-08-28 09:01:08 UTC
- Severity:
- normal
- Tags:
For the last week debvm has been failing its armhf/armel build tests because the virtio_blk kernel module is not installed in the initrd.img. This was reported, and help requested, by Helmut Grohne on IRC #debian-devel. Builds a week ago succeeded [0] but the latest builds [1] fail.

I've been diagnosing the issue and began by focusing on the packages being installed for the test, that is kernel, udev, initramfs-tools, and dracut-install. I reproduced the issue on a Bookworm amd64 host but Helmut was unable to on an Unstable amd64 host.

Instrumenting initramfs-tools with additional log messages confirmed that the correct list of modules, and module directories, is being passed to dracut-install. I tried adding --debug to gain more insight but it seems to ignore it or not hit any code paths that use log_debug(). However, a log from update-initramfs -vu and executing dracut-install directly in the armhf chroot both show that it ignores =drivers/XXX entries in the module list.

A simple reproducer is:

$ apt-get install debvm
$ git clone https://salsa.debian.org/helmutg/debvm.git
$ cd debvm
$ tests/create-and-run.sh armhf unstable 2>&1 | /usr/bin/tee c-a-r.01.log
$ # kill the 'stuck' qemu guest process
$ mkdir debvm.fs
$ ldev="$( /usr/sbin/losetup --find --show test.ext4 )"
$ sudo mount $ldev ./debvm.fs
$ sudo /usr/sbin/chroot ./debvm.fs/ /usr/lib/dracut/dracut-install --debug -D /var/tmp/ --kerneldir /lib/modules/6.10.6-armmp --firmwaredirs /lib/firmware/updates/6.10.6-armmp:/lib/firmware/updates:/lib/firmware/6.10.6-armmp:/lib/firmware --debug -v -o -m 8139cp =drivers/block acenic
dracut-install: mkdir '/var/tmp/usr'
dracut-install: mkdir '/var/tmp/usr/lib'
dracut-install: ln -s 'usr/lib' '/var/tmp/lib'
dracut-install: mkdir '/var/tmp/lib/modules'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/realtek'
dracut-install: cp '/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/realtek/8139cp.ko.xz' '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/realtek/8139cp.ko.xz'
dracut-install: cp '/lib/modules/6.10.6-armmp/kernel/drivers/net/mii.ko.xz' '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/mii.ko.xz'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/alteon'
dracut-install: cp '/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/alteon/acenic.ko.xz' '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/alteon/acenic.ko.xz'
dracut-install: Missing firmware acenic/tg2.bin for kernel module acenic
dracut-install: Missing firmware acenic/tg1.bin for kernel module acenic

Note how the =drivers/block is apparently ignored here.

[0] https://salsa.debian.org/helmutg/debvm/-/jobs/6127012
[1] https://salsa.debian.org/helmutg/debvm/-/jobs/6164782
Results of further diagnosis.

First, I discovered that -v overrides --debug: removing -v from the command line enables the debug messages, and removing -o makes any missing module an error, so the small reproducer now reports:

dracut-install: Handle module '=drivers/block'
dracut-install: Handling =drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/extra/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/updates/drivers/block
dracut-install: ERROR: installing '=drivers/block'

The "Ignoring" message is generated in install_modules() when an FTSENT is neither a file nor a symbolic link. There are 13 files and sub-directories in the directory but we only see 2 messages for the module's path. Since the directory is visited twice by the fts* functions, once for preorder and again for postorder, that suggests something interesting is going on.

I note fts_open( ... FTS_NOSTAT ...) and confirm with strace that no stat*() functions are called on the contents of the directory. This would imply that the FTSENT fts_info == FTS_NSOK and that would explain the results being seen.
I've also tried to reproduce this problem, and can report a few interesting factoids.

My test environment is an unstable arm64 VM, with qemu-arm-static set up to execute armhf binaries. qemu-arm-static comes from bookworm; the running kernel too (6.1.0-23-arm64). The test chroots were created fresh today.

On this same VM, I can reproduce the problem when the chroot resides on ext4. When the chroot is created and resides on xfs, the problem is not reproducible.

I've added some debugging output to dracut-install, but cannot see any entries with FTS_NSOK. Debug output, including the newly added messages:

dracut-install: Handle module '=drivers/block'
dracut-install: Handling =drivers/block
dracut-install: path1: /lib/modules/6.10.6-armmp/extra/drivers/block
dracut-install: path2: /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: path3: /lib/modules/6.10.6-armmp/updates/drivers/block
dracut-install: Checking /lib/modules/6.10.6-armmp/extra/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/extra/drivers/block fts_info=10
dracut-install: Checking /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block fts_info=1
dracut-install: Checking /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block fts_info=6
dracut-install: Checking /lib/modules/6.10.6-armmp/updates/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/updates/drivers/block fts_info=10

It looks like fts_read just doesn't see anything inside kernel/drivers/block.

Chris
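For readers decoding the numeric fts_info values in the debug output above, the following is a minimal Python sketch. The constant table mirrors the FTS_* definitions in glibc's <fts.h>; the helper name decode_fts_info and the regular expression are illustrative assumptions and are not part of dracut-install.

```python
import re

# fts_info values as defined in glibc's <fts.h>.
FTS_INFO = {
    1: "FTS_D",        # directory, visited in pre-order
    2: "FTS_DC",       # directory that causes a cycle
    3: "FTS_DEFAULT",  # none of the other types
    4: "FTS_DNR",      # unreadable directory
    5: "FTS_DOT",      # "." or ".."
    6: "FTS_DP",       # directory, visited in post-order
    7: "FTS_ERR",      # error, errno is set
    8: "FTS_F",        # regular file
    9: "FTS_INIT",     # initialized only
    10: "FTS_NS",      # stat(2) failed
    11: "FTS_NSOK",    # no stat(2) requested
    12: "FTS_SL",      # symbolic link
    13: "FTS_SLNONE",  # symbolic link without a target
    14: "FTS_W",       # whiteout object
}


def decode_fts_info(log_line):
    """Hypothetical helper: return (path, FTS_* name) for an
    'Ignoring <path> fts_info=<n>' log line, or None if it does not match."""
    match = re.search(r"Ignoring (\S+) fts_info=(\d+)", log_line)
    if match is None:
        return None
    return match.group(1), FTS_INFO.get(int(match.group(2)), "unknown")


# The four "Ignoring" lines from the debug output quoted in this message.
LOG = """\
dracut-install: Ignoring /lib/modules/6.10.6-armmp/extra/drivers/block fts_info=10
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block fts_info=1
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block fts_info=6
dracut-install: Ignoring /lib/modules/6.10.6-armmp/updates/drivers/block fts_info=10
"""

if __name__ == "__main__":
    for line in LOG.splitlines():
        path, name = decode_fts_info(line)
        print(f"{name:8} {path}")
```

Read this way, extra/ and updates/ are reported as FTS_NS (stat failed, presumably because those directories do not exist in the chroot), while kernel/drivers/block is reported as FTS_D followed directly by FTS_DP. That is a pre-order visit and a post-order visit with no children in between, and no FTS_NSOK entry appears at all.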
user debian-arm@lists.debian.org
usertag 1079443 time-t
thanks

Hi debian-arm,

in case you don't know yet, here is a bug affecting dracut-install on armhf (and probably armel), causing the built initramfs to lack a lot of kernel modules. Probably makes a lot of things unbootable.

It looks like the bug is somewhere in the fts_* glibc routines or maybe deeper down. Tj and I think t64 might be a source of the problem.

On affected systems, this reproducer will output a single line (but it should give about 13 or so):

$ pax -w /usr/lib/modules/6.10.6-armmp/kernel/drivers/block | tar -t

(Assuming you have linux-image-6.10.6-armmp installed.)

As noted in my earlier reply, this might only happen on ext4.

Good luck,
Chris
I've managed to set up a gdb-multiarch debug session and have been
single-stepping through the glibc fts_* code that seems to be affected.
Chris has found that the xfs file-system doesn't seem to be affected but
ext4 is (the host file-system, not the file-system created by debvm).
That confuses the issue somewhat, since the reproducers also work on
several different kernel versions. I'm using a mainline build of
v6.10.6 amd64; Chris reported 6.1.0-23-arm64.
In my debug case the setup is in two terminals. I execute dracut-install
directly using qemu (rather than in the chroot) in order to be able to
do:
debian/debvm$ base=$PWD/debvm.fs; qemu-arm-static -g 9999 -L $base $base/usr/lib/dracut/dracut-install -D $base/var/tmp/ --kerneldir $base/lib/modules/6.10.6-armmp --firmwaredirs $base/lib/firmware/updates/6.10.6-armmp:$base/lib/firmware/updates:$base/lib/firmware/6.10.6-armmp:$base/lib/firmware --debug -m =drivers/block
I fetched the glibc source so gdb can use it:
$ cd /srv/NAS/Sunny/SourceCode/glibc
$ dget -x http://deb.debian.org/debian/pool/main/g/glibc/glibc_2.39-7.dsc
In the second terminal, in the dracut-ng base directory, I do:
dracut-ng$ gdb-multiarch -ex "directory /srv/NAS/Sunny/SourceCode/glibc/glibc-2.39" -ex "file /srv/NAS/Sunny/SourceCode/debian/debvm/debvm.fs/usr/lib/dracut/dracut-install"
-ex "set sysroot /srv/NAS/Sunny/SourceCode/debian/debvm/debvm.fs" -ex "target remote localhost:9999" -ex "break install_modules" -ex "break fts_open" -ex "break fts_build if sp->fts_path[2] != 'y'"
That last conditional breakpoint is to avoid it stopping for all nodes at
or under "/sys/devices/platform".
Then using 'c' to fast-forward to the useful points, looking for the
".../kernel/drivers/block" in sp->fts_path
(gdb) p *sp
$8 = {fts_cur = 0x400160f8, fts_child = 0x0, fts_array = 0x0, fts_dev = 1792, fts_path = 0x40012450 "/srv/NAS/Sunny/SourceCode/debian/debvm/debvm.fs/lib/modules/6.10.6-armmp/kernel/drivers/block", fts_rfd = 0,
fts_pathlen = 4352, fts_nitems = 0, fts_compar = 0x0, fts_options = 15}
From here the loop that I believe should be iterating over the files and
sub-dirs via calling __readdir() does zero iterations due to __readdir()
presumably returning NULL and setting an error code.
I've not yet traced into that; that's for another day I think.
The loop:
https://codesearch.debian.net/show?file=glibc_2.39-7%2Fio%2Ffts.c&line=723#L723
Further debugging leads to the getdents64 syscall returning -1 at
https://sources.debian.org/src/glibc/2.39-7/sysdeps/unix/sysv/linux/getdents.c/?hl=56#L58
I'm including a dump of the gdb session, including my attempts to examine
values as it went. Because the compiler optimises the code, gdb jumps
around a bit, which is confusing since some apparent statement
executions aren't reached.
I've used qemu-arm-static -d strace ... to attempt to analyse the
syscall(s) and the file is attached. The getdents64 call is near the end.
The SIGTRAPs are gdb single-stepping breakpoints so can be ignored.
Looking at the gdb tracing again the syscall is not failing but the
function __getdents() is reaching:
return INLINE_SYSCALL_ERROR_RETURN_VALUE (EOVERFLOW);
which is in sysdeps/unix/sysv/linux/sysdep.h:
/* Set error number and return -1. A target may choose to return the
   internal function, __syscall_error, which sets errno and returns -1.
   We use -1l, instead of -1, so that it can be casted to (void *). */
#define INLINE_SYSCALL_ERROR_RETURN_VALUE(err) \
  ({ \
    __set_errno (err); \
    -1l; \
  })
and we have:
/usr/include/asm-generic/errno.h:58:#define EOVERFLOW 75 /* Value too large for defined data type */
So it isn't clear right now why the error is not being handled and where
that should occur.
I think this will need re-assigning to another package but not clear
which as yet.
Thanks to Chris's investigations in the last few hours he's identified a
couple of bug reports that may be the cause or related; they do seem to
shed some needed insights from glibc experts onto this and contain quite
a few patch attempts.
From my initial reading of both and considering our experience here with
the debvm arch-test succeeding a week ago but failing now I wonder if
this is simply because the host running the CI jobs previously had
offsets that fitted inside 32 bits but later exceeded that and so an
overflow occurred (since host is returning 64 bits but the emulated
qemu-arm 32-bit glibc throws away the top 32 bits).
glibc <> qemu-user (going on since 2018):
[2.28 Regression]: New getdents{64} implementation breaks qemu-user
https://sourceware.org/bugzilla/show_bug.cgi?id=23960
kernel ext4 <> glibc:
Ext4 64 bit hash breaks 32 bit glibc 2.28+
https://bugzilla.kernel.org/show_bug.cgi?id=205957
After considerable discussion and investigation on IRC #debian-devel on 20240824 after 1000 UTC the consensus is this is a glibc issue with the (rarely used) fts_* functions calling __getdents() instead of getdents().
mjt identified that, as well as this report's armhf emulation on a 64-bit host, he can reproduce the apparent problem with i386. Quotes:

see comment #12 in [0]
it looks like the prob is in fts in glibc, who calls __getdents() instead of __getdents64
gdb'ing this on i386 also leads to __readdir and not __readdir64
if it called getdents() instead of __getdents(), it would be aliased to getdents64 with LFS

[0] https://sourceware.org/bugzilla/show_bug.cgi?id=23960#c12
To ensure we capture all relevant info I'm copying here some test results Chris produced using a custom-written executable.

Using emulation:

$ sudo chroot /mnt/e4/armhf /pdents-armhf
fd = 3
dp = 0x403190 entry 0 ino 10302 off 1335313586018546964 name ublk_drv.ko.xz
dp = 0x4031b8 entry 1 ino 10298 off 1441466563022220210 name nbd.ko.xz
dp = 0x4031d8 entry 2 ino 10304 off 2208640688691778334 name xen-blkback
dp = 0x4031f8 entry 3 ino 10216 off 2400073000029322655 name ..
dp = 0x403210 entry 4 ino 10301 off 2568612365796029161 name rbd.ko.xz
dp = 0x403230 entry 5 ino 10295 off 2642141484322030168 name loop.ko.xz
dp = 0x403250 entry 6 ino 10292 off 3028993804646130172 name brd.ko.xz
dp = 0x403270 entry 7 ino 10307 off 3830214992999727188 name zram
dp = 0x403288 entry 8 ino 10290 off 3870617626237255389 name aoe
dp = 0x4032a0 entry 9 ino 10303 off 4740941134542001648 name virtio_blk.ko.xz
dp = 0x4032c8 entry 10 ino 10293 off 5108466411937333823 name drbd
dp = 0x4032e0 entry 11 ino 10296 off 6213628172795130112 name mtip32xx
dp = 0x403300 entry 12 ino 10306 off 7520475764088490632 name xen-blkfront.ko.xz
dp = 0x403328 entry 13 ino 10289 off 8124502982319486926 name .
dp = 0x403340 entry 14 ino 10299 off 9223372036854775807 name null_blk
dp = 0x403360 entry 15 ino 0 off 0 name
---
fts: /lib/modules/6.10.6-armmp/kernel/drivers/block
fts: /lib/modules/6.10.6-armmp/kernel/drivers/block
---

Direct:

$ ./pdents
fd = 3
dp = 0xaaaae272e2a0 entry 0 ino 10302 off 1335313586018546964 name ublk_drv.ko.xz
dp = 0xaaaae272e2c8 entry 1 ino 10298 off 1441466563022220210 name nbd.ko.xz
dp = 0xaaaae272e2e8 entry 2 ino 10304 off 2208640688691778334 name xen-blkback
dp = 0xaaaae272e308 entry 3 ino 10216 off 2400073000029322655 name ..
dp = 0xaaaae272e320 entry 4 ino 10301 off 2568612365796029161 name rbd.ko.xz
dp = 0xaaaae272e340 entry 5 ino 10295 off 2642141484322030168 name loop.ko.xz
dp = 0xaaaae272e360 entry 6 ino 10292 off 3028993804646130172 name brd.ko.xz
dp = 0xaaaae272e380 entry 7 ino 10307 off 3830214992999727188 name zram
dp = 0xaaaae272e398 entry 8 ino 10290 off 3870617626237255389 name aoe
dp = 0xaaaae272e3b0 entry 9 ino 10303 off 4740941134542001648 name virtio_blk.ko.xz
dp = 0xaaaae272e3d8 entry 10 ino 10293 off 5108466411937333823 name drbd
dp = 0xaaaae272e3f0 entry 11 ino 10296 off 6213628172795130112 name mtip32xx
dp = 0xaaaae272e410 entry 12 ino 10306 off 7520475764088490632 name xen-blkfront.ko.xz
dp = 0xaaaae272e438 entry 13 ino 10289 off 8124502982319486926 name .
dp = 0xaaaae272e450 entry 14 ino 10299 off 9223372036854775807 name null_blk
dp = 0xaaaae272e470 entry 15 ino 0 off 0 name
---
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/ublk_drv.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/nbd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkback
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkback/xen-blkback.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkback
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/rbd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/loop.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/brd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/zram
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/zram/zram.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/zram
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/aoe
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/aoe/aoe.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/aoe
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/virtio_blk.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/drbd
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/drbd/drbd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/drbd
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/mtip32xx
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/mtip32xx/mtip32xx.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/mtip32xx
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkfront.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/null_blk
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/null_blk/null_blk.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/null_blk
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block
---
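To make the overflow hypothesis concrete, here is a small Python sketch that checks the ino and off values from the pdents listing above against signed and unsigned 32-bit ranges. The range checks are a simplified assumption about what a non-LFS conversion has to satisfy (32-bit off_t and ino_t), not a copy of the glibc code; the names ENTRIES, fits_off32 and fits_ino32 are hypothetical.

```python
# (name, d_ino, d_off) for entries 0..14 of the pdents output in this thread.
ENTRIES = [
    ("ublk_drv.ko.xz", 10302, 1335313586018546964),
    ("nbd.ko.xz", 10298, 1441466563022220210),
    ("xen-blkback", 10304, 2208640688691778334),
    ("..", 10216, 2400073000029322655),
    ("rbd.ko.xz", 10301, 2568612365796029161),
    ("loop.ko.xz", 10295, 2642141484322030168),
    ("brd.ko.xz", 10292, 3028993804646130172),
    ("zram", 10307, 3830214992999727188),
    ("aoe", 10290, 3870617626237255389),
    ("virtio_blk.ko.xz", 10303, 4740941134542001648),
    ("drbd", 10293, 5108466411937333823),
    ("mtip32xx", 10296, 6213628172795130112),
    ("xen-blkfront.ko.xz", 10306, 7520475764088490632),
    (".", 10289, 8124502982319486926),
    ("null_blk", 10299, 9223372036854775807),
]


def fits_off32(d_off):
    """True if d_off is representable in a signed 32-bit off_t (assumed width)."""
    return -(2**31) <= d_off <= 2**31 - 1


def fits_ino32(d_ino):
    """True if d_ino is representable in an unsigned 32-bit ino_t (assumed width)."""
    return 0 <= d_ino <= 2**32 - 1


# Names whose directory offset cannot be stored in a 32-bit off_t.
overflowing = [name for name, _ino, off in ENTRIES if not fits_off32(off)]

if __name__ == "__main__":
    print(f"{len(overflowing)} of {len(ENTRIES)} entries have a d_off beyond 32 bits")
    print("all inode numbers fit in 32 bits:",
          all(fits_ino32(ino) for _name, ino, _off in ENTRIES))
```

Under these assumptions every entry, including the very first one, carries a d_off that cannot be represented in 32 bits, while all inode numbers fit. That is consistent with the EOVERFLOW return seen in the gdb session and with fts reporting the directory as empty.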
The fts_* functions, via fts_read() -> fts_build(), end up calling

#if !_DIRENT_MATCHES_DIRENT64
__readdir_unlocked()

So follow this back to see why it is (not) set (on armhf):

#if defined __OFF_T_MATCHES_OFF64_T && defined __INO_T_MATCHES_INO64_T
/* Inform libc code that these two types are effectively identical. */
# define _DIRENT_MATCHES_DIRENT64 1
#else
# define _DIRENT_MATCHES_DIRENT64 0
#endif

#if defined __LP64__ || (__TIMESIZE == 64 && __WORDSIZE == 32)
/* Tell the libc code that off_t and off64_t are actually the same type
   for all ABI purposes, even if possibly expressed as different base types
   for C type-checking purposes. */
# define __OFF_T_MATCHES_OFF64_T 1
/* Same for ino_t and ino64_t. */
# define __INO_T_MATCHES_INO64_T 1

#define __TIMESIZE 32
#define __TIMESIZE 64
#define __WORDSIZE 32
#define __WORDSIZE 64

Debian 2.39-7 build log:

$ getbuildlog glibc 2.39-7 armhf
...
echo -n "Build started: " ; date --rfc-2822; \
echo "---------------"; \
cd build-tree/armhf-libc && \
CC="arm-linux-gnueabihf-gcc-13 -U_FILE_OFFSET_BITS -U_TIME_BITS" \
CXX="arm-linux-gnueabihf-g++-13 -U_FILE_OFFSET_BITS -U_TIME_BITS" \
...

and gcc's options have early in their list:

#define __TIMESIZE 32
This seems to confirm the cause:

debvm$ sudo chroot debvm.fs apt-get install libc6-dev
debvm$ grep -rn 'define .*TIMESIZE' debvm.fs/usr/include
debvm.fs/usr/include/arm-linux-gnueabihf/bits/timesize.h:20:#define __TIMESIZE 32
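The preprocessor conditions quoted earlier in this thread can be restated as a short Python sketch, which makes it easier to see which configurations end up with _DIRENT_MATCHES_DIRENT64 set to 0. This only models the two #if conditions as excerpted above. The function name and the example configurations other than armhf (__TIMESIZE 32, as the grep shows) are illustrative assumptions.

```python
def dirent_matches_dirent64(lp64, timesize, wordsize):
    """Model of the quoted conditions:
    __OFF_T_MATCHES_OFF64_T and __INO_T_MATCHES_INO64_T are both defined when
    __LP64__ is defined, or when __TIMESIZE == 64 and __WORDSIZE == 32;
    _DIRENT_MATCHES_DIRENT64 is 1 only if both are defined."""
    off_t_matches_off64_t = lp64 or (timesize == 64 and wordsize == 32)
    ino_t_matches_ino64_t = lp64 or (timesize == 64 and wordsize == 32)
    return 1 if (off_t_matches_off64_t and ino_t_matches_ino64_t) else 0


# armhf as found in the chroot: 32-bit words, __TIMESIZE 32, not LP64.
ARMHF = dict(lp64=False, timesize=32, wordsize=32)
# A 64-bit LP64 target such as amd64 (illustrative).
LP64_TARGET = dict(lp64=True, timesize=64, wordsize=64)
# A hypothetical 32-bit target whose libc is built with 64-bit time.
ILP32_TIME64 = dict(lp64=False, timesize=64, wordsize=32)

if __name__ == "__main__":
    for label, cfg in [("armhf", ARMHF), ("LP64", LP64_TARGET),
                       ("32-bit, 64-bit time", ILP32_TIME64)]:
        print(f"{label}: _DIRENT_MATCHES_DIRENT64 = {dirent_matches_dirent64(**cfg)}")
```

With the armhf values the result is 0, so the `#if !_DIRENT_MATCHES_DIRENT64` branch applies and fts ends up on the __readdir_unlocked() path described above.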
control: forcemerge 916276 1079443

Hi

This bug keeps coming, but porters do not work on getting it fixed upstream. My position explained in the two other merged bugs still stands. Given it only affects the qemu-user case, I do not want to take any risk applying a patch that has not been reviewed and merged upstream. If we end up "missing" files in the non qemu-user case, it might have some security implications.

Therefore, 32-bit porters, please work on providing a patch to upstream and get it merged. I'll then backport it to the debian package.

Regards
Aurelien
The fts_* functions are calling the non-LFS __readdir. This will also fail on large inode numbers, even without qemu.
control: unmerge 1079443
control: retitle 1079443 fts_* calling non-LFS __readdir

Ok, then I got misled by the earlier messages in that bug that pointed to the same upstream bugs. Unmerging them, and retitling because the existing title is also misleading.

As you seem to have investigated this more than I have, could you please take care of reporting the bug in the upstream bugzilla? A simple reproducer would be ideal.

Regards
Aurelien
Any news on that? Regards Aurelien