#1079443 dracut-install ... -m =drivers/XXX is ignored

Package:
dracut-install
Source:
dracut-install
Description:
dracut is an event driven initramfs infrastructure (dracut-install)
Submitter:
Tj
Date:
2025-08-28 09:01:08 UTC
Severity:
normal
Tags:
#1079443#5
Date:
2024-08-23 10:50:59 UTC
Over the last week, debvm has been failing its armhf/armel build tests
because the virtio_blk kernel module is not installed in initrd.img.
Helmut Grohne reported this and asked for help on IRC #debian-devel.

Builds a week ago succeeded [0] but the latest builds [1] fail.

I've been diagnosing the issue and began by focusing on the packages
installed for the test: the kernel, udev, initramfs-tools, and
dracut-install.

I reproduced the issue on a Bookworm amd64 host but Helmut was unable to
on an Unstable amd64 host.

Instrumenting initramfs-tools with additional log messages confirmed
that the correct list of modules, and module directories, is being
passed to dracut-install.

I tried adding --debug to gain more insight, but it seems to either
ignore the option or never reach any code path that calls log_debug().

However, both a log from update-initramfs -vu and running
dracut-install directly in the armhf chroot show that it ignores
=drivers/XXX entries in the module list.

A simple reproducer is:

$ apt-get install debvm
$ git clone https://salsa.debian.org/helmutg/debvm.git
$ cd debvm
$ tests/create-and-run.sh armhf unstable 2>&1 | /usr/bin/tee c-a-r.01.log
$ # kill the 'stuck' qemu guest process
$ mkdir debvm.fs
$ ldev="$( sudo /usr/sbin/losetup --find --show test.ext4 )"
$ sudo mount $ldev ./debvm.fs
$ sudo /usr/sbin/chroot ./debvm.fs/ /usr/lib/dracut/dracut-install --debug -D /var/tmp/ --kerneldir /lib/modules/6.10.6-armmp --firmwaredirs /lib/firmware/updates/6.10.6-armmp:/lib/firmware/updates:/lib/firmware/6.10.6-armmp:/lib/firmware --debug -v -o -m 8139cp =drivers/block acenic
dracut-install: mkdir '/var/tmp/usr'
dracut-install: mkdir '/var/tmp/usr/lib'
dracut-install: ln -s 'usr/lib' '/var/tmp/lib'
dracut-install: mkdir '/var/tmp/lib/modules'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/realtek'
dracut-install: cp '/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/realtek/8139cp.ko.xz' '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/realtek/8139cp.ko.xz'
dracut-install: cp '/lib/modules/6.10.6-armmp/kernel/drivers/net/mii.ko.xz' '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/mii.ko.xz'
dracut-install: mkdir '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/alteon'
dracut-install: cp '/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/alteon/acenic.ko.xz' '/var/tmp/lib/modules/6.10.6-armmp/kernel/drivers/net/ethernet/alteon/acenic.ko.xz'
dracut-install: Missing firmware acenic/tg2.bin for kernel module acenic
dracut-install: Missing firmware acenic/tg1.bin for kernel module acenic

Note how =drivers/block is apparently ignored here.

[0] https://salsa.debian.org/helmutg/debvm/-/jobs/6127012
[1] https://salsa.debian.org/helmutg/debvm/-/jobs/6164782

#1079443#10
Date:
2024-08-23 13:04:08 UTC
Results of further diagnosis.

First, I discovered that -v overrides --debug: removing -v from the
command line enabled the debug messages, and removing -o makes any
missing module an error, so the small reproducer now reports:

dracut-install: Handle module '=drivers/block'
dracut-install: Handling =drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/extra/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/updates/drivers/block
dracut-install: ERROR: installing '=drivers/block'

The "Ignoring" message is generated in install_modules() when an FTSENT
is neither a file nor a symbolic link. The directory contains 13 files
and sub-directories, but we only see 2 messages for the module's path.

The fts* functions visit a directory twice, once in preorder and again
in postorder, so those two messages are just the directory itself; none
of its contents are visited, which suggests something interesting is
going on.

I note that fts_open() is called with FTS_NOSTAT, and strace confirms
that no stat*() calls are made on the contents of the directory. That
would imply fts_info == FTS_NSOK for those FTSENTs, which would explain
the results being seen.

#1079443#17
Date:
2024-08-23 15:32:53 UTC
I've also tried to reproduce this problem, and can report a few
interesting factoids.

My test environment is an unstable arm64 VM, with qemu-arm-static
set up to execute armhf binaries. qemu-arm-static comes from bookworm,
as does the running kernel (6.1.0-23-arm64).

The test chroots were created fresh today.

On this same VM, I can reproduce the problem when the chroot
resides on ext4.

When the chroot is created and resides on xfs, the problem is not
reproducible.

I've added some debugging output to dracut-install, but cannot see
any entries with FTS_NSOK.

Output with the newly added debug messages:

dracut-install: Handle module '=drivers/block'
dracut-install: Handling =drivers/block
dracut-install: path1: /lib/modules/6.10.6-armmp/extra/drivers/block
dracut-install: path2: /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: path3: /lib/modules/6.10.6-armmp/updates/drivers/block
dracut-install: Checking /lib/modules/6.10.6-armmp/extra/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/extra/drivers/block fts_info=10
dracut-install: Checking /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block fts_info=1
dracut-install: Checking /lib/modules/6.10.6-armmp/kernel/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/kernel/drivers/block fts_info=6
dracut-install: Checking /lib/modules/6.10.6-armmp/updates/drivers/block
dracut-install: Ignoring /lib/modules/6.10.6-armmp/updates/drivers/block fts_info=10

It looks like fts_read just doesn't see anything inside
kernel/drivers/block.

Chris

#1079443#22
Date:
2024-08-23 17:01:47 UTC
user debian-arm@lists.debian.org
usertag 1079443 time-t
thanks

Hi debian-arm,

in case you don't know yet, here is a bug affecting dracut-install
on armhf (and probably armel), causing the built initramfs to lack a
lot of kernel modules.  This probably makes a lot of systems unbootable.

It looks like the bug is somewhere in the fts_* glibc routines or
maybe deeper down. Tj and I think t64 might be a source of the
problem.

On affected systems, this reproducer will output a single line (but
it should give about 13 or so):

  $ pax -w /usr/lib/modules/6.10.6-armmp/kernel/drivers/block | tar -t

(Assuming you have linux-image-6.10.6-armmp installed.)

As noted in my earlier reply, it might only happen on ext4.

Good luck,
Chris

#1079443#27
Date:
2024-08-23 19:36:22 UTC
I've managed to set up a gdb-multiarch debug session and have been
single-stepping through the glibc fts_* code that seems to be affected.

Chris found that xfs doesn't seem to be affected but ext4 is (the host
file system, not the file system created by debvm). That confuses the
issue somewhat, since the reproducers also work on several different
kernel versions: I'm using a mainline v6.10.6 amd64 build; Chris
reported 6.1.0-23-arm64.

My debug setup uses two terminals. In the first, I run dracut-install
directly under qemu (rather than in the chroot) so that gdb can attach
to it:

debian/debvm$ base=$PWD/debvm.fs; qemu-arm-static -g 9999 -L $base $base/usr/lib/dracut/dracut-install -D $base/var/tmp/ --kerneldir $base/lib/modules/6.10.6-armmp --firmwaredirs $base/lib/firmware/updates/6.10.6-armmp:$base/lib/firmware/updates:$base/lib/firmware/6.10.6-armmp:$base/lib/firmware --debug -m =drivers/block

I fetched the glibc source so gdb can use it:

$ cd /srv/NAS/Sunny/SourceCode/glibc
$ dget -x http://deb.debian.org/debian/pool/main/g/glibc/glibc_2.39-7.dsc

In the second terminal, in the dracut-ng base directory, I do:

dracut-ng$ gdb-multiarch -ex "directory /srv/NAS/Sunny/SourceCode/glibc/glibc-2.39" \
    -ex "file /srv/NAS/Sunny/SourceCode/debian/debvm/debvm.fs/usr/lib/dracut/dracut-install" \
    -ex "set sysroot /srv/NAS/Sunny/SourceCode/debian/debvm/debvm.fs" -ex "target remote localhost:9999" \
    -ex "break install_modules" -ex "break fts_open" -ex "break fts_build if sp->fts_path[2] != 'y'"

The last, conditional breakpoint avoids stopping at every node at or
under "/sys/devices/platform".

Then I use 'c' to fast-forward to the useful points, looking for
".../kernel/drivers/block" in sp->fts_path:

(gdb) p *sp
$8 = {fts_cur = 0x400160f8, fts_child = 0x0, fts_array = 0x0, fts_dev = 1792, fts_path = 0x40012450 "/srv/NAS/Sunny/SourceCode/debian/debvm/debvm.fs/lib/modules/6.10.6-armmp/kernel/drivers/block", fts_rfd = 0,
fts_pathlen = 4352, fts_nitems = 0, fts_compar = 0x0, fts_options = 15}

From here, the loop that I believe should iterate over the files and
sub-directories by calling __readdir() performs zero iterations,
presumably because __readdir() returns NULL and sets an error code.

I've not yet traced into that; that's for another day I think.

The loop:

https://codesearch.debian.net/show?file=glibc_2.39-7%2Fio%2Ffts.c&line=723#L723

#1079443#32
Date:
2024-08-23 21:31:49 UTC
Further debugging leads to the getdents64 syscall returning -1 at

https://sources.debian.org/src/glibc/2.39-7/sysdeps/unix/sysv/linux/getdents.c/?hl=56#L58

I'm including a dump of the gdb session, with my attempts to examine
values along the way. Because the compiler optimised the code, gdb jumps
around a bit, which is confusing since some statements that appear to
execute are never actually reached.

#1079443#37
Date:
2024-08-23 22:10:20 UTC
I've used qemu-arm-static -d strace ... to analyse the syscalls; the log
is attached. The getdents64 call is near the end. The SIGTRAPs are gdb
single-stepping breakpoints, so they can be ignored.

Looking at the gdb trace again, the syscall is not failing; rather, the
function __getdents() reaches:

 return INLINE_SYSCALL_ERROR_RETURN_VALUE (EOVERFLOW);

which is in sysdeps/unix/sysv/linux/sysdep.h:

/* Set error number and return -1.  A target may choose to return the
   internal function, __syscall_error, which sets errno and returns -1.
   We use -1l, instead of -1, so that it can be casted to (void *).  */
#define INLINE_SYSCALL_ERROR_RETURN_VALUE(err)  \
  ({                                            \
    __set_errno (err);                          \
    -1l;                                        \
  })

and we have:

/usr/include/asm-generic/errno.h:58:#define     EOVERFLOW       75      /* Value too large for defined data type */

So it isn't clear yet why the error is not being handled, or where that
should happen.

#1079443#42
Date:
2024-08-24 08:00:25 UTC
I think this will need re-assigning to another package, but it is not
yet clear which.

In the last few hours Chris has identified a couple of bug reports that
may describe the cause or be related; they shed much-needed insight
from glibc experts on this and contain quite a few patch attempts.

From my initial reading of both, and given that the debvm arch-test
succeeded a week ago but fails now, I wonder if this is simply because
the directory offsets on the host running the CI jobs previously fit in
32 bits but have since exceeded that, so an overflow occurs: the host
returns 64-bit values, but the emulated 32-bit qemu-arm glibc cannot
represent the top 32 bits.

glibc <> qemu-user (going on since 2018):

[2.28 Regression]: New getdents{64} implementation breaks qemu-user

https://sourceware.org/bugzilla/show_bug.cgi?id=23960

kernel ext4 <> glibc:

Ext4 64 bit hash breaks 32 bit glibc 2.28+

https://bugzilla.kernel.org/show_bug.cgi?id=205957

#1079443#49
Date:
2024-08-24 10:23:08 UTC
After considerable discussion and investigation on IRC #debian-devel on
2024-08-24, from 10:00 UTC onwards, the consensus is that this is a
glibc issue: the (rarely used) fts_* functions call __getdents()
instead of getdents().

#1079443#58
Date:
2024-08-24 10:37:36 UTC
mjt found that, in addition to this report's armhf emulation on a
64-bit host, he can reproduce the apparent problem with i386:

Quotes:
see comment #12 in [0]
it looks like the prob is in fts in glibc, who calls __getdents() instead of __getdents64
gdb'ing this on i386 also leads to __readdir and not __readdir64
if it called getdents() instead of __getdents(), it would be aliased to getdents64 with LFS

[0] https://sourceware.org/bugzilla/show_bug.cgi?id=23960#c12

#1079443#65
Date:
2024-08-24 10:43:43 UTC
To ensure we capture all relevant info, I'm copying here some test
results Chris produced using a custom-written executable.

Using emulation:

$ sudo chroot /mnt/e4/armhf /pdents-armhf
fd = 3
dp = 0x403190
entry   0 ino 10302 off 1335313586018546964 name ublk_drv.ko.xz
dp = 0x4031b8
entry   1 ino 10298 off 1441466563022220210 name nbd.ko.xz
dp = 0x4031d8
entry   2 ino 10304 off 2208640688691778334 name xen-blkback
dp = 0x4031f8
entry   3 ino 10216 off 2400073000029322655 name ..
dp = 0x403210
entry   4 ino 10301 off 2568612365796029161 name rbd.ko.xz
dp = 0x403230
entry   5 ino 10295 off 2642141484322030168 name loop.ko.xz
dp = 0x403250
entry   6 ino 10292 off 3028993804646130172 name brd.ko.xz
dp = 0x403270
entry   7 ino 10307 off 3830214992999727188 name zram
dp = 0x403288
entry   8 ino 10290 off 3870617626237255389 name aoe
dp = 0x4032a0
entry   9 ino 10303 off 4740941134542001648 name virtio_blk.ko.xz
dp = 0x4032c8
entry  10 ino 10293 off 5108466411937333823 name drbd
dp = 0x4032e0
entry  11 ino 10296 off 6213628172795130112 name mtip32xx
dp = 0x403300
entry  12 ino 10306 off 7520475764088490632 name xen-blkfront.ko.xz
dp = 0x403328
entry  13 ino 10289 off 8124502982319486926 name .
dp = 0x403340
entry  14 ino 10299 off 9223372036854775807 name null_blk
dp = 0x403360
entry  15 ino 0 off 0 name
---
fts: /lib/modules/6.10.6-armmp/kernel/drivers/block
fts: /lib/modules/6.10.6-armmp/kernel/drivers/block
---

Direct:

$ ./pdents
fd = 3
dp = 0xaaaae272e2a0
entry   0 ino 10302 off 1335313586018546964 name ublk_drv.ko.xz
dp = 0xaaaae272e2c8
entry   1 ino 10298 off 1441466563022220210 name nbd.ko.xz
dp = 0xaaaae272e2e8
entry   2 ino 10304 off 2208640688691778334 name xen-blkback
dp = 0xaaaae272e308
entry   3 ino 10216 off 2400073000029322655 name ..
dp = 0xaaaae272e320
entry   4 ino 10301 off 2568612365796029161 name rbd.ko.xz
dp = 0xaaaae272e340
entry   5 ino 10295 off 2642141484322030168 name loop.ko.xz
dp = 0xaaaae272e360
entry   6 ino 10292 off 3028993804646130172 name brd.ko.xz
dp = 0xaaaae272e380
entry   7 ino 10307 off 3830214992999727188 name zram
dp = 0xaaaae272e398
entry   8 ino 10290 off 3870617626237255389 name aoe
dp = 0xaaaae272e3b0
entry   9 ino 10303 off 4740941134542001648 name virtio_blk.ko.xz
dp = 0xaaaae272e3d8
entry  10 ino 10293 off 5108466411937333823 name drbd
dp = 0xaaaae272e3f0
entry  11 ino 10296 off 6213628172795130112 name mtip32xx
dp = 0xaaaae272e410
entry  12 ino 10306 off 7520475764088490632 name xen-blkfront.ko.xz
dp = 0xaaaae272e438
entry  13 ino 10289 off 8124502982319486926 name .
dp = 0xaaaae272e450
entry  14 ino 10299 off 9223372036854775807 name null_blk
dp = 0xaaaae272e470
entry  15 ino 0 off 0 name
---
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/ublk_drv.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/nbd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkback
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkback/xen-blkback.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkback
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/rbd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/loop.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/brd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/zram
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/zram/zram.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/zram
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/aoe
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/aoe/aoe.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/aoe
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/virtio_blk.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/drbd
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/drbd/drbd.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/drbd
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/mtip32xx
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/mtip32xx/mtip32xx.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/mtip32xx
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/xen-blkfront.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/null_blk
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/null_blk/null_blk.ko.xz
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block/null_blk
fts: /mnt/e4/armhf/lib/modules/6.10.6-armmp/kernel/drivers/block
---

#1079443#70
Date:
2024-08-24 12:33:36 UTC
The fts_* functions, via fts_read() -> fts_build(), end up calling

#if !_DIRENT_MATCHES_DIRENT64
__readdir_unlocked()

So I followed this back to see why _DIRENT_MATCHES_DIRENT64 is (not)
set on armhf:

#if defined __OFF_T_MATCHES_OFF64_T && defined __INO_T_MATCHES_INO64_T
/* Inform libc code that these two types are effectively identical.  */
# define _DIRENT_MATCHES_DIRENT64 1
#else
# define _DIRENT_MATCHES_DIRENT64 0
#endif

#if defined __LP64__ || (__TIMESIZE == 64 && __WORDSIZE == 32)
/* Tell the libc code that off_t and off64_t are actually the same type
   for all ABI purposes, even if possibly expressed as different base types
   for C type-checking purposes.  */
# define __OFF_T_MATCHES_OFF64_T  1

/* Same for ino_t and ino64_t.  */
# define __INO_T_MATCHES_INO64_T  1
...

and the candidate definitions of the macros that condition depends on:

#define __TIMESIZE       32
#define __TIMESIZE       64

#define __WORDSIZE       32
#define __WORDSIZE       64


Debian 2.39-7 build log:

$ getbuildlog glibc 2.39-7 armhf

...
echo -n "Build started: " ; date --rfc-2822; \
echo "---------------"; \
cd build-tree/armhf-libc && \
CC="arm-linux-gnueabihf-gcc-13 -U_FILE_OFFSET_BITS -U_TIME_BITS" \
CXX="arm-linux-gnueabihf-g++-13 -U_FILE_OFFSET_BITS -U_TIME_BITS" \
...

and gcc's options have early in their list:


#define __TIMESIZE       32

#1079443#75
Date:
2024-08-24 14:59:19 UTC
This seems to confirm the cause:

debvm$ sudo chroot debvm.fs apt-get install libc6-dev
debvm$ grep -rn 'define .*TIMESIZE' debvm.fs/usr/include
debvm.fs/usr/include/arm-linux-gnueabihf/bits/timesize.h:20:#define __TIMESIZE  32

#1079443#78
Date:
2024-08-24 15:20:44 UTC
control: forcemerge 916276 1079443

Hi

This bug keeps coming back, but porters are not working on getting it
fixed upstream.

The position I explained in the two other merged bugs still stands.
Given that it only affects the qemu-user case, I do not want to take any
risk applying a patch that has not been reviewed and merged upstream. If
we end up "missing" files in the non-qemu-user case, it might have some
security implications.

Therefore, 32-bit porters, please work on providing a patch to upstream
and get it merged. I'll then backport it to the debian package.

Regards
Aurelien

#1079443#97
Date:
2024-08-24 17:40:02 UTC
The fts_* functions are calling the non-LFS __readdir. This will also
fail on large inode numbers, even without qemu.

#1079443#102
Date:
2024-08-24 17:57:15 UTC
control: unmerge 1079443
control: retitle 1079443 fts_* calling non-LFS __readdir

Ok, then I got misled by the earlier messages in that bug that pointed
to the same upstream bugs. Unmerging them, and retitling because the
existing title is also misleading.

As you seem to have investigated this more than I have, could you
please take care of reporting the bug in the upstream bugzilla? A simple
reproducer would be ideal.

Regards
Aurelien

#1079443#111
Date:
2024-11-03 10:25:48 UTC
Any news on that?

Regards
Aurelien