- Package:
- initramfs-tools
- Source:
- initramfs-tools
- Submitter:
- Lukas Wunner
- Date:
- 2025-04-14 07:54:01 UTC
- Severity:
- wishlist
Dracut, which provides linux-initramfs-tool and is thus an alternative to initramfs-tools, supports restoring the initrd on shutdown and pivoting into it:
https://www.kernel.org/pub/linux/utils/boot/dracut/dracut.html#_dracut_on_shutdown

One example where this is needed is a ZFS root filesystem: a clean shutdown requires unmounting the root filesystem and exporting the ZFS storage pool containing that filesystem. Dracut modules may contain shutdown scripts which are called after the system has pivoted to the initrd. In the case of ZFS, the shutdown script looks like this:
https://github.com/zfsonlinux/zfs/blob/master/dracut/90zfs/export-zfs.sh.in

This is not specific to ZFS; it also affects anyone with the root filesystem on LVM. Currently, dracut is the only option for these folks. It would be nice if initramfs-tools supported a shutdown procedure akin to dracut's to give people a choice.
Hi,
Lukas Wunner:
I might try to come up with a hackish PoC for Tails soon. Rationale for the curious: we will soon start relying on the kernel's memory poisoning to erase most memory on shutdown; this can only work if the read-write branch of our aufs filesystem is unmounted on shutdown, and switching to dracut is a longer-term project, so our options so far are either hacking this support into initramfs-tools, or using a dracut-generated initrd for shutdown only.
FWIW Arch Linux's mkinitcpio also does this:
https://git.archlinux.org/mkinitcpio.git/tree/shutdown
Details of the needed interface can be found in:
* https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/
* https://www.freedesktop.org/wiki/Software/systemd/RootStorageDaemons/
* systemd-shutdown(8)
Cheers,
Hi,
intrigeri:
Here we go! Installing the four following files (slightly adapted to
drop a couple Tails-specific bits) on a Stretch system seems to do the
job. I hope it can allow interested people to validate this approach,
and then if there's enough demand I bet someone will integrate it into
initramfs-tools properly :)
If additional cleanup must be done from inside the initramfs after
returning to it, drop snippets in /usr/share/initramfs-tools/hooks/*
that install the required scripts into /lib/systemd/system-shutdown/
*in the initramfs*. E.g. for Tails I had to do quite a bit more work there
to ensure the aufs stack our root filesystem uses is disassembled
properly (again in order to have the aufs read-write branch, on tmpfs,
cleaned up and its content erased by Linux' memory poisoning); I'll
contribute this code to live-boot if/when this feature is properly
integrated into initramfs-tools.
I don't know if I'll work more on this wrt. initramfs-tools.
It'll depend a lot on the timing of Tails moving to dracut, which is
entirely unclear at this time. Sorry!
/lib/systemd/system/initramfs-shutdown.service
----------------------------------------------
[Unit]
Description=Restore /run/initramfs on shutdown
Documentation=https://www.freedesktop.org/wiki/Software/systemd/InitrdInterface/
After=local-fs.target boot.mount boot.automount
Wants=local-fs.target
Conflicts=shutdown.target umount.target
DefaultDependencies=no
ConditionPathExists=!/run/initramfs/bin/sh
[Service]
RemainAfterExit=yes
Type=oneshot
ExecStart=/bin/true
ExecStop=/usr/share/initramfs-tools/initramfs-restore
[Install]
WantedBy=multi-user.target
/usr/share/initramfs-tools/initramfs-restore
--------------------------------------------
#!/bin/sh
set -e
set -u
WORKDIR=$(mktemp -d)
/usr/bin/unmkinitramfs /initrd.img "$WORKDIR"
mv "$WORKDIR"/main/* /run/initramfs/
rm -rf "$WORKDIR"
/lib/systemd/system-shutdown/initramfs-tools
--------------------------------------------
#!/bin/sh
# Otherwise systemd-shutdown cannot execute /run/initramfs/shutdown
mount -o remount,exec /run
/usr/share/initramfs-tools/hooks/shutdown
-----------------------------------------
#!/bin/sh
set -e
PREREQ=""
prereqs () {
echo "${PREREQ}"
}
case "${1}" in
prereqs)
prereqs
exit 0
;;
esac
. /usr/share/initramfs-tools/hook-functions
# systemd-shutdown itself
mkdir -p "$DESTDIR/lib/systemd"
copy_exec /lib/systemd/systemd-shutdown /shutdown
# Ensure systemd detects when we're in the initramfs on shutdown
# (see the in_initrd function in the systemd source tree)
touch "$DESTDIR/etc/initrd-release"
exit 0
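As an illustration of the "drop snippets in hooks" advice above, here is a minimal sketch of what such an extra hook could do. It is not part of the four files above: the function name install_shutdown_script and the example source path are made up, and it uses plain mkdir/cp/chmod rather than the helpers from hook-functions so that it can be tried outside of mkinitramfs.

```shell
#!/bin/sh
# Sketch of a hook body for /usr/share/initramfs-tools/hooks/ (hypothetical).
# Copies SCRIPT into the initramfs tree rooted at DESTDIR so that
# systemd-shutdown, once it runs from the initramfs, finds it in
# /lib/systemd/system-shutdown/.
install_shutdown_script() {
    script=$1
    destdir=$2
    target_dir="$destdir/lib/systemd/system-shutdown"
    mkdir -p "$target_dir"
    cp "$script" "$target_dir/"
    chmod 0755 "$target_dir/${script##*/}"
}

# In a real hook, after the usual prereqs boilerplate, one might call:
#   install_shutdown_script /usr/local/lib/my-cleanup "$DESTDIR"
# (the source path is invented for illustration)
```

Any binaries the cleanup script calls would also have to be copied into the initramfs, e.g. with copy_exec as in the hook above.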
Cheers,
I have yet to investigate intrigeri's suggestions from 2017. However, I would suggest that this is something that needs to be upgraded from wishlist in 2022, and here's the reason, simply enough:

root@aki:~# nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
[..]
unsafe_shutdowns : 106
[..]
num_err_log_entries : 284
[..]
root@aki:~# nvme smart-log /dev/nvme1
Smart Log for NVME device:nvme1 namespace-id:ffffffff
[..]
unsafe_shutdowns : 121
[..]
num_err_log_entries : 291
[..]

Given that the frequency and number of SMART errors are deemed an indicator of drive health, that's bad. Also, improper shutdown on NVMe devices could be particularly problematic because they have caches and wear leveling and cleanup cycles that could happen any time the drive is "running" until a shutdown command is issued and responded to. There might actually be some risk of data corruption/loss. (I doubt it with commodity consumer SSDs, but Debian isn't just run on those.)

For a few weeks, we tried on #debian to sort out the cause of the above errors. We wondered: an NVMe drive quirk Linux doesn't support? Maybe Linux is issuing the shutdown command and not waiting long enough? There's Google bait suggesting that's a problem, and there are some BS factoids in dpkg, which I should remove the next time I connect to OFTC, describing the "solution" which I've since discovered doesn't work. This was hard to test because obviously no logger is running at this point of the shutdown process.

The root cause of the problem isn't an unknown quirk, it's that I have LVM on LUKS. (See what I did there?) I connected a drive with an unencrypted Debian system on it that mounted my main installation's /boot and even the LUKS/LVM root somewhere, and never got a single unsafe shutdown despite multiple reboots/shutdowns, because that temp install's root was not on LVM on LUKS backing.

Dracut is a suboptimal solution, in part because after three days of trying to get it to boot my system, I've yet to see it do so, and because while there's lots of documentation for it, it's for other distributions, it's wrong, it's obsolete, or it's misleading. That includes one rantthrough from 2017 that offers a profanity-laden survey of most of the others and why they don't work for Debian systems or at all. As far as I can tell, I either need to significantly modify grub, or switch to systemd-boot, or set up Dracut to generate an EFI executable blob using files that aren't available on a Debian system, or throw up my hands and go use Fedora until I understand Dracut enough to try and use it on Debian. Or something.

Again: what sparse documentation exists is spotty, inconsistent, and at least five years out of date. Dracut is not how Debian does things, just like OpenRC and rEFInd are not how Debian does things. That's all there if you want to set it up, but you're not going to find many Debian resources on using it.

I think unsafe shutdowns of NVMe devices are actually a bug. And I think it could cause data loss or corruption on more advanced hardware than I'm using. There are a few options for addressing it and most of them become problems beyond initramfs-tools' scope. But this seven-year-old bug might be the path of least resistance.

Joseph
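To check whether a given change actually helps, the counter can be compared before and after a reboot. The helper below is a small sketch, not something from this report: the name unsafe_shutdowns is made up, and it assumes the "unsafe_shutdowns : N" line format shown in the output quoted above (other nvme-cli versions may label the field differently).

```shell
# Print the unsafe_shutdowns counter found in `nvme smart-log` text on stdin.
# Assumes a line of the form "unsafe_shutdowns : 106".
unsafe_shutdowns() {
    awk -F: '$1 ~ /^[ \t]*unsafe_shutdowns[ \t]*$/ {
        gsub(/[^0-9]/, "", $2)   # keep only the digits of the value
        print $2
    }'
}

# Usage (as root), once before shutting down and once after the next boot:
#   nvme smart-log /dev/nvme0 | unsafe_shutdowns
```

If the number is unchanged across a reboot, that shutdown was recorded as clean by the drive.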
Hi Joseph.

The last paragraph of this e-mail is specifically addressed to you, but most of it is addressed generally. Also, apologies if this message is a bit rushed. I have a few things to do today.

I was planning to try out intrigeri's solution on a VM but have not had the chance to do this. I agree that this should be higher than wishlist for the above reason plus Lukas's ZFS shutdown problem mentioned in the initial description/submission of this bug. This really should be fixed for Bookworm.

A while back, I did have a look around the fix. From what I remembered, intrigeri's solution used a systemd shutdown 'script' to check for devmaps or whatever of LVMs, ZFS partitions, etc. and ran specific commands to umount the partitions. However, I think my memory may be bad because I "now" don't see evidence of such umounts in intrigeri's solution! I would like to try things out today but may be too rushed.

Jo, have you been able to try out intrigeri's solution (in GENERAL as opposed to his specific patch/fix, which is mentioned in this bug report and may have bits missing)? The reason I ask is that you would have the exact reproduction steps and be able to do it easily. For me, it would be a shot in the dark, or at least awkward to recreate. I would only be able to check that root LVM on LUKS would not cause any untoward problems.

Thanks,
Gervase.
Just in case it is not obvious (I did not see it until I toggled "useless messages"!)... This Bug#778849 (Severity: wishlist) blocks Bug#978642 [Wipe LUKS Disk Encryption Key for Root Disk from RAM during Shutdown to defeat Cold Boot Attacks from Initial Ramdisk (initramfs-tools or dracut)].
Apparently, I got confused. What I saw is the script called 'shutdown' from the mkinitcpio package used in Arch Linux (see https://gitlab.archlinux.org/archlinux/mkinitcpio/mkinitcpio/-/blob/master/shutdown ). What it does is (1) recursively umount the devices, (2) detach loopback devices and then (3) disassemble stacked devices (i.e. encrypted devices, LVM and RAID).

In contrast, what intrigeri's solution SEEMS to do (I haven't done any experimentation using the solution) is provide a way for Debian's initrd process to "pivot" back to a systemd shutdown procedure within an initramfs environment, as opposed to running the Arch Linux shutdown script. This shutdown procedure differs from Arch Linux's because its initramfs infrastructure differs from Debian's, I assume?

As intrigeri wrote in his instructions, the relevant scripts would need to be written for dismantling devices ('virtual' or physical) and installed via hooks placed in /usr/share/initramfs-tools/hooks/* (if I understood things correctly). So, if root was on ZFS, there would need to be a script for that, and/or if root was on LUKS, there would need to be a script for that, etc.

The way that intrigeri's solution sets up the shutdown executable by just copying it to the initramfs seems very clunky to me. Shouldn't it be in the initramfs image file already, even before the system is switched on/booted up!?

Anyway, the above is my understanding of the situation. It may be completely wrong because I barely understand the initrd process!

Thanks,
Gervase.
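Step (1) above, unmounting recursively, mostly comes down to visiting the mounts under the old root in the reverse of the order they were mounted. A minimal sketch of that ordering follows; it is not taken from the mkinitcpio script, the function name umount_order is made up, and /oldroot is assumed as the place where the old root ends up after the pivot.

```shell
# Print the mount points at or below $1, most recently mounted first.
# Reads /proc/self/mounts-style lines on stdin (field 2 is the mount point).
# Mount points containing spaces appear escaped (\040) and are passed through
# unchanged, so they would need decoding before use.
umount_order() {
    awk -v root="$1" '
        $2 == root || index($2, root "/") == 1 { targets[n++] = $2 }
        END { for (i = n - 1; i >= 0; i--) print targets[i] }
    '
}

# Usage inside the shutdown initramfs (illustrative only):
#   umount_order /oldroot < /proc/self/mounts | while read -r mp; do
#       umount "$mp"
#   done
```

Detaching loop devices and tearing down dm/LVM/RAID layers, steps (2) and (3), would come after this and are not covered here.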
The following initrd info may or may not be pertinent to this bug in respect to how initrds may be created in future versions of Debian...
a. netbooting Debian Live on diskless hosts.
b. "zpool export -a" on servers.
I am only considering case (a), below.
I tried intrigeri's approach for Debian Live but I ran into a couple of problems:
1. it assumes /initrd.img inside the rootfs exists and
is consistent with the already-running system.
This is not the case for me (I remove it to save space), and
also not necessarily the case during upgrades.
2. it tries to unpack /initrd.img after systemd-networkd stops.
Without KeepConfiguration= (which is a pain to guarantee),
that means no network access, which means no access to remote rootfs.
I instead tried just keeping the boot initrd around.
Using a simple bind-mount didn't work (I don't understand why) – SOME files are missing after switch_root.
Doing a full cp -a did work, though.
This method seems to work for my very simple test case of failed-to-unmount-rootfs error going away.
I'm really not happy with it overall, though.
I've run out of "time budget" to work on this in the short term.
https://github.com/cyberitsolutions/bootstrap2020/tree/twb/doc/workaround-778849
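For readers who only want the gist of the "keep the boot initrd around with a full copy" idea, a rough sketch is below. It is not the code from the repository linked above: the function name preserve_initrd and the skip list are assumptions, and it presumes the copy is made from a script that runs late in the boot initramfs, before switch_root, with enough room in /run.

```shell
# Copy the top-level entries of SRC into DEST, skipping directories that are
# mount points in a running initramfs and must not be duplicated.
# The skip list is a guess; top-level dotfiles are not copied.
preserve_initrd() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for entry in "$src"/*; do
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        case "${entry##*/}" in
            dev|proc|sys|run|root|tmp) continue ;;
        esac
        cp -a "$entry" "$dest"/
    done
}

# Usage from inside the boot initramfs (illustrative only):
#   preserve_initrd / /run/initramfs
```

Compared with unpacking /initrd.img at shutdown, this costs RAM for the whole uptime but does not depend on the image on disk or on the network still being up.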
PS: I looked at dracut, but it's simply unsupported for live-boot (Debian Live / Tails), and
for servers, I found it unreliable (much worse than initramfs-tools).
(e.g. if bash has a security update, dracut doesn't trigger and the embedded copy of bash in the initrd remains vulnerable.)
(e.g. telling dracut to use only busybox/klibc and not bash breaks, because lots of dracut components need bash but don't declare a dependency on it.)
(e.g. dracut is written in bash and regularly has errors but doesn't exit non-zero, so you do not notice until the server doesn't actually boot anymore.)
https://github.com/systemd/systemd/blob/v252/src/shutdown/shutdown.c#L422

i.e. it's similar to Arch's script, except it's 1) C code; 2) distro-agnostic; and 3) a bit feature-limited.

I think if you want it to run arbitrary other commands (e.g. "zpool export -a"), you would need more code. I think for that you'd want to systemdize /run/initramfs/shutdown (i.e. make it a copy of systemd's /bin/init), and then run some subset of
https://github.com/systemd/systemd/blob/v252/man/bootup.xml#L291-L330

Note that systemd can "be" the boot initrd, too, which is the previous flow chart:
https://github.com/systemd/systemd/blob/v252/man/bootup.xml#L236-L288

AFAIK Debian initramfs-tools doesn't support this at all. AFAIK Arch Linux supports this, but it is opt-in (off by default).

Last time I looked (around Debian 10), Debian dracut theoretically supported putting systemd in charge of the boot initrd (and shutdown initrd?), BUT it also installed a zillion bits of coreutils that systemd itself doesn't use. Since my goal was to REDUCE the attack surface of the boot initrd, I gave up on dracut at the time.

I think it'd be better if /run/initramfs/shutdown used existing code -- either /lib/systemd/systemd-shutdown/*.shutdown, or maybe .service units, if that's appropriate. But I confess I still do not understand how a "pure systemd" boot initrd + shutdown initrd would actually look.
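If the drop-in mechanism described earlier in this report (executables in /lib/systemd/system-shutdown/ inside the initramfs, which systemd-shutdown runs with a single argument of halt, poweroff, reboot or kexec) turns out to be sufficient, the "zpool export -a" case might look roughly like the sketch below. This is untested against a real pool; the function name export_pools and the ZPOOL override variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical drop-in for /lib/systemd/system-shutdown/ in the initramfs.
# ZPOOL can be overridden so the logic can be tried without ZFS installed.
ZPOOL=${ZPOOL:-zpool}

export_pools() {
    case "$1" in
        halt|poweroff|reboot|kexec)
            # Export every imported pool once nothing is mounted from it.
            "$ZPOOL" export -a
            ;;
        *)
            echo "unexpected verb: $1" >&2
            return 1
            ;;
    esac
}

# A real drop-in would end with:
#   export_pools "$1"
```

Whether this runs late enough (after the old root has been unmounted) depends on the ordering inside systemd-shutdown, so it would need checking on a real system.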
I was also experiencing improper shutdown of a root filesystem on nvme drives :( This is a new Bookworm install with the root file system on an LVM thin pool which in turn resides on two nvme drives configured for Raid10 via mdadm. I tried the suggestion from intrigeri but did not accomplish a successful pivot to initramfs. I -believe- the problem was that the "mount -o remount,exec /run" occurred too "late" when it was attempted in /usr/lib/systemd/system-shutdown/initramfs-tools. I moved the "mount -o remount,exec /run" to /usr/share/initramfs-tools/initramfs-restore and was able to get systemd to successfully pivot into the initramfs, and detach all drives ... YMMV :)
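When debugging this, it can help to confirm whether /run is still mounted noexec at the point where the remount is attempted. A small sketch of such a check is below; the helper name is_noexec is made up, and it only inspects the mount table rather than changing anything.

```shell
# Succeed if mount point $1 carries the noexec option.
# Reads /proc/self/mounts-style lines on stdin
# (field 2: mount point, field 4: comma-separated options).
is_noexec() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }   # the last matching entry wins
        END {
            if (("," opts ",") ~ /,noexec,/) exit 0
            exit 1
        }
    '
}

# Usage in initramfs-restore (illustrative only):
#   if is_noexec /run < /proc/self/mounts; then
#       mount -o remount,exec /run
#   fi
```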
Hello,

I sneak in just to report another situation where this feature would be useful: a root filesystem placed on an MD RAID array with an external write-intent bitmap.

To be effective, the write-intent bitmap file needs to be placed on an external partition outside of the RAID array. This creates a chicken-and-egg problem where the RAID array needs the partition where the file is placed, which in turn needs the root partition to be mounted. I haven't tried it, but this could probably be solved within the initramfs by mounting the external partition somewhere under /run before assembling the RAID array.

This needs to be unwound at shutdown, and I see no way to do it properly outside of the initramfs, as the unwinding would require the following steps:
- Unmount root
- Stop the RAID array, so that the bitmap file is no longer used
- Unmount the write-intent bitmap partition

Another solution might consider the following steps:
- Mount root read-only, so no further writes could happen
- Remove the bitmap file from the RAID array
- Unmount the write-intent bitmap partition

Maybe this could be done outside of the initramfs, but I'm not sure if the bitmap file would be used again after reboot. If I find some spare time I'll try to experiment using some VM.

FYI, with the root filesystem on a hybrid NVMe/HDD RAID 1 I'm experiencing the same NVMe "unsafe_shutdowns" problem reported by Joseph Carter. Not unexpected considering that this situation is pretty similar.

Hope it helps.
Bye,
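The first sequence of steps above could be sketched as follows, as something to run from the initramfs after the pivot. It is untested, like the idea itself; the function name unwind_raid_root, the RUN dry-run variable and the example paths (/oldroot, /dev/md0, /run/bitmap) are all made up for illustration.

```shell
# Set RUN=echo to print the commands instead of executing them.
RUN=${RUN:-}

# Unmount root, stop the array, then release the bitmap partition,
# stopping at the first step that fails.
unwind_raid_root() {
    root_mnt=$1
    array=$2
    bitmap_mnt=$3
    $RUN umount "$root_mnt" &&
    $RUN mdadm --stop "$array" &&
    $RUN umount "$bitmap_mnt"
}

# Dry run:
#   RUN=echo unwind_raid_root /oldroot /dev/md0 /run/bitmap
```

The order matters: the array cannot be stopped while root is mounted from it, and the bitmap partition stays busy until the array is stopped.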