- Package:
- btrfs-progs
- Source:
- btrfs-progs
- Description:
- Checksumming Copy on Write Filesystem utilities
- Submitter:
- Graham Cobb
- Date:
- 2025-08-13 19:01:03 UTC
- Severity:
- minor
The new checks at mount time cause mount times for large filesystems to be much longer. My roughly 10TB filesystem now takes over 90 seconds to mount. Unfortunately, this is longer than the default systemd mount timer and systemd assumes the mount has failed (and, in fact, cancels it). If the mount is not marked with "nofail" this causes boot to fail and to drop into the rescue console.
This new checking is a good thing, and neither the new checks, nor the systemd behaviour are bugs, nor are they the responsibility of btrfs-progs. It is likely that users of large btrfs filesystems will have btrfs-progs installed. So, I believe it would be extremely useful to add a NEWS item to the next btrfs-progs package release to warn btrfs users of this change and recommend that they consider making changes to /etc/fstab for any btrfs volumes mounted at boot time.
Alternatively, the NEWS item could be included in the kernel release, however that would target a much larger group of people than just btrfs users. A release notes entry for bullseye should also be considered.
Suggested NEWS text:
BTRFS filesystems are now checked for tree corruption at mount time. For filesystems of around 10TB or larger, this is likely to cause the mount time to exceed the default systemd mount timeout of 90 seconds. If the filesystem is mounted at boot time, and not marked as "nofail", this will cause the boot to fail.
The systemd mount timeout can be increased in /etc/fstab by setting the mount option x-systemd.mount-timeout= with a new timeout value in seconds. See systemd.mount(5) for full documentation on the systemd mount options that can be specified in /etc/fstab.
We see that, too, at the institute... any larger (few TB) filesystems in /etc/fstab make systemd cause the system to fail at boot... leaving it in a state with no remote rescue (ssh) being possible. Since such filesystems would mount just fine... I would rather say that this functionality is a severe bug. Cheers, Chris.
(Sorry for the delay -- I investigated the cause but somehow failed to post here. Recently someone on the mailing list reported it also, with the same conclusion.)
90 seconds for 10TB sounds like something is terribly wrong in your case. I currently have only one spinning rust array, of 3 disks 7TB, and it completes mount in around a second. But, there are folks with massive arrays with mount times of several minutes or tens of minutes, so the issue is real.
How can a mount be "cancelled"? The syscall provides no way to abort a mount attempt -- it either succeeds or fails, with no possible timeouts on userspace side. I'm not experienced at debugging systemd issues, but it appears to me that systemd has a separate thread doing the mount() call, reports timeout despite the call being still in progress, then upon successful mount _unmounts_ the filesystem again.
From: Christoph Anton Mitterer <calestyo@scientia.net>
} We see that, too, at the institute... any larger (few TB) filesystems
} in /etc/fstab make systemd cause the system to fail at boot... leaving
} it in a state with no remote rescue (ssh) being possible.
}
} Since such filesystems would mount just fine... I would rather say that
} this functionality is a severe bug.
They do mount successfully with a manual mount, and they do with initscripts as well. The timeout is done on systemd side, with the kernel doing as expected. Also, it can't be fixed in btrfs-progs, as no code in this package is run at mount time.
Meow!
Thanks for looking into it. I agree with your analysis but the issue is real. I think the long mount times tend to occur when there are large numbers of subvolumes and other large trees. Although the problem is not fixable in btrfs-progs, I think it needs a NEWS item in btrfs-progs - otherwise many people who use btrfs disks will hit big problems after their Debian upgrade with no idea what the problem is or how they can fix it. I think including the NEWS text I proposed in the btrfs-progs package would avoid problems for a significant percentage of existing Debian btrfs users.
Hi,
Graham Cobb <g+debian@cobb.uk.net> writes:
I'm curious how "aged" the fs is (largest generation from btrfs subvol list), how many subvolumes, if qgroups are used, raid56, are reflink copies of large trees regularly made, deduplication, compression, etc. I've updated the btrfs page on wiki.debian.org with your report and suggested workaround, and I'd like to provide more context to not scare people off btrfs. I.e., as Adam said, it's not normal for mount to take this long.
Christoph Anton Mitterer <calestyo@scientia.net> writes:
I'm also curious whether there are any factors here that are causing the failure. If this is truly a typical btrfs+systemd scaling problem then, yes, it's unconscionable to pretend that everything is ok...and it at least needs to be documented. Please feel free to contribute to the wiki.debian.org page, by the way!
Adam Borowski <kilobyte@angband.pl> writes:
Hm, yes, I agree this sounds odd, but Fedora 33 shipped with btrfs by default in Oct 2020, and has stayed with that default, so I wonder if this bug was fixed in systemd by then? If not, I wonder if we're "doing things differently", so to speak, and triggering a bug that they're not (initramfs vs dracut, etc).
Cheers, Nicholas
More recent kernels have definitely improved the situation. My largest
filesystem now takes about 60 seconds to mount at boot, although I have
not checked how long it takes if it was not cleanly unmounted.
This is an online backup disk (btrbk). It stores snapshots. Lots of
them. From many different systems. It is btrfs-over-lvm-over-luks. Data
is Single; Metadata is RAID1. There are two disks, each has an LV on a
VG on a LUKS volume.
# btrfs fi usage /mnt/snapshots/
Overall:
Device size: 18.00TiB
Device allocated: 15.22TiB
Device unallocated: 2.77TiB
Device missing: 0.00B
Device slack: 0.00B
Used: 15.10TiB
Free (estimated): 2.88TiB (min: 1.49TiB)
Free (statfs, df): 2.88TiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no
Data,single: Size:15.01TiB, Used:14.90TiB (99.29%)
/dev/mapper/cryptdata16tb2--vg-backupsnapshot 10.52TiB
/dev/mapper/cryptbackup16tb--vg-backupsnapshot 4.49TiB
Metadata,RAID1: Size:110.00GiB, Used:103.12GiB (93.74%)
/dev/mapper/cryptdata16tb2--vg-backupsnapshot 110.00GiB
/dev/mapper/cryptbackup16tb--vg-backupsnapshot 110.00GiB
System,RAID1: Size:32.00MiB, Used:1.80MiB (5.62%)
/dev/mapper/cryptdata16tb2--vg-backupsnapshot 32.00MiB
/dev/mapper/cryptbackup16tb--vg-backupsnapshot 32.00MiB
Unallocated:
/dev/mapper/cryptdata16tb2--vg-backupsnapshot 1.37TiB
/dev/mapper/cryptbackup16tb--vg-backupsnapshot 1.40TiB
Some answers to your questions:
I'm curious how "aged" the fs is, (largest generation from btrfs subvol
list) -- 1248641
how many subvolumes -- 1810
if qgroups are used -- no
raid56 -- no - single data, raid1 metadata
are reflink copies of large trees regularly made -- yes (btrbk)
deduplication -- no
compression, etc. -- some
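For anyone wanting to collect the same two figures (subvolume count and largest generation), here is a minimal Python sketch. It assumes the listing has the usual `ID <n> gen <n> top level <n> path <path>` shape printed by `btrfs subvolume list` without extra flags; the function name and the sample paths are made up for illustration and are not from this filesystem.

```python
import re

# Assumed line shape: "ID 256 gen 1248641 top level 5 path some/subvol"
_LINE = re.compile(r"^ID (\d+) gen (\d+) top level (\d+) path (.+)$")


def summarize_subvolumes(listing):
    """Return (number of subvolumes, largest generation or None)."""
    generations = []
    for line in listing.splitlines():
        match = _LINE.match(line.strip())
        if match:
            generations.append(int(match.group(2)))
    largest = max(generations) if generations else None
    return len(generations), largest


# Hypothetical sample input, only to show the expected format.
SAMPLE = """\
ID 256 gen 1248641 top level 5 path host1/root.20250801
ID 257 gen 1248630 top level 5 path host1/root.20250802
ID 300 gen 912 top level 5 path host2/home.20240101
"""

if __name__ == "__main__":
    count, largest = summarize_subvolumes(SAMPLE)
    print(f"subvolumes: {count}, largest generation: {largest}")
```

On a real system the listing would come from running `btrfs subvolume list <mountpoint>` as root and passing its output to the function.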
Note: there are two other btrfs filesystems on this system.
Same approach (btrfs-over-lvm-over-luks). Similar disks.
Both are around 6-8 TB. These mount much more quickly.
The main difference is they do not have anything like the number of
subvolumes, nor the continuous creation and deletion of subvolumes that
this backup disk has.
I think the way to frame this is not that it is a problem people are
likely to see, but that if they find themselves in that situation, the
answer is simple: add x-systemd.mount-timeout=180s (and consider adding
nofail).
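As a rough illustration of that workaround, the Python sketch below scans fstab-style text and reports btrfs entries that do not set x-systemd.mount-timeout=. The function name, the UUID placeholders and the mount points in the example are assumptions for illustration only; the option name and the 180s value are the ones discussed in this report.

```python
def btrfs_entries_without_timeout(fstab_text):
    """Return mount points of btrfs entries lacking x-systemd.mount-timeout=."""
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete fstab entry
        _spec, mount_point, fs_type, options = fields[:4]
        if fs_type != "btrfs":
            continue
        opts = options.split(",")
        if not any(o.startswith("x-systemd.mount-timeout=") for o in opts):
            missing.append(mount_point)
    return missing


# Hypothetical fstab content; UUIDs and paths are placeholders.
EXAMPLE_FSTAB = """\
# <file system>  <mount point>  <type>  <options>  <dump>  <pass>
UUID=aaaa  /  ext4  errors=remount-ro  0  1
UUID=bbbb  /mnt/snapshots  btrfs  defaults,nofail,x-systemd.mount-timeout=180s  0  0
UUID=cccc  /mnt/data  btrfs  defaults  0  0
"""

if __name__ == "__main__":
    for mount_point in btrfs_entries_without_timeout(EXAMPLE_FSTAB):
        print(f"{mount_point}: no x-systemd.mount-timeout= set")
```

The second entry in the example shows what a fixed line might look like; whether 180s is enough depends on the filesystem.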
I will try to take a look at the wiki in the next few days.
Graham Cobb <g+debian@cobb.uk.net> writes: I'm closing this bug, as a documentation issue, because the documentation on our wiki appears to have been sufficient--there has been no activity on this bug in two years. Cheers, Nicholas