I'd like to cherry-pick a number of upstream stable bug fixes to
trixie, targeting data consistency issues, crashes, panics and
deadlocks. Since this is the first proposed update of the trixie
version, the debdiff is quite large (~250KB). I'll go through the
patches below to make them easier to review. Thank you for your patience.
This update adds 35 cherry-picks from the upstream zfs-2.3-release
branch, on top of the 6 cherry-picks already shipped in 2.3.2-2.
Roughly half are under 20 lines each. The largest individual patches
are 0030/0031/0034 (each 100-235 lines, where most of the bulk is tests).
The full zfs-2.3.2..zfs-2.3.7 range contains ~350 commits. Only small
ones that fix real issues are considered; some test hunks are dropped
to avoid dependencies on commits that are not strictly required.
Best regards,
Aron
-----------------------------
It might be easier to review through salsa's web interface:
https://salsa.debian.org/zfsonlinux-team/zfs/-/commit/059671d2629716520c47c7c7df2f349945e42713
Notation: each patch line shows the diffstat as [Nf +X/-Y], meaning
N files changed, X insertions, Y deletions (-Y is omitted when there
are no deletions).
Patches by category:
Data corruption / on-disk consistency:
0013 Fix off-by-one bug in range tree code [1f +1/-1]
Range tree could report false overlaps. One-line fix. (b9324a1e7)
0023 BRT: Fix ranges to blocks conversion math [1f +1/-1]
Missing parentheses caused memory corruption on vdevs >64TB
when block cloning was used. One-line fix. (19b9d9397)
0028 draid: fix data corruption after disk clear [8f +74/-16]
Clearing a faulted disk and then detaching its spare could
corrupt other still-attached dRAID spares; observable as scrub
cksum errors / data loss in multi-spare scenarios.
(b8cc4c504; 4 of the 8 files are tests/)
0029 draid: fix import failure after disks repl. [1f +4/-2]
ASIZE-rounding issue meant replacing dRAID disks with a
slightly smaller disk could prevent pool import. (b5d344cf5)
0030 draid: allow seq resilver reads from degraded [6f +162/-35]
Previous check was too strict and could skip valid replicas
during sequential resilver, leading to reconstruction
failures. (ed932ff54; 4 of the 6 files are tests/, including
a new redundancy_draid_degraded1.ksh)
0031 draid: fix cksum errors w/ degraded disks [8f +235/-19]
With more than nparity disks faulted, only the first nparity
were marked faulted; spare rebuilds for the others did not
track properly and later scrubs saw cksum errors.
(6741f501e; 3 of the 8 files are tests/, including a new
redundancy_draid_degraded2.ksh)
0034 Fix read corruption after block clone+truncate [7f +160/-1]
copy_file_range overwriting a recent truncation could cause
subsequent reads to return holes instead of the cloned data.
Triggered under heavy I/O (e.g. compilation workloads).
(dceca0d4a; the dbuf.c fix itself is 6/+2/-1; the bulk is a
new clone_after_trunc.c test binary + ksh test.)
0035 Prevent range tree corruption race (dnode_sync) [4f +87/-45]
Race in zfs_range_tree_walk caused stale reads / range-tree
inconsistency in sync context. (84fbeba11)
0038 Fix redundant declaration of dsl_pool_t [1f +5/-6]
Small cleanup. Pulled in solely to make 0039 build: 0039
uses 'dp' which upstream had moved to the top of
vdev_rebuild_thread in this commit. (dbf4e74e5)
0039 Fix rare cksum errors after rebuild [2f +10/-1]
A race in vdev_rebuild_thread re-enabled the metaslab before
the txg with the rebuilt ranges was synced, allowing new
allocations to interfere. Adds a txg sync wait. (ffdedd441)
0040 Initialize vr_last_txg for rebuild [1f +4/-1]
Companion to 0039: avoid spurious txg_wait_synced on empty
first metaslab. (c2673ffb7)
0041 Fix vdev_rebuild_range() tx commit [1f +3/-1]
Ordering bug: child zio could be added after txg_sync had
waited. (b652eb69e)
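For reviewers who want a feel for the bug class behind 0023, here is a
minimal sketch of how a missing pair of parentheses changes integer
conversion math. It is an illustration only, not the upstream diff:
the ex_* names and both constants are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sizes, chosen only to make the arithmetic visible. */
#define EX_BLOCKSIZE 32768ULL /* bytes per block */
#define EX_ENTRYSIZE 2ULL     /* bytes per entry */

/* Without parentheses this evaluates as (n / BLOCKSIZE) / ENTRYSIZE. */
static uint64_t
ex_nblocks_bad(uint64_t nentries)
{
	assert(nentries > 0);
	return ((nentries - 1) / EX_BLOCKSIZE / EX_ENTRYSIZE + 1);
}

/* Intended math: n / (BLOCKSIZE / ENTRYSIZE), i.e. entries per block. */
static uint64_t
ex_nblocks_good(uint64_t nentries)
{
	assert(nentries > 0);
	return ((nentries - 1) / (EX_BLOCKSIZE / EX_ENTRYSIZE) + 1);
}
```

With these made-up constants both versions agree for small inputs and
only diverge once the input is large enough, which is consistent with
a bug that shows up only on very large vdevs.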
Crash / UAF / panic:
0012 Fix null deref in spa_vdev_remove_cancel_sync [1f +3/-4]
ms_sm may be NULL; don't dereference it. (64e77fdf3)
0022 Synchronize the update of feature refcount [2f +9]
Concurrent feature_sync() could panic from an unprotected
refcount update. (8e7a31086)
0024-0026 HIGHMEM kmap API violation trio
Three related fixes: ZFS assumed multiple pages can be
kmap'd at once and ignored required LIFO ordering. Crashes
and possible memory corruption on 32-bit HIGHMEM systems
and on x86_64 under PaX KERNSEAL.
(0dcb88203 [1f +15/-2], 445879656 [1f +8/-8],
4f77b3013 [1f +6/-4])
0036 dmu_direct: avoid UAF in dmu_write_direct_done [1f +7/-1]
Direct I/O error path dereferenced freed dsa->dsa_tx.
Save in local var before freeing dsa. (a188a58d5)
0037 Fix 'kernel BUG at mm/usercopy.c' [1f +10/-3]
zfs_uiomove() returned wrong errno on short copy, causing
panic when a cgroup-OOM-killed process was doing ZFS I/O.
(748d0525e)
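The shape of the 0036 fix (save the pointer in a local variable before
freeing the containing structure) can be sketched as follows. This is
a simplified stand-in, not the dmu_direct code: all ex_* types and
functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the transaction and callback argument. */
typedef struct ex_tx { int aborted; } ex_tx_t;
typedef struct ex_arg { ex_tx_t *tx; } ex_arg_t;

static void
ex_tx_abort(ex_tx_t *tx)
{
	tx->aborted = 1;
}

/* Completion callback: owns and frees 'arg', but not the tx itself. */
static void
ex_done(ex_arg_t *arg, int error)
{
	assert(arg != NULL);
	ex_tx_t *tx = arg->tx;	/* copy the pointer before arg is freed */

	free(arg);
	if (error != 0)
		ex_tx_abort(tx);	/* uses the local copy, not arg->tx */
}
```

Reading arg->tx after the free() would be the use-after-free; the
local copy keeps the error path valid without reordering the free.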
Deadlocks / leaks / NULL handling:
0011 dmu_objset_hold_flags rele on error [1f +1/-1]
Reference leak on error path. (25ad9ce69)
0014 linux/zvol_os: don't try disk ops on alloc fail [1f +4/-2]
NULL deref of zvo_disk on gendisk alloc failure. (04493ca81)
0019 Skip dbuf_evict_one for reclaim thread [6f +46/-1]
Deadlock when kswapd entered dbuf eviction and tried to take
a dbuf hash lock already held. (c405a7a35)
0027 Fix deadlock on dmu_tx_assign from vdev_rebuild [3f +6/-7]
vdev_rebuild held spa_config_lock as writer while waiting
for txg, but txg_sync also wanted spa_config_lock; rebuild
could hang indefinitely. (a97fba427)
0032 fix memleak in spa_errlog.c [1f +1/-1]
(8e21c8856)
0033 Fix s_active leak in zfsvfs_hold [1f +1]
Permanently leaked the VFS superblock s_active ref, leaving
the pool unexportable (EBUSY) until reboot. (a9358748c)
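Several entries above (0011, 0033) are reference leaks on an error
path. A minimal sketch of the pattern, assuming a simple counted
reference; the ex_* names are hypothetical and this is not the ZFS
code:

```c
#include <assert.h>

/* Hypothetical object with a plain reference counter. */
typedef struct ex_obj { int refs; } ex_obj_t;

static void
ex_hold(ex_obj_t *o)
{
	o->refs++;
}

static void
ex_rele(ex_obj_t *o)
{
	assert(o->refs > 0);
	o->refs--;
}

/*
 * Returns 0 with a hold taken on success. On failure it returns the
 * error code and must leave no hold behind.
 */
static int
ex_hold_checked(ex_obj_t *o, int setup_error)
{
	ex_hold(o);
	if (setup_error != 0) {
		ex_rele(o);	/* the release a leaky error path forgets */
		return (setup_error);
	}
	return (0);
}
```

Without the release in the error branch, every failed call would leave
the counter one higher, and the object could never be torn down.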
Other correctness:
0007 Fix double spares for failed vdev [4f +209/-4]
ZED could attach two spares to one failed vdev when the
replacement disk also failed during resilver. The 209-line
figure is mostly a new auto_spare_double.ksh test (~160
lines); the spa.c fix itself is ~45 lines. (4b014840e)
0008 Fix race resilver wait vs offline/detach [1f +8/-5]
scn_state was cleared before vdev_dtl_reassess, so a
follow-up offline/detach could fail with "no valid
replicas". (101edf7ed)
0009 spa: clear checkpoint info during retry [1f +1]
Cherry-pick from main. (1c11d3a54)
0010 icp: explicit_memset() in gcm_clear_ctx [1f +2/-2]
Compiler may elide a plain memset of sensitive crypto
state before free; harden by always using explicit_memset.
(a4de1d38d)
0015 vdev: skip faulting disks pending removal [1f +4/-1]
Race where vdev_remove_wanted was set after probe init,
causing a redundant fault+removal. (f292b0f14)
0016 Set spa_final_txg in spa_unload [1f +6]
Triggered an assertion about ms_defer tree on reboot/
shutdown after dedup workloads. (bf4baee81)
0017 zfs_log_write: callback on last itx only [1f +5/-2]
Write callback fired once per itx for split writes, making
cleanup hard. (9c0f5bc18)
0018 ZED: Fix device type detection and pool iter [1f +36/-31]
Hotplug events on partitioned spare devices were
misidentified as l2arc. (0c928f7a3)
0020 zvol: Fix blk-mq sync [3f +61/-20]
zvol blk-mq path sent FLUSH and TRIM down the read code
path instead of write, so sync writes were not actually
sync. (0bb5950e7; 2 of the 3 files are test updates)
0021 Fix two infinite loops if dmu_prefetch_max=0 [1f +4/-2]
User-tunable foot-gun. (81ceee0cf)
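On 0010: the concern is that a plain memset() of a buffer that is
about to be freed is a dead store the compiler may remove. One common
portable way to prevent that is to call memset through a volatile
function pointer, sketched below. This is an assumption-laden
illustration, not the ICP implementation of explicit_memset; the ex_*
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * The volatile qualifier forces the pointer to be loaded at run time,
 * so the compiler cannot prove the call is a removable dead store.
 */
static void *(*const volatile ex_memset_ptr)(void *, int, size_t) = memset;

/* Overwrite 'len' bytes of 'buf' with 'c' in a way that is not elided. */
static void
ex_explicit_memset(void *buf, int c, size_t len)
{
	assert(buf != NULL);
	(void) ex_memset_ptr(buf, c, len);
}
```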