Severity: important
Tags: upstream
Dear Maintainer,
On 2026-05-24, a routine `ps` invocation on Debian 13.5 (trixie-security
kernel 6.12.90-1) triggered a kernel oops in cap_capable() during a
/proc/<pid>/stat read. The faulting task died inside the kernel while
still holding a non-sleepable lock (`exited with irqs disabled` /
`preempt_count 1`); ~21s later another process (system redis-server)
walked the same /proc path, hit the orphaned spinlock, and spun for
~13 minutes in native_queued_spin_lock_slowpath while the
soft-lockup watchdog re-reported the same CPU. The machine became
unresponsive to new logins (systemd-logind: "Failed to start session
scope ... Connection timed out") and required a hard reboot.
efi_pstore could not preserve the crash record across reboot
(`pstore: backend (efi_pstore) writing error (-22)`), so the
journal copy below is the only record.
=== Faulting oops (CPU 2, PID 24864, comm: ps, UID 2000) ===
May 24 15:55:58 kernel: BUG: kernel NULL pointer dereference, address: 00000000000000c8
May 24 15:55:58 kernel: #PF: supervisor read access in kernel mode
May 24 15:55:58 kernel: #PF: error_code(0x0000) - not-present page
May 24 15:55:58 kernel: PGD 0 P4D 0
May 24 15:55:58 kernel: Oops: Oops: 0000 [#1] PREEMPT SMP PTI
May 24 15:55:58 kernel: CPU: 2 UID: 2000 PID: 24864 Comm: ps Not tainted 6.12.90+deb13-amd64 #1 Debian 6.12.90-1
May 24 15:55:58 kernel: Hardware name: ASUS All Series/Z97-A, BIOS 3503 04/18/2018
May 24 15:55:58 kernel: RIP: 0010:cap_capable+0x1c/0x80
May 24 15:55:58 kernel: RSP: 0018:ffffd1c8cca8f898 EFLAGS: 00010203
May 24 15:55:58 kernel: RAX: ffff8ed685c42840 RBX: ffff8ed684e85480 RCX: 0000000000000002
May 24 15:55:58 kernel: RDX: 0000000000000013 RSI: 0000000000000000 RDI: ffff8ed684e85480
May 24 15:55:58 kernel: RBP: 0000000000000000 R08: ffffffff9f64c6e0 R09: 0000000000000000
May 24 15:55:58 kernel: R10: ffffd1c8cca8fa90 R11: 0000000000001000 R12: 0000000000000013
May 24 15:55:58 kernel: R13: 0000000000000002 R14: ffff8ed685d5cc80 R15: 0000000000000000
May 24 15:55:58 kernel: CR2: 00000000000000c8 CR3: 000000010c0b2003 CR4: 00000000001706f0
May 24 15:55:58 kernel: Call Trace:
May 24 15:55:58 kernel: <TASK>
May 24 15:55:58 kernel: security_capable+0x58/0x180
May 24 15:55:58 kernel: ns_capable_noaudit+0x31/0x60
May 24 15:55:58 kernel: __ptrace_may_access+0x108/0x170
May 24 15:55:58 kernel: ptrace_may_access+0x2b/0x50
May 24 15:55:58 kernel: do_task_stat.isra.0+0xd1/0xeb0
May 24 15:55:58 kernel: proc_tgid_stat+0x14/0x20
May 24 15:55:58 kernel: proc_single_show+0x54/0xc0
May 24 15:55:58 kernel: seq_read_iter+0x11f/0x460
May 24 15:55:58 kernel: seq_read+0x12d/0x160
May 24 15:55:58 kernel: vfs_read+0xeb/0x360
May 24 15:55:58 kernel: ksys_read+0x6d/0xf0
May 24 15:55:58 kernel: do_syscall_64+0x87/0x1b0
May 24 15:55:58 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 24 15:55:58 kernel: </TASK>
May 24 15:55:58 kernel: CR2: 00000000000000c8
May 24 15:55:58 kernel: ---[ end trace 0000000000000000 ]---
May 24 15:55:58 kernel: pstore: backend (efi_pstore) writing error (-22)
May 24 15:55:58 kernel: note: ps[24864] exited with irqs disabled
May 24 15:55:58 kernel: note: ps[24864] exited with preempt_count 1
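For anyone cross-checking the two `#PF` lines in the oops above, the x86 page-fault error code can be decoded bit by bit. The following is a minimal sketch in Python; the bit meanings follow the x86 architecture definition, while the helper name `decode_pf_error_code` is hypothetical and not part of any kernel tooling.

```python
# Illustrative helper (hypothetical name): decode the low bits of an x86
# page-fault error code as printed in the "#PF: error_code(...)" line.
# Each entry is (bit mask, meaning if set, meaning if clear or None).
PF_BITS = [
    (0x01, "protection violation", "not-present page"),
    (0x02, "write access", "read access"),
    (0x04, "user access", "supervisor access"),
    (0x08, "reserved bit set", None),
    (0x10, "instruction fetch", None),
]


def decode_pf_error_code(code: int) -> list[str]:
    """Return a human-readable description for each relevant bit."""
    described = []
    for mask, if_set, if_clear in PF_BITS:
        if code & mask:
            described.append(if_set)
        elif if_clear is not None:
            described.append(if_clear)
    return described


# error_code(0x0000) is the value reported in the oops.
decoded = decode_pf_error_code(0x0000)
print(decoded)
```

With all bits clear, this gives a supervisor read of a not-present page, matching the text the kernel printed next to the error code.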
=== Follow-on soft lockup (CPU 3, PID 719, comm: redis-server) ===
May 24 15:56:46 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 45s! [redis-server:719]
May 24 15:56:46 kernel: CPU: 3 UID: 0 PID: 719 Comm: redis-server Tainted: G D 6.12.90+deb13-amd64 #1 Debian 6.12.90-1
May 24 15:56:46 kernel: Tainted: [D]=DIE
May 24 15:56:46 kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2a0
May 24 15:56:46 kernel: Call Trace:
May 24 15:56:46 kernel: <TASK>
May 24 15:56:46 kernel: _raw_spin_lock+0x29/0x30
May 24 15:56:46 kernel: ptrace_may_access+0x21/0x50
May 24 15:56:46 kernel: proc_pid_permission+0x9e/0xf0
May 24 15:56:46 kernel: inode_permission+0xdb/0x190
May 24 15:56:46 kernel: link_path_walk.part.0.constprop.0+0xad/0x390
May 24 15:56:46 kernel: path_openat+0x9b/0x12d0
May 24 15:56:46 kernel: do_filp_open+0xc4/0x170
May 24 15:56:46 kernel: do_sys_openat2+0xae/0xe0
May 24 15:56:46 kernel: __x64_sys_openat+0x55/0xa0
May 24 15:56:46 kernel: do_syscall_64+0x87/0x1b0
May 24 15:56:46 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 24 15:56:46 kernel: </TASK>
This soft-lockup message recurred 25 more times for the same CPU#3
[redis-server:719] (26 reports in total), with "stuck for" values of:
45s, 71s, 104s, 130s, 160s, 186s, 220s, 246s, 279s, 305s, 339s,
365s, 395s, 421s, 454s, 481s, 514s, 540s, 574s, 600s, 630s, 656s,
689s, 715s, 749s, 775s
The last journal entry was at 16:09:58 (775s stuck). The system was
hard-rebooted at ~16:10. The full window (oops + all repeats, ~2,300
lines) is attached as kernel-oops-2026-05-24.log.
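The "stuck for" values listed above can be tallied mechanically. The sketch below is only a convenience for checking the count and duration; the values are copied by hand from this report, not parsed from the attached log, and the variable names are illustrative.

```python
# "stuck for" values copied from the report above, in seconds.
stuck_seconds = [
    45, 71, 104, 130, 160, 186, 220, 246, 279, 305, 339,
    365, 395, 421, 454, 481, 514, 540, 574, 600, 630, 656,
    689, 715, 749, 775,
]

# Total number of watchdog reports, and how many followed the first one.
report_count = len(stuck_seconds)
repeats_after_first = report_count - 1

# Spacing between consecutive reports, in seconds.
gaps = [later - earlier for earlier, later in zip(stuck_seconds, stuck_seconds[1:])]

# Longest stall the watchdog reported, converted to minutes.
longest_minutes = round(stuck_seconds[-1] / 60, 1)

print(f"{report_count} reports ({repeats_after_first} after the first)")
print(f"gap between reports: {min(gaps)}-{max(gaps)} s")
print(f"longest reported stall: {longest_minutes} min")
```

The final value of 775s corresponds to the roughly 13 minutes mentioned in the summary.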
=== Reproducer ===
No deliberate reproducer is known. The faulting task was a normal `ps`
invocation as an unprivileged user (UID 2000). The system had been
up since 2026-05-23 16:54 with ordinary workloads (no out-of-tree
modules, no debugger attached). I have not attempted to reproduce
because of the system-wide impact.
=== Analysis ===
The call stack -- vfs_read on /proc/<pid>/stat -> seq_read ->
proc_tgid_stat -> do_task_stat -> ptrace_may_access ->
__ptrace_may_access -> ns_capable_noaudit -> security_capable ->
cap_capable+0x1c, dereferencing NULL at offset 0xc8 -- is identical
in shape (and exact offset) to the older Red Hat report at
https://access.redhat.com/solutions/6049691 (RHEL 7.9 / 3.10.0-1160),
which has long been suspected to be an exit-path race between exit_mm()
(clearing task->mm) and a concurrent /proc/<pid>/stat reader running
__ptrace_may_access.
CVE-2026-46333 ("ssh-keysign-pwn") addresses what appears to be the
same race in __ptrace_may_access:
https://www.openwall.com/lists/oss-security/2026/05/15/5
https://blog.qualys.com/vulnerabilities-threat-research/2026/05/20/cve-2026-46333-local-root-privilege-escalation-and-credential-disclosure-in-the-linux-kernel-ptrace-path
The Qualys advisory describes the fix as adding a `user_dumpable` bit
to task_struct so that __ptrace_may_access no longer needs to
dereference task->mm when checking dumpable. Per LKML/Qualys, the
stable backport landed in 6.12.89 -- which is included in this package
(6.12.90-1, dated 2026-05-22).
However, the crash occurred ON this fixed kernel. Possible
interpretations:
(a) The 6.12.89 fix closes the fd-disclosure path but a residual
NULL-deref window remains in __ptrace_may_access against an
exiting task with task->mm == NULL.
(b) The fix introduced a regression in how the new helper handles
tasks without an mm: the call goes through ns_capable_noaudit ->
security_capable -> cap_capable+0x1c, and a fault address of 0xc8
is consistent with reading a field at that offset from a NULL
struct pointer that the new code path still assumes is valid.
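One way to narrow down which pointer was NULL is to read the register dump against the x86-64 System V calling convention, where the first three integer arguments are passed in RDI, RSI and RDX. The sketch below does that for the values in the oops. It assumes that cap_capable() takes its arguments in the order (cred, targ_ns, cap, ...) as in mainline, and that the argument registers had not yet been overwritten 0x1c bytes into the function; neither assumption has been checked against the disassembly of this Debian build. The dictionary and variable names are illustrative only.

```python
# Register values copied from the oops above.
regs = {
    "RDI": 0xFFFF8ED684E85480,
    "RSI": 0x0000000000000000,
    "RDX": 0x0000000000000013,
    "CR2": 0x00000000000000C8,
}

# ASSUMPTION: argument order of cap_capable(), mapped onto the x86-64
# System V argument registers. Not verified against this kernel build.
ARG_REGS = {"cred": "RDI", "targ_ns": "RSI", "cap": "RDX"}

# Capability number as defined in include/uapi/linux/capability.h.
CAP_SYS_PTRACE = 19

# Map each assumed argument name to the value held in its register.
args = {name: regs[reg] for name, reg in ARG_REGS.items()}

# Which assumed arguments were NULL at the time of the fault?
null_args = [name for name, value in args.items() if value == 0]

# With a NULL base pointer, the faulting address equals the field offset.
field_offset = regs["CR2"]

print("NULL arguments:", null_args)
print("cap is CAP_SYS_PTRACE:", args["cap"] == CAP_SYS_PTRACE)
print("offset read from NULL base:", hex(field_offset))
```

If those assumptions hold, the NULL pointer would be the user-namespace argument rather than the cred, and 0xc8 would be the offset of the field read from it. Confirming this requires disassembling cap_capable() from the matching debug symbols.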
I have not bisected. Forwarding to upstream (stable and
security@kernel.org) may be warranted given the proximity to
CVE-2026-46333.
=== Severity rationale ===
Setting "important" rather than "grave": although the trigger is a
normal unprivileged `ps` and the failure mode is full system
unresponsiveness requiring a hard reboot, only one occurrence has been
observed (after ~23 hours of uptime), so it is not reliably
reproducible. Happy to raise the severity if it recurs or if a
reproducer is identified.
=== Mitigation in place ===
None. Same kernel package is still installed (older 6.12.88 and
6.12.86 are present as fallback grub entries). I am not running any
ptrace-using software intentionally. I will note here if this recurs.