#1137548 linux-image-6.12.90+deb13-amd64: kernel NULL deref in cap_capable() during /proc/<pid>/stat read causes system hang via orphaned spinlock

Package:
src:linux
Source:
src:linux
Submitter:
Zac Hester
Date:
2026-05-27 19:37:02 UTC
Severity:
normal
Tags:
#1137548#5
Date:
2026-05-25 01:36:52 UTC
From:
To:
  Severity: important
  Tags: upstream

  Dear Maintainer,

  On 2026-05-24, a routine `ps` invocation on Debian 13.5 (trixie-security
  kernel 6.12.90-1) triggered a kernel oops in cap_capable() during a
  /proc/<pid>/stat read. The faulting task died inside the kernel while
  still holding a non-sleepable lock (`exited with irqs disabled` /
  `preempt_count 1`). About 21s later another process (the system
  redis-server) walked the same /proc path, hit the orphaned spinlock,
  and spun for ~13 minutes in native_queued_spin_lock_slowpath while
  the soft-lockup watchdog kept re-reporting the same CPU. The machine
  became unresponsive to new logins (systemd-logind: "Failed to start
  session scope ... Connection timed out") and required a hard reboot.

  efi_pstore could not preserve the crash record across reboot
  (`pstore: backend (efi_pstore) writing error (-22)`), so the
  journal copy below is the only record.
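
  The -22 in the pstore message is a negated errno value. As a minimal
  sketch, assuming a Linux host whose userspace errno numbering matches
  the kernel's, it can be decoded as follows. The helper name
  describe_kernel_errno is hypothetical, and the result only names the
  error class; it does not explain why efi_pstore rejected the write.

```python
import errno
import os


def describe_kernel_errno(value: int) -> str:
    """Map a negative kernel return code such as -22 to its errno name."""
    code = abs(value)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


if __name__ == "__main__":
    # -22 is the value logged by "pstore: backend (efi_pstore) writing error".
    print(describe_kernel_errno(-22))
```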


  === Faulting oops (CPU 2, PID 24864, comm: ps, UID 2000) ===

  May 24 15:55:58 kernel: BUG: kernel NULL pointer dereference, address: 00000000000000c8
  May 24 15:55:58 kernel: #PF: supervisor read access in kernel mode
  May 24 15:55:58 kernel: #PF: error_code(0x0000) - not-present page
  May 24 15:55:58 kernel: PGD 0 P4D 0
  May 24 15:55:58 kernel: Oops: Oops: 0000 [#1] PREEMPT SMP PTI
  May 24 15:55:58 kernel: CPU: 2 UID: 2000 PID: 24864 Comm: ps Not tainted 6.12.90+deb13-amd64 #1  Debian 6.12.90-1
  May 24 15:55:58 kernel: Hardware name: ASUS All Series/Z97-A, BIOS 3503 04/18/2018
  May 24 15:55:58 kernel: RIP: 0010:cap_capable+0x1c/0x80
  May 24 15:55:58 kernel: RSP: 0018:ffffd1c8cca8f898 EFLAGS: 00010203
  May 24 15:55:58 kernel: RAX: ffff8ed685c42840 RBX: ffff8ed684e85480 RCX: 0000000000000002
  May 24 15:55:58 kernel: RDX: 0000000000000013 RSI: 0000000000000000 RDI: ffff8ed684e85480
  May 24 15:55:58 kernel: RBP: 0000000000000000 R08: ffffffff9f64c6e0 R09: 0000000000000000
  May 24 15:55:58 kernel: R10: ffffd1c8cca8fa90 R11: 0000000000001000 R12: 0000000000000013
  May 24 15:55:58 kernel: R13: 0000000000000002 R14: ffff8ed685d5cc80 R15: 0000000000000000
  May 24 15:55:58 kernel: CR2: 00000000000000c8 CR3: 000000010c0b2003 CR4: 00000000001706f0
  May 24 15:55:58 kernel: Call Trace:
  May 24 15:55:58 kernel:  <TASK>
  May 24 15:55:58 kernel:  security_capable+0x58/0x180
  May 24 15:55:58 kernel:  ns_capable_noaudit+0x31/0x60
  May 24 15:55:58 kernel:  __ptrace_may_access+0x108/0x170
  May 24 15:55:58 kernel:  ptrace_may_access+0x2b/0x50
  May 24 15:55:58 kernel:  do_task_stat.isra.0+0xd1/0xeb0
  May 24 15:55:58 kernel:  proc_tgid_stat+0x14/0x20
  May 24 15:55:58 kernel:  proc_single_show+0x54/0xc0
  May 24 15:55:58 kernel:  seq_read_iter+0x11f/0x460
  May 24 15:55:58 kernel:  seq_read+0x12d/0x160
  May 24 15:55:58 kernel:  vfs_read+0xeb/0x360
  May 24 15:55:58 kernel:  ksys_read+0x6d/0xf0
  May 24 15:55:58 kernel:  do_syscall_64+0x87/0x1b0
  May 24 15:55:58 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  May 24 15:55:58 kernel:  </TASK>
  May 24 15:55:58 kernel: CR2: 00000000000000c8
  May 24 15:55:58 kernel: ---[ end trace 0000000000000000 ]---
  May 24 15:55:58 kernel: pstore: backend (efi_pstore) writing error (-22)
  May 24 15:55:58 kernel: note: ps[24864] exited with irqs disabled
  May 24 15:55:58 kernel: note: ps[24864] exited with preempt_count 1
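
  The two "#PF:" lines above are the kernel's own decoding of the x86
  page-fault error code. As a minimal sketch of how that decoding works
  (function names are hypothetical; only the low five architectural
  bits are handled), the following maps error_code 0x0000 to
  "supervisor read of a not-present page" and checks that CR2 falls
  inside the first page, which is what a field access through a NULL
  base pointer would look like.

```python
def decode_pf_error_code(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code."""
    return {
        # Bit 0 (P): 0 = page not present, 1 = protection violation.
        "cause": "protection violation" if code & 0x1 else "not-present page",
        # Bit 1 (W/R): 1 = write access, 0 = read access.
        "access": "write" if code & 0x2 else "read",
        # Bit 2 (U/S): 1 = fault raised in user mode, 0 = supervisor mode.
        "mode": "user" if code & 0x4 else "supervisor",
        # Bit 3 (RSVD): reserved bit set in a paging-structure entry.
        "reserved_bit_set": bool(code & 0x8),
        # Bit 4 (I/D): fault was caused by an instruction fetch.
        "instruction_fetch": bool(code & 0x10),
    }


def looks_like_null_member_deref(cr2: int, page_size: int = 4096) -> bool:
    """A fault address inside page zero suggests NULL plus a field offset."""
    return 0 <= cr2 < page_size


if __name__ == "__main__":
    print(decode_pf_error_code(0x0000))
    print(looks_like_null_member_deref(0xC8))
```

  This says nothing about which pointer was NULL; it only confirms that
  the fault address 0xc8 is a small offset from address zero.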


  === Follow-on soft lockup (CPU 3, PID 719, comm: redis-server) ===

  May 24 15:56:46 kernel: watchdog: BUG: soft lockup - CPU#3 stuck for 45s! [redis-server:719]
  May 24 15:56:46 kernel: CPU: 3 UID: 0 PID: 719 Comm: redis-server Tainted: G      D            6.12.90+deb13-amd64 #1  Debian 6.12.90-1
  May 24 15:56:46 kernel: Tainted: [D]=DIE
  May 24 15:56:46 kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x6e/0x2a0
  May 24 15:56:46 kernel: Call Trace:
  May 24 15:56:46 kernel:  <TASK>
  May 24 15:56:46 kernel:  _raw_spin_lock+0x29/0x30
  May 24 15:56:46 kernel:  ptrace_may_access+0x21/0x50
  May 24 15:56:46 kernel:  proc_pid_permission+0x9e/0xf0
  May 24 15:56:46 kernel:  inode_permission+0xdb/0x190
  May 24 15:56:46 kernel:  link_path_walk.part.0.constprop.0+0xad/0x390
  May 24 15:56:46 kernel:  path_openat+0x9b/0x12d0
  May 24 15:56:46 kernel:  do_filp_open+0xc4/0x170
  May 24 15:56:46 kernel:  do_sys_openat2+0xae/0xe0
  May 24 15:56:46 kernel:  __x64_sys_openat+0x55/0xa0
  May 24 15:56:46 kernel:  do_syscall_64+0x87/0x1b0
  May 24 15:56:46 kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  May 24 15:56:46 kernel:  </TASK>

  This soft-lockup message recurred 25 more times for the same CPU#3
  [redis-server:719] (26 reports in total, counting the values listed
  below), climbing through "stuck for" reports of:

    45s, 71s, 104s, 130s, 160s, 186s, 220s, 246s, 279s, 305s, 339s,
    365s, 395s, 421s, 454s, 481s, 514s, 540s, 574s, 600s, 630s, 656s,
    689s, 715s, 749s, 775s

  Last journal entry was 16:09:58 (775s stuck). System was hard-rebooted
  at ~16:10. The full window (oops + all repeats, ~2,300 lines) is
  attached as kernel-oops-2026-05-24.log.
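
  The "stuck for" values listed above can be checked with simple
  arithmetic. This sketch (variable names are illustrative) counts the
  reports, computes the gaps between consecutive reports, and converts
  the final value to minutes, which is where the "~13 minutes" figure
  comes from.

```python
# "stuck for" values in seconds, copied from the list in this report.
stuck_for = [
    45, 71, 104, 130, 160, 186, 220, 246, 279, 305, 339,
    365, 395, 421, 454, 481, 514, 540, 574, 600, 630, 656,
    689, 715, 749, 775,
]

# Number of watchdog reports in the list.
report_count = len(stuck_for)

# Seconds between each report and the next one.
gaps = [later - earlier for earlier, later in zip(stuck_for, stuck_for[1:])]

# Duration of the lockup as of the last report, in minutes.
final_minutes = stuck_for[-1] / 60

if __name__ == "__main__":
    print(f"reports: {report_count}")
    print(f"gap range: {min(gaps)}s to {max(gaps)}s")
    print(f"final stuck time: {final_minutes:.1f} min")
```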


  === Reproducer ===

  No deliberate reproducer is known. The faulting task was a normal `ps`
  invocation as an unprivileged user (UID 2000). The system had been
  up since 2026-05-23 16:54 with ordinary workloads (no out-of-tree
  modules, no debugger attached). I have not attempted to reproduce
  because of the system-wide impact.
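
  For anyone attempting a reproducer, one possible starting point is to
  mimic the read pattern of a process lister: repeatedly read
  /proc/<pid>/stat for every numeric entry while other processes are
  being created and exiting. The sketch below is untested against this
  bug and is NOT known to trigger it; the function name scan_proc_stat
  and the round count are hypothetical. If it did trigger the oops the
  machine would hang, so it should only be run in a disposable VM.

```python
import os


def scan_proc_stat(rounds: int = 1, proc_root: str = "/proc") -> dict:
    """Read <proc_root>/<pid>/stat for every numeric entry, `rounds` times."""
    attempts = ok = vanished = 0
    for _ in range(rounds):
        try:
            entries = os.listdir(proc_root)
        except OSError:
            break  # no procfs available at proc_root
        for name in entries:
            if not name.isdigit():
                continue  # skip non-PID entries such as "meminfo"
            attempts += 1
            try:
                with open(os.path.join(proc_root, name, "stat"), "rb") as f:
                    f.read()
                ok += 1
            except OSError:
                # The task exited between listdir() and open()/read().
                vanished += 1
    return {"attempts": attempts, "ok": ok, "vanished": vanished}


if __name__ == "__main__":
    print(scan_proc_stat(rounds=10))
```

  On its own this only generates the reader side. Whether it ever hits
  the failing window would depend on concurrent process churn, ideally
  from tasks owned by other users, which is an assumption here rather
  than something observed in this report.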


  === Analysis ===

  The call stack -- vfs_read on /proc/<pid>/stat -> seq_read ->
  proc_tgid_stat -> do_task_stat -> ptrace_may_access ->
  __ptrace_may_access -> ns_capable_noaudit -> security_capable ->
  cap_capable+0x1c, dereferencing NULL at offset 0xc8 -- is identical
  in shape, and in exact offset, to the older Red Hat report at
  https://access.redhat.com/solutions/6049691 (RHEL 7.9 / 3.10.0-1160).
  That report has long been suspected to be an exit-path race between
  exit_mm() (clearing task->mm) and a concurrent /proc/<pid>/stat
  reader running __ptrace_may_access.
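
  To compare call stacks between reports mechanically, the
  symbol+offset/size frames can be pulled out of the journal lines. A
  minimal sketch follows; the regular expression and the function name
  extract_frames are illustrative and assume the `symbol+0xOFF/0xSIZE`
  layout seen in the trace in this report.

```python
import re

# Matches e.g. "cap_capable+0x1c/0x80" or "do_task_stat.isra.0+0xd1/0xeb0".
FRAME_RE = re.compile(r"([A-Za-z_][\w.]*)\+0x([0-9a-f]+)/0x([0-9a-f]+)")


def extract_frames(lines):
    """Return (symbol, offset, size) tuples for every frame found in lines."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            symbol, offset, size = match.groups()
            frames.append((symbol, int(offset, 16), int(size, 16)))
    return frames


if __name__ == "__main__":
    sample = [
        "May 24 15:55:58 kernel: RIP: 0010:cap_capable+0x1c/0x80",
        "May 24 15:55:58 kernel:  security_capable+0x58/0x180",
        "May 24 15:55:58 kernel:  ns_capable_noaudit+0x31/0x60",
        "May 24 15:55:58 kernel:  __ptrace_may_access+0x108/0x170",
        "May 24 15:55:58 kernel:  do_task_stat.isra.0+0xd1/0xeb0",
    ]
    for symbol, offset, size in extract_frames(sample):
        print(f"{symbol}+{offset:#x}/{size:#x}")
```

  The resulting `symbol+0xOFF/0xSIZE` strings are in the form accepted
  by scripts/faddr2line from the kernel source tree, which could map
  them to source lines if a vmlinux with matching debug info is
  available. That step has not been done for this report.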

  CVE-2026-46333 ("ssh-keysign-pwn") addresses what appears to be the
  same race in __ptrace_may_access:
  https://www.openwall.com/lists/oss-security/2026/05/15/5
  https://blog.qualys.com/vulnerabilities-threat-research/2026/05/20/cve-2026-46333-local-root-privilege-escalation-and-credential-disclosure-in-the-linux-kernel-ptrace-path

  The Qualys advisory describes the fix as adding a `user_dumpable` bit
  to task_struct so that __ptrace_may_access no longer needs to
  dereference task->mm when checking dumpable. Per LKML/Qualys, the
  stable backport landed in 6.12.89 -- which is included in this package
  (6.12.90-1, dated 2026-05-22).
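
  The version comparison above can be expressed as a small check. This
  is a sketch only: it takes the 6.12.89 fix version from this report
  at face value, compares plain upstream version triplets, and is
  meaningful only within the 6.12.y stable series. The function names
  and the 6.12.88 release string used in the example are assumptions
  for illustration.

```python
import re

# Stable release said (in this report) to carry the backported fix.
FIX_VERSION = (6, 12, 89)


def kernel_triplet(release: str) -> tuple:
    """Extract (major, minor, patch) from a string like '6.12.90+deb13-amd64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def includes_fix(release: str) -> bool:
    """True if the release is at or after FIX_VERSION (6.12.y series only)."""
    return kernel_triplet(release) >= FIX_VERSION


if __name__ == "__main__":
    for release in ("6.12.90+deb13-amd64", "6.12.88+deb13-amd64"):
        print(release, includes_fix(release))
```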

  However, the crash occurred ON this fixed kernel. Possible
  interpretations:

    (a) The 6.12.89 fix closes the fd-disclosure path but a residual
        NULL-deref window remains in __ptrace_may_access against an
        exiting task with task->mm == NULL.

    (b) A regression in the new helper handling tasks without mm --
        the call goes through ns_capable_noaudit -> security_capable
        -> cap_capable+0x1c, and 0xc8 is consistent with a deref into
        a field of a struct that the new code path may still assume
        exists.

  I have not bisected. Forwarding to upstream (stable and
  security@kernel.org) may be warranted given the proximity to
  CVE-2026-46333.


  === Severity rationale ===

  Setting "important" rather than "grave". The trigger is a normal
  unprivileged `ps` and the failure mode is full system
  unresponsiveness requiring a hard reboot, but only one occurrence
  has been observed in ~7 days of uptime, so it is not reliably
  reproducible. Happy to raise the severity if it recurs or if a
  reproducer is identified.


  === Mitigation in place ===

  None. Same kernel package is still installed (older 6.12.88 and
  6.12.86 are present as fallback grub entries). I am not running any
  ptrace-using software intentionally. I will note here if this recurs.

#1137548#16
Date:
2026-05-25 05:58:30 UTC
From:
To:
Control: tags -1 + moreinfo

Thanks for your report. Ideally we would need a way to reproduce the
issue here, so I am tagging this moreinfo for now.

Regards,
Salvatore