#1135235 linux-image-6.19.13+deb14-amd64: Recurring host crash "Invalid SPTE change" with gaming win kvm/qemu guest and device passthrough

Package:
src:linux
Source:
src:linux
Submitter:
Maximilian Senftleben
Date:
2026-06-25 20:27:02 UTC
Severity:
normal
Tags:
#1135235#5
Date:
2026-04-29 19:17:14 UTC
From:
To:
Dear Maintainer,

- I have a Windows kvm/qemu guest that uses device passthrough for my GPU.
- Sometimes the host system crashes/freezes while I am playing. This only
happens under load/gaming, sometimes 1-2 times a day, sometimes not at all.


System:
Linux myhost 6.19.13+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.19.13-1
(2026-04-18) x86_64 GNU/Linux

CPU:
vendor_id       : GenuineIntel
cpu family      : 6
model           : 183
model name      : Intel(R) Core(TM) i5-14400

#1135235#10
Date:
2026-05-05 18:22:34 UTC
From:
To:
Control: tags -1 + moreinfo

From the log we see that the looking-glass application is involved. AFAICS
you do not use the related dkms module? (That would taint the kernel.)

Is this a recent regression? And if yes, since you can reproduce it,
can you bisect the changes between the last working kernel and
6.19.13?

Please also test 7.0.3 in unstable (as 6.19.y is EOL) to check whether
the issue is resolved there or still present.

Regards,
Salvatore

#1135235#17
Date:
2026-05-10 09:58:28 UTC
From:
To:
Sorry, I didn't play that much last week, and it took some days before
the issue occurred again.
Sometimes, as this time, the system remains somewhat usable, i.e. no
forced reboot is required.
AFAIK I do not use the looking-glass dkms module.
I have had crashes like this for quite some time, but didn't bother as I
thought they might be related to debian/testing instability and/or
problems in my setup. (FYI, I ran memtest several times without issues.)
I switched to the 7.0.4 kernel:
uname -a
 > Linux mspc2024debian 7.0.4+deb14-amd64 #1 SMP PREEMPT_DYNAMIC Debian
7.0.4-1 (2026-05-07) x86_64 GNU/Linux


Just for comparison, the latest crash journalctl log:


Regards
Maximilian

#1135235#24
Date:
2026-05-17 13:02:38 UTC
From:
To:
Hi Maximilian,

[I noticed your reply did not go to the bug report, so I'm including
the bug address again for this reply]

Right, if the module had been loaded then the kernel should have been
tainted.

But it is still odd, as the issues in the trace you posted happen
around the looking-glass application. I will ask upstream if they have
some input on how to debug that.

Regards,
Salvatore
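
For readers following along, the taint state discussed above can be read directly on the host. The following is a minimal sketch, not part of the original exchange: it reads /proc/sys/kernel/tainted and decodes a small subset of the bits listed in the kernel's tainted-kernels documentation. The helper names (decode_taint, read_taint) are made up for illustration, and the assumption that a dkms-built module would show up as the "O" (out-of-tree) flag is a general expectation, not something confirmed in this report.

```python
# Subset of taint bits from the kernel's "Tainted kernels" documentation.
# Only the flags relevant to module loading are listed here.
TAINT_BITS = {
    0: ("P", "proprietary module was loaded"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value):
    """Return the (flag, description) pairs set in a taint bitmask."""
    return [TAINT_BITS[bit] for bit in sorted(TAINT_BITS) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint bitmask of the running kernel."""
    with open(path) as handle:
        return int(handle.read().strip())


if __name__ == "__main__":
    try:
        mask = read_taint()
    except OSError as error:
        print(f"could not read taint state: {error}")
    else:
        flags = decode_taint(mask)
        print(f"taint mask: {mask}")
        for flag, description in flags:
            print(f"  {flag}: {description}")
        if not flags:
            print("  none of the module-related taint flags are set")
```

A mask of 0 means the kernel is not tainted at all; other bits not covered by this subset may still be set.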

#1135235#29
Date:
2026-05-17 13:24:27 UTC
From:
To:
Hi

Maximilian Senftleben reported the following in Debian (cf.
https://bugs.debian.org/1135235). It should be noted that while Maximilian
uses the looking-glass application (which is accompanied by dkms
modules), they are not loaded and do not taint the kernel. Do you have
an idea how to debug this?

Any ideas here?

Regards,
Salvatore

#1135235#34
Date:
2026-05-17 13:28:40 UTC
From:
To:
Hi Maximilian,

do you have any idea when this started occurring? Can you try an
earlier 6.19.x kernel?

Thanks,

Paolo

#1135235#41
Date:
2026-05-18 13:43:57 UTC
From:
To:
Odds are very good this is due to host memory corruption, and is not a bug in
KVM's MMU.  We (Google) had a period of time where our kernel was triggering stack
overflows if a networking IRQ hit at just the right/wrong time, and whenever the
overflow wandered into KVM page tables, it would result in failures like these.
I got quite familiar with the signature :-)

If you aren't already, can you try running with CONFIG_VMAP_STACK=y?  Stack
overflow doesn't seem likely in this case since the gfn would put the SPTE in the
middle of the page table, but it's easy enough to rule out.

The other thing to try would be to run with CONFIG_KASAN=y.  That might make your
gaming quite miserable, but if this is indeed due to a rogue write, it's the best
shot for catching the culprit.

Or as Paolo suggested, you could try bisecting.
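
As a side note for anyone reproducing this, one way to confirm whether a given kernel was built with the two options mentioned above is to inspect its config file. This is a minimal sketch under assumptions: it expects the Debian-style location /boot/config-<release> unless a path is passed as the first argument, and the helper names parse_kconfig and report are invented for illustration.

```python
import platform
import sys
from pathlib import Path

# The two options suggested in the message above.
WANTED = ("CONFIG_VMAP_STACK", "CONFIG_KASAN")


def parse_kconfig(text):
    """Return {option: value} for every option that is set in a kernel config."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if separator:
            options[name] = value
    return options


def report(options, wanted=WANTED):
    """Map each wanted option to its value, or "not set" if it is absent."""
    return {name: options.get(name, "not set") for name in wanted}


if __name__ == "__main__":
    # Assumption: Debian-style config path; pass another path as argv[1].
    if len(sys.argv) > 1:
        config_path = Path(sys.argv[1])
    else:
        config_path = Path("/boot") / f"config-{platform.release()}"
    if config_path.exists():
        for name, value in report(parse_kconfig(config_path.read_text())).items():
            print(f"{name}: {value}")
    else:
        print(f"no kernel config found at {config_path}")
```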

#1135235#48
Date:
2026-06-03 13:54:01 UTC
From:
To:
Hi,

sorry for the late reply, it took me a while to first build the kernel with
those options and then actually find time to play long enough.

If I did everything correctly, then I built 7.0.7 with
- CONFIG_VMAP_STACK=y
- CONFIG_KASAN=y

I did not get it to crash on that self-built kernel yet.
However, I booted 7.0.9+deb14-amd64 once, and after playing a while got a
crash again.

I will try using the self-built kernel next week to see if I can get it to
crash as well.

Or do I have to look somewhere else if KASAN is active?
Not sure if it could be something else, however I at least ran memtest
for ~12h without problems.


Regards

#1135235#53
Date:
2026-06-05 21:42:03 UTC
From:
To:
Hmm, can you try 7.0.9 with KASAN?  Or even just a 7.0.9 kernel that you built?
It's possible there's a bug somewhere between 7.0.7 and 7.0.9.

KASAN reports issues in dmesg.  But generally speaking, if the error is bad
enough to crash the kernel, you'll see a KASAN splat *and* a crash.
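
A minimal sketch of how a saved kernel log could be searched for such reports, added here for illustration only. It relies on KASAN reports opening with a "BUG: KASAN:" line; the function name find_kasan_reports and the idea of saving the log to a file first (for example from dmesg or journalctl -k) are assumptions, not something prescribed in this thread.

```python
import sys

# KASAN reports in the kernel log begin with this marker.
KASAN_MARKER = "BUG: KASAN:"


def find_kasan_reports(lines):
    """Return (line_number, text) for each line that opens a KASAN report."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if KASAN_MARKER in line
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python kasan_scan.py saved-kernel-log.txt
    with open(sys.argv[1], errors="replace") as handle:
        hits = find_kasan_reports(handle)
    for number, text in hits:
        print(f"{number}: {text}")
    if not hits:
        print("no KASAN reports found")
```

An empty result only means no report made it into that particular log; a hard freeze can still lose the final lines before they reach disk.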

#1135235#58
Date:
2026-06-24 20:05:10 UTC
From:
To:
I tried a 7.0.10 with KASAN for several days, and I have now been running
7.0.12+deb14.1-amd64 for a couple of days. So far I have not been able
to reproduce my issue, i.e. I have had no crash.

Regards

#1135235#63
Date:
2026-06-25 15:46:14 UTC
From:
To:
+lists to capture this for posterity

That, and the fact that 7.0.7 was fine, strongly suggests a broken fix got
backported and landed in 7.0.8 or 7.0.9, and then a fix-for-the-fix landed in
7.0.10.  There aren't any KVM commits of interest anywhere in that range, which
supports my theory that KVM is an innocent bystander that ran afoul of memory
corruption due to a bug elsewhere in the kernel.

Unless you want to bisect to figure out exactly what commit broke things, and
what commit fixed things, I think it makes sense to consider this resolved unless
the problem occurs on a 7.0.10+ kernel.
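
The narrowing in the message above can be written out as a small sketch, added for illustration. It takes the observations from this thread (7.0.7 good, 7.0.9 bad, 7.0.10 good) and lists which stable point releases could have introduced the breakage and which could carry the fix. The function names are hypothetical, and the sketch assumes plain three-part upstream versions from a single stable series.

```python
def parse_version(text):
    """Split an upstream version such as "7.0.10" into a tuple of integers."""
    parts = tuple(int(part) for part in text.split("."))
    if len(parts) != 3:
        raise ValueError(f"expected a three-part version, got {text!r}")
    return parts


def releases_between(after, upto):
    """List point releases newer than `after`, up to and including `upto`."""
    low, high = parse_version(after), parse_version(upto)
    if low[:2] != high[:2]:
        raise ValueError("expected two releases from the same stable series")
    return [f"{low[0]}.{low[1]}.{patch}" for patch in range(low[2] + 1, high[2] + 1)]


if __name__ == "__main__":
    # Observations taken from this bug report.
    last_good, seen_bad, good_again = "7.0.7", "7.0.9", "7.0.10"
    print("breakage introduced in one of:", releases_between(last_good, seen_bad))
    print("fix landed in one of:", releases_between(seen_bad, good_again))
```

With the observations above, the first list is 7.0.8 and 7.0.9 and the second is 7.0.10 only, which matches the reasoning in the message.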

#1135235#68
Date:
2026-06-25 20:24:36 UTC
From:
To:
Source: linux
Source-Version: 7.0.10-1

Ack, I'm marking this as fixed with the 7.0.10-based version in Debian then.

Regards,
Salvatore