#1054514 linux-image-6.1.0-13-amd64: Debian VM with qxl graphics freezes frequently

Package:
src:linux
Source:
src:linux
Submitter:
Timo Lindfors
Date:
2025-12-10 12:39:03 UTC
Severity:
normal
Tags:
#1054514#5
Date:
2023-10-24 20:14:32 UTC
From:
To:
Steps to reproduce:
1) Install Debian 12 as a virtual machine using virt-manager, choose qxl
    graphics card. You only need basic installation without wayland or X.
2) Log in from the console and save the following to reproduce.bash:

#!/bin/bash

chvt 3
for j in $(seq 80); do
     echo "$(date) starting round $j"
     if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
         echo "bug was reproduced after $j tries"
         exit 1
     fi
     for i in $(seq 100); do
         dmesg > /dev/tty3
     done
done

echo "bug could not be reproduced"
exit 0


3) Run chmod a+x reproduce.bash
4) Run ./reproduce.bash and wait for up to 20 minutes.

Expected results:
4) The system prints a steady flow of text without kernel error messages

Actual messages:
4) At some point the text stops flowing and the script prints "bug was
    reproduced". If you run "journalctl --boot" you see

kernel: [TTM] Buffer eviction failed
kernel: qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO



More info:
1) The bug does not occur if I downgrade the kernel to
    linux-image-5.10.0-26-amd64_5.10.197-1_amd64.deb from Debian 11.
2) I used the following test_linux.bash to bisect this issue against
    upstream source:

#!/bin/bash
set -x

gitversion="$(git describe HEAD|sed 's@^v@@')"

git checkout drivers/gpu/drm/ttm/ttm_bo.c include/drm/ttm/ttm_bo_api.h
git show bec771b5e0901f4b0bc861bcb58056de5151ae3a | patch -p1
# Build
cp ~/kernel.config .config
# cp /boot/config-$(uname -r) .config
# scripts/config --enable LOCALVERSION_AUTO
# scripts/config --disable DEBUG_INFO
# scripts/config --disable SYSTEM_TRUSTED_KEYRING
# scripts/config --set-str SYSTEM_TRUSTED_KEYS ''
# scripts/config --disable STACKPROTECTOR_STRONG
make olddefconfig
# make localmodconfig
make -j$(nproc --all) bindeb-pkg
rc="$?"
if [ "$rc" != "0" ]; then
     exit 125
fi
git checkout drivers/gpu/drm/ttm/ttm_bo.c include/drm/ttm/ttm_bo_api.h

package="$(ls --sort=time ../linux-image-*_amd64.deb|head -n1)"
version=$(echo $package | cut -d_ -f1|cut -d- -f3-)

if [ "$gitversion" != "$version" ]; then
     echo "Expected version $gitversion but build produced $version, ignoring"
     #exit 255
fi

# Deploy
scp $package target:a.deb
ssh target sudo apt install ./a.deb
ssh target rm -f a.deb
ssh target ./grub_set_default_version.bash $version
ssh target sudo shutdown -r now
sleep 40

detected_version=$(ssh target uname -r)
if [ "$detected_version" != "$version" ]; then
     echo "Booted to $detected_version but expected $version"
     exit 255
fi

# Test
exec ssh target sudo ./reproduce.bash
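
The exit codes used in test_linux.bash follow the convention of `git bisect run`: 0 marks the commit as good, 125 skips it, any other code from 1 to 127 marks it as bad, and anything else aborts the bisection. A minimal sketch of that mapping follows; `classify_exit` is a hypothetical helper name used only for illustration and is not part of the original script:

```bash
#!/bin/bash
# Hypothetical helper: translate an exit status into the action
# that "git bisect run" takes for it.
classify_exit() {
    local rc="$1"
    if [ "$rc" -eq 0 ]; then
        echo good
    elif [ "$rc" -eq 125 ]; then
        echo skip       # commit cannot be tested (e.g. build failure)
    elif [ "$rc" -ge 1 ] && [ "$rc" -le 127 ]; then
        echo bad
    else
        echo abort      # e.g. 255: stop the whole bisection
    fi
}

for rc in 0 1 125 255; do
    echo "exit $rc -> $(classify_exit "$rc")"
done
```

In the script above, a failed build therefore skips the commit (125), booting the wrong kernel aborts the run (255), and the exit status of reproduce.bash decides between good (0) and bad (1).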


Bisect printed the following log:

git bisect start
# bad: [ed29c2691188cf7ea2a46d40b891836c2bd1a4f5] drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.
git bisect bad ed29c2691188cf7ea2a46d40b891836c2bd1a4f5
# bad: [762949bb1da78941b25e63f7e952af037eee15a9] drm: fix drm_mode_create_blob comment
git bisect bad 762949bb1da78941b25e63f7e952af037eee15a9
# bad: [e40f97ef12772f8eb04b6a155baa1e0e2e8f3ecc] drm/gma500: Drop DRM_GMA600 config option
git bisect bad e40f97ef12772f8eb04b6a155baa1e0e2e8f3ecc
# bad: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
git bisect bad 5a838e5d5825c85556011478abde708251cc0776
# bad: [d2b6f8a179194de0ffc4886ffc2c4358d86047b8] Merge tag 'xfs-5.13-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
git bisect bad d2b6f8a179194de0ffc4886ffc2c4358d86047b8
# bad: [68a32ba14177d4a21c4a9a941cf1d7aea86d436f] Merge tag 'drm-next-2021-04-28' of git://anongit.freedesktop.org/drm/drm
git bisect bad 68a32ba14177d4a21c4a9a941cf1d7aea86d436f
# bad: [0698b13403788a646073fcd9b2294f2dce0ce429] drm/amdgpu: skip PP_MP1_STATE_UNLOAD on aldebaran
git bisect bad 0698b13403788a646073fcd9b2294f2dce0ce429
# bad: [e1a5e6a8c48bf99ea374fb3e535661cfe226bca4] drm/doc: Add RFC section
git bisect bad e1a5e6a8c48bf99ea374fb3e535661cfe226bca4
# bad: [ed29c2691188cf7ea2a46d40b891836c2bd1a4f5] drm/i915: Fix userptr so we do not have to worry about obj->mm.lock, v7.
git bisect bad ed29c2691188cf7ea2a46d40b891836c2bd1a4f5
# bad: [2c8ab3339e398bbbcb0980933e266b93bedaae52] drm/i915: Pin timeline map after first timeline pin, v4.
git bisect bad 2c8ab3339e398bbbcb0980933e266b93bedaae52
# bad: [2eb8e1a69d9f8cc9c0a75e327f854957224ba421] drm/i915/gem: Drop relocation support on all new hardware (v6)
git bisect bad 2eb8e1a69d9f8cc9c0a75e327f854957224ba421
# bad: [b5b6f6a610127b17f20c0ca03dd27beee4ddc2b2] drm/i915/gem: Drop legacy execbuffer support (v2)
git bisect bad b5b6f6a610127b17f20c0ca03dd27beee4ddc2b2
# bad: [06debd6e1b28029e6e77c41e59a162868f377897] Merge tag 'drm-intel-next-2021-03-16' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
git bisect bad 06debd6e1b28029e6e77c41e59a162868f377897
# good: [e19eede54240d64b4baf9b0df4dfb8191f7ae48b] Merge branch 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
git bisect good e19eede54240d64b4baf9b0df4dfb8191f7ae48b
# good: [1e28eed17697bcf343c6743f0028cc3b5dd88bf0] Linux 5.12-rc3
git bisect good 1e28eed17697bcf343c6743f0028cc3b5dd88bf0
# bad: [6af70eb3b40edfc8bdf2373cdc2bcf9d5a20c8c7] drm/atmel-hlcdc: Rename custom plane state variable
git bisect bad 6af70eb3b40edfc8bdf2373cdc2bcf9d5a20c8c7
# good: [4ca77c513537700d3fae69030879f781dde1904c] drm/qxl: release shadow on shutdown
git bisect good 4ca77c513537700d3fae69030879f781dde1904c
# bad: [4a11bd1e88af130f50a72e0f54391c1c7d268e03] drm/ast: Add constants for VGACRCB register bits
git bisect bad 4a11bd1e88af130f50a72e0f54391c1c7d268e03
# bad: [5c209d8056b9763ce544ecd7dadb3782cdaf96ed] drm/gma500: psb_spank() doesn't need it's own file
git bisect bad 5c209d8056b9763ce544ecd7dadb3782cdaf96ed
# bad: [db0c6bd2c0c0dada8927cd46a7c34c316a3a6c04] drm/gem: Export drm_gem_vmap() and drm_gem_vunmap()
git bisect bad db0c6bd2c0c0dada8927cd46a7c34c316a3a6c04
# bad: [f4a84e165e6d58606097dd07b5b78767a94b870c] drm/qxl: allocate dumb buffers in ram
git bisect bad f4a84e165e6d58606097dd07b5b78767a94b870c
# good: [a7709b9b89a67f3ead2d188b1d0c261059b1f291] drm/qxl: handle shadow in primary destroy
git bisect good a7709b9b89a67f3ead2d188b1d0c261059b1f291
# bad: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait
git bisect bad 5a838e5d5825c85556011478abde708251cc0776
# good: [5f6c871fe919999774e8535ea611a6f84ee43ee4] drm/qxl: properly free qxl releases
git bisect good 5f6c871fe919999774e8535ea611a6f84ee43ee4
# first bad commit: [5a838e5d5825c85556011478abde708251cc0776] drm/qxl: simplify qxl_fence_wait

I took a look at

commit 5a838e5d5825c85556011478abde708251cc0776 (refs/bisect/bad)
Author: Gerd Hoffmann <kraxel@redhat.com>
Date:   Thu Feb 4 15:57:10 2021 +0100

     drm/qxl: simplify qxl_fence_wait

     Now that we have the new release_event wait queue we can just
     use that in qxl_fence_wait() and simplify the code a lot.

     Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
     Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
     Link: http://patchwork.freedesktop.org/patch/msgid/20210204145712.1531203-10-kraxel@redhat.com


and noticed that the bug does not occur if I boot 6.1 kernel with this
patch reverted (see attached file).

#1054514#12
Date:
2023-10-24 21:09:10 UTC
From:
To:
Hi Timo,

Thanks for the excellently constructed report! I think it's best to
forward this directly to upstream including the people for the
bisected commit to get some idea.

Can you reproduce the issue with 6.5.8-1 in unstable as well?

If not, are you able to isolate an upstream fix which should be
backported to the 6.1.y series as well?

Regards,
Salvatore

#1054514#17
Date:
2023-10-24 21:39:47 UTC
From:
To:
Hi,

Thanks for the quick reply!

Unfortunately yes:

ansible@target:~$ uname -r
6.5.0-3-amd64
ansible@target:~$ time sudo ./reproduce.bash
Wed 25 Oct 2023 12:27:00 AM EEST starting round 1
Wed 25 Oct 2023 12:27:24 AM EEST starting round 2
Wed 25 Oct 2023 12:27:48 AM EEST starting round 3
bug was reproduced after 3 tries

real    0m48.838s
user    0m1.115s
sys     0m45.530s

I also tested upstream tag v6.6-rc6:

...
+ detected_version=6.6.0-rc6
+ '[' 6.6.0-rc6 '!=' 6.6.0-rc6 ']'
+ exec ssh target sudo ./reproduce.bash
Wed 25 Oct 2023 12:37:16 AM EEST starting round 1
Wed 25 Oct 2023 12:37:42 AM EEST starting round 2
Wed 25 Oct 2023 12:38:10 AM EEST starting round 3
Wed 25 Oct 2023 12:38:36 AM EEST starting round 4
Wed 25 Oct 2023 12:39:01 AM EEST starting round 5
Wed 25 Oct 2023 12:39:27 AM EEST starting round 6
bug was reproduced after 6 tries


For completeness, here is also the grub_set_default_version.bash script
that I had to write to automate this (maybe these could be in debian
wiki?):

#!/bin/bash
set -x

version="$1"

idx=$(expr $(grep "menuentry " /boot/grub/grub.cfg | sed 1d |grep -n "'Debian GNU/Linux, with Linux $version'"|cut -d: -f1) - 1)
exec sudo grub-set-default "1>$idx"
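
To illustrate how the index is derived, here is a small sketch that runs the same pipeline on sample text instead of /boot/grub/grub.cfg. The function name `submenu_index` and the sample menu are made up for this example; as I understand the GRUB syntax, the resulting number is what ends up after `1>` to select an entry inside the second top-level item (the "Advanced options" submenu):

```bash
#!/bin/bash
# Hypothetical helper: print the 0-based position of a kernel inside the
# "Advanced options" submenu, reading grub.cfg-style text from stdin.
submenu_index() {
    local version="$1" line
    # Drop the first (top-level) menuentry, then find the wanted entry.
    line=$(grep "menuentry " | sed 1d |
           grep -n "'Debian GNU/Linux, with Linux $version'" | cut -d: -f1)
    [ -n "$line" ] || return 1      # version not present in the menu
    echo $((line - 1))
}

# Made-up sample input, not taken from a real system.
sample_cfg() {
    cat <<'EOF'
menuentry 'Debian GNU/Linux' --class debian {
submenu 'Advanced options for Debian GNU/Linux' {
    menuentry 'Debian GNU/Linux, with Linux 6.5.0-3-amd64' --class debian {
    menuentry 'Debian GNU/Linux, with Linux 6.5.0-3-amd64 (recovery mode)' --class debian {
    menuentry 'Debian GNU/Linux, with Linux 6.1.0-13-amd64' --class debian {
    menuentry 'Debian GNU/Linux, with Linux 6.1.0-13-amd64 (recovery mode)' --class debian {
EOF
}

sample_cfg | submenu_index 6.1.0-13-amd64   # prints 2
```

The recovery-mode entries count towards the position, which is why 6.1.0-13-amd64 lands at index 2 rather than 1 in this sample.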

#1054514#24
Date:
2023-10-24 23:55:09 UTC
From:
To:
Thanks for the regression report. I'm adding it to regzbot:

#regzbot ^introduced: 5a838e5d5825c8
#regzbot title: simplifying qxl_fence_wait() makes VRAM BO allocation fail
#regzbot from: Timo Lindfors <timo.lindfors@iki.fi>

#1054514#33
Date:
2023-12-06 09:56:40 UTC
From:
To:
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Gerd, it seems this regression[1] fell through the cracks. Could you
please take a look? Or is there a good reason why this can't be
addressed? Or was it dealt with and I just missed it?

[1] apparently caused by 5a838e5d5825c8 ("drm/qxl: simplify
qxl_fence_wait") [v5.13-rc1] from Gerd; for details see
https://lore.kernel.org/regressions/ZTgydqRlK6WX_b29@eldamar.lan/

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

#1054514#38
Date:
2024-03-08 01:08:50 UTC
From:
To:
Hi,
As initially reported by Timo, the QXL driver will crash given enough
workload:
https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b9af@leemhuis.info/
I initially came across this problem when migrating Debian VMs from Bullseye
to Bookworm. This bug will somewhat randomly but consistently happen, even
just by using neovim with plugins or playing a video. This exception would
then cascade and make Xorg crash too.

The error log from dmesg would have `[TTM] Buffer eviction failed` followed
by either a `failed to allocate VRAM BO` or `failed to allocate GEM object`.
And the error log from Xorg would have `qxl(0): error doing QXL_ALLOC`
followed by a backtrace and segmentation fault.
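
For anyone who wants to check a saved log for this signature, a rough sketch follows. It only looks for the two kernel messages quoted above, in that order; `has_qxl_alloc_failure` is a hypothetical name and the sample lines are copied from the original report:

```bash
#!/bin/bash
# Hypothetical helper: succeed if the log on stdin contains
# "[TTM] Buffer eviction failed" followed later by one of the two
# allocation failure messages.
has_qxl_alloc_failure() {
    awk '
        /\[TTM\] Buffer eviction failed/ { evicted = 1; next }
        evicted && /failed to allocate (VRAM BO|GEM object)/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Sample lines taken from the report at the top of this bug.
sample_log() {
    printf '%s\n' \
        'kernel: [TTM] Buffer eviction failed' \
        'kernel: qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)' \
        'kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO'
}

if sample_log | has_qxl_alloc_failure; then
    echo "bug signature found"
fi
```

On a live system the input could come from `journalctl --boot` or `dmesg` instead of the sample function.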

I can confirm the problem still exists in latest kernel versions:
https://gitlab.freedesktop.org/drm/kernel @ c6d6a82d8a9f
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git @ 1870cdc0e8de

When I was investigating this issue I ended up creating a script which
triggers the issue in just a couple of minutes when executed under uxterm.
YMMV according to your system; for example, when using urxvt crashes were
not as consistent, likely because it is more efficient and makes fewer
video memory allocations.
For me this is the fastest way to trigger the bug. Here follows:
```
#!/bin/bash
print_gradient_with_awk() {
    local arg="$1"
    if [[ -n $arg ]]; then
        arg=" ($arg)"
    fi
    awk -v arg="$arg" 'BEGIN{
        s="/\\/\\/\\/\\/\\"; s=s s s s s s s s;
        for (colnum = 0; colnum<77; colnum++) {
            r = 255-(colnum*255/76);
            g = (colnum*510/76);
            b = (colnum*255/76);
            if (g>255) g = 510-g;
            printf "\033[48;2;%d;%d;%dm", r,g,b;
            printf "\033[38;2;%d;%d;%dm", 255-r,255-g,255-b;
            printf "%s\033[0m", substr(s,colnum+1,1);
        }
        printf "%s\n", arg;
    }'
}
for i in {1..10000}; do
    print_gradient_with_awk $i
done
```

Timo initially reported:
commit 5f6c871fe919 ("drm/qxl: properly free qxl releases") as working fine
commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") introducing the bug

The bug occurs whenever a timeout is reached in wait_event_timeout.
To fix this issue I updated the code to include busy-wait logic, which
is how the last working version operated. That fixes this bug while still
keeping the code simple (which I suspect was the motivation for the
5a838e5d5825 commit in the first place), as opposed to just reverting to
the last working version at 5f6c871fe919.
I chose HZ as the scaling factor for the loop because it is also used by
ttm_bo_wait_ctx, which is one of the indirect callers of qxl_fence_wait,
the other being ttm_bo_delayed_delete.
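
The HZ-based splitting described above can be sketched as plain arithmetic. This mirrors only the calculation, not the kernel code itself; `split_timeout` is a hypothetical name and HZ=250 is just an example tick rate:

```bash
#!/bin/bash
# Hypothetical helper: split a timeout (in jiffies) into waits of at most
# HZ jiffies each. Prints "<iterations> <jiffies per wait>".
split_timeout() {
    local timeout="$1" hz="$2" iterations=1 fraction="$1"
    if [ "$fraction" -gt "$hz" ]; then
        iterations=$((fraction / hz))   # integer division, remainder dropped
        fraction=$hz
    fi
    echo "$iterations $fraction"
}

split_timeout 3750 250   # prints "15 250"
split_timeout 100 250    # prints "1 100"
```

Each wait re-evaluates the wake-up condition, so a longer timeout leads to several OOM notifications instead of a single one. Because of the integer division, a timeout that is not a multiple of HZ is rounded down (for example, 600 jiffies at HZ=250 becomes 2 waits of 250).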

To confirm the problem no longer manifests I have:
- executed my own test case pasted above
- executed Timo's test case pasted below
- played a video stream in mplayer for 3h (no audio stream because
  apparently pulseaudio and/or alsa have memory leaks that make the
  system run out of memory)

For quick reference here is Timo's script:
```
#!/bin/bash
chvt 3
for j in $(seq 80); do
    echo "$(date) starting round $j"
    if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" ]; then
        echo "bug was reproduced after $j tries"
        exit 1
    fi
    for i in $(seq 100); do
        dmesg > /dev/tty3
    done
done
echo "bug could not be reproduced"
exit 0
```

From what I could find online it seems that users that have been affected
by this problem just tend to move from QXL to VirtIO, that is why this bug
has been hiding for over 3 years now.
This issue was initially reported by Timo 4 months ago but the discussion
seems to have stalled.
It would be great if this could be addressed and avoid it falling through
the cracks.

Thank you for your time.
---
Alex Constantino (1):
  drm/qxl: fixes qxl_fence_wait

 drivers/gpu/drm/qxl/qxl_release.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

base-commit: 1870cdc0e8dee32e3c221704a2977898ba4c10e8
--
2.39.2

#1054514#43
Date:
2024-03-08 01:08:51 UTC
From:
To:
Fix OOM scenario by doing multiple notifications to the OOM handler through
a busy wait logic.
Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
result in a '[TTM] Buffer eviction failed' exception whenever it reached a
timeout.

Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
Link: https://lore.kernel.org/regressions/fb0fda6a-3750-4e1b-893f-97a3e402b9af@leemhuis.info
Reported-by: Timo Lindfors <timo.lindfors@iki.fi>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
Signed-off-by: Alex Constantino <dreaming.about.electric.sheep@gmail.com>
---
 drivers/gpu/drm/qxl/qxl_release.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 368d26da0d6a..51c22e7f9647 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -20,8 +20,6 @@
  * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  */

-#include <linux/delay.h>
-
 #include <trace/events/dma_fence.h>

 #include "qxl_drv.h"
@@ -59,14 +57,24 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
 {
 	struct qxl_device *qdev;
 	unsigned long cur, end = jiffies + timeout;
+	signed long iterations = 1;
+	signed long timeout_fraction = timeout;

 	qdev = container_of(fence->lock, struct qxl_device, release_lock);

-	if (!wait_event_timeout(qdev->release_event,
+	// using HZ as a factor since it is used in ttm_bo_wait_ctx too
+	if (timeout_fraction > HZ) {
+		iterations = timeout_fraction / HZ;
+		timeout_fraction = HZ;
+	}
+	for (int i = 0; i < iterations; i++) {
+		if (wait_event_timeout(
+				qdev->release_event,
 				(dma_fence_is_signaled(fence) ||
-				 (qxl_io_notify_oom(qdev), 0)),
-				timeout))
-		return 0;
+					(qxl_io_notify_oom(qdev), 0)),
+				timeout_fraction))
+			break;
+	}

 	cur = jiffies;
 	if (time_after(cur, end))

#1054514#48
Date:
2024-03-08 08:58:59 UTC
From:
To:
Thx for working on this.
[The Link: tag should ideally point to]
https://lore.kernel.org/regressions/ZTgydqRlK6WX_b29@eldamar.lan/ , as
that is the report and not just a reply to prod things.

Ciao, Thorsten

#1054514#53
Date:
2024-03-20 15:25:48 UTC
From:
To:
I just added to the CC), it seems to me this regression fix did not
make any progress since it was posted. Did I miss something, is it just
"we are busy with the merge window", or is there some other reason?
Just wondering, I just saw someone on a Fedora IRC channel complaining
about the regression, that's why I'm asking. Would be really good to
finally get this resolved...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

#1054514#58
Date:
2024-03-27 13:27:29 UTC
From:
To:
Hi,

I've ping'd Gerd last week about it, but he couldn't remember the
details of why that patch was warranted in the first place.

If it works, I'd prefer to revert the original patch that we know used
to work instead of coming up with some less proven logic, which seems to
be quite different to what it used to be.

Alex, could you try reverting 5a838e5d5825c85556011478abde708251cc0776
and letting us know the result?

Thanks!
Maxime

#1054514#63
Date:
2024-04-04 18:14:47 UTC
From:
To:
Changes since v1:
- replace new code logic in v1 with past code version by reverting
  commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
- add missing code dependency from
  commit d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2")
---

Hi,

To clarify, the reason for my original patch, as explained in more detail
in my previous email, was that it fixed the issue while keeping the code
simpler (which was the original reason for the commit being reverted here).
But I perfectly understand opting for previously battle tested code. Makes
sense.

As requested I've reverted commit 5a838e5d5825 ("drm/qxl: simplify
qxl_fence_wait") and then executed both Timo's and my test cases, and 1h
video playback. I was unable to reproduce the bug with any of those cases.
So the revert seems to fix the bug.

Please note, and as stated in the commit message, due to a dependency to
DMA_FENCE_WARN this patch also restores the relevant code deleted by
commit d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2").

A couple of things I've observed from dmesg:
- (1) it always triggers a single warning at boot, this is issued by
  `WARN_ON(list_empty(&release->bos));` @ qxl_release_free @ qxl_release.c
  Maybe better for this to be addressed separately from this patch?
- (2) there are quite a few
  `failed to wait on release xx after spincount 301`
  messages as printed by the patch v2 code when the test case shell
  scripts are being executed.
- (3) there can be a single error message
  `[drm:qxl_release_from_id_locked [qxl]] *ERROR* failed to find id in release_idr`
- (4) occasional error messages about
  `[drm:drm_atomic_helper_commit_planes [drm_kms_helper]] *ERROR* head 9 wrong:`.

Issue (1) relates to this patch v2 and also happened with kernel from
base-commit 1870cdc0e8de (March 1st).
Issue (2) also relates to this patch v2 but only happens with kernel from
base-commit a6bd6c933339 (March 30th).
Both (3) and (4) are unrelated to this patch as they can occur
independently of it and I'm guessing these may be related to the recent
changes discussed in
https://lore.kernel.org/dri-devel/38d38331-3848-4995-b78e-a87ecae722d5@linux.intel.com/T/#u

For reference here is the output of (1):
```
[ 20.779514] ------------[ cut here ]------------
[ 20.779525] workqueue: WQ_MEM_RECLAIM ttm:ttm_bo_delayed_delete [ttm] is flushing !WQ_MEM_RECLAIM events:qxl_gc_work [qxl]
[ 20.779666] WARNING: CPU: 1 PID: 601 at kernel/workqueue.c:3692 check_flush_dependency+0xfa/0x110
[ 20.779683] Modules linked in: nfsv3 nfs_acl nfs lockd grace intel_rapl_msr intel_rapl_common intel_pmc_core intel_vsec pmt_telemetry pmt_class kvm_intel rfkill kvm snd_hda_codec_generic crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel snd_hda_intel snd_intel_dspcfg sha512_ssse3 sha512_generic snd_hda_codec sha256_ssse3 snd_hwdep sha1_ssse3 snd_hda_core sunrpc binfmt_misc snd_pcm aesni_intel qxl drm_ttm_helper ttm crypto_simd snd_timer cryptd rapl snd virtio_balloon virtio_console drm_kms_helper pcspkr soundcore button evdev joydev serio_raw drm loop fuse efi_pstore dm_mod configfs qemu_fw_cfg virtio_rng autofs4 ext4 crc32c_generic crc16 mbcache jbd2 virtio_net ata_generic net_failover virtio_blk failover uhci_hcd ata_piix ehci_hcd libata scsi_mod usbcore crc32c_intel i2c_piix4 virtio_pci virtio psmouse virtio_pci_legacy_dev virtio_pci_modern_dev virtio_ring floppy scsi_common usb_common
[ 20.779825] CPU: 1 PID: 601 Comm: kworker/u13:1 Not tainted 6.9.0-rc1-next-20240328-amd64-00001-g756220c4615c #81
[ 20.779833] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 20.779837] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 20.779862] RIP: 0010:check_flush_dependency+0xfa/0x110
[ 20.779869] Code: ff ff 49 8b 55 18 48 8d 8b c0 00 00 00 49 89 e8 48 81 c6 c0 00 00 00 48 c7 c7 c0 16 44 8d c6 05 e7 75 b3 01 01 e8 86 97 fd ff <0f> 0b e9 21 ff ff ff 80 3d d5 75 b3 01 00 75 96 e9 4d ff ff ff 90
[ 20.779875] RSP: 0000:ffffb59600dd7cc8 EFLAGS: 00010082
[ 20.779880] RAX: 0000000000000000 RBX: ffff9af88104ee00 RCX: 0000000000000027
[ 20.779902] RDX: ffff9af8fdd21708 RSI: 0000000000000001 RDI: ffff9af8fdd21700
[ 20.779906] RBP: ffffffffc0882570 R08: 0000000000000000 R09: 0000000000000003
[ 20.779910] R10: ffffb59600dd7b58 R11: ffffffff8dcc83e8 R12: ffff9af894498000
[ 20.779914] R13: ffff9af89558d780 R14: ffffb59600dd7cf8 R15: 0000000000000001
[ 20.779918] FS: 0000000000000000(0000) GS:ffff9af8fdd00000(0000) knlGS:0000000000000000
[ 20.779924] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 20.779928] CR2: 00005574b0bd4148 CR3: 000000001fb40002 CR4: 0000000000370ef0
[ 20.779994] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 20.779999] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 20.780003] Call Trace:
[ 20.780135] <TASK>
[ 20.780144] ? __warn+0x7c/0x120
[ 20.780153] ? check_flush_dependency+0xfa/0x110
[ 20.780161] ? report_bug+0x160/0x190
[ 20.780169] ? prb_read_valid+0x17/0x20
[ 20.780179] ? handle_bug+0x41/0x70
[ 20.780186] ? exc_invalid_op+0x13/0x60
[ 20.780193] ? asm_exc_invalid_op+0x16/0x20
[ 20.780201] ? __pfx_qxl_gc_work+0x10/0x10 [qxl]
[ 20.780221] ? check_flush_dependency+0xfa/0x110
[ 20.780228] ? check_flush_dependency+0xfa/0x110
[ 20.780234] __flush_work+0xce/0x2c0
[ 20.780244] qxl_queue_garbage_collect+0x7f/0x90 [qxl]
[ 20.780268] qxl_fence_wait+0xa0/0x190 [qxl]
[ 20.780287] dma_fence_wait_timeout+0x5e/0x130
[ 20.780313] dma_resv_wait_timeout+0x7b/0xe0
[ 20.780327] ttm_bo_delayed_delete+0x26/0x80 [ttm]
[ 20.780359] process_one_work+0x184/0x3a0
[ 20.780370] worker_thread+0x273/0x390
[ 20.780379] ? __pfx_worker_thread+0x10/0x10
[ 20.780388] kthread+0xcb/0x100
[ 20.780396] ? __pfx_kthread+0x10/0x10
[ 20.780404] ret_from_fork+0x2d/0x50
[ 20.780416] ? __pfx_kthread+0x10/0x10
[ 20.780421] ret_from_fork_asm+0x1a/0x30
[ 20.780435] </TASK>
[ 20.780437] ---[ end trace 0000000000000000 ]---
```

TLDR: this patch fixes the instability issues. But there may be warnings
in dmesg. Errors in dmesg were observed too but they are unrelated to this
patch.

Thank you for your time.
---
Alex Constantino (1):
  Revert "drm/qxl: simplify qxl_fence_wait"

 drivers/gpu/drm/qxl/qxl_release.c | 50 +++++++++++++++++++++++++++----
 include/linux/dma-fence.h         |  7 +++++
 2 files changed, 52 insertions(+), 5 deletions(-)

base-commit: a6bd6c9333397f5a0e2667d4d82fef8c970108f2
--
2.39.2

#1054514#68
Date:
2024-04-04 18:14:48 UTC
From:
To:
This reverts commit 5a838e5d5825c85556011478abde708251cc0776.

Changes from commit 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait") would
result in a '[TTM] Buffer eviction failed' exception whenever it reached a
timeout.
Due to a dependency to DMA_FENCE_WARN this also restores some code deleted
by commit d72277b6c37d ("dma-buf: nuke DMA_FENCE_TRACE macros v2").

Fixes: 5a838e5d5825 ("drm/qxl: simplify qxl_fence_wait")
Link: https://lore.kernel.org/regressions/ZTgydqRlK6WX_b29@eldamar.lan/
Reported-by: Timo Lindfors <timo.lindfors@iki.fi>
Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1054514
Signed-off-by: Alex Constantino <dreaming.about.electric.sheep@gmail.com>
---
 drivers/gpu/drm/qxl/qxl_release.c | 50 +++++++++++++++++++++++++++----
 include/linux/dma-fence.h         |  7 +++++
 2 files changed, 52 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 368d26da0d6a..9febc8b73f09 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -58,16 +58,56 @@ static long qxl_fence_wait(struct dma_fence *fence, bool intr,
 			   signed long timeout)
 {
 	struct qxl_device *qdev;
+	struct qxl_release *release;
+	int count = 0, sc = 0;
+	bool have_drawable_releases;
 	unsigned long cur, end = jiffies + timeout;

 	qdev = container_of(fence->lock, struct qxl_device, release_lock);
+	release = container_of(fence, struct qxl_release, base);
+	have_drawable_releases = release->type == QXL_RELEASE_DRAWABLE;

-	if (!wait_event_timeout(qdev->release_event,
-				(dma_fence_is_signaled(fence) ||
-				 (qxl_io_notify_oom(qdev), 0)),
-				timeout))
-		return 0;
+retry:
+	sc++;
+
+	if (dma_fence_is_signaled(fence))
+		goto signaled;
+
+	qxl_io_notify_oom(qdev);
+
+	for (count = 0; count < 11; count++) {
+		if (!qxl_queue_garbage_collect(qdev, true))
+			break;
+
+		if (dma_fence_is_signaled(fence))
+			goto signaled;
+	}
+
+	if (dma_fence_is_signaled(fence))
+		goto signaled;
+
+	if (have_drawable_releases || sc < 4) {
+		if (sc > 2)
+			/* back off */
+			usleep_range(500, 1000);
+
+		if (time_after(jiffies, end))
+			return 0;
+
+		if (have_drawable_releases && sc > 300) {
+			DMA_FENCE_WARN(fence,
+				       "failed to wait on release %llu after spincount %d\n",
+				       fence->context & ~0xf0000000, sc);
+			goto signaled;
+		}
+		goto retry;
+	}
+	/*
+	 * yeah, original sync_obj_wait gave up after 3 spins when
+	 * have_drawable_releases is not set.
+	 */

+signaled:
 	cur = jiffies;
 	if (time_after(cur, end))
 		return 0;
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index e06bad467f55..c3f9bb6602ba 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -682,4 +682,11 @@ static inline bool dma_fence_is_container(struct dma_fence *fence)
 	return dma_fence_is_array(fence) || dma_fence_is_chain(fence);
 }

+#define DMA_FENCE_WARN(f, fmt, args...) \
+	do {								\
+		struct dma_fence *__ff = (f);				\
+		pr_warn("f %llu#%llu: " fmt, __ff->context, __ff->seqno,\
+			 ##args);					\
+	} while (0)
+
 #endif /* __LINUX_DMA_FENCE_H */

#1054514#73
Date:
2024-04-05 04:37:31 UTC
From:
To:
Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You have marked a patch with a "Fixes:" tag for a commit that is in an
  older released kernel, yet you do not have a cc: stable line in the
  signed-off-by area at all, which means that the patch will not be
  applied to any older kernel releases.  To properly fix this, please
  follow the documented rules in the
  Documentation/process/stable-kernel-rules.rst file for how to resolve
  this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot

#1054514#78
Date:
2024-04-05 13:13:04 UTC
From:
To:
Applied to misc/kernel.git (drm-misc-fixes).

Thanks!
Maxime

#1054514#85
Date:
2024-05-08 17:32:23 UTC
From:
To:
We believe that the bug you reported is fixed in the latest version of
linux, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 1054514@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Salvatore Bonaccorso <carnil@debian.org> (supplier of updated linux package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Fri, 03 May 2024 14:36:41 +0200
Source: linux
Architecture: source
Version: 6.1.90-1
Distribution: bookworm-security
Urgency: high
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Changed-By: Salvatore Bonaccorso <carnil@debian.org>
Closes: 1054514 1069092 1069102
Changes:
 linux (6.1.90-1) bookworm-security; urgency=high
 .
   * New upstream stable update:
https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.86
     - amdkfd: use calloc instead of kzalloc to avoid integer overflow
       (CVE-2024-26817)
     - wifi: ath9k: fix LNA selection in ath_ant_try_scan()
     - bnx2x: Fix firmware version string character counts
     - wifi: rtw89: pci: enlarge RX DMA buffer to consider size of RX descriptor
     - [x86] VMCI: Fix memcpy() run-time warning in dg_dispatch_as_host()
     - wifi: iwlwifi: pcie: Add the PCI device id for new hardware
     - panic: Flush kernel log buffer at the end
     - cpuidle: Avoid potential overflow in integer multiplication
     - [arm64] dts: rockchip: fix rk3328 hdmi ports node
     - [arm64] dts: rockchip: fix rk3399 hdmi ports node
     - ionic: set adminq irq affinity
     - net: skbuff: add overflow debug check to pull/push helpers
     - wifi: brcmfmac: Add DMI nvram filename quirk for ACEPC W5 Pro
     - pstore/zone: Add a null pointer check to the psz_kmsg_read
     - net: pcs: xpcs: Return EINVAL in the internal methods
     - dma-direct: Leak pages on dma_set_decrypted() failure
     - wifi: ath11k: decrease MHI channel buffer length to 8KB
     - cpufreq: Don't unregister cpufreq cooling on CPU hotplug
     - btrfs: handle chunk tree lookup error in btrfs_relocate_sys_chunks()
     - btrfs: export: handle invalid inode or root reference in
       btrfs_get_parent()
     - btrfs: send: handle path ref underflow in header iterate_inode_ref()
     - ice: use relative VSI index for VFs instead of PF VSI number
     - net/smc: reduce rtnl pressure in smc_pnet_create_pnetids_list()
     - Bluetooth: btintel: Fix null ptr deref in btintel_read_version
     - Bluetooth: btmtk: Add MODULE_FIRMWARE() for MT7922
     - [arm64,armhf] drm/vc4: don't check if plane->state->fb == state->fb
     - Input: synaptics-rmi4 - fail probing if memory allocation for "phys" fails
     - drm: panel-orientation-quirks: Add quirk for GPD Win Mini
     - pinctrl: renesas: checker: Limit cfg reg enum checks to provided IDs
     - sysv: don't call sb_bread() with pointers_lock held
     - scsi: lpfc: Fix possible memory leak in lpfc_rcv_padisc()
     - isofs: handle CDs with bad root inode but good Joliet root directory
     - ASoC: Intel: common: DMI remap for rebranded Intel NUC M15 (LAPRC710)
       laptops
     - rcu-tasks: Repair RCU Tasks Trace quiescence check
     - Julia Lawall reported this null pointer dereference, this should fix it.
     - media: sta2x11: fix irq handler cast
     - ALSA: firewire-lib: handle quirk to calculate payload quadlets as data
       block counter
     - ext4: add a hint for block bitmap corrupt state in mb_groups
     - ext4: forbid commit inconsistent quota data when errors=remount-ro
     - drm/amd/display: Fix nanosec stat overflow
     - drm/amd/amdgpu: Fix potential ioremap() memory leaks in
       amdgpu_device_init()
     - SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to
       unsigned int
     - Revert "ACPI: PM: Block ASUS B1400CEAE from suspend to idle by default"
     - libperf evlist: Avoid out-of-bounds access
     - input/touchscreen: imagis: Correct the maximum touch area value
     - block: prevent division by zero in blk_rq_stat_sum()
     - RDMA/cm: add timeout to cm_destroy_id wait
     - Input: imagis - use FIELD_GET where applicable
     - Input: allocate keycode for Display refresh rate toggle
     - platform/x86: touchscreen_dmi: Add an extra entry for a variant of the
       Chuwi Vi8 tablet
     - [x86] perf/x86/amd/lbr: Discard erroneous branch entries
     - ring-buffer: use READ_ONCE() to read cpu_buffer->commit_page in concurrent
       environment
     - bus: mhi: host: Add MHI_PM_SYS_ERR_FAIL state
     - usb: gadget: uvc: mark incomplete frames with UVC_STREAM_ERR
     - [x86] thunderbolt: Keep the domain powered when USB4 port is in redrive
       mode
     - usb: typec: tcpci: add generic tcpci fallback compatible
     - usb: sl811-hcd: only defined function checkdone if QUIRK2 is defined
     - thermal/of: Assume polling-delay(-passive) 0 when absent
     - ASoC: soc-core.c: Skip dummy codec when adding platforms
     - fbdev: viafb: fix typo in hw_bitblt_1 and hw_bitblt_2
     - io_uring: clear opcode specific data for an early failure
     - drivers/nvme: Add quirks for device 126f:2262
     - fbmon: prevent division by zero in fb_videomode_from_videomode()
     - netfilter: nf_tables: release batch on table validation from abort path
     - netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
       (CVE-2024-26925)
     - netfilter: nf_tables: discard table flag update with pending basechain
       deletion
     - gcc-plugins/stackleak: Avoid .head.text section
     - virtio: reenable config if freezing device failed
     - randomize_kstack: Improve entropy diffusion
     - [x86] platform/x86: intel-vbtn: Update tablet mode switch at end of probe
     - Bluetooth: btintel: Fixe build regression
     - net: mpls: error out if inner headers are not set
     - [x86] VMCI: Fix possible memcpy() run-time warning in
       vmci_datagram_invoke_guest_handler()
     - Revert "drm/amd/amdgpu: Fix potential ioremap() memory leaks in
       amdgpu_device_init()"
https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.87
     - smb3: fix Open files on server counter going negative
     - ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
     - batman-adv: Avoid infinite loop trying to resize local TT
     - ring-buffer: Only update pages_touched when a new page is touched
     - Bluetooth: Fix memory leak in hci_req_sync_complete()
     - drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
     - PM: s2idle: Make sure CPUs will wakeup directly on resume
     - media: cec: core: remove length check of Timer Status
     - Revert "drm/qxl: simplify qxl_fence_wait" (Closes: #1054514)
     - nouveau: fix function cast warning
     - scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
     - scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
     - net: openvswitch: fix unwanted error log on timeout policy probing
     - u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one
       file
     - xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
     - geneve: fix header validation in geneve[6]_xmit_skb
     - bnxt_en: Reset PTP tx_avail after possible firmware reset
     - af_unix: Clear stale u->oob_skb.
     - ipv6: fib: hide unused 'pn' variable
     - ipv4/route: avoid unused-but-set-variable warning
     - ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
     - Bluetooth: SCO: Fix not validating setsockopt user input
     - Bluetooth: L2CAP: Fix not validating setsockopt user input
     - netfilter: complete validation of user input
     - net/mlx5: Properly link new fs rules into the tree
     - net/mlx5e: Fix mlx5e_priv_init() cleanup flow
     - net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
     - af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
     - af_unix: Fix garbage collector racing against connect() (CVE-2024-26923)
     - net: ena: Fix potential sign extension issue
     - net: ena: Wrong missing IO completions check order
     - net: ena: Fix incorrect descriptor free behavior
     - tracing: hide unused ftrace_event_id_fops
     - [amd64] iommu/vt-d: Allocate local memory for page request queue
     - btrfs: qgroup: correctly model root qgroup rsv in convert
     - btrfs: record delayed inode root in transaction
     - btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
     - io_uring/net: restore msg_control on sendzc retry
     - kprobes: Fix possible use-after-free issue on kprobe registration
     - [x86] drm/i915/vrr: Disable VRR when using bigjoiner
     - drm/ast: Fix soft lockup
     - drm/client: Fully protect modes[] with dev->mode_config.mutex
     - vhost: Add smp_rmb() in vhost_vq_avail_empty()
     - vhost: Add smp_rmb() in vhost_enable_notify()
     - [x86] perf/x86: Fix out of range data
     - [x86] cpu: Actually turn off mitigations by default for
       SPECULATION_MITIGATIONS=n
     - [x86] apic: Force native_apic_mem_read() to use the MOV instruction
     - irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
     - [x86] bugs: Fix return type of spectre_bhi_state()
     - [x86] bugs: Fix BHI documentation
     - [x86] bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
     - [x86] bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
     - [x86] bugs: Fix BHI handling of RRSBA
     - [x86] bugs: Clarify that syscall hardening isn't a BHI mitigation
     - [x86] bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
     - [x86] bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with
       CONFIG_MITIGATION_SPECTRE_BHI
     - [x86] drm/i915/cdclk: Fix CDCLK programming order when pipes are active
     - [x86] drm/i915: Disable port sync when bigjoiner is used
     - drm/amdgpu: Reset dGPU if suspend got aborted
     - drm/amdgpu: always force full reset for SOC21
     - drm/amd/display: fix disable otg wa logic in DCN316
https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.88
     - drm/vmwgfx: Enable DMA mappings with SEV
     - drm/amdgpu: fix incorrect active rb bitmap for gfx11
     - drm/amdgpu: fix incorrect number of active RBs for gfx11
     - drm/amd/display: Do not recursively call manual trigger programming
     - io_uring: Fix io_cqring_wait() not restoring sigmask on get_timespec64()
       failure
     - SUNRPC: Fix rpcgss_context trace event acceptor field
     - random: handle creditable entropy from atomic process context
     - net: usb: ax88179_178a: avoid writing the mac address before first reading
     - [x86] drm/i915/vma: Fix UAF on destroy against retire race
     - [x86] efi: Drop EFI stub .bss from .data section
     - [x86] efi: Disregard setup header of loaded image
     - [x86] efistub: Reinstate soft limit for initrd loading
     - [x86] efi: Drop alignment flags from PE section headers
     - [x86] boot: Remove the 'bugger off' message
     - [x86] boot: Omit compression buffer from PE/COFF image memory footprint
     - [x86] boot: Drop redundant code setting the root device
     - [x86] boot: Drop references to startup_64
     - [x86] boot: Grab kernel_info offset from zoffset header directly
     - [x86] boot: Set EFI handover offset directly in header asm
     - [x86] boot: Define setup size in linker script
     - [x86] boot: Derive file size from _edata symbol
     - [x86] boot: Construct PE/COFF .text section from assembler
     - [x86] boot: Drop PE/COFF .reloc section
     - [x86] boot: Split off PE/COFF .data section
     - [x86] boot: Increase section and file alignment to 4k/512
     - [x86] efistub: Use 1:1 file:memory mapping for PE/COFF .compat section
     - [x86] mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
     - [x86] head/64: Add missing __head annotation to startup_64_load_idt()
     - [x86] head/64: Move the __head definition to <asm/init.h>
     - [x86] sme: Move early SME kernel encryption handling into .head.text
     - [x86] sev: Move early startup code into .head.text section
     - [x86] efistub: Remap kernel text read-only before dropping NX attribute
     - netfilter: nf_tables: Fix potential data-race in __nft_expr_type_get()
     - netfilter: nf_tables: Fix potential data-race in __nft_obj_type_get()
     - netfilter: br_netfilter: skip conntrack input hook for promisc packets
     - netfilter: nft_set_pipapo: do not free live element (CVE-2024-26924)
     - netfilter: flowtable: validate pppoe header
     - netfilter: flowtable: incorrect pppoe tuple
     - af_unix: Call manage_oob() for every skb in unix_stream_read_generic().
     - af_unix: Don't peek OOB data without MSG_OOB.
     - net/mlx5: Lag, restore buckets number to default after hash LAG
       deactivation
     - net/mlx5e: Prevent deadlock while disabling aRFS
     - ice: tc: allow zero flags in parsing tc flower
     - tun: limit printing rate when illegal packet received by tun dev
     - [arm64] net: ethernet: ti: am65-cpsw-nuss: cleanup DMA Channels before
       using them
     - RDMA/rxe: Fix the problem "mutex_destroy missing"
     - RDMA/cm: Print the old state when cm_destroy_id gets timeout
     - RDMA/mlx5: Fix port number for counter query in multi-port configuration
     - [s390x] qdio: handle deferred cc1
     - [s390x] cio: fix race condition during online processing
     - drm: nv04: Fix out of bounds access
     - [armhf] omap2: n8x0: stop instantiating codec platform data
     - PCI: Avoid FLR for SolidRun SNET DPU rev 1
     - HID: kye: Sort kye devices
     - usb: pci-quirks: Reduce the length of a spinlock section in
       usb_amd_find_chipset_info()
     - PCI: Delay after FLR of Solidigm P44 Pro NVMe
     - [x86] quirks: Include linux/pnp.h for arch_pnpbios_disabled()
     - [x86] thunderbolt: Log function name of the called quirk
     - [x86] thunderbolt: Add debug log for link controller power quirk
     - PCI: Execute quirk_enable_clear_retrain_link() earlier
     - ALSA: scarlett2: Move USB IDs out from device_info struct
     - ALSA: scarlett2: Add support for Clarett 8Pre USB
     - ASoC: ti: Convert Pandora ASoC to GPIO descriptors
     - ALSA: scarlett2: Default mixer driver to enabled
     - ALSA: scarlett2: Add correct product series name to messages
     - ALSA: scarlett2: Add Focusrite Clarett+ 2Pre and 4Pre support
     - ALSA: scarlett2: Add Focusrite Clarett 2Pre and 4Pre USB support
     - PCI/DPC: Use FIELD_GET()
     - PCI: Simplify pcie_capability_clear_and_set_word() to ..._clear_word()
     - ALSA: scarlett2: Rename scarlett_gen2 to scarlett2
     - drm: panel-orientation-quirks: Add quirk for Lenovo Legion Go
     - usb: xhci: Add timeout argument in address_device USB HCD callback
     - usb: new quirk to reduce the SET_ADDRESS request timeout
     - clk: Remove prepare_lock hold assertion in __clk_release()
     - clk: Print an info line before disabling unused clocks
     - clk: Initialize struct clk_core kref earlier
     - clk: Get runtime PM before walking tree during disable_unused
     - clk: remove unnecessary (void*) conversions
     - clk: Show active consumers of clocks in debugfs
     - clk: Get runtime PM before walking tree for clk_summary
     - [x86] bugs: Fix BHI retpoline check
     - [x86] cpufeatures: Fix dependencies for GFNI, VAES, and VPCLMULQDQ
     - ALSA: hda/realtek - Enable audio jacks of Haier Boyue G42 with ALC269VC
     - [arm*] binder: check offset alignment in binder_get_object()
       (CVE-2024-26926)
     - [x86] thunderbolt: Avoid notify PM core about runtime PM resume
     - [x86] thunderbolt: Fix wake configurations after device unplug
     - [x86] comedi: vmk80xx: fix incomplete endpoint checking
     - [armhf] serial: stm32: Return IRQ_NONE in the ISR if no handling happend
     - [armhf] serial: stm32: Reset .throttled state in .startup()
     - USB: serial: option: add Fibocom FM135-GL variants
     - USB: serial: option: add support for Fibocom FM650/FG650
     - USB: serial: option: add Lonsung U8300/U9300 product
     - USB: serial: option: support Quectel EM060K sub-models
     - USB: serial: option: add Rolling RW101-GL and RW135-GL support
     - USB: serial: option: add Telit FN920C04 rmnet compositions
     - Revert "usb: cdc-wdm: close race between read and workqueue"
     - [arm64,armhf] usb: dwc2: host: Fix dereference issue in DDMA completion
       flow.
     - usb: Disable USB3 LPM at shutdown
     - usb: gadget: f_ncm: Fix UAF ncm object at re-bind after usb ep transport
       error
     - mei: me: disable RPL-S on SPS and IGN firmwares
     - speakup: Avoid crash on very long word
     - fs: sysfs: Fix reference leak in sysfs_break_active_protection()
     - [x86] KVM: x86: Snapshot if a vCPU's vendor model is AMD vs. Intel
       compatible
     - [x86] KVM: x86/pmu: Disable support for adaptive PEBS
     - [x86] KVM: x86/pmu: Do not mask LVTPC when handling a PMI on AMD platforms
     - [arm64] hibernate: Fix level3 translation fault in swsusp_save()
     - init/main.c: Fix potential static_command_line memory overflow
     - mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
     - drm/amdgpu: validate the parameters of bo mapping operations more clearly
       (CVE-2024-26922)
     - drm/vmwgfx: Sort primary plane formats by order of preference
     - drm/vmwgfx: Fix crtc's atomic check conditional
     - nouveau: fix instmem race condition around ptr stores
     - bootconfig: use memblock_free_late to free xbc memory to buddy
     - nilfs2: fix OOB in nilfs_set_de_type
     - net: dsa: introduce preferred_default_local_cpu_port and use on MT7530
     - ksmbd: fix slab-out-of-bounds in smb2_allocate_rsp_buf
     - ksmbd: validate request buffer size in smb2_allocate_rsp_buf()
     - ksmbd: clear RENAME_NOREPLACE before calling vfs_rename
     - ksmbd: common: use struct_group_attr instead of struct_group for
       network_open_info
     - PCI/ASPM: Fix deadlock when enabling ASPM (CVE-2024-26605)
https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.89
     - Revert "ASoC: ti: Convert Pandora ASoC to GPIO descriptors"
https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.90
     - smb: client: fix rename(2) regression against samba
     - cifs: reinstate original behavior again for forceuid/forcegid
     - [amd64] HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized
       dev->devc
     - HID: logitech-dj: allow mice to use all types of reports
     - wifi: iwlwifi: mvm: remove old PASN station when adding a new one
     - wifi: iwlwifi: mvm: return uid from iwl_mvm_build_scan_cmd
     - vxlan: drop packets from invalid src-address
     - icmp: prevent possible NULL dereferences from icmp_build_probe()
     - bridge/br_netlink.c: no need to return void function
     - bnxt_en: refactor reset close code
     - bnxt_en: Fix the PCI-AER routines
     - NFC: trf7970a: disable all regulators on removal
     - ax25: Fix netdev refcount issue
     - net: make SK_MEMORY_PCPU_RESERV tunable
     - net: fix sk_memory_allocated_{add|sub} vs softirqs
     - ipv4: check for NULL idev in ip_route_use_hint()
     - net: usb: ax88179_178a: stop lying about skb->truesize
     - net: gtp: Fix Use-After-Free in gtp_dellink
     - Bluetooth: MGMT: Fix failing to MGMT_OP_ADD_UUID/MGMT_OP_REMOVE_UUID
     - Bluetooth: hci_sync: Using hci_cmd_sync_submit when removing Adv Monitor
     - Bluetooth: qca: set power_ctrl_enabled on NULL returned by
       gpiod_get_optional()
     - ipvs: Fix checksumming on GSO of SCTP packets
     - net: openvswitch: Fix Use-After-Free in ovs_ct_exit
     - eth: bnxt: fix counting packets discarded due to OOM and netpoll
     - netfilter: nf_tables: honor table dormant flag from netdev release event
       path
     - i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
     - i40e: Report MFS in decimal base instead of hex
     - iavf: Fix TC config comparison with existing adapter TC config
     - net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
     - af_unix: Suppress false-positive lockdep splat for spin_lock() in
       __unix_gc().
     - cifs: Replace remaining 1-element arrays (Closes: #1069102, #1069092)
     - Revert "crypto: api - Disallow identical driver names"
     - virtio_net: Do not send RSS key if it is not supported
     - fork: defer linking file vma until vma is fully initialized
       (CVE-2024-27022)
     - [x86] cpu: Fix check for RDPKRU in __show_regs()
     - Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()
     - Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853
     - Bluetooth: qca: fix NULL-deref on non-serdev suspend
     - [arm64] mmc: sdhci-msm: pervent access to suspended controller
     - smb: client: Fix struct_group() usage in __packed structs
     - smb3: fix lock ordering potential deadlock in cifs_sync_mid_result
     - HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
     - btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
     - cpu: Re-enable CPU mitigations by default for !X86 architectures
     - drm/amdgpu/sdma5.2: use legacy HDP flush for SDMA2/3
     - drm/amdgpu: Fix leak when GPU memory allocation fails
     - irqchip/gic-v3-its: Prevent double free on error
     - ACPI: CPPC: Use access_width over bit_width for system memory accesses
     - ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro
     - ACPI: CPPC: Fix access width used for PCC registers
     - ethernet: Add helper for assigning packet type when dest address does not
       match device address
     - net: b44: set pause params only when interface is up
     - stackdepot: respect __GFP_NOLOCKDEP allocation flag
     - fbdev: fix incorrect address computation in deferred IO
     - udp: preserve the connected status if only UDP cmsg
     - mtd: diskonchip: work around ubsan link failure
     - [x86] tdx: Preserve shared bit on mprotect()
     - [x86] idma64: Don't try to serve interrupts when device is powered off
     - [arm64,armhf] phy: marvell: a3700-comphy: Fix out of bounds read
     - [arm64,armhf] phy: marvell: a3700-comphy: Fix hardcoded array size
     - [arm64] phy: rockchip-snps-pcie3: fix bifurcation on rk3588
     - [arm64] phy: rockchip-snps-pcie3: fix clearing PHP_GRF_PCIESEL_CON bits
     - [amd64] dmaengine: idxd: Fix oops during rmmod on single-CPU platforms
     - i2c: smbus: fix NULL function pointer dereference
     - bounds: Use the right number of bits for power-of-two CONFIG_NR_CPUS
     - macsec: Enable devices to advertise whether they update sk_buff md_dst
       during offloads
     - macsec: Detect if Rx skb is macsec-related for offloading devices that
       update md_dst
     - net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for
       MACsec
 .
   [ Salvatore Bonaccorso ]
   * Bump ABI to 21
   * drivers/tty: Disable N_GSM
   * tipc: fix UAF in error path
   * tipc: fix a possible memleak in tipc_buf_append
Checksums-Sha1:
 b0843098f0c86c5cfeafda753e9fecbe93b81690 290924 linux_6.1.90-1.dsc
 954ea89e05e9279ad27d87884727059eaa1d3b89 137614748 linux_6.1.90.orig.tar.xz
 4ea845f0a5ed6df30a0f647e2f10a0ccd90da022 1631148 linux_6.1.90-1.debian.tar.xz
 0244b34ba383440c4434043035f6ddd620a306e5 7060 linux_6.1.90-1_source.buildinfo
Checksums-Sha256:
 36c8871d04a1ba1de4486be74df3f256f33b036e4237a0aa7da26b2d42f9ea36 290924 linux_6.1.90-1.dsc
 74d8a50f82232eea2e4f3c017c307a4eee0bea10f0727aa3ef1cb866034f44ba 137614748 linux_6.1.90.orig.tar.xz
 0fd764e593ded94abcd6fb44c0c5d6f8c23834f74caf96d5f24967b387af20ae 1631148 linux_6.1.90-1.debian.tar.xz
 92e0a44db413ea0f92de00f26d9945d6c059ae61e3c4a2bc58a5ed7a9cbe2c54 7060 linux_6.1.90-1_source.buildinfo
Files:
 74914f280e2f5451ef320eef2449bd10 290924 kernel optional linux_6.1.90-1.dsc
 9fa5818eef2eca61a4af0f302f177c2b 137614748 kernel optional linux_6.1.90.orig.tar.xz
 58a54a4242b9fffb9878e8dc6c7acf17 1631148 kernel optional linux_6.1.90-1.debian.tar.xz
 32b848510fe7ee229502e9135bbd3d6f 7060 kernel optional linux_6.1.90-1_source.buildinfo
-----BEGIN PGP SIGNATURE-----

iQKmBAEBCgCQFiEERkRAmAjBceBVMd3uBUy48xNDz0QFAmY02tRfFIAAAAAALgAo
aXNzdWVyLWZwckBub3RhdGlvbnMub3BlbnBncC5maWZ0aGhvcnNlbWFuLm5ldDQ2
NDQ0MDk4MDhDMTcxRTA1NTMxRERFRTA1NENCOEYzMTM0M0NGNDQSHGNhcm5pbEBk
ZWJpYW4ub3JnAAoJEAVMuPMTQ89EauAP/3ezk2Wlm15i6yYSxC3AYuMOx3LhIi6x
lhAmfU4FC8FWQZtBni2ILSQY+WPCzsGY7Qwkxfm8ec4Rx8maIZm2u5fbqpwMa5zk
ooH6b8d+r4tmVbMDXISptDJqOYT+sv+i+hqmYifYZVOOBqbpPCs/5QGvZSMJ5OSb
/DQVV6wkXyenQwMzSDhbyjLm9bO3JMpL2MYZl5y+q81yuUxYxYjAus5gIhGA2H0H
uqeXHvd1ZEtvjVd+skYUkGwzRHopFUKoael5nv7rKlfhA/Kabb1OmUxkKKcqB74a
qTjvNnKwV0A2fr/s0NXvQFa3sZRfZevL5mB7rEwzpblI77bqA3vYG01SlRc6eL+D
JelPo9vPGs/OY2DHGE4JSdwsxW/3ozqqWeEAm+yKU7XZSIyEHrBeWKCNNH2ZugbJ
I9WNDoBOf5d0yC8nRvhztjF1dGqLxUs3XZNktdedSYtlVEEDCvKNNwpm7Lz+8oCb
QPjbhro7ILvG13IA2+uuero7Arhmat4FvKyjGH8+5dmPX8MPbCDxz66YdqY5o7cX
DnDictvjKUqktmn2tAJW0GY/XokoIPNW6BmK9Ci8vdLtIaLNTB8VeVFQsEHvGt5u
Gvrj0jjy79Oievj1kfeLltLqtyvb01cHUyFJjmcykFKHaYmTvw9IJQ/d7+xYYBtb
io9oy8WJ1jUe
=23LK
-----END PGP SIGNATURE-----

#1054514#90
Date:
2025-12-09 18:33:24 UTC
From:
To:
Dear Maintainer,

This is back in trixie --- unsurprisingly, because the kernel commit which
had originally caused this bug, and was subsequently reverted, was reapplied
to the kernel in 6.8.10:

https://www.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.8.10
   (Reapply "drm/qxl: simplify qxl_fence_wait")

In addition to making the remote graphics freeze (which is annoying but
tolerable), the graphics driver hang also appears to cause systemd tasks
to hang... which leads to problems with remote login via ssh, and also
the inability to do a clean reboot/shutdown (which is intolerable).

Sample of the kernel log for hung systemd task (bookended by the telltale
"Buffer eviction failed" every 15 seconds):

[413120.319715] INFO: task systemd:1 blocked for more than 1208 seconds.
[413120.319718]       Not tainted 6.12.57+deb13-amd64 #1 Debian 6.12.57-1
[413120.319719] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[413120.319720] task:systemd         state:D stack:0     pid:1     tgid:1     ppid:0      flags:0x00000002
[413120.319723] Call Trace:
[413120.319724]  <TASK>
[413120.319726]  __schedule+0x505/0xc00
[413120.319730]  schedule+0x27/0xf0
[413120.319732]  schedule_preempt_disabled+0x15/0x30
[413120.319733]  __ww_mutex_lock.constprop.0+0x4f6/0x9a0
[413120.319736]  drm_modeset_lock+0x42/0xe0 [drm]
[413120.319765]  drm_atomic_get_plane_state+0x7f/0x180 [drm]
[413120.319777]  drm_client_modeset_commit_atomic+0xbf/0x250 [drm]
[413120.319793]  ? do_sys_poll+0x4e1/0x600
[413120.319796]  drm_client_modeset_commit_locked+0x5a/0x160 [drm]
[413120.319810]  drm_fb_helper_pan_display+0xf2/0x240 [drm_kms_helper]
[413120.319819]  fb_pan_display+0x89/0x140
[413120.319821]  bit_update_start+0x1e/0x40
[413120.319822]  fbcon_switch+0x421/0x5b0
[413120.319825]  csi_J+0x286/0x2d0
[413120.319826]  do_con_write+0x1368/0x2440
[413120.319828]  ? tomoyo_path_number_perm+0x8c/0x1f0
[413120.319830]  ? _prb_read_valid+0x298/0x310
[413120.319832]  con_write+0x13/0x50
[413120.319833]  n_tty_write+0x15a/0x500
[413120.319835]  ? __pfx_woken_wake_function+0x10/0x10
[413120.319837]  file_tty_write.isra.0+0x172/0x2c0
[413120.319840]  vfs_write+0x28c/0x440
[413120.319842]  ksys_write+0x6d/0xf0
[413120.319844]  do_syscall_64+0x82/0x190
[413120.319845]  ? tty_ioctl+0x145/0x8a0
[413120.319847]  ? arch_exit_to_user_mode_prepare.isra.0+0x16/0xa0
[413120.319850]  ? syscall_exit_to_user_mode+0x37/0x1b0
[413120.319851]  ? do_syscall_64+0x8e/0x190
[413120.319852]  ? do_sys_openat2+0x9c/0xe0
[413120.319853]  ? do_syscall_64+0x8e/0x190
[413120.319854]  ? arch_exit_to_user_mode_prepare.isra.0+0x16/0xa0
[413120.319856]  ? syscall_exit_to_user_mode+0x37/0x1b0
[413120.319857]  ? do_syscall_64+0x8e/0x190
[413120.319858]  ? __x64_sys_ppoll+0xf4/0x160
[413120.319859]  ? arch_exit_to_user_mode_prepare.isra.0+0x16/0xa0
[413120.319861]  ? syscall_exit_to_user_mode+0x37/0x1b0
[413120.319862]  ? do_syscall_64+0x8e/0x190
[413120.319862]  ? clear_bhb_loop+0x40/0x90
[413120.319864]  ? clear_bhb_loop+0x40/0x90
[413120.319865]  ? clear_bhb_loop+0x40/0x90
[413120.319866]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[413120.319869] RIP: 0033:0x7f38ba499687
[413120.319878] RSP: 002b:00007ffd7e321560 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[413120.319879] RAX: ffffffffffffffda RBX: 00007f38baa3ee00 RCX: 00007f38ba499687
[413120.319880] RDX: 000000000000000c RSI: 00007f38ba927cc4 RDI: 0000000000000025
[413120.319881] RBP: 0000000000000025 R08: 0000000000000000 R09: 0000000000000000
[413120.319881] R10: 0000000000000000 R11: 0000000000000202 R12: 00000000000186a0
[413120.319882] R13: 0000005fe7e4b3fb R14: 000000000000000c R15: 0000000000000025
[413120.319883]  </TASK>
[413120.319884] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[413123.903768] [TTM] Buffer eviction failed
[413139.007582] [TTM] Buffer eviction failed
[413154.111473] [TTM] Buffer eviction failed
[413169.215198] [TTM] Buffer eviction failed
[413184.319084] [TTM] Buffer eviction failed
[413199.422894] [TTM] Buffer eviction failed
[413214.526697] [TTM] Buffer eviction failed
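
To check the 15-second cadence on another affected machine, something like the following sketch may help. It assumes dmesg-style lines that start with a "[seconds.micros]" timestamp, as in the log above; the function name eviction_intervals is made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper, not part of any package.
# Reads kernel log lines on stdin and prints the gap, rounded to whole
# seconds, between consecutive "[TTM] Buffer eviction failed" messages.
eviction_intervals() {
    awk '/\[TTM\] Buffer eviction failed/ {
        ts = $0
        sub(/^\[ */, "", ts)   # drop the leading "[" and any padding
        sub(/\].*/, "", ts)    # drop everything from the first "]" on
        ts = ts + 0            # force numeric
        if (seen) printf "%.0f\n", ts - prev
        prev = ts
        seen = 1
    }'
}
```

Running "dmesg | eviction_intervals" on a stuck VM should then print one number per repeated message; lines that do not contain the TTM message are ignored.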

#1054514#95
Date:
2025-12-09 21:24:44 UTC
From:
To:
Hi Matt,

I guess it is very unlikely that the commit gets reverted again, given
the reasons explained in the commit message of the re-apply.

But can you please confirm that reverting the commit on top of 6.12.y
fixes the issue again for you?

Note that it has even been re-applied to 6.1.y, actually from 6.9 back
to 6.8.10, 6.6.31, 6.1.91 and 5.15.159.

Is the issue also still triggerable in 6.17.11 and mainline?

Regards,
Salvatore
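
For anyone triaging a VM, the releases listed above can be turned into a rough check. This is only a sketch: the version numbers are taken from this message, the function names are made up, and any stable series not named here is reported as unknown rather than guessed. It relies on GNU sort -V and expects an upstream version such as 6.12.57, optionally followed by a "+" or "-" suffix.

```bash
#!/bin/bash
# Hypothetical helpers, based only on the releases named in this message.

# version_ge A B: succeeds if version A >= version B (GNU sort -V order).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# has_qxl_reapply VERSION
#   returns 0 if VERSION is at or after the release that reapplied
#             "drm/qxl: simplify qxl_fence_wait" in its series,
#   returns 1 if VERSION is older than that release,
#   returns 2 if the series is not covered by the list.
has_qxl_reapply() {
    local ver="${1%%[-+]*}"    # "6.12.57+deb13-amd64" -> "6.12.57"
    local series
    series="$(printf '%s' "$ver" | cut -d. -f1,2)"
    case "$series" in
        5.15) version_ge "$ver" 5.15.159 ;;
        6.1)  version_ge "$ver" 6.1.91 ;;
        6.6)  version_ge "$ver" 6.6.31 ;;
        6.8)  version_ge "$ver" 6.8.10 ;;
        *)    if version_ge "$series" 6.9; then return 0; else return 2; fi ;;
    esac
}
```

For example, "has_qxl_reapply 6.12.57+deb13-amd64" succeeds, while "has_qxl_reapply 6.1.90" fails with status 1 because 6.1.90 still carries the revert.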

#1054514#102
Date:
2025-12-10 02:06:35 UTC
From:
To:
This is also happening to me in Debian Trixie, with the current
6.12.57 kernel. I can also test reverting the commit in the near
future if someone else doesn't get to it first. For now I'll just use
virtio-gpu, though it is a little slower.

#1054514#107
Date:
2025-12-10 12:36:23 UTC
From:
To:
Indeed. The revert of the problematic commit was itself reverted in the
mainline kernel. I have spent several days trying to understand the problem
but unfortunately I don't have enough time to troubleshoot this more.

I tried switching to virtio but for my use case of remote Linux desktops
it is not usable. If I scroll a fullscreen web page I can see how the
whole screen is redrawn every time.

Meanwhile I've switched from SPICE to RDP. GNOME in trixie supports
headless RDP sessions that make this very convenient. Client support is
also much better for RDP. I solved the authentication problem by isolating
each VM from each other on network level and then deployed rdpgw
(https://github.com/bolkedebruin/rdpgw) in front of everything with OpenID
authentication.