#892982 bugs.debian.org: system freeze after lock screen/switched off monitors/spontaneously

Package:
xserver-xorg-video-amdgpu
Source:
xserver-xorg-video-amdgpu
Description:
X.Org X server -- AMDGPU display driver
Submitter:
Hermann Lorenz
Date:
2021-06-19 21:33:02 UTC
Severity:
important
Tags:
#892982#5
Date:
2018-03-15 06:43:27 UTC
From:
To:
Dear Maintainers,

because the kernel logs show clock errors for amdgpu and segfaults in the gnome
shell, I'm not sure which package I have to refer to.  Please feel free to ask
for other log files or hardware configurations that might be of interest to
you.


First Scenario
==============

I observed several freezes of my system in October/November.  They occurred
when
 * the screen was locked or
 * the monitors were switched off.
The screen did not freeze every time, but in the majority of cases.
Over the last months I "solved" this issue by not locking the screen or turning
off the monitors while the system was running.  During this time I did not
observe any freezes like this.  Yesterday I switched off the monitors, and when
I returned, the system had frozen again.

The following actions did not seem to have any effect:

 * on mouse movement, the cursor does not move
 * Ctrl+Shift+F3 doesn't switch to a terminal
 * Ctrl+Shift+Del holding for 10+ seconds does not reboot.

I forgot to press Alt+Print+K; next time I will test this too.

The only solution to recover is to press the power button on the tower for
several seconds and to reboot the system.


Here is a short excerpt from /var/log/kern.log from yesterday:

    Mar 14 10:19:24 mundus kernel: [ 5483.172863] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery tried 5 times
    Mar 14 10:19:24 mundus kernel: [ 5483.172879] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
    Mar 14 12:44:50 mundus kernel: [14209.555905] gnome-shell[1254]: segfault at 38 ip 00007ff9ec3f8f70 sp 00007ffe1cf6aaf8 error 4 in libmutter-1.so.0.0.0[7ff9ec398000+150000]
    Mar 14 12:44:51 mundus kernel: [14209.583537] rfkill: input handler enabled
    Mar 14 12:44:51 mundus kernel: [14209.755875] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
    Mar 14 12:44:51 mundus kernel: [14209.755896] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
    Mar 14 12:44:51 mundus kernel: [14209.932953] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
    Mar 14 12:44:51 mundus kernel: [14209.932973] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
    Mar 14 12:44:51 mundus kernel: [14210.120708] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
    Mar 14 12:44:51 mundus kernel: [14210.120728] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
    Mar 14 12:44:51 mundus kernel: [14210.297755] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* displayport link status failed
    Mar 14 12:44:51 mundus kernel: [14210.297774] [drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
    Mar 14 12:53:19 mundus kernel: [14718.348671] rfkill: input handler disabled
    Mar 14 13:43:18 mundus kernel: [17716.735394] gnome-shell[14938]: segfault at 55ba0af83c18 ip 00007fb6aebe2676 sp 00007ffda3cf9020 error 4 in libst-1.0.so[7fb6aebc2000+4d000]
    Mar 14 13:43:18 mundus kernel: [17716.783243] rfkill: input handler enabled
    Mar 14 13:43:25 mundus kernel: [17723.744111] rfkill: input handler disabled
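
As a rough aid for reading longer stretches of kern.log, the following Python
sketch tallies the message types seen in an excerpt like the one above.  It is
only an illustration: the helper names (rejoin, classify, summarize) are made
up for this example, and the sample text is a few records from the excerpt,
hard-wrapped the way mail clients tend to wrap long lines.

```python
import re
from collections import Counter

# A few records from the report, deliberately hard-wrapped as in a mailed log.
EXCERPT = """\
    Mar 14 10:19:24 mundus kernel: [ 5483.172863]
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery tried 5
times
    Mar 14 10:19:24 mundus kernel: [ 5483.172879]
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
    Mar 14 12:44:50 mundus kernel: [14209.555905] gnome-shell[1254]: segfault
at 38 ip 00007ff9ec3f8f70 sp 00007ffe1cf6aaf8 error 4 in
libmutter-1.so.0.0.0[7ff9ec398000+150000]
"""

# A new record starts with a syslog timestamp such as "Mar 14 10:19:24 ".
RECORD_START = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d ")


def rejoin(text):
    """Glue wrapped continuation lines back onto the record they belong to."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line
    return records


def classify(record):
    """Reduce one record to a short label (hypothetical scheme for this sketch)."""
    if "*ERROR*" in record:
        return record.split("*ERROR*", 1)[1].strip()
    if "segfault" in record:
        match = re.search(r" in (\S+?)\[", record)
        return "segfault in " + (match.group(1) if match else "unknown")
    return "other"


def summarize(text):
    """Count how often each label occurs in the given log text."""
    return Counter(classify(record) for record in rejoin(text))


if __name__ == "__main__":
    for label, count in summarize(EXCERPT).most_common():
        print(count, label)
```

Run against a whole log file (for example by passing the contents of
/var/log/kern.log to summarize), this would show at a glance whether the
amdgpu link-training errors or the gnome-shell segfaults dominate.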




Second Scenario
===============

Because in this second scenario the behaviour is different, I was not sure if
it is relevant to you.  But since it also results in a reset of the user
session, I wanted to mention it.

The screen freezes during normal usage at random.  This happens every two or
three days.  After roughly 30 seconds the login screen is shown and a new
session is started.  This also occurred during the phase when I did not lock
the screen.

Again a short excerpt from /var/log/kern.log regarding the second scenario:

    Mar 15 05:35:32 mundus kernel: [ 2026.148912] show_signal_msg: 17 callbacks suppressed
    Mar 15 05:35:32 mundus kernel: [ 2026.148914] gnome-shell[1257]: segfault at 562055805a68 ip 00007fbae1b7b676 sp 00007fff83108b90 error 4 in libst-1.0.so[7fbae1b5b000+4d000]
    Mar 15 05:35:32 mundus kernel: [ 2026.195608] rfkill: input handler enabled
    Mar 15 05:35:39 mundus kernel: [ 2033.100377] rfkill: input handler disabled

In this specific case I was moving a window via Super+Shift+Left from one
monitor to another when the GUI crashed.  After logging in I was able to use
the system as normal.
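
The segfault records in both excerpts carry enough information to work out
where inside the library each crash happened: subtracting the mapping base (the
first number in the square brackets) from the instruction pointer (ip) gives
the offset into the library file.  A minimal Python sketch, with a hypothetical
helper name library_offset and the three segfault records from this report
joined onto single lines:

```python
import re

# Layout of a kernel segfault message as it appears in the excerpts above.
SEGFAULT = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

# The three segfault records from the report, each joined onto one line.
RECORDS = [
    "gnome-shell[1254]: segfault at 38 ip 00007ff9ec3f8f70 "
    "sp 00007ffe1cf6aaf8 error 4 in libmutter-1.so.0.0.0[7ff9ec398000+150000]",
    "gnome-shell[14938]: segfault at 55ba0af83c18 ip 00007fb6aebe2676 "
    "sp 00007ffda3cf9020 error 4 in libst-1.0.so[7fb6aebc2000+4d000]",
    "gnome-shell[1257]: segfault at 562055805a68 ip 00007fbae1b7b676 "
    "sp 00007fff83108b90 error 4 in libst-1.0.so[7fbae1b5b000+4d000]",
]


def library_offset(record):
    """Return (library, offset of the faulting instruction) or None."""
    match = SEGFAULT.search(record)
    if match is None:
        return None
    # Offset into the mapped library = instruction pointer - mapping base.
    offset = int(match["ip"], 16) - int(match["base"], 16)
    return match["lib"], hex(offset)


if __name__ == "__main__":
    for record in RECORDS:
        print(library_offset(record))
```

For these records the sketch reports offset 0x60f70 in libmutter-1.so.0.0.0
and offset 0x20676 for both libst-1.0.so crashes, so the two libst crashes
appear to hit the same instruction.  With matching debug symbols installed,
such an offset could be handed to a tool like addr2line to look up the source
location; that step is not shown here.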



Kind Regards
  Hermann

#892982#10
Date:
2018-03-15 06:55:15 UTC
From:
To:
Hello,

I'm sorry, it seems I have mixed up the recipient of this bug report.
Is it possible to move this bug to the correct list?  Or should I
recreate it?


Kind Regards
  Hermann

#892982#15
Date:
2018-03-15 16:49:23 UTC
From:
To:
Control: reassign -1 xserver-xorg-video-amdgpu

I'd suspect a bug in the amdgpu drivers, but it's hard to say.

Your best bet is to see if a newer kernel still displays these issues
and/or plug a serial terminal into the machine that you can use to get
kernel messages at the moment of the crash. [The actual crash usually
can't be written to the on-disk logs, because the kernel has already
failed at that point.]

In any event, the appropriate package is not bugs.debian.org, but
something else. I've reassigned it to the xserver-xorg-video-amdgpu
package, but it may actually be a kernel bug. In the future, if you
aren't sure, ask on debian-user@lists.debian.org for assistance, and
they'll help you work through figuring out which package is the right
package.

#892982#22
Date:
2018-03-15 17:02:00 UTC
From:
To:
Hello Don,

sorry for the inconvenience you had and thank you for the reassignment.
This was the first "bug" I reported and I was a bit unsure regarding
the questions in the GUI.  I vow to write to
debian-user@lists.debian.org next time, when I'm not sure.


Kind Regards
  Hermann

#892982#27
Date:
2018-03-15 18:01:32 UTC
From:
To:
The gnome-shell crash is a gnome-shell bug. The amdgpu messages might be
symptoms of an amdgpu kernel driver bug, or might be harmless. I don't
see any evidence of an xserver-xorg-video-amdgpu bug (or even that it or
Xorg is used at all).

#892982#32
Date:
2021-05-30 09:08:57 UTC
From:
To:
I have had first-hand experience of these Xorg crashes and lockups
in combination with either sleep or un-powering monitors.

I would 100% agree with Hermann that this is highly likely to be
improved by a kernel update.

In short, I would go to *at least* the buster-backports kernel i.e.

https://backports.debian.org/Instructions/

apt-get -t buster-backports install linux-image-amd64 linux-headers-amd64


Over the recent Debian 5.10 kernel releases I have seen incremental
improvement in this issue, to the point that it now seems to have
gone away.  With a complex MST monitor arrangement I still see the
DisplayPort virtual output numbers being reordered (fixed with an xrandr
script), but NO LONGER any crashes needing a reboot or an X11 restart
like before...

In any case, please report back whether or not the 5.10 series kernel
avoids the issue for you.

#892982#37
Date:
2021-06-19 21:23:36 UTC
From:
To:
I can confirm that buster-backports now offers kernel
5.10.0-0.bpo.7-amd64.

Please test this, following the notes in the previous message on this bug.
Can the bug now be closed, whether or not that works?  I don't
think there is an X.org bug here.