#956121 Blank screen on 2nd display using wayland

Package:
mutter
Source:
mutter
Description:
Example window manager using GNOME's window manager library
Submitter:
Pierre Cheynier
Date:
2021-09-22 16:15:03 UTC
Severity:
important
#956121#5
Date:
2020-04-07 15:43:25 UTC
Dear Maintainer,

After upgrading mutter from 3.34.3-1 to 3.34.4-1 (testing), my second
screen (eDP / 3440x1440) is completely white, while the other one shows
glitches and freezes for a few tens of seconds, probably while trying to
sync the second screen before aborting.
Downgrading to 3.34.3-1 and restarting gdm3 fixes the issue.

HW setup: i915

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107GLM [Quadro K2000M] (rev a1)

I can't see anything specific either here:
https://gitlab.gnome.org/GNOME/mutter/-/compare/3.34.4...3.34.5#0cc1139e3347f573ae1feee5b73dbc8a8a21fcfa
or here:
https://salsa.debian.org/gnome-team/mutter/-/commits/upstream/3.34.x
(and BTW most 3.34.5 patches have been backported).

Is there anything relevant I can provide that would help troubleshoot the issue?

#956121#10
Date:
2020-04-07 19:27:18 UTC
Is this reproducible by upgrading/downgrading only libmutter-5-0 and
closely-related packages, without altering anything else?

Are you able to test this with mutter and gnome-shell 3.36.x from
experimental? We're close to uploading those to unstable.

You appear to have a dual-GPU system. Is the NVIDIA device completely
disabled, or are they both active via some sort of dual-GPU arrangement
like Bumblebee?

Do you have the proprietary NVIDIA drivers installed? (I'm guessing you
don't, because reportbug would have told us about the non-free module.)

systemd journal entries during GNOME Shell startup and around the time
you reproduce a bug are usually useful, particularly for hardware-related
things like this.

(Or during mutter startup and around the time you reproduce a bug, if
you are genuinely using the standalone mutter executable - which is not
really recommended, but should in principle work.)

Looking at the commits between 3.34.3 and 3.34.4,
"renderer-native: Fix memory leak in secondary GPU update" and
"kms-impl-simple: Handle mode set race conditions gracefully" look
potentially relevant as things that might have caused a regression,
but I'm just guessing really.

    smcv

#956121#15
Date:
2020-04-10 11:21:41 UTC
On Tue, 7 Apr 2020 at 21:27, Simon McVittie <smcv@debian.org> wrote:

Sorry for the late answer!

Yes, absolutely. Note that upgrading libmutter-5-0 also upgrades, through
dependencies:
gir1.2-mutter-5 libmutter-5-0
(i.e. the only package not brought up to date here is mutter itself).

Not right now, since I have almost succeeded in getting an almost-operational
working environment :), but I can give you some feedback soon (let's
say in 1 week?)

Yes, exactly: the NVIDIA chip is not used, there is no proprietary driver,
only nouveau. These setups are terrible.

First, just to be clear on the symptoms: my second screen is fully
white, but I'm still able to move my mouse pointer and see it in that
area.

Let me tell you that the logs are a mess once you consider everything
related to graphics, gnome-shell and plugins :)

All the gnome-* logs seem to say that everything is running normally.
On the kernel side, however, I see this during a "working" boot:
```
avril 10 11:47:39 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 2
avril 10 11:47:39 dell-m4700 kernel: DMAR: [DMA Read] Request device [00:1f.2] PASID ffffffff fault addr c9827000 [fault reason 06] PTE Read access is not set
```

But with the non-working setup, I additionally get this every 5 s,
following a blink of the main screen, probably some kind of
reinitialization (let's say it's a side effect?):
```
avril 10 11:46:25 dell-m4700 kernel: dmar_fault: 191660 callbacks suppressed
avril 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
avril 10 11:46:25 dell-m4700 kernel: DMAR: [DMA Read] Request device [01:00.0] PASID ffffffff fault addr fc01e000 [fault reason 06] PTE Read access is not set
avril 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
avril 10 11:46:25 dell-m4700 kernel: DMAR: [DMA Read] Request device [01:00.0] PASID ffffffff fault addr fc045000 [fault reason 06] PTE Read access is not set
avril 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
avril 10 11:46:25 dell-m4700 kernel: DMAR: [DMA Read] Request device [01:00.0] PASID ffffffff fault addr fc06a000 [fault reason 06] PTE Read access is not set
avril 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
```
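
To see at a glance which device the faults come from (00:1f.2 in the working boot, 01:00.0 in the non-working one), the DMAR lines can be tallied per device. A minimal sketch, assuming each journal entry sits on a single unwrapped line as journalctl prints it; the function name `dmar_faults_by_device` is made up for illustration:

```shell
# Count DMAR fault lines per requesting PCI device.
# Reads kernel log lines on stdin, prints "<device> <count>" per device.
dmar_faults_by_device() {
    grep -oE 'Request device \[[0-9a-f:.]+\]' \
        | sed -E 's/.*\[(.*)\]/\1/' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use on the current boot's kernel messages:
#   journalctl -k -b | dmar_faults_by_device
```

Because the kernel rate-limits these messages (the "callbacks suppressed" line), the counts are only a lower bound on the real number of faults.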

After each of those crashes, I also see:
```
avril 10 11:46:26 dell-m4700 systemd[1782]: Started Application launched by gnome-shell.
avril 10 11:46:27 dell-m4700 systemd[1782]: gnome-launched-gnome-background-panel.desktop-2515.scope: Succeeded.
```

Apart from that, gnome-shell and friends seem OK-ish.

Is there any way to get more verbose output from mutter specifically?

I don't; I'm sticking with the Debian defaults for GNOME.

#956121#20
Date:
2020-04-10 12:42:38 UTC
That's fine. Everything from the src:mutter source package is tied
together quite closely and upgrading them all as a batch is expected,
although whether you change the mutter binary package or not shouldn't
matter (it doesn't actually do very much).

So we definitely have:

src:mutter version 3.34.3-1: good
src:mutter version 3.34.4-1: not good

although this might be either a bug in mutter, or just mutter doing
something differently that triggers a lower-level issue.

3.36.x should be in unstable, or even in testing, by then. You might
want to put your working 3.34.3 setup on "hold" in apt/aptitude for now.

You didn't answer this. If the NVIDIA device is disabled, how did you
disable it?

Yes, dual-GPU seems to be really hard to get right. The graphics stack is
already a complicated dance involving the kernel, libdrm, Mesa, libmutter
and GNOME Shell; having two GPUs involved just makes that worse.

Yes it is, but it might be helpful to show us anyway.

If you're using GNOME Shell extensions, please test with them all disabled
and see whether that works (you can re-enable them afterwards).

If you run the lspci command (install pciutils if you don't already have
it), what device is 00:1f.2?

That means part of the kernel tried to log 191660 lines, and the logging
subsystem wouldn't let it. That's a lot of logging!

Similarly, what device is 01:00.0? It's on its own separate PCIe bus if I
understand correctly, which probably means it's either your Intel GPU or
NVIDIA GPU.

Searching for the log messages suggests that booting with

    intel_iommu=igfx_off

or

    intel_iommu=off

added to the kernel command line might help
(see also <https://bugs.debian.org/935270>, which was also on a Dell).
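
On a stock Debian install that boots through GRUB (an assumption; other boot loaders differ), such options are usually appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot. It is worth confirming afterwards that the running kernel actually received the option. A small sketch, where the helper name `has_kernel_param` is hypothetical:

```shell
# Succeed if the exact parameter $1 appears on the kernel command line.
# $2 is the file to inspect; it defaults to /proc/cmdline.
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example, after rebooting with the option added:
#   has_kernel_param intel_iommu=igfx_off && echo "option is active"
```

The match is on the whole parameter, so `intel_iommu=off` and `intel_iommu=igfx_off` are not confused with each other.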

    smcv

#956121#25
Date:
2020-04-12 16:08:00 UTC
On Fri, 10 Apr 2020 at 14:42, Simon McVittie <smcv@debian.org> wrote:

I see that you released to unstable 2 days ago, but running apt-get
update doesn't change anything for me.
Should I conclude that my mirror is lagging? (I do see the package when
browsing the mirror, though:
http://ftp.fr.debian.org/debian/pool/main/m/mutter/)
```
$ date
dimanche 12 avril 2020, 15:28:36 (UTC+0200)
$ sudo apt-get update | grep sid
Hit:4 http://ftp.fr.debian.org/debian sid InRelease
$ apt-cache policy mutter
mutter:
  Installed: 3.34.3-1
  Candidate: 3.34.4-1
  Version table:
     3.34.4-1 500
        500 http://ftp.fr.debian.org/debian bullseye/main amd64 Packages
          2 http://ftp.fr.debian.org/debian sid/main amd64 Packages
 *** 3.34.3-1 100
        100 /var/lib/dpkg/status
```

Yes, actually I just didn't do any Bumblebee setup or anything else related
to NVIDIA drivers, but you're right that this doesn't mean my chip is
disabled.
Unfortunately, I just figured out that I cannot disable the chip, since
the wiring is as follows:
```
$ sudo ls -l /sys/class/drm/card*
lrwxrwxrwx 1 root root 0 avril 12 15:12 /sys/class/drm/card0 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0
lrwxrwxrwx 1 root root 0 avril 12 15:12 /sys/class/drm/card0-LVDS-1 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0/card0-LVDS-1
lrwxrwxrwx 1 root root 0 avril 12 15:12 /sys/class/drm/card0-VGA-1 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0/card0-VGA-1
lrwxrwxrwx 1 root root 0 avril 12 15:12 /sys/class/drm/card1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1
lrwxrwxrwx 1 root root 0 avril 12 15:12 /sys/class/drm/card1-DP-1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1
lrwxrwxrwx 1 root root 0 avril 12 15:12 /sys/class/drm/card1-DP-2 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2
lrwxrwxrwx 1 root root 0 avril 12 15:12 /sys/class/drm/card1-DP-3 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3
```
So in my case, LVDS-1 and DP-1 are connected.
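
The kernel also exposes each connector's state directly, which avoids inferring it from the symlinks. A sketch that prints one line per connector, assuming the usual `status` file under each /sys/class/drm/cardN-CONNECTOR directory; `list_connectors` is an illustrative name, and the optional argument only exists so the function can be tried on a fake tree:

```shell
# Print "<connector>: <status>" for every DRM connector.
# $1: sysfs DRM directory, defaults to /sys/class/drm.
list_connectors() {
    for status_file in "${1:-/sys/class/drm}"/card*-*/status; do
        [ -r "$status_file" ] || continue
        connector="$(basename "$(dirname "$status_file")")"
        printf '%s: %s\n' "$connector" "$(cat "$status_file")"
    done
}
```

Running `list_connectors` with no argument reads the live system; the connector name prefix (card0 or card1) shows which GPU drives each output.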

I just did an apt-get upgrade, then gdm3 stop/start, reproduced the bug,
then gdm3 stop and dpkg -i <old_packages>.
Attached are the full journald logs (between the gdm3 start and stop).
Concretely, if I clean them up a bit (remove PIDs, etc.) and diff them,
I see nothing obvious (I mean, there are a lot of errors, but they are
present in both cases).
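
The clean-up step can be sketched as a small filter that drops the fields that change on every run before diffing. This is only an approximation of what was done by hand: it assumes the default journalctl short format (three timestamp fields, then the host and `process[pid]:`), and the file names in the usage comment are placeholders:

```shell
# Remove per-run noise from journal lines read on stdin:
# the three leading timestamp fields and the "[pid]" after the process name.
normalize_journal() {
    sed -E \
        -e 's/^[^ ]+ +[^ ]+ +[^ ]+ +//' \
        -e 's/\[[0-9]+\]:/:/'
}

# Usage (bash, for the <(...) process substitution):
#   diff <(normalize_journal < good.log) <(normalize_journal < bad.log)
```

PIDs embedded elsewhere in a message, such as in systemd scope names, are left alone and will still show up in the diff.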

Disabled everything, same behavior.

OK, it seems unrelated (and older anyway).
```
$ lspci | grep 00:1f.2
00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 04)
```

True, the NVIDIA GPU.
```
$ lspci | grep 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GK107GLM [Quadro K2000M] (rev a1)
```

Tried these, didn't change anything.

#956121#30
Date:
2020-04-18 17:46:35 UTC
Hello Simon, list,

I just tried bumping to mutter=3.36.1-4 and it doesn't seem to
change anything for me: my screen is still completely white, so
something bad appeared between 3.34.3 and 3.34.4.
Apart from that, I completely broke my install while trying this upgrade
(since it requires bumping gnome-shell, I got a crash screen saying "contact
your administrator" instead of the login screen; I finally worked out
that I had to downgrade another package:
gir1.2-gnomedesktop-3.0).

My current situation, if I dry-run a dist-upgrade, is the following:

$ sudo apt-get -V dist-upgrade --assume-no
(...)
The following NEW packages will be installed:
   gir1.2-mutter-6 (3.36.1-4)
   libmutter-6-0 (3.36.1-4)
The following packages will be upgraded:
   gir1.2-gnomedesktop-3.0 (3.34.2-2 => 3.36.1-2)
   gir1.2-mutter-5 (3.34.3-1 => 3.34.4-1)
   gnome-shell (3.34.4-1 => 3.36.1-5)
   gnome-shell-common (3.34.4-1 => 3.36.1-5)
   gnome-shell-extensions (3.34.2-1 => 3.36.1-1)
   libmutter-5-0 (3.34.3-1 => 3.34.4-1)
   mutter (3.34.3-1 => 3.36.1-4)
   mutter-common (3.34.4-1 => 3.36.1-4)

Any idea?
Many thanks!


On Sun, 12 Apr 2020 at 18:08, Pierre Cheynier
<pierre.cheynier@gmail.com> wrote:

#956121#35
Date:
2020-04-19 15:11:17 UTC
#956121#40
Date:
2020-04-27 17:48:14 UTC
Hi Simon, list,

This evening, I tried:

* upgrading my system except gnome-shell/mutter etc.
I tried to bump gir1.2-gnomedesktop-3.0, but this breaks GNOME with either
3.36.1-2 or 3.36.1-3
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956896).
May I suggest adding a "Breaks: gnome-shell (<< 3.36)" instead of
"Breaks: gnome-shell (<< 3.34)"? Or maybe I'm missing something?

* repackaging mutter 3.36.1+git20200419-1 with my revert of 59e9b073
from upstream (context in the GNOME thread
https://gitlab.gnome.org/GNOME/mutter/-/issues/1193#note_776408)
Unfortunately it requires libgnome-desktop-3-dev, which depends on
the same version of gir1.2-gnomedesktop-3.0, so I'm back to the first issue.
I can upgrade, compile, then downgrade, though.

Is there any path I could follow to avoid too much trouble while
still being able to test an upgrade with this revert?

pch

#956121#45
Date:
2020-04-27 18:40:15 UTC
What's meant to have happened is that gir1.2-gnomedesktop-3.0 Depends on
libgnome-desktop-3-19, which Depends on a matching gnome-desktop3-data,
which prevents installation of libgnome-desktop-3-18, which avoids the
crash. The goal is that you can have either libgnome-desktop-3-19 or
libgnome-desktop-3-18, but never both at the same time.

This is not completely fixed in testing until
libgnome-desktop-3-18_3.36.1-2 and gir1.2-gnomedesktop-3.0_3.36.1-2
disappear from testing, which should have happened 5 days ago but has been
held up by some changes to other packages. If I'm reading the migration
logs correctly, this is being delayed by the rebuilt version of evince
having picked up a dependency on a newer libsecret, and might be solved
in 2 days when libsecret is old enough to migrate.

Increasing the version in the Breaks wouldn't really help you here,
because gir1.2-gnomedesktop-3.0_3.36.1-2 didn't have it, and there's
nothing we can do that will retroactively change version 3.36.1-2:
we have to wait for the state of the archive to be suitable for
version 3.36.1-2 to disappear.

The best route would be to rebuild mutter against packages from
unstable (in particular, with libgnome-desktop-3-dev_3.36.1-3 and
libgnome-desktop-3-19 installed), with a revert of the commit(s) that
you suspect are causing this.

If you're able to build mutter in sbuild or pbuilder, in a container, or
in a virtual machine, then the easiest way will be to do that, using an
up-to-date unstable environment in the chroot/container/VM.

Or you could upgrade to the unstable version of libgnome-desktop-3-19
(which will require upgrading gnome-shell and mutter to their latest
versions from unstable, or at least a recent-ish version), and apply
whatever workarounds make your system halfway usable for long enough
to compile mutter with the problematic commit reverted. If this issue
is Wayland-specific, maybe you could edit /etc/gdm3/daemon.conf and
force X11 by uncommenting "WaylandEnable=false"; or you could
temporarily disable gdm and use text mode to compile mutter.
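
For the X11 fallback, the edit amounts to removing the comment marker in front of that line. A hedged sketch, assuming the file still contains the stock commented-out `#WaylandEnable=false` line and that GNU sed is available (as on Debian); `force_x11` is an illustrative helper that takes the file path as an argument so it can be tried on a copy first:

```shell
# Uncomment "WaylandEnable=false" in the given GDM configuration file.
# $1: path to the file, e.g. a copy of /etc/gdm3/daemon.conf.
force_x11() {
    sed -i -E 's/^[[:space:]]*#[[:space:]]*(WaylandEnable=false)/\1/' "$1"
}

# Example (as root), followed by a gdm3 restart or a reboot:
#   cp /etc/gdm3/daemon.conf /etc/gdm3/daemon.conf.bak
#   force_x11 /etc/gdm3/daemon.conf
```

Other commented lines in the file are left untouched, and restoring the backup copy brings Wayland back.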

    smcv

#956121#50
Date:
2021-09-22 16:10:36 UTC
Hi everyone,

I'm facing exactly the same issue in GNOME 40. Same symptoms: The screen in
the second display is entirely blank, except for the mouse pointer which I
can move. I'm not using proprietary nvidia drivers, just Wayland, Nouveau,
Mutter, on GNOME 40.

Here's some environmental information:

Versions:

===
$ dpkg -l xserver-xorg-video-nouveau mutter xwayland
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version      Architecture Description
+++-==========================-============-============-===========================================================
ii  mutter                     40.4-2+b1    amd64        Example window manager using GNOME's window manager library
ii  xserver-xorg-video-nouveau 1:1.0.17-1   amd64        X.Org X server -- Nouveau display driver
ii  xwayland                   2:1.20.11-1  amd64        Xwayland X server
===

Devices:

===
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] (rev a2)
===

Info from reportbug:

===