Dear Maintainer,

When upgrading mutter from 3.34.3-1 to 3.34.4-1 (testing), my second screen (eDP / 3440x1440) becomes completely white. The other one shows glitches and freezes for a few tens of seconds, probably while trying to sync with the white one before giving up. Downgrading to 3.34.3-1 and restarting gdm3 fixes the issue.

Hardware setup: i915

```
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107GLM [Quadro K2000M] (rev a1)
```

I couldn't spot anything specific here: https://gitlab.gnome.org/GNOME/mutter/-/compare/3.34.4...3.34.5#0cc1139e3347f573ae1feee5b73dbc8a8a21fcfa or here: https://salsa.debian.org/gnome-team/mutter/-/commits/upstream/3.34.x (by the way, most 3.34.5 patches have already been backported).

Is there anything that would help troubleshoot the issue?
Is this reproducible by upgrading/downgrading only libmutter-5-0 and
closely-related packages, without altering anything else?
Are you able to test this with mutter and gnome-shell 3.36.x from
experimental? We're close to uploading those to unstable.
You appear to have a dual-GPU system. Is the NVIDIA device completely
disabled, or are they both active via some sort of dual-GPU arrangement
like Bumblebee?
Do you have the proprietary NVIDIA drivers installed? (I'm guessing you
don't, because reportbug would have told us about the non-free module.)
systemd journal entries during GNOME Shell startup and around the time
you reproduce a bug are usually useful, particularly for hardware-related
things like this.
(Or during mutter startup and around the time you reproduce a bug, if
you are genuinely using the standalone mutter executable - which is not
really recommended, but should in principle work.)
Looking at the commits between 3.34.3 and 3.34.4,
"renderer-native: Fix memory leak in secondary GPU update" and
"kms-impl-simple: Handle mode set race conditions gracefully" look
potentially relevant as things that might have caused a regression,
but I'm just guessing really.
smcv
On Tue, 7 Apr 2020 at 21:27, Simon McVittie <smcv@debian.org> wrote:

Sorry for answering so late!

Yes, absolutely. Note that upgrading libmutter-5-0 also upgrades gir1.2-mutter-5 and libmutter-5-0 through dependencies, so the only package not brought up to date here is the mutter package itself.

Not right now, since I have only just managed to get an almost-working environment back :), but I can give you some feedback soon (let's say within a week?).

Yes, exactly: the NVIDIA chip is not used, there is no proprietary driver, only nouveau. These setups are terrible.

First, to be clear about the symptoms: my second screen is fully white, but I can still move my mouse pointer and see it in that area.

To be honest, it's a mess if we consider all the graphics, gnome-shell and extension logs :) All the gnome-* logs seem to say that everything is running normally. The exception is the kernel side, where I see this during a "working" boot:

```
Apr 10 11:47:39 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 2
Apr 10 11:47:39 dell-m4700 kernel: DMAR: [DMA Read] Request device [00:1f.2] PASID ffffffff fault addr c9827000 [fault reason 06] PTE Read access is not set
```

In the non-working setup, I additionally get the following every 5 seconds, right after the main screen blinks. This is probably some kind of reinitialization (a side effect, perhaps?):

```
Apr 10 11:46:25 dell-m4700 kernel: dmar_fault: 191660 callbacks suppressed
Apr 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
Apr 10 11:46:25 dell-m4700 kernel: DMAR: [DMA Read] Request device [01:00.0] PASID ffffffff fault addr fc01e000 [fault reason 06] PTE Read access is not set
Apr 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
Apr 10 11:46:25 dell-m4700 kernel: DMAR: [DMA Read] Request device [01:00.0] PASID ffffffff fault addr fc045000 [fault reason 06] PTE Read access is not set
Apr 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
Apr 10 11:46:25 dell-m4700 kernel: DMAR: [DMA Read] Request device [01:00.0] PASID ffffffff fault addr fc06a000 [fault reason 06] PTE Read access is not set
Apr 10 11:46:25 dell-m4700 kernel: DMAR: DRHD: handling fault status reg 3
```

After each of those crashes, I also see:

```
Apr 10 11:46:26 dell-m4700 systemd[1782]: Started Application launched by gnome-shell.
Apr 10 11:46:27 dell-m4700 systemd[1782]: gnome-launched-gnome-background-panel.desktop-2515.scope: Succeeded.
```

Apart from that, gnome-shell and friends seem OK-ish. Is there any way to get more verbose logging from mutter specifically?

I don't use the standalone mutter executable; I stick to the Debian defaults with GNOME.
That's fine. Everything from the src:mutter source package is tied
together quite closely and upgrading them all as a batch is expected,
although whether you change the mutter binary package or not shouldn't
matter (it doesn't actually do very much).
So we definitely have:
src:mutter version 3.34.3-1: good
src:mutter version 3.34.4-1: not good
although this might be either a bug in mutter, or just mutter doing
something differently that triggers a lower-level issue.
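With one known-good and one known-bad upstream release, bisection can narrow down the suspect commits. Each test halves the remaining range, so N commits need about log2(N) builds. In an upstream mutter checkout, `git bisect start 3.34.4 3.34.3` automates this and handles non-linear history better. The POSIX shell function below is only a minimal sketch of the same search. It makes three assumptions. First, the commit list comes from `git rev-list --reverse 3.34.3..3.34.4`, oldest first, with the last entry known bad. Second, the regression is upstream rather than in Debian's packaging. Third, `IS_BAD_CMD` is a script you write yourself: it builds, installs and tests one commit, and exits 0 when the white screen appears.

```sh
# POSIX sh sketch: binary search for the first "bad" commit in a list.
# Usage: first_bad_commit IS_BAD_CMD COMMIT...
#   COMMIT... must be ordered oldest to newest, and the last one must be
#   known bad. IS_BAD_CMD (user-supplied) is run as "IS_BAD_CMD <commit>"
#   and must exit 0 if that commit shows the bug, non-zero otherwise.
first_bad_commit() {
  [ "$#" -ge 2 ] || return 2
  is_bad_cmd=$1
  shift
  lo=1      # lowest position that may still hold the first bad commit
  hi=$#     # position of a commit already known to be bad
  while [ "$lo" -lt "$hi" ]; do
    mid=$(( (lo + hi) / 2 ))
    eval "candidate=\${$mid}"     # fetch positional parameter number $mid
    if "$is_bad_cmd" "$candidate"; then
      hi=$mid                     # first bad commit is at or before mid
    else
      lo=$(( mid + 1 ))           # first bad commit is after mid
    fi
  done
  eval "first_bad=\${$hi}"
  printf '%s\n' "$first_bad"
}
```

For example: `first_bad_commit ./build-test-mutter.sh $(git rev-list --reverse 3.34.3..3.34.4)`. Here `build-test-mutter.sh` is hypothetical. Each probe means building and installing mutter and restarting gdm3, so a range of about 30 commits needs about five rounds.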
3.36.x should be in unstable, or even in testing, by then. You might
want to put your working 3.34.3 setup on "hold" in apt/aptitude for now.
You didn't answer this. If the NVIDIA device is disabled, how did you
disable it?
Yes, dual-GPU seems to be really hard to get right. The graphics stack is
already a complicated dance involving the kernel, libdrm, Mesa, libmutter
and GNOME Shell; having two GPUs involved just makes that worse.
Yes it is, but it might be helpful to show us anyway.
If you're using GNOME Shell extensions, please test with them all disabled
and see whether that works (you can re-enable them afterwards).
If you run the lspci command (install pciutils if you don't already have
it), what device is 00:1f.2?
That means part of the kernel tried to log 191660 lines, and the logging
subsystem wouldn't let it. That's a lot of logging!
Similarly, what device is 01:00.0? It's on its own separate PCIe bus if I
understand correctly, which probably means it's either your Intel GPU or
NVIDIA GPU.
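A long kernel log can answer the "which device" questions directly. Group the DMAR fault lines by the PCI address in `Request device [...]`, then look up the busiest addresses with `lspci -s <address>`. A minimal sketch follows. It assumes the log arrives on stdin, for example from `journalctl -k -b`, and uses the message format quoted above. `dmar_fault_devices` is a hypothetical helper name. Lines dropped by the rate limiter ("callbacks suppressed") never reach the log, so the counts are lower bounds.

```sh
# Count DMAR fault reports per requesting PCI device in a kernel log on stdin.
# Prints "<pci-address> <count>" lines, most frequent device first.
# Usage: journalctl -k -b | dmar_fault_devices
dmar_fault_devices() {
  # Keep only the address inside "Request device [...]"; drop other lines.
  sed -n 's/.*Request device \[\([0-9a-f:.]*\)].*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2, $1 }'
}
```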
Searching for the log messages suggests that booting with
intel_iommu=igfx_off
or
intel_iommu=off
added to the kernel command line might help
(see also <https://bugs.debian.org/935270>, which was also on a Dell).
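On a Debian system that boots with GRUB, such a parameter usually goes in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`. Run `sudo update-grub` and reboot afterwards; `cat /proc/cmdline` then shows whether the parameter took effect. The function below is a minimal sketch of that edit. It assumes the variable is defined on a single, double-quoted line. `add_kernel_param` is a hypothetical helper, and editing the file by hand works just as well.

```sh
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a grub defaults
# file (normally /etc/default/grub). Assumes the variable sits on one line
# and is double-quoted; does nothing if the parameter is already present.
# Usage: add_kernel_param FILE PARAM   (then run: sudo update-grub)
add_kernel_param() {
  file=$1
  param=$2
  if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
    return 0
  fi
  # Insert " PARAM" just before the closing double quote (GNU sed -i).
  sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 ${param}\"|" "$file"
}
```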
smcv
On Fri, 10 Apr 2020 at 14:42, Simon McVittie <smcv@debian.org> wrote:

I see that the new version was uploaded to unstable 2 days ago, but running apt-get update doesn't change anything for me. Should I conclude that my mirror is lagging behind? I can see the package when browsing the mirror, though: http://ftp.fr.debian.org/debian/pool/main/m/mutter/

```
$ date
Sunday 12 April 2020, 15:28:36 (UTC+0200)
$ sudo apt-get update | grep sid
Hit:4 http://ftp.fr.debian.org/debian sid InRelease
$ apt-cache policy mutter
mutter:
  Installed: 3.34.3-1
  Candidate: 3.34.4-1
  Version table:
     3.34.4-1 500
        500 http://ftp.fr.debian.org/debian bullseye/main amd64 Packages
          2 http://ftp.fr.debian.org/debian sid/main amd64 Packages
 *** 3.34.3-1 100
        100 /var/lib/dpkg/status
```

Yes: I simply never set up Bumblebee or anything else related to the NVIDIA drivers. But you're right that this doesn't mean the chip is disabled. Unfortunately, I've just figured out that I cannot disable the chip, because the outputs are wired as follows:

```
$ sudo ls -l /sys/class/drm/card*
lrwxrwxrwx 1 root root 0 Apr 12 15:12 /sys/class/drm/card0 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0
lrwxrwxrwx 1 root root 0 Apr 12 15:12 /sys/class/drm/card0-LVDS-1 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0/card0-LVDS-1
lrwxrwxrwx 1 root root 0 Apr 12 15:12 /sys/class/drm/card0-VGA-1 -> ../../devices/pci0000:00/0000:00:02.0/drm/card0/card0-VGA-1
lrwxrwxrwx 1 root root 0 Apr 12 15:12 /sys/class/drm/card1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1
lrwxrwxrwx 1 root root 0 Apr 12 15:12 /sys/class/drm/card1-DP-1 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-1
lrwxrwxrwx 1 root root 0 Apr 12 15:12 /sys/class/drm/card1-DP-2 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-2
lrwxrwxrwx 1 root root 0 Apr 12 15:12 /sys/class/drm/card1-DP-3 -> ../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/drm/card1/card1-DP-3
```

So in my case, LVDS-1 and DP-1 are connected.
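The same wiring can be read in a script. Each `/sys/class/drm/cardN-<connector>` entry is a symlink whose target contains the PCI address of the GPU that owns the connector. The connector directory also has a `status` file that reads `connected` or `disconnected`. The function below is a minimal sketch that assumes this sysfs layout. `drm_connectors` is a hypothetical helper, and its optional argument exists only so it can be pointed at a test directory.

```sh
# List DRM connectors with the PCI address of the GPU that owns them and
# their connection status, e.g. "card1-DP-1 0000:01:00.0 connected".
# Usage: drm_connectors [SYSFS_DRM_DIR]   (defaults to /sys/class/drm)
drm_connectors() {
  root=${1:-/sys/class/drm}
  for entry in "$root"/card*-*; do
    [ -L "$entry" ] || continue           # skip if the glob matched nothing
    target=$(readlink "$entry")
    # The GPU's address is the PCI path component immediately before /drm/.
    pci=$(printf '%s\n' "$target" \
      | sed -n 's|.*/\([0-9a-f]\{4\}:[0-9a-f]\{2\}:[0-9a-f]\{2\}\.[0-7]\)/drm/.*|\1|p')
    conn_status=$(cat "$entry/status" 2>/dev/null || echo unknown)
    printf '%s %s %s\n' "${entry##*/}" "$pci" "$conn_status"
  done
}
```

On the machine above, this should report LVDS-1 on 0000:00:02.0 (Intel) and the DP connectors on 0000:01:00.0 (NVIDIA).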
I just ran apt-get upgrade, then stopped and started gdm3 (bug reproduced), then stopped gdm3 again and ran dpkg -i <old_packages>. The full journald logs (between gdm3 start and stop) are attached. If I clean them up a bit (remove PIDs, etc.) and diff them, I see nothing obvious: there are a lot of errors, but they appear in both cases.

I disabled all extensions: same behavior.

00:1f.2 is the SATA controller, so that one seems unrelated (and older anyway):

```
$ lspci | grep 00:1f.2
00:1f.2 RAID bus controller: Intel Corporation 82801 Mobile SATA Controller [RAID mode] (rev 04)
```

True, 01:00.0 is the NVIDIA GPU:

```
$ lspci | grep 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation GK107GLM [Quadro K2000M] (rev a1)
```

I tried both intel_iommu=igfx_off and intel_iommu=off; neither changed anything.
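The "clean up and diff" step can be made repeatable. Strip the fields that always differ between two boots, namely the timestamp prefix and bracketed PIDs. Then compare the sorted, de-duplicated remainder, so that only messages appearing in one log but not the other are left. A minimal sketch, assuming journal exports in journalctl's default short format (for example `journalctl -b -o short > good.log`). `normalize_journal` and `journal_diff` are hypothetical helper names. Other varying fields, such as addresses or scope names, would still need extra rules, and sorting discards timing.

```sh
# Strip the parts of journalctl "short" lines that differ between boots:
# the leading "Mon DD HH:MM:SS " timestamp and bracketed PIDs such as [1782].
normalize_journal() {
  sed -e 's/^[^ ]\{1,\} \{1,\}[0-9]\{1,\} [0-9:]\{1,\} //' \
      -e 's/\[[0-9]\{1,\}]//g' "$@"
}

# Show messages present in only one of two logs ("<" first log, ">" second).
# Returns diff's exit status: 0 if identical after normalization, 1 if not.
journal_diff() {
  a=$(mktemp) && b=$(mktemp) || return 2
  normalize_journal "$1" | sort -u > "$a"
  normalize_journal "$2" | sort -u > "$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return "$rc"
}
```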
Hello Simon, list,

I just tried upgrading to mutter 3.36.1-4, and it doesn't seem to change anything for me. My screen is still completely white, so the regression introduced between 3.34.3 and 3.34.4 is still present.

Apart from that, I completely broke my installation while trying this upgrade, because it requires upgrading gnome-shell too. I got a crash screen saying "contact your administrator" instead of the login screen. I eventually worked out that I had to downgrade another package, gir1.2-gnomedesktop-3.0.

Here is my current situation when I dry-run a dist-upgrade:

```
$ sudo apt-get -V dist-upgrade --assume-no
(...)
The following NEW packages will be installed:
   gir1.2-mutter-6 (3.36.1-4)
   libmutter-6-0 (3.36.1-4)
The following packages will be upgraded:
   gir1.2-gnomedesktop-3.0 (3.34.2-2 => 3.36.1-2)
   gir1.2-mutter-5 (3.34.3-1 => 3.34.4-1)
   gnome-shell (3.34.4-1 => 3.36.1-5)
   gnome-shell-common (3.34.4-1 => 3.36.1-5)
   gnome-shell-extensions (3.34.2-1 => 3.36.1-1)
   libmutter-5-0 (3.34.3-1 => 3.34.4-1)
   mutter (3.34.3-1 => 3.36.1-4)
   mutter-common (3.34.4-1 => 3.36.1-4)
```

Any idea? Many thanks!

On Sun, 12 Apr 2020 at 18:08, Pierre Cheynier <pierre.cheynier@gmail.com> wrote:
Hi Simon, list,

https://gitlab.gnome.org/GNOME/mutter/-/issues/1193

I guess the login-screen crash was due to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956896, so it's unrelated.

Pierre
Hi Simon, list,

This evening, I tried:

* upgrading my system except for gnome-shell, mutter, etc. I tried to upgrade gir1.2-gnomedesktop-3.0, but this breaks GNOME with both 3.36.1-2 and 3.36.1-3 (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=956896). May I suggest adding "Breaks: gnome-shell (<< 3.36)" instead of "Breaks: gnome-shell (<< 3.34)"? Or am I missing something?
* repackaging mutter 3.36.1+git20200419-1 with my revert of upstream commit 59e9b073 (context in the GNOME issue: https://gitlab.gnome.org/GNOME/mutter/-/issues/1193#note_776408). Unfortunately, building it requires libgnome-desktop-3-dev, which depends on gir1.2-gnomedesktop-3.0 of the same version, so I'm back to the first issue. I could upgrade, compile and then downgrade, though.

Is there a path that would let me test an upgrade with this revert without too much trouble?

pch
What's meant to have happened is that gir1.2-gnomedesktop-3.0 Depends on
libgnome-desktop-3-19, which Depends on a matching gnome-desktop3-data,
which prevents installation of libgnome-desktop-3-18, which avoids the
crash. The goal is that you can have either libgnome-desktop-3-19 or
libgnome-desktop-3-18, but never both at the same time.
This is not completely fixed in testing until
libgnome-desktop-3-18_3.36.1-2 and gir1.2-gnomedesktop-3.0_3.36.1-2
disappear from testing, which should have happened 5 days ago but has been
held up by some changes to other packages. If I'm reading the migration
logs correctly, this is being delayed by the rebuilt version of evince
having picked up a dependency on a newer libsecret, and might be solved
in 2 days when libsecret is old enough to migrate.
Increasing the version in the Breaks wouldn't really help you here,
because gir1.2-gnomedesktop-3.0_3.36.1-2 didn't have it, and there's
nothing we can do that will retroactively change version 3.36.1-2:
we have to wait for the state of the archive to be suitable for
version 3.36.1-2 to disappear.
The best route would be to rebuild mutter against packages from
unstable (in particular, with libgnome-desktop-3-dev_3.36.1-3 and
libgnome-desktop-3-19 installed), with a revert of the commit(s) that
you suspect are causing this.
If you're able to build mutter in sbuild or pbuilder, in a container, or
in a virtual machine, then the easiest way will be to do that, using an
up-to-date unstable environment in the chroot/container/VM.
Or you could upgrade to the unstable version of libgnome-desktop-3-19
(which will require upgrading gnome-shell and mutter to their latest
versions from unstable, or at least a recent-ish version), and apply
whatever workarounds make your system halfway usable for long enough
to compile mutter with the problematic commit reverted. If this issue
is Wayland-specific, maybe you could edit /etc/gdm3/daemon.conf and
force X11 by uncommenting "WaylandEnable=false"; or you could
temporarily disable gdm and use text mode to compile mutter.
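Debian's `/etc/gdm3/daemon.conf` ships this setting commented out as `#WaylandEnable=false` in the `[daemon]` section. Forcing X11 therefore amounts to removing the `#` and then restarting gdm3, which ends any running graphical session. The function below is a minimal sketch of that edit. It assumes the commented line is present. `force_x11_login` is a hypothetical helper name.

```sh
# Uncomment "#WaylandEnable=false" in a gdm3 daemon.conf so that gdm3 uses
# Xorg rather than Wayland. Returns non-zero if no such line is found.
# Usage: force_x11_login /etc/gdm3/daemon.conf   (then restart gdm3)
force_x11_login() {
  conf=$1
  sed -i 's/^#[[:space:]]*WaylandEnable=false[[:space:]]*$/WaylandEnable=false/' "$conf"
  grep -q '^WaylandEnable=false$' "$conf"
}
```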
smcv
Hi everyone,

I'm facing exactly the same issue with GNOME 40, with the same symptoms. The second display is entirely blank except for the mouse pointer, which I can still move. I'm not using the proprietary NVIDIA drivers, just Wayland, nouveau and mutter on GNOME 40.

Here's some information about my environment.

Versions:

```
$ dpkg -l xserver-xorg-video-nouveau mutter xwayland
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                       Version      Architecture Description
+++-==========================-============-============-===========================================================
ii  mutter                     40.4-2+b1    amd64        Example window manager using GNOME's window manager library
ii  xserver-xorg-video-nouveau 1:1.0.17-1   amd64        X.Org X server -- Nouveau display driver
ii  xwayland                   2:1.20.11-1  amd64        Xwayland X server
```

Devices:

```
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GM107GLM [Quadro M1200 Mobile] (rev a2)
```

Info from reportbug: