#988200 xserver-xorg-video-nvidia: user-specified xorg configuration can't find nvidia driver

#988200#5
Date:
2021-05-07 14:38:02 UTC
Dear Maintainer,

I was attempting to test NVIDIA PRIME Render Offload on Debian 11 on my ThinkPad P52,
following instructions at https://wiki.debian.org/NVIDIA%20Optimus and
https://us.download.nvidia.com/XFree86/Linux-x86_64/450.57/README/primerenderoffload.html.

As I understand it, the only required step should be booting with the modesetting driver
while the nvidia driver is installed. However, this autodetection failed: although the nvidia
kernel modules were loaded, xrandr --listproviders reported only the modesetting driver.
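The symptom above can be checked from a running session. A minimal sketch, assuming each provider line printed by `xrandr --listproviders` ends with a `name:` field (e.g. `name:modesetting`); the helper names are hypothetical and not part of any package:

```shell
#!/bin/sh
# Hypothetical helpers (not part of any Debian or NVIDIA package).

# Read `xrandr --listproviders` output on stdin and print each provider's
# name, assuming every "Provider N:" line ends with a "name:<value>" field.
list_provider_names() {
    sed -n 's/^Provider [0-9][0-9]*:.*name:\(.*\)$/\1/p'
}

# Exit status 0 if any listed provider name contains "nvidia" (any case).
has_nvidia_provider() {
    list_provider_names | grep -qi nvidia
}

# Example on a live X session:
#   lsmod | grep '^nvidia'        # are the kernel modules loaded?
#   xrandr --listproviders | has_nvidia_provider || echo "no NVIDIA provider"
```

In the situation described above, the first check would succeed while the second would print "no NVIDIA provider".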

As per the nvidia documentation linked above, I attempted to explicitly use an xorg.conf
as follows:

Section "ServerLayout"
  Identifier "layout"
  Screen 0 "iGPU"
  Option "AllowNVIDIAGPUScreens"
EndSection

Section "Device"
  Identifier "iGPU"
  Driver "modesetting"
EndSection

Section "Screen"
  Identifier "iGPU"
  Device "iGPU"
EndSection

Section "Device"
  Identifier "dGPU"
  Driver "nvidia"
EndSection

This caused the following error in Xorg.log:

(EE) Failed to load module "nvidia" (module does not exist, 0)

I asked on the NVIDIA developer forum what the issue could be, and they suggested that Xorg
was not finding the Debian-specific location where the NVIDIA Xorg driver is installed:
https://forums.developer.nvidia.com/t/460-73-01-fails-to-load-in-xorg-under-prime-render-offload-debian-11/177257

Specifically, they suggested the following /etc/X11/xorg.conf.d/11-nvidia-driver.conf:

Section "OutputClass"
    Identifier "nvidia-driver"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/nvidia/current"
EndSection

This worked. Shouldn't this OutputClass configuration, then, be provided by
the xserver-xorg-video-nvidia package?

The following logs are from the working configuration; I've also attached nvidia-bug-report.log.gz
capturing the state of the system before this OutputClass file was added.

Thank you.

#988200#10
Date:
2022-08-23 19:28:31 UTC
Dear Maintainer,

I also encountered this problem. I'm not trying to do anything exotic: I have one monitor plugged into the NVIDIA card and one into the Intel integrated graphics.

I was able to solve my immediate problem by running:

    sudo ln -s /usr/lib/nvidia/current/nvidia_drv.so /usr/lib/xorg/modules/drivers

But I was very surprised to find that this package doesn't install its shared library (nvidia_drv.so) where Xorg expects to find it.

IMO, this is a pretty major bug.

#988200#15
Date:
2022-08-24 17:33:30 UTC
To add: this issue means Xorg can’t detect the NVIDIA GPU *at
all*, even if you simply start it with no special configuration.
Perhaps it works on machines with only an NVIDIA GPU? But on my
machine with Intel and NVIDIA GPUs, straight out of the box, it
doesn’t work *at all*.

Verified that the same situation exists in 470.141.03-1, which is
the current version in sid.

#988200#20
Date:
2023-12-23 23:38:25 UTC
I just want to chime in that I’m also affected by this bug, which
seems very severe. I don’t see how this package is usable /at
all/, in any configuration, in its current state.

I’m on a ThinkStation P360, which has both Intel and NVIDIA GPUs
-- but I only use the NVIDIA GPU. Spread across a few files in
/etc/X11/xorg.conf.d, I have this configuration:

Section "Device"
    Identifier "nVidia T400 4gb"
    # BusID "pci:1:0:0"
    Driver "nvidia"
    Option "Monitor-DP-4" "Lenovo ThinkVision P32p-20"
    Option "ForceFullCompositionPipeline" "On"
EndSection

Section "Monitor"
    Identifier "Lenovo ThinkVision P32p-20"
    VendorName "Lenovo"
    ModelName "ThinkVision P32p-20"
    Option "Primary" "true"
    Option "LeftOf" "Lenovo ThinkVision LT1913p"
EndSection

Section "Monitor"
    Identifier "Lenovo ThinkVision LT1913p"
    VendorName "Lenovo"
    ModelName "ThinkVision LT1913p"
    Option "Primary" "false"
    Option "RightOf" "Lenovo ThinkVision P32p-20"
EndSection

Section "Screen"
    Identifier "Primary"
    Device "nVidia T400 4gb"
    Monitor "Lenovo ThinkVision LT1913p"
    DefaultDepth 24
    Option "nvidiaXineramaInfoOrder" "DFP-4"
    # Option "metamodes" "nvidia-auto-select +0+0 {ForceCompositionPipeline=On, ForceFullCompositionPipeline=On}"
EndSection


With this configuration, the X server fails to start, with:

[   200.858] (II) LoadModule: "nvidia"
[   200.858] (WW) Warning, couldn't open module nvidia

This is because the server’s default ModulePath is
/usr/lib/xorg/modules -- which can be seen in the server log:

[   286.124] (==) ModulePath set to "/usr/lib/xorg/modules"

But this package installs the driver as
/usr/lib/nvidia/current/nvidia_drv.so, so the X server can’t
load it and fails to start.
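The ModulePath the server actually used can be pulled from its log. A minimal sketch; the helper name is hypothetical, and the default log location /var/log/Xorg.0.log is an assumption (a rootless server typically logs under ~/.local/share/xorg/ instead):

```shell
#!/bin/sh
# Hypothetical helper: print the ModulePath recorded in an Xorg log.
# $1 is the log file; /var/log/Xorg.0.log is only an assumed default.
xorg_module_path() (
    log=${1:-/var/log/Xorg.0.log}
    # Matches lines such as:
    #   [   286.124] (==) ModulePath set to "/usr/lib/xorg/modules"
    # and prints the quoted path from the last matching line.
    sed -n 's/.*ModulePath set to "\(.*\)".*/\1/p' "$log" | tail -n 1
)

# Example:
#   xorg_module_path
#   xorg_module_path ~/.local/share/xorg/Xorg.0.log
```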

This package needs to do one of the following:

1. Install the driver into /usr/lib/xorg/modules/drivers
2. Create a symlink there pointing to
/usr/lib/nvidia/current/nvidia_drv.so
3. Include an xorg configuration file in
/usr/share/X11/xorg.conf.d which adds /usr/lib/nvidia/current/ to
the ModulePath.
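Option 3 amounts to a drop-in like the OutputClass section from the first message. A sketch of a hypothetical helper that writes that exact section. The target directory is a parameter because the first message used /etc/X11/xorg.conf.d while option 3 names /usr/share/X11/xorg.conf.d; /etc is the assumed default for a local fix, since files under /usr/share normally belong to packages:

```shell
#!/bin/sh
# Hypothetical helper: write the OutputClass drop-in from the first message
# (11-nvidia-driver.conf) into a config directory.
# $1 is the target directory; /etc/X11/xorg.conf.d is an assumed default.
write_nvidia_outputclass() (
    dir=${1:-/etc/X11/xorg.conf.d}
    mkdir -p "$dir" || exit 1
    # The quoted 'EOF' delimiter stops the shell from expanding anything inside.
    cat > "$dir/11-nvidia-driver.conf" <<'EOF'
Section "OutputClass"
    Identifier "nvidia-driver"
    MatchDriver "nvidia-drm"
    Driver "nvidia"
    Option "AllowEmptyInitialConfiguration"
    ModulePath "/usr/lib/nvidia/current"
EndSection
EOF
)
```

Writing into /etc requires root, and the X server only reads the file on its next start.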

I’ve been working around this with option 2, but have found that
various package updates tend to remove the link for some reason,
which then breaks my setup.

I’m on an up-to-date Debian bookworm, xserver-xorg-video-nvidia
version 525.147.05-4~deb12u1.

Please let me know if I’m doing something wrong here -- I don’t
believe I am, but this also seems like a ridiculously severe bug
to have gotten no response whatsoever for over a year, and I find
it difficult to believe it could be this broken for that long.
But I also don’t see any way it could work without manual effort.

Thanks,

  — Ian

#988200#25
Date:
2023-12-26 02:43:53 UTC
Please run

   reportbug -N 988200

on the affected system to collect some information about your system
state that might help diagnose what went wrong in your case.


Andreas

#988200#30
Date:
2023-12-26 17:58:06 UTC
Dear Maintainer,

Here’s the reportbug output, as requested.

Could you please help me understand how this package is *supposed* to work?  My understanding of what’s happening here is:

1. Xorg determines that some video adapter needs the nvidia module, whether that’s autodetected or explicitly configured.
2. Xorg consults ModulePath to determine where to find loadable modules.  The default ModulePath is /usr/lib/xorg/modules.
3. Xorg looks in ModulePath for nvidia_drv.so.
4. nvidia_drv.so doesn’t exist in that path, so Xorg fails to start.
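The four steps above can be modelled roughly in shell. This is a simplified sketch, not the server's loader code: it treats ModulePath as a comma-separated list, searches each entry recursively for `<name>_drv.so`, and ignores any other filename patterns the real loader may accept. `find_xorg_driver` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical, simplified model of how the X server locates a driver module.
# Usage: find_xorg_driver NAME [MODULEPATH]
#   NAME        driver name, e.g. "nvidia" or "modesetting"
#   MODULEPATH  comma-separated directories (default: /usr/lib/xorg/modules)
# Prints the first matching file and succeeds, or fails if none is found.
find_xorg_driver() (
    name=$1
    modpath=${2:-/usr/lib/xorg/modules}
    set -f          # no glob expansion while splitting the path list
    IFS=,           # split ModulePath on commas
    for dir in $modpath; do
        [ -d "$dir" ] || continue
        # Search recursively; -L also follows symlinked directories.
        hit=$(find -L "$dir" -name "${name}_drv.so" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            printf '%s\n' "$hit"
            exit 0  # leaves only this function's subshell
        fi
    done
    exit 1
)
```

Under the layout described in this report, `find_xorg_driver nvidia` would fail, while `find_xorg_driver nvidia /usr/lib/xorg/modules,/usr/lib/nvidia/current` would find the driver. A symlink under /usr/lib/xorg/modules/drivers also makes the first call succeed.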

Should something be changing the Xorg ModulePath to include /usr/lib/nvidia/current/, where this package installs nvidia_drv.so?

Should something be making nvidia_drv.so available in the default ModulePath (/usr/lib/xorg/modules), either by directly installing there, or by linking it there?

If so, by what mechanism would it do that/those?

If not, how does Xorg know where to locate nvidia_drv.so?

Is there a different package which is supposed to handle this?  If so, which is it?  I see nvidia-alternatives, but that doesn’t manage any xorg links, only stuff inside /usr/lib/nvidia.  I also see nvidia-support has some debconf templates saying "The NVIDIA driver is not yet configured; it needs to be enabled in xorg.conf before it can be used.  Please see the package documentation for instructions."  -- however, that package has no documentation, just a changelog and copyright.

I’ve looked in this package’s files and install scripts, and I don’t see anything which alters ModulePath, nor anything which puts nvidia_drv.so into the default ModulePath.  So I don’t see how Xorg can load the driver, ever.

I checked a few other Xorg driver packages to see what they do.  This same system has installed:

xserver-xorg-video-all
xserver-xorg-video-amdgpu
xserver-xorg-video-ati
xserver-xorg-video-dummy
xserver-xorg-video-fbdev
xserver-xorg-video-nouveau
xserver-xorg-video-nvidia
xserver-xorg-video-radeon
xserver-xorg-video-vesa
xserver-xorg-video-vmware

Ignoring xserver-xorg-video-all (which is a dummy package that depends on the others), *every package* installs a "_drv.so" file into /usr/lib/xorg/modules/drivers -- except this one.
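The comparison above can be repeated with dpkg. A sketch with a hypothetical helper that filters a package's file list, as printed by `dpkg -L`, down to driver modules in the default drivers directory:

```shell
#!/bin/sh
# Hypothetical helper: read a package file list (e.g. `dpkg -L PKG`) on stdin
# and print the entries that are Xorg driver modules in the default location.
xorg_drivers_in_list() {
    grep -E '^/usr/lib/xorg/modules/drivers/[^/]+_drv\.so$'
}

# Example on Debian: show what each xserver-xorg-video-* package known to
# dpkg ships in /usr/lib/xorg/modules/drivers (errors for packages that
# are not installed are discarded).
#   for pkg in $(dpkg-query -W --showformat='${Package}\n' 'xserver-xorg-video-*'); do
#       printf '%s: %s\n' "$pkg" "$(dpkg -L "$pkg" 2>/dev/null | xorg_drivers_in_list | tr '\n' ' ')"
#   done
```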

#988200#35
Date:
2023-12-26 20:07:49 UTC
There we are ;-)

You have the nvidia driver installed, but it is disabled (i.e. only Mesa
gets used). Therefore none of the nvidia driver alternatives are in place,
and manually attempting to use the driver with your custom xorg.conf fails
because Xorg cannot find it.
Yes, there are use cases where the nvidia driver needs to be installed
(even multiple variants of it in parallel) without being enabled.

The alternatives must have been modified manually, as the packaging will
always leave them in "auto" mode. Of course the packages won't override
such local customization.

You probably want to run
   update-glx --auto nvidia
   update-glx --auto glx
to re-enable automatic control of the alternatives by the system.
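Whether the alternatives were left in manual mode can be checked before changing them. A sketch, assuming update-glx manages ordinary dpkg alternatives named glx and nvidia (the names passed to update-glx above), so that `update-alternatives --query` reports their Status field; the helper name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the Status field ("auto" or "manual") from
# `update-alternatives --query NAME` output read on stdin.
alternative_status() {
    sed -n 's/^Status: //p' | head -n 1
}

# Example, assuming "glx" and "nvidia" are ordinary alternatives:
#   for alt in nvidia glx; do
#       printf '%s: %s\n' "$alt" "$(update-alternatives --query "$alt" | alternative_status)"
#   done
# If either reports "manual", the update-glx --auto commands above
# return it to automatic selection.
```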


Andreas