- Package: nvidia-kernel-source
- Source: nvidia-graphics-drivers
- Submitter: Goswin von Brederlow
- Date: 2011-11-24 16:36:07 UTC
- Severity: normal
Hi,

starting X under Xen leaves the console broken.

I just built a Xen pv-ops kernel from git (2.6.31.6 + xen patches) and compiled the nvidia-kernel module against it using the instructions from http://en.opensuse.org/Talk:Use_Nvidia_driver_with_Xen:

    vi /usr/src/modules/nvidia-kernel/Makefile.kbuild

    Insert the following code after EXTRA_CFLAGS += -Wall..

    XEN_FEATURES := $(shell grep "D xen_features" /boot/System.map-$(shell uname -r) | colrm 17)
    EXTRA_LDFLAGS := --defsym xen_features=0x$(XEN_FEATURES)

    Close the file and set some environment variables:

    export IGNORE_XEN_PRESENCE=1

    make-kpkg --append-to-version -xen-2010.02.18 --revision 2.6-xen-2010.02.18-1 --added-modules nvidia-kernel modules-image

After that I installed the deb and modprobed nvidia. Then, when I start X, I get the following in dmesg:

    Xorg: Corrupted page table at address 7fc15c1f9000
    PGD 7b488067 PUD 7b4a5067 PMD 6c132067 PTE fffffffffffff237
    Bad pagetable: 000f [#4] SMP
    last sysfs file: /sys/devices/pci0000:00/0000:00:10.0/0000:03:00.0/boot_vga
    CPU 3
    Modules linked in: nvidia(P) fuse nf_nat_irc nf_nat snd_ens1371 snd_ac97_codec ac97_bus dmfe psmouse snd_hda_codec_nvhdmi snd_hda_codec_realtek usbhid snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss ohci_hcd snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device snd ehci_hcd soundcore forcedeth i2c_nforce2 usbcore snd_page_alloc [last unloaded: nvidia]
    Pid: 8683, comm: Xorg Tainted: P D 2.6.31.6-xen-2010.02.18 #1 Point of View
    RIP: e033:[<00007fc15632ea45>] [<00007fc15632ea45>] 0x7fc15632ea45
    RSP: e02b:00007fff3ea41448 EFLAGS: 00010246
    RAX: 00007fc15c1f9000 RBX: 0000000002210840 RCX: 0000000000000001
    RDX: 0000000000000000 RSI: 0000000000000020 RDI: 0000000002218eb0
    RBP: 0000000000000001 R08: 0000000000000058 R09: 0101010101010101
    R10: 0000000000000000 R11: 00007fc15632ea30 R12: 0000000002218eb0
    R13: 0000000000000000 R14: 00007fc15684a7a0 R15: 0000000000000001
    FS: 00007fc15c1e3790(0000) GS:ffffc90000042000(0000) knlGS:0000000000000000
    CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007fc15c1f9000 CR3: 000000006c117000 CR4: 0000000000000660
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process Xorg (pid: 8683, threadinfo ffff880079b94000, task ffff88007d2673b0)
    RIP [<00007fc15632ea45>] 0x7fc15632ea45
    RSP <00007fff3ea41448>
    ---[ end trace d693108671cb486f ]---

X is killed, leaving the screen broken: no text console, nothing. Luckily ssh logins are unaffected, so it isn't a complete crash.

MfG
Goswin
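For anyone who wants to check the value that ends up in EXTRA_LDFLAGS: the Makefile fragment in the report keeps the first 16 columns of the "D xen_features" line in System.map, which is the symbol's address as a hex string. Below is a minimal Python sketch of that lookup. The function name `symbol_address` and the sample map text (including its addresses) are invented for illustration and are not part of the openSUSE instructions. Unlike the grep, it matches the symbol name exactly.

```python
def symbol_address(system_map_text, name, sym_type="D"):
    """Return the hex address of `name` from System.map-style text, or None.

    Each System.map line has the form "<hex address> <type> <symbol>".
    """
    for line in system_map_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] == sym_type and fields[2] == name:
            return fields[0]
    return None


# Hypothetical sample: the addresses are invented, not from a real kernel.
SAMPLE_MAP = """\
ffffffff81000000 T _text
ffffffff8174a1c0 D xen_features
ffffffff8174a1e0 D xen_start_info
"""

addr = symbol_address(SAMPLE_MAP, "xen_features")
if addr is not None:
    # Same shape as the linker flag built in Makefile.kbuild.
    print("--defsym xen_features=0x" + addr)
```

On a real system the text would come from /boot/System.map-$(uname -r); a `None` result means the running kernel does not export the symbol as initialised data, in which case the Makefile trick cannot work either.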
Hi,

nvidia-graphics-drivers 195.36.24 was uploaded to unstable today. Please try to reproduce your problem with the current driver and the Debian stock kernel 2.6.32-5 with Xen support.

Andreas
Hi,

I have the same problem here with the nvidia-kernel-dkms package. I am using driver version 195.36.24-1 (testing) and kernel version 2.6.32-5-xen-686 (unstable).
I tested these setups:

1/ Xen hypervisor 4.0.1~rc3-1 + linux-image "xen" (linux-image-2.6.32-bpo.5-xen-amd64) + nvidia 195.36.24 built from dkms
2/ linux-image "xen" (linux-image-2.6.32-bpo.5-xen-amd64) + nvidia 195.36.24 built from dkms
3/ linux-image "without xen" (linux-image-2.6.32-bpo.5-amd64) + nvidia 195.36.24 built from dkms

Test results: 1 fails, whereas 2 and 3 work.

When the system fails, we get the message:

    Xorg: Corrupted page table

Tests 2 and 3 work well: glxinfo, glxgears and openarena all launched successfully.

Maybe this bug is related to #470817?

The test results point at a problem between the Xen hypervisor and the NVIDIA driver's memory management.

Best regards,
Michel
Thanks for the detailed test report. Could you try the new versions of the driver, too? 195.36.31-3 is in unstable and 256.53-1 in experimental. Also it would be nice if you could test with the official Debian 2.6.32 kernel (and xen hypervisor) packages (preferably the versions from unstable) instead of a backport kernel. Andreas
Hello,

my latest try: X11 starts up and freezes with a black screen.

I've updated the nvidia packages to 195.36.31-5:

    ii nvidia-glx 195.36.31-5
    ii nvidia-kernel-2.6-xen-amd64 195.36.31+1
    ii nvidia-kernel-2.6.32-5-amd64 195.36.31+2+4+2.6.32-24
    ii nvidia-kernel-2.6.32-5-xen-am 195.36.31+1+2+2.6.32-21
    ii nvidia-kernel-common 20100522+1
    ii nvidia-kernel-dkms 195.36.31-5
    ii nvidia-settings 195.36.24-1
    ii nvidia-vdpau-driver 195.36.31-5

The Xen and kernel packages are up to date with testing:

    ii xen-hypervisor-4.0-amd64 4.0.1-1
    ii linux-image-2.6-amd64 2.6.32+28
    ii linux-image-2.6.32-5-amd64 2.6.32-26
    ii linux-image-2.6.32-5-xen-amd6 2.6.32-26
    ii linux-image-xen-amd64 2.6.32+28

When I boot with the Xen hypervisor + Linux 2.6.32-5, X11 starts up and freezes with a black screen.

I've attached Xorg.log and an extract of /var/log/messages:

- messages.firstTrace: similar to the one I sent to Debian bug #601869 (even though in #601869 the nvidia module was not loaded in the first place); at that point the boot process had not reached X startup.
- messages.secondTrace: kernel messages when X has started.

Kind regards,
Michel

PS: I don't understand why some packages have a - and some have a + in their version number.
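On the PS: Debian version strings have the form [epoch:]upstream_version[-debian_revision], where the Debian revision is whatever follows the last hyphen, and "+" is simply a character allowed inside either part. A minimal Python sketch of that split, applied to versions from the listing above, follows. The helper name `split_debian_version` is made up here, and the sketch only splits the string; it does not implement dpkg's ordering rules.

```python
def split_debian_version(version):
    """Split a Debian version into (epoch, upstream_version, debian_revision).

    A missing epoch is reported as "0"; a missing revision as "".
    """
    epoch, sep, rest = version.partition(":")
    if not sep:
        # No colon: the whole string is upstream[-revision].
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        # No hyphen: there is no Debian revision.
        upstream, revision = rest, ""
    return epoch, upstream, revision


for v in ("195.36.31-5", "2.6.32+28", "195.36.31+2+4+2.6.32-24"):
    print(v, "->", split_debian_version(v))
```

Read this way, 195.36.31-5 is upstream 195.36.31 with Debian revision 5, while 2.6.32+28 has no revision at all and the "+28" belongs to the upstream part.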
Some older test report that didn't make it into this bug report ...

Michel Briand <michelbriand@free.fr> - Sat, 25 Sep 2010 15:22:45 +0200

Hum.... bad news. With 2.6.32-5-xen-amd64 under the hypervisor, Xorg does start but hangs with a black screen. I used Magic SysRq to reboot. I attach the Xorg log file.
Hi Michel,
something you could try is to patch nv-linux.h in /usr/src/nvidia*/ as
follows.
Change the line
#if defined(CONFIG_XEN) && !defined(CONFIG_PARAVIRT)
to
#if defined(CONFIG_XEN) // && !defined(CONFIG_PARAVIRT)
and rebuild the module.
Only try this with the xen kernel running under the hypervisor. This
patch will probably break the nvidia module that is currently working
for normal kernel and the xen kernel running not under the hypervisor.
The intention of this patch is to explicitly reactivate some old-style
behaviour described in this comment in nv-linux.h, but I don't know if
the driver still compiles in this mode:
/*
* Traditionally, CONFIG_XEN indicated that the target kernel was
* built exclusively for use under a Xen hypervisor, requiring
* modifications to or disabling of a variety of NVIDIA graphics
* driver code paths. As of the introduction of CONFIG_PARAVIRT
* and support for Xen hypervisors within the CONFIG_PARAVIRT_GUEST
* architecture, CONFIG_XEN merely indicates that the target
* kernel can run under a Xen hypervisor, but not that it will.
*
* If CONFIG_XEN and CONFIG_PARAVIRT are defined, the old Xen
* specific code paths are disabled. If the target kernel executes
* stand-alone, the NVIDIA graphics driver will work fine. If the
* kernels executes under a Xen (or other) hypervisor, however, the
* NVIDIA graphics driver has no way of knowing and is unlikely
* to work correctly.
*/
If this still does not reactivate xen support under hypervisor, that
configuration is probably unsupported by upstream. At least upstream
docs do not say anything about xen at all ...
In that case I have no further clues and the only thing we can document
is: does not work in the xen kernel running under the hypervisor.
Or is there some xen patch for current drivers that I'm not aware of?
Andreas
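To make the effect of the suggested change concrete, here is a small Python sketch that tabulates when the old-style Xen code paths are selected before and after commenting out the CONFIG_PARAVIRT test. It only mirrors the two preprocessor conditions quoted above; the function name `old_style_xen_paths` is made up for illustration, and nothing here is taken from the driver beyond that one #if line.

```python
def old_style_xen_paths(config_xen, config_paravirt, patched=False):
    """Mirror the #if guard in nv-linux.h quoted in the message above.

    unpatched: defined(CONFIG_XEN) && !defined(CONFIG_PARAVIRT)
    patched:   defined(CONFIG_XEN)
    """
    if patched:
        return config_xen
    return config_xen and not config_paravirt


print("CONFIG_XEN CONFIG_PARAVIRT unpatched patched")
for xen in (False, True):
    for paravirt in (False, True):
        print(
            xen,
            paravirt,
            old_style_xen_paths(xen, paravirt),
            old_style_xen_paths(xen, paravirt, patched=True),
        )
```

The only row that changes is the one with both options set, which is the configuration the quoted comment describes: unpatched, the old Xen-specific paths are off; patched, they are forced on regardless of whether the kernel is actually running under the hypervisor.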
I believe this is still an ongoing issue. I did a full dist-upgrade 2 days ago. Here are what should be the relevant packages:

    dpkg -l | egrep "linux-image|xen"
    ii libxenstore3.0 4.0.1-1 Xenstore communications library for Xen
    ii linux-headers-2.6.32-5-common-xen 2.6.32-29 Common header files for Linux 2.6.32-5-xen
    ii linux-headers-2.6.32-5-xen-amd64 2.6.32-29 Header files for Linux 2.6.32-5-xen-amd64
    ii linux-image-2.6-amd64 2.6.32+28 Linux 2.6 for 64-bit PCs (meta-package)
    ii linux-image-2.6.26-2-xen-amd64 2.6.26-15 Linux 2.6.26 image on AMD64, oldstyle Xen support
    ii linux-image-2.6.32-4-amd64 2.6.32-11 Linux 2.6.32 for 64-bit PCs
    ii linux-image-2.6.32-5-amd64 2.6.32-29 Linux 2.6.32 for 64-bit PCs
    ii linux-image-2.6.32-5-vserver-amd64 2.6.32-29 Linux 2.6.32 for 64-bit PCs, Linux-VServer support
    ii linux-image-2.6.32-5-xen-amd64 2.6.32-29 Linux 2.6.32 for 64-bit PCs, Xen dom0 support
    ii linux-image-xen-amd64 2.6.32+28 Linux for 64-bit PCs (meta-package), Xen dom0 support, Xen dom0 support
    ii linux-modules-2.6.26-2-xen-amd64 2.6.26-15 Linux 2.6.26 modules on AMD64
    ii xen-hypervisor-4.0-amd64 4.0.1-1 The Xen Hypervisor on AMD64
    ii xen-utils-4.0 4.0.1-1 XEN administrative tools
    ii xen-utils-common 4.0.0-1 XEN administrative tools - common files
    ii xenstore-utils 4.0.1-1 Xenstore utilities for Xen

I made the change as requested in the previous post and attempted to build, but the build failed. The errors look like this:

    In file included from /var/lib/dkms/nvidia/195.36.31/build/nv.c:14:
    /var/lib/dkms/nvidia/195.36.31/build/nv-linux.h:153:23: error: asm/maddr.h: No such file or directory
    In file included from /usr/src/linux-headers-2.6.32-5-common-xen/include/linux/compat.h:14,
    from /usr/src/linux-headers-2.6.32-5-common-xen/arch/x86/include/asm/mtrr.h:173,
    from /var/lib/dkms/nvidia/195.36.31/build/nv-linux.h:163,
    from /var/lib/dkms/nvidia/195.36.31/build/nv.c:14:
    /usr/src/linux-headers-2.6.32-5-common-xen/arch/x86/include/asm/compat.h: In function ‘arch_compat_alloc_user_space’:
    /usr/src/linux-headers-2.6.32-5-common-xen/arch/x86/include/asm/compat.h:210: warning: pointer of type ‘void *’ used in arithmetic
    /var/lib/dkms/nvidia/195.36.31/build/nv.c: In function ‘nv_kern_open’:
    /var/lib/dkms/nvidia/195.36.31/build/nv.c:2245: error: implicit declaration of function ‘HYPERVISOR_memory_op’
    make[4]: *** [/var/lib/dkms/nvidia/195.36.31/build/nv.o] Error 1
    make[3]: *** [_module_/var/lib/dkms/nvidia/195.36.31/build] Error 2
    make[2]: *** [sub-make] Error 2
    make[1]: *** [all] Error 2
    make[1]: Leaving directory `/usr/src/linux-headers-2.6.32-5-xen-amd64'
    make: *** [modules] Error 2
    make: Leaving directory `/var/lib/dkms/nvidia/195.36.31/build'

Perhaps I'm just missing a needed build/src module; let me know if you want me to install some source I'm missing. But I figure it could just be an unavoidably invalid configuration with that modification.

Anyway, I'll try to keep up with this report for a few days. I have the machine in a good position to build, and the error is identical to the one originally reported: frozen machine, black screen, crashed X with an error about a corrupted page table.
Hi,

I'm sorry Andreas, but I didn't have the time to test your proposed change before now. It seems that someone did it (Joseph) and that the driver does not compile.

It does not compile here either:

    In file included from .../tmp/nvidia-195.36.31/nv.c:14:
    .../tmp/nvidia-195.36.31/nv-linux.h:153:23: error: asm/maddr.h: No such file or directory

and

    /usr/src/linux-headers-2.6.32-5-common/include/xen/interface/memory.h: At top level:
    /usr/src/linux-headers-2.6.32-5-common/include/xen/interface/memory.h:32: error: expected specifier-qualifier-list before 'GUEST_HANDLE'
    ... plus 3 more definition errors.

and

    .../tmp/nvidia-195.36.31/nv.c: In function 'nv_kern_open':
    .../tmp/nvidia-195.36.31/nv.c:2245: error: implicit declaration of function 'HYPERVISOR_memory_op'

maddr.h does not exist on my machine. I've found that HYPERVISOR_memory_op is defined in hypercall.h, so I've added hypercall.h to try to make this compile:

    #if defined(CONFIG_XEN) // && !defined(CONFIG_PARAVIRT)
    //#include <asm/maddr.h>
    #include <asm/xen/hypercall.h>
    #include <xen/interface/memory.h>
    #define NV_XEN_SUPPORT_OLD_STYLE_KERNEL
    #endif

But it fails soon after:

    .../tmp/nvidia-195.36.31/nv-vm.c: In function 'nv_vm_malloc_pages':
    .../tmp/nvidia-195.36.31/nv-vm.c:507: error: implicit declaration of function 'phys_to_machine'

I believe this comes from:

    #if defined(NV_XEN_SUPPORT_OLD_STYLE_KERNEL)
    #define NV_GET_DMA_ADDRESS(phys_addr) phys_to_machine(phys_addr)
    #else
    #define NV_GET_DMA_ADDRESS(phys_addr) (phys_addr)
    #endif

However, I managed to run a new test at work: I've installed the latest squeeze kernel, xen and nvidia packages (as of 09/01/2011). I used dkms -- great tool ;) -- to try nvidia driver version 195 and version 260 (from experimental).

Both of them are compiled and installed nicely by dkms! But neither of them works under Xen (dom0): same problem, with a black screen and a freeze a few seconds after X starts.

Best regards,
Michel
Hello,

I am trying to run the nvidia drivers (260.19.44 and 195.36.24) on an Asus AT3IONT-I Deluxe with Debian squeeze:

    2.6.32-5-xen-amd64 #1 SMP Wed Jan 12 05:46:49 UTC 2011 x86_64 GNU/Linux

but I also get errors:

    Message from syslogd@neptun2 at Mar 16 12:20:13
    kernel:[ 36.146786] Bad pagetable: 000f [#2] SMP
    Message from syslogd@neptun2 at Mar 16 12:20:13
    kernel:[ 36.146797] last sysfs file: /sys/bus/acpi/drivers/NVIDIA ACPI Video Driver/uevent

Is there any workaround?

Regards,
Hello,

What is the conclusion of this bug?
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=570365

Is the combination of Debian squeeze, Xen and the nvidia driver unsupported, and will it not work in this release?

What is causing the problem? Is it the nvidia driver? Is there a version of the nvidia driver where this issue is fixed? Is there a workaround or patch? What needs to happen for this to work in squeeze?

The bug was marked as upstream/won't fix. What does that mean for users?

Thanks,
Lucas
Lukasz Szybalski <szybalski@gmail.com> writes:
I'm pretty sure it doesn't work in squeeze and is not going to change
there. Or did the point release contain updates for nvidia?
But it should be working in wheezy or when you use a newer kernel and
nvidia on squeeze (backports?).
For me it was that the nvidia module would conflict with
CONFIG_XEN*. They simply failed to build even when following some
workarounds I found with google.
Try it with the latest kernel+driver and google if you have problems. I
hope the issue is fixed upstream by now but I simply haven't had time to
set up a test system to try again yet.
So what needs to happen now is that someone tries this again and reports
back.
Don't expect it to work in squeeze but it should be fixable with
backports (if it isn't fixed already). Again, someone needs to test and
report back.
MfG
Goswin
I have installed the "wheezy" nvidia driver and it still does not work with the current 2.6.32 kernel, so the next version of Debian will probably not work either. The module compiles fine for me using m-a, but Xorg runs at 100% CPU and I get a black screen. I am unable to do anything. It's a "grave" situation :(