Dear Maintainer,

I've upgraded an older typewriter computer from Stretch to Buster and from there to Bullseye. The kernel linux-image-4.19.0-20-amd64 from Buster works; it is the one I'm using right now for reportbug. But anything from the newer versions hangs during boot. I've tried the release kernel of Bullseye, the security-updated linux-image-5.10.0-14-amd64, the newest one from backports and the latest version from testing, which was a 5.17 version.

I've tried booting with the boot option boot_delay=1000, but then it already hangs/crashes after the line about loading the initial ramdisk. Afterwards I can only switch the computer off and on again. The Num Lock key is dead.

Without the boot_delay option there is some fast kernel output, which I've filmed with my camera. Between the working and the crashing kernel there is some difference in the SATA ports. And these are the last lines of output before the screen goes black and the computer no longer reacts at all.

This is a working boot with the Buster kernel:

[ 1.800181] hub 7-0:1.0: USB hub found
[ 1.800226] hub 7-0:1.0: 2 ports detected
[ 1.816820] scsi host1: ahci
[ 1.817122] scsi host3: ahci
[ 1.817395] scsi host4: ahci
[ 1.817663] scsi host5: ahci
[ 1.817939] scsi host6: ahci
[ 1.818297] scsi host7: ahci
[ 1.818430] ata3: SATA max UDMA/133 abar m2048@0xf01a6000 port 0xf01a6100 irq 25
[ 1.818489] ata4: SATA max UDMA/133 abar m2048@0xf01a6000 port 0xf01a6180 irq 25
[ 1.818547] ata5: DUMMY
[ 1.818582] ata6: DUMMY
[ 1.818618] ata7: SATA max UDMA/133 abar m2048@0xf01a6000 port 0xf01a6300 irq 25
[ 1.818678] ata8: SATA max UDMA/133 abar m2048@0xf01a6000 port 0xf01a6380 irq 25
[ 1.819353] scsi host2: ata_generic
[ 1.819448] ata1: PATA max UDMA/100 cmd 0x1218 ctl 0x1240 bmdma 0x1200 irq 18
[ 1.819494] ata2: PATA max UDMA/100 cmd 0x1220 ctl 0x1244 bmdma 0x1208 irq 18
[ 1.879056] pci 0000:00:00.0: Intel Q35 Chipset
[ 1.879122] pci 0000:00:00.0: detected gtt size: 524288K total, 262144K mappable
[ 1.879855] pci 0000:00:00.0: detected 8192K stolen memory
[ 1.879947] [drm] Replacing VGA console driver
[ 1.880476] Console: switching to colour dummy device 80x25
[ 1.880940] [drm] ACPI BIOS requests an excessive sleep of 1124034056 ms, using 1500 ms instead
[ 1.884726] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 1.884730] [drm] Driver supports precise vblank timestamp query.

With the Bullseye kernel I can't see the lines for scsi hosts 1 and 3. But maybe the order is different and they are not visible before the screen goes black. So maybe it is also a problem with console switching and i915 graphics?

I'll upload the video to the opened bug report.

I hope you can help me. Do you know of any kernel boot options I could try to get the newer kernels to work?

br
Markus
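The comparison of "scsi host" lines between a working and a failing boot, as described in the report above, could be done mechanically once the filmed output has been typed out. The following is only a minimal sketch: the function names are made up for this illustration, and the `failing` transcript is hypothetical, not taken from the bug report.

```python
import re

# Matches lines such as "[ 1.816820] scsi host1: ahci".
SCSI_HOST_RE = re.compile(r"scsi host(\d+): (\S+)")


def scsi_hosts(dmesg_text):
    """Return {host_number: driver_name} for every 'scsi hostN: driver' line."""
    return {int(num): drv for num, drv in SCSI_HOST_RE.findall(dmesg_text)}


def missing_hosts(working_log, failing_log):
    """Host numbers seen in the working boot but not in the failing one."""
    return sorted(set(scsi_hosts(working_log)) - set(scsi_hosts(failing_log)))


# Excerpt copied from the working Buster boot quoted in the report.
working = """\
[ 1.816820] scsi host1: ahci
[ 1.817122] scsi host3: ahci
[ 1.817395] scsi host4: ahci
[ 1.819353] scsi host2: ata_generic
"""

# HYPOTHETICAL transcript of a failing boot, for illustration only.
failing = """\
[ 1.817395] scsi host4: ahci
[ 1.819353] scsi host2: ata_generic
"""

print(missing_hosts(working, failing))  # [1, 3]
```

A missing line in a filmed boot does not prove the host was never registered, since the screen may go black before the line is drawn; the sketch only makes the visible difference explicit.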
On 01.06.2022 12:17, Markus Kolb wrote:
[...]
[...]

In the attachment is a screenshot of the last output and the video, in which the screen switches to black and the system crashes at around 0:12.
Control: found -1 linux/5.17.3-1

Bug https://bugs.debian.org/1006149 was also about a boot failure (and SATA) and that got fixed in version 5.17.6-1, but due to the openssl transition, that didn't get into Testing (which currently has 5.17.3-1). The current version in Sid/Unstable is 5.17.11-1 and it would be useful if you could test that as well.

This does seem like a different issue, as it happens with much older kernels than that bug report, so I don't expect it to fix it. But it's still useful info and it may result in some extra info.
Could you try to unplug any peripherals that are not strictly needed? IOW: only attach a keyboard and monitor and see if that makes a difference.
On 01.06.2022 13:34, Diederik de Haas wrote:

Hey Diederik,

I've just tested linux-image-5.17.0-2-amd64 5.17.6-1+b1, but it also doesn't boot.

The computer has only a USB mouse attached. The keyboard is PS/2. Apart from that, only an Ethernet cable and VGA are connected.

In the meantime I've built linux-image-5.10.119 from the kernel.org sources and it boots successfully. I used localmodconfig while running linux-image-4.19.0-20-amd64 4.19.235-1 and afterwards mostly accepted the defaults for new config options, but I also deactivated many options and modules that I was quite sure I don't need, so that the build finishes faster.

I've attached the dmesg output, config and lsmod output for my 5.10.119. Maybe it helps to find the right patch.

Next I'll build 5.10.114, which corresponds by date to 5.17.6, and see whether I can find the version containing the patch by going upwards.

Or does someone already have an idea what it could be? :-)

cu
Markus
Hi Markus,

I was expecting 5.17.11-1, but that version does have the SATA fix from the other bug report too. And there's another thing to focus on ...

Ok, that is fine.

... and this is VERY significant (afaict) :-)

The exact implications of this are 'above my pay grade', but hopefully one of the kernel maintainers (who should understand this) chimes in.

Via https://packages.debian.org/bullseye/linux-config-5.10 I retrieved the Debian kernel config for 5.10.106-1 (which is likely close enough) and compared it with the config you attached. The diff was *huge*, but the fact that you were able to boot your self-built 5.10 kernel while the Debian 5.10 kernel failed points (strongly) towards a Debian kernel configuration difference being the cause of this bug.

I have no idea how to make any intelligent recommendations wrt kernel config changes, so I have to defer to people 'smarter' than me (wrt this).

Building 5.10.113 with your custom config sounds like a good test case. Debian's 5.10.113 didn't boot (with Debian's config), but if the (exact) same version with a different config does work, then it seems almost certain to me that the bug is in the Debian kernel configuration.

Cheers,
Diederik
Because the kernel.org 5.10.113 with my stripped-down config runs successfully, I've rebuilt this kernel.org 5.10.113 with the Debian config /boot/config-5.10.0-14-amd64 via make oldconfig. There are some differences between kernel.org and Debian in the config; I've put the config diff bug-1012210-kernel-config-changes.patch in the attached tar.xz.

This kernel also boots successfully.

I think the problem is introduced by a Debian patch which handles the config option CONFIG_INTEL_IOMMU_DEFAULT_ON_INTGPU_OFF=y, because this option is not available in kernel.org 5.10.113. And the Debian kernel linux-image-5.10.0-14-amd64 5.10.113-1 boots successfully when I set the boot option intel_iommu=on,igfx_off, which is only needed with the Debian kernel and not with any version from kernel.org.

The boot log of the Debian kernel contains these additional lines:

[ 0.050433] DMAR: IOMMU enabled
[ 0.050434] DMAR: Disable GFX device mapping
...
[ 1.373598] DMAR: No ATSR found
[ 1.373721] DMAR: dmar2: Using Register based invalidation
[ 1.373767] DMAR: dmar0: Using Register based invalidation
[ 1.373810] DMAR: dmar3: Using Register based invalidation
...
[ 1.391563] DMAR: Intel(R) Virtualization Technology for Directed I/O

The kernel boot logs
dmesg-5.10.113.txt (kernel.org without any boot options)
dmesg-5.10.0-14-amd64.txt (Debian with boot option intel_iommu=on,igfx_off)
are also in the attached tar.xz.

The patches
https://salsa.debian.org/kernel-team/linux/-/blob/bullseye-security/debian/patches/features/x86/intel-iommu-add-kconfig-option-to-exclude-igpu-by-default.patch
https://salsa.debian.org/kernel-team/linux/-/blob/bullseye-security/debian/patches/features/x86/intel-iommu-add-option-to-exclude-integrated-gpu-only.patch
introduce the kernel config option INTEL_IOMMU_DEFAULT_ON_INTGPU_OFF, but it is not handled anywhere in the code.

I think you have somehow mixed up the defaults of the configuration and the settings of igfx_off and intgpu_off, which sets something up that results in a wrong config for my boot. The intgpu_off boot option by itself doesn't change anything; with the Debian kernel I need igfx_off.

At https://salsa.debian.org/kernel-team/linux/-/blob/bullseye-security/debian/patches/features/x86/intel-iommu-add-option-to-exclude-integrated-gpu-only.patch#L66 you should compare 10 chars and not only 8, but that is more a matter of correctness.

Maybe this

static int dmar_map_intgpu = IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);

at https://salsa.debian.org/kernel-team/linux/-/blob/bullseye-security/debian/patches/features/x86/intel-iommu-add-kconfig-option-to-exclude-igpu-by-default.patch#L74 should be

static int dmar_map_intgpu = IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON_INTGPU_OFF);

or the negated value. I'm not sure at the moment what a y or n should mean in this config, and whether the assignments of 0 or 1 are correct everywhere.
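The remark about comparing 10 characters rather than 8 can be illustrated with a small sketch. It assumes that the option is matched with a `strncmp`-style prefix comparison; `prefix_equal` is a made-up stand-in for `strncmp(a, b, n) == 0`, and "intgpu_on" is a hypothetical input, not a real kernel option.

```python
def prefix_equal(a, b, n):
    """Rough stand-in for C's `strncmp(a, b, n) == 0` on plain ASCII strings."""
    return a[:n] == b[:n]


OPTION = "intgpu_off"  # 10 characters

# With n=8 only "intgpu_o" is compared, so a hypothetical "intgpu_on" matches too.
print(prefix_equal("intgpu_on", OPTION, 8))             # True
# Comparing the full length of the option name rejects it.
print(prefix_equal("intgpu_on", OPTION, len(OPTION)))   # False
print(prefix_equal("intgpu_off", OPTION, len(OPTION)))  # True
```

As noted in the message, this is a correctness detail and not the cause of the boot failure.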
On Thu, 2022-06-02 at 15:42 +0200, Markus Kolb wrote:
[...]

It is handled implicitly. When that config symbol is enabled, both INTEL_IOMMU_DEFAULT_ON and INTEL_IOMMU_DEFAULT_OFF are disabled.

Well spotted. This is because at some point in development I changed the name of the option from igpu_off to intgpu_off. The patch description also has the earlier name. I'll correct that.

No, the whole point of INTEL_IOMMU_DEFAULT_ON_INTGPU_OFF is to turn that off by default while still enabling the IOMMU for other devices.

Based on the log from your self-built kernel, it seems like your system should work with the kernel parameter "intel_iommu=on". Can you test whether that makes a difference with the Debian kernels?

Ben.
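The implicit handling described above can be modelled as a three-way choice in which exactly one symbol is enabled. This is a rough sketch, not the kernel code: `defaults_for` and `IommuDefaults` are made-up names, the initialisation of `dmar_map_intgpu` follows the line quoted earlier in this thread, and the initialisation of `dmar_disabled` from INTEL_IOMMU_DEFAULT_OFF is an assumption.

```python
from dataclasses import dataclass

# The three mutually exclusive Kconfig choices named in this thread.
CHOICES = ("DEFAULT_ON", "DEFAULT_ON_INTGPU_OFF", "DEFAULT_OFF")


@dataclass(frozen=True)
class IommuDefaults:
    dmar_disabled: bool    # True: the IOMMU stays off unless enabled at boot
    dmar_map_intgpu: bool  # True: the integrated GPU is mapped through the IOMMU


def defaults_for(choice):
    """Boot-time defaults implied by the selected Kconfig choice."""
    if choice not in CHOICES:
        raise ValueError(f"unknown choice: {choice}")
    default_on = choice == "DEFAULT_ON"
    default_off = choice == "DEFAULT_OFF"
    return IommuDefaults(
        # ASSUMPTION: dmar_disabled = IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_OFF)
        dmar_disabled=default_off,
        # Quoted in the thread: dmar_map_intgpu = IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON)
        dmar_map_intgpu=default_on,
    )


for choice in CHOICES:
    print(choice, defaults_for(choice))
```

Under these assumptions DEFAULT_ON_INTGPU_OFF needs no code of its own: with neither of the other two symbols set, the IOMMU is enabled and the integrated GPU mapping is off.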
On 02.06.2022 17:27, Ben Hutchings wrote:
[...]
[...]

Yes, I got this, but I'm not sure the code logic is really correct. With this, INTEL_IOMMU_DEFAULT_ON is "implicitly" falsy and the code is supposed to run as if it were truthy.

With the Debian kernel it doesn't boot with intel_iommu=on and also not with intel_iommu=on,intgpu_off (which should be the same as specifying nothing). The only option that really works is intel_iommu=on,igfx_off.
On 2 June 2022 13:42:54 UTC, Markus Kolb <debian@tower-net.de> wrote:
[...]
[...]

I've patched the kernel.org 5.10.113 with just these 2 Debian patches, and I can at least confirm that these changes are the cause. However, I don't understand at the moment where the difference between intel_iommu=on with the patches and the defaults without the patches could be. I'll have a closer look tomorrow.
I'm running Debian stable on a VM from an Apple MacBook Pro M1 14" 2021. The software I'm using is UTM, which uses QEMU under the hood.

Yesterday I did a system upgrade:

aptitude -y update && aptitude -y full-upgrade && apt -y autoremove

I noticed the kernel was upgraded to 5.10.0-14, so I rebooted the VM. After that Debian was unable to boot. Choosing the previous kernel image available from Grub (5.10.0-13) allowed Debian to boot normally.

Regards,
Vincent
Control: clone -1 -2
Control: notfound -2 linux/5.17.3-1
Control: retitle -2 linux-image-5.10.0-14-amd64: boot failure in VM after upgrading from -13
Control: tag -2 moreinfo

This seems to be completely unrelated to that bug, so I've cloned it into a new bug. When responding, please only respond to that new bug report/number.

With 'update' it seems pointless
With 'full-upgrade' you REALLY should review what is about to happen before agreeing to that as it could remove packages (important for you)
I'd recommend reviewing the 'autoremove' result too before committing it

Bug 1012210 is about a boot failure on a (wide) variety of kernels, likely related to igpu. Your issue is a regression from -13 to -14.

I assumed that you're running Debian Stable *in* a VM (on what host OS?). Please clarify whether that is correct or not. Also provide more info about YOUR boot failure and send that to the NEW bug number that you should receive.
On 02.06.2022 23:26, Markus Kolb wrote:
[...]

I've found the difference. Somehow I had been under the impression that with the kernel.org and Debian Buster kernels dmar_disabled would be set to false by default, or that CONFIG_INTEL_IOMMU_DEFAULT_ON would be enabled by default. But this is not the case, so dmar_disabled is true there without any boot option.

With the Debian patch in Bullseye and newer, this has been enabled implicitly via CONFIG_INTEL_IOMMU_DEFAULT_ON_INTGPU_OFF=y and dmar_disabled became false. With the older kernels you had to enable it via a boot option, and now you need to disable it.

So I've now added this to drivers/iommu/intel/iommu.c, and my computer boots with Debian kernels without requiring any kernel boot option:

--- a/drivers/iommu/intel/iommu.c 2022-06-03 14:50:52.248268257 +0200
+++ b/drivers/iommu/intel/iommu.c 2022-06-03 14:48:12.695769217 +0200
@@ -6186,6 +6186,9 @@
 	dmar_map_gfx = 0;
 }
 
+/* Q35 integrated gfx dmar support is totally busted. */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x29b2, quirk_iommu_igfx);
+
 /* G4x/GM45 integrated gfx dmar support is totally busted. */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e00, quirk_iommu_igfx);

The only related material I've found is this discussion from 5 years ago, which ended without a result:
https://lore.kernel.org/linux-iommu/20161205215841.GA20819@beast/

And this nearly 4-year-old bug report, which got no attention:
https://bugzilla.kernel.org/show_bug.cgi?id=201185

I've opened https://bugzilla.kernel.org/show_bug.cgi?id=216064

Would you add this patch to Debian kernels?
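The effect of the proposed quirk can be pictured as a lookup: if the PCI vendor/device pair is registered for quirk_iommu_igfx, graphics mapping through the IOMMU is switched off (dmar_map_gfx = 0, as in the diff context above). This is a simplified model, not kernel code: the function and set names are made up, and only the two device IDs visible in the diff context are listed, while the kernel's real list is longer.

```python
PCI_VENDOR_ID_INTEL = 0x8086

# Device IDs visible in the diff context above; NOT the kernel's complete list.
EXISTING_IGFX_QUIRK_IDS = frozenset({0x2A40, 0x2E00})
# The same set plus the Q35 integrated graphics ID added by the proposed patch.
PROPOSED_IGFX_QUIRK_IDS = EXISTING_IGFX_QUIRK_IDS | {0x29B2}


def dmar_map_gfx_after_fixups(vendor, device, quirk_ids):
    """Return False if the igfx quirk would fire for this device, else True."""
    if vendor == PCI_VENDOR_ID_INTEL and device in quirk_ids:
        return False  # models quirk_iommu_igfx setting dmar_map_gfx = 0
    return True


# Q35 integrated graphics [8086:29b2], before and after the proposed change.
print(dmar_map_gfx_after_fixups(0x8086, 0x29B2, EXISTING_IGFX_QUIRK_IDS))  # True
print(dmar_map_gfx_after_fixups(0x8086, 0x29B2, PROPOSED_IGFX_QUIRK_IDS))  # False
```

In this model the patch has the same effect on dmar_map_gfx as the igfx_off boot option, but only on machines that contain the listed device.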
Hi

This bug was filed for a very old kernel or the bug is old itself without resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control for details.

Regards,
Salvatore
The issue reported in this bug is still reproducible in Debian 13 stable (trixie) with kernel 6.12.73+deb13-amd64. The behavior is the same as described in the original report: the system hangs during startup. Disabling VT for Direct I/O in the BIOS also seems to mitigate the problem.
Hi Ariel,

From your followup we do not know if it is the same issue. Can you please specify whether yours is also a

00:02.0 VGA compatible controller [0300]: Intel Corporation 82Q35 Express Integrated Graphics Controller [8086:29b2] (rev 02) (prog-if 00 [VGA controller])
	Subsystem: Hewlett-Packard Company 82Q35 Express Integrated Graphics Controller [103c:2818]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 16
	Region 0: Memory at f0100000 (32-bit, non-prefetchable) [size=512K]
	Region 1: I/O ports at 1210 [size=8]
	Region 2: Memory at e0000000 (32-bit, prefetchable) [size=256M]
	Region 3: Memory at f0000000 (32-bit, non-prefetchable) [size=1M]
	Expansion ROM at 000c0000 [virtual] [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: i915
	Kernel modules: i915

If so, can you test the patch from message #69 (https://bugs.debian.org/1012210#69) and confirm if this fixes your issue?

Regards,
Salvatore
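Checking whether a machine has the same graphics controller comes down to reading the [vendor:device] pair from the lspci output. The following is a small sketch of that check on a saved line of output; `first_pci_id` is a made-up helper, and the sample line is copied from the listing above.

```python
import re

# "[vvvv:dddd]" as shown in the listing above; the class code "[0300]" has no
# colon and is therefore not matched.
PCI_ID_RE = re.compile(r"\[([0-9a-fA-F]{4}):([0-9a-fA-F]{4})\]")

# Vendor/device pair of the controller in the original report.
Q35_IGD = (0x8086, 0x29B2)


def first_pci_id(lspci_line):
    """Return (vendor, device) from the first [vvvv:dddd] pair, or None."""
    match = PCI_ID_RE.search(lspci_line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


line = (
    "00:02.0 VGA compatible controller [0300]: Intel Corporation 82Q35 "
    "Express Integrated Graphics Controller [8086:29b2] (rev 02)"
)
print(first_pci_id(line) == Q35_IGD)  # True
```

The sketch only inspects the first line of a device entry; the subsystem pair on the following line is deliberately not considered.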