Hi,

Starting with version 6.13.3-1~exp1, the riscv64 kernel is shipped as an
EFI binary with the payload compressed with zstd (using the EFI_ZBOOT
config option). In addition to breaking non-EFI systems, this change
prevents the kernel from booting on a VisionFive 2 board:

| Loading Linux 6.13-riscv64 ...
| Loading initial ramdisk ...
| EFI stub: Decompressing Linux Kernel...
| Unhandled exception: Store/AMO access fault
| EPC: 00000000fb64a6ea RA: 00000000fb64a6da TVAL: 0000000040020020
| EPC: 000000003b9046ea RA: 000000003b9046da reloc adjusted
|
| Code: 0506 9526 4783 0015 4703 0005 3583 ed84 (0e23 fef9)
| UEFI image [0x00000000fe6aa000:0x00000000fe6d0fff] '/efi\boot\bootriscv64.efi'
| UEFI image [0x00000000fb646000:0x00000000fbe933ff] pc=0x46ea
|
|
| resetting ...
| reset not supported yet
| ### ERROR ### Please RESET the board ###

Regards
Aurelien
Please re-assign to the bootloader package. Bastian
I disagree. The bootloader is U-Boot and, while it might be fixable at
that level, Debian should be bootable on the original firmware.

BTW, you never explained the reason for your changes. They only bring a
smaller kernel, nothing more. And a working kernel is better than a
smaller kernel that does not work.
It needs to be fixed nevertheless.

What do you mean by "original firmware"? What is this setup anyway?

- Smaller images, so often faster load times.
- Feature parity between architectures.
- Fulfils the interface (U)EFI and works fine in edk2.

As I am currently trying to assemble a list of all the interfaces the
kernel fulfils: How would you define this? Running this in u-boot is not
(U)EFI, but something more strict, or there is a bug in the kernel
decompressor.

Bastian
- VisionFive 2 board: https://www.starfivetech.com/en/site/boards
- Using U-Boot as the firmware
- Booting is done through Grub (grub-efi-riscv64 package)
- Installed with debian-installer

A smaller image is nice, but not mandatory. Other architectures also use
an uncompressed kernel.

Feature parity, do you mean only with arm64 and loong64? EFI_ZBOOT is not
enabled on other architectures.

Just like the kernel before your change. The uncompressed kernel is a
perfectly valid EFI binary that can be run under U-Boot with either
Distro Boot and Grub or with the loadefi command. It can also be run
under EDK2 either directly or also through Grub.
Hi,
Let me summarize the situation for external reviewers.
The kernel for riscv64 used to rely on CONFIG_EFI_STUB=y, enabling the
kernel to be used either as an EFI executable or as conventional ELF
file. Unlike x86, this requires the kernel to be uncompressed, which is
why it was shipped as vmlinux. Note that this is not the only
architecture where the kernel is uncompressed; this is also the case for
ppc64el and many ports architectures.
Commit 16b5ae589a679 ("[arm64, riscv64] Enable EFI_ZBOOT") [1] changed
three things for riscv64:
1) Changed the kernel file that ends up in the package from the
uncompressed one (arch/riscv/boot/Image) to the compressed one
(arch/riscv/boot/vmlinuz.efi)
2) Enabled EFI_ZBOOT to compress the kernel payload and include a
decompressor in the EFI binary
3) Changed the kernel compression from GZIP to ZSTD
Note that technically changes 2 and 3 have basically no effect on the
resulting package without change 1. Please also note that change 1 was
done without renaming the kernel from vmlinux to vmlinuz to match the
(probably unwritten) convention of shipping compressed kernels as vmlinuz
and uncompressed ones as vmlinux. OTOH such a rename would have
probably broken many things.
This change was made without checking with the porters and without any
justification. I quickly noticed the commit and was worried about change
1, as it basically enforces UEFI booting. Although Debian Installer
defaults to a UEFI installation with the standard ISO media or UKI
image, it is technically possible to use a system booting directly from
U-Boot, which some users prefer (this is particularly useful for
switching between non-UEFI vendor kernels and Debian kernels). In
addition a non-UEFI kernel is important for KVM, as it currently doesn't
support running in S-mode, therefore requiring a non-UEFI kernel to be
loaded directly without any firmware.
As a porter I asked on IRC for the riscv64 part of the change to be
reverted. I was told this is not possible, as Debian Installer does not
support non-UEFI, that this change will target forky only, and that I
can simply use a script to extract the payload from the UEFI kernel.
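
For reviewers wondering what such an extraction script would involve,
here is a minimal sketch in Python. It assumes the zboot header layout
as I understand it from the kernel's zboot-header.S: "MZ" at offset 0,
the tag "zimg" at offset 4, little-endian payload offset and size at
offsets 8 and 12, and a NUL-terminated compression name at offset 24.
The function and class names are hypothetical, and the layout should be
checked against the kernel source before relying on it.

```python
import struct
from typing import NamedTuple, Optional


class ZbootHeader(NamedTuple):
    payload_offset: int  # file offset of the compressed payload
    payload_size: int    # size of the compressed payload in bytes
    compression: str     # e.g. "gzip" or "zstd"


def parse_zboot_header(image: bytes) -> Optional[ZbootHeader]:
    """Return the zboot header fields, or None if the tag is missing.

    The offsets used here are an assumption, not a documented ABI.
    """
    if len(image) < 56:
        return None
    # 'MZ' DOS signature followed by the 'zimg' image type tag.
    if image[0:2] != b"MZ" or image[4:8] != b"zimg":
        return None
    payload_offset, payload_size = struct.unpack_from("<II", image, 8)
    # Compression name: NUL-terminated ASCII string starting at offset 24.
    name = image[24:56].split(b"\0", 1)[0].decode("ascii", "replace")
    return ZbootHeader(payload_offset, payload_size, name)


def extract_payload(image: bytes) -> bytes:
    """Return the still-compressed payload of a zboot image."""
    header = parse_zboot_header(image)
    if header is None:
        raise ValueError("not an EFI zboot image")
    end = header.payload_offset + header.payload_size
    if end > len(image):
        raise ValueError("payload extends past end of file")
    return image[header.payload_offset:end]
```

The returned bytes are still compressed, so they would need to go
through the matching decompressor (zstd in this case) to obtain the
plain Image. That is exactly the extra step users booting without UEFI
would have to carry themselves.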
The situation worsened when I realized that the changes do not even work
on a real riscv64 board installed using the standard Debian installer:
| Loading Linux 6.13-riscv64 ...
| Loading initial ramdisk ...
| EFI stub: Decompressing Linux Kernel...
| Unhandled exception: Store/AMO access fault
| EPC: 00000000fb64a6ea RA: 00000000fb64a6da TVAL: 0000000040020020
| EPC: 000000003b9046ea RA: 000000003b9046da reloc adjusted
|
| Code: 0506 9526 4783 0015 4703 0005 3583 ed84 (0e23 fef9)
| UEFI image [0x00000000fe6aa000:0x00000000fe6d0fff] '/efi\boot\bootriscv64.efi'
| UEFI image [0x00000000fb646000:0x00000000fbe933ff] pc=0x46ea
|
|
| resetting ...
| reset not supported yet
| ### ERROR ### Please RESET the board ###
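
To make the log easier to read: the "pc=0x46ea" printed by U-Boot is
simply the faulting EPC expressed relative to the start of the UEFI
image that contains it. A small Python sketch of that arithmetic, using
only the addresses from the log above (the function name and the image
labels are mine):

```python
from typing import Optional


def image_offset(epc: int, start: int, end: int) -> Optional[int]:
    """Return epc relative to the image start, or None if it lies outside."""
    if start <= epc <= end:
        return epc - start
    return None


EPC = 0x00000000FB64A6EA  # first EPC line of the log

# The two "UEFI image [start:end]" ranges printed by U-Boot.
IMAGES = {
    "bootriscv64.efi (Grub)": (0x00000000FE6AA000, 0x00000000FE6D0FFF),
    "unnamed second image": (0x00000000FB646000, 0x00000000FBE933FF),
}

for name, (start, end) in IMAGES.items():
    offset = image_offset(EPC, start, end)
    if offset is None:
        print(f"{name}: EPC is outside this image")
    else:
        print(f"{name}: pc={offset:#x}")
```

The EPC falls outside the Grub image and inside the second image, at
offset 0x46ea, so the faulting store was executed by code from the
image that Grub loaded, not by Grub's own code.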
Sure, this has been tested as mentioned in the MR [2], but it appears
that booting a kernel with QEMU + EDK2 is not comparable to booting a
kernel with a real board + U-Boot + Grub. I agree that there is an issue
in the firmware / bootloader / kernel stack (my current wild guess is
that it's a Grub issue), but still that change currently results in a
non-working kernel.
At this stage I have not seen any strong arguments for the original
commit. The reasons that have been given a posteriori are:
- Smaller images, so often faster load times.
- Feature parity between architectures.
- Fulfils the interface (U)EFI and works fine in edk2.
I don't believe the above reasons are enough to enforce a UEFI-only
kernel and break the boot on existing boards. In addition the "forky
only" argument doesn't stand as many newer riscv64 devices are expected
during the lifetime of trixie and will require a kernel from
trixie-backports. That is why I submitted an MR [3] to revert the riscv64
specific part of the commit.
Regards
Aurelien
[1] https://salsa.debian.org/kernel-team/linux/-/commit/16b5ae589a679acbc9e43de9cb691f42fe058068
[2] https://salsa.debian.org/kernel-team/linux/-/merge_requests/1362
[3] https://salsa.debian.org/kernel-team/linux/-/merge_requests/1384
Linux both with zboot and without zboot are valid EFI binaries. But zboot
seems to uncover a bug in u-boot.

So, now we have the options:
- We target EFI, the decompressor is correct, then u-boot is broken.
- We target EFI, the decompressor is invalid, then the kernel is broken.
- We target u-boot restricted EFI, then we have to revert that for all
  three architectures.

What we still can do is workaround this bug. But this is a defined state
and requires both sides.

Bastian
I dug a bit.

Yes, this is the file from
linux-image-6.13-riscv64_6.13.3-1~exp1_riscv64.deb. It contains the
mentioned instructions:

| 46da: 0506 slli a0,a0,0x1
| 46dc: 9526 add a0,a0,s1
| 46de: 00154783 lbu a5,1(a0)
| 46e2: 00054703 lbu a4,0(a0)
| 46e6: ed843583 ld a1,-296(s0)
| 46ea: fef90e23 sb a5,-4(s2)

I did not manage to get the crash you mentioned. The u-boot out of
u-boot-qemu_2024.01+dfsg-7_all.deb can start both the uncompressed EFI
file and the zboot compressed one. Sadly, it fails for an unrelated
reason shortly after that in both cases.

Using the uncompressed file:

| % qemu-system-riscv64 -m 1024 -nographic -machine virt -device virtio-rng-pci -bios ../qemu-riscv64/u-boot.bin -device loader,file=../../../../boot/plain,addr=0x84000000
| U-Boot 2024.01+dfsg-7 (Jan 09 2025 - 19:14:04 +0000)
| CPU: rv64imafdch_zic64b_zicbom_zicbop_zicboz_ziccamoa_ziccif_zicclsm_ziccrse_zicntr_zicsr_zifencei_zihintntl_zihintpause_zihpm_zmmul_za64rs_zaamo_zalrsc_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_ssccptr_sscounterenw_sstc_sstvala_sstvecd_svadu_svvptc
| Model: riscv-virtio,qemu
| DRAM: 1 GiB
| Core: 25 devices, 12 uclasses, devicetree: board
| Flash: 32 MiB
| Loading Environment from nowhere... OK
| In: serial,usbkbd
| Out: serial,vidconsole
| Err: serial,vidconsole
| No working controllers found
| Net: No ethernet found.
[…]
| => bootefi 0x84000000:0x1a61000
| No EFI system partition
| No EFI system partition
| Failed to persist EFI variables
| Booting /MemoryMapped(0x0,0x84000000,0x1a61000)
| EFI stub: Booting Linux Kernel...
| EFI stub: Using DTB from configuration table
| EFI stub: Exiting boot services...
| Unhandled exception: Environment call from M-mode
| EPC: 00000000baa1bd6c RA: 00000000baa1be9c TVAL: 0000000000000000
| EPC: 000000007b2ddd6c RA: 000000007b2dde9c reloc adjusted
|
| Code: 8562 85de 865a 86d6 8752 87ce 8866 88a6 (0073 0000)
| UEFI image [0x00000000bc488000:0x00000000bdee8fff]

Using the zboot compressed file:

| % qemu-system-riscv64 -m 1024 -nographic -machine virt -device virtio-rng-pci -bios ../qemu-riscv64/u-boot.bin -device loader,addr=0x84000000,file=../../../../boot/vmlinux-6.13-riscv64
| U-Boot 2024.01+dfsg-7 (Jan 09 2025 - 19:14:04 +0000)
[…]
| => bootefi 0x84000000:0x80d200
| No EFI system partition
| No EFI system partition
| Failed to persist EFI variables
| Booting /MemoryMapped(0x0,0x84000000,0x80d200)
| EFI stub: Decompressing Linux Kernel...
| EFI stub: Using DTB from configuration table
| EFI stub: Exiting boot services...
| Unhandled exception: Environment call from M-mode
| EPC: 000000008001bd6c RA: 000000008001be9c TVAL: 0000000000000000
| EPC: 00000000408ddd6c RA: 00000000408dde9c reloc adjusted
|
| Code: 8562 85de 865a 86d6 8752 87ce 8866 88a6 (0073 0000)
| UEFI image [0x00000000bd69b000:0x00000000bdee83ff]

The executed code is bogus, but identical both times. It lives at
different addresses.

Bastian
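
As a cross-check between the VisionFive 2 crash log and the disassembly
quoted above: the "Code:" line of that log ends with "(0e23 fef9)", the
two little-endian halfwords of the faulting instruction. The following
Python sketch decodes them as a RISC-V S-type store. The encoding
(opcode 0x23, funct3 selecting the width, immediate split across bits
31:25 and 11:7) is the standard one; the helper name is hypothetical.

```python
# ABI names of the integer registers x0..x31.
ABI = [
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
]

# funct3 values of the RV64 integer store instructions.
STORES = {0: "sb", 1: "sh", 2: "sw", 3: "sd"}


def decode_store(low: int, high: int) -> str:
    """Decode a 32-bit store given as two 16-bit halfwords (low one first)."""
    insn = (high << 16) | low
    if insn & 0x7F != 0x23:
        raise ValueError("not a STORE opcode")
    funct3 = (insn >> 12) & 0x7
    if funct3 not in STORES:
        raise ValueError("unknown store width")
    rs1 = (insn >> 15) & 0x1F   # base address register
    rs2 = (insn >> 20) & 0x1F   # register holding the value to store
    imm = ((insn >> 25) << 5) | ((insn >> 7) & 0x1F)
    if imm & 0x800:             # sign-extend the 12-bit immediate
        imm -= 0x1000
    return f"{STORES[funct3]} {ABI[rs2]},{imm}({ABI[rs1]})"


print(decode_store(0x0E23, 0xFEF9))  # prints "sb a5,-4(s2)"
```

The result matches the instruction at 0x46ea in the disassembly. It is
a byte store, so alignment of the target address cannot be what makes
it fault.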
I have not been able to reproduce the crash under QEMU. I believe it
could be due to the fact that QEMU doesn't trap unaligned accesses.

So far I only reproduced the issue on real hardware. It works fine when
the kernel is directly started from U-Boot with bootefi. It only fails
when U-Boot launches Grub and Grub launches the EFI file.

You should use OpenSBI as the bios, and U-Boot in S-mode as the kernel.
does not work. Also I found reports that it seems to work for others on
this hardware.[1]

So this whole ordeal is not a bug fix, but a workaround for another, as
yet unidentified, bug in one of the components.

So, I see the following steps to find out what the heck happens:
- Upgrade u-boot. The version in Debian is one year old and several new
  releases have been made since then.
- Build u-boot with SHOW_REGS to see what exactly it failed on. The
  already shown TVAL register should contain the trapping address and
  that is pretty near to the loaded u-boot.
- Try to find what this code is for. Sadly the Linux package does not
  retain debugging information for the EFI wrappers.
- Change the instruction into a trap to be able to see the same error in
  other environments and compare.

Yeah, thanks, found that as well. With that Linux is able to boot
correctly.

Bastian

[1]: At least I read the last lines in this log this way
https://libera.irclog.whitequark.org/u-boot/2024-05-10
Breaks non-EFI systems. Isn't that like 95+% of arm64 boards?
Wrong.
Testing on real hardware seems useful ...
Someone said: "arm64 build *looks* good to me" (emphasis mine)
If it was tested on real hardware it would have said so and mentioned
on which hardware. It doesn't, so it's safe to assume it was NOT tested
on real hardware.
Indeed. You can configure QEMU to have the features you want/need. That
does not mean that all real boards support that.
That's due to compression. You can have compression without EFI.
Looking at https://github.com/edk2-porting I see the following repos:
- edk2-rk3588 ("EDK2 UEFI firmware for Rockchip RK3588 platforms")
- edk2-msm ("Broken edk2 port for Qualcomm platforms xD")
So there is *partial* support for some rk3588 based devices and broken
support for (some?) Qualcomm based devices. That's it.
Looking at the contributors for edk2-rk3588 I see there are *3* people
with more than 10 commits ... and one indicates he's inactive.
I haven't found any other indication it has some real momentum.
by upgrading Debian's 6.13.2 kernel (which works) to the 6.13.4 kernel.
FWIW/FTR: My Q64-A board has a self-compiled U-Boot 2024.10-rc6.
Aurelien indicated he wanted this bug to be about RISC-V, so I'll just
attach my serial log in case ppl want to see that.
TL;DR: My U-Boot found out that it CAN'T load Debian's 6.13.4 kernel and
tries the next one till it finds one which it can boot ...
which was my 6.13 kernel (without EFI_ZBOOT).
Most people use the bootloader/U-Boot that was shipped with the product
and never update it. I can understand why as the goal of the bootloader
is to boot the device, so when it does that ... why upgrade?
https://bugs.debian.org/1095745 is about broken backward compatibility
and that is a *kernel* bug.
My 0.02
First, we talk about riscv64, not arm64. Second, which arm64 board can
boot from nothing?

The riscv64 installer only supports EFI.

edk2 is the reference implementation. U-Boot is what everything else
uses.

This kernel (after unpacking) boots fine on non-UEFI. So, what is the
problem?

u-boot can load it as zboot image, as also mentioned. grub(!) fails for
some reason.

Bastian
Control: clone -1 -2
Control: reassign -2 src:grub
Control: retitle -2 grub - fails to start zboot linux on risvc64: Unhandled exception: Store/AMO access fault
Control: severity -2 important

So cloning the bug to grub accordingly. The kernel team intends to change
riscv64 to zboot for forky, so this bug needs to be identified.

Bastian
control: reassign -1 u-boot
control: found -1 2024.01+dfsg-7
control: fixed -1 2025.01-1

I have upgraded u-boot to version 2025.01, and I can't reproduce the
issue anymore. So I guess we can consider the issue fixed. Reassigning
the bug accordingly.

This means we now need to find a way for users to easily upgrade u-boot
before that happens, so that they are able to reboot their board after a
kernel upgrade.

Regards
Aurelien