#975490 u-boot-sunxi: A64-Olinuxino-eMMC boot stuck at "Starting kernel ..."

Package:
u-boot-sunxi
Source:
u-boot
Submitter:
Benedikt Spranger
Date:
2021-08-14 17:57:22 UTC
Severity:
critical
Tags:
#975490#5
Date:
2020-11-22 19:20:14 UTC
From:
To:
Dear Maintainer,

after a fresh install of Debian "bullseye" the first reboot got stuck
after "Starting kernel ..."

It turned out that booting the system always got stuck when using a
"normal" u-boot boot sequence. Using extlinux or FIT-Images is not
affected.

boot.scr : FAIL
extlinux : OK
FIT-Image: OK

Since the Debian Installer neither provides an extlinux configuration nor
builds a FIT-Image, the system is unusable after the reboot from the
installer.

I ran into the same situation during an update to bullseye on another
system.

Bootlog:
---8<---
U-Boot SPL 2020.10+dfsg-1+b1 (Nov 19 2020 - 03:18:11 +0000)
DRAM: 1024 MiB
Trying to boot from MMC2
NOTICE:  BL31: v2.3():
NOTICE:  BL31: Built : 05:17:48, Oct 18 2020
NOTICE:  BL31: Detected Allwinner A64/H64/R18 SoC (1689)
NOTICE:  BL31: Found U-Boot DTB at 0x4093968, model: Olimex A64-Olinuxino-eMMC
INFO:    ARM GICv2 driver initialized
INFO:    Configuring SPC Controller
INFO:    PMIC: Probing AXP803 on RSB
INFO:    PMIC: Enabling DRIVEVBUS
INFO:    PMIC: dcdc1 voltage: 3.300V
INFO:    PMIC: dcdc5 voltage: 1.360V
INFO:    PMIC: dcdc6 voltage: 1.100V
INFO:    PMIC: dldo1 voltage: 3.300V
INFO:    PMIC: dldo2 voltage: 3.300V
INFO:    PMIC: dldo3 voltage: 2.800V
INFO:    PMIC: dldo4 voltage: 3.300V
INFO:    PMIC: fldo1 voltage: 1.200V
INFO:    PMIC: Enabling DC SW
INFO:    BL31: Platform setup done
INFO:    BL31: Initializing runtime services
INFO:    BL31: cortex_a53: CPU workaround for 843419 was applied
INFO:    BL31: cortex_a53: CPU workaround for 855873 was applied
NOTICE:  PSCI: System suspend is unavailable
INFO:    BL31: Preparing for EL3 exit to normal world
INFO:    Entry point address = 0x4a000000
INFO:    SPSR = 0x3c9
alloc space exhausted


U-Boot 2020.10+dfsg-1+b1 (Nov 19 2020 - 03:18:11 +0000) Allwinner Technology

CPU:   Allwinner A64 (SUN50I)
Model: Olimex A64-Olinuxino-eMMC
DRAM:  1 GiB
MMC:   mmc@1c0f000: 0, mmc@1c10000: 2, mmc@1c11000: 1
Loading Environment from FAT... Unable to use mmc 1:1... In:    serial
Out:   serial
Err:   serial
Net:   phy interface7
eth0: ethernet@1c30000
starting USB...
Bus usb@1c1a000: USB EHCI 1.00
Bus usb@1c1a400: USB OHCI 1.0
Bus usb@1c1b000: USB EHCI 1.00
Bus usb@1c1b400: USB OHCI 1.0
scanning bus usb@1c1a000 for devices... 1 USB Device(s) found
scanning bus usb@1c1a400 for devices... 1 USB Device(s) found
scanning bus usb@1c1b000 for devices... 1 USB Device(s) found
scanning bus usb@1c1b400 for devices... 1 USB Device(s) found
       scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot:  0
switch to partitions #0, OK
mmc1(part 0) is current device
Scanning mmc 1:1...
Found U-Boot script /boot.scr
2225 bytes read in 2 ms (1.1 MiB/s)
## Executing script at 4fc00000
22744944 bytes read in 1003 ms (21.6 MiB/s)
28403 bytes read in 5 ms (5.4 MiB/s)
30071341 bytes read in 1326 ms (21.6 MiB/s)
Booting Debian 5.9.0-2-arm64 from mmc 1:1...
Moving Image from 0x40080000 to 0x40200000, end=41850000
## Flattened Device Tree blob at 4fa00000
   Booting using the fdt blob at 0x4fa00000
EHCI failed to shut down host controller.
   Loading Ramdisk to 48352000, end 49fffa2d ... OK
   Loading Device Tree to 0000000048348000, end 0000000048351ef2 ... OK

Starting kernel ...
---8<---

Regards
    Benedikt Spranger

#975490#10
Date:
2020-11-22 22:34:33 UTC
From:
To:
Thanks for the bug report...

Very surprising that extlinux would work but boot.scr would not; they
almost certainly use the same load addresses...

This symptom is sometimes related to the kernel or device tree or
initramfs overwriting the load address of one of the other values.

Can you get to a u-boot prompt and:

  printenv fdt_addr_r kernel_addr_r ramdisk_addr_r


Could you downgrade to the 2020.10+dfsg-1 version from snapshot.debian.org
and see if that has the same issue?

What other board?

I wish flash-kernel were more verbose about which files it is
loading... are there other similar variants to this board that require a
different device-tree and is the boot.scr loading the correct one?

Maybe add some debugging into the boot.scr used in /etc/flash-kernel/

I'll test on a few of my systems to see if I can reproduce the issue.


live well,
  vagrant

#975490#15
Date:
2020-11-23 01:18:23 UTC
From:
To:

#975490#22
Date:
2020-11-23 13:29:18 UTC
From:
To:
You're welcome!
I was astonished.
BTST :)

=> printenv fdt_addr_r kernel_addr_r ramdisk_addr_r
fdt_addr_r=0x4FA00000
kernel_addr_r=0x40080000
ramdisk_addr_r=0x4FE00000
=>
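
The overlap concern raised earlier in this thread can be checked with a little arithmetic. The following is only a sketch, not part of U-Boot or flash-kernel: it takes the three addresses from the printenv output above and pairs them with the "bytes read" sizes from the first boot log, assuming those were loaded in the order kernel, dtb, initrd. The helper names are made up for illustration. It only looks at the initial load regions, not at where U-Boot relocates the images afterwards.

```python
from itertools import combinations

# (load address, size in bytes). Addresses come from the printenv output;
# sizes come from the "bytes read" lines of the first boot log.
# ASSUMPTION: the three reads are kernel, dtb, initrd, in that order.
regions = {
    "kernel": (0x40080000, 22744944),
    "fdt": (0x4FA00000, 28403),
    "ramdisk": (0x4FE00000, 30071341),
}


def overlaps(a, b):
    """Return True if two (start, size) regions share at least one byte."""
    (a_start, a_size), (b_start, b_size) = a, b
    return a_start < b_start + b_size and b_start < a_start + a_size


def find_overlaps(regions):
    """Return the names of every pair of regions that overlap."""
    return [
        (n1, n2)
        for (n1, r1), (n2, r2) in combinations(regions.items(), 2)
        if overlaps(r1, r2)
    ]


for name, (start, size) in regions.items():
    print(f"{name:8s} {start:#010x} - {start + size:#010x}")
print("overlapping pairs:", find_overlaps(regions) or "none")
```

With these numbers the initial load regions do not collide, so a plain load-address overlap is probably not the whole story here.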

Done.
extlinux: booting
boot.scr: failed
---8<---
U-Boot SPL 2020.10+dfsg-1 (Oct 05 2020 - 19:13:28 +0000)
DRAM: 1024 MiB
Trying to boot from MMC2
NOTICE: BL31: v2.3():
NOTICE: BL31: Built : 05:17:48, Oct 18 2020
NOTICE: BL31: Detected Allwinner A64/H64/R18 SoC (1689)
NOTICE: BL31: Found U-Boot DTB at 0x4093960, model: Olimex A64-Olinuxino-eMMC
INFO: ARM GICv2 driver initialized
INFO: Configuring SPC Controller
INFO: PMIC: Probing AXP803 on RSB
INFO: PMIC: Enabling DRIVEVBUS
INFO: PMIC: dcdc1 voltage: 3.300V
INFO: PMIC: dcdc5 voltage: 1.360V
INFO: PMIC: dcdc6 voltage: 1.100V
INFO: PMIC: dldo1 voltage: 3.300V
INFO: PMIC: dldo2 voltage: 3.300V
INFO: PMIC: dldo3 voltage: 2.800V
INFO: PMIC: dldo4 voltage: 3.300V
INFO: PMIC: fldo1 voltage: 1.200V
INFO: PMIC: Enabling DC SW
INFO: BL31: Platform setup done
INFO: BL31: Initializing runtime services
INFO: BL31: cortex_a53: CPU workaround for 843419 was applied
INFO: BL31: cortex_a53: CPU workaround for 855873 was applied
NOTICE: PSCI: System suspend is unavailable
INFO: BL31: Preparing for EL3 exit to normal world
INFO: Entry point address = 0x4a000000
INFO: SPSR = 0x3c9
alloc space exhausted


U-Boot 2020.10+dfsg-1 (Oct 05 2020 - 19:13:28 +0000) Allwinner Technology

CPU: Allwinner A64 (SUN50I)
Model: Olimex A64-Olinuxino-eMMC
DRAM: 1 GiB
MMC: mmc@1c0f000: 0, mmc@1c10000: 2, mmc@1c11000: 1
Loading Environment from FAT... Unable to use mmc 1:1... In: serial
Out: serial
Err: serial
Net: phy interface7
eth0: ethernet@1c30000
starting USB...
Bus usb@1c1a000: USB EHCI 1.00
Bus usb@1c1a400: USB OHCI 1.0
Bus usb@1c1b000: USB EHCI 1.00
Bus usb@1c1b400: USB OHCI 1.0
scanning bus usb@1c1a000 for devices... 1 USB Device(s) found
scanning bus usb@1c1a400 for devices... 1 USB Device(s) found
scanning bus usb@1c1b000 for devices... 1 USB Device(s) found
scanning bus usb@1c1b400 for devices... 1 USB Device(s) found
scanning usb for storage devices... 0 Storage Device(s) found
Hit any key to stop autoboot: 0
switch to partitions #0, OK
mmc1(part 0) is current device
Scanning mmc 1:1...
Found U-Boot script /boot.scr
2417 bytes read in 1 ms (2.3 MiB/s)
## Executing script at 4fc00000
bootargs: bootargs=console=ttyS0,115200 quiet
fk_kvers: fk_kvers=5.9.0-2-arm64
fdtpath: fdtpath=dtbs/5.9.0-2-arm64/allwinner/sun50i-a64-olinuxino-emmc.dtb
partition: partition=1
addr: fdt_addr_r=0x4FA00000 kernel_addr_r=0x40080000 ramdisk_addr_r=0x4FE00000
22744944 bytes read in 1004 ms (21.6 MiB/s)
28403 bytes read in 4 ms (6.8 MiB/s)
30071341 bytes read in 1326 ms (21.6 MiB/s)
Booting Debian 5.9.0-2-arm64 from mmc 1:1...
Moving Image from 0x40080000 to 0x40200000, end=41850000
## Flattened Device Tree blob at 4fa00000
Booting using the fdt blob at 0x4fa00000
EHCI failed to shut down host controller.
Loading Ramdisk to 48352000, end 49fffa2d ... OK
Loading Device Tree to 0000000048348000, end 0000000048351ef2 ... OK

Starting kernel ...
---8<---

Also an Olimex A64-Olinuxino-eMMC. Did an update from a running
Debian-Image provided by Olimex.

Therefore the Problem exists on:
1) Olimex A64-Olinuxino-eMMC
   Fresh Debian Install (bullseye) as described above.
2) Olimex A64-Olinuxino-eMMC
   Update to bullseye from a Debian-Image provided by Olimex.

I can try to get earlyprintk running.
OK. Will do.
OK.

Thx
    Bene

#975490#27
Date:
2020-11-28 02:34:13 UTC
From:
To:
...

Just to be clear, you're using the serial console at ttyS0 at 115200
baud? Or are you expecting HDMI output or some other console output?

If you're not expecting to be using the serial console, the problem may
be:

https://bugs.debian.org/969070


live well,
  vagrant

#975490#32
Date:
2020-11-28 03:07:59 UTC
From:
To:
...
...
...

I can confirm similar behavior on a pinebook, although the kernel does
boot and actually load, and eventually displays on the LCD display (if I
"setenv console" from u-boot commandline). It even responds
appropriately to ctrl-alt-delete, so it is not a completely hung
kernel...

It definitely gets as far as the initrd for me, as setting break=top in
the boot arguments stalls out with a blank screen, and setting
break=premount it actually manages to load the LCD and keyboard drivers
... and then stalls out before giving me a shell where it would be
easier to do some actual debugging... :/

So I doubt it is u-boot misbehaving, but have not yet identified where
exactly the problem is... though I have some suspicions in the boot
script and thus flash-kernel.


live well,
  vagrant

#975490#37
Date:
2020-11-28 11:04:09 UTC
From:
To:
On Fri, 27 Nov 2020 18:34:13 -0800 Vagrant Cascadian <vagrant@debian.org> wrote:
Yes, I am using the serial console.
I am an old-fashioned serial console guy and I am happy not to get stuck
on 9600 ;)
No HDMI here - please let me know if I should check. I am expecting output
on a serial console.

Regards
    Bene

#975490#42
Date:
2021-01-05 04:27:51 UTC
From:
To:
With a locally built version of 2020.10+dfsg-2, I can no longer
reproduce this issue at all.

Could you try with the new version?


live well,
  vagrant

#975490#47
Date:
2021-02-10 08:37:01 UTC
From:
To:
Hi,

Testing/unstable now has version 2021.01+dfsg-2. Benedikt, could you try this
version to see if the issue is still there?

Thanks,

Ivo

#975490#52
Date:
2021-04-16 12:25:27 UTC
From:
To:
On a Lamobo R1, I can confirm that 2021.01 versions do not boot with a default environment. However,
2020.10+dfsg-2 boots. Even though the original issue has the same outcome, I guess it is caused by
something else. I figured out my problem is caused by
https://github.com/u-boot/u-boot/commit/f3866909e35074ea6f50226d40487a180de1132f. The
boot_efi_bootmgr will run and read a bad dtb, which makes a boot.scr boot fail.

The issue is fixed in 2021.04 (experimental) which has the same default environment as 2021.01.

#975490#57
Date:
2021-04-16 15:06:00 UTC
From:
To:
I can confirm this on the Lamobo R1, when rootfs is on scsi0 and it
first attempts to boot from microSD (failing to find boot_efi on
mmc0).

If I force it to boot from scsi0 first by interrupting the boot to get
to a u-boot shell, and typing "setenv boot_targets scsi0", it worked
fine with 2021.01 (e.g. it didn't hit the bootefi codepath) as well.

Booting the debian-installer image from
https://d-i.debian.org/daily-images/armhf/20210416-00:15/netboot/SD-card-images/
worked for me (which currently uses 2021.01).

That said, I'm not sure if this is the same issue as in the original
report, as the symptom stuck at "Starting kernel ..." can be caused by a
wide variety of issues...


live well,
  vagrant

#975490#62
Date:
2021-04-16 15:06:37 UTC
From:
To:
On 16.04.21 at 14:25, Bastian Germann wrote:

The upstream commit that fixed this is
https://github.com/u-boot/u-boot/commit/82d01f04facef1276cede067efd02d2a731ffe83

It applies cleanly on 2021.01+dfsg-4. Please include it or change the config for the affected boards
not to try EFI bootmanager.

#975490#72
Date:
2021-05-07 20:28:58 UTC
From:
To:
Any news on this? It would be great if that patch would be integrated
into the bullseye version.

#975490#77
Date:
2021-05-07 21:09:26 UTC
From:
To:
Control: clone 975490 -1
Control: retitle -1 bootefi causes boot failure with boot.scr
Control: tags -1 + fixed-upstream
Control: tags -1 + patch
Control: severity -1 important

Different enough to warrant its own bug, cloning...

This would definitely be good to fix in bullseye, but this is quite late
in the release cycle.

Will need to test failure and success cases with and without EFI as well
as boot.scr and extlinux.conf to make sure this doesn't cause
regressions in other boot paths...

An ugly workaround in the meantime would be to add a no-op boot.scr on
the media (e.g. mmc0) and then fall back to the other boot methods.


live well,
  vagrant

#975490#86
Date:
2021-05-18 07:56:29 UTC
From:
To:
This is from the cloned bug but belongs here.

On 18.05.21 at 08:31, Heinrich Schuchardt wrote:

#975490#91
Date:
2021-05-23 20:50:24 UTC
From:
To:
Control: retitle 975490 u-boot-sunxi: A64-Olinuxino-eMMC boot stuck at "Starting kernel ..."
Control: tags 975490 moreinfo

I've uploaded 2021.01+dfsg-5 to unstable that fixes a bug with similar
symptoms (Bug#988217); would you be able to test this version?

There is also a version in experimental that would be good to see if the
issue is fixed for you.

Thanks!

live well,
  vagrant