#1010365 linux: failure to boot on Raspberry Pi Compute Module 4 (black screen)

Package:
src:linux
Source:
linux
Submitter:
Cyril Brulebois
Date:
2022-05-15 10:48:05 UTC
Severity:
important
Tags:
#1010365#5
Date:
2022-04-29 15:58:54 UTC
From:
To:
Source: linux
Version: 5.17.3-1
Severity: important
X-Debbugs-Cc: raspi-firmware@packages.debian.org

Hi,

In the process of testing patches for the Raspberry Pi Compute Modules
(CM3 and CM4), for bullseye[1][2] and bookworm[2], I discovered that
bookworm images don't boot on the CM4.

 1. https://bugs.debian.org/1010317
 2. https://bugs.debian.org/996937

The usual start-up rainbow is displayed, the screen turns to black and
nothing happens. My first stop was trying to downgrade the bootloader
(shipped by the raspi-firmware package) to the bullseye's version, but
that didn't help.

Then I started from a bullseye image (which boots) and upgraded the
raspi-firmware package: that still boots.

Then I deployed 5.16.18-1 (from snapshot.debian.org), that still boots.

Then I deployed 5.17.3-1, and it broke booting.

I'll try and pinpoint when it broke using the various intermediary
versions:

 - 5.17~rc3-1~exp1
 - 5.17~rc4-1~exp1
 - 5.17~rc5-1~exp1
 - 5.17~rc6-1~exp1
 - 5.17~rc7-1~exp1
 - 5.17~rc8-1~exp1
 - 5.17.1-1~exp1

and then try to figure out what broke exactly. Contrary to my earlier
efforts to introduce support for that hardware a few months ago, I
haven't been following upstream changes recently, so I'll need to catch
up.
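
With an ordered list of versions like the one above, the first failing one
can be found in a few boots rather than by testing each in turn. The
following is only a sketch: first_bad_version and the test command passed
to it are hypothetical names, the list is assumed to be sorted from oldest
to newest with its last entry known to fail, and on real hardware the
"test" is a manual install-and-boot step rather than a script.

```shell
#!/bin/sh
# first_bad_version TEST_CMD VERSION...
# Binary search over versions ordered oldest to newest.
# Assumptions: the last version is known to fail, and once a version
# fails, every later one fails too.
# TEST_CMD is a hypothetical command that exits 0 if the given version boots.
first_bad_version() {
    test_cmd=$1
    shift
    lo=1          # lowest index that may still be the first bad version
    hi=$#         # index of a version known to be bad
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"
        if "$test_cmd" "$candidate"; then
            lo=$(( mid + 1 ))   # candidate boots: first bad is later
        else
            hi=$mid             # candidate fails: first bad is here or earlier
        fi
    done
    eval "printf '%s\n' \"\${$lo}\""
}
```

The known-good version goes first and the known-bad one last, for example
5.16.18-1, then the intermediary versions, then 5.17.3-1.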


Cheers,

#1010365#10
Date:
2022-04-29 17:35:15 UTC
From:
To:
Cyril Brulebois <cyril@debamax.com> (2022-04-29):


Checking the upstream diff, nothing obvious on the DTB side. Trying to
use 5.16.18-1's DTB with 5.17~rc3-1~exp1 kernel didn't help anyway.

I've also tried latest mainline: v5.18-rc4-192-g38d741cb70b3

built with:

    cp ~/config-5.17.0-1-arm64 .config
    time PATH=/usr/lib/ccache:$PATH make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- oldconfig       # accept everything
    time PATH=/usr/lib/ccache:$PATH make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- bindeb-pkg -j32

and the symptoms are the same: black screen at start-up.

I've also checked the serial console (which is confirmed to work if I
boot 5.16.18-1), and I'm not getting anything there either, with either
5.17~rc3-1~exp1 or my local v5.18-rc4-192-g38d741cb70b3 build.


Cheers,

#1010365#17
Date:
2022-04-30 05:10:54 UTC
From:
To:
Cyril Brulebois <cyril@debamax.com> (2022-04-29):

Using the same base image as before, and only updating the kernel: I've
tested upstream builds, starting from the .config found in the Debian
5.16.18-1 package, using oldconfig and accepting everything by default:

 - v5.16 is confirmed a first good;
 - v5.17-rc1 is confirmed a first bad;
 - the culprit seems to be 3ceff4ea07410763d5d4cccd60349bf7691e7e61


Here's the git bisect log:

    git bisect start
    # good: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
    git bisect good df0cc57e057f18e44dac8e6c18aba47ab53202f9
    # bad: [e783362eb54cd99b2cac8b3a9aeac942e6f6ac07] Linux 5.17-rc1
    git bisect bad e783362eb54cd99b2cac8b3a9aeac942e6f6ac07
    # good: [fef8dfaea9d6c444b6c2174b3a2b0fca4d226c5e] Merge tag 'regulator-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
    git bisect good fef8dfaea9d6c444b6c2174b3a2b0fca4d226c5e
    # bad: [3ceff4ea07410763d5d4cccd60349bf7691e7e61] Merge tag 'sound-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
    git bisect bad 3ceff4ea07410763d5d4cccd60349bf7691e7e61
    # good: [57ea81971b7296b42fc77424af44c5915d3d4ae2] Merge tag 'usb-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
    git bisect good 57ea81971b7296b42fc77424af44c5915d3d4ae2
    # good: [feb7a43de5ef625ad74097d8fd3481d5dbc06a59] Merge tag 'irq-msi-2022-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    git bisect good feb7a43de5ef625ad74097d8fd3481d5dbc06a59
    # good: [10674ca9ea02491fd3f8ffe303861b7a6837994b] ASoC/SoundWire: improve suspend flows and use set_stream() instead of set_tdm_slots() for HDAudio
    git bisect good 10674ca9ea02491fd3f8ffe303861b7a6837994b
    # good: [c77b1f8a8faeeba43c694d9d09d0b25a4f52cf37] scsi: mpi3mr: Bump driver version to 8.0.0.61.0
    git bisect good c77b1f8a8faeeba43c694d9d09d0b25a4f52cf37
    # good: [f66229aa355f7e0dc0dc20cbc1f4d45c3176eed2] Merge tag 'asoc-v5.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
    git bisect good f66229aa355f7e0dc0dc20cbc1f4d45c3176eed2
    # good: [59aa7fcfe2e44afbe9736e5cfa941699021d6957] IB/mthca: Use memset_startat() for clearing mpt_entry
    git bisect good 59aa7fcfe2e44afbe9736e5cfa941699021d6957
    # good: [18451db82ef7f943c60a7fce685f16172bda5106] RDMA/core: Calculate UDP source port based on flow label or lqpn/rqpn
    git bisect good 18451db82ef7f943c60a7fce685f16172bda5106
    # good: [1f43e5230aebb17aea35238dc26e297a61095ac0] mailbox: qcom-ipcc: Support more IPCC instance
    git bisect good 1f43e5230aebb17aea35238dc26e297a61095ac0
    # good: [747c19eb7539b5e6bb15ed57a0a14ebf9f3adb8e] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
    git bisect good 747c19eb7539b5e6bb15ed57a0a14ebf9f3adb8e
    # good: [e1a7aa25ff45636a6c1930bf2430c8b802e93d9c] Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
    git bisect good e1a7aa25ff45636a6c1930bf2430c8b802e93d9c
    # good: [19980aa10d2d944ed8fe345ce2eb87c2cb4bedf8] ALSA: hda: intel-dsp-config: add JasperLake support
    git bisect good 19980aa10d2d944ed8fe345ce2eb87c2cb4bedf8
    # good: [081c73701ef0c2a4f6a127da824a641ae6505fbe] ALSA: hda: intel-dsp-config: reorder the config table
    git bisect good 081c73701ef0c2a4f6a127da824a641ae6505fbe
    # first bad commit: [3ceff4ea07410763d5d4cccd60349bf7691e7e61] Merge tag 'sound-5.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
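
A log like the one above can be saved with "git bisect log" and fed back
with "git bisect replay" to reproduce the same session. As a small
convenience sketch (first_bad_commit and bisect_verdicts are hypothetical
helper names, not git commands), the verdict and the number of tested
revisions can be pulled out of a saved log like this:

```shell
#!/bin/sh
# Print the "first bad commit" hash recorded at the end of a git bisect log.
# Leading whitespace is tolerated so a log pasted from an email still parses.
first_bad_commit() {
    sed -n 's/^[[:space:]]*# first bad commit: \[\([0-9a-f]\{40\}\)\].*/\1/p' "$1"
}

# Count the revisions marked good and bad in the log
# (the two starting endpoints are included).
bisect_verdicts() {
    good=$(grep -c '^[[:space:]]*git bisect good ' "$1")
    bad=$(grep -c '^[[:space:]]*git bisect bad ' "$1")
    printf 'good=%s bad=%s\n' "$good" "$bad"
}
```

Both helpers take the path of the saved log as their only argument.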


I'll try and find out more in a couple of hours, and get in touch with
upstream.


Cheers,

#1010365#22
Date:
2022-04-30 09:22:36 UTC
From:
To:
Cyril Brulebois <cyril@debamax.com> writes:
merge bug.

I looked briefly at what was merged there, and I believe this commit
stands out as suspicious:

bjorn@miraculix:/usr/local/src/git/linux$ git show f59f6aaead97
commit f59f6aaead975f0ec4d8ff2d59c4ffb8cf0127b2
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Mon Nov 22 23:21:56 2021 +0100

    mmc: bcm2835: stop setting chan_config->slave_id

    The field is not interpreted by the DMA engine driver, as all the data
    is passed from devicetree instead. Remove the assignment so the field
    can eventually be deleted.

    Reviewed-by: Nicolas Saenz Julienne <nsaenz@kernel.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
    Acked-by: Mark Brown <broonie@kernel.org>
    Link: https://lore.kernel.org/r/20211122222203.4103644-5-arnd@kernel.org
    Signed-off-by: Vinod Koul <vkoul@kernel.org>

diff --git a/drivers/mmc/host/bcm2835.c b/drivers/mmc/host/bcm2835.c
index 8c2361e66277..463b707d9e99 100644
--- a/drivers/mmc/host/bcm2835.c
+++ b/drivers/mmc/host/bcm2835.c
@@ -1293,14 +1293,12 @@ static int bcm2835_add_host(struct bcm2835_host *host)

                host->dma_cfg_tx.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
                host->dma_cfg_tx.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
-               host->dma_cfg_tx.slave_id = 13;         /* DREQ channel */
                host->dma_cfg_tx.direction = DMA_MEM_TO_DEV;
                host->dma_cfg_tx.src_addr = 0;
                host->dma_cfg_tx.dst_addr = host->phys_addr + SDDATA;

                host->dma_cfg_rx.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
                host->dma_cfg_rx.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
-               host->dma_cfg_rx.slave_id = 13;         /* DREQ channel */
                host->dma_cfg_rx.direction = DMA_DEV_TO_MEM;
                host->dma_cfg_rx.src_addr = host->phys_addr + SDDATA;
                host->dma_cfg_rx.dst_addr = 0;


But I'm basing that only on it being related to the bcm28/27xx SoCs and
a bit unexpected in the sound merge...  I cannot explain why this mmc
host driver change should affect your display.  Could be completely
wrong.  But might be worth testing?



Bjørn

#1010365#27
Date:
2022-04-30 19:55:47 UTC
From:
To:
Hi Bjørn,

Bjørn Mork <bjorn@mork.no> (2022-04-30):

Yeah, I skipped a bunch of details in my last mail since I've tried
various things (including reverting that one I spotted, plus the few
commits around it since it was part of removing that field altogether)
but didn't get any consistent results.

My methodology was probably fragile since I worked incrementally, and I
suppose I got some wires crossed at some point. Sorry for the confusion.


I've redone this entirely, and here are better (and reproducible, this
time) findings:

 - 830aa6f29f07a4e2f1a947dfa72b3ccddb46dd21 breaks the boot, leading to
   a kernel panic very early in the boot process; I'm seeing the trace
   on the screen, not on the serial console. It involves the modified
   brcm_pcie_driver_init() function, so that's quite consistent.

 - 87c71931633bd15e9cfd51d4a4d9cd685e8cdb55 is the last commit
   exhibiting the kernel panic (further in that branch, before it gets
   merged into mainline).

 - 88db8458086b1dcf20b56682504bdb34d2bca0e2 is the last commit that lets
   the CM4 boot properly.

 - d0a231f01e5b25bacd23e6edc7c979a18a517b2b, which is the merge of the
   last two aforementioned commits, is the first one that results in
   a completely black screen (no kernel panic displayed), and still
   nothing on the serial console. It seems to me that the kernel panic
   escalates into a more serious issue after this merge. I note there
   are conflict resolutions about drivers/pci/controller/pcie-brcmstb.c
   in that commit.
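
For anyone wanting to look at such conflict resolutions, a merge can be
inspected with a combined diff. This is a sketch only: the two function
names are hypothetical wrappers around standard git commands, and the
arguments (work tree, merge commit, file) would be a local kernel checkout,
d0a231f01e5b25bacd23e6edc7c979a18a517b2b and
drivers/pci/controller/pcie-brcmstb.c in this case.

```shell
#!/bin/sh
# Show how a merge commit resolved one file.
# "git show --cc" prints a combined diff and omits hunks where the result
# simply takes one parent's version, so manual resolutions stand out.
#   $1: path to a git work tree, $2: merge commit, $3: file to inspect
show_merge_resolution() {
    git -C "$1" show --cc --format='%H %s' "$2" -- "$3"
}

# List the parents of a commit, one hash per line, to identify the two
# branch tips that the merge joined.
#   $1: path to a git work tree, $2: commit
merge_parents() {
    git -C "$1" rev-list --parents -n 1 "$2" | tr ' ' '\n' | sed 1d
}
```

Lines prefixed with "++" in the combined diff exist in neither parent, so
they were written as part of the merge itself.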


No luck with latest master. I've filed this upstream (see link above).


Cheers,

#1010365#36
Date:
2022-05-15 10:30:02 UTC
From:
To:
Hello,

I have a Seeed Studio CM4 dual gigabit board, on which I have installed
Raspi Debian:
https://raspi.debian.net/tested/20220121_raspi_4_bookworm.img.xz .


The system works fine but does not reboot: if I reboot, it hangs, and I
have to push the small reset button (or unplug and replug) to restart it.

I configured a wireless AP and ran apt update + apt upgrade on the system.
It upgraded the kernel and raspberry-firmware, and then it did not boot
again.

I had to start in boot mode again and reflash the Debian image on the
system to make it work.

The working system runs linux 5.15.0-2-arm64 (5.15.5-2).
Raspi firmware is 1.20210805+ds-1.


For some background, I flashed OpenWrt 22.xx-rc1 before Debian, and
reboot worked with it:
https://github.com/sergey-brutsky/openwrt-seeed-carrier-board/issues/1 .