#1076564 pahole BTF processing seems flaky on powerpc

Package:
pahole
Source:
pahole
Description:
set of advanced DWARF utilities
Submitter:
Ben Hutchings
Date:
2024-07-24 16:42:02 UTC
Severity:
normal
Tags:
#1076564#5
Date:
2024-07-18 21:56:09 UTC
From:
To:
Several kernel builds on powerpc have failed recently:

6.8.12-1:    https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=6.8.12-1&stamp=1717234422&raw=1
6.9.9-1:     https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=6.9.9-1&stamp=1720906547&raw=1
6.10-1~exp1: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=6.10-1%7Eexp1&stamp=1721287862&raw=1

Note that these log files are up to 270 MB in size and should be
downloaded rather than opened in a browser; Firefox, at least, becomes
unresponsive when trying to display them.

For each of these, the failure seems to start with an error from
pahole such as:

    [102044] ARRAY (anon) type_id=99491 index_type_id=14 nr_elems=12 Error emitting BTF type
    Encountered error while encoding BTF.
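
Because the logs are too large to open comfortably, one option is to
stream a downloaded copy line by line and print only the lines that
carry the two pahole messages quoted above. This is a minimal sketch,
not an official tool: the function name find_btf_errors and the log
file name in the usage note are made up for illustration.

```python
# Substrings taken from the pahole output quoted in this report.
MARKERS = ("Error emitting BTF type", "Encountered error while encoding BTF")


def find_btf_errors(path):
    """Yield (line_number, text) for each log line containing a marker.

    The file is read lazily, one line at a time, so a log of a few
    hundred megabytes is never held in memory all at once.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for lineno, line in enumerate(log, start=1):
            if any(marker in line for marker in MARKERS):
                yield lineno, line.rstrip("\n")
```

For example, with a log saved under the hypothetical name
linux_6.10-1~exp1_powerpc.log, iterating over
find_btf_errors("linux_6.10-1~exp1_powerpc.log") gives the line numbers
at which the failure starts.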

This does *not* happen consistently.  Compare these successful
builds:

6.8.12-1:    https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=6.8.12-1&stamp=1717278092&raw=1
- This same version failed to build on the first try.
6.9.7-1:     https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=6.9.7-1&stamp=1719538806&raw=1
- Earlier and later 6.9.x versions failed to build.

Both pahole versions 1.26-1 and 1.27-1 have been used in both
successful and failing builds.

Ben.

#1076564#12
Date:
2024-07-19 06:53:24 UTC
From:
To:
Hi Arnaldo,

Does the above error ring any bells?

Is there anything I can do to help?

#1076564#24
Date:
2024-07-19 19:13:00 UTC
From:
To:
Adding Alan and Jiri to the CC list.

Nope.

From the
https://buildd.debian.org/status/fetch.php?pkg=linux&arch=powerpc&ver=6.10-1%7Eexp1&stamp=1721287862&raw=1
file:

+ LLVM_OBJCOPY=powerpc-linux-gnu-objcopy pahole -J -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func --lang_exclude=rust .tmp_vmlinux.btf

Can I have access to that .tmp_vmlinux.btf file so that I can try to
reproduce it here?
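
Since the failure is intermittent, anyone who does get hold of such a
file may want to run the same command several times and compare exit
codes. The sketch below is one way to do that, under two assumptions
that are not confirmed in this thread: that pahole -J rewrites its
input file (so each run works on a fresh copy), and that a non-zero
exit code marks the failure. The helper name run_on_copies is
hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def run_on_copies(cmd, target, runs, extra_env=None):
    """Run cmd + [copy of target] `runs` times and return the exit codes.

    Each run gets its own temporary copy of `target`, so a command that
    modifies its input cannot influence the next run.
    """
    env = dict(os.environ, **(extra_env or {}))
    codes = []
    for _ in range(runs):
        with tempfile.TemporaryDirectory() as tmp:
            copy = os.path.join(tmp, os.path.basename(target))
            shutil.copy2(target, copy)
            result = subprocess.run(cmd + [copy], env=env,
                                    stdout=subprocess.DEVNULL,
                                    stderr=subprocess.DEVNULL)
            codes.append(result.returncode)
    return codes


# Usage with the command line from the build log (not executed here):
# run_on_copies(
#     ["pahole", "-J", "-j",
#      "--btf_features=encode_force,var,float,enum64,decl_tag,type_tag,"
#      "optimized_func,consistent_func",
#      "--lang_exclude=rust"],
#     ".tmp_vmlinux.btf", runs=20,
#     extra_env={"LLVM_OBJCOPY": "powerpc-linux-gnu-objcopy"})
```

A list that mixes zero and non-zero codes for the same input would point
at nondeterminism in the encoder or its environment rather than at the
input itself.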

- Arnaldo

#1076564#29
Date:
2024-07-19 21:20:54 UTC
From:
To:
CCing debian-kernel and debian-powerpc

I don't have access to the build host (blaauw2) and I've some doubts
it would still have that file.

Maybe our kernel team and powerpc porters have suggestions?

Dom

#1076564#34
Date:
2024-07-20 07:16:24 UTC
From:
To:
Hi Domenico,

I have root access to all powerpc/ppc64 machines (buildds and porterbox).

I'm cleaning up the porterbox now, as the disk is quite full; then you can
try to build the kernel package on perotto.debian.net or I can try it myself.

I have seen the bug myself and I wanted to debug it, but the attempt was
foiled by the fact that the disk on perotto is full (again).

Will take care of it and let you know when it's done (some hours).

Adrian

#1076564#39
Date:
2024-07-20 07:47:59 UTC
From:
To:
Hi!

That's great, thank you.

#1076564#44
Date:
2024-07-20 19:17:25 UTC
From:
To:
I had a go yesterday and ran into the same problem.  I couldn't
reproduce with a small kernel config (allnoconfig + BPF + DEBUG_INFO +
DEBUG_INFO_BTF) and there wasn't enough disk space to build even one of
the Debian kernel flavours.

Thank you!

Ben.

#1076564#49
Date:
2024-07-20 19:45:17 UTC
From:
To:
There are now 120 GB of free disk space. Let me know if that's sufficient
or whether I need to clean up more, probably by asking others to clean up
their home directories.

Adrian

#1076564#54
Date:
2024-07-21 22:07:06 UTC
From:
To:
I've now done 10 kernel builds on perotto (4 builds of just the
"powerpc" flavour and then 2 builds of all 3 flavours) and not
reproduced this.  I'm thinking this may be machine-dependent in some
way.

Looking again at all the build logs since DEBUG_INFO_BTF was enabled
for powerpc, we have:

Successes:

Version        Builder
----------------------------------
6.9.10-1       debian-project-be-2
6.9.7-1        debian-project-be-1
6.8.12-1       debian-project-be-2
6.8.11-1       blaauw
6.8.9-1        debian-project-be-2
6.7.12-1       blaauw
6.7.9-2        debian-project-be-2
6.7.4-1~exp1   blaauw
6.7.1-1~exp1   debian-project-be-2
6.6.15-1       blaauw
6.6.13-1       blaauw
6.6.11-1       debian-project-be-1
6.6.9-1+b1     debian-project-be-1
6.6.8-1        debian-project-be-2
6.6.4-1~exp1   blaauw
6.5.13-1       debian-project-be-1
6.5.10-1       debian-project-be-2
6.5.8-1        blaauw
6.5.6-1        debian-project-be-2
6.5.3-1        blaauw
6.5~rc7-1~exp1 debian-project-be-1
6.5~rc6-1~exp1 blaauw
6.4.13-1       blaauw
6.4.11-1       blaauw
6.4.4-3        blaauw

Failures:

Version        Builder             Failure mode
----------------------------------------------------------------------------
6.10-1~exp1    blaauw              this bug
6.9.9-1        blaauw              this bug
6.9.8-1        kapitsa             this bug
6.9.2-1~exp1   blaauw              this bug
6.8.12-1       kapitsa             this bug
6.7-1~exp1     debian-project-be-2 compiler OOM; not powerpc-specific
6.6.3-1~exp1   blaauw              kernel-wedge failed; not powerpc-specific
6.4.4-2        blaauw              out of disk space

Ignoring the unrelated failures, kapitsa has a 0% success rate (but
with only 2 attempts), blaauw an 80% success rate, and
debian-project-be-{1,2} have 100% success rates.
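
For anyone who wants to check the arithmetic, here is a small sketch
that recomputes those rates. The per-builder counts are tallied by hand
from the two tables above and count only the "this bug" failures; the
function name success_rates is just for illustration.

```python
# Counts tallied by hand from the two tables above; only "this bug"
# failures are included, the three unrelated failures are ignored.
SUCCESSES = {"blaauw": 12, "debian-project-be-1": 5, "debian-project-be-2": 8}
BUG_FAILURES = {"kapitsa": 2, "blaauw": 3}


def success_rates(successes, failures):
    """Return {builder: (successes, attempts, percent)} for every builder."""
    rates = {}
    for builder in sorted(set(successes) | set(failures)):
        ok = successes.get(builder, 0)
        attempts = ok + failures.get(builder, 0)
        rates[builder] = (ok, attempts, 100.0 * ok / attempts)
    return rates


for builder, (ok, attempts, pct) in success_rates(SUCCESSES, BUG_FAILURES).items():
    print(f"{builder:20} {ok:2}/{attempts:2} = {pct:.0f}%")
```

The sample sizes are small (2 attempts on kapitsa, 15 on blaauw), so
the percentages are only a rough guide.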

I don't know what differences there are between these builders that
might be relevant.

Ben.

#1076564#59
Date:
2024-07-22 04:57:56 UTC
From:
To:
Hi Ben,

For kapitsa, the installed host system is powerpc while all the others
run the ppc64 port.

As for the hardware:

kapitsa runs bare-metal (inside an LPAR) on a POWER8 machine:

root@kapitsa:~# grep model /proc/cpuinfo
model           : IBM,8284-22A
root@kapitsa:~#

Both blaauw and perotto are KVM guests running on watson, which runs
Debian's ppc64el port (little-endian):

root@watson:~# grep model /proc/cpuinfo
model           : 8247-42L
root@watson:~#

root@blaauw:~# grep model /proc/cpuinfo
model           : IBM pSeries (emulated by qemu)
root@blaauw:~#

root@perotto:~# grep model /proc/cpuinfo
model           : IBM pSeries (emulated by qemu)
root@perotto:~#

Both debian-project-be-1 and debian-project-be-2 are KVM guests running
on OpenStack at OSUOSL's OpenPOWER platform:

root@debian-project-be-1:~# grep model /proc/cpuinfo
model           : IBM pSeries (emulated by qemu)
root@debian-project-be-1:~#

root@debian-project-be-2:~# grep model /proc/cpuinfo
model           : IBM pSeries (emulated by qemu)
root@debian-project-be-2:~#

Adrian

#1076564#64
Date:
2024-07-24 10:04:51 UTC
From:
To:
I'm late chiming in on this one, but judging by the output:

  BTF     .btf.vmlinux.bin.o
+ LLVM_OBJCOPY=powerpc-linux-gnu-objcopy pahole -J -j --btf_features=encode_force,var,float,enum64,decl_tag,type_tag,optimized_func,consistent_func --lang_exclude=rust .tmp_vmlinux.btf
[102044] ARRAY (anon) type_id=99491 index_type_id=14 nr_elems=12 Error emitting BTF type
Encountered error while encoding BTF.


...we hit an error in btf_encoder__add_array() as a result of
btf__add_array() failing:

btf__log_err(btf, BTF_KIND_ARRAY, NULL, true,
             "type_id=%u index_type_id=%u nr_elems=%u Error emitting BTF type",
             type, index_type, nelems);


Unfortunately we don't preserve the negative id value (containing the
error code) in btf__log_err(); I'm thinking one thing we should do is
modify btf__log_err() to preserve errors for cases where the encoding
errors out due to a libbpf-returned -errno, something like

#1076564#69
Date:
2024-07-24 16:29:06 UTC
From:
To:
I agree completely that the error reporting we have is lacking. We
should add extra information for these cases so that we can more quickly
get a clue about what is taking place, so please submit patches for that
and I'll consider them.

Thanks,

- Arnaldo