#1068350 musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie

Package:
musl
Source:
musl
Description:
standard C library
Submitter:
Thorsten Glaser
Date:
2024-04-06 15:21:02 UTC
Severity:
normal
Tags:
#1068350#5
Date:
2024-04-03 22:49:38 UTC
From:
To:
||/ Name           Version      Architecture Description
+++-==============-============-============-==========================================
ii  binutils       2.42-4       s390x        GNU assembler, linker and binary utilities
ii  gcc            4:13.2.0-7   s390x        GNU C compiler
ii  gcc-13         13.2.0-23    s390x        GNU C compiler

For some reason, mksh built with static-pie misbehaves, printing spurious
typeset output before the expected result:

(sid_s390x-dchroot)tg@zelenka:~/mksh-59c$ env -i ./builddir/static-musl/mksh -c 'echo hi'
typeset EPOCHREALTIME
typeset IFS
typeset PATH
typeset PATHSEP
typeset PS2
typeset PS3
typeset PS4
typeset PWD
typeset -i SECONDS
typeset TMOUT
hi

If I build without static-pie, just static, things work:

(sid_s390x-dchroot)tg@zelenka:~/mksh-59c$ env -i ./builddir/static-musl/mksh -c 'echo hi'
hi

If I replace the -static with -fPIE -pie (and build the .o files with -fPIE):

(sid_s390x-dchroot)tg@zelenka:~/mksh-59c/builddir/static-musl$ file mksh
mksh: ELF 64-bit MSB pie executable, IBM S/390, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-s390x.so.1, with debug_info, not stripped
(sid_s390x-dchroot)tg@zelenka:~/mksh-59c/builddir/static-musl$ ./mksh -c 'echo test'
test

(it was done in the same subdirectory, ignore the pathname)

Unfortunately, this is not easily reduced… it, however, i̲s̲ reproducible
on the s390x porterbox. The code works with musl and static-pie on all
other Debian architectures on which musl is available and static-pie is
not broken (see #1068302).

#1068350#10
Date:
2024-04-03 23:57:11 UTC
From:
To:
retitle 1068350 musl: miscompiles (runtime problems) on riscv64 and s390x with static-pie
thanks

Dixi quod…

Exact same behaviour on the riscv64 buildd.

bye,
//mirabilos

#1068350#17
Date:
2024-04-04 10:44:28 UTC
From:
To:
* Thorsten Glaser <tg@debian.org> [2024-04-03 23:57:11 +0000]:

are you sure static pie works on these targets?

last time i checked binutils ld only supports it on a small number of targets.

#1068350#22
Date:
2024-04-04 10:54:08 UTC
From:
To:
* Szabolcs Nagy <nsz@port70.net> [2024-04-04 12:44:28 +0200]:

i take that back, it seems both riscv and s390 have since been fixed:
https://sourceware.org/bugzilla/show_bug.cgi?id=22263

the next culprit is gcc (each target can have their own
static pie specs) or the way you invoked gcc (not visible
in the bugreport).

#1068350#27
Date:
2024-04-04 19:50:40 UTC
From:
To:
Szabolcs Nagy dixit:

gcc-13_13.2.0-23

As I wrote earlier, though with more flags. Dropping all the -D…
and -W… and -I… and other irrelevant ones:

musl-gcc -Os -g -fPIE -fno-lto -fno-asynchronous-unwind-tables
    -fno-strict-aliasing -fstack-protector-strong -fwrapv -c …
musl-gcc -Os -g -fPIE -fno-lto -fno-asynchronous-unwind-tables
    -fno-strict-aliasing -fstack-protector-strong -fwrapv
    -Wl,-z,relro -Wl,-z,now -Wl,--as-needed -static -static-pie
    -fno-lto -o mksh  *.o

Same for both. You can see the full log by activating the
[64]Installed and [71]Installed links respectively on
https://buildd.debian.org/status/package.php?p=mksh and
skipping to 'compilation of mksh in static-musl' to get to
the beginning of the configure phase for that.

No ;-) That’s why I reported this issue. I had just
enabled it for the musl builds, as the security people
like that more than normal static.

Thanks for looking,
//mirabilos

#1068350#32
Date:
2024-04-04 20:40:44 UTC
From:
To:
Rich Felker dixit:

Hmm. In what way? And it does seem to work fine on the other
architectures.

I fear that’s out of the question for Debian.

I’ve got a github action test setup for mksh though, which also
uses jirutka/setup-alpine to set up chroots of Alpine Linux for
various architectures and uses them to build natively under
qemu-user. I could use that to check static-pie? IIUC, these use
“a real cross toolchain”, albeit natively; qemu-user adds an extra
potential failure dimension though…

Together with the MIPS fix?

Hmm, actually… I could… test whether that one fixes static-pie
on zelenka. Or at least the same approach. I’ll report back
on that.

bye,
//mirabilos

#1068350#37
Date:
2024-04-04 20:26:41 UTC
From:
To:
I seem to recall the musl-gcc wrapper does not handle static-pie
right. A real cross toolchain should. If there's an easy fix for the
wrapper I'd be happy to merge it.

Rich

#1068350#42
Date:
2024-04-04 21:18:26 UTC
From:
To:
Dixi quod…

Having looked at the spec file, the only extra things the stock
specs do that the overriding specs don’t are:

*link:
[…] %{!static|static-pie:--eh-frame-hdr} […] %{static-pie:-static -pie --no-dynamic-linker -z text} […]

instead of:

[…] %{static-pie:-static -pie --no-dynamic-linker} […]

The -Wl,-z,text makes TEXTRELs an error. Granted.
The -Wl,--eh-frame-hdr is added for anything that’s not a normal
static executable, however adding that to a musl build doesn’t
fix the problem either.

A bit of gdb-ing shows the problem, though: the source code has…

#define Ttypeset "typeset"
#define Tdr "-r"
//… (a variant of this is used for string sharing on ancient Unix)

static const char *initcoms[] = {
	Ttypeset, Tdr, initvsn, NULL,
	Ttypeset, Tdx, "HOME", TPATH, TSHELL, NULL,
  […]
};

It then iterates over these commands with:

for (wp = initcoms; *wp != NULL; wp++) {
	c_builtin(wp);
	while (*wp != NULL)
		wp++;
}

This is where the extra output happens:

(gdb) print initcoms
$3 = {0x3fff7fc14a4 "typeset", 0x0, 0x0, 0x0, 0x3fff7fc14a4 "typeset", 0x0, 0x3fff7fc0478 "HOME",
[…]

Notice the nullptrs there where string pointers are expected.
It shows the same output when just loading the executable, i.e. this
isn’t a runtime issue.

Linking the exact same .o files with the exact same command minus
-static-pie gives:

(gdb) print initcoms
$1 = {0x103cb34 "typeset", 0x103e368 <u_ops+128> "-r",
  0x103e73c <initvsn> "KSH_VERSION=@(#)MIRBSD KSH R59 2024/02/01 +Debian", 0x0, 0x103cb34 "typeset",

But this does seem to be a toolchain bug: adding -static-pie to the
glibc dynamic-pie link command and…

(gdb) print initcoms
$1 = {0xda494 "typeset", 0x0, 0x0, 0x0, 0xda494 "typeset", 0x0, 0xd942c "HOME", 0xda7d8 "PATH",

Now I (or someone) will have to reduce that to a testcase, so
we can detect static-pie viability before it’s committed to being used…

bye,
//mirabilos

#1068350#47
Date:
2024-04-05 00:26:57 UTC
From:
To:
Dixi quod…

No success with that, unfortunately.

Wait, what?

(gdb) b main
Breakpoint 1 at 0xd820: file ../../main.c, line 785.
(gdb) print initcoms
$1 = {0xda494 "typeset", 0x0, 0x0, 0x0, 0xda494 "typeset", 0x0, 0xd942c "HOME", 0xda7d8 "PATH",
[…]
(gdb) r
Starting program: /home/tg/mksh-59c/builddir/full/mksh

Breakpoint 1, main (argc=1, argv=0x3ffffffa4d8) at ../../main.c:785
785     {
(gdb) print initcoms
$2 = {0x3fff7eda494 "typeset", 0x3fff7ee4548 <u_ops+128> "-r",
  0x3fff7ee4ae0 <initvsn> "KSH_VERSION=@(#)MIRBSD KSH R59 2024/02/01 +Debian", 0x0, 0x3fff7eda494 "typeset",
[…]

While in musl:

(gdb) print initcoms
$1 = {0x414a4 "typeset", 0x0, 0x0, 0x0, 0x414a4 "typeset", 0x0, 0x40478 "HOME", 0x41d42 "PATH",
[…]
(gdb) r
Starting program: /home/tg/mksh-59c/builddir/static-musl/mksh

Breakpoint 1, main (argc=1, argv=0x3ffffffa498) at ../../main.c:785
785     {
(gdb) print initcoms
$2 = {0x3fff7fc14a4 "typeset", 0x0, 0x0, 0x0, 0x3fff7fc14a4 "typeset", 0x0, 0x3fff7fc0478 "HOME",
[…]

So the existing ones did get relocated, but the nullptrs stayed as they were.

Apparently, it *is* supported on glibc on s390x, mjt (qemu maintainer)
also said so in 2023.

bye,
//mirabilos

#1068350#54
Date:
2024-04-05 04:11:20 UTC
From:
To:
Hi,

in static-pie, relocations get processed in _start, before main() is
called. In musl, this is done by linking with rcrt1.o as start file
instead of crt1.o. And that file processes all relative relocations. You
can check with readelf -r what the relocation types are. If they are not
relative, they will not be processed.

What you are seeing seems indicative of missing relocation processing.
Is it possible you are linking in the wrong start file? gcc -v should
output the command line it feeds to the linker.

Ciao,
Markus

#1068350#59
Date:
2024-04-05 05:04:37 UTC
From:
To:
Markus Wichmann dixit:

Gotcha! They are all R_390_RELATIVE except for:

000000045ff0  001100000016 R_390_64          0000000000042c58 u_ops + 70
000000045ff8  001100000016 R_390_64          0000000000042c58 u_ops + 0
000000047020  001100000016 R_390_64          0000000000042c58 u_ops + 80
000000047088  001100000016 R_390_64          0000000000042c58 u_ops + 80
0000000470a8  001100000016 R_390_64          0000000000042c58 u_ops + b8
000000047220  001100000016 R_390_64          0000000000042c58 u_ops + 80
000000046900  002600000016 R_390_64          0000000000015af8 c_command + 0
000000046940  000700000016 R_390_64          0000000000017238 c_exec + 0
000000046ab0  002000000016 R_390_64          0000000000016a80 c_trap + 0
000000047090  002500000016 R_390_64          00000000000430ac initvsn + 0
000000047278  005500000016 R_390_64          0000000000047438 null_string + 2

That’s our missing strings.

Should be correct:

 /usr/libexec/gcc/s390x-linux-gnu/13/collect2 -fno-lto -dynamic-linker /lib/ld-musl-s390x.so.1 -nostdlib -static -static -pie --no-dynamic-linker -o mksh /usr/lib/s390x-linux-musl/rcrt1.o /usr/lib/s390x-linux-musl/crti.o /usr/lib/gcc/s390x-linux-gnu/13/crtbeginS.o -L/usr/lib/s390x-linux-musl -L /usr/lib/gcc/s390x-linux-gnu/13/. -z relro -z now --as-needed -z text --eh-frame-hdr lalloc.o edit.o eval.o exec.o expr.o funcs.o histrap.o jobs.o lex.o main.o misc.o shf.o syn.o tree.o var.o ulimit.o --start-group /usr/lib/gcc/s390x-linux-gnu/13/libgcc.a /usr/lib/gcc/s390x-linux-gnu/13/libgcc_eh.a -lc --end-group /usr/lib/gcc/s390x-linux-gnu/13/crtendS.o /usr/lib/s390x-linux-musl/crtn.o

HTH & HAND,
//mirabilos

#1068350#64
Date:
2024-04-05 05:31:54 UTC
From:
To:
On Fri, Apr 05, 2024 at 05:04:37AM +0000, Thorsten Glaser wrote:

I may not really know what I am talking about, so take this with a grain
of salt, but isn't this missing a -Bsymbolic somewhere? Ironically, that
switch causes ld to not emit symbolic relocations. I seem to remember
reading long ago in Rich's initial -static-pie proposal that that was
one of the switches added to the linker command line.

In any case, the emission of non-relative relocations is the issue here,
and it is coming from the linker.

Ciao,
Markus

#1068350#69
Date:
2024-04-05 05:58:15 UTC
From:
To:
Markus Wichmann dixit:

When searching for which architectures support static PIE in the first
place (sadly, there doesn’t seem to be a consistent list), I found a source
saying -Bsymbolic is no longer necessary after some point, so I didn’t
check it.

They are present in the glibc static-pie binary as well, though.
And tbh they look to me like “just plug the absolute address of
the symbol here, please”, which is perfectly fine for things like
an array of strings when the actual string already has its own symbol.

(Disclaimer: I know… barely anything about Unix relocation types,
a bit more about those on DOS and even TOS.)

bye,
//mirabilos

#1068350#74
Date:
2024-04-05 06:42:17 UTC
From:
To:
On Fri, Apr 05, 2024 at 05:58:15AM +0000, Thorsten Glaser wrote:

Then glibc's static-pie startup code also processes symbolic
relocations. musl's doesn't. It only processes relative relocations. And
changing this would require some massive reworking. We'd somehow have to
put stage 2 of the dynamic linker into rcrt1.o.

A symbolic lookup doesn't really make sense for a static executable
outside of FDPIC. The only difference in address space possible is a
relative offset. In order to do a symbolic relocation, you also need the
symbol lookup stuff, which - granted - for a static PIE is probably
very simple because there can be only one symbol table, but still.

I thought the whole point of static-PIE support was to only leave
relative relocations around.

Ciao,
Markus

#1068350#79
Date:
2024-04-05 06:48:11 UTC
From:
To:
* Thorsten Glaser <tg@mirbsd.de> [2024-04-05 05:04:37 +0000]:


this is not correct static pie.

glibc handles symbolic relocs, but there should not be
any non-local symbol in a static exe. you may want to
check the symbol table.

so s390 does not support static pie.
(arguably the elf is correct, if you expect a full
dynlinker in a static pie, but even then it's bad
quality linker output)

#1068350#84
Date:
2024-04-06 03:00:51 UTC
From:
To:
Is there anything weird about how these objects were declared that
might have caused ld not to resolve them statically like it should? It
seems odd that these data symbols, but not any other ones, would be
left as symbolic relocations.

Rich

#1068350#89
Date:
2024-04-06 15:18:42 UTC
From:
To:
Rich Felker dixit:

I don’t think so?

In <Pine.BSM.4.64L.2404042102310.18654@herc.mirbsd.org> I already
posted the short version; the actual source is (mirrored):

The initcoms array is here:
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/main.c#L77

Tdr is defined at:
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/sh.h#L3055

The u_ops array is declared a few lines above that and defined at:
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/funcs.c#L160

initvsn is defined at…
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/sh.h#L713
… with the EXTERN and E_INIT macros from…
https://github.com/MirBSD/mksh/blob/b0219da8e6dfc7b16e923e220dc6933c5ed9b326/sh.h#L657
where main.c defines EXTERN, so the string is embedded into the file using it.

Is there perhaps a misunderstanding with the gcc/binutils/glibc developers
as to what static-pie is meant to be?

bye,
//mirabilos