Dear Maintainer,
The motivation for the present bug report comes from Bug#1055228. Since
version 1.22.1 of dpkg-dev was released (on October 30), the plplot
package FTBFS due to a failing compilation of one of the Fortran examples,
which is exercised as a unit test during package building.
The package built fine previously. The problem is triggered by the change
in dpkg-buildflags, which now includes -fstack-clash-protection in
FFLAGS.
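For readers who want to check this on their own system, here is a minimal sketch that is not part of the original report. It assumes dpkg-dev is installed, and the helper name has_flag is hypothetical. It checks whether dpkg-buildflags emits the option in FFLAGS and, for local experiments only, shows how the DEB_FFLAGS_MAINT_STRIP environment variable can drop it again.

```shell
#!/bin/sh
# has_flag FLAGS FLAG (hypothetical helper): succeed if FLAG occurs
# as a whole word in the space-separated string FLAGS.
has_flag() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

flag=-fstack-clash-protection

# Only query dpkg-buildflags where dpkg-dev is actually installed.
if command -v dpkg-buildflags >/dev/null 2>&1; then
    fflags=$(dpkg-buildflags --get FFLAGS)
    if has_flag "$fflags" "$flag"; then
        echo "FFLAGS contains $flag"
    else
        echo "FFLAGS does not contain $flag"
    fi
    # For local experiments only: ask dpkg-buildflags to strip the option.
    DEB_FFLAGS_MAINT_STRIP=$flag dpkg-buildflags --get FFLAGS
fi
```

Stripping the option only hides the symptom, so it is useful for comparing builds and not as a fix.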
I am attaching to this bug message a shell script that can reliably
trigger the bug on an armhf system. Here is the output:
$ ./gfortran-stack-clash-protection-armhf-bug.sh
[…]
Program received signal SIGBUS: Access to an undefined portion of a memory object.
Backtrace for this error:
Bus error
Note that the bug does not happen on amd64. Also, it does not happen on
armhf when the option -fstack-clash-protection is not used in the
invocation of gfortran.
As far as I can tell, the problem is due to a global variable (tr) that
is not correctly accessed in a private function (mypltr) of the x09f
program. Here is what gdb tells me:
$ gdb x09f
[…]
(gdb) run -dev ps -o /dev/null
Starting program: /home/rafael/fortran/x09f -dev ps -o /dev/null
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Program received signal SIGBUS, Bus error.
0x00400dfe in x09f::mypltr (x=0, y=1, xt=1, yt=34) at x09f.f90:193
193 xt = tr(1) * x + tr(2) * y + tr(3)
My knowledge of Fortran and gfortran is too limited for me to debug the
problem any further. There may be a programming error in
x09f.f90 or this may be a problem with gfortran on armhf when option
-fstack-clash-protection is given.
Any help will be appreciated.
Best,
Rafael Laboissière
The attached file bug-1055750.tgz contains minimal code that
triggers the bug on an armhf system:
$ ./run
***** Compile without -fstack-clash-protection & run
/usr/bin/ld: warning: /tmp/cc9l2GJa.o: requires executable stack (because the .note.GNU-stack section is executable)
***** Compile with -fstack-clash-protection & run
/usr/bin/ld: warning: /tmp/ccP0miz5.o: requires executable stack (because the .note.GNU-stack section is executable)
Program received signal SIGBUS: Access to an undefined portion of a memory object.
Backtrace for this error:
Bus error
* Rafael Laboissière <rafael@debian.org> [2023-11-10 15:52]:
Hi Rafael,
Thanks! For the record, I can reproduce the issue in an armhf chroot, but
*not* on armel and arm64. The only thing to change in the reproducer is
the -I argument to gfortran:
armhf: /usr/lib/arm-linux-gnueabihf
armel: /usr/lib/arm-linux-gnueabi
arm64: /usr/lib/aarch64-linux-gnu
I'll see if I can find out more.
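The architecture-specific -I argument mentioned above could also be derived instead of edited by hand. This is a minimal sketch that is not taken from the reproducer: the function name multiarch_dir is hypothetical, the three mappings are the ones listed above, and dpkg-architecture (from dpkg-dev) is only called when it is available.

```shell
#!/bin/sh
# multiarch_dir ARCH (hypothetical helper): print the library directory
# for the three Debian architectures mentioned in this thread.
multiarch_dir() {
    case "$1" in
        armhf) echo /usr/lib/arm-linux-gnueabihf ;;
        armel) echo /usr/lib/arm-linux-gnueabi ;;
        arm64) echo /usr/lib/aarch64-linux-gnu ;;
        *)     echo "unknown architecture: $1" >&2; return 1 ;;
    esac
}

# On a Debian system, dpkg-architecture can report the host triplet
# directly, so no hand-written table is needed.
if command -v dpkg-architecture >/dev/null 2>&1; then
    echo "/usr/lib/$(dpkg-architecture -qDEB_HOST_MULTIARCH)"
fi
```

A reproducer script could then pass the printed directory to gfortran's -I option in place of a hard-coded path.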
* Emanuele Rocca <ema@debian.org> [2023-11-14 12:01]:
Thanks for the followup, Emanuele.
I have investigated the issue a little further. Please find attached a
new tarball with a simplified version of the x09f.f90 file that still
triggers the bug. Running the resulting executable under gdb yields the
following:
(gdb) run -dev ps -o /dev/null
Starting program: /home/rafael/bug-1055750/x09f -dev ps -o /dev/null
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
x09f::mypltr2 (x=0, y=1, xt=0, yt=7.300765e-43) at x09f.f90:48
48 yt = tr(1)
In this new code, there are two private functions, mypltr1 and mypltr2.
The bug only happens when the latter is invoked; it assigns a value
coming from the global variable tr to yt. In mypltr1, the assignment is
made to xt instead of yt, and no segfault is triggered.
Best,
Rafael Laboissière
* Rafael Laboissière <rafael@debian.org> [2023-11-15 18:06]:
Oops, with the correct tarball this time.
Best,
Rafael Laboissière
* Rafael Laboissière <rafael@debian.org> [2023-11-15 18:37]:
Grrr, with the correct one now, hopefully!
R.
Hi Emanuele,
Our messages crossed.
* Emanuele Rocca <ema@debian.org> [2023-11-15 18:20]:
Does this mean that the origin of the bug is upstream or that it still
may be a bug in gfortran?
Best,
Rafael Laboissière
Hello Rafael!
At this point we know for sure that the issue is not armhf-specific, and
also that it is not caused by stack-clash-protection. On the contrary,
enabling stack-clash-protection on armhf allowed us to discover a problem
that would have otherwise gone unnoticed.
Whether the bug is in plplot or gfortran I really have no idea. I would
tend not to think of a compiler issue unless we have some evidence in
that direction though, and suggest raising the issue with plplot
upstream. Showing them the x86 reproducer with -fsanitize=address should
be a good starting point.
Thanks,
Emanuele
* Emanuele Rocca <ema@debian.org> [2023-11-15 20:11]:
Ok, thanks. FYI, I can reproduce the segfault on armhf and amd64 with
-fsanitize=address. My guess is that the bug is in PLplot and not in
gfortran, but this is just a guess. I will eventually inform the PLplot
upstream authors about the issue.
Best,
Rafael Laboissière
* Rafael Laboissière <rafael@debian.org> [2023-11-16 07:51]:
Done!
R.
Hi Rafael,
Thank you! To be honest, I think it's safe to close 1055750 (gfortran)
and mark 1055228 (plplot) as forwarded upstream; I don't think we have
any reason to believe the compiler is really at fault.
Emanuele
* Emanuele Rocca <ema@debian.org> [2023-11-16 09:30]:
Thanks for the suggestion, Emanuele. I am hereby merging both bugs, which
are now both assigned to plplot.
Best,
Rafael Laboissière
severity 1055750 serious
merge 1055750 1055228