Via "Bug#1004894: sudo: [i386] invalid opcode", at the suggestion of the maintainer, I'm opening this bug.

Since 1.9.9-1, 'sudo' dumps core on a Geode LX host on i386. Attempts to build from source on the Geode LX host itself, to test for possible autotools issues, crash early during the build. Logs attached.

Martin-Éric

-- System Information:
Debian Release: bookworm/sid
  APT prefers testing-debug
  APT policy: (500, 'testing-debug'), (500, 'stable-security'), (500, 'testing')
Architecture: i386 (i586)

Kernel: Linux 5.15.0-3-686 (SMP w/1 CPU thread)
Locale: LANG=fi_FI.UTF-8, LC_CTYPE=fi_FI.UTF-8 (charmap=UTF-8), LANGUAGE=fi:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages gcc-11 depends on:
ii  binutils       2.38-1
ii  cpp-11         11.2.0-16
ii  gcc-11-base    11.2.0-16
ii  libc6          2.33-5
ii  libcc1-0       11.2.0-16
ii  libgcc-11-dev  11.2.0-16
ii  libgcc-s1      11.2.0-16
ii  libgmp10       2:6.2.1+dfsg-3
ii  libisl23       0.24-2
ii  libmpc3        1.2.1-1
ii  libmpfr6       4.1.0-3
ii  libstdc++6     11.2.0-16
ii  libzstd1       1.4.8+dfsg-3
ii  zlib1g         1:1.2.11.dfsg-2

Versions of packages gcc-11 recommends:
ii  libc6-dev  2.33-5

Versions of packages gcc-11 suggests:
pn  gcc-11-doc       <none>
pn  gcc-11-locales   <none>
pn  gcc-11-multilib  <none>

-- no debconf information
Greetings,

As I just noticed, 'netstat' similarly dumps core on the Geode LX host.

Martin-Éric
Dear Maintainer and Martin-Éric,
Using a recent (2023-03-19) mirror of i386 packages from 'main' and 'contrib'
of bookworm, in combination with an ad-hoc script[1], the following packages
currently appear susceptible to this bug:
* libjavascriptcoregtk-4.0-18_2.38.5-1_i386
/usr/lib/i386-linux-gnu/libjavascriptcoregtk-4.0.so.18.21.8
* gobjc++-12-x86-64-linux-gnu_12.2.0-14cross1_i386
/usr/lib/gcc-cross/x86_64-linux-gnu/12/cc1objplus
* libfsapfs-dev_20201107-1+b3_i386
/usr/lib/i386-linux-gnu/libfsapfs.a
That's three potential positives; in total, the check ran on approximately
thirty-two thousand (32340, to be more precise) packages.
Thanks,
James
[1] - https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1033065#22
Package: gcc-11
Followup-For: Bug #1005863
X-Debbugs-Cc: martin-eric.racine@iki.fi
Control: affects -1 + sudo

The check was not, in fact, run on every one of the 32340 packages -- instead only a sample (of yet-to-be-determined size) was checked. Complete results should follow; initial output appears to show that many more than three packages (including a recent 'sudo' package) are indeed affected.
Ok; I should have realised that scanning the entire contents of the i386 bookworm archive for particular opcodes across _all_ files on a single machine seemed to complete surprisingly quickly..

Please find attached an updated check-script (check.sh) that is running currently. It makes some tradeoffs for scanning performance reasons: in particular, it's only inspecting files that have the executable bit set, or that end with the suffix '.so' or '.a'.

It seems that it's going to take a while to run to completion on the available hardware here: my estimate would be approximately another two days (48 hours). I'm uncertain whether the script will run to completion uninterrupted, and also it is not written to be easily-resumable, so.. let's at least gather some summary statistics from the output while it's in progress.

Please also find attached a reporting script (report.sh) that summarises the total number of packages scanned, the number of packages where at least one file was inspected, and the number of packages where at least one inspected file contained a 'nopl' opcode.

The current report.sh output at the time of writing is:

  2441 2042 130

So my guess is that approximately 6-7% of i386 packages in bookworm _that contain binaries or shared libraries_ are susceptible to this bug.

The opcode may not be encountered at runtime when those packages are used, and analysis of the packages to determine where they sit in Debian's dependency graph would indicate the level of impact on a system, however my initial sense is that this could indeed be a fairly critical issue on Geode LX hardware for Debian bookworm.

It's also a larger number of packages than we could expect individual maintainers to adjust their buildflags for on any realistic timescale - so either a Debian-specific patch or upstream fix would be required to continue to support Geode LX (in my opinion, and assuming that the script and report are accurate-enough to be guiding indicators).
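The file-selection tradeoff described in this message could be sketched roughly as follows. This is not the attached check.sh: the function name list_scan_candidates is hypothetical, and the '-perm /111' test assumes GNU findutils.

```shell
#!/bin/sh
# Hypothetical sketch, not the attached check.sh.
# List the files under an unpacked package directory that would be inspected:
# regular files with any executable bit set, or whose name ends in .so or .a.
list_scan_candidates() {
    # $1: directory holding an unpacked package (for example from 'dpkg -x')
    # '-perm /111' (any of the user/group/other execute bits) is GNU find syntax.
    find "$1" -type f \( -perm /111 -o -name '*.so' -o -name '*.a' \) | LC_ALL=C sort
}
```

A literal '.so' suffix test like this one does not match versioned library names such as 'libfoo.so.1' unless they also carry an executable bit, so counts from a filter of this shape are best treated as approximate.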
Package: gcc-11
Followup-For: Bug #1005863
X-Debbugs-Cc: martin-eric.racine@iki.fi

...

This reporting was, in fact, too optimistic; the check.sh script had a bug that meant it wasn't inspecting '*.a' files (even though it was including them in the per-package binary/library counts). With this fix in place:
Hi folks,

Bug #1005863 describes a gcc-11 behaviour that results in software that exits ungracefully on Geode LX i686 hardware. Despite self-reporting as i586 sometimes, Geode LX is in fact an i686 CPU (without physical address extensions and multi-byte NOP instructions -- both optional per spec).

My assessment -- which may be incorrect -- is that something like 20% of packages in the bookworm i386 suite are susceptible to the bug, so I think that installing bookworm on a Geode LX system would present users with a poor experience of Debian.

Would it be fair to raise the severity of this bug to a release-critical level? I understand that toolchains are an important part of the ecosystem and that changes to them -- especially ones that may affect many packages -- should be undertaken with care, and that we are into bookworm's pre-release hard freeze.

Thank you,
James
No, it would be fair to remove Geode LX from the set of supported processors. Those are now over 15 years old. Bastian
Ok, thank you; understood.

It looks like this was previously documented[1] for the Debian 9.0 (stretch) release in 2017, and later discussed[2] further. I'll continue following the upstream bug, but I clearly don't fully understand the problem yet.

My hope was that we could continue to maintain (in fact, with my updated understanding: restore) support for the affected Geode LX platform. I can accept that that may not be possible.

[1] - https://www.debian.org/releases/stretch/i386/release-notes/ch-information.html#i386-is-now-almost-i686
[2] - https://lists.debian.org/debian-user/2019/04/msg01091.html
From a purely engineering perspective, without a way to address this problem, increasing the severity will not achieve much. Instead this should be documented in the release notes, with any relevant information.

However, the fact that the CPU is 15 years old is not a sufficient rationale to stop supporting it. It is more useful for debian-i386 to focus on CPUs that cannot run debian-amd64 than on CPUs that can.

Cheers,
Reassigning this from package 'gcc' to 'binutils':

It looks like it is GNU binutils[1] (and in particular, the GNU assembler) that is responsible for producing the assembly opcodes for a binary compiled with gcc.

Yep, agreed. I'd like to learn more about technical fix feasibility before adjusting the severity.

There was a commit[2] in Y2010 of GNU binutils to stop emitting NOPL on (32bit) i686 targets.. I'm wondering if it's possible that a regression since then may have caused the opcodes to reappear.

(it continues to be equally likely that I've completely misunderstood and am creating noise without making any useful progress)

[1] - https://www.gnu.org/software/binutils/
[2] - https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=2210942396dab942a86cb6777c705554b84ebb0e
Possible workaround:

It seems that the Linux kernel developers encountered this same issue (or at least a very, very similar one) way back in Y2008 and applied the following workaround:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28f7e66fc1da53997a545684b21b91fb3ca3f321

(in short: add '-Wa,-mtune=generic32' as a CFLAG so that GCC passes an architecture-specific tuning option to the assembler. As described in the upstream binutils bug[1], tuning (not architecture selection) is currently used to determine whether multi-byte NOPs are emitted)

Affected-packages update:

I'm re-running my affected-packages analysis using 'objdump --disassemble' instead of 'objdump --disassemble-all' and that is indicating a _much_ lower affected percentage of packages (32-of-7325 so far, less than 1%).

However, doing that does _not_ detect _any_ 'nopl' opcodes in the sudo 1.9.9-1 package -- and I'm a bit confused by that.

It also turns out that, despite my comments to the contrary, I have been including non-free packages in the scan (the 32340 package count). I will probably add two non-free-firmware packages to that for completeness.

When results are available I will attach the updated versions of the 'check.sh' and 'report.sh' script, and their output before and after the disassemble-selection change.

Bedtime reading:

I can recommend https://www.jookia.org/wiki/Nopl as a good writeup about the history of the 'nopl' instruction. I'm going to write to the author to thank them.

[1] - https://sourceware.org/bugzilla/show_bug.cgi?id=6957
Ok, some more findings:

The file 'libass.a' in libass-dev:i386 (1:0.17.1-1) is affected, and I'm able to build that package from source and replicate that behaviour:

  # Checking the existing package
  /srv/mirror/debian/pool$ dpkg -x main/liba/libass/libass-dev_0.17.1-1_i386.deb libass-dev_0.17.1-1_i386
  /srv/mirror/debian/pool$ objdump -d libass-dev_0.17.1-1_i386/usr/lib/i386-linux-gnu/libass.a | grep -wc nopl
  56

  # Building from source to verify
  $ apt source libass
  $ cd libass-0.17.1/
  $ dpkg-buildpackage -Pcross --host-arch i386 --target-arch i386
  $ objdump -d ./debian/libass-dev/usr/lib/i386-linux-gnu/libass.a | grep -wc nopl
  56

The next step was to attempt a fix by adjusting the dpkg-buildflags configuration locally using '~/.config/dpkg/buildflags.conf'.

Adding the same workaround as the original Linux kernel developers (adding an entry to CFLAGS with '-Wa,-mtune=generic32') did **not** appear to work here.

However: configuring ASFLAGS directly (bypassing the requirement for gcc to send the flags along to the assembler) does appear to resolve the problem:

  $ cat ~/.config/dpkg/buildflags.conf
  APPEND ASFLAGS -mtune=generic32
  $ apt source libass
  $ cd libass-0.17.1/
  $ dpkg-buildpackage -Pcross --host-arch i386 --target-arch i386
  $ objdump -d ./debian/libass-dev/usr/lib/i386-linux-gnu/libass.a | grep -wc nopl
  0

Further verification would help, but I believe that adding '-mtune=generic32' to Debian's dpkg-buildflags ASFLAGS during 32-bit i386 builds could be a way to restore package compatibility on any systems where NOPL is unavailable, such as the Geode LX.

It is possible that such a change could negatively affect runtime performance on other Debian i386 systems.

I'll also continue to try to determine where this bug originates.
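For repeating the 'objdump -d ... | grep -wc nopl' check across several files, the counting step can be wrapped in a small helper. This is only a sketch: the function names count_nopl and count_nopl_in_file are hypothetical and are not part of check.sh.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the counting step used in this message.

# Read 'objdump -d' output on stdin and print the number of lines that
# contain 'nopl' as a whole word.
count_nopl() {
    # grep -c still prints 0 when nothing matches, but exits non-zero;
    # mask that status so callers using 'set -e' are not interrupted.
    grep -wc nopl || true
}

# Disassemble the code sections of one file and count 'nopl' lines.
# $1: path to a binary, shared library, or static archive
count_nopl_in_file() {
    objdump -d "$1" | count_nopl
}
```

With these definitions, 'count_nopl_in_file ./debian/libass-dev/usr/lib/i386-linux-gnu/libass.a' performs the same check as the pipelines in this message.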
To me it seems like the assembler should not be emitting these opcodes during Debian builds for i386 (baseline CPU family i686) - but depending on the performance impact and wide support for NOPL on _most_ i686 family CPUs, perhaps it was a sensible choice for binutils to make.
Of 32,342 packages mirrored from the Debian bookworm i386 distribution archives
on 2023-03-19, there appear to be:
* 20,958 (65%) 'nopl scan candidate' packages that contain at least one file
name that ends with either '.so' or '.a', or has the exec-permission-bit
enabled.
* 4,788 packages (15%) where using 'objdump -D' to disassemble all
binary sections shows at least one potential 'nopl' instruction.
* 267 packages (1%) within the candidate set where using 'objdump -d' to
disassemble known-instruction-containing binary sections shows at least one
'nopl' instruction.
The set of packages that I believe would be affected and may not run correctly
on Geode LX hardware is the third item in that list. The full set of those
packages is attached as 'affected-packages.txt'.
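The rounded percentages in the list above can be recomputed from the raw counts with a one-line helper. The function name percent_of is hypothetical; the counts are the ones quoted in this message.

```shell
#!/bin/sh
# Hypothetical helper: print COUNT as a percentage of TOTAL, to one decimal.
# $1: count, $2: total
percent_of() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

percent_of 20958 32342   # scan candidates, as a share of all mirrored packages
percent_of 4788 32342    # 'objdump -D' matches, as a share of all mirrored packages
percent_of 267 32342     # 'objdump -d' matches, as a share of all mirrored packages
percent_of 267 20958     # 'objdump -d' matches, as a share of the candidate set
```

These print 64.8, 14.8, 0.8 and 1.3 respectively, so the third figure rounds to 1% whichever of the two denominators is used.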
Followup-For: Bug #1005863

Based on some (recent, Y2021) Linux kernel discussion, x86 NOP variants (including 'NOPL') have been removed[1][2] in preference for a single instruction.

Performance impact in the kernel appears to have been negligible[3].

[1] - https://lkml.iu.edu/hypermail/linux/kernel/2103.1/06799.html
[2] - https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=a89dfde3dc3c2dbf56910af75e2d8b11ec5308f6
[3] - https://lkml.iu.edu/hypermail/linux/kernel/2103.1/07997.html
Package: binutils
Followup-For: Bug #1005863
X-Debbugs-Cc: ballombe@debian.org
Some updated thoughts:
* I do think that I've found a possible, although confusingly-labeled, fix:
to use GNU assembler's 'mtune' option to use NOP instead of NOPL.
* My sense is that applying this archive-wide for i386 could be disruptive;
it feels possible to me that a tuning/optimization flag could alter
other binaries that didn't contain NOPL opcodes in the first place.
* I think there's an argument that there is a Debian policy violation here,
since hardware that is within our supported i386 platform -- and I believe
that Geode LX is -- may not run some packages correctly.
* Everyone's time is valuable and I don't want to unilaterally complain or
push on a particular issue, especially if there aren't demonstrably
positive results achievable.
* Applying the fix in a way that ensures that we don't forget to unpick it
in future if-and-when upstream issues are resolved seems sensible, as with
usual Debian practice.
* I'm at-or-beyond my level of technical competence here; I have some notions
about how toolchains, instruction sets, and so on operate - but I'm not
day-to-day familiar with how to apply good and safe engineering practices
in relation to them (explaining the multiple false starts during the i386
archive analysis, that I'm still not completely confident about).
* This issue feels important to me because I think we could revive and
provide longer lifetime value to widely-distributed hardware.
I'm not sure what to do next and will take a pause to think about it; I'd be
grateful for suggestions.
Hi,

Having taken a look at the affected packages, I think it's better at this stage to say that bookworm doesn't support Geode LX (and the like). If you can come up with a fix we can include it in trixie.

Paul
Followup-For: Bug #1005863 X-Debbugs-Cc: elbrus@debian.org Ack, thanks - that seems fair.
Followup-For: Bug #1005863
Further findings related to the ASFLAGS idea specifically:
* libass in fact uses the 'nasm' assembler by default
* Configuring dpkg ASFLAGS caused 'nasm' _not_ to be used (the ./configure
version check for it failed)
* Removing 'nasm' from the debian/control file and recompiling also
results in libass.a building _without_ 'nopl' instructions
So, in the case of libass at least, it appears that the bug may not be due to
GNU binutils.
I'm going to check some of the other affected packages next to see whether
nasm is a common build element for them.
Ok, cause found within libass; it is a vendored assembly file that other packages also include.

Some x264 assembly code included in the x86inc.asm[1] file in libass uses the 'smart alignment'[2] feature of nasm to emit multi-byte NOP (no-op) instructions that are compatible with Pentium Pro processors.

I have confirmed that patching the assembly line to replace 'ALIGNMODE p6' with 'ALIGNMODE k8' (as configured until Y2017[3]) results in an i386 binary build without NOPL (multi-byte NOP) instructions.

I think this explains the existence of multibyte NOPs in many Debian packages that include x264's 'x86inc.asm' file, with 15 or so visible using code search:

https://codesearch.debian.net/search?q=filetype%3Aasm+ALIGNMODE+p6&perpkg=1

This doesn't explain the issue for many of the remaining affected packages, nor the original 'sudo' package where this was reported. Many of the remaining packages appear to use Rust and/or LLVM.

[1] - https://sources.debian.org/src/libass/1%3A0.17.1-1/libass/x86/x86inc.asm/#L904
[2] - https://www.nasm.us/xdoc/2.16.01/html/nasmdoc6.html#section-6.2
[3] - https://code.videolan.org/videolan/x264/-/commit/d2b5f4873e2147452a723b61b14f030b2ee760a5
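The 'ALIGNMODE p6' to 'ALIGNMODE k8' substitution described in this message could be applied to a vendored copy of the file with something like the sketch below. The function name use_k8_alignmode is hypothetical, and this is not necessarily how the patch was prepared for the test build mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: switch nasm smart-alignment mode from 'p6' to 'k8'
# in one assembly source file, rewriting the file in place.
# $1: path to the .asm file (for example a vendored x86inc.asm)
use_k8_alignmode() {
    tmp=$(mktemp) || return 1
    # Write the edited text to a temporary file first, then copy it back
    # with 'cat' so the original file keeps its ownership and permissions.
    if sed 's/ALIGNMODE[[:space:]][[:space:]]*p6/ALIGNMODE k8/' "$1" > "$tmp"; then
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}
```

After running it on a source tree, the rebuilt library can be re-checked with 'objdump -d' and 'grep -wc nopl' in the same way as the libass.a checks earlier in this bug.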
After taking more time to learn about the initially-reported issue here, I'm resetting many of the bug's fields to match that, and adjusting the title to correspond with the upstream bug.

The 'sudo' package is configured to use 'fcf-protection' (Intel Control-flow Enforcement Technology), and that feature uses an instruction 'endbr32' on 32-bit architectures that is a repurposed long-NOP, not supported by all i686 processors.

The request in the upstream bug report is that GCC should reject the combination of i686 and fcf-protection because it emits code that does not function on all processors within that architecture class.

We documented that Geode LX support is retained[1] in Debian stretch, and to my knowledge have not indicated otherwise since then.

I'll continue to focus on finding other packages and toolchain elements where the NOPL opcode is generated and to find places where it's possible to safely improve on that.

[1] - https://www.debian.org/releases/stretch/i386/release-notes/ch-whats-new.en.html#idm120
[2] - https://www.debian.org/releases/buster/i386/release-notes/ch-whats-new.en.html#idm120
[3] - https://www.debian.org/releases/bullseye/i386/release-notes/ch-whats-new.en.html#idm120
[4] - https://www.debian.org/releases/bookworm/i386/release-notes/ch-whats-new.en.html#idm120
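Since two different instructions are now in play ('nopl' from alignment padding and 'endbr32' from fcf-protection), a scan can report them separately. The sketch below is hypothetical (the name summarise_long_nops is not from check.sh or report.sh) and assumes an objdump recent enough to print the 'endbr32' mnemonic.

```shell
#!/bin/sh
# Hypothetical helper: read 'objdump -d' output on stdin and print how many
# whitespace-separated fields are exactly 'nopl' or exactly 'endbr32'.
summarise_long_nops() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "nopl") nopl++
                else if ($i == "endbr32") endbr++
            }
        }
        END { printf "nopl=%d endbr32=%d\n", nopl, endbr }
    '
}
```

For example, 'objdump -d /usr/bin/sudo | summarise_long_nops' would show whether a given sudo build contains 'endbr32' even when its 'nopl' count is zero.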
Bookworm also retains Geode LX support, for the most part (Go and Rust packages, and sudo being the only notable exceptions). The key problem is with sudo. Without it, the only way for a non-privileged user to perform any action is via 'su' which opens a social trust issue since it requires knowing the root password. With 'sudo' the user only needs to confirm their own password and be authorized to use the desired command via /etc/sudoers. Martin-Éric