#892228 libsphinxbase3: Causes pocketsphinx to FTBFS on 64-bit big-endian architectures (fills testsuite logs on disk with errors)

Package:
libsphinxbase3
Source:
sphinxbase
Description:
Speech recognition tool - shared library
Submitter:
James Clarke
Date:
2026-02-14 19:19:01 UTC
Severity:
important
Tags:
#892228#5
Date:
2018-03-07 00:33:05 UTC
From:
To:
Hi,
The build for pocketsphinx fails on 64-bit big-endian architectures, failing
with "No space left on device", as the testsuite log files fill up with
hundreds of gigabytes of warnings. The first indication of the problem in the
log files is:

where 33554432 is 0x2000000, i.e. 2 byte-swapped as a 32-bit value. This error isn't fatal
though, and libsphinxbase3 continues to try to build the trie, with tons of
duplicate word warnings, as it's reading all kinds of garbage. The issues stem
from widespread use of fread to read multi-byte values with no regard
for their endianness, with the first error, the wrong number of n-grams, coming
from reading into the "counts" array in ngram_model_trie_read_bin. The library
has functions like bio_fread which can do the byte-swapping for the caller, so
presumably these should be used instead, though for this file format there does
not seem to be an easy way to determine the endianness of the file based on
some header magic like for some of the others (but maybe it's intended to
always be little-endian).
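
To make the failure mode concrete, here is a minimal standalone sketch in C. It
is not sphinxbase code: swap32 and read_le32 are hypothetical helper names, and
it assumes the file stores its counts as 32-bit little-endian values, which is
only a guess in this report. swap32 shows how a raw fread on a big-endian host
turns the value 2 into 33554432; read_le32 decodes the bytes explicitly, so its
result does not depend on the host's byte order.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: reverse the byte order of a 32-bit value.
 * swap32(2) is 0x02000000, i.e. 33554432. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* Hypothetical helper: read one 32-bit value stored little-endian in fp.
 * Returns 1 on success, 0 on a short read or error. The bytes are
 * assembled by shifting, so the host's endianness never matters. */
int read_le32(FILE *fp, uint32_t *out)
{
    unsigned char b[4];

    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return 0;
    *out = (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
    return 1;
}
```

Decoding byte by byte avoids needing to know the host's endianness at all. The
library's own bio_fread, with its swap argument, would presumably be the more
natural fix inside sphinxbase itself.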

32-bit big-endian architectures have the same underlying bugs, but it seems
they die a lot earlier, failing to calloc huge sizes (presumably these same
calls are made on 64-bit architectures but can be satisfied thanks to
overcommitting) and thus don't actually try to build the trie and spew all the
warnings.

There are "only" 62 calls to fread in sphinxbase (and a further 45 in
pocketsphinx) so it shouldn't be too hard for someone with knowledge of the
codebase to audit their uses, especially since my guess is that most of them
can be turned into something like `bio_fread(..., IS_BIG_ENDIAN)`. Similarly,
the corresponding fwrite calls should be audited too.
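
For the write side, a similar hedged sketch: write_le32 is a hypothetical
helper, not part of sphinxbase, and it again assumes the on-disk format is
meant to be little-endian. It emits the four bytes in a fixed order instead of
passing the in-memory representation straight to fwrite, so files written on
big-endian and little-endian hosts come out identical.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write v to fp as four little-endian bytes.
 * Returns 1 on success, 0 on a short write or error. */
int write_le32(FILE *fp, uint32_t v)
{
    unsigned char b[4];

    b[0] = (unsigned char)(v & 0xFFu);          /* least significant byte first */
    b[1] = (unsigned char)((v >> 8) & 0xFFu);
    b[2] = (unsigned char)((v >> 16) & 0xFFu);
    b[3] = (unsigned char)((v >> 24) & 0xFFu);  /* most significant byte last */
    return fwrite(b, 1, sizeof b, fp) == sizeof b;
}
```

An audit of the fwrite calls would then amount to checking, for each
multi-byte value written, whether it goes through a wrapper of this kind or is
written raw.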

Regards,
James

#892228#14
Date:
2018-03-07 21:39:27 UTC
From:
To:
Hello,

James Clarke, on Wed. 07 Mar 2018 00:33:05 +0000, wrote:

I know, I had already reported the issue a long time ago, without
feedback.

Ouch!  Perhaps we should just abort the build before that happens for
now.

Samuel

#892228#19
Date:
2018-03-08 01:13:50 UTC
From:
To:
Yeah, I found the upstream issue after I reported this, but that doesn't
quite convey the problem seen here!

Probably; either abort based on DEB_HOST_ARCH_ENDIAN and DEB_HOST_ARCH_BITS
(though maybe the 32-bit big-endian builds are broken enough to not be useful
and should be disabled too), or I guess we can mark it Not-For-Us on the
wanna-build side.

James