In a previous build of libsoup3 on the official buildds,
multithread-test failed with evidence of memory corruption:
https://buildd.debian.org/status/fetch.php?pkg=libsoup3&arch=amd64&ver=3.6.4-2&stamp=1737574120&raw=0
When the build was retried, all tests succeeded, so this is presumably
intermittent or otherwise unreproducible.
This is **not** the same as the failure mode that has been the most common
in the past, where tests that use Apache fail with "Address already in
use: AH00072: make_sock: could not bind to address 127.0.0.1:xxx".
Similarly, when I tried to add Salsa-CI to this package, my first attempt
failed with a different indication of memory corruption:
https://salsa.debian.org/gnome-team/libsoup3/-/jobs/7814730
I think we should treat all of these memory corruption failures in this
test as being essentially equivalent - if we corrupt the heap, then
glibc can fail in several different ways as a result, none of which are
meaningfully different.
There seems to be a second failure mode where multithread-test times
out (the default timeout is 60 seconds, but we use a 6x multiplier in
the Debian packaging to accommodate slower architectures). That
failure mode should be treated as a separate bug and is out of scope for
this particular bug report, although it's possible that it has the same
root cause. I will report that failure mode as a separate bug.
To get an idea of how frequent this is, I tried these steps on the amd64
porterbox, barriere:
1. build libsoup3 (from unstable):
       schroot -c $chroot -r -- \
           env DEB_BUILD_PROFILES=noudeb \
           debuild -e CCACHE_DIR=$HOME/.ccache -e PATH=/usr/lib/ccache:$PATH -us -uc -B
2. run multithread-test repeatedly:
       schroot -c $chroot -r -- \
           env -C obj-x86_64-linux-gnu \
           DEB_BUILD_PROFILES=noudeb CCACHE_DIR=$HOME/.ccache PATH=/usr/lib/ccache:$PATH \
           DEB_PYTHON_INSTALL_LAYOUT=deb LC_ALL=C.UTF-8 \
           meson test --repeat 100 -j1 multithread-test
   (I tried this 3 times; optionally add --timeout-multiplier=6 to the
   `meson test` command-line to emulate the original package build more
   accurately)
3. read obj-x86_64-linux-gnu/meson-logs/testlog.txt for details of the
   failures, if any
and my results were as follows:
- 7 successes, 1 timeout, 1 failure with memory corruption
- 19 successes, 1 timeout, 6 more successes, 1 more timeout, I cancelled
the run at this point
- 10 successes, 1 timeout, 15 more successes, 1 failure with
memory corruption
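For anyone repeating this, the counting could be automated with something
like the sketch below. It is not what produced the numbers above. It
assumes coreutils `timeout` is available and that the test executable is
run directly rather than through `meson test`, so the environment that
meson normally sets up for the test is missing. The function name
`repeat_and_tally` is made up for illustration.

```sh
# Hypothetical helper: run a command up to N times, one run at a time,
# and count successes, timeouts and other failures.
# Usage: repeat_and_tally MAX_RUNS TIME_LIMIT COMMAND [ARGS...]
repeat_and_tally() {
    max_runs=$1
    limit=$2
    shift 2
    ok=0
    timed_out=0
    failed=0
    i=0
    while [ "$i" -lt "$max_runs" ]; do
        i=$((i + 1))
        # coreutils timeout exits with status 124 when the limit is hit
        timeout "$limit" "$@" >/dev/null 2>&1
        status=$?
        case $status in
            0)   ok=$((ok + 1)) ;;
            124) timed_out=$((timed_out + 1)) ;;
            *)   failed=$((failed + 1)) ;;   # e.g. abort() after heap corruption
        esac
    done
    echo "ok=$ok timeout=$timed_out fail=$failed"
}
```

From inside the build directory this might be invoked as
`repeat_and_tally 100 360 tests/multithread-test`, where 360 seconds is the
60 second timeout times the 6x multiplier, and the path to the test
executable is an assumption.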
Anyone who wants libsoup3 tests to pass more often is invited to help to
debug and fix this. If the failure is reproducible under valgrind,
probably the easiest way is to build it in an environment that is
suitable for interactive debugging, then run multithread-test repeatedly
under valgrind, using something like
    meson test --repeat 100 --wrapper=./valgrind.sh multithread-test
to get a backtrace for the memory corruption and figure out how it is
happening. But this might not be possible if using valgrind perturbs the
timing enough that the failure mode never actually happens.
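A minimal sketch of what such a valgrind.sh could contain follows. Only the
file name comes from the command above; the choice of valgrind options and
the `VALGRIND` override variable are assumptions, not a tested recipe for
this package.

```sh
# Create a hypothetical ./valgrind.sh wrapper in the current directory.
# meson test puts the wrapper in front of the test command line, so "$@"
# is the test executable followed by its arguments.
cat > valgrind.sh <<'EOF'
#!/bin/sh
# VALGRIND can be set to use a different valgrind binary (assumption:
# this variable is only a convention of this sketch).
exec "${VALGRIND:-valgrind}" \
    --error-exitcode=1 \
    --leak-check=no \
    --num-callers=30 \
    "$@"
EOF
chmod +x valgrind.sh
```

With --error-exitcode=1, a run in which valgrind reports an error exits
with a non-zero status, so meson should record it as a failure even if the
test itself did not crash. Since valgrind is slow, a larger
--timeout-multiplier is likely to be needed.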
Or it might be possible to build libsoup3 (and ideally GLib too) with
-fsanitize=address,undefined, and then run multithread-test repeatedly,
as above; but, again, AddressSanitizer slows down the binaries, which
could perturb the timing enough that the failure mode never actually
happens.
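As a rough sketch of the sanitizer approach for libsoup3 alone: this
assumes a plain upstream-style meson build in a source checkout, outside
the Debian packaging, and does not cover rebuilding GLib. The function
names and the default build directory name are made up for illustration.

```sh
# Hypothetical helpers for an AddressSanitizer + UBSan build with meson.

# Configure a separate build directory with meson's b_sanitize base option.
# b_lundef=false is set because meson recommends it when sanitizers are
# enabled with some compilers.
setup_sanitizer_build() {
    builddir=${1:-_build-asan}
    meson setup "$builddir" -Db_sanitize=address,undefined -Db_lundef=false
}

# Run multithread-test repeatedly, one copy at a time, in that directory.
run_sanitized_test() {
    builddir=${1:-_build-asan}
    meson test -C "$builddir" --repeat 100 -j1 multithread-test
}
```

Calling `setup_sanitizer_build` and then `run_sanitized_test` from the top
of the source tree would configure, build and run the test. Any sanitizer
reports should end up in meson-logs/testlog.txt inside the build directory.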
Annoyingly, it is not possible to run two or more copies of this test in
parallel, so that cannot be used to get to a failure sooner (this is
because each run of this test uses the same fixed filenames and port
numbers).
I am a member of the GNOME team, but not an Uploader of this particular
package. I am aware that some project members believe that, because I
have solved test issues in the past, I should be held personally
responsible for every test failure that occurs in GNOME. As per the
Debian Constitution §2.1.1, I decline that responsibility: I am not
able to fix everything all of the time, and I'm sorry if the project
considers my contributions to be inadequate.
smcv