#1109158 libsoup3: [metabug] several intermittent test failures resulting in flaky autopkgtests and FTBFS

#1109158#5
Date:
2023-05-12 08:30:44 UTC
From:
To:
The libsoup2 and libsoup3 autopkgtests are triggered when other packages,
such as samba, are updated. On almost every samba upload, for quite some
time now, I have had to reschedule one or another libsoup autopkgtest run
due to the same failure, as in the current version:

https://ci.debian.net/data/autopkgtest/testing/s390x/libs/libsoup3/33505300/log.gz

...
PASS: libsoup-3.0/session-test.test
Running test: libsoup-3.0/auth-test.test
# random seed: R02S312e10d03684417c83184f8cd3e1c8b1
# GLib-GIO-DEBUG: _g_io_module_get_default: Found default implementation gnutls (GTlsBackendGnutls) for ‘gio-tls-backend’
# [1375753.095277] enter apache_init
# Apache command: '/usr/sbin/apache2' '-d' '/usr/libexec/installed-tests/libsoup-3.0' '-f' 'httpd.conf' '-c' 'ErrorLog /tmp/test-tmp-libsoup-3.0_auth-test.test-JNCR41/error.log' '-c' 'PidFile /tmp/test-tmp-libsoup-3.0_auth-test.test-JNCR41/httpd.pid' '-k' 'start'
(98)Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:47524
no listening sockets available, shutting down
AH00015: Unable to open logs
Could not start apache
# -> failed
FAIL: libsoup-3.0/auth-test.test (Child process exited with code 1)

And subsequent tests in the category fail too, due to the same issue.

Right now we can't migrate samba to testing due to this test failure.
Usually it is sufficient to reschedule the test once; sometimes it
passes only on the second attempt.

Can you please take a look?

https://tracker.debian.org/pkg/samba

#1109158#10
Date:
2024-09-06 04:21:04 UTC
From:
To:
Control: severity -1 important

After about 1.5 years, I'm pinging this bug and raising its severity.
On almost every samba package upload I have to retry a few libsoup
autopkgtest jobs due to this flakiness, as it fails to verify again
and again and stalls migration of unrelated packages.  Also raising
severity here because it affects other, unrelated packages.

https://ci.debian.net/packages/libs/libsoup3/testing/arm64/51269589/
https://ci.debian.net/packages/libs/libsoup3/testing/armhf/51269236/

Thanks,

/mjt

#1109158#19
Date:
2024-09-06 11:46:28 UTC
From:
To:
For reference, it looks like the autopkgtests are not failing so
frequently as to be especially problematic.
https://ci.debian.net/packages/libs/libsoup3/testing/armhf/
https://ci.debian.net/packages/libs/libsoup3/testing/arm64/

Severity important is fine though.

Thank you,
Jeremy Bícha

#1109158#24
Date:
2024-12-02 12:51:14 UTC
From:
To:
It failed again.  Should I keep re-scheduling it until success?

Note there are 6 failures like this in the most recent attempt alone:
https://ci.debian.net/packages/libs/libsoup3/testing/s390x/54980275/

Hmm..

/mjt

#1109158#29
Date:
2025-01-24 11:40:49 UTC
From:
To:
On Mon, 2 Dec 2024 15:51:14 +0300 Michael Tokarev <mjt@tls.msk.ru> wrote:
current samba-triggered run.

The release team (RT) says that flaky autopkgtests are RC, so I am raising
the severity accordingly. "Restrictions: flaky" could also be added as a
quick band-aid if the tests can't be fixed.

#1109158#36
Date:
2025-01-24 12:30:51 UTC
From:
To:
On 24/1/25 at 12:40, Luca Boccassi wrote:

Hi. Thanks a lot for raising the severity of this bug.

This bug was hidden for a long time because the package
also had a missing build dependency (BD) on ca-certificates (#1054962),
which took one year to be fixed.

After the BD was fixed, this package FTBFS 20% of the time
in my environment (in the way described by this bug), which imo is
way worse than our users deserve.

So, I wonder if we can apply the same principle that
flaky tests are RC to gcr and gcr4:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1069402
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1057562

In the case of gcr4, there is an explicit request from
one of the release managers (Sebastian Ramacher) to
"either disable the flaky tests or make them non-flaky".

In fact, gcr4 fails to build in Salsa CI:

https://salsa.debian.org/gnome-team/gcr4/-/pipelines

and it also fails to build for me 99% of the time, so
imo it's even worse than libsoup3. I would say that if that's
not RC, it should be.

Not raising the severity myself, but I still expect
somebody else (maybe a RM) to do so.

Thanks.

#1109158#41
Date:
2025-05-16 10:45:43 UTC
From:
To:
found 1035983 3.6.5-1
tags 1035983 ftbfs
thanks

Hi. For some reason the BTS thinks this is only a problem in stable.
The above is my attempt at fixing that.

I'm also tagging the bug as ftbfs because I'm getting build failures
due to failing tests.

Regarding the flakiness itself, I get a failure rate around 20%
on machines with 1 CPU and 30% on machines with 2 CPUs. This is
greater than the reference thresholds given by Paul in one
of the gcr bugs.

I'd like to propose a patch, but the tests that fail are
different every time. In a sample of 200 build attempts on
different machines, each test failed the following number of times:

   26 multithread-test
   23 proxy-test
   22 range-test
   22 connection-test
   22 auth-test
    6 server-test
    1 timeout-test
    1 hsts-test

If somebody wants to debug this (maybe Simon?), please contact me
privately, as I can provide a VM.

Thanks.

#1109158#50
Date:
2025-05-19 14:43:17 UTC
From:
To:
Is this still the same failure mode described in the bug title, with
"Address already in use" and "could not bind to address ..." being
reported by Apache?

Last time I looked at the libsoup* test suite, the actual tests were
each reasonably reliable, but the reliability issue was with their
setup/teardown. They run a temporary Apache web server, in order to
have a realistic server to test against. I think what's happening is
that sometimes, the web server port from one test (let's say test number
5) is still considered by the kernel to be in use by the time we reach
the setup stage of the next test (let's say test number 6).

As a result, the Apache for test number 6 can't listen on the port it
has been configured to use, and testing fails at that point. This is
rare on a per-test basis, therefore difficult to reproduce on-demand -
but running the whole test suite involves several setup/teardown cycles,
resulting in a higher failure rate for the test suite as a whole. For
example if you're seeing a 30% failure rate, that might be more like a
2% failure rate for each of 15 test executables, or perhaps even a 0.2%
failure rate for each of 150 smaller test-cases.
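
The arithmetic behind these estimates can be checked with a short sketch.
It assumes that every test fails independently and with the same
probability, which is a simplification, and the function names are
illustrative only:

```python
def suite_failure_rate(p: float, n_tests: int) -> float:
    """Probability that at least one of n independent tests fails."""
    return 1.0 - (1.0 - p) ** n_tests


def per_test_rate(suite_rate: float, n_tests: int) -> float:
    """Per-test failure rate that would produce the given suite-wide rate."""
    return 1.0 - (1.0 - suite_rate) ** (1.0 / n_tests)


if __name__ == "__main__":
    # A 30% suite-wide failure rate spread over 15 or 150 equal tests.
    for n in (15, 150):
        print(f"{n} tests: {per_test_rate(0.30, n):.2%} per test")
```

Under that assumption, a 30% suite-wide rate corresponds to a bit over 2%
per test for 15 tests and a bit over 0.2% per test for 150, in line with
the rough figures above.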

If that's still what is happening, then it's expected that you will see
failures in different tests (and even in different test-cases within
those larger tests) on different occasions.

Unfortunately, if that's the case, then skipping any specific test-case
is not going to be a viable workaround, because it's the common
setup/teardown done for each test-case that is the problem.

If it's possible to configure Apache to set options like SO_REUSEADDR
and/or SO_REUSEPORT then that might help (but I don't know whether
that's possible).
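
For reference, this is what SO_REUSEADDR looks like at the socket level.
It is only a sketch of the option itself: it says nothing about whether
Apache already sets it or can be configured to, and make_listener is a
made-up helper name:

```python
import socket


def make_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be set before bind(). On Linux it lets bind()
        # succeed while old connections on the address are in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen()
    except OSError:
        sock.close()
        raise
    return sock
```

Note that SO_REUSEADDR does not help if another process is still actively
listening on the same address and port.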

Or if it's possible to make the test suite use a different port for each
test then that might help (but I don't know whether that will be
feasible).
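
A common way to get a different port for each test is to bind to port 0
and let the kernel choose. The sketch below assumes the test harness could
pass the chosen port on to Apache somehow, which is not established here;
pick_free_port is a hypothetical helper:

```python
import socket


def pick_free_port(host: str = "127.0.0.1") -> int:
    """Ask the kernel for a TCP port on host that is unused right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "choose any free port"
        return sock.getsockname()[1]
```

This is inherently racy: the port is released before the server binds it,
so another process could take it in between. It reduces collisions between
consecutive tests but does not rule them out.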

     smcv

#1109158#55
Date:
2025-05-19 15:57:50 UTC
From:
To:
On 19/5/25 at 16:43, Simon McVittie wrote:

That's a very good question and I'm glad that you asked :-)

In some cases, yes, but not always. I've put a collection
of failed build logs here:

https://people.debian.org/~sanvila/build-logs/libsoup3/

I usually try not to report FTBFS bugs when there is already another
open bug about flaky tests that I can also reproduce, as such
duplication is not very useful, but in this case you are right
that there might be more than one issue, so feel free to
clone if required.

Thanks.

#1109158#66
Date:
2025-07-12 17:11:05 UTC
From:
To:
Control: clone 1035983 -2
Control: retitle 1035983 libsoup3: intermittent test failures: Address already in use: AH00072: make_sock: could not bind to address 127.0.0.1:xxx
Control: retitle -2 libsoup3: [metabug] several intermittent test failures resulting in flaky autopkgtests and FTBFS
Control: unblock 1035983 by 1109107 1109108
Control: block -2 by 1035983

Bug #1035983 has always mentioned the AH00072 issue in its title, so I
think it's probably best if we consider any other sources of FTBFS or
autopkgtest failures as out-of-scope for #1035983.

Regarding the topic of flaky tests in general:

Unfortunately I suspect that what's happening here is that we have a
series of different test failures, each of them individually quite rare
(therefore hard to reproduce or debug), which add up to a significant
probability that at least one of the rare failures will happen at least
once in any given test run and therefore the overall test suite fails.

I've cloned a "metabug" (-2 above) to be blocked by #1035983 and other
concrete and potentially actionable causes of test failures, but that
metabug is not going to be directly actionable, because issues that
can't be identified can't be fixed: the only way it can be solved is to
chip away at its actionable dependencies until the failure rate becomes
sufficiently low. I am not an expert on this package and I cannot commit
to being able to achieve that.

Individual tests that are sufficiently flaky can be worked around by
disabling or ignoring the test if necessary (as was done for the
tls_interaction test already), but the cost of disabling tests is that
we can no longer use them to detect RC-severity regressions
(particularly on architectures with few users where the buildds and
autopkgtest are basically the only tools we have), so there's a
trade-off here between breakage caused by false-positive failures and
breakage caused by regressions that could have been caught by running
the tests. As a non-expert trying to keep this package afloat, I don't
feel that I am able to make high-quality uploads without automated tests
to detect my inevitable mistakes. I'm sorry that this is disappointing,
and I would be delighted to stop contributing to libsoup when someone
can do a better job, but until then all I can do is to try to have a
net-positive impact to the best of my limited ability.

As mentioned previously, the AH00072 issue, #1035983, is particularly bad
for this because it affects several tests equally, and disabling all of
them would lose a lot of the overall test coverage.

Thanks, hopefully someone can analyze those at some point and pick out
the actionable equivalence classes. I cannot commit to being able to do
this myself.

I've reported some other sources of intermittent test failures as
#1109107 (no solution known, help welcome), #1109108 (no solution known,
help welcome) and #1109120 (fixed in the latest upload to unstable by an
upstream change). None of these are, individually, a high probability of
failure, but they add up.

When I tried running the test suite repeatedly on barriere, the failure
modes I saw intermittently were #1109107 and #1109108. I don't think I
saw #1109120 or #1035983, so those might be less common, at least on
that particular machine (if the failures are timing-dependent then they
might behave differently elsewhere).

Regarding #1035983 (the AH00072 issue) specifically:
I tried a patch that was meant to work around
the port-still-in-use problem (#1035983). (FYI this will not apply
cleanly to upstream code, it requires other changes already in
debian/patches to add more debug info, which I added last time I spent
time on trying to figure this out.)

Unfortunately it didn't work: the test made multiple attempts to start
Apache, but they all failed with the same error message shown in the
Subject, until the overall test timed out. That suggests that my theory
about the web server port being in TIME_WAIT state might not have been
correct. I don't know what else to try there.
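
One possible way to check the TIME_WAIT theory would be to record the
state of the port at the moment Apache fails to bind. The sketch below
parses text in the format of Linux's /proc/net/tcp; port_states is a
hypothetical helper and SAMPLE is made-up data, not taken from a real
failure:

```python
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}


def port_states(proc_net_tcp: str, port: int) -> list[str]:
    """Return the TCP states of all sockets whose local port matches."""
    states = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address is HEXIP:HEXPORT, the state is the fourth column
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            states.append(TCP_STATES.get(fields[3], fields[3]))
    return states


# Made-up sample: one listener and one TIME_WAIT entry on 127.0.0.1:47524.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:B9A4 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0
   1: 0100007F:B9A4 0100007F:D2F0 06 00000000:00000000 03:00000E10 00000000     0        0 0 3 0000000000000000
"""
```

On a real system the input would come from reading /proc/net/tcp (IPv6
sockets are listed separately in /proc/net/tcp6). A LISTEN entry at the
time of failure would point to another process still holding the port
rather than to TIME_WAIT.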

In 3.6.5-2 I added a patch fixing an upstream issue where one of the
tests that used Apache was not marked "don't run in parallel", so it
could end up being run in parallel with other tests - that could have
resulted in a similar failure mode. We can see whether that helps. I
think I've still seen the AH00072 error occasionally even after making
that change, though, so it can't be the whole story.

     smcv