#986590 dbus-test-runner: flaky ppc64el autopkgtest: FAIL test-libdbustest-mock-test (exit status: 1)

#986590#5
Date:
2021-04-07 19:20:55 UTC
From:
To:
Dear maintainer(s),

Your package has an autopkgtest, great. However, I looked into
the history of your autopkgtest [1] and I noticed the it fails regularly
on ppc64el, while a rerun passes. I copied some of the output
at the bottom of this report. As far as I checked, it's always the same
issue.

Because the unstable-to-testing migration software now blocks on
regressions in testing, flaky tests, i.e. tests that flip between
passing and failing without changes to the list of installed packages,
are causing people unrelated to your package to spend time on these
tests.

Paul

[1] https://ci.debian.net/packages/d/dbus-test-runner/testing/ppc64el/

https://ci.debian.net/data/autopkgtest/testing/ppc64el/d/dbus-test-runner/11196676/log.gz

FAIL: ./test-libdbustest-mock
FAIL test-libdbustest-mock-test (exit status: 1)

============================================================================
Testsuite summary for dbus-test-runner 15.04.0
============================================================================
# TOTAL: 36
# PASS:  16
# SKIP:  0
# XFAIL: 19
# FAIL:  1
# XPASS: 0
# ERROR: 0

#986590#10
Date:
2021-04-08 06:40:58 UTC
From:
To:
Hi Paul,

This is a known issue for DBus unit tests in CI / buildd
infrastructures. (I don't remember the full details / the bug number,
but it is a problem caused by the interplay of libpam-systemd and
schroot.

Hmmm... maybe something similar is happening in lxc based builds
(autopkgtests use lxc, iirc)?

We could try bumping the debhelper compat level to version 13
(currently 11). That helped with libqtdbustest and libqtdbusmock a few
months back.

Would that be an RT compliant solution?
Mike

#986590#15
Date:
2021-04-08 07:15:17 UTC
From:
To:
Hi,

ci.debian.net is using lxc, yes, autopkgtest has much more backends.

https://release.debian.org/bullseye/FAQ.html last paragraph.

So, what in compat level 13 is different for you that actually fixes the
issue? Obviously if you can show in the necessary unblock request that
the *only* delta of bumping compat level is the part that fixes this
bug, we'd accept. But that would require you to e.g. run diffoscope on a
build with and a build without the bump, as we can't judge the changes
from source alone.

Paul

#986590#20
Date:
2021-04-14 19:41:17 UTC
From:
To:
HI Paul

I have now uploaded a dbus-test-runner with minimal changes to
experimental (dbus-test-runner_16.10.0~bzr100+repack1-5~exp1).

Can you re-run those autopkgtests multiple times on ppc64el and check
if the flakiness has now vanished with debhelper compat level bumped
to version 13? Thanks.

Mike

#986590#25
Date:
2021-04-15 07:21:35 UTC
From:
To:
#986590#30
Date:
2021-04-15 10:24:48 UTC
From:
To:
Hi Paul,

Am Donnerstag, 15. April 2021 schrieb Paul Gevers:

I haven't looked at the build logs (am working outside...). The URL does not look like you used the experimental version, does it?

Mike

#986590#35
Date:
2021-04-15 11:50:31 UTC
From:
To:
Hi Mike,
if

Yes, it does. I used dbus-test-runner *from* experimental *in* unstable.
ci.d.n doesn't have experimental as stand-alone suite. This is also the
default way that the release team tests packages, i.e. take the package
from the source suite and run it in the target suite.

Paul

#986590#42
Date:
2021-05-17 10:57:02 UTC
From:
To:
Hi Paul,

I have done another upload / fix attempt for dbus-test-runner's flaky
autopkgtests on ppc64el.

This time, I incremented the SESSION_MAX_WAIT value (100 -> 1000).
Let's see if this helps.

Can you do your autopkgtest magic with upcoming dbus-test-runner
16.10.0~bzr100+repack1-5~exp2 once more?

Thanks,
Mike

#986590#47
Date:
2021-05-17 11:03:44 UTC
From:
To:
We believe that the bug you reported is fixed in the latest version of
dbus-test-runner, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is
attached.

Thank you for reporting the bug, which will now be closed.  If you
have further comments please address them to 986590@bugs.debian.org,
and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Mike Gabriel <sunweaver@debian.org> (supplier of updated dbus-test-runner package)

(This message was generated automatically at their request; if you
believe that there is a problem with it please contact the archive
administrators by mailing ftpmaster@ftp-master.debian.org)
Format: 1.8
Date: Mon, 17 May 2021 12:51:50 +0200
Source: dbus-test-runner
Architecture: source
Version: 16.10.0~bzr100+repack1-5~exp2
Distribution: experimental
Urgency: medium
Maintainer: Mike Gabriel <sunweaver@debian.org>
Changed-By: Mike Gabriel <sunweaver@debian.org>
Closes: 986590
Changes:
 dbus-test-runner (16.10.0~bzr100+repack1-5~exp2) experimental; urgency=medium
 .
   * debian/patches:
     + Add 2001_be-more-generous-with-SESSION_MAX_WAIT.patch. Set
       SESSION_MAX_WAIT to 1000 (instead of 100). This hopefully
       resolves flaky autopkgtests. (Closes: #986590).
Checksums-Sha1:
 c6b17ebd3c42d7cad3a0b42b94458bbdbde6f968 2442 dbus-test-runner_16.10.0~bzr100+repack1-5~exp2.dsc
 595cc9556a3d84cc577047a09218cfb497ab803a 19300 dbus-test-runner_16.10.0~bzr100+repack1-5~exp2.debian.tar.xz
 94a0fb00f5dd93014c942653e97ca8f0a2836376 10409 dbus-test-runner_16.10.0~bzr100+repack1-5~exp2_source.buildinfo
Checksums-Sha256:
 137e3a1df9df9a37d5ac35f9061deb62b0e550014bb486eed488aa2b6c9491da 2442 dbus-test-runner_16.10.0~bzr100+repack1-5~exp2.dsc
 674d1351168d5fae399f7615724fbb9fc8f2e237826a7c2622d17bde0c5934a1 19300 dbus-test-runner_16.10.0~bzr100+repack1-5~exp2.debian.tar.xz
 f8bb34e8fceb6e2389d23c81f425f2c77cb3696127273da279454250ef9c962b 10409 dbus-test-runner_16.10.0~bzr100+repack1-5~exp2_source.buildinfo
Files:
 ef7d153f6a91e3d2a18be5f2f1eb6103 2442 devel optional dbus-test-runner_16.10.0~bzr100+repack1-5~exp2.dsc
 968ed097e74d779d72d4618b01169ee0 19300 devel optional dbus-test-runner_16.10.0~bzr100+repack1-5~exp2.debian.tar.xz
 e29f3dbe7e400587a0a08726cbea8728 10409 devel optional dbus-test-runner_16.10.0~bzr100+repack1-5~exp2_source.buildinfo
-----BEGIN PGP SIGNATURE-----

iQJJBAEBCAAzFiEEm/uu6GwKpf+/IgeCmvRrMCV3GzEFAmCiS2UVHHN1bndlYXZl
ckBkZWJpYW4ub3JnAAoJEJr0azAldxsxL+IP/2DzVVlp0qzr+ozcX3AKJxGWAO/E
Q++nFI2Lxj42A2FlYjUFgbXTa4QAMsdDQIdq9k5pBdbei+VBlqYb/dXIcpI4vdSX
CGAFnI/vKDAAnB/H9PF0uWOCiELZth5xQgHH/8grDm0503jIyif+WO/vdXTEq/mC
/MTMUF6TNyZtPPID3+5iWbpMRnO5OB2VV0M0ZYDbJaOec1mVE1wRukgUO/CkMR9S
Kyyx5BPbAyxSuJZT6YNRtRLiS2cG5r6uOdTxpPovpOaQGNhbIaqJ42kNU/geiHN6
PX2YpoS4OKYRHS2mV++zh5hjV+aWfirEQzZr9OhWiFE2Gqrn0BGorBXq3jPDidnr
etw0hzHS78L6NeMw1AkUQwwE5DQ3l5h3YpMozGOPg3cy1yPtJyn6JeuCt235RDwo
xFpah4h8H8TEttm58ExN7uQShoQQI1JUPQIkJJBaBocCYiw8K5tiRKQ/wTbIK9Hc
6+7A3DSFG+SqgeG+dUwyVUOWudVjwG28YFXnT/y3l00dX76gtA2fcBjq6QAU9w02
vyFnVpWAmCccaVjqk1Hm6FOwPCOjHHQpDulZwwlZgdGm1EBa1tS7bJWPV5K/+452
KGSVqurOWnhM29Oj6b1Qws1Ie0DmZtleRZ+46scoiHYC4VQpRpx1EJ3ffRMMk6uh
cZvxlQtzWt+rbrzD
=CSoS
-----END PGP SIGNATURE-----

#986590#52
Date:
2021-05-17 18:56:52 UTC
From:
To:
#986590#61
Date:
2021-05-18 05:38:29 UTC
From:
To:
Hi Paul,

Sigh...

Would it be a viable solution for now to not run autopkgtests on
ppcel64? I really don't have a clue why this issue comes up only on
ppcel64.

I have Cc: Adrian Bunk as he had some good ideas in the past on such
issues. Adrian, do you mind taking a look? Any hint how to track this
issue down?

Mike

#986590#66
Date:
2021-05-18 18:49:55 UTC
From:
To:
Hi Mike,

If you think this failure is not representative for the package behavior
on ppc64el, it is. Actually, I considered to propose it in my previous
e-mail. However, I do wonder what this means for users of
dbus-test-runner on ppc64el.

Paul

#986590#71
Date:
2021-05-18 18:55:34 UTC
From:
To:
Hi Paul,

dbus-test-runner is used across various DEB packages during unit
tests. We would see build failures everywhere in DEBs that do some
sort or other of DBus testing, which we don't.

So...

Mike

#986590#76
Date:
2021-05-19 09:37:58 UTC
From:
To:
Adrian asked me on IRC and was concerned about whether
https://sources.debian.org/src/dbus-test-runner/16.10.0%7Ebzr100+repack1-4.1/tests/test-libdbustest-mock.c/#L35
is thread-safe:

    g_object_add_weak_pointer(G_OBJECT(connection), (gpointer) &wait_for_close_ptr);

I think he is correct to be concerned about this. As documented,
g_object_add_weak_pointer() cannot be thread-safe for historical reasons.
See <https://bugzilla.gnome.org/show_bug.cgi?id=548954> for details.

A better way to implement this would be to use a GWeakRef, which is the
mechanism that was added to GLib to provide thread-safe weak-refs.
Unlike g_object_add_weak_pointer(), GWeakRef triggers before dispose()
(whereas g_object_add_weak_pointer() triggers after dispose() but before
finalize()), and is not tied up with g_object_weak_ref() (which calls an
arbitrary user-supplied callback that can't really be thread-safe).

Something like this (untested!):

    GWeakRef weak;

    g_weak_ref_init (&weak, connection);
    g_object_unref (connection);

    int wait_count;
    for (wait_count = 0; wait_count < SESSION_MAX_WAIT; wait_count++)
    {
        GDBusConnection *still_alive = g_weak_ref_get (&weak);

        if (still_alive == NULL)
           break;

        g_object_unref (still_alive);
        process_mainloop(200);
    }

    g_weak_ref_clear (&weak);
    g_assert (wait_count != SESSION_MAX_WAIT);

(This is basically a more complicated version of
g_assert_finalize_object() that assumes the object is temporarily being
kept alive by references held in the default main-context, and therefore
polls the default main-context instead of just asserting that the object
dies immediately.)

The good news is that if this change fixes the autopkgtest, then it was a
bug in the test, and not a bug in the parts of dbus-test-runner that are
used by other packages.

    smcv

#986590#87
Date:
2022-05-04 04:24:18 UTC
From:
To:
Unfortunately the patch in #76 still fails. Needs some more
investigation:

======================================

FAIL: test-libdbustest-mock-test
================================
** (gtester:7666): WARNING **: 21:18:27.890: Deprecated: Since GLib
2.62, gtester and gtester-report are deprecated. Port to TAP.
TEST: ./test-libdbustest-mock... (pid=7667)
/libdbustest/mock/basic:
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7667):
libdbustest-WARNING **: 21:18:27.910: Unable to start watchdog
DBus daemon: unix:abstract=/tmp/dbus-QtSQwrxNGw,guid=5439845aad7f1bb8e63d4297626c5623
DBusMock: Started with PID: 7687
DBusMock: Shutting down
DBus daemon: Shutdown
**
ERROR:test-libdbustest-mock.c:45:wait_for_connection_close: assertion
failed: (wait_count != SESSION_MAX_WAIT)
FAIL
GTester: last random seed: R02S0c733c834df4b87e3a0bb6bff1c44e77
(pid=7728)
/libdbustest/mock/properties:
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-CRITICAL **: 21:21:48.365:
dbus_test_dbus_mock_object_add_property: assertion
'g_variant_is_of_type(value, type)' failed
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-WARNING **: 21:21:48.367: Unable to start watchdog
DBus daemon: unix:abstract=/tmp/dbus-FB8W7MsRSE,guid=4425b2c97b1ed05ae561ebd2626c56ec
DBusMock: Started with PID: 7747
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-CRITICAL **: 21:21:48.462: Property 'prop1' is not of same
value in dbus_test_dbus_mock_object_update_property()
DBusMock: Shutting down
DBus daemon: Shutdown
OK
/libdbustest/mock/methods:
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-WARNING **: 21:21:48.472: Unable to start watchdog
DBus daemon: unix:abstract=/tmp/dbus-7xF4SrW7pk,guid=9514c44ce88f66e83ee408cc626c56ec
DBusMock-1: Started with PID: 7755
DBusMock-1: Shutting down
DBus daemon: Shutdown
OK
/libdbustest/mock/signals:
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-WARNING **: 21:21:48.771: Unable to start watchdog
DBus daemon: unix:abstract=/tmp/dbus-4AGx7A42El,guid=46c8120e5b0e1fa2f9082734626c56ec
DBusMock-2: Started with PID: 7763
DBusMock-2: 1651267308.866 emit /test foo.test.interface.testsig
DBusMock-2: 1651267308.968 emit /test foo.test.interface.testsig_abc "a" "b" "c"
DBusMock-2: Shutting down
DBus daemon: Shutdown
OK
/libdbustest/mock/running:
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-WARNING **: 21:21:49.076: Unable to start watchdog
DBus daemon: unix:abstract=/tmp/dbus-MJiLNJREgr,guid=71b2068fe312fd7ba7dd5e06626c56ed
DBusMock-3: Started with PID: 7771
DBusMock-3: Shutting down
DBus daemon: Shutdown
OK
/libdbustest/mock/running-system:
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-WARNING **: 21:21:49.371: Unable to start watchdog
DBus daemon: unix:abstract=/tmp/dbus-KMNGhYfACr,guid=a04514c03ab8369dfed6f3f9626c56ed
DBusMock-4: Started with PID: 7779
DBusMock-4: Shutting down
DBus daemon: Shutdown
OK
/libdbustest/mock/interfaces:
(/builds/debian/dbus-test-runner/debian/output/source_dir/tests/.libs/test-libdbustest-mock:7728):
libdbustest-WARNING **: 21:21:49.668: Unable to start watchdog
DBus daemon: unix:abstract=/tmp/dbus-yeC3JJFI37,guid=e1b247ec47e3e0d499b9f17c626c56ed
DBusMock-5: Started with PID: 7787
DBusMock-5: Shutting down
DBus daemon: Shutdown
OK
FAIL: ./test-libdbustest-mock
FAIL test-libdbustest-mock-test (exit status: 1)
======================================


Anton

#986590#92
Date:
2022-05-04 18:16:24 UTC
From:
To:
The bug is still reproducible. This time on armhf [1] and in CI [2].

[1] https://buildd.debian.org/status/fetch.php?pkg=dbus-test-runner&arch=armhf&ver=19.04.0-1%7Eexp1&stamp=1651644822&raw=0
[2] https://salsa.debian.org/debian/dbus-test-runner/-/jobs/2731578

Regards

Anton

#986590#97
Date:
2022-05-05 20:57:02 UTC
From:
To:
As I mentioned before the patch does not solve the problem.
Increasing the "Session time" sometimes causes very long
test-times. So we have here definitely the deadlock.

If somebody has more experience in glib, it would be really
helpful to investigate an issue.

Best regards

Anton

#986590#102
Date:
2022-05-11 19:34:33 UTC
From:
To:
severity 986590 important
thanks

I have temporarily disabled the unreliable test. Thus I am reducing
the bug's severity.
Yes, I know, it is wrong to disable a failing test. But I am
continuing to work on it, trying
to build it with threadsanitizer and check, what is going on there.

Any help is appreciated/

Thanks

Anton