#746925 glib2.0 FTBFS on alpha and hppa: test-suite failure in async-splice-output-stream

#746925#5
Date:
2014-05-04 00:38:58 UTC
From:
To:
glib2.0 FTBFS on alpha due to the following test-suite error:

PASS: async-splice-output-stream 1 /async-splice/copy-chunks
PASS: async-splice-output-stream 2 /async-splice/copy-chunks-threaded-input
ERROR: async-splice-output-stream - missing test plan
ERROR: async-splice-output-stream - exited with status 133 (terminated by signal 5?)
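
The "exited with status 133 (terminated by signal 5?)" line follows the usual shell convention that a status above 128 means 128 plus a signal number. A minimal sketch for decoding such a status is below; the function name is made up for illustration, and signal names are looked up on the machine running the script, so they may differ from those on the build host.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        signum = status - 128
        try:
            # Name lookup uses the local platform's signal table (assumption:
            # it matches the build host for the common signals).
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited normally with status {status}"


for status in (133, 139):
    print(status, "->", describe_exit_status(status))
```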

Full build log at:
http://buildd.debian-ports.org/status/fetch.php?pkg=glib2.0&arch=alpha&ver=2.40.0-3&stamp=1398626766

I bisected with upstream source from a working version to find that commit
be25603b947876f13ab7d9cee6a8c367f8f528ff (Updated Serbian translation) is
the offending commit!  Indeed, removing this commit from Debian source
version 2.40.0-3 results in glib2.0 building to completion on Alpha.

Cheers
Michael.

#746925#10
Date:
2014-05-04 15:03:42 UTC
From:
To:
Hi,

I suppose you / the buildd aren't running with a Serbian locale?

Are you sure that this commit is the culprit? I would rather guess that the test
is intermittently failing (e.g. because of a race condition), and that it
randomly passed after you reverted that commit while it randomly failed with
that commit applied. Please run the test in a loop and see if it consistently
fails with that commit and if it consistently passes without it.
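
A rough sketch of such a loop, assuming the test can be started as a single command that exits non-zero on failure. The helper name and the stand-in command are illustrative only; substitute the real test invocation.

```python
import subprocess
import sys


def count_failures(cmd, runs):
    """Run cmd `runs` times and return how many runs exited non-zero."""
    failures = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # A process killed by a signal has a negative returncode here,
        # so it is counted as a failure as well.
        if result.returncode != 0:
            failures += 1
    return failures


# Stand-in command that always passes; replace it with the real test.
runs = 5
failed = count_failures([sys.executable, "-c", "pass"], runs)
print(f"{failed} of {runs} runs failed")
```

Comparing the failure counts with and without the suspect commit over the same number of runs gives a much firmer answer than a single run of each.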

Thanks,
Emilio

#746925#15
Date:
2014-05-06 09:26:58 UTC
From:
To:
Bah, not any longer.

Despite rerunning the test with and without that commit at the time and
getting results consistent with the above being the problematic commit,
the first time I reran it today proved my hypothesis false.  It now
appears that there is about a 40% to 50% chance of the test suite failing
even without the Serbian translation commit.

I will have to redo the bisection.
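
With a failure rate in that range, a single passing run says little about a commit. As a back-of-the-envelope sketch, assuming runs are independent and the failure rate is constant (neither is established here), the chance that a bad commit passes n times in a row is (1 - p)^n, which gives the number of clean runs needed before marking a commit "good":

```python
import math


def runs_needed(failure_rate, confidence):
    """Smallest n with (1 - failure_rate)**n < 1 - confidence.

    Assumes independent runs with a fixed failure probability.
    """
    if not 0 < failure_rate < 1 or not 0 < confidence < 1:
        raise ValueError("failure_rate and confidence must be in (0, 1)")
    n = math.log(1 - confidence) / math.log(1 - failure_rate)
    return math.ceil(n)


for p in (0.4, 0.5):
    print(f"failure rate {p}: {runs_needed(p, 0.99)} clean runs for 99% confidence")
```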

Is there a way to run an individual test?  At the moment I just run 'make
check' but it would be much quicker to bisect if I could run an
individual test.
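
One possible approach, sketched under assumptions: programs built on the GLib test framework accept "-p TESTPATH" to run only the cases under that path and "-l" to list the available paths. The helper names below are hypothetical, and the location of the built test program depends on the build tree.

```python
import subprocess


def single_test_command(binary, test_path=None):
    """Build the command line for one GLib test program.

    binary    -- path to the built test program (location is an assumption)
    test_path -- optional test path such as "/async-splice/copy-chunks"
    """
    cmd = [binary]
    if test_path is not None:
        cmd += ["-p", test_path]  # restrict the run to this test path
    return cmd


def run_single_test(binary, test_path=None):
    """Run the test program and return its exit status."""
    return subprocess.run(single_test_command(binary, test_path)).returncode


print(single_test_command("gio/tests/async-splice-output-stream",
                          "/async-splice/copy-chunks-threaded-output"))
```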

Cheers
Michael.

#746925#20
Date:
2014-05-27 07:45:06 UTC
From:
To:
It's not possible to bisect the failed glib2.0 test
(async-splice-output-stream) on alpha as the commit that brings in the
new test is the first one to fail.  Looks like the bug was always there
but never trapped by the test suite until a test was introduced.

Interestingly, the async-splice-output-stream test only fails when
run under an SMP kernel. It always passes on a single-CPU system.  I
note the test does a call to the pthread code; that's probably the
problem --- there is definitely something occasionally racy in the
Alpha pthread library that is only exposed when running under a
multi-CPU system.
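
A quick way to approximate the single-CPU case on an SMP machine is to pin the test process to one CPU. This is only a sketch: it is Linux-specific, the helper name is made up, and pinning under an SMP kernel is not the same thing as booting a UP kernel, so a pass here would be suggestive rather than conclusive.

```python
import os
import subprocess
import sys


def run_on_one_cpu(cmd):
    """Run cmd restricted to a single CPU (Linux only); return its exit status."""
    cpu = min(os.sched_getaffinity(0))  # pick a CPU this process may use

    def pin():
        # Executed in the child process just before the command starts.
        os.sched_setaffinity(0, {cpu})

    return subprocess.run(cmd, preexec_fn=pin).returncode


# Stand-in command that reports how many CPUs it is allowed to run on.
run_on_one_cpu([sys.executable, "-c",
                "import os; print(len(os.sched_getaffinity(0)))"])
```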

For the time being I have put glib2.0 into the "do-not-take-for-building"
list of the multi-cpu buildd, thus hopefully the next upload should
build successfully on the other (UP) buildd.

Cheers
Michael.

#746925#25
Date:
2014-09-13 15:09:07 UTC
From:
To:
glib2.0 also fails to build from source on hppa for the same reason:

ERROR: async-splice-output-stream
=================================

# random seed: R02Se318b0c8449b0f0964a60c8aa6747446
# Start of async-splice tests
ok 1 /async-splice/copy-chunks
PASS: async-splice-output-stream 1 /async-splice/copy-chunks
Segmentation fault (core dumped)
ok 2 /async-splice/copy-chunks-threaded-input
PASS: async-splice-output-stream 2 /async-splice/copy-chunks-threaded-input
ERROR: async-splice-output-stream - missing test plan
ERROR: async-splice-output-stream - exited with status 139 (terminated by signal 11?)

Full buildd log is here:
http://buildd.debian-ports.org/status/fetch.php?pkg=glib2.0&arch=hppa&ver=2.40.0-5&stamp=1410540994

#746925#30
Date:
2015-06-09 17:05:51 UTC
From:
To:
I did some further testing, and I have to agree that the async-splice-output-stream testcase is racy.
Sometimes it works on hppa (output via gdb):

Starting program: /build/glib2.0/glib2.0-2.44.1/debian/build/deb/gio/tests/.libs/lt-async-splice-output-stream
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/hppa-linux-gnu/libthread_db.so.1".
/async-splice/copy-chunks: OK
/async-splice/copy-chunks-threaded-input: [New Thread 0xf7ef0400 (LWP 29782)]
[New Thread 0xf76f0400 (LWP 29783)]
[New Thread 0xf6ef0400 (LWP 29784)]
OK
/async-splice/copy-chunks-threaded-output: OK
/async-splice/copy-chunks-threaded: OK
[Thread 0xf76f0400 (LWP 29783) exited]
[Thread 0xf7ef0400 (LWP 29782) exited]
[Thread 0xfa6fc3c0 (LWP 29769) exited]
[Inferior 1 (process 29769) exited normally]


and most times it simply fails:

Starting program: /build/glib2.0/glib2.0-2.44.1/debian/build/deb/gio/tests/.libs/lt-async-splice-output-stream
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/hppa-linux-gnu/libthread_db.so.1".
/async-splice/copy-chunks: OK
/async-splice/copy-chunks-threaded-input: [New Thread 0xf7ef0400 (LWP 29991)]
OK
/async-splice/copy-chunks-threaded-output:
Program received signal SIGSEGV, Segmentation fault.
0xf9f6321c in g_input_stream_is_closed (stream=0x2f746d70) at /build/glib2.0/glib2.0-2.44.1/./gio/ginputstream.c:1161
1161      g_return_val_if_fail (G_IS_INPUT_STREAM (stream), TRUE);
(gdb) bt
#0  0xf9f6321c in g_input_stream_is_closed (stream=0x2f746d70) at /build/glib2.0/glib2.0-2.44.1/./gio/ginputstream.c:1161
#1  0xf9f76348 in real_splice_async_complete_cb (task=0x1f490) at /build/glib2.0/glib2.0-2.44.1/./gio/goutputstream.c:1873
#2  0xf9f9b158 in g_task_return_now (task=0x1f090) at /build/glib2.0/glib2.0-2.44.1/./gio/gtask.c:1088
#3  0xf9f9b1b8 in complete_in_idle_cb (task=0x1f090, task@entry=<error reading variable: value has been optimized out>)
     at /build/glib2.0/glib2.0-2.44.1/./gio/gtask.c:1102
#4  0xf9a20468 in g_idle_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at /build/glib2.0/glib2.0-2.44.1/./glib/gmain.c:5393
#5  0xf9a24f1c in g_main_dispatch (context=0x1b448) at /build/glib2.0/glib2.0-2.44.1/./glib/gmain.c:3122
#6  g_main_context_dispatch (context=context@entry=0x1b448) at /build/glib2.0/glib2.0-2.44.1/./glib/gmain.c:3737
#7  0xf9a2539c in g_main_context_iterate (context=0x1b448, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
     at /build/glib2.0/glib2.0-2.44.1/./glib/gmain.c:3808
#8  0xf9a25874 in g_main_loop_run (loop=0x1c510) at /build/glib2.0/glib2.0-2.44.1/./glib/gmain.c:4002
#9  0x00011484 in test_copy_chunks_start (flags=<optimized out>) at /build/glib2.0/glib2.0-2.44.1/./gio/tests/async-splice-output-stream.c:162
#10 0xf9a55214 in test_case_run (tc=0x1ae30) at /build/glib2.0/glib2.0-2.44.1/./glib/gtestutils.c:2124
#11 g_test_run_suite_internal ()


My assumption is that in the function test_copy_chunks_start() [in ./gio/tests/async-splice-output-stream.c]
new threads are started, but nothing ensures that everything is fully set up when g_output_stream_splice_async()
is called [inside the test_copy_chunks_start() function].
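
If that assumption holds, the general remedy is a readiness handshake: the starting thread waits until each worker reports that its setup is complete before handing it any work. The following is a generic illustration of that pattern only; it is not GLib code and not the actual test, and all names are invented.

```python
import queue
import threading


def worker(ready, jobs, results):
    # Finish all per-thread setup first, then announce readiness.
    local_state = {"initialised": True}
    ready.set()
    item = jobs.get()
    results.put((local_state["initialised"], item))


def start_and_use_worker():
    ready = threading.Event()
    jobs, results = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(ready, jobs, results))
    thread.start()
    # Block until the worker reports that setup is complete
    # before giving it anything to do.
    if not ready.wait(timeout=5):
        raise RuntimeError("worker did not become ready")
    jobs.put("chunk")
    thread.join()
    return results.get()


print(start_and_use_worker())
```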

Helge

#746925#37
Date:
2015-09-14 15:14:29 UTC
From:
To:
Back in May 2014, Michael asked:
"Is there a way to run an individual test?  At the moment I just run 'make
check' but it would be much quicker to bisect if I could run an
individual test."

Ditto his concern...  Unlike the buildd systems, my alpha platform is slow
enough that being able to make small modifications to individual tests and
run them would be of significant value in getting the various FTBFS issues
resolved.  Yes, I said "various", as in "multiple-different-but-potentially-
related."

I'm getting an FTBFS on the current (Sept. 2015) glib2.0 source package for sid
on alpha, and while the failures are not identical to those previously reported
(one of mine occurs during the "mimeapps" testing), there are enough
similarities to be a concern.  My system (non-SMP) generates a segfault
during one of the "mimeapps" tests, whereas the buildd system has no
problem with *that* particular test, but fails elsewhere.

Recently, Michael and I have renewed efforts to get glib2.0 built on alpha.
Help from the package maintainers as far as figuring out how to run specific
individual tests standalone would be a *huge* help to us.  Thanks in advance
for any assistance provided.

#746925#42
Date:
2015-09-18 19:51:08 UTC
From:
To:
Verified the existence of at least two tests for which the notions of "setup" and "settle"
times have meaning, but since the exact failure mechanism is unknown, the best I can
offer presently is advice to watch for such things.  They *are* somewhat dependent on
the release of libc6, as Michael and I found when an update for that package was
applied a few days ago.

One build issue that showed up on my system that will NOT show up on a standard
buildd system concerns the use of "gdb", if present, by the "run-assert-msg-test.sh"
script.  First problem: "gdb" gets run on the wrapper script instead of the binary in
the ".libs" subdirectory.  Second problem: "gdb" cannot reference the memory
location corresponding to "print (char*) __glib_assert_msg" (found in the ".gdb"
file for the test).  "gdb" isn't part of the base configuration, nor is it a build dependency,
so the "gdb" package won't be present on a buildd system unintentionally.
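
For the first problem, the usual libtool route is "libtool --mode=execute gdb <wrapper>", which lets libtool substitute the real program. Where that is not convenient, the binary can be located by hand. The sketch below assumes the layout seen in the gdb session earlier in this report (a ".libs" directory next to the wrapper, with the program named either "lt-<name>" or "<name>"); the helper name is hypothetical.

```python
import os


def real_binary_for(wrapper_path):
    """Return the program behind a libtool wrapper script, if one is found.

    Looks for .libs/lt-<name> and then .libs/<name> beside the wrapper
    (assumed layout); falls back to the wrapper path itself.
    """
    directory, name = os.path.split(wrapper_path)
    candidates = (
        os.path.join(directory, ".libs", "lt-" + name),
        os.path.join(directory, ".libs", name),
    )
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return wrapper_path


print(real_binary_for("gio/tests/async-splice-output-stream"))
```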

Suggested workaround is to ignore whether "gdb" is present and simply not run
that portion of the test.  After all, if a satisfactory result is deemed to have been
obtained in the absence of "gdb", why the essentially duplicate test?  The irony is,
the desired error message string is obtained when "gdb" runs the executable, but
it's not in the output captured for subsequent processing by "grep".