#1030096 dask.distributed: autopkgtest failure.

#1030096#5
Date:
2023-01-31 03:37:19 UTC
From:
To:
The autopkgtest for dask.distributed is failing.

https://ci.debian.net/data/autopkgtest/testing/amd64/d/dask.distributed/30859887/log.gz

dask.distributed is currently not in testing, having been removed with
a reason of "allow pandas and dask to migrate". As a result, the
following packages are in violation of "packages must be buildable within
the same release":

tiledb-py
ipyparallel
satpy
spyder-kernels

#1030096#10
Date:
2023-02-04 18:20:32 UTC
From:
To:
I have an attempt to fix this in Salsa now (in my fork), but it hasn't
had time to run the tests yet, so I don't know whether it works.

Note the above mention that if we lose dask.distributed then we lose
Spyder, which makes this a bigger issue than I initially thought.

#1030096#15
Date:
2023-02-04 21:35:31 UTC
From:
To:
On Sat, Feb 04, 2023 at 06:20:32PM +0000, Rebecca N. Palmer wrote:

Any reason to not push to master directly?

We should definitely keep dask.distributed.

Kind regards
    Andreas.

#1030096#20
Date:
2023-02-04 22:43:59 UTC
From:
To:
That removed most of the test failures, but there seem to be a few
apparently random ones left (2 runs both failed, but with different errors).

I don't do that with packages that aren't mine.  Also see above.

#1030096#25
Date:
2023-02-05 09:13:30 UTC
From:
To:
On Sat, Feb 04, 2023 at 10:43:59PM +0000, Rebecca N. Palmer wrote:

Per our team policy that's fine and simply easier for other team members.

Thanks a lot for your work
    Andreas.

#1030096#30
Date:
2023-02-05 21:44:29 UTC
From:
To:
I currently have this in a state where it sometimes succeeds and
sometimes doesn't:
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/tree/fix1030096

Tests I've seen fail multiple times (and don't have a fix for):
test_balance_expensive_tasks[enough work to steal]
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3902376
(Seems to be the most common problem.  Using @pytest.mark.flaky to try 3
times doesn't seem to have helped, suggesting that if it fails once it
keeps failing in that run.  Applying part of upstream pull 7253 seemed
to make things worse, but I haven't tried applying the whole thing.)
test_popen_timeout
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3902745

Tests I've seen fail once:
test_stress_scatter_death
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3902040
test_tcp_many_listeners[asyncio]
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3896327

#1030096#35
Date:
2023-02-06 07:59:17 UTC
From:
To:
(Background: the pandas + dask transition broke dask.distributed and it
was hence removed from testing; I didn't notice at the time that if we
don't get it back in we lose Spyder.)
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3903956
(We don't have to pass build-i386 (as this is an arch:all package) or
reprotest, but if these are effectively-random failures, they might also
be able to occur in build or autopkgtest.)

I'm probably the wrong person to be working on this - I don't know
enough about this package to say whether ignoring this kind of
intermittent failure (as my 'flaky' marks do) is appropriate, or to have
much idea how to actually fix it.

We could also try upgrading dask + dask.distributed to 2023.1, but
that's a risky move at this point.

#1030096#40
Date:
2023-02-06 10:13:43 UTC
From:
To:
Hi Rebecca,

On Mon, Feb 06, 2023 at 07:59:17AM +0000, Rebecca N. Palmer wrote:

as far as I know Diane has put quite some effort into dask and I
understood that dask and dask.distributed are closely interconnected.

In several cases we decided to ignore some tests.  While I like the idea
of marking a test flaky instead of ignoring it completely, given your
experience I think ignoring these tests is a valid way to proceed with
this package for the moment.

I agree that it is risky.  We might discuss this with upstream and
possibly use an experimental branch to verify how it works.  It might be
that later versions work better with later Pandas / Python3.11.
However, the window of opportunity to get something in before the freeze
is closing and I'm afraid we do not have time for experiments.

Kind regards
   Andreas.

#1030096#45
Date:
2023-02-06 19:38:23 UTC
From:
To:
Hi now.

My fragments of time were spent fighting with numba, and I didn't have
the energy to be thinking about dask.distributed.

Numba should be in a better place right now. So I can set my build
machine to try building dask.distributed and see where we are with it
right now.

The most important thing about dask / dask.distributed is they really
should be at about the same upstream version. I'm not 100% sure how to
mark that in the d/control file. Also upstream might have some ability
to do minor releases independently.

But if we do a new upstream release of dask, it needs to be paired with
a new upstream release of dask.distributed. And in my experience
dask.distributed is the one that's harder to get to work right.

Diane

#1030096#50
Date:
2023-02-06 21:39:24 UTC
From:
To:
I agree that xfailing the tests *may* be a reasonable solution.  I'm
only saying that it should be done by someone with more idea than me of
whether these particular tests are important, because blindly xfailing
everything that fails is effectively not having tests.

If we do choose that approach, at least test_balance_expensive_tasks
needs to be an outright xfail/skip not just a flaky, because when it
fails it fails repeatedly.

I knew that, and was planning on 2022.12.1 of both when I decided to go
ahead with pandas.  What went wrong was that I only tested a build, not
an autopkgtest, and thought the failing tests were dask.distributed's
(known) inability to run all its tests in a buildd environment.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1027254#21
Depends: python3-dask (>= 2022.12.1~), python3-dask (<< 2022.12.2~)
but I haven't tested that.

#1030096#55
Date:
2023-02-07 03:20:54 UTC
From:
To:
So my efforts at debugging are made harder by it working for me. I'm
using

a9771f68a28dfc65cae3ac6acf70451c264f3227

from Debian HEAD.

= 2745 passed, 93 skipped, 216 deselected, 18 xfailed, 8 xpassed in
1992.20s (0:33:12) =

I looked at the last log on ci.debian.net for dask.distributed:
https://ci.debian.net/data/autopkgtest/unstable/amd64/d/dask.distributed/31090863/log.gz

And it looks like several of those errors are networking related.

CI with the previously released 2022.12.1+ds.1-1 version is failing
with these tests:

test_defaults
test_hostport
test_file
test_default_client_server_ipv6[tornado]
test_default_client_server_ipv6[asyncio]
test_tcp_client_server_ipv6[tornado]
test_tcp_client_server_ipv6[asyncio]
test_only_local_access
test_remote_access
test_adapt_then_manual
test_local_tls[True]
test_local_tls[False]
test_run_spec
test_balance_expensive_tasks[enough work to steal]

I think several of those may depend on a proper network. The host I'm
using actually has both IPv4 and IPv6 working. I'm using sbuild,
automatically running autopkgtests, on an oldish 2x4 (8-core) Xeon server
with ~24 GB of RAM.

What's your test environment like?

I don't think head is hugely different from what was released in -1.

The diff looks like Andreas adjusted the dask dependency version,
configured a salsa CI run, and added some upstream metadata files.

He had problems with a salsa build failure, but that was with i386. I'm
currently setting up i386 to see if I can replicate the salsa failure.

Diane

#1030096#60
Date:
2023-02-07 07:31:44 UTC
From:
To:
Salsa CI.

That sounds like you're not looking at my branch at all.  As previously
stated, that's
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/tree/fix1030096
(It's in a fork because I can't push to debian-python.)

See earlier in this bug for which test failures still remain.

#1030096#65
Date:
2023-02-08 07:06:33 UTC
From:
To:
I merged in most of Rebecca's changes via git cherry-pick, though I
slightly edited the changelog (making most entries a bullet point
instead of subheadings of the one line I left out).

I think I got the code to detect if IPv6 is available to work correctly
so I could set the DISABLE_IPV6 environment variable that
dask.distributed supports.

I went with skipping the 32-bit tests instead of xfailing them because I
don't think they can work as written: I think they're making really
large memory requests that can't ever succeed on 32-bit.

You did a lot of work on trying to get the flaky tests to work more
reliably, and all that's included. Well, except for the commits that
apply a patch and then revert it.

All the merges are pushed to salsa debian/master. They also passed on
my local build host running i386.

Diane

#1030096#70
Date:
2023-02-08 07:58:23 UTC
From:
To:
https://salsa.debian.org/python-team/packages/dask.distributed/-/blob/debian/master/debian/tests/run-tests#L11

This doesn't work: because run-tests runs under set -e, failing this
check immediately ends the autopkgtest with failure (without even
running the main tests):
https://salsa.debian.org/python-team/packages/dask.distributed/-/jobs/3918057

I think you instead need the command inside the if-test, something like

if ip -6 addr | grep global ; then
    if ping6 -c 4 2001:4860:4860::8888 ; then
        echo "Working ipv6 connectivity"
    fi
fi

but I haven't tested that.
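
For illustration, a minimal sketch of why moving the command into the
if-test helps: under set -e, a command that fails as an `if` condition
does not abort the script. The function have_ipv6 is a hypothetical
stand-in for the real check (it always fails here, so the result is the
same on any host), and the value 1 for DISABLE_IPV6 is an assumption.
This is not the actual run-tests code.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for the real check (for example
# `ip -6 addr | grep -q global`); it always reports "no IPv6" here.
have_ipv6() {
    return 1
}

# A failing command used as an `if` condition does not trigger `set -e`,
# so the script carries on and can react to the result.
if have_ipv6; then
    ipv6_status="available"
else
    ipv6_status="unavailable"
    # Assumed value; the thread only says that the variable exists.
    DISABLE_IPV6=1
    export DISABLE_IPV6
fi

echo "IPv6 is $ipv6_status"
```

Had have_ipv6 been called as a bare statement instead, the script would
have stopped at that line, before any tests ran.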

Agreed - I'm possibly in the habit of using xfail rather than skip
because I'm often xfailing the kind of things that might get fixed later.

https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/commit/b82894aa5247cd11607b177d60975387c2fd796a
(marking a few more tests as flaky) isn't included.  However as
previously mentioned, I don't claim to know whether we actually should
be marking those particular tests as flaky, so it's fine to omit that
one if you think we shouldn't.

Also as previously mentioned, test_balance_expensive_tasks[enough work
to steal] seems to fail repeatedly when it fails, so if we want to
ignore that (again, I don't claim to know whether that's a good idea),
it needs an xfail/skip not just a flaky.

#1030096#75
Date:
2023-02-08 22:09:56 UTC
From:
To:
Hello,

So I discovered I'd forgotten to do git cherry-pick --continue so
missed the last patch from Rebecca. (b82894aa) Thank you so much for
working out a better strategy for the flaky tests.

I also found a computer I could log into that has no working IPv6
support, and so could more quickly debug the IPv6 detection code, and
finally got a version of it that works correctly. This version just
uses ping but turns off set -e for the test.
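
A rough sketch of that pattern, not the code that was actually pushed:
set -e is switched off around the probe, the exit status is captured,
and set -e is switched back on. probe_ipv6 is a hypothetical stand-in
for the ping call (it always fails here so the example is
deterministic), and the value 1 for DISABLE_IPV6 is an assumption.

```shell
#!/bin/sh
set -e

# Hypothetical stand-in for the real probe (e.g. a ping6 call); it fails
# here so that the example behaves the same everywhere.
probe_ipv6() {
    return 1
}

set +e            # a non-zero exit below must not abort the script
probe_ipv6
probe_status=$?   # capture the result immediately
set -e            # restore fail-fast behaviour for the rest of the script

if [ "$probe_status" -ne 0 ]; then
    echo "No working IPv6, disabling IPv6 tests"
    export DISABLE_IPV6=1
fi
```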

I just got back a passed from salsa. So does anyone want to make any
more changes? Or should we release this version?

Diane

#1030096#80
Date:
2023-02-08 23:11:49 UTC
From:
To:
The *maybe* remaining issue is that test_balance_expensive_tasks[enough
work to steal] seems to fail repeatedly when it fails, so if we want to
ignore that (I don't claim to know whether that's a good idea), it needs
an xfail/skip not just a flaky.

I haven't seen that failure in your runs, but I don't know whether that
means you've fixed it or just that you were lucky.

Mostly, please upload *something* today, because we won't know for sure
whether it passes on a real buildd/debci until we try that, and if it
doesn't then the sooner we find out the better.

#1030096#85
Date:
2023-02-09 06:29:20 UTC
From:
To:
On Wed, Feb 08, 2023 at 11:11:49PM +0000, Rebecca N. Palmer wrote:

+1

Thanks to you both for all your work
   Andreas.

#1030096#90
Date:
2023-02-09 06:36:56 UTC
From:
To:
I uploaded it earlier today. dask.distributed is past buildd, but I
haven't seen it on ci.debian.net yet.

Also there are still some flaky tests, as the rebuild triggered by my just
committing the changelog release had a failure in "test_release_retry".

Diane

#1030096#95
Date:
2023-02-09 07:44:02 UTC
From:
To:
I don't think I've seen that particular one before, though like several
others it's a warning being treated as an error because dask.distributed
now does that (in setup.cfg).

debci doesn't appear to have run yet.  (If it does and fails, note that
there's a retry button next to failure reports.  Given how tight we are
on time (we need to be in testing by the 12th), I'd rather not re-upload
(restarting the migration clock) if we don't have to.)

Also, we need to close this bug (by email _not_ by uploading).

#1030096#100
Date:
2023-02-09 17:07:21 UTC
From:
To:
Would it make sense to drop those errors back to warnings, and do you
know enough about the setup.cfg language to do it quickly?

It says it failed on 4 tests on ci for amd64 but I could only find the
traceback for test_default_5 with a bunch of OS errors having run out
of file handles.

I went ahead and requested another run for the failed amd64 run and
left the passing arm64 run alone.


Also how did numba 0.56.4-1 get overridden to be back in testing?

Can we get dask.distributed forced back in? It looks like it mostly
works; it feels like we're mostly fighting over it not being robust to
environment-specific issues.

Diane

#1030096#105
Date:
2023-02-09 21:21:55 UTC
From:
To:
Plausibly yes, but I don't actually know. To do it, remove the 'error'
line at setup.cfg:60.

That worked, but armel (test_steal_twice), armhf (something outright
crashing) and s390x (lots) all failed.

The place to ask is debian-release; no comment on the likely result.

#1030096#110
Date:
2023-02-09 21:33:23 UTC
From:
To:
My current frustrated idea is to do what's going on in d/rules and skip
the isinstalled tests.

My local build is running now, and I was probably thinking of pushing a
proposed -3 to salsa in an hour or so

Aren't those all still on -1? I only see amd64 and arm64 having run
2022.12.1+ds.1-2

At https://ci.debian.net/packages/d/dask.distributed/

I'll try to ask.

Diane

#1030096#115
Date:
2023-02-09 21:53:33 UTC
From:
To:
run* the tests (just collected them, which was enough to fail on a
dask/dask.distributed mismatch), because it did -k "not ( $SKIP_TEST )"
when the variable was actually called SKIP_TESTS, causing

ERROR: Wrong expression passed to '-k': not (  ): at column 8: expected not OR left parenthesis OR identifier; got right parenthesis

and apparently-no tests run.

(This was fixed in
https://salsa.debian.org/python-team/packages/dask.distributed/-/commit/24cb367f4608a72d9f770cc619af3520bfdb1990
, apparently without noticing that it had ever existed.)

Which makes this not-a-regression...
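
The variable-name mismatch described above can be reproduced in a few
lines. This is only a sketch: the test names are made up, and the use of
set -u to catch the typo is a suggestion, not something the package is
known to do.

```shell
#!/bin/sh
# Make sure the misspelled name is really unset for this demonstration.
unset SKIP_TEST

# Hypothetical test names, used only for illustration.
SKIP_TESTS="test_a or test_b"

# The misspelled name silently expands to the empty string.
broken="not ( $SKIP_TEST )"
fixed="not ( $SKIP_TESTS )"

echo "broken: -k \"$broken\""
echo "fixed:  -k \"$fixed\""

# With `set -u`, expanding the unset name is a hard error instead. The
# subshell keeps that error from ending this script.
if (set -u; : "$SKIP_TEST") 2>/dev/null; then
    typo_caught="no"
else
    typo_caught="yes"
fi
echo "typo caught by set -u: $typo_caught"
```

The "broken" expression is exactly the `not (  )` that pytest rejected
in the error message quoted above.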

#1030096#120
Date:
2023-02-09 22:05:29 UTC
From:
To:
Upstream's reason for treating warnings as errors is just generic 'find
potential problems' (https://github.com/dask/distributed/issues/6048).

That summary listing is the wrong place to look for that information -
either use tracker.debian.org or click the 'testing' (*not* 'unstable')
links.

All 3 of them have failed repeatedly.  (armel's failure is sometimes
test_single_executable_deprecated instead.)

The armhf crash is a bus error (possibly unaligned memory access?) in
protocol/tests/test_highlevelgraph.py, and the traceback suggests it
*may* be in something other than dask.distributed, though it's also
possible that dask.distributed is copying objects around with the wrong
alignment:

   File "/usr/lib/python3/dist-packages/pandas/core/array_algos/take.py", line 163 in _take_nd_ndarray
   File "/usr/lib/python3/dist-packages/pandas/core/array_algos/take.py", line 117 in take_nd
   File "/usr/lib/python3/dist-packages/pandas/core/internals/blocks.py", line 880 in take_nd
   File "/usr/lib/python3/dist-packages/pandas/core/internals/managers.py", line 752 in <listcomp>
   File "/usr/lib/python3/dist-packages/pandas/core/internals/managers.py", line 751 in reindex_indexer
   File "/usr/lib/python3/dist-packages/pandas/core/internals/managers.py", line 978 in take
   File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 3886 in _take
   File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 3871 in take
   File "/usr/lib/python3/dist-packages/dask/dataframe/backends.py", line 517 in group_split_pandas
   File "/usr/lib/python3/dist-packages/dask/utils.py", line 640 in __call__
   File "/usr/lib/python3/dist-packages/dask/dataframe/shuffle.py", line 941 in shuffle_group
   File "/usr/lib/python3/dist-packages/dask/layers.py", line 47 in __call__
   File "/usr/lib/python3/dist-packages/distributed/worker.py", line 3047 in apply_function_simple
   File "/usr/lib/python3/dist-packages/distributed/worker.py", line 3025 in apply_function
   File "/usr/lib/python3/dist-packages/distributed/_concurrent_futures_thread.py", line 65 in run
   File "/usr/lib/python3/dist-packages/distributed/threadpoolexecutor.py", line 57 in _worker
   File "/usr/lib/python3.11/threading.py", line 975 in run
   File "/usr/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
   File "/usr/lib/python3.11/threading.py", line 995 in _bootstrap

#1030096#125
Date:
2023-02-10 03:14:01 UTC
From:
To:
Some tests passed after I queued (multiple) retries. The
current state looks fine:

https://qa.debian.org/excuses.php?package=dask.distributed

But I am not sure if this counter would be set to 2 days (from 5 days)
or not -- will likely need to ask release team.
In any case it might be a nice idea to hold off any further uploads
until this migrates.

#1030096#132
Date:
2023-02-10 03:43:37 UTC
From:
To:
That explains why the tests suddenly passed when I wasn't looking.

When I'd last looked in the morning most of them were marked failed.

Yeah I need to beg the release team for forgiveness.

Though I made two changes that should dramatically increase the odds of
the tests passing.

One, I told it to skip all the "isinstalled" marked tests, which are all
skipped during build time, and the build seems to run far more
reliably.

I got the idea because it seemed like the vast majority of test
failures are related to the daemons running or failing to shut down.

While talking on IRC about dask.distributed problems I learned of the
flaky autopkgtest restriction, which says the test is expected to fail
intermittently and is not suitable for gating continuous integration.

So I marked the current whole autopkgtest run as flaky.

So CI should also ignore the results of failed test runs now.

When under less time pressure I think a good plan might be to split
the tests internally marked as isinstalled or flaky into a separate
autopkgtest run that's marked flaky and let the CI gate on the more
reliable tests.

Diane

#1030096#137
Date:
2023-02-10 06:41:43 UTC
From:
To:
Hi,

On Fri, Feb 10, 2023 at 08:44:01AM +0530, Nilesh Patra wrote:

Thanks to all three of you for your work.

As far as I observed the migration time is now 5 days (no matter whether
autopkgtest or not).

+1

Kind regards
   Andreas.

#1030096#142
Date:
2023-02-10 07:58:04 UTC
From:
To:
dependencies' autopkgtests haven't all been run yet, and will change to
2 days once they have:
https://release.debian.org/testing/freeze_policy.html

On 10/02/2023 03:43, Diane Trout wrote:
 > Though I made two changes that should dramatically increase the odds of
 > the tests passing.
 >
 > One I told it to skip all the "isinstalled" marked tests, which are all
 > skipped during build time, and the build seems to run far more
 > reliably.
 >
 > I got the idea because it seemed like the vast majority of test
 > failures are related to the daemons running or failing to shut down.

That might be true on amd64, but I don't think it's true of arm*/s390x:
the tests that are failing there do *not* appear to be isinstalled tests.

I suspect the tests wouldn't have worked on those architectures in
2022.02 either, and we didn't notice because the previously mentioned
bug was causing the autopkgtest to not actually run the tests.
(dask.distributed is arch:all, so it's only built once, presumably on
amd64.)

 > While talking on IRC about dask.distributed problems I learned of the
 > flaky autopkgtest restriction which says the test is expected to fail
 > intermittently and is not suitable for gating continuous integration.
 >
 > So I marked the current whole autopkgtest run as flaky.
 >
 > So CI should also ignore the results of failed test runs now.

Having only flaky tests that fail counts as having no tests:
https://sources.debian.org/src/autopkgtest/5.28/doc/README.package-tests.rst/#L230

That presumably means 5 days, which we don't have, i.e. *don't* unless
release team tell you otherwise.

 > When under less time pressure I think a good plan might be to split
 > the tests internally marked as isinstalled or flaky into a separate
 > autopkgtest run that's marked flaky and let the CI gate on the more
 > reliable tests.

As stated above, that's probably the wrong split for this.

#1030096#147
Date:
2023-02-10 08:16:04 UTC
From:
To:
I've not seen the tracker getting reset to 2 days when a certain package doesn't show success on all archs.
Which means if a package passes on 3 archs, and shows the status as 'neutral' or 'not a regression' or 'tests not run on the architecture due to not being in the arch list (in d/t/control)' then I've seen a 5 day migration delay.
And this is the case for distributed (not a regression on 2 archs)

I don't know if things have changed lately, but I doubt it.
In any case, I've sent a message on the release team IRC.
Best,
Nilesh

#1030096#152
Date:
2023-02-10 17:41:51 UTC
From:
To:
Ok yes there could also be issues with their serialization code on
unusual architectures.

It might not be enough to solve all dask.distributed's test problems,
(as you point out for the other architectures) but it does seem like
their test suites include both unit tests that don't depend on the host
networking, and integration tests that are strongly impacted by the
configuration of the test runner.

And there might be value in separating the unit & integration test
types?

#1030096#157
Date:
2023-02-10 22:51:35 UTC
From:
To:

For what it's worth this is our answer from #debian-release

elbrus
detrout: I'll handle dask.distributed

detrout
elbrus, Thank you. sorry about needing to ask for an exception

elbrus
ack, thanks for working on the package; it wasn't pretty that we had to
remove it for the python3.11 transition

#1030096#162
Date:
2023-04-08 07:32:06 UTC
From:
To:
Since this bug is fixed, I am closing this.

Best,
Nilesh