- Package: dask.distributed
- Source: dask.distributed
- Submitter: Peter Green
- Date: 2023-04-08 07:45:05 UTC
- Severity: normal
The autopkgtest for dask.distributed is failing: https://ci.debian.net/data/autopkgtest/testing/amd64/d/dask.distributed/30859887/log.gz
dask.distributed is currently not in testing, having been removed with a reason of "allow pandas and dask to migrate". As a result, the following packages are in violation of "packages must be buildable within the same release": tiledb-py, ipyparallel, satpy, spyder-kernels.
I have an attempt to fix this in Salsa now (in my fork), but it hasn't had time to run the tests yet, so I don't know whether it works. Note the above mention that if we lose dask.distributed then we lose Spyder, which makes this a bigger issue than I initially thought.
On Sat, Feb 04, 2023 at 06:20:32PM +0000, Rebecca N. Palmer wrote:
Any reason to not push to master directly?
We should definitely keep dask.distributed.
Kind regards
Andreas.
That removed most of the test failures, but there seem to be a few apparently random ones left (2 runs both failed, but with different errors). I don't do that with packages that aren't mine. Also see above.
On Sat, Feb 04, 2023 at 10:43:59PM +0000, Rebecca N. Palmer wrote:
Per our team policy that's fine and simply easier for other team members.
Thanks a lot for your work
Andreas.
I currently have this in a state where it sometimes succeeds and sometimes doesn't: https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/tree/fix1030096
Tests I've seen fail multiple times (and don't have a fix for):
- test_balance_expensive_tasks[enough work to steal] https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3902376 (Seems to be the most common problem. Using @pytest.mark.flaky to try 3 times doesn't seem to have helped, suggesting that if it fails once it keeps failing in that run. Applying part of upstream pull 7253 seemed to make things worse, but I haven't tried applying the whole thing.)
- test_popen_timeout https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3902745
Tests I've seen fail once:
- test_stress_scatter_death https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3902040
- test_tcp_many_listeners[asyncio] https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3896327
(Background: the pandas + dask transition broke dask.distributed and it was hence removed from testing; I didn't notice at the time that if we don't get it back in we lose Spyder.) https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/jobs/3903956 (We don't have to pass build-i386 (as this is an arch:all package) or reprotest, but if these are effectively-random failures, they might also be able to occur in build or autopkgtest.) I'm probably the wrong person to be working on this - I don't know enough about this package to say whether ignoring this kind of intermittent failure (as my 'flaky' marks do) is appropriate, or to have much idea how to actually fix it. We could also try upgrading dask + dask.distributed to 2023.1, but that's a risky move at this point.
Hi Rebecca,
On Mon, Feb 06, 2023 at 07:59:17AM +0000, Rebecca N. Palmer wrote:
As far as I know, Diane has put quite some effort into dask, and I understood that dask and dask.distributed are closely interconnected.
In several cases we decided to ignore some tests. While I like the idea of marking a test flaky instead of ignoring it completely, given your experience I think ignoring these tests is a valid way to proceed with this package for the moment.
I agree that it is risky. We might discuss this with upstream and possibly use an experimental branch to verify how it works. It might be that later versions work better with later Pandas / Python 3.11. However, the window of opportunity to get something in before the freeze is closing, and I'm afraid we do not have time for experiments.
Kind regards
Andreas.
Hi now. My fragments of time were spent fighting with numba, and I didn't have the energy to be thinking about dask.distributed. Numba should be in a better place right now. So I can set my build machine to trying to build it and seeing where we are with it right now. The most important thing about dask / dask.distributed is they really should be at about the same upstream version. I'm not 100% sure how to mark that in the d/control file. Also upstream might have some ability to do minor releases independently. But if we do a new upstream release of dask, it needs to be paired with a new upstream release of dask.distributed. And in my experience dask.distributed is the one that's harder to get to work right. Diane
I agree that xfailing the tests *may* be a reasonable solution. I'm only saying that it should be done by someone with more idea than me of whether these particular tests are important, because blindly xfailing everything that fails is effectively not having tests. If we do choose that approach, at least test_balance_expensive_tasks needs to be an outright xfail/skip, not just a flaky, because when it fails it fails repeatedly.
I knew that, and was planning on 2022.12.1 of both when I decided to go ahead with pandas. What went wrong was that I only tested a build, not an autopkgtest, and thought the failing tests were dask.distributed's (known) inability to run all its tests in a buildd environment. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1027254#21
Depends: python3-dask (>= 2022.12.1~), python3-dask (<< 2022.12.2~)
but I haven't tested that.
So my efforts at debugging are made harder by it working for me. I'm using a9771f68a28dfc65cae3ac6acf70451c264f3227 from Debian HEAD.
= 2745 passed, 93 skipped, 216 deselected, 18 xfailed, 8 xpassed in 1992.20s (0:33:12) =
I looked at the last log on ci.debian.org for dask.distributed https://ci.debian.net/data/autopkgtest/unstable/amd64/d/dask.distributed/31090863/log.gz and it looks like several of those errors are networking related.
CI with the previously released 2022.12.1+ds.1-1 version is failing with these tests:
- test_defaults
- test_hostport
- test_file
- test_default_client_server_ipv6[tornado]
- test_default_client_server_ipv6[asyncio]
- test_tcp_client_server_ipv6[tornado]
- test_tcp_client_server_ipv6[asyncio]
- test_only_local_access
- test_remote_access
- test_adapt_then_manual
- test_local_tls[True]
- test_local_tls[False]
- test_run_spec
- test_balance_expensive_tasks[enough work to steal]
I think several of those may depend on a proper network. The host I'm using actually has both IPv4 and IPv6 working. I'm using sbuild automatically running autopkgtests on an oldish 2x4 8 core Xeon server with ~24 GB of RAM. What's your test environment like?
I don't think HEAD is hugely different from what was released in -1. The diff looks like Andreas adjusted the dask dependency version, configured a salsa CI run, and added some upstream metadata files. He had problems with a salsa build failure, but that was with i386; I'm currently setting up i386 to see if I can replicate the salsa failure.
Diane
Salsa CI. That sounds like you're not looking at my branch at all. As previously stated, that's https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/tree/fix1030096 (It's in a fork because I can't push to debian-python.) See earlier in this bug for which test failures still remain.
I merged in most of Rebecca's changes via git cherry-pick, though I slightly edited the changelog (making most entries a bullet point instead of subheadings of the one line I left out). I think I got the code to detect if IPv6 is available to work correctly, so I could set the DISABLE_IPV6 environment variable that dask.distributed supports. I went with skipping the 32-bit tests instead of xfailing them because I don't think they can work as written: I really think they're making really large memory requests that can't ever succeed on 32-bit. You did a lot of work on trying to get the flaky tests to work more reliably, and all that's included. Well, except for the one that applies a patch and then reverts it. All the merges are pushed to salsa debian/master. They also passed on my local build host running i386. Diane
https://salsa.debian.org/python-team/packages/dask.distributed/-/blob/debian/master/debian/tests/run-tests#L11
This doesn't work: because run-tests is set -e, failing this check immediately ends the autopkgtest with failure (without even running the main tests): https://salsa.debian.org/python-team/packages/dask.distributed/-/jobs/3918057
I think you instead need the command inside the if-test, something like
    if ip -6 addr | grep global ; then
    if ping6 -c 4 2001:4860:4860::8888 ; then
    echo "Working ipv6 connectivity"
but I haven't tested that.
Agreed - I'm possibly in the habit of using xfail rather than skip because I'm often xfailing the kind of things that might get fixed later.
https://salsa.debian.org/rnpalmer-guest/dask.distributed/-/commit/b82894aa5247cd11607b177d60975387c2fd796a (marking a few more tests as flaky) isn't included. However, as previously mentioned, I don't claim to know whether we actually should be marking those particular tests as flaky, so it's fine to omit that one if you think we shouldn't.
Also as previously mentioned, test_balance_expensive_tasks[enough work to steal] seems to fail repeatedly when it fails, so if we want to ignore that (again, I don't claim to know whether that's a good idea), it needs an xfail/skip, not just a flaky.
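The set -e point above can be illustrated with a minimal sketch. It is not the actual run-tests script: ipv6_status is a hypothetical helper, the probes are the stand-ins true and false so that it runs without a network, and the value 1 for DISABLE_IPV6 is an assumption (the thread only names the variable).

```shell
#!/bin/sh
set -e

# Run the probe command given as arguments and report the result.
# The probe is the condition of an `if`, so a non-zero exit status
# is handled here instead of aborting the script under `set -e`.
ipv6_status() {
    if "$@" >/dev/null 2>&1; then
        echo "working"
    else
        echo "unavailable"
    fi
}

# Stand-in probes so the sketch runs anywhere. In a real test script the
# call might look like: ipv6_status ping6 -c 4 2001:4860:4860::8888
ipv6_status true
ipv6_status false

if [ "$(ipv6_status false)" = "unavailable" ]; then
    # Assumed value; the thread does not say what the variable should hold.
    export DISABLE_IPV6=1
fi

# Reaching this line shows that the failed probe did not end the script.
echo "script continues, DISABLE_IPV6=${DISABLE_IPV6:-unset}"
```

Had the failing probe been run as a plain statement rather than inside the if, set -e would have ended the script at that point, which matches the behaviour seen in the salsa job.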
Hello, So I discovered I'd forgotten to do git cherry-pick --continue and so missed the last patch from Rebecca (b82894aa). Thank you so much for working out a better strategy for the flaky tests. I also found a computer I could log into that has no working IPv6 support, so I could more quickly debug the IPv6 detection code, and finally got a version of it that works correctly. This version just uses ping but turns off set -e for the test. I just got back a pass from salsa. So does anyone want to make any more changes? Or should we release this version? Diane
The *maybe* remaining issue is that test_balance_expensive_tasks[enough work to steal] seems to fail repeatedly when it fails, so if we want to ignore that (I don't claim to know whether that's a good idea), it needs an xfail/skip not just a flaky. I haven't seen that failure in your runs, but I don't know whether that means you've fixed it or just that you were lucky. Mostly, please upload *something* today, because we won't know for sure whether it passes on a real buildd/debci until we try that, and if it doesn't then the sooner we find out the better.
On Wed, Feb 08, 2023 at 11:11:49PM +0000, Rebecca N. Palmer wrote:
+1
Thanks to you both for all your work
Andreas.
I uploaded it earlier today. dask.distributed is past buildd, but I haven't seen dask.distributed on ci.debian.org yet. Also, there are still some flaky tests, as the rebuild triggered by my just committing the changelog release had a failure in "test_release_retry". Diane
I don't think I've seen that particular one before, though like several others it's a warning being treated as an error because dask.distributed now does that (in setup.cfg). debci doesn't appear to have run yet. (If it does and fails, note that there's a retry button next to failure reports. Given how tight we are on time (we need to be in testing by the 12th), I'd rather not re-upload (restarting the migration clock) if we don't have to.) Also, we need to close this bug (by email _not_ by uploading).
Would it make sense to drop those errors back to warnings, and do you know enough about the setup.cfg language to do it quickly? It says it failed on 4 tests on ci for amd64 but I could only find the traceback for test_default_5 with a bunch of OS errors having run out of file handles. I went ahead and requested another run for the failed amd64 run and left the passing arm64 run alone. Also how did numba 0.56.4-1 get overridden to be back in testing? Can we get dask.distributed forced back in? It looks like it mostly works, it feels like we're mostly fighting over it not being robust to environment specific issues. Diane
Plausibly yes, but I don't actually know. To do so, remove the 'error' line at setup.cfg:60.
That worked, but armel (test_steal_twice), armhf (something outright crashing) and s390x (lots) all failed.
The place to ask is debian-release; no comment on the likely result.
My current frustrated idea is to do what's going on in d/rules and skip the isinstalled tests. My local build is running now, and I was thinking of pushing a proposed -3 to salsa in an hour or so.
Aren't those all still on -1? I only see amd64 and arm64 having run 2022.12.1+ds.1-2 at https://ci.debian.net/packages/d/dask.distributed/
I'll try to ask.
Diane
run* the tests (just collected them, which was enough to fail on a dask/dask.distributed mismatch), because it did -k "not ( $SKIP_TEST )" when the variable was actually called SKIP_TESTS, causing
    ERROR: Wrong expression passed to '-k': not ( ): at column 8: expected not OR left parenthesis OR identifier; got right parenthesis
and apparently no tests run. (This was fixed in https://salsa.debian.org/python-team/packages/dask.distributed/-/commit/24cb367f4608a72d9f770cc619af3520bfdb1990 , apparently without noticing that it had ever existed.) Which makes this not-a-regression...
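As a hedged illustration of how the SKIP_TEST / SKIP_TESTS mix-up described above could be caught early, the sketch below uses set -u and the ${var:?} expansion. The build_k_expr helper and the test names are hypothetical and are not taken from the package's run-tests script.

```shell
#!/bin/sh
set -eu

# Hypothetical skip list; the real script builds its own.
SKIP_TESTS="test_a or test_b"

# Build the pytest -k expression. ${1:?...} aborts with a message when the
# argument is unset or empty, so "not ( )" can never be produced silently.
build_k_expr() {
    printf 'not ( %s )' "${1:?skip list is empty}"
}

K_EXPR=$(build_k_expr "$SKIP_TESTS")
echo "pytest -k \"$K_EXPR\""

# With the misspelt name, set -u makes the expansion itself an error.
# The subshell keeps that error from ending this demonstration.
if ( : "$SKIP_TEST" ) 2>/dev/null; then
    echo "typo went unnoticed"
else
    echo "typo caught: SKIP_TEST is unset"
fi
```

Either guard turns the empty expression into an immediate, visible failure of the test script rather than a pytest usage error that looks like an ordinary test result.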
Upstream's reason for treating warnings as errors is just generic 'find potential problems' (https://github.com/dask/distributed/issues/6048).
That summary listing is the wrong place to look for that information - either use tracker.debian.org or click the 'testing' (*not* 'unstable') links. All 3 of them have failed repeatedly. (armel's failure is sometimes test_single_executable_deprecated instead.)
The armhf crash is a bus error (possibly unaligned memory access?) in protocol/tests/test_highlevelgraph.py, and the traceback suggests it *may* be in something other than dask.distributed, though it's also possible that dask.distributed is copying objects around with the wrong alignment:
File "/usr/lib/python3/dist-packages/pandas/core/array_algos/take.py", line 163 in _take_nd_ndarray
File "/usr/lib/python3/dist-packages/pandas/core/array_algos/take.py", line 117 in take_nd
File "/usr/lib/python3/dist-packages/pandas/core/internals/blocks.py", line 880 in take_nd
File "/usr/lib/python3/dist-packages/pandas/core/internals/managers.py", line 752 in <listcomp>
File "/usr/lib/python3/dist-packages/pandas/core/internals/managers.py", line 751 in reindex_indexer
File "/usr/lib/python3/dist-packages/pandas/core/internals/managers.py", line 978 in take
File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 3886 in _take
File "/usr/lib/python3/dist-packages/pandas/core/generic.py", line 3871 in take
File "/usr/lib/python3/dist-packages/dask/dataframe/backends.py", line 517 in group_split_pandas
File "/usr/lib/python3/dist-packages/dask/utils.py", line 640 in __call__
File "/usr/lib/python3/dist-packages/dask/dataframe/shuffle.py", line 941 in shuffle_group
File "/usr/lib/python3/dist-packages/dask/layers.py", line 47 in __call__
File "/usr/lib/python3/dist-packages/distributed/worker.py", line 3047 in apply_function_simple
File "/usr/lib/python3/dist-packages/distributed/worker.py", line 3025 in apply_function
File "/usr/lib/python3/dist-packages/distributed/_concurrent_futures_thread.py", line 65 in run
File "/usr/lib/python3/dist-packages/distributed/threadpoolexecutor.py", line 57 in _worker
File "/usr/lib/python3.11/threading.py", line 975 in run
File "/usr/lib/python3.11/threading.py", line 1038 in _bootstrap_inner
File "/usr/lib/python3.11/threading.py", line 995 in _bootstrap
Some tests passed after I scheduled (multiple) retries. The current state looks fine: https://qa.debian.org/excuses.php?package=dask.distributed But I am not sure whether the migration counter will be set to 2 days (from 5 days) or not -- we will likely need to ask the release team. In any case, it might be a good idea to hold off any further uploads until this migrates.
That explains why the tests suddenly passed when I wasn't looking. When I'd last looked in the morning most of them were marked failed. Yeah, I need to beg the release team for forgiveness.
Though I made two changes that should dramatically increase the odds of the tests passing.
One, I told it to skip all the "isinstalled" marked tests, which are all skipped during build time, and the build seems to run far more reliably. I got the idea because it seemed like the vast majority of test failures are related to the daemons running or failing to shut down.
While talking on IRC about dask.distributed problems I learned of the flaky autopkgtest restriction, which says the test is expected to fail intermittently and is not suitable for gating continuous integration. So I marked the current whole autopkgtest run as flaky. So CI should also ignore the results of failed test runs now.
When under less time pressure, I think a good plan might be to split the tests internally marked as isinstalled or flaky into a separate autopkgtest run that's marked flaky and let the CI gate on the more reliable tests.
Diane
Hi,
On Fri, Feb 10, 2023 at 08:44:01AM +0530, Nilesh Patra wrote:
Thanks to all three of you for your work.
As far as I observed, the migration time is now 5 days (no matter whether autopkgtest or not).
+1
Kind regards
Andreas.
dependencies' autopkgtests haven't all been run yet, and will change to 2 days once they have: https://release.debian.org/testing/freeze_policy.html
On 10/02/2023 03:43, Diane Trout wrote:
> Though I made two changes that should dramatically increase the odds of
> the tests passing.
>
> One I told it to skip all the "isinstalled" marked tests, which are all
> skipped during build time, and the build seems to run far more
> reliably.
>
> I got the idea because it seemed like the vast majority of test
> failures are related to the daemons running or failing to shut down.
That might be true on amd64, but I don't think it's true of arm*/s390x: the tests that are failing there do *not* appear to be isinstalled tests. I suspect the tests wouldn't have worked on those architectures in 2022.02 either, and we didn't notice because the previously mentioned bug was causing the autopkgtest to not actually run the tests. (dask.distributed is arch:all, so it's only built once, presumably on amd64.)
> While talking on IRC about dask.distributed problems I learned of the
> flaky autopkgtest restriction which says the test is expected to fail
> intermittently and is not suitable for gating continuous integration.
>
> So I marked the current whole autopkgtest run as flaky.
>
> So CI should also ignore the results of failed test runs now.
Having only flaky tests that fail counts as having no tests: https://sources.debian.org/src/autopkgtest/5.28/doc/README.package-tests.rst/#L230 That presumably means 5 days, which we don't have, i.e. *don't* unless release team tell you otherwise.
> When under less time pressure I think a good plan might be to split
> the tests internally marked as isinstalled or flaky into a separate
> autopkgtest run that's marked flaky and let the CI gate on the more
> reliable tests.
As stated above, that's probably the wrong split for this.
I've not seen the tracker getting reset to 2 days when a certain package doesn't show success on all archs. Which means that if a package passes on 3 archs, and shows the status as 'neutral' or 'not a regression' or 'tests not run on the architecture due to not being in the arch list (in d/t/control)', then I've seen a 5 day migration delay. And this is the case for distributed (not a regression on 2 archs). I don't know if things have changed lately, but I doubt it. In any case, I've sent a message on the release team IRC. Best, Nilesh
OK, yes, there could also be issues with their serialization code on unusual architectures. It might not be enough to solve all of dask.distributed's test problems (as you point out for the other architectures), but it does seem like their test suites include both unit tests that don't depend on the host networking and integration tests that are strongly affected by the configuration of the test runner. And there might be value in separating the unit and integration test types?
For what it's worth, this is our answer from #debian-release:
<elbrus> detrout: I'll handle dask.distributed
<detrout> elbrus, Thank you. sorry about needing to ask for an exception
<elbrus> ack, thanks for working on the package; it wasn't pretty that we had to remove it for the python3.11 transition
Since this bug is fixed, I am closing this. Best, Nilesh