- Package:
- src:libthrust
- Source:
- libthrust
- Submitter:
- Paul Gevers
- Date:
- 2022-07-10 02:15:02 UTC
- Severity:
- normal
Dear maintainer,

I was checking what was happening on our infrastructure as I was seeing degraded performance on several architectures, including several hosts running out of disk space and even one VM that hung. I don't have solid evidence that it's all caused by libthrust, but the results on amd64, arm64 and ppc64el don't inspire confidence that this package is entirely "innocent". Please consider making your test suite much less intense.

Looking at the stats [1] of our big amd64 worker, it really looks like the test was stressing it so much that we were building up a backlog of tests, which rarely happens on amd64. Your test on amd64 [2] took 12 hours to come to a "neutral" conclusion because 4 of the tests timed out (but were marked flaky) and all others failed (while marked flaky) or passed while marked superficial. That's a poor result for such an extreme test.

On arm64 and ppc64el your tests seem to tmpfail. I am *suspecting* that is because they run out of disk space. All our arm64 and ppc64el workers have 40 GB of disk and run two debci instances in parallel.

For now, I have put libthrust on our rejectlist for those three architectures and I just flushed the amd64 queue because there were several libthrust tests scheduled and we lack the facilities to remove individual tests from the queue.

Thanks for using our facilities, but unfortunately we can't support the tests in their current form.

Paul

[1] https://ci.debian.net/munin/ci-worker13/ci-worker13/index.html
[2] https://ci.debian.net/data/autopkgtest/testing/amd64/libt/libthrust/22073748/log.gz
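The "neutral" outcome described in the report follows from how flaky and superficial tests are counted. Below is a minimal sketch of that accounting. It is a simplified model, not autopkgtest's or debci's actual code; the function name `overall_verdict` and the tuple layout are made up for illustration, and a timeout is simply treated as a failure.

```python
def overall_verdict(results):
    """Combine per-test results into one verdict (simplified model).

    results: iterable of (outcome, flaky, superficial) tuples, where
    outcome is "PASS", "FAIL" or "SKIP" (a timeout counts as "FAIL" here).
    """
    failed = False
    real_pass = False
    for outcome, flaky, superficial in results:
        if outcome == "FAIL" and not flaky:
            # Only failures of tests NOT marked flaky count against the package.
            failed = True
        elif outcome == "PASS" and not superficial:
            # Only passes of tests NOT marked superficial count in its favour.
            real_pass = True
    if failed:
        return "fail"
    if real_pass:
        return "pass"
    # Nothing counted either way: flaky failures, superficial passes, skips.
    return "neutral"


# The situation from the report: timeouts and failures on flaky tests,
# passes only on superficial tests.
run = [
    ("FAIL", True, False),   # timed out, marked flaky
    ("FAIL", True, False),   # failed, marked flaky
    ("PASS", False, True),   # passed, marked superficial
]
print(overall_verdict(run))
```

Under this model, a 12 hour run in which every test is either flaky or superficial cannot yield anything other than "neutral", which is the point made in the report.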
We believe that the bug you reported is fixed in the latest version of libthrust, which is due to be installed in the Debian FTP archive.

A summary of the changes between this version and the previous one is attached.

Thank you for reporting the bug, which will now be closed. If you have further comments please address them to 1011646@bugs.debian.org, and the maintainer will reopen the bug report if appropriate.

Debian distribution maintenance software
pp.
Andreas Beckmann <anbe@debian.org> (supplier of updated libthrust package)

(This message was generated automatically at their request; if you believe that there is a problem with it please contact the archive administrators by mailing ftpmaster@ftp-master.debian.org)

Format: 1.8
Date: Fri, 27 May 2022 06:20:52 +0200
Source: libthrust
Architecture: source
Version: 1.15.0-4
Distribution: unstable
Urgency: medium
Maintainer: Debian NVIDIA Maintainers <pkg-nvidia-devel@lists.alioth.debian.org>
Changed-By: Andreas Beckmann <anbe@debian.org>
Closes: 1011646
Changes:
 libthrust (1.15.0-4) unstable; urgency=medium
 .
   * Reduce amount of autopkgtest tests. (Closes: #1011646)
Checksums-Sha1:
 583f4c8b00d7ed98c2e26723cc30bfe4be8bd96a 2116 libthrust_1.15.0-4.dsc
 a44ad7257af07551789dd28576fb21c107bec7b3 7572 libthrust_1.15.0-4.debian.tar.xz
 d5df591f934cf9bbdf947404a626950609090d59 6284 libthrust_1.15.0-4_source.buildinfo
Checksums-Sha256:
 82cd1962124d04ec04790c7890712afb77496650325f99914c3cb25a1df7a6a1 2116 libthrust_1.15.0-4.dsc
 ec460ee5193357d12514f1d18ed8127ce4a92375886807479221d8566f5fcc20 7572 libthrust_1.15.0-4.debian.tar.xz
 9c1f42557bec5561f32d9121cf3b20e187821b99106db740d4f64d6fe485acb5 6284 libthrust_1.15.0-4_source.buildinfo
Files:
 c1df10cf6d6e2daf47e52c458c296bda 2116 libdevel optional libthrust_1.15.0-4.dsc
 adaabd0ac20b5117fc4f88122c0ca7b4 7572 libdevel optional libthrust_1.15.0-4.debian.tar.xz
 5d8732e4bb9262ecb852a66a86338e95 6284 libdevel optional libthrust_1.15.0-4_source.buildinfo

-----BEGIN PGP SIGNATURE-----

iQJEBAEBCAAuFiEE6/MKMKjZxjvaRMaUX7M/k1np7QgFAmKQVJEQHGFuYmVAZGVi
aWFuLm9yZwAKCRBfsz+TWentCLFBEACiXKB/fSjUZrdmlERRKdK3E1/lHk2mt45w
brjnt9Ld99jzWte0Y0hWHqmSZ7vG3aheC6hxAQtUvsEjZzbXX8vhPr5xo6RVPKgC
jLpg9WVtIirbRDoZhY4t43FIrTvydR8Hgcr1I8zZCY/F9HUnlakjnvjdFZW+Mm5P
rjGnhNOX0pIYcArRN68rg7FA4Agsx9kfZzN9pnbLG2I2yM1cytW5o5lvlvaml11v
XmjB0NMApIyT/MgfToT1CPEeYUvg9fMOqpw6rkTXiTxDc9wKYfuKxjc6883e+Y9K
16B4sGMzVp/x/e0Qfjb6s8TjchLQuRIpjQ/lK/ucOnAmkyj/wOuqmmQdHqJfm0Q6
ADdD6TZ62284t52ucHs1TP1t4TyokIwGhDIOpHxEyNd29sBiPIcixsiod2mV+SaQ
4cKseVW7Ckm7dWoPivJtSvuKFBXSO86DfhALfvYbgurYt5ocWNCinmEy9K92tpwE
JfrabZjYO3TJBagDTVBpaQPJMNkqxdj64HQPAHM8osnOny2dic+Od7FX3NLzQxNl
WqyFHxuIjCTd4iC18TGEdKAvfRU9yqz7TflIOKvpD8ZCINKFseRy0ybEQ/MrZwMo
eUS+izFH6+r40xe6kJdTBGwfiISBm+wKAYgxZZo5EGyGkuR0eptUt/o5HwBr1IB8
IiXqNczReQ==
=yfXy
-----END PGP SIGNATURE-----
Upstream comes with a rather exhaustive test matrix ... I'm now running a smaller subset of it and in a more fine grained way to make it easier to decide which parts to skip as well.

The OpenMP parts were even slower than on my machine ... so far I hadn't seen individual tests time out in ctest. Synchronizing between 56 threads seems to be a hard job ;-)

With the more fine grained testing I've tried to free disk space at the end of the smaller test chunks, not sure if that was successful.

probably starting with amd64, to see whether I managed to get it down to an acceptable size.

Hmm, a first test for -4 on amd64 has already finished (so the blacklisting did not work?), mostly telling me 'SKIP test name may not contain / character' (that should be checked by lintian). Preparing -5 now ...

The tests are temporarily all on flaky to avoid introducing regressions while testing the tests ;-) The tests are all superficial, since we can't run (but only compile) the most relevant part of the testsuite: the cuda tests.

Andreas

PS: src:cub needs to be trimmed down as well, that has done some 12 h runs on ci-worker13, too ... not touching that before we have resolved src:libthrust
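The 'test name may not contain / character' skips could be caught before upload with a small local check. The following is only a sketch: it mirrors the single rule quoted in the message above and nothing else, the helper name `names_with_slash` is invented, and the example names are hypothetical (the actual -4 test names are not shown in this thread).

```python
def names_with_slash(test_names):
    """Return the test names that contain a '/' character.

    Mirrors only the rule quoted in the SKIP message; any other
    naming constraints are deliberately not modelled here.
    """
    return [name for name in test_names if "/" in name]


# Hypothetical names for illustration only.
candidates = [
    "cmake_find_package_Thrust",
    "run_testsuite/CPP/C++17",       # would be skipped
    "run_testsuite_CPP_C++17",
]
for bad in names_with_slash(candidates):
    print(f"invalid test name: {bad}")
```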
Hi I triggered that. Paul
OK. Could you give -5 a try? That should have valid test names ... Thanks. BTW, nvidia-cuda-toolkit currently seems stuck: autopkgtest for nvidia-cuda-toolkit/blocked-on-ci-infra: ppc64el: Regression Is this related to the libthrust blocks? Andreas
Hi

That has already happened automatically, as the block has been removed.

No, that's because it has its own block [1]:

nvidia-cuda-toolkit  All  ppc64el  *  test suite fails to start properly (disk space in unstable)

Sorry, I forgot to file a bug about that.

[1] https://ci.debian.net/status/reject_list/

Paul
arm64 still times out on the cuda parts ... :-( waiting for the ppc64el run ...

Can you hint against that? That can't even be prevented with an Architecture setting.

I've now copied the autopkgtest from src:nvidia-cuda-toolkit to src:pycuda - all we need is an installation of nvidia-cuda-toolkit, and that is < 5 GB. If that works, I'll drop the autopkgtest from src:nvidia-cuda-toolkit again. (pycuda didn't have any tests yet (its testsuite would want to run cuda code on a gpu), and the new test is fast enough to be run on salsa-ci)

Andreas

PS: I've updated src:cub and reduced the autopkgtest by 50%. It still fails early on arm64/ppc64el (due to some char/uchar mess), I'll take that upstream once I'm running on the latest upstream release.
Hi Andreas, The ppc64el one finished while I was checking. It also timed out and took in total 8 hours. Yes, but only because of the explanation you gave below (too big source). Hmm, I think that could be something that autopkgtest could check before starting, apt knows about this, doesn't it? Ack, hence not blaming nvidia-cuda-toolkit anymore. Ack. Paul
Hi Paul, with the latest tuned autopkgtest versions of src:libthrust and src:cub having migrated to testing, could you check whether these packages now only put an "acceptable" load on the CI infrastructure? (the src:nvidia-cuda-toolkit autopkgtest will be dropped with the next upload, src:pycuda should have equivalent tests now while requiring no extraordinary space to unpack :-) Andreas
Hi Andreas,

that sense...

Both packages still only result in neutral (with flaky skips):

[cub]
cmake_find_package_CUB                      PASS (superficial)
compile_testsuite_cuda-g++_C++17            PASS (superficial)
compile_testsuite_cuda-g++_C++14            PASS (superficial)
compile_testsuite_g++-11_C++17              FLAKY non-zero exit status 2
compile_testsuite_g++-10_C++17              SKIP exit status 77 and marked as skippable

[libthrust]
cmake_find_package_Thrust                   PASS (superficial)
run_testsuite_CPP_C++17_g++-12              PASS (superficial)
run_testsuite_CPP_C++17_g++-11              FLAKY non-zero exit status 8
run_testsuite_CPP_C++17_g++-10              PASS (superficial)
run_testsuite_CPP_C++14                     FLAKY non-zero exit status 8
run_testsuite_TBB_C++17                     PASS (superficial)
compile_testsuite_CPP_CUDA_C++17_cuda-g++   PASS (superficial)
compile_testsuite_TBB_CUDA_C++17_cuda-g++   PASS (superficial)
compile_testsuite_CPP_CUDA_C++17_g++-11     FLAKY non-zero exit status 2
compile_testsuite_TBB_CUDA_C++17_g++-11     FLAKY non-zero exit status 2
compile_testsuite_CPP_CUDA_C++17_g++-10     SKIP exit status 77 and marked as skippable
compile_testsuite_TBB_CUDA_C++17_g++-10     SKIP exit status 77 and marked as skippable

That's a bit disappointing for a test that takes around 5 to 7 hours (but better than before). Alas.

Please ping me when that future upload migrates, then I can drop my entry in the reject list.

Paul
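A summary like the one above can be tallied per status with a few lines of code. This is a rough sketch that assumes each summary line has the shape "<test name> <STATUS> <details>", as in the cub results quoted in this message; the helper name `tally` is made up, and other log formats are not handled.

```python
from collections import Counter

STATUSES = {"PASS", "FAIL", "FLAKY", "SKIP"}


def tally(summary):
    """Count summary lines per status; lines without a known status are ignored."""
    counts = Counter()
    for line in summary.splitlines():
        parts = line.split()
        # Second whitespace-separated field is expected to be the status.
        if len(parts) >= 2 and parts[1] in STATUSES:
            counts[parts[1]] += 1
    return counts


# The cub results as quoted in the message.
CUB_SUMMARY = """\
cmake_find_package_CUB PASS (superficial)
compile_testsuite_cuda-g++_C++17 PASS (superficial)
compile_testsuite_cuda-g++_C++14 PASS (superficial)
compile_testsuite_g++-11_C++17 FLAKY non-zero exit status 2
compile_testsuite_g++-10_C++17 SKIP exit status 77 and marked as skippable
"""

print(dict(tally(CUB_SUMMARY)))
```

For the cub lines this counts three passes (all superficial), one flaky failure and one skip, so no test contributes a result that counts toward anything but "neutral".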
After yesterday's point release all nvidia-cuda-toolkit autopkgtests should be gone from the archive (buster-backports to experimental).

Andreas