joblib is failing test_hash_numpy_performance on the riscv64
architecture:
https://ci.debian.net/data/autopkgtest/unstable/riscv64/j/joblib/71983388/log.gz
https://ci.debian.net/data/autopkgtest/testing/riscv64/j/joblib/71965180/log.gz
The error message is:
3746s =================================== FAILURES ===================================
3746s _________________________ test_hash_numpy_performance __________________________
3746s
3746s @with_numpy
3746s @skipif(
3746s sys.platform == "win32",
3746s reason="This test is not stable under windows for some reason",
3746s )
3746s def test_hash_numpy_performance():
3746s """Check the performance of hashing numpy arrays:
3746s
3746s In [22]: a = np.random.random(1000000)
3746s
3746s In [23]: %timeit hashlib.md5(a).hexdigest()
3746s 100 loops, best of 3: 20.7 ms per loop
3746s
3746s In [24]: %timeit hashlib.md5(pickle.dumps(a, protocol=2)).hexdigest()
3746s 1 loops, best of 3: 73.1 ms per loop
3746s
3746s In [25]: %timeit hashlib.md5(cPickle.dumps(a, protocol=2)).hexdigest()
3746s 10 loops, best of 3: 53.9 ms per loop
3746s
3746s In [26]: %timeit hash(a)
3746s 100 loops, best of 3: 20.8 ms per loop
3746s """
3746s rnd = np.random.RandomState(0)
3746s a = rnd.random_sample(1000000)
3746s
3746s def md5_hash(x):
3746s return hashlib.md5(memoryview(x)).hexdigest()
3746s
3746s relative_diff = relative_time(md5_hash, hash, a)
3746s assert relative_diff < 0.3
3746s
3746s # Check that hashing an tuple of 3 arrays takes approximately
3746s # 3 times as much as hashing one array
3746s time_hashlib = 3 * time_func(md5_hash, a)
3746s time_hash = time_func(hash, (a, a, a))
3746s relative_diff = 0.5 * (abs(time_hash - time_hashlib) / (time_hash + time_hashlib))
3746s > assert relative_diff < 0.3
3746s E assert 0.3401559884459161 < 0.3
3746s
3746s /usr/lib/python3/dist-packages/joblib/test/test_hashing.py:238: AssertionError
(The test logs linked above also show some test_parallel errors. I
think that is a separate issue.)
The failure in test_hash_numpy_performance simply misses the timing
criterion by a small margin: relative_diff came out at about 0.340
against a limit of 0.3. riscv64 is evidently running a little
slower than other architectures.
Would it be appropriate to raise the upper limit for the relative_diff
assertion from 0.3 to, say, 0.35 or 0.4?
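To put the candidate limits in perspective, here is a rough sketch that inverts the expression used in the failing assertion, 0.5 * |t1 - t2| / (t1 + t2), to find the ratio between the two timings that each limit tolerates. The helper names relative_diff and ratio_at_limit are made up for this illustration and are not part of joblib; only the formula itself is taken from the test shown in the log.

```python
def relative_diff(t_a: float, t_b: float) -> float:
    """Same expression as in the failing assertion (formula copied from the log)."""
    return 0.5 * abs(t_a - t_b) / (t_a + t_b)


def ratio_at_limit(limit: float) -> float:
    """Hypothetical helper: slower/faster timing ratio at which
    relative_diff equals `limit`.

    From d = 0.5 * (r - 1) / (r + 1) it follows that r = (1 + 2d) / (1 - 2d).
    The expression can never reach 0.5, so the limit must stay below that.
    """
    if not 0.0 <= limit < 0.5:
        raise ValueError("limit must be in [0, 0.5)")
    return (1 + 2 * limit) / (1 - 2 * limit)


if __name__ == "__main__":
    # Current limit, the two proposed limits, and the value seen on riscv64.
    for limit in (0.3, 0.35, 0.4, 0.3401559884459161):
        print(f"limit {limit:.4f} -> timings may differ by a factor of "
              f"{ratio_at_limit(limit):.2f}")
```

If this reading of the formula is right, the current limit of 0.3 already allows the two timings to differ by a factor of 4, the riscv64 run corresponds to a factor of roughly 5.3, and limits of 0.35 and 0.4 would allow factors of about 5.7 and 9 respectively.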
Issue raised upstream: https://github.com/joblib/joblib/issues/1801