#1125694 openmolcas: autopkgtest fails on ppc64el due to numerical tolerance with OpenBLAS

#1125694#5
Date:
2025-11-22 11:03:31 UTC
From:
To:
Dear maintainer(s),

With a recent upload of openblas the autopkgtests of dolfin, gemma,
openmolcas, and xtensor-blas fail on ppc64el in testing when their
autopkgtest is run with the binary packages of openblas from unstable.
They pass when run with only packages from testing. In tabular form (for
dolfin):

                        pass            fail
openblas               from testing    0.3.30+ds-3
dolfin                 from testing    2019.2.0~legacy20240219.1c52e83-24
all others             from testing    from testing

I copied some of the output at the bottom of this report.

Currently this regression is blocking the migration of openblas to
testing [1]. Can you please investigate the situation?

Someone pointed me at https://github.com/OpenMathLib/OpenBLAS/pull/5463
which may be completely unrelated, but at least includes changes for ppc64el.

Paul

[1] https://qa.debian.org/excuses.php?package=openblas

https://ci.debian.net/data/autopkgtest/testing/ppc64el/d/dolfin/66420095/log.gz

284s       Start 72: demo_singular-poisson_serial
285s 29/49 Test #72: demo_singular-poisson_serial
..............Subprocess aborted***Exception:   1.17 sec
285s terminate called after throwing an instance of 'std::runtime_error'
285s   what():  285s 285s ***
-------------------------------------------------------------------------
285s *** DOLFIN encountered an error. If you are not able to resolve
this issue
285s *** using the information listed below, you can ask for help at
285s ***
285s ***     https://fenicsproject.discourse.group/
285s ***
285s *** Remember to include the error message listed below and, if
possible,
285s *** include a *minimal* running example to reproduce the error.
285s ***
285s ***
-------------------------------------------------------------------------
285s *** Error:   Unable to successfully call PETSc function 'KSPSolve'.
285s *** Reason:  PETSc error code is: 76 (Error in external library).
285s *** Where:   This error was encountered inside
./dolfin/la/PETScKrylovSolver.cpp.
285s *** Process: 0
285s *** 285s *** DOLFIN version: 2019.2.0.64.dev0
285s *** Git changeset:  debian_2019.2.0~legacy20240219.1c52e83-24
285s ***
-------------------------------------------------------------------------
285s 285s [ci-325-a328d227:14164] *** Process received signal ***
285s [ci-325-a328d227:14164] Signal: Aborted (6)
285s [ci-325-a328d227:14164] Signal code:  (-6)
285s [ci-325-a328d227:14164] [ 0]
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0) [0x3fffbb002494]
285s [ci-325-a328d227:14164] [ 1]
/lib/powerpc64le-linux-gnu/libc.so.6(+0xafd3c) [0x3fffb807fd3c]
285s [ci-325-a328d227:14164] [ 2]
/lib/powerpc64le-linux-gnu/libc.so.6(gsignal+0x2c) [0x3fffb801663c]
285s [ci-325-a328d227:14164] [ 3]
/lib/powerpc64le-linux-gnu/libc.so.6(abort+0x28) [0x3fffb7ff65f0]
285s [ci-325-a328d227:14164] [ 4]
/lib/powerpc64le-linux-gnu/libstdc++.so.6(_ZN9__gnu_cxx27__verbose_terminate_handlerEv+0x158)
[0x3fffb83ae858]
285s [ci-325-a328d227:14164] [ 5]
/lib/powerpc64le-linux-gnu/libstdc++.so.6(+0x119ad4) [0x3fffb83a9ad4]
285s [ci-325-a328d227:14164] [ 6]
/lib/powerpc64le-linux-gnu/libstdc++.so.6(_ZSt9terminatev+0x20)
[0x3fffb835527c]
285s [ci-325-a328d227:14164] [ 7]
/lib/powerpc64le-linux-gnu/libstdc++.so.6(__cxa_throw+0x7c) [0x3fffb83a9fec]
285s [ci-325-a328d227:14164] [ 8]
/lib/powerpc64le-linux-gnu/libdolfin.so.2019.2t64(_ZNK6dolfin6Logger12dolfin_errorENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_S6_i+0xc1c)
[0x3fffbada1bbc]
285s [ci-325-a328d227:14164] [ 9]
/lib/powerpc64le-linux-gnu/libdolfin.so.2019.2t64(_ZN6dolfin12dolfin_errorENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES5_S5_z+0x184)
[0x3fffbad9e354]
285s [ci-325-a328d227:14164] [10]
/lib/powerpc64le-linux-gnu/libdolfin.so.2019.2t64(_ZN6dolfin11PETScObject11petsc_errorEiNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES6_+0x370)
[0x3fffbad72cc0]
285s [ci-325-a328d227:14164] [11]
/lib/powerpc64le-linux-gnu/libdolfin.so.2019.2t64(_ZN6dolfin17PETScKrylovSolver5solveERNS_11PETScVectorERKS1_b+0x1c44)
[0x3fffbad57ad4]
285s [ci-325-a328d227:14164] [12]
/lib/powerpc64le-linux-gnu/libdolfin.so.2019.2t64(_ZN6dolfin17PETScKrylovSolver5solveERNS_13GenericVectorERKS1_+0x58)
[0x3fffbad58068]
285s [ci-325-a328d227:14164] [13]
/tmp/autopkgtest-lxc.qr0i5fy5/downtmp/build.bm6/src/dolfin-demo/documented/singular-poisson/cpp/demo_singular-poisson(+0xb5c8)
[0x13e65b5c8]
285s [ci-325-a328d227:14164] [14]
/lib/powerpc64le-linux-gnu/libc.so.6(+0x26f0c) [0x3fffb7ff6f0c]
285s [ci-325-a328d227:14164] [15]
/lib/powerpc64le-linux-gnu/libc.so.6(__libc_start_main+0x1ac)
[0x3fffb7ff714c]
285s [ci-325-a328d227:14164] *** End of error message ***

#1125694#14
Date:
2025-11-25 14:41:49 UTC
From:
To:
On Saturday 22 November 2025 at 12:03 +0100, Paul Gevers wrote:

I talked to upstream about the problem (in an issue that was initially
about a FTBFS, due to a failure in OpenBLAS's own testsuite, which has
since been fixed):
https://github.com/OpenMathLib/OpenBLAS/issues/5372#issuecomment-3353517450

Unfortunately upstream does not really know where the test failures in
third-party software come from. In particular, they can’t replicate the
issue (note that they tried with a more recent git snapshot than version
0.3.30), and I couldn’t either with Debian version 0.3.30+ds-3 (tried
on the ppc64el Debian porterbox).

At this point, fixing this issue is beyond my time budget and skills (I
know next to zero about PowerPC, and the issue is probably due to some
changes to PowerPC assembly code). CC’ing the Debian PowerPC porters,
with the hope that they can help.

#1125694#21
Date:
2026-01-01 19:04:25 UTC
From:
To:
user debian-powerpc@lists.debian.org
usertag 1121177 ppc64el
thanks

Dear ppc64el porters,


We're in dire need of your help: the issue is stalling openblas'
migration to testing and, because it's a key package, autoremoval doesn't
work.

Paul

#1125694#26
Date:
2026-01-02 05:36:34 UTC
From:
To:


Thanks for the ping.

I’m currently reproducing the issue on the ppc64el side and
investigating the root cause. Since openblas is a key package, this
needs a proper fix rather than a workaround.

Let me go through the bug and I’ll update with findings.

Thanks,
Trupti

#1125694#36
Date:
2026-01-05 14:06:26 UTC
From:
To:
Hello,

I tried building the package on different Power systems and observed a
machine-specific failure.
The build completes successfully on a POWER9 (p9) system, but fails
during the test phase on a POWER10 (p10) system.

On p10, the build fails with test errors:

RESULTS: 1522 tests (1518 ok, 4 failed, 0 skipped) ran in 565 ms
make[3]: *** [Makefile:87: run_test] Error 4
make[3]: Leaving directory
'/build/reproducible-path/openblas-0.3.30+ds/0-pthread/utest'
make[2]: *** [Makefile:177: tests] Error 2
make[2]: Leaving directory
'/build/reproducible-path/openblas-0.3.30+ds/0-pthread'
make[1]: *** [debian/rules:165: test_0-pthread] Error 2
make[1]: Leaving directory '/build/reproducible-path/openblas-0.3.30+ds'
make: *** [debian/rules:99: binary-arch] Error 2
dpkg-buildpackage: error: debian/rules binary-arch subprocess failed
with exit status 2



On p9, the package builds and completes successfully, including all
tests, and the binary packages are generated as expected. This indicates
that the issue is specific to POWER10 rather than a general ppc64el
failure.

I am currently investigating the failing tests on p10 to identify the
root cause and will share updates once I have more information.


For p9:

    debian/rules override_dh_shlibdeps
make[1]: Entering directory '/path/openblas/openblas-0.3.30+ds'
dh_shlibdeps -plibopenblas0-pthread -plibopenblas0-openmp
-plibopenblas0-serial -- -xlibopenblas0
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 from: /lib64/ld64.so.2
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 to: /lib64/ld64.so.2.usr-is-merged
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 from: /lib64/ld64.so.2
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 to: /lib64/ld64.so.2.usr-is-merged
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 from: /lib64/ld64.so.2
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 to: /lib64/ld64.so.2.usr-is-merged
dh_shlibdeps -plibopenblas64-0-pthread -plibopenblas64-0-openmp
-plibopenblas64-0-serial -- -xlibopenblas64-0
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 from: /lib64/ld64.so.2
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 to: /lib64/ld64.so.2.usr-is-merged
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 from: /lib64/ld64.so.2
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 to: /lib64/ld64.so.2.usr-is-merged
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 from: /lib64/ld64.so.2
dpkg-shlibdeps: warning: diversions involved - output may be incorrect
  diversion by libc6 to: /lib64/ld64.so.2.usr-is-merged
dh_shlibdeps --remaining-packages -a
make[1]: Leaving directory '/Path/openblas/openblas-0.3.30+ds'
    dh_installdeb
    dh_gencontrol
    dh_md5sums
    dh_builddeb
dpkg-deb: building package 'libopenblas0' in
'../libopenblas0_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas0-pthread' in
'../libopenblas0-pthread_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas0-pthread-dbgsym' in
'../libopenblas0-pthread-dbgsym_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas0-openmp' in
'../libopenblas0-openmp_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas0-openmp-dbgsym' in
'../libopenblas0-openmp-dbgsym_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas0-serial' in
'../libopenblas0-serial_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas0-serial-dbgsym' in
'../libopenblas0-serial-dbgsym_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas-dev' in
'../libopenblas-dev_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas-pthread-dev' in
'../libopenblas-pthread-dev_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas-openmp-dev' in
'../libopenblas-openmp-dev_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-0' in
'../libopenblas64-0_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas-serial-dev' in
'../libopenblas-serial-dev_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-0-pthread-dbgsym' in
'../libopenblas64-0-pthread-dbgsym_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-0-pthread' in
'../libopenblas64-0-pthread_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-0-openmp-dbgsym' in
'../libopenblas64-0-openmp-dbgsym_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-0-openmp' in
'../libopenblas64-0-openmp_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-0-serial' in
'../libopenblas64-0-serial_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-dev' in
'../libopenblas64-dev_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-pthread-dev' in
'../libopenblas64-pthread-dev_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-0-serial-dbgsym' in
'../libopenblas64-0-serial-dbgsym_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-serial-dev' in
'../libopenblas64-serial-dev_0.3.30+ds-3_ppc64el.deb'.
dpkg-deb: building package 'libopenblas64-openmp-dev' in
'../libopenblas64-openmp-dev_0.3.30+ds-3_ppc64el.deb'.
  dpkg-genbuildinfo -O../openblas_0.3.30+ds-3_ppc64el.buildinfo
  dpkg-genchanges -O../openblas_0.3.30+ds-3_ppc64el.changes
dpkg-genchanges: info: not including original source code in upload
  dpkg-source --after-build .
dpkg-buildpackage: info: binary and diff upload (original source NOT
included)
Now running lintian openblas_0.3.30+ds-3_ppc64el.changes ...
Finished running lintian.

Thanks,
Trupti
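
When comparing results between build hosts, as in the p9/p10 comparison above, it can help to record the CPU generation next to each log. The following is only a minimal sketch: the helper name `power_generation` and the two sample strings are hypothetical, and it assumes the `cpu` line of /proc/cpuinfo starts with `POWER<n>`, which is the layout ppc64el kernels commonly print.

```python
import re


def power_generation(cpuinfo_text):
    """Return the POWER generation (e.g. 9 or 10) found in /proc/cpuinfo text, or None."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # Only the exact "cpu" key is of interest; other keys are ignored.
        if key.strip() == "cpu":
            match = re.match(r"\s*POWER(\d+)", value)
            if match:
                return int(match.group(1))
    return None


# Hypothetical excerpts, used only to illustrate the assumed layout.
SAMPLE_P9 = "processor\t: 0\ncpu\t\t: POWER9, altivec supported\n"
SAMPLE_P10 = "processor\t: 0\ncpu\t\t: POWER10 (architected), altivec supported\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            print("POWER generation:", power_generation(handle.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

If the OpenBLAS build in question uses DYNAMIC_ARCH, setting `OPENBLAS_VERBOSE=2` in the environment should additionally make the library report which kernel set it selected at load time; whether that applies to a given build is an assumption to verify.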

#1125694#41
Date:
2026-01-06 20:49:07 UTC
From:
To:
Hello Paul,

I was able to reproduce the autopkgtest failure for xtensor-blas on
ppc64el locally, and I have attached both the failing and the working logs.



[ RUN      ] xlinalg.pinv
/tmp/autopkgtest.nrAywe/autopkgtest_tmp/test_linalg.cpp:239: Failure
Value of: allclose(expected, res)
   Actual: false
Expected: true
[  FAILED  ] xlinalg.pinv (0 ms)

[----------] Global test environment tear-down
[==========] 77 tests from 6 test suites ran. (7 ms total)
[  PASSED  ] 76 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] xlinalg.pinv

  1 FAILED TEST
make[3]: *** [CMakeFiles/xtest.dir/build.make:70: CMakeFiles/xtest]
Error 1
make[2]: *** [CMakeFiles/Makefile2:188: CMakeFiles/xtest.dir/all] Error
2
make[1]: *** [CMakeFiles/Makefile2:195: CMakeFiles/xtest.dir/rule] Error
2
make: *** [Makefile:192: xtest] Error 2
autopkgtest [23:35:52]: test command2: -----------------------]
autopkgtest [23:35:52]: test command2:  - - - - - - - - - - results - -
- - - - - - - -
command2             FAIL non-zero exit status 2
autopkgtest [23:35:52]: @@@@@@@@@@@@@@@@@@@@ summary
command1             FAIL non-zero exit status 2
command2             FAIL non-zero exit status 2


The failure occurs in the test:
xlinalg.pinv
test/test_linalg.cpp


When running the test locally on ppc64el with OpenBLAS 0.3.30, the
maximum numerical difference between the expected result and
xt::linalg::pinv() output is:

max diff ≈ 7.0e-09
mean diff ≈ 2.7e-09


With the current test tolerance (allclose default / 1e-12), the test
fails.
When the tolerance is relaxed to 1e-8, the test passes consistently and
all results are numerically stable.

This indicates the failure is due to test tolerance rather than a
functional regression.
Kindly consider reviewing the test tolerance.


Thanks,
Trupti
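
The effect of the tolerance described above can be sketched with a NumPy-style closeness test, |actual - expected| <= atol + rtol * |expected|. This is an illustration only: the `allclose` helper below is a stand-in written for this note and not the xtensor implementation, the sample vectors are hypothetical, and only the 7e-09 difference and the two tolerances (1e-12 and 1e-8) are taken from the message.

```python
def allclose(actual, expected, rtol, atol):
    """NumPy-style closeness test applied element by element."""
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


# Hypothetical data: the second entry is off by about 7e-09,
# the maximum difference reported for xt::linalg::pinv().
expected = [0.25, -0.5, 1.0]
actual = [0.25, -0.5 + 7.0e-09, 1.0]

# Tolerance quoted in the message as the current one.
strict = allclose(actual, expected, rtol=0.0, atol=1e-12)
# Relaxed tolerance with which the test was reported to pass.
relaxed = allclose(actual, expected, rtol=0.0, atol=1e-8)

print("atol=1e-12:", strict)
print("atol=1e-8: ", relaxed)
```

A difference of this size sits between the two tolerances, so the outcome of the test depends on the chosen threshold alone and not on any change in the structure of the result.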

#1125694#46
Date:
2026-01-06 20:57:10 UTC
From:
To:
Dear Trupti,

On Wednesday 07 January 2026 at 02:19 +0530, Trupti wrote:

Thanks a lot for your investigation and for the recommendation.

If you have the time, could you possibly also check that the two other
autopkgtest regressions (in src:gemma and src:openmolcas) are also
tolerance-related? (see https://tracker.debian.org/pkg/openblas for the
list of autopkgtest regressions)

#1125694#51
Date:
2026-01-07 04:52:50 UTC
From:
To:

Yes, I will do it and share the results with you as soon as possible.


Thanks,
Trupti

#1125694#56
Date:
2026-01-07 05:59:58 UTC
From:
To:
Hi,

Other possible causes:

  - IBM vs IEEE long double (ldbl) ABI
  - -march=native or -mtune=native

Especially the latter sometimes does surprising things.

    Simon

#1125694#61
Date:
2026-01-07 18:05:24 UTC
From:
To:
For src:gemma, the autopkgtest failure on ppc64el occurs during the
eigen-decomposition step.
The run reports a warning about many eigenvalues close to zero, followed
by an LU decomposition failure in GSL/LAPACK.
The failure is triggered in the following code path:

// LU decomposition.
void LUDecomp(gsl_matrix *LU, gsl_permutation *p, int *signum) {
   // debug_msg("entering");
   enforce_gsl(gsl_linalg_LU_decomp(LU, p, signum));
   return;
}
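
Why an LU step can be sensitive when many eigenvalues are close to zero can be sketched as follows. This is not gemma's or GSL's code: `smallest_pivot` is a hypothetical helper doing plain Gaussian elimination with partial pivoting, the 2x2 matrices are made up, and the assumption is that the factorisation is rejected when a pivot is exactly zero.

```python
def smallest_pivot(matrix):
    """Gaussian elimination with partial pivoting; return the smallest |pivot| seen."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    smallest = float("inf")
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        pivot_row = max(range(k, n), key=lambda r: abs(a[r][k]))
        a[k], a[pivot_row] = a[pivot_row], a[k]
        pivot = a[k][k]
        smallest = min(smallest, abs(pivot))
        if pivot == 0.0:
            return 0.0  # exactly singular: elimination cannot continue
        for r in range(k + 1, n):
            factor = a[r][k] / pivot
            for c in range(k, n):
                a[r][c] -= factor * a[k][c]
    return smallest


# Hypothetical matrices: the second differs from the first by 7e-09 in one entry.
singular = [[1.0, 2.0], [2.0, 4.0]]
perturbed = [[1.0, 2.0], [2.0, 4.0 + 7.0e-09]]

print(smallest_pivot(singular))   # exactly zero pivot
print(smallest_pivot(perturbed))  # tiny but non-zero pivot
```

For a matrix this close to singular, a perturbation of the size produced by a different BLAS kernel is enough to move the smallest pivot between exactly zero and a tiny non-zero value, which would be consistent with a failure that appears on one build and not on another.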



For src:openmolcas, the autopkgtest failures on ppc64el are limited to
CASPT2 tests (standard tests 009, 010 and hdf5 test 601).
The logs show floating-point exceptions (IEEE invalid, divide-by-zero,
underflow) followed by CASPT2 convergence failures (_NOT_CONVERGED_ /
_INTERNAL_ERROR_).
All non-CASPT2 tests complete successfully.  The CASPT2 output itself
indicates numerical instability and suggests increasing
linear-dependence thresholds.

I have attached the relevant .out and .err files from the failing tests
for reference.

Running test standard: 005... (26%) OK
Running test standard: 006... (31%) OK
Running test standard: 009... (36%) Failed! (caspt2)
Running test standard: 010... (42%) Failed! (caspt2)
Running test standard: 011... (47%) OK
Running test standard: 012... (52%) OK
Running test standard: 014... (57%) OK
Running test standard: 015... (63%) OK
Running test standard: 019... (68%) OK
Running test standard: 023... (73%) OK
Running test standard: 025... (78%) OK
Running test standard: 026... (84%) OK
Running test standard: 028... (89%) OK
Running test standard: 029... (94%) OK
Running test hdf5: 601... (100%) Failed! (caspt2)
----> 009.err:
Note: The following floating-point exceptions are signalling:
IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_UNDERFLOW_FLAG
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG
Note: The following floating-point exceptions are signalling:
IEEE_INVALID_FLAG IEEE_DIVIDE_BY_ZERO IEEE_UNDERFLOW_FLAG
[ process      0]: xquit (rc =     96): _NOT_CONVERGED_
Note: The following floating-point exceptions are signalling:
IEEE_UNDERFLOW_FLAG
----> 009.out:
ATVX     3  Mu3.0001  Se3.004                      -0.00118554
-0.00099545     -0.05076084      0.00005053
.....
.....
   Total nr of CASPT2 parameters:
    Before reduction:        1582
    After  reduction:        1488

  Computing the right-hand side (RHS) elements
  --------------------------------------------
   Using conventional MKRHS algorithm

  Variance of |WF0>:       0.0892508119

  The contributions to the second order correlation energy in atomic
units.
-----------------------------------------------------------------------------------------------------------------------------
   IT.      VJTU        VJTI        ATVX        AIVX        VJAI
BVAT        BJAT        BJAI        TOTAL       RNORM
-----------------------------------------------------------------------------------------------------------------------------
    1    -0.000526   -0.001147   -0.005220   -0.005485   -0.000292
-0.009164   -0.001570   -0.000613   -0.024018    0.043402
   SIGMA D. ICASE1,ISYM1:         1         1
            ICASE2,ISYM2:        14         5
   Colossal value detected in SIGMA.
   This implies that the thresholds used for linear
   dependence removal must be increased.
   Present values, THRSHN, THRSHS:   1.0000000000000000E-010
1.0000000000000000E-008
   Use keyword THRESHOLD in input to increase these
   values and then run again.
--- Stop Module: caspt2 at Thu Jan  8 04:02:30 2026 /rc=-6 ---



Thanks,
Trupti
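
The CASPT2 hint about raising the linear-dependence thresholds can be illustrated with a generic canonical-orthogonalisation sketch. This is not OpenMolcas code: `retained_scalings` and the eigenvalues are hypothetical, and the assumption is that eigenvalues of an overlap matrix above the threshold are kept and scaled by 1/sqrt(eigenvalue). Only the value 1e-8 is taken from the THRSHS figure printed in the log.

```python
import math


def retained_scalings(eigenvalues, threshold):
    """Keep eigenvalues above the threshold and return their 1/sqrt(e) scale factors."""
    return [1.0 / math.sqrt(e) for e in eigenvalues if e > threshold]


# Hypothetical overlap eigenvalues; the last one lies just above 1e-8.
overlap_eigenvalues = [1.0, 1.0e-3, 1.2e-8]

present = retained_scalings(overlap_eigenvalues, 1.0e-8)  # threshold as in the log
raised = retained_scalings(overlap_eigenvalues, 1.0e-6)   # a raised threshold

print(len(present), max(present))
print(len(raised), max(raised))
```

An eigenvalue that sits just above the threshold is kept and amplified by a large factor, so a small numerical shift in that eigenvalue can change which directions are retained. Raising the threshold discards such directions, which is the kind of adjustment the CASPT2 message suggests.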

#1125694#66
Date:
2026-01-14 19:05:26 UTC
From:
To:
On Wednesday 07 January 2026 at 23:35 +0530, Trupti wrote:

Do you consider the following as a good summary of your analysis: there
is no structural problem in the new OpenBLAS on ppc64el (just slightly
numerically different results, within the usual tolerance of numerical
software), and as a consequence the adjustment needs to be done in the
testsuite of the affected reverse dependencies (xtensor-blas, gemma and
openmolcas)? (that seems clear from what you said of src:xtensor-blas
and src:openmolcas, less so for the case of src:gemma, hence my
question)

Paul: if my statement above is correct, what would be the right course
of action?

#1125694#71
Date:
2026-01-14 19:29:09 UTC
From:
To:
Hi Sébastien,
The right action would be to get those packages fixed, e.g. via filing
bugs (cloning and reassigning might be appropriate). When the bugs are
in place, I can hint src:openblas into testing.

For info: regularly when I see a package break another package, I file a
bug against both packages. I didn't do that in this case because there
seemed to be a pattern, and I believe that often means a problem in the
breaking package. But sometimes it's the other way around.

Paul