#961977 enhance 64-bit support in PETSc

Package:
src:petsc
Source:
petsc
Submitter:
Henning
Date:
2021-06-07 11:03:03 UTC
Severity:
normal
Tags:
Blocked By:
Bug      Title
961108   openmpi: providing 64-bit MPI (0, normal, stable testing unstable, about 6 years ago)
989497   superlu-dist: provide 64-bit build (1, normal, stable testing unstable, about 5 years ago)
961976   mumps: provide 64-bit build (0, normal, stable testing unstable, about 6 years ago)
989550   suitesparse: enhance 64-bit support in suitesparse (5, normal, stable testing unstable, about 3 years ago)

#961977#5
Date:
2020-03-04 18:20:21 UTC
From:
To:
Dear Maintainer,

*** Reporter, please consider answering these questions, where appropriate ***

   * What led up to the situation?
Applying the library using matrix dimensions higher than 46340.
   * What exactly did you do (or not do) that was effective (or
     ineffective)?
Setting the dimension to 54455
   * What was the outcome of this action?
Got a report that the product of the dimension is exceeding the limit,
I should compile the PETSc libraries with the --with-64-bit-indices option.
I assume that signed 4-byte integers are used to describe the indices
and the product of the indices.
   * What outcome did you expect instead?
Just a normal diagonalization. Compiling the library with this option
solved the situation.
*** End of the template - remove these template lines ***
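
The 46340 threshold reported above is consistent with the product of the two dimensions overflowing a signed 32-bit integer. A minimal Python sketch of that arithmetic, assuming (as the report does) that the product n*n is held in a signed 4-byte integer; the helper name is hypothetical and not part of PETSc:

```python
import math

INT32_MAX = 2**31 - 1  # largest value of a signed 4-byte integer

def product_fits_int32(n: int) -> bool:
    """Return True if n*n is representable as a signed 32-bit integer."""
    return n * n <= INT32_MAX

# Largest square dimension whose product still fits in int32.
largest_ok = math.isqrt(INT32_MAX)

print(largest_ok)                 # 46340
print(product_fits_int32(46340))  # True
print(product_fits_int32(54455))  # False
print(54455 * 54455)              # 2965347025
```

The reported dimension of 54455 gives a product of 2965347025, which is above the 2147483647 limit.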

#961977#10
Date:
2020-05-05 10:01:05 UTC
From:
To:
Nice bug.  There are ramifications for upgrading all pointers to 64
bit.  Probably we don't want to do that without being explicit about
it.  64 bit pointers should be enabled right through the computational
library stack, or there will be a mismatch at some point causing
failure.

This ties in with developments of the BLAS packages. These now have 64
bit variants (libblas64-dev etc).

BLAS is at the bottom of the stack, 64 bit pointers need to be
activated each step of the way upwards. Need to check 64-bit PETSc is
consistent with scalapack and MUMPS, for instance.

#961977#15
Date:
2020-05-21 02:30:17 UTC
From:
To:
Hi Gard, bringing this question over to petsc bug#953116.

I was assuming we'd carry both builds, "petsc-dev" and
"petsc64-dev", at least in the medium term.  This would follow what's in
place with BLAS.  It's also the practice at Cray, which offers both
cray-petsc and cray-petsc-64 modules.

But it's a good question to consider.

Certainly.  I can anticipate it might be quite disruptive if the
standard package just jumps to 64 bit. I imagine that would break
things.


One question to consider is why petsc doesn't just use 64 bits in the
first place on 64 bit systems.

I was under the impression that a 32 bit build actually runs faster on a
64 bit system, in the sense of getting twice as much done per clock
cycle, and that you only need the 64 bit build if you actually need that
much address space (i.e. if your mesh carries that many degrees of
freedom, DOFs).

I guess we should clear up whether that's true or not. It would be
regrettable to drop 32 bit if it means performance on smaller jobs is
diminished.

That could be a helpful tool.  We could include it in the -dev packages.

Drew

#961977#22
Date:
2020-05-23 06:02:22 UTC
From:
To:
Hi, the Debian project is discussing whether we should start providing a
64 bit build of PETSc (which means we'd have to upgrade our entire
computational library stack, starting from BLAS and going through MPI,
MUMPS, etc).

A default PETSc build uses 32 bit addressing to index vectors and
matrices.  64 bit addressing can be switched on by configuring with
--with-64-bit-indices=1, allowing much larger systems to be handled.

My question for petsc-maint is, is there a reason why 64 bit indexing is
not already activated by default on 64-bit systems?  Certainly C
pointers and type int would already be 64 bit on these systems.

Is it a question of performance?  Is 32 bit indexing executed faster (in
the sense of 2 operations per clock cycle), such that 64-bit addressing
is accompanied with a drop in performance? In that case we'd only want
to use 64-bit PETSc if the system being modelled is large enough to
actually need it. Or is there a different reason that 64 bit indexing is
not switched on by default?

Drew

#961977#27
Date:
2020-05-23 06:18:43 UTC
From:
To:
Drew Parsons <dparsons@debian.org> writes:

You don't need to change BLAS or MPI.

Umm, x86-64 Linux is LP64, so int is 32-bit.  ILP64 is relatively exotic
these days.

Sparse iterative solvers are entirely limited by memory bandwidth;
sizeof(double) + sizeof(int64_t) = 16 incurs a performance hit relative
to 12 for int32_t.  It has nothing to do with clock cycles for
instructions, just memory bandwidth (and usage, but that is less often
an issue).
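
The 16-versus-12 byte figure can be checked directly. A small Python sketch, assuming one double-precision value and one index are streamed per matrix nonzero; the function name is only illustrative:

```python
import ctypes

BYTES_DOUBLE = ctypes.sizeof(ctypes.c_double)
BYTES_INT32 = ctypes.sizeof(ctypes.c_int32)
BYTES_INT64 = ctypes.sizeof(ctypes.c_int64)

def bytes_per_nonzero(index_bytes: int) -> int:
    """Memory traffic per nonzero: one double value plus one index."""
    return BYTES_DOUBLE + index_bytes

per_nz_32 = bytes_per_nonzero(BYTES_INT32)
per_nz_64 = bytes_per_nonzero(BYTES_INT64)

print(per_nz_32, per_nz_64)   # 12 16
print(per_nz_64 / per_nz_32)  # ratio of bytes moved, about 1.33
```

For a bandwidth-bound kernel this ratio is a rough upper bound on the slowdown from 64-bit indices, not a measured result.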

It's just about performance, as above.  There are two situations in
which 64-bit is needed.  Historically (supercomputing with thinner
nodes), it has been that you're solving problems with more than 2B dofs.
In today's age of fat nodes, it also happens that a matrix on a single
MPI rank has more than 2B nonzeros.  This is especially common when
using direct solvers.  We'd like to address the latter case by only
promoting the row offsets (thereby avoiding the memory hit of promoting
column indices):

https://gitlab.com/petsc/petsc/-/issues/333
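
The saving from promoting only the row offsets can be put in numbers. A rough Python sketch, assuming a generic compressed sparse row (CSR) layout with one double and one column index per nonzero plus one offset per row; the function name and the matrix sizes are made up for illustration and are not taken from PETSc:

```python
INT32_MAX = 2**31 - 1

def csr_bytes(n_rows: int, nnz: int, offset_bytes: int, col_bytes: int) -> int:
    """Approximate storage of a CSR matrix with double-precision values."""
    values = nnz * 8                       # one double per nonzero
    columns = nnz * col_bytes              # one column index per nonzero
    offsets = (n_rows + 1) * offset_bytes  # one row offset per row, plus one
    return values + columns + offsets

# Hypothetical matrix on one MPI rank: few enough rows for 32-bit column
# indices, but more nonzeros than a 32-bit row offset can count.
n_rows = 10_000_000
nnz = 3_000_000_000
assert n_rows <= INT32_MAX < nnz

mixed = csr_bytes(n_rows, nnz, offset_bytes=8, col_bytes=4)
full64 = csr_bytes(n_rows, nnz, offset_bytes=8, col_bytes=8)

print(mixed)   # 36080000008
print(full64)  # 48080000008
```

In this made-up case the mixed layout needs about 36 GB against about 48 GB for full 64-bit indexing, because the column indices dominate the index storage.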


I wonder if you are aware of any static analysis tools that can
flag implicit conversions of this sort:

int64_t n = ...;
for (int32_t i=0; i<n; i++) {
  ...
}

There is -fsanitize=signed-integer-overflow (which generates a runtime
error message), but that requires data to cause overflow at every
possible location.
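
To see why the loop above is a problem, here is a Python emulation of a 32-bit counter. It assumes two's-complement wrap-around, which is what typical hardware does; in C itself, signed overflow is undefined behaviour, so this is only an illustration and the helper names are hypothetical:

```python
import ctypes

INT32_MAX = 2**31 - 1

def next_int32(i: int) -> int:
    """Increment i as a wrapping signed 32-bit counter (emulation only)."""
    return ctypes.c_int32(i + 1).value

def counter_can_reach(n: int) -> bool:
    """A 32-bit counter i can only make `i < n` false if n fits in int32."""
    return n <= INT32_MAX

print(next_int32(5))              # 6
print(next_int32(INT32_MAX))      # -2147483648 (wrapped)
print(counter_can_reach(2**31))   # False: the loop condition stays true
```

With n above INT32_MAX, the comparison `i < n` never becomes false, which is why a static check is preferable to waiting for the overflow at run time.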

#961977#32
Date:
2020-05-23 06:49:47 UTC
From:
To:
I see, the PETSc API allows for PetscBLASInt and PetscMPIInt distinct
from PetscInt. That gives us more flexibility. (In any case, the Debian
BLAS maintainer is already providing blas64 packages. We've started
discussions about MPI).
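
Keeping PetscInt wider than PetscBLASInt or PetscMPIInt means a value has to be narrowed wherever the two meet. A minimal Python sketch of a checked narrowing, similar in spirit to PETSc's PetscBLASIntCast; the function below is a hypothetical illustration, not PETSc's API:

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def to_int32_checked(value: int) -> int:
    """Return value unchanged if it fits in a signed 32-bit integer,
    otherwise raise instead of silently truncating."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"{value} does not fit in a 32-bit integer")
    return value

print(to_int32_checked(46340))  # 46340

try:
    to_int32_checked(2965347025)  # 54455 * 54455, too large for int32
except OverflowError as err:
    print("refused:", err)
```

Failing loudly at the boundary is what lets a 64-bit PETSc sit on top of 32-bit BLAS and MPI without silent corruption.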

But what about MUMPS? Would MUMPS need to be built with 64 bit support
to work with 64-bit PETSc?
(the MUMPS docs indicate that its 64 bit support needs 64-bit versions
of BLAS, SCOTCH, METIS and MPI).


oh ok. I had assumed int was 64 bit on x86-64. Thanks for the
correction.


Thanks Jed.  That's good justification for us to keep our current 32-bit
build then, and provide a separate 64-bit build alongside it.

An interesting extra challenge.

I'll ask the Debian gcc team and the Science team if they have ideas
about this.

Drew

#961977#37
Date:
2020-05-23 12:27:01 UTC
From:
To:
In MUMPS's manual, it is called full 64-bit. Out of the same memory
bandwidth concern, MUMPS also supports selective 64-bit, in the sense that
it only uses int64_t for selected variables. One can still use it with
32-bit BLAS, MPI etc.  We support selective 64-bit MUMPS starting from
petsc-3.13.0.

#961977#42
Date:
2020-05-23 15:54:23 UTC
From:
To:
If I remember correctly - the 'full 64-bit' mode relies on the Fortran compiler option '-i8' - which is basically equivalent to ILP64 - and this mode only works
with ILP64 MPI, BLAS etc from Intel-MPI/MKL

We haven't tried using MUMPS in this mode with PETSc

Satish

#961977#47
Date:
2020-05-23 15:45:05 UTC
From:
To:
Note: OpenBLAS supports 64bit indices. MKL has a bunch of packages built as ILP64

[MPICH/OpenMPI - as far as I know - are LP64]


The primary reason PETSc defaults to 32bit indices is - this is the compiler default on LP64 systems.

If Debian is building an ILP64 system [with compilers defaulting to 64-bit integers] - that would mean all packages would be ILP64 [obviously most packages are not tested in this mode - so might break]

#961977#52
Date:
2020-05-24 01:51:38 UTC
From:
To:

Thanks Junchao.  Sounds like we can get started on providing 64-bit
MUMPS and PETSc without needing to wait for MPI then.
That's good timing with 3.13.

Drew

#961977#57
Date:
2020-05-24 02:01:57 UTC
From:
To:

If I understand correctly, the Debian systems are LP64 (so gcc defaults
to int=int32_t).
Our user who started these discussions with Bug#953116 reports that
--with-64-bit-indices is working fine for his local build. But he may 
not have tested using MUMPS in 64-bit PETSc.

This will be the interesting test. I'll start with the 64-bit build of
MUMPS and see how tests hold up.

Drew

#961977#62
Date:
2020-05-26 03:56:45 UTC
From:
To:

Hi Jed, Thomas Schiex from Debian Science has replied to this question,
suggesting clang-static-analyzer or lgtm:

   For open source projects, a few online static analyzers are available
and usable for free. This kind of integer type mismatch will be caught by
most of them. Possibly clang-static-analyzer will do the job. Otherwise,
an easy one is lgtm for example. See  https://lgtm.com/

   (I have no link with them except as an open source software developer
using their services for free).

   There are other tools (mostly geared towards security)  available for
free for open source software but I just forgot their name. Any web
search tool should help you here.

   Thomas

#961977#67
Date:
2020-05-26 04:58:57 UTC
From:
To:
Drew Parsons <dparsons@debian.org> writes:

I had tried this first, but I think it requires significant work to implement.

This looks interesting, but it isn't obvious how to implement this sort
of check in their language.  They have a bunch of examples, but they
seem simpler.

#961977#72
Date:
2020-05-27 05:09:35 UTC
From:
To:
...


The PETSc mumps tests seem to be robust with respect to 64 bit.
(64 bit MUMPS in the form of -DPORD_INTSIZE64, not all-integer
-DINTSIZE64)

That is, 32 bit PETSc passes its tests with 64 bit (PORD) MUMPS
and 64 bit PETSc passes its tests with 32 bit MUMPS.

The test in question that's passing is src/snes/tutorials/ex19, run with
'make runex19_fieldsplit_mumps'
Perhaps it's not stress-testing 64 bit conditions.

Drew

#961977#79
Date:
2020-05-27 14:00:39 UTC
From:
To:
Could you provide more details, e.g., the error stack trace?

#961977#84
Date:
2020-05-31 15:33:22 UTC
From:
To:


Hi Junchao, PETSc's mumps test runs fine, there is no error to trace as
such, just a diff with the reference output.

With 32-bit PETSc and 64-bit [PORD] MUMPS,

$ mpirun -n 2 ./ex19 -pc_type fieldsplit -pc_fieldsplit_block_size 4
-pc_fieldsplit_type SCHUR -pc_fieldsplit_0_fields 0,1,2 
-pc_fieldsplit_1_fields 3 -fieldsplit_0_pc_type lu -fieldsplit_1_pc_type 
lu -snes_monitor_short -ksp_monitor_short
-fieldsplit_0_pc_factor_mat_solver_type mumps 
-fieldsplit_1_pc_factor_mat_solver_type mumps

returns the result:

lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
   0 SNES Function norm 0.239155
     0 KSP Residual norm 0.235858
     1 KSP Residual norm < 1.e-11
   1 SNES Function norm 6.81968e-05
     0 KSP Residual norm 2.30906e-05
     1 KSP Residual norm < 1.e-11
   2 SNES Function norm < 1.e-11
Number of SNES iterations = 2


where output/ex19_fieldsplit_5.out has

lid velocity = 0.0625, prandtl # = 1., grashof # = 1.
   0 SNES Function norm 0.239155
     0 KSP Residual norm 0.239155
     1 KSP Residual norm < 1.e-11
   1 SNES Function norm 6.81968e-05
     0 KSP Residual norm 6.81968e-05
     1 KSP Residual norm < 1.e-11
   2 SNES Function norm < 1.e-11
Number of SNES iterations = 2


So the diff in this case is

$ make runex19_fieldsplit_mumps
3c3
<     0 KSP Residual norm 0.239155
---
>     0 KSP Residual norm 0.235858
6c6
<     0 KSP Residual norm 6.81968e-05
---
>     0 KSP Residual norm 2.30906e-05
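
The same comparison can be reproduced outside the PETSc test harness. A short Python sketch that compares the two outputs quoted above line by line; the helper name is hypothetical, and leading whitespace is dropped because it does not affect the comparison:

```python
actual = [
    "lid velocity = 0.0625, prandtl # = 1., grashof # = 1.",
    "0 SNES Function norm 0.239155",
    "0 KSP Residual norm 0.235858",
    "1 KSP Residual norm < 1.e-11",
    "1 SNES Function norm 6.81968e-05",
    "0 KSP Residual norm 2.30906e-05",
    "1 KSP Residual norm < 1.e-11",
    "2 SNES Function norm < 1.e-11",
    "Number of SNES iterations = 2",
]

reference = list(actual)
reference[2] = "0 KSP Residual norm 0.239155"     # from ex19_fieldsplit_5.out
reference[5] = "0 KSP Residual norm 6.81968e-05"  # from ex19_fieldsplit_5.out

def differing_lines(ref, got):
    """Return (line number, reference line, actual line) for each mismatch."""
    return [
        (number, r, g)
        for number, (r, g) in enumerate(zip(ref, got), start=1)
        if r != g
    ]

for number, r, g in differing_lines(reference, actual):
    print(f"line {number}: reference {r!r}, got {g!r}")
```

Only lines 3 and 6 differ, both initial KSP residual norms; the SNES norms and iteration counts agree.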

#961977#89
Date:
2020-06-01 11:11:03 UTC
From:
To:
clone 961185 -1
clone 953116 -2
thanks

MUMPS can enable 64-bit ordering (PORD) and 64-bit PETSc can use that,
while other integers remain default 32 bit (in BLAS, MPI etc). This is
the fast&easy® 64-bit build.

Full 64-bit requires 64-bit MPI.

The cloned versions of these bugs will continue to track progress
towards 64-bit MPI (and other packages).