MPI parallelization
TsunAWI now has experimental support for MPI parallelization based on a domain decomposition performed with Metis [KK99].
Download and Installation
git clone https://gitlab.awi.de/tsunawi/tsunawi.git
git checkout mpi
TsunAWI’s executable ompTsuna.x can optionally be built with MPI support, and an additional executable partit.x is built to partition the computational mesh into subdomains.
Option 1: The old-fashioned way with Makefile.in
- Set MPIF90 to your MPI Fortran compiler wrapper,
- uncomment MPI_FLAG=-DUSE_MPI (see the sketch below).
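For illustration, the relevant lines in Makefile.in could then look like the following sketch (mpif90 is only an example name; use the wrapper provided by your MPI installation):

# sketch of the MPI-related settings in Makefile.in
MPIF90 = mpif90          # MPI Fortran compiler wrapper (example)
MPI_FLAG = -DUSE_MPI     # uncommented to enable the MPI code paths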
Option 2: cmake
- Configure with -DMPI=ON (default: OFF); a minimal cmake invocation is sketched below.
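An out-of-source configure and build could then look like this (a sketch; the build directory name and any additional cmake options are up to you):

mkdir build && cd build
cmake -DMPI=ON ..
make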
Running
Before you can run ompTsuna.x with a given number of MPI tasks, the mesh must be partitioned into subdomains with partit.x.

partit.x reads the number of partitions np from the command line and MeshPath from the namelist, computes the partitioning, and writes the information to MeshPath/dist_np. This partitioning can be reused for any subsequent MPI run with the same number of tasks as long as elem2d.out and nod2d.out in MeshPath are not changed.
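As a sketch, the corresponding namelist entry could look like this (the namelist group name used here is hypothetical; only the MeshPath parameter itself is taken from this section):

&mesh_parameters                  ! group name is an assumption
  MeshPath = '/path/to/mesh/'     ! directory containing nod2d.out and elem2d.out
/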
The following example has to be adjusted for the desired number of MPI tasks, the command used to launch MPI programs (here mpiexec), and the way binding of MPI tasks and OpenMP threads to compute cores is specified.
./partit.x 16
export OMP_NUM_THREADS=1
mpiexec -np 16 ./ompTsuna.x
# adjust parameters or bathymetry/topography data file and
# run another simulation on the same nod2d.out, elem2d.out
mpiexec -np 16 ./ompTsuna.x
# Or you can use the same decomposition to run in hybrid
# (MPI and OpenMP) mode, here on 4*16=64 compute cores:
export OMP_NUM_THREADS=4
export OMP_SCHEDULE=dynamic
<settings to ensure binding>
mpiexec -np 16 ./ompTsuna.x
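The placeholder <settings to ensure binding> depends on your MPI library and batch system. As a sketch using the standard OpenMP binding variables and Open MPI's mpiexec (the mpiexec options are specific to Open MPI and an assumption here; other MPI implementations use different flags):

export OMP_NUM_THREADS=4
export OMP_SCHEDULE=dynamic
export OMP_PLACES=cores          # one place per physical core
export OMP_PROC_BIND=close       # keep a task's threads on neighbouring cores
mpiexec -np 16 --map-by socket:PE=4 --bind-to core ./ompTsuna.x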
Hierarchical partitioning
Hierarchical partitioning that reflects the computer architecture is possible. For example, if you want to run on 5 compute nodes of a Linux cluster, each node with 2 CPUs and each CPU with 24 compute cores,
./partit.x 5 2 24
will divide the mesh into 5 larger subdomains, each of which is then split into 2 smaller subdomains, which in turn are split into 24 subdomains each, resulting in a total of 240 small subdomains. Compared to ./partit.x 240, the communication pattern will be slightly better, with fewer messages to be sent through the cluster’s interconnect.
Remark: the resulting partitioning information is written to dist_240; the path name does not reflect the hierarchy!
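The hierarchical decomposition is then used exactly like a flat one, with the total number of subdomains as the number of MPI tasks, e.g.:

./partit.x 5 2 24               # writes the partitioning to MeshPath/dist_240
export OMP_NUM_THREADS=1
mpiexec -np 240 ./ompTsuna.x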
So far, we have achieved the best performance with hybrid runs with 2 to 4 OpenMP threads joined in one MPI task. On the given example hardware, a good setup would be:
./partit.x 5 2 6
export OMP_NUM_THREADS=4
export OMP_SCHEDULE=dynamic
<settings to ensure binding>
mpiexec -np 60 ./ompTsuna.x
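On a cluster managed by a batch system, the same setup can be put into a job script. The following is a minimal sketch assuming SLURM (the scheduler choice and the srun launch are assumptions; adapt the header and the binding settings to your site):

#!/bin/bash
#SBATCH --nodes=5                # 5 nodes as in the example hardware above
#SBATCH --ntasks-per-node=12     # 12 MPI tasks per node, 60 in total
#SBATCH --cpus-per-task=4        # 4 cores per MPI task for the OpenMP threads

export OMP_NUM_THREADS=4
export OMP_SCHEDULE=dynamic
export OMP_PLACES=cores
export OMP_PROC_BIND=close

./partit.x 5 2 6                 # only needed once per mesh and task count
srun ./ompTsuna.x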
Limitations
The MPI implementation is still experimental, and not all features of TsunAWI are covered yet. As coverage will increase step by step, we refrain from compiling a list of supported features here. In many code parts that are not yet MPI parallel, a warning will be issued.
Mesh partitioning with Metis
The Metis source code is included in the TsunAWI repository with a modified build scheme so that it works seamlessly with TsunAWI’s Makefile and cmake approaches.
Please refer to the Metis homepage http://glaros.dtc.umn.edu/gkhome/views/metis and [KK99].
References
[KK99] George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1999. doi:10.5555/305219.305248.