MPI parallelization

TsunAWI now has experimental support for MPI parallelization, based on a domain decomposition performed with Metis [KK99].

Download and Installation

git clone
git checkout mpi

TsunAWI’s executable ompTsuna.x can optionally be built with MPI support, and an additional executable partit.x is built to partition the computational mesh into subdomains.

Option 1: The old-fashioned way, editing the Makefile:

  • Set MPIF90 to your MPI-Fortran compiler wrapper,
  • uncomment MPI_FLAG=-DUSE_MPI (see the sketch below).
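
A minimal sketch of the corresponding Makefile settings, assuming mpif90 is the MPI-Fortran compiler wrapper on your system (the actual variable layout in TsunAWI's Makefile may differ):

# MPI-Fortran compiler wrapper (mpif90 is an assumption; use mpiifort, ftn, ... as appropriate)
MPIF90 = mpif90
# uncomment to compile with MPI support
MPI_FLAG = -DUSE_MPI

followed by a fresh make.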

Option 2: cmake

Enable the MPI option when configuring with cmake. Default: OFF.
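
A hedged configuration sketch for an out-of-source build; the option name MPI is an assumption here, please check TsunAWI's CMakeLists.txt for the actual switch:

mkdir build && cd build
cmake -DMPI=ON ..
make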


Before you can run ompTsuna.x with a given number of MPI tasks, the mesh must be partitioned into subdomains with partit.x. partit.x reads the number of partitions np from the command line and MeshPath from the namelist, computes the partitioning, and writes the information to MeshPath/dist_np. This partitioning can be reused for any subsequent MPI run with the same number of tasks, as long as elem2d.out and nod2d.out in MeshPath are not changed.
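
As a sketch, the relevant namelist entry could look as follows; the group name mesh_nml is a hypothetical placeholder, please use the namelist file shipped with TsunAWI:

&mesh_nml
  MeshPath = '/path/to/mesh/'   ! partit.x writes its output to MeshPath/dist_np
/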

The following example has to be adjusted to the desired number of MPI tasks, to the command used on your system to launch MPI programs (here mpiexec), and to the way binding of MPI tasks and OpenMP threads to compute cores is specified.

./partit.x 16
mpiexec -np 16 ./ompTsuna.x
# adjust parameters or bathymetry/topography data file and
# run another simulation on the same nod2d.out, elem2d.out
mpiexec -np 16 ./ompTsuna.x

# Or you can use the same decomposition to run in hybrid
# (MPI and OpenMP) mode, here on 4*16=64 compute cores:
export OMP_SCHEDULE=dynamic
<settings to ensure binding>
mpiexec -np 16 ./ompTsuna.x
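
What the binding placeholder could look like depends on your MPI library and batch system. The following is only a sketch with the standard OpenMP environment variables and Open MPI-style mpiexec options; adapt the names and values to your setup:

# 4 OpenMP threads per MPI task, pinned to cores
export OMP_NUM_THREADS=4
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# Open MPI example: reserve 4 cores per task and bind each task to its cores
mpiexec -np 16 --map-by slot:PE=4 --bind-to core ./ompTsuna.x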

Hierarchical partitioning

Hierarchical partitioning that reflects the computer architecture is possible. For example, if you want to run on 5 compute nodes of a Linux cluster, each node with 2 CPUs and 24 compute cores per CPU,

./partit.x 5 2 24

will divide the mesh into 5 larger subdomains; each of these is then split into 2 smaller subdomains, which in turn are split into 24 subdomains each, resulting in a total of 240 small subdomains. Compared to ./partit.x 240, the communication pattern will be slightly better, with fewer messages to be sent through the cluster’s interconnect. Remark: the resulting partition information is written to dist_240; the path name does not reflect the hierarchy!

So far, we have achieved the best performance with hybrid runs that join 2 to 4 OpenMP threads in one MPI task. On the given example hardware, a good setup would be:

./partit.x 5 2 6
export OMP_SCHEDULE=dynamic
<settings to ensure binding>
mpiexec -np 60 ./ompTsuna.x
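
For the example machine (5 nodes with 2 CPUs of 24 cores each), a batch job could be sketched as follows; the SLURM directives are assumptions, and other schedulers need their own syntax:

#!/bin/bash
#SBATCH --nodes=5
#SBATCH --ntasks-per-node=12
#SBATCH --cpus-per-task=4
export OMP_NUM_THREADS=4
export OMP_SCHEDULE=dynamic
export OMP_PROC_BIND=close
# 5*12=60 MPI tasks with 4 OpenMP threads each = 240 cores
srun ./ompTsuna.x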


The MPI implementation is still experimental and does not yet cover all features of TsunAWI. As the coverage will increase step by step, we refrain from compiling a list of supported features here. In many code parts that are not yet MPI parallel, a warning is issued.

Mesh partitioning with Metis

The Metis source code is included with TsunAWI, with a build scheme modified to work seamlessly with TsunAWI’s Makefile and cmake approaches.

Please refer to the Metis homepage and [KK99].


[KK99] George Karypis and Vipin Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20(1):359–392, 1999. doi:10.1137/S1064827595287997.