So far we have run jobs on single nodes, but as we have already seen, the power of HPC systems comes from parallelism, i.e. having lots of processors, disks, etc. connected together rather than having more powerful components than your laptop or workstation. Often, when running research programs on HPC, you will need to run a program that has been built to use the MPI (Message Passing Interface) parallel library. The MPI library allows programs to exploit multiple processing cores in parallel, so that researchers can model or simulate larger problems faster. The details of how MPI works are not important for this course, or even for using programs that have been built with MPI; however, MPI programs typically have to be launched in job submission scripts in a different way to serial programs, and users of parallel programs on HPC systems need to know how to do this. Specifically, launching a parallel MPI program typically requires four things, which we will see when we write the job script below.
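As a minimal illustration of the difference (the program names here are just placeholders, not programs used in this lesson), a serial program is started directly, whereas an MPI program is started through a launcher such as mpirun, which starts several copies of the program that then communicate with each other:

./my_serial_program                    # serial: a single process

mpirun -np 4 ./my_parallel_program     # MPI: the launcher starts 4 communicating processes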
Below is a simple parallel implementation of the hello world program written in Fortran.
PROGRAM hello_world_mpi
! mpif.h provides the MPI constants and interfaces (e.g. MPI_COMM_WORLD)
include 'mpif.h'
integer :: process_Rank, size_Of_Cluster, ierror, resultlen
character(len=MPI_MAX_PROCESSOR_NAME) :: name

! Start MPI, then ask for the total number of processes, this process's
! rank and the name of the node it is running on
call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)
call MPI_GET_PROCESSOR_NAME(name, resultlen, ierror)

print *, 'Hello World from process: ', TRIM(name), process_Rank, ' of ', size_Of_Cluster

! Shut MPI down cleanly before the program ends
call MPI_FINALIZE(ierror)
END PROGRAM
Create a new file named hello_world_mpi.f90 with the above in it. In order to run the program we must first compile it with the following command (make sure you have the openmpi module loaded first):
mpif90 hello_world_mpi.f90 -o hello_world_mpi
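The mpif90 compiler wrapper is provided by the MPI module, so the module has to be loaded before compiling. As a sketch, using the same module name that appears in the job script below (module names vary between clusters, so check yours with module avail):

module load mpi/openmpi-x86_64
mpif90 hello_world_mpi.f90 -o hello_world_mpi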
When you have issued this command you should have an executable called hello_world_mpi in your directory. We will now run it on our cluster. In order to run hello_world_mpi in parallel we will need to write a run script.
#!/bin/bash
#SBATCH --account=prj0_phase1
#SBATCH --job-name=myjob
#SBATCH --partition=shortJob
#SBATCH --qos=shortJob
#SBATCH --time=0-00:15:00
#SBATCH --output=%x.%j.o
#SBATCH --error=%x.%j.e
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=1
module load mpi/openmpi-x86_64
mpirun -np 4 ./hello_world_mpi
We can see there have been some new additions:
- --ntasks= : this tells Slurm the total number of processes we would like for the job.
- --ntasks-per-node= : this tells Slurm the number of processes to place on each node.
- module load mpi/openmpi-x86_64 : in order to run the job using MPI we need to have the MPI module loaded.
- mpirun -np 4 ./hello_world_mpi : this actually runs the program. mpirun is the launcher that starts the program in parallel; -np 4 tells mpirun we want 4 processes, and ./hello_world_mpi is the path to our program.

It is possible to have fine-grained control over where your MPI processes are placed (see man sbatch or https://slurm.schedmd.com/sbatch.html for more information).
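The script is submitted in exactly the same way as the serial jobs we have run so far. As a sketch (the script file name here is just an example; the output file name follows from the --output=%x.%j.o directive):

sbatch hello_world_mpi.slurm    # submit the job
squeue -u $USER                 # check the state of your jobs in the queue
cat myjob.*.o                   # once the job has finished, look at the output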
It is also possible to simply ask for a certain number of nodes with the #SBATCH --nodes= directive. By combining the --nodes, --ntasks and --ntasks-per-node directives you can tell Slurm the compute resources you wish to use; see man sbatch for the full details.
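For example, here is a sketch of an alternative set of directives that also gives 4 MPI processes, assuming the nodes on your cluster have at least 2 cores each (this is not part of the script above):

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2

This asks for 2 nodes with 2 tasks on each, i.e. 4 processes in total, and mpirun is then used to launch the program exactly as before.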