Parallel Jobs

Overview:

  • Teaching: 10 min
  • Exercises: 10 min

Questions

  • How do I run a job on multiple compute nodes?
  • How do I check the restrictions on the partitions?

Objectives

  • Compile and run a job on several nodes.
  • Check what limits are on different partitions.

Running a parallel job

So far we have run jobs on single nodes, but as we have already seen, the power of HPC systems comes from parallelism, i.e. having lots of processors, disks, etc. connected together rather than having more powerful components than your laptop or workstation. Often, when running research programs on HPC, you will need to run a program that has been built to use the MPI (Message Passing Interface) parallel library. The MPI library allows programs to exploit multiple processing cores in parallel, so that researchers can model or simulate faster and on larger problem sizes. The details of how MPI works are not important for this course, or even for using programs that have been built with MPI; however, MPI programs typically have to be launched in job submission scripts in a different way to serial programs, and users of parallel programs on HPC systems need to know how to do this. Specifically, launching parallel MPI programs typically requires four things:

  • A special parallel launch program such as mpirun, mpiexec, srun or aprun.
  • A specification of how many processes to use in parallel. For example, our parallel program may use 256 processes in parallel.
  • A specification of how many parallel processes to use per compute node. For example, if our compute nodes each have 32 cores we often want to specify 32 parallel processes per node.
  • The command and arguments for our parallel program.
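
Putting these together, a parallel launch line often looks something like the one below. This is only a sketch: my_parallel_program is a hypothetical executable, and the per-node placement is usually handled by the batch system rather than on the command line, as we will see later in this episode.

mpirun -np 256 ./my_parallel_program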

Hello world

Below is a simple parallel implementation of the hello world program, written in Fortran.

PROGRAM hello_world_mpi
! Pull in the MPI constants and interfaces
include 'mpif.h'

integer process_Rank, size_Of_Cluster, ierror, tag, status, resultlen

character*(MPI_MAX_PROCESSOR_NAME) name

! Start up MPI, then ask how many processes there are in total,
! which rank this process is, and the name of the node it is running on
call MPI_INIT(ierror)
call MPI_COMM_SIZE(MPI_COMM_WORLD, size_Of_Cluster, ierror)
call MPI_COMM_RANK(MPI_COMM_WORLD, process_Rank, ierror)
call MPI_GET_PROCESSOR_NAME(name, resultlen, ierror)

! Each process prints its own line of output
print *, 'Hello World from process: ', TRIM(name), process_Rank, 'of ', size_Of_Cluster

! Shut MPI down cleanly
call MPI_FINALIZE(ierror)
END PROGRAM

Create a new file named hello_world_mpi.f90 containing the code above. To run the program we must first compile it with the following command (make sure you have the openmpi module loaded first):

mpif90 hello_world_mpi.f90 -o hello_world_mpi

When you have issued this command you should have an executable in your directory called hello_world_mpi. We will now run it on our cluster.
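
If the mpif90 command is not found when you try to compile, the MPI module is probably not loaded. The exact module name varies between systems; the one below matches the batch script used later in this episode.

module load mpi/openmpi-x86_64
mpif90 hello_world_mpi.f90 -o hello_world_mpi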

Running in Parallel

In order to run the hello_world_mpi program in parallel we will need to write a run script.

#!/bin/bash

#SBATCH --account=prj0_phase1
#SBATCH --job-name=myjob
#SBATCH --partition=shortJob
#SBATCH --qos=shortJob
#SBATCH --time=0-00:15:00
#SBATCH --output=%x.%j.o
#SBATCH --error=%x.%j.e
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=1

module load mpi/openmpi-x86_64
mpirun -np 4 hello_world_mpi

We can see there have been some new additions:

  • --ntasks=: this tells Slurm the total number of processes we would like for the job.
  • --ntasks-per-node=: this tells Slurm the number of processes to place on each node.
  • module load mpi/openmpi-x86_64: in order to run the job using MPI we need to have the MPI module loaded.
  • mpirun -np 4 hello_world_mpi: this actually runs the program. mpirun is the launcher that runs our program in parallel, -np 4 tells mpirun we want 4 processes, and hello_world_mpi is the name of our program.
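
To try the script out, save it to a file and hand it to the scheduler with sbatch in the usual way. The filename here is just an example:

sbatch hello_world_mpi.slurm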

Submit the parallel job

Compile the program using the instructions above and submit it with the run script shown above. What happens? Can you figure out why? (Hint: scontrol can also be used to give you information about partitions.)
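
For example, to see the limits on the partition used in the script above you could run something like the command below; fields such as MaxNodes and MaxTime show the restrictions the partition imposes.

scontrol show partition shortJob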

Solution

Submit the parallel job to an appropriate partition

Using the scontrol command, check the node limit on the bigJob partition and submit a job with the maximum number of nodes.

Check the output to see where each MPI process is running.
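
One way to approach this (a sketch; it assumes the partition is called bigJob as above and that its QOS has the same name, which you should check on your own system):

scontrol show partition bigJob

If MaxNodes turned out to be, say, 4, the relevant lines of the run script would become:

#SBATCH --partition=bigJob
#SBATCH --qos=bigJob
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=1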

Configuring parallel jobs

On our test cluster we have 4 nodes, each with 1 CPU. Typically on HPC systems each node will have multiple CPUs.

What Slurm directives and mpirun argument would you use to run a job that uses every CPU on three nodes of a partition whose nodes each have 16 CPUs?

Solution
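
One possible answer, as a sketch: three nodes with 16 CPUs each gives 3 × 16 = 48 processes in total, so the directives and launch line could be:

#SBATCH --ntasks=48
#SBATCH --ntasks-per-node=16

mpirun -np 48 hello_world_mpi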

It is possible to have fine-grained control over where your MPI processes are placed (see man sbatch or https://slurm.schedmd.com/sbatch.html for more information).
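
For example, one of the options described there is --distribution, which controls how tasks are spread across nodes. The line below is only a sketch; check man sbatch on your own system for the variants it supports.

#SBATCH --distribution=cyclic   # place consecutive tasks on consecutive nodes, round-robin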

It is also possible to simply ask for a certain number of nodes:

#SBATCH --nodes=
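
For example, the following sketch asks for two whole nodes and places one MPI process on each of them:

#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1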

Key Points

  • Using the --ntasks and --ntasks-per-node directives you can tell Slurm the compute resources you wish to use.
  • You can also simply specify the number of nodes with the #SBATCH --nodes= directive.
  • For more information on placing processes and finer control, see man sbatch.