Below are example submission scripts for Nimbus to illustrate the typical workflow.
These examples copy the run data from the submission directory in the /campaign/ area to the fast local disk mounted at /mnt/resource/ before the run, and copy any necessary files back to /campaign/ afterwards.
The script then cleans up by removing the working directory in the /mnt/resource area (this is done automatically, but it is a good habit to get into in case future storage developments change this).
There are many ways to structure the logic in your submission scripts to handle the data transfer between /campaign/ and /mnt/resource; the following examples can be adapted to your own needs. Typically they will be submitted from the /campaign/ storage area.
Finally, if the local disk at /mnt/resource/ is not big enough for your outputs, you can use the $BURSTBUFFER environment variable, which points to a folder set up specifically for each run.
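For example, a run script could use this per-job folder as its work directory in place of a fixed path on the local disk. The sketch below follows the same copy-over pattern as the examples further down; my_input_file is just a placeholder for your own input files.
# use the per-job folder provided by $BURSTBUFFER as the work directory
workdir=$BURSTBUFFER
cp my_input_file $workdir
cd $workdir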
Every line that starts with #SBATCH is a directive for slurm; this is where we tell slurm the resources we want for our job.
#SBATCH --account= tells slurm which account you wish to run against. The account code used in your run script, #SBATCH --account=ACCOUNT_CODE, should match the resource allocation you wish to run your job against. If you don't know your resource allocation code, check the Research Computing Account Management portal at rcam.bath.ac.uk (you need to be on the University's VPN with "All traffic" selected), ask your account administrator, or use the command sacctmgr show associations user=userid --parsable2. This command will also tell you the limits on the account, and which QOS (and partitions) you have access to.
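For example, on the login node ($USER expands to your own username):
sacctmgr show associations user=$USER --parsable2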
#SBATCH --job-name= gives the job a name, which you can identify in the queue - you can call the job whatever helps you to identify it.
#SBATCH --partition= tells slurm what partition to put the job on.
#SBATCH --qos= tells slurm what Quality of Service (QOS) you wish to run with. The QOS is simply a way for admins to apply rules to the resources you can access and the priority of the job. For Nimbus HPC systems the QOS will match the partition name - and remember, if you don't include it you will get an error.
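For example, a job aimed at the spot-hbv3-120 partition (as in the OpenFOAM example below) pairs the two directives like this:
#SBATCH --partition=spot-hbv3-120
#SBATCH --qos=spot-hbv3-120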
#SBATCH --time= tells slurm how long you wish to run the job for, with several acceptable formats: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes" and "days-hours:minutes:seconds", e.g. 1-01:20:00 will request a runtime of 25 hours and 20 minutes.
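For example, the following forms all request the same 25 hours and 20 minutes:
#SBATCH --time=1-01:20:00   # days-hours:minutes:seconds
#SBATCH --time=25:20:00     # hours:minutes:seconds
#SBATCH --time=1520         # minutes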
Create a text file named test.txt in your campaign folder with the following contents:
My first nimbus run
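One way to create it, from your campaign folder:
echo "My first nimbus run" > test.txt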
Now we can create a runscript to copy this file over to the fast local storage at /mnt/resource/, print the file's contents to an output file called my_output.txt, and copy the output back at the end of the run.
Hint: You can check what accounts you have access to with sacctmgr, and you will also have access to a folder /campaign/account_code (the account code itself is not case sensitive, but the campaign folder name will be in capitals).
#!/bin/bash
#SBATCH --account=account_code_here
#SBATCH --job-name=JOB_NAME
#SBATCH --output=%x.%j.o
#SBATCH --error=%x.%j.e
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --partition=spot-fsv2-1
#SBATCH --qos=spot-fsv2-1
#SBATCH --time=04:00:00
# set campaigndir as our current working directory for copy back
campaigndir=$(pwd)
# create a workdir on the fast local disk
workdir=/mnt/resource/workdir
mkdir -p $workdir
# copy our input file over to our workdir
cp test.txt $workdir
# change dir to our workdir
cd $workdir
# do our run
cat test.txt > my_output.txt
# cp our output back to our campaigndir
cp my_output.txt $campaigndir/
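# finally, clean up by removing the work directory on the local disk
# (this happens automatically, but it is a good habit - see the note above)
rm -rf $workdir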
We will run an example job on the hbv3-120 instance using the OpenFOAM module
OpenFOAM/v2012-foss-2020a
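You can check that this module is available with the standard module tooling, for example:
module avail OpenFOAM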
An example run script - create a file named run_job.slm with the following contents (remembering to replace the account code with that of your resource allocation):
#!/bin/bash
#SBATCH --account=prj3_phase1
#SBATCH --job-name=JOB_NAME
#SBATCH --output=JOB_NAME.%j.o
#SBATCH --error=JOB_NAME.%j.e
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=120
#SBATCH --partition=spot-hbv3-120
#SBATCH --qos=spot-hbv3-120
#SBATCH --time=04:00:00
# Load the openFOAM module
module purge
module load OpenFOAM/v2012-foss-2020a
# as an example we will copy the damBreak tutorial to our current directory
cp -r $WM_PROJECT_DIR/tutorials/multiphase/interFoam/laminar/damBreak ./
cd damBreak
# set campaigndir as our current working directory for copy back
campaigndir=$(pwd)
localdisk=/mnt/resource
workdir=$localdisk/workdir
mkdir -p $workdir
# Copy any inputs required to the work directory:
# excluding any files with a <JOB_NAME> prefix (whatever your job name actually is), so your
# output and error files don't get overwritten on the copy back
rsync -aP --exclude=JOB_NAME.* $campaigndir/* $workdir
cd $workdir;
echo "Work directory" $workdir ;
# source the foamDotFile and do the run
source $WM_PROJECT_DIR/etc/bashrc
./Allrun
# Copy back any results you need to campaign.
srun cp -Rf $workdir/* $campaigndir/
# Clean up the work directory on the local disk
rm -rf $workdir
And to run the job, issue the following command in the terminal:
sbatch run_job.slm
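Once submitted, you can check the job's progress in the queue with the standard slurm tools, for example:
squeue -u $USER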
It is also possible to obtain a compute node and work interactively.
Issuing the following command in the terminal will request a spot-hbv3-120 compute instance for 6 hours:
srun --partition spot-hbv3-120 --nodes 1 --account prj4_phase1 --qos spot-hbv3-120 --job-name "interactive" --cpus-per-task 120 --time 6:00:00 --pty bash
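Once the session starts you will have a shell on the compute node, where you can load modules and run commands directly, for example:
module purge
module load OpenFOAM/v2012-foss-2020a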