Slurm

Overview:

  • Teaching: 10 min
  • Exercises: 0 min

Questions

  • What is a scheduler?
  • How can I use Slurm to manage and run my jobs?
  • What Slurm commands can I use to explore the system?

Objectives

  • Know that the scheduler manages jobs on the service
  • Know how to interact with Slurm to:
    • see what jobs are running
    • cancel jobs
    • find out information about my jobs

Scheduler

Like Balena, Nimbus (the cloud HPC service) uses a scheduler to manage how jobs are run and how resources are allocated.

If multiple jobs ran on a single node at the same time, users would be competing for the same resources and jobs would take longer to run overall. A scheduler instead manages individual jobs, allocating each one the resources it needs as they become available. This results in higher overall throughput and more consistent performance.

The scheduler used by Nimbus is the same as that used by its predecessor Balena: Slurm, the Simple Linux Utility for Resource Management.

Slurm: Simple Linux Utility for Resource Management

Interacting with the scheduler is done through the terminal using a set of commands.

Below are a number of key Slurm commands:

Slurm command                    Function
sinfo                            View information about Slurm nodes and partitions
squeue                           List status of jobs in the queue
squeue --user [userid]           Jobs by user
squeue --job [jobid]             Jobs by jobid
sbatch [jobscript]               Submit a jobscript to the scheduler
scancel [jobid]                  Cancel a job in the queue
scontrol hold [jobid]            Hold a job in the queue
scontrol release [jobid]         Release a held job
scontrol show job [jobid]        View information about a job
scontrol show node [nodename]    View information about a node
scontrol show license            View licenses available on Slurm
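
As an illustration of how these commands fit together, here is a minimal sketch of a typical session: look at the system, submit a job, check on it, then cancel it. The jobscript name myjob.slurm is hypothetical, as is the helper function parse_job_id; the sketch relies on sbatch printing a confirmation line of the form "Submitted batch job [jobid]". Treat it as a starting point rather than the recommended workflow for Nimbus.

```shell
#!/bin/bash
# Sketch of a typical Slurm session using the commands listed above.
# Assumption: "myjob.slurm" is a hypothetical jobscript in the current directory.

# Helper (hypothetical): pull the job ID out of sbatch's confirmation line,
# which has the form "Submitted batch job <jobid>".
parse_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

if command -v sbatch >/dev/null 2>&1; then
    # See which nodes and partitions are available
    sinfo

    # Submit the jobscript and remember the job ID
    jobid=$(sbatch myjob.slurm | parse_job_id)

    if [ -n "$jobid" ]; then
        # List my jobs in the queue
        squeue --user "$USER"

        # View detailed information about the job
        scontrol show job "$jobid"

        # Cancel the job
        scancel "$jobid"
    else
        echo "Job submission failed; check the jobscript." >&2
    fi
else
    echo "Slurm commands not found; run this on a login node." >&2
fi
```

Capturing the job ID at submission time means the later squeue, scontrol and scancel calls do not depend on copying the number by hand.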

Key Points:

  • We use a scheduler to manage jobs on the cloud HPC service
  • Nimbus uses the Slurm scheduler
  • Key commands are:
    • sbatch to submit jobs
    • sinfo to view information about the service
    • squeue to view the queue
    • scancel to cancel a job

You can find further information about Slurm and its commands here: http://slurm.schedmd.com/