Like Balena, Nimbus (the cloud HPC service) uses a scheduler to manage how jobs are run and how resources are allocated.
If multiple jobs ran on a single node at the same time, users would compete for the same resources and jobs would take longer to run overall. A scheduler instead manages individual jobs, allocating each one to the resources it needs as they become available. This results in higher overall throughput and more consistent performance.
The scheduler used by Nimbus is the same as that used by its predecessor Balena: Slurm, the _Simple Linux Utility for Resource Management_.
Interacting with the scheduler is done through the terminal using a set of commands.
Below are a number of key Slurm commands:
Slurm command | Function |
---|---|
sinfo | View information about Slurm nodes and partitions |
squeue | List the status of jobs in the queue |
squeue --user [userid] | List jobs belonging to a user |
squeue --job [jobid] | List a job by its job ID |
sbatch [jobscript] | Submit a jobscript to the scheduler |
scancel [jobid] | Cancel a job in the queue |
scontrol hold [jobid] | Hold a job in the queue |
scontrol release [jobid] | Release a held job |
scontrol show job [jobid] | View information about a job |
scontrol show node [nodename] | View information about a node |
scontrol show license | List the licenses available through Slurm |
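
As an illustration of how these commands fit together, the sketch below writes a minimal jobscript, submits it, and then inspects the job. The file name `hello.slurm`, the job name, and the resource requests are hypothetical values chosen for the example. Nimbus may also require options not shown here, such as a partition or an account, so treat this as a starting point rather than a ready-made Nimbus jobscript.

```bash
#!/usr/bin/env bash
# Write a minimal jobscript (hypothetical name: hello.slurm).
# Lines starting with #SBATCH are read by sbatch as submission options.
cat > hello.slurm <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --output=hello-%j.out

echo "Running on $(hostname)"
EOF

# Only contact the scheduler if Slurm is installed on this machine.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable makes sbatch print only the job ID,
    # optionally followed by ";clustername".
    jobid=$(sbatch --parsable hello.slurm) || exit 1
    jobid=${jobid%%;*}

    squeue --user "$(id -un)"      # list your jobs in the queue
    scontrol show job "$jobid"     # show details of the submitted job
    # scancel "$jobid"             # uncomment to cancel the job
else
    echo "sbatch not found: run this on a machine with Slurm installed" >&2
fi
```

In the output file name, `%j` is replaced by the job ID, so each submission writes to its own file.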