Storage

Overview:

  • Teaching: 10 min
  • Exercises: 2 min

Questions

  • What storage is available on Nimbus?
  • How does this affect my workflow?
  • Where should I keep my data?

Objectives

  • Understand the storage structure of the new system.
  • Know where to keep your data, and how to stage data for runs.
  • Understand the costs associated with storage.

Filesystem on Nimbus

The filesystem on Nimbus, the University of Bath's Cloud HPC facility, has the following layout:

[Figure: diagram of the Nimbus filesystem layout]

The areas of the filesystem you should know about are:

  • /shared/home: 2TB Azure Managed Disk. Each user is given a home folder with a strict quota of 5GB; you can check your usage with quota -s (see the example after this list). This area can be accessed from both the login and compute nodes. The disk is backed up daily, and could be used for storing, for example:

    • Code/scripts for your calculations
    • Template jobscripts
  • /campaign/: each resource allocation has an associated campaign storage area located at /campaign/<resource_allocation_id>. The quota for this space is set to 50GB per account; account admins can allocate it to each Resource Allocation in the research computing account management portal. Campaign storage is where you keep the files required for a current project. It is not designed for long-term archival storage, and data should be moved out of this area regularly. Campaign storage is a resizable logical file system based on logical volume management (LVM), built across multiple managed disks: initially one 16TB disk, which can be grown to 64TB, with the ability to add more disks.

  • /apps/: 1TB Azure Managed Disk. This space contains centrally compiled software available to all users.

  • /u/: The University H drive is mounted here and can be accessed from the login nodes only.

  • /mnt/resource/: Some compute instances have fast local storage in this area, accessible only from the compute node, e.g. HBv3 (450GB), HBv2 (450GB), HB (450GB), HC44 (650GB). Your jobs should be run here on the compute node in the first instance. Any files copied or created here will be cleaned up at the end of the job, so copy any results you need back to /campaign/ before the job finishes (see the sketch after this list).

  • /burstbuffer/*/: When running a job on a compute node, if your data will not fit in the fast local storage at /mnt/resource you can access one of 26 blob stores in /burstbuffer/*/. An environment variable $BURSTBUFFER is made available in the environment while the job runs, and the job-specific burst buffer is destroyed at the end of the job. You can use this space to store temporary files that are needed during a job run, but not afterwards: e.g. by staging your input data in $BURSTBUFFER, running your job there, and then copying only the outputs you require back to either /campaign/ or your home folder (a jobscript sketch is given in the Environment Variables section below). The blob performance is 60 GBps, shared by the jobs running on each blob store (blob stores are allocated in a round-robin fashion).

  • /x/: The parts of the University Research x-drive that can be mounted on the cluster are found here.
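
For example, to check your current usage against the 5GB home quota:

quota -s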
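
A sketch of the stage-in/run/stage-out pattern for the fast local storage inside a jobscript, where myproject and my_simulation are placeholder names for your own resource allocation and program:

# stage inputs from campaign storage to the fast local disk
cp /campaign/myproject/input.dat /mnt/resource/
cd /mnt/resource

# run the calculation from the fast local storage
./my_simulation input.dat > output.dat

# local files are cleaned up when the job ends,
# so copy the results you need back to campaign storage
cp output.dat /campaign/myproject/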

Environment Variables

There are several environment variables to help you navigate the filesystem.

  • $HOME = /shared/home/<username>: your home directory
  • $BUCSHOME = /u/u/<username>: your University H folder
  • $BURSTBUFFER = /burstbuffer/[a-z]/<slurm_job_id>: burst buffer blob store for use during a job
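
Note that $BURSTBUFFER only exists while a job is running, so you cannot try it from a login shell. A minimal jobscript sketch of the burst buffer pattern, again with myproject and my_simulation as placeholder names:

#!/bin/bash
#SBATCH --job-name=bb-example

# stage inputs into the job-specific burst buffer
cp /campaign/myproject/input.dat $BURSTBUFFER/
cd $BURSTBUFFER
./my_simulation input.dat > output.dat

# the burst buffer is destroyed at the end of the job,
# so copy the outputs you need back to campaign storage
cp output.dat /campaign/myproject/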

Try these commands

echo $HOME
echo $BUCSHOME
cd $BUCSHOME
pwd

Go home

How can we use the environment variables we've used above to return to our home directory?

Solution
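
Your home directory is stored in the $HOME environment variable, so:

cd $HOME
pwd

Running cd with no arguments will also return you to your home directory.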

Storage Costs

Storage is currently not charged; this will change once a low-cost tier-three storage option is in place. At that point campaign storage will be charged, and account administrators will be able to allocate campaign storage and see usage and storage transactions in the research computing account management portal.

In the meantime, we encourage users to make moving data around a fundamental part of their workflow, in order to optimise their own costs and make the most efficient use of the storage options available.

Key Points:

  • There are different storage areas on the Nimbus system.
  • Only your home area (/shared/home) is backed up.
  • Data required for your current project workflow should be kept in /campaign/.
  • You should run jobs in the fast local storage at /mnt/resource/ - make sure to copy any data you want to keep back to /campaign/.
  • If your data will not fit in /mnt/resource/, you can use $BURSTBUFFER, an environment variable pointing to temporary storage that is available for the duration of your job - again, make sure to copy any data you want to keep back to /campaign/.
  • You can use environment variables to navigate the filesystem.
  • Storage is currently not being charged, but will be in the future, so getting into good habits with respect to data management (and encouraging others to do the same) is certainly a good idea.