Transferring Files

Overview:

  • Teaching: 10 min

Questions

  • How do I get files onto and off an HPC system?

Objectives

  • Be able to move your files to and from an HPC cluster

Transferring files

A remote computer is of limited use if we cannot get files to or from it. There are several options for transferring data between computing resources, from command-line tools to GUI programs, which we will cover here.

Download files from the internet using wget

One of the most straightforward ways to download files is to use wget. Any file that can be downloaded in your web browser with an accessible link can be downloaded using wget. This is a quick way to download datasets or source code.

The syntax is: wget https://some/link/to/a/file.tar.gz. For example, download an example file with the following command:

jupyter-user:$ wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.17.2.tar.xz

Check that the file was indeed downloaded:

jupyter-user:$ ls
linux-4.17.2.tar.xz

And then let’s clean it up:

jupyter-user:$ rm linux-4.17.2.tar.xz
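
If a file with the same name already exists, wget by default keeps it and saves the new download under a name with .1 appended. The sketch below is one possible way to avoid such duplicate downloads in a script. It is not part of the original lesson: the helper name fetch is hypothetical, and it assumes wget is installed. It takes the file name from the last part of the URL and only downloads when that file is not already present.

```shell
# URL from the wget example in this section
url="https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.17.2.tar.xz"

# fetch is a hypothetical helper name, not a standard command.
# It downloads a URL into the current directory unless a file
# with the same name is already there.
fetch() {
    file="${1##*/}"              # keep only the text after the last "/"
    if [ -e "$file" ]; then
        echo "$file already exists, skipping download"
    else
        wget "$1"
    fi
}
```

Running fetch "$url" once downloads the file; running it a second time prints the skip message instead of downloading again.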

Transferring single files and folders with scp

To copy a single file to or from the cluster, we can use scp. The syntax can be a little complex for new users, but we’ll break it down here:

To transfer to another computer:

user@laptop:~$ scp /path/to/local/file.txt userid@nimbus.hpc.bath.ac.uk:/path/on/remote/computer

To download from another computer:

user@laptop:~$ scp userid@nimbus.hpc.bath.ac.uk:/path/on/remote/computer/file.txt /path/to/local/

Note that we can simplify this by shortening our paths. On the remote computer, a path after the : that does not start with / is relative to our home directory. If we don’t care where the file goes, we can add just the : and leave it at that; the file then lands in our home directory.

user@laptop:~$ scp local-file.txt userid@nimbus.hpc.bath.ac.uk:

To recursively copy a directory, we just add the -r (recursive) flag:

user@laptop:~$ scp -r some-local-folder/ userid@nimbus.hpc.bath.ac.uk:target-directory/
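
The remote side of every scp command above follows one pattern: username, @, host name, :, then an optional path. As a small illustration, the sketch below builds these strings so the pattern is explicit. The helper name remote is hypothetical, and userid is a placeholder for your own cluster username.

```shell
# Placeholders: replace userid with your own cluster username.
user="userid"
host="nimbus.hpc.bath.ac.uk"

# remote is a hypothetical helper, not part of scp.
# It prints user@host:path; an empty path means the remote home directory.
remote() {
    printf '%s@%s:%s\n' "$user" "$host" "$1"
}

remote ""                    # prints userid@nimbus.hpc.bath.ac.uk:
remote "target-directory/"   # prints userid@nimbus.hpc.bath.ac.uk:target-directory/
```

With this in place, scp local-file.txt "$(remote "")" is the same command as the shortened example above.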

Transfer a file

Open a terminal on your system and try using the scp command to send a file to nimbus.

rsync

As you gain experience with transferring files, you may find the scp command limiting. The rsync utility provides advanced features for file transfer and is typically faster than both scp and sftp (see below). It is especially useful for transferring large files or many files, and for creating synced backup folders.

The syntax is similar to scp. To transfer to another computer with commonly used options:

[local]$ rsync -avzP /path/to/local/file.txt userid@nimbus.hpc.bath.ac.uk:/path/on/remote/computer

The a (archive) option preserves file timestamps and permissions among other things; the v (verbose) option gives verbose output to help monitor the transfer; the z (compression) option compresses the file during transit to reduce size and transfer time; and the P (partial/progress) option preserves partially transferred files in case of an interruption and also displays the progress of the transfer.
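
Each of these single-letter options also has a long spelling, which can be easier to read in scripts. The sketch below writes the same transfer with long options. It prints the command with echo rather than running it, so it is safe to try without cluster access; the paths and userid are the same placeholders used above.

```shell
# Long spellings of the options in -avzP.
# -P is shorthand for the two options --partial and --progress.
opts="--archive --verbose --compress --partial --progress"

# echo prints the command for inspection; remove "echo" to run the transfer.
echo rsync $opts /path/to/local/file.txt userid@nimbus.hpc.bath.ac.uk:/path/on/remote/computer
```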

To recursively copy a directory, we can use the same options:

[local]$ rsync -avzP /path/to/local/dir userid@nimbus.hpc.bath.ac.uk:/path/on/remote/computer

The a (archive) option implies recursion, so no separate -r flag is needed.
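
One detail worth knowing when copying directories: rsync treats a source path differently depending on whether it ends in a slash. Without the trailing slash, the directory itself is created inside the destination; with it, only the directory's contents are copied. The sketch below shows the difference using temporary local directories, so no cluster access is needed. The directory names are made up for the demonstration.

```shell
# Local-only demonstration in a temporary directory; no cluster needed.
demo=$(mktemp -d)
mkdir -p "$demo/dir" "$demo/with-slash" "$demo/without-slash"
touch "$demo/dir/file.txt"

# Skip quietly when rsync is not installed on this machine.
if command -v rsync >/dev/null 2>&1; then
    rsync -a "$demo/dir"  "$demo/without-slash/"   # copies the directory itself
    rsync -a "$demo/dir/" "$demo/with-slash/"      # copies only its contents
    ls "$demo/without-slash" "$demo/with-slash"
fi
```

Afterwards, without-slash contains dir/file.txt, while with-slash contains file.txt directly. Adding the -n (dry run) option to any rsync command lists what would be transferred without copying anything, which is a cheap way to check the paths first.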

To download a file, we simply change the source and destination:

[local]$ rsync -avzP userid@nimbus.hpc.bath.ac.uk:/path/on/remote/computer/file.txt /path/to/local/

Transferring files interactively with FileZilla (sftp)

FileZilla is a cross-platform client for downloading and uploading files to and from a remote computer. It is easy to use and generally works well. It uses the sftp protocol, which is also available on the command line through the sftp command.

Download and install the FileZilla client from https://filezilla-project.org. After installing and opening the program, you should see a window with a file browser of your local system on the left-hand side of the screen. When you connect to the cluster, your cluster files will appear on the right-hand side.

To connect to the cluster, we’ll just need to enter our credentials at the top of the screen:

  • Host: sftp://nimbus.hpc.bath.ac.uk
  • User: Your cluster username
  • Password: Your cluster password
  • Port: (leave blank to use the default port)

Hit “Quickconnect” to connect! You should see your remote files appear on the right-hand side of the screen. You can drag and drop files between the left (local) and right (remote) sides of the screen to transfer them.

Key Points

  • wget downloads a file from the internet.
  • scp transfers files to and from your computer.
  • rsync also transfers files, with extra features such as compression and keeping partially transferred files after an interruption.
  • You can use an SFTP client like FileZilla to transfer files through a GUI.