“High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.” - insideHPC
Lets consider seeveral research scenarios:
In all these cases, access to more (and larger) computers is needed. Those computers should be usable at the same time, solving many researchers’ problems in parallel.
Your PC is your local computing resource, good for small computational tasks. It is flexible, easy to set-up and configure for new tasks, though it has limited computational resources.
Let’s dissect what resources programs running on a laptop require:
When the task to solve become heavy on computations, the operations are typically out-sourced from the local laptop or desktop to elsewhere. Take for example the task to find the directions for your next business trip. The capabilities of your laptop are typically not enough to calculate that route in real time, so you use a website, which in turn runs on a computer that is almost always a machine that is not in the same room as you are. Such a remote machine is generically called a server.
The internet made it possible for these data centers to be far remote from your laptop.
The server itself has no direct display or input methods attached to it. But most importantly, it has much more storage, memory and compute capacity than your laptop will ever have. However, you still need a local device (laptop, workstation, mobile phone or tablet) to interact with this remote machine.
If the computational task or analysis to complete is daunting for a single server, larger agglomerations of servers are used. These go by the name of clusters or supercomputers.
A HPC system is typically described as a cluster as it is made up of a cluster
of computers, or compute nodes. Each individual compute node is typically a lot more powerful than any PC - i.e. more memory, many more and faster CPU cores.
The methodology of providing the input data, communicating options and flags as well as retrieving the results is quite different to using a plain laptop. Moreover, using a GUI style interface is often discarded in favor of using the command line. This imposes a double paradigm shift for prospect users:
But how doe we use the resources on the cluster?
Lets start with the idea of processing 1 input file to generate 1 output (result) file. On a personal computer this would happen with a single core in the CPU.
On a cluster we have access to many cores on a single node, so in theory we could split up the analysis of a single file into multiple distinct processes and use as many cores to speed up the generation of an output file. This is called multithreading, i.e. using multiple threads or cores.
Now, what if we had 3 input files? Well, we could process these files in serial, i.e. use the same core(s) over and over again, as shown in the image below.
This is great, but it is not as efficient as multithreading each analysis, and using a set of 8 cores for each of the three input samples. This is actually considered to be true parallelization.
As we have already seen, the power of HPC systems comes from parallelism, i.e. having lots of processors/disks etc. connected together rather than having more powerful components than your laptop or workstation.
Often, when running research programs on HPC you will need to run a program that has been built to use the MPI (Message Passing Interface) parallel library. The MPI library allows programs to exploit multiple processing cores in parallel to allow researchers to model or simulate faster on larger problem sizes. The details of how MPI work are not important for this course or even to use programs that have been built using MPI; however, MPI programs typically have to be launched in job submission scripts in a different way to serial programs and users of parallel programs on HPC systems need to know how to do this. Specifically, launching parallel MPI programs typically requires four things: