Introducing the Shell

Overview:

  • Teaching: 10 min
  • Exercises: 5 min

Questions

  • What is a command shell and why would I use one?

Objectives

  • Explain how the shell relates to the keyboard, the screen, the operating system, and users’ programs.
  • Explain when and why command-line interfaces should be used instead of graphical interfaces.

Background

At a high level, computers do four things:

  • run programs
  • store data
  • communicate with each other, and
  • interact with us

They can do the last of these in many different ways, including through a keyboard and mouse, touch screen interfaces, or speech recognition systems. While touch and voice interfaces are becoming more commonplace, most interaction is still done using screens, mice, touchpads and keyboards.

GUI vs CLI


We are all familiar with graphical user interfaces (GUI): windows, icons and pointers. They are easy to learn and fantastic for simple tasks where a vocabulary consisting of "click" translates easily into "do the thing I want". But this magic relies on wanting a simple set of things, and having programs that can do exactly those things.

If you wish to do complex, purpose-specific things it helps to have a richer means of expressing your instructions to the computer. It doesn't need to be complicated or difficult, just a vocabulary of commands and a simple grammar for using them.

This is what the shell provides - a simple language and a command-line interface to use it through.

The heart of a command-line interface is a read-evaluate-print loop (REPL). It is so called because when you type a command and press Enter, the shell reads it, executes (or "evaluates") it, prints the output, then prints the prompt and waits for you to enter another command.
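The loop described above can be sketched in a few lines of shell. This is only a toy illustration, not how Bash itself is implemented; the function name mini_repl and the prompt text are hypothetical.

```shell
# A toy read-evaluate-print loop. mini_repl and the "mini$ " prompt are
# hypothetical names used for illustration; a real shell does far more.
mini_repl() {
    printf 'mini$ '                  # print the prompt
    while IFS= read -r line; do      # read one line of input
        eval "$line"                 # evaluate it (any output is printed)
        printf 'mini$ '              # print the prompt again and wait
    done
    printf '\n'
}

# Feed the loop two commands, as if they had been typed at the keyboard.
printf 'echo hello\nwhoami\n' | mini_repl
```

Each pass through the while loop is one read, one evaluate and one print, followed by a fresh prompt.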

The Shell

This description makes it sound as though the user sends commands directly to the computer, and the computer sends output directly to the user. In fact, there is usually a program in between called a command shell. What the user types goes into the shell, which then figures out what commands to run and orders the computer to execute them. (Note that the shell is called “the shell” because it encloses the operating system in order to hide some of its complexity and make it simpler to interact with.)

A shell is a program like any other. What's special about it is that its job is to run other programs rather than to do calculations itself. The most popular Unix shell is Bash, the Bourne Again SHell (so-called because it's derived from a shell written by Stephen Bourne). Bash is the default shell on most modern implementations of Unix and in most packages that provide Unix-like tools for Windows.

Is it difficult?

It is a different model of interaction from a GUI, and it will take some effort, and some time, to learn. A GUI presents you with choices and you select one. With a command-line interface (CLI) the choices are combinations of commands and parameters, more like words in a language than buttons on a screen. They are not presented to you, so you must learn a few, like learning some vocabulary in a new language. But a small number of commands gets you a long way, and we'll cover those essential few today.

Flexibility and automation

The grammar of a shell allows you to combine existing tools into powerful pipelines and handle large volumes of data automatically. Sequences of commands can be written into a script, improving the reproducibility of workflows.
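As a small taste of what combining tools looks like, the sketch below chains three standard programs with the pipe symbol (|). The species names are made-up sample data, and the variable name unique_names is chosen only for this example.

```shell
# Hypothetical sample data: three species names, one of them repeated.
# printf writes one name per line, sort orders the lines alphabetically,
# and uniq removes adjacent duplicates.
unique_names=$(printf '%s\n' squid crab squid | sort | uniq)

# Show the result: each distinct name once, in alphabetical order.
echo "$unique_names"
```

Saving these lines in a file turns them into a script that can be re-run on new data without retyping anything.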

In addition, the command line is often the easiest way to interact with remote machines and supercomputers. Familiarity with the shell is close to essential for running a variety of specialized tools and resources, including high-performance computing systems. As clusters and cloud computing systems become more popular for scientific data crunching, being able to interact with the shell is becoming a necessary skill. We can build on the command-line skills covered here to tackle a wide range of scientific questions and computational challenges.

Let’s get started.

When the shell is first opened, you are presented with a prompt, indicating that the shell is waiting for input:

jupyter-user:~$

The shell typically uses $ as the prompt, usually with some information in front of it (in our case the username), but it may use a different symbol. In the examples for this lesson, we’ll show the prompt as:

jupyter-user:~$

Important: When typing commands, either from these lessons or from other sources, do not type the prompt, only the commands that follow it. Also note that after you type a command, you have to press the Enter key to execute it.

Our First Command

You should be able to recognise your username in the command prompt (prefixed with jupyter-, as that is your username on this server), but let's check who we are by running our first command.

Type whoami into the terminal and press the Enter key:

jupyter-user:~$ whoami

Your server username jupyter-<bathusername> should have been written to the terminal, and the command prompt should have returned.

whoami is a simple program that outputs the ID of the current user, i.e. it tells us who the shell thinks we are. When we type the whoami command, the shell:

  1. Finds a program called whoami
  2. Runs the program whoami
  3. Displays the program's output
  4. Displays a new command prompt to tell us it's ready for more commands
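The first of these steps can be made visible with the standard command -v utility, which reports where the shell finds a program. A minimal sketch follows; the path it prints differs between systems, and the variable name program_path is just for this example.

```shell
# Step 1: ask the shell where it finds the program called whoami.
program_path=$(command -v whoami)
echo "$program_path"

# Steps 2 and 3: run the program and display its output.
whoami
```

Once whoami finishes, an interactive shell would carry out step 4 and print a new prompt.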

whoareyou

What happens when you try the command whoareyou?

jupyter-user:~$ whoareyou

Solution
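A sketch of what to expect, assuming no program called whoareyou is installed on the system: the shell fails at the first step (finding the program), prints an error message along the lines of "command not found" (the exact wording varies between shells), and sets an exit status of 127. The variable name status is chosen only for this example.

```shell
# Assumption: no program named whoareyou exists on this system.
status=0

# Try to run it. The shell prints an error message, and the "|| status=$?"
# part records the exit status of the failed command.
whoareyou || status=$?

# An exit status of 127 is the shell's way of saying "command not found".
echo "exit status: $status"
```

The shell then returns to the prompt as usual, ready for the next command.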

Nelle's Pipeline: A Typical Problem

Nelle Nemo, a marine biologist, has just returned from a six-month survey of the North Pacific Gyre, where she has been sampling gelatinous marine life in the Great Pacific Garbage Patch. She has 1520 samples that she’s run through an assay machine to measure the relative abundance of 300 proteins. She needs to run these 1520 files through an imaginary program called goostats.sh that she inherited. On top of this huge task, she has to write up her results by the end of the month so her paper can appear in a special issue of Aquatic Goo Letters.

The bad news is that if she has to run goostats.sh by hand using a GUI, she’ll have to select and open a file 1520 times. If goostats.sh takes 30 seconds to process each file, the whole process will take more than 12 hours of Nelle’s attention. With the shell, Nelle can instead assign her computer this mundane task while she focuses her attention on writing her paper.
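The 12-hour figure can be checked with the shell's built-in integer arithmetic. The variable names below are chosen for this sketch; the numbers come from the scenario above.

```shell
samples=1520            # number of files to process
seconds_per_file=30     # time goostats.sh takes per file

total_seconds=$(( samples * seconds_per_file ))   # 45600 seconds in total
hours=$(( total_seconds / 3600 ))                 # whole hours
minutes=$(( (total_seconds % 3600) / 60 ))        # leftover minutes

echo "${hours} h ${minutes} min"   # prints "12 h 40 min"
```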

The next few lessons will explore the ways Nelle can achieve this. More specifically, they explain how she can use a command shell to run the goostats.sh program, using loops to automate the repetitive steps of entering file names, so that her computer can work while she writes her paper.
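To give a feel for where this is heading, here is a rough sketch of such a loop. Because goostats.sh is imaginary, the loop only prints the command it would run for each file rather than running it; the file names and the variable names datafile and count are hypothetical.

```shell
count=0

# Hypothetical file names; in practice the list would come from the
# files in Nelle's data directory.
for datafile in sample-01.txt sample-02.txt sample-03.txt; do
    # Dry run: show the command instead of executing it.
    echo "bash goostats.sh $datafile"
    count=$(( count + 1 ))
done

echo "planned $count runs"
```

The same few lines would work unchanged whether the list contains 3 files or 1520, which is the point of automating the repetition.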

As a bonus, once she has put a processing pipeline together, she will be able to use it again whenever she collects more data.

In order to achieve her task, Nelle needs to know how to:

  • navigate to a file/directory
  • create a file/directory
  • check the length of a file
  • chain commands together
  • retrieve a set of files
  • iterate over files
  • run a shell script containing her pipeline

Key Points:

  • A shell is a program whose primary purpose is to read commands and run other programs.
  • The shell's main advantages are its high action-to-keystroke ratio, its support for automating repetitive tasks, and its capacity to access networked machines.
  • The shell's main disadvantages are its primarily textual nature and how cryptic its commands and operation can be.