Lesson 2: Overview of Biowulf environment and navigating Unix file systems

Quick review

In lesson 1, we saw an overview of the course series and learned of the rationale for using Biowulf. Importantly, we learned to connect to our Biowulf accounts from our local machine using the ssh command found in the Windows Command Prompt or Mac Terminal.

Lesson objectives

After this lesson, we should

  • Understand the limitations of what we can do in the various spaces within Biowulf, including
    • Login node
    • Home directory
    • Data directory
    • Scratch space
  • Understand Unix directory path structure
  • Know how to get help with Unix commands
  • Be able to navigate directories and list directory contents

Note: do not store Personally Identifiable Information (PII) on Biowulf

Unix commands that we will visit in this lesson

  • pwd (to print present working directory)
  • ls (to list directory content)
  • cd (to change directory)

Overview of Biowulf environment

Biowulf user dashboard

A useful feature on the Biowulf website is the user dashboard. See Figure 1. For those using student accounts, please use the student dashboard.

Figure 1: The user dashboard on Biowulf provides useful information for the user's account.

Clicking on the User Dashboard tab will take you to an authentication page (Figure 2). Use your NIH credentials to log in.

Figure 2: Use your NIH credentials to sign into the user dashboard (even if you are using a student account).

Once logged in, we will be presented with our account information including group affiliations (Figure 3).

Figure 3: User account information on the Biowulf user dashboard.

Disk quota and usage information is also available in the user dashboard. Note that we can request a quota increase for our data directory (Figure 4) and that the home directory only has 16 GB of space.

Figure 4: User disk quota and usage shown in the Biowulf user dashboard.

We can also view information and status for the jobs that we have submitted (Figure 5).

Figure 5: Information and status for jobs submitted to Biowulf.

Connecting to Biowulf

To get started, open the Command Prompt (Windows) or the Terminal (Mac) and connect to Biowulf. Remember that you need to be connected to the NIH network, either by being on campus or through VPN. Recall from lesson 1 that you use the ssh command below to connect to Biowulf, where username is the username you use to sign in. When prompted for your password, the characters will not appear on the screen as you type; keep typing and then press Enter.

ssh username@biowulf.nih.gov

Figure 6: Upon logging in, users will see a prompt where we will interact with Biowulf. The prompt tells us a couple of things that help orient us to where we are. First is what the user is connected to (Biowulf in this case as denoted by username@biowulf). Second, once logged in, we land in our home directory (denoted by ~).

Login node

We land on the login node once we connect to Biowulf.

"The log in node is your point of access to the Biowulf cluster" -- Biowulf accounts and log in node

The login node is meant for the following (Source: Biowulf accounts and log in node)

  • Submitting jobs (main purpose)
  • Editing/compiling code
  • File management
  • File transfer
  • Brief testing of code or debugging (under 20 minutes)

Many users are signed on to the login node at the same time, so do not run anything compute intensive in this space. Request an interactive session or submit a job instead. We will talk about interactive sessions in another lesson.

Home directory

Recall from Figure 4 that users only have 16 GB (gigabytes) of storage space in their home directory. The home directory is your landing spot upon connecting to Biowulf. At the prompt (see Figure 6) it is denoted by "~" and the full directory path is /home/username. As an example, my username is wuz8, so the path to my home directory on Biowulf is /home/wuz8. Because the home directory is small and users cannot request a quota increase for it, do not store data or write analysis outputs there. See the quote below on what the home directory should be used for.

"Each user has a home directory called /home/username which is accessible from every HPC system. The /home area has a quota of 16 GB which cannot be increased. It is commonly used for config files (aka dotfiles), code, notes, executables, state files, and caches." -- Biowulf

If we use the pwd command, we can identify the present working directory that we are in.

pwd
/home/username

Data directory

The data directory is much larger and its quota can be increased. The path to the data directory is /data/username. My username is wuz8, so once I have changed into my data directory (we will see how later in this lesson) and run pwd, I should see /data/wuz8. We can use the data directory to store our analysis input and output.

pwd
/data/username

lscratch

In Biowulf, lscratch is local storage space available on individual nodes. It is helpful for jobs that read or write a lot of temporary files. We will further discuss lscratch in a future lesson.

scratch

The scratch area is a shared storage space where users can store temporary files. The path to this is /scratch/username where username is the username you use to log into Biowulf. The path to my scratch directory is /scratch/wuz8 where wuz8 is my NIH username. A word of caution: files in scratch are deleted after 10 days. While each user can store up to 10 TB (terabytes) of data in scratch, it is not guaranteed that this amount will always be available. Finally, Biowulf staff will delete files if scratch becomes more than 80% full.

Snapshots

When working in Unix, we need to keep in mind that there is no Recycle Bin (Windows) or Trash (Mac) that holds deleted items and allows us to recover them. Once we delete something in Unix, it is gone. Fortunately, Biowulf keeps snapshots, which are read-only copies of data at a certain time, and we can use these to restore content that we deleted. See the Biowulf documentation on snapshots for details.

Unix directory path structure

Figure 7 shows an example of the file system hierarchy structure in Unix, which starts with the root folder (denoted by /). The root directory is the one from which all other directories branch off. In Figure 7, we see that the home and data directories branch off the root directory. In fact, if we do ls / in Biowulf, we will see that the home and data directories are inside the root folder. In Figure 7, the data directory also contains a subfolder, P, which in turn has a folder for the project input (P_in) and a folder for the project output (P_out).

Figure 7: Example of file system hierarchy structure.

ls /
data 
home

Note that because the home and data directories are both branches of the root, the path to these will be /home and /data respectively. For the P_out folder, the path is /data/P/P_out. Any time we start a path from the root (or "/"), we call it an absolute path. Note that each section of a path is separated by "/". This differs from Windows where parts of a path are separated by "\".
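As a minimal sketch of how absolute paths behave, we can rebuild the Figure 7 hierarchy inside a temporary directory. The variable name demo is hypothetical and stands in for the root, since we cannot create folders directly under the real "/". The mkdir command used here to create folders is covered later in this lesson; its -p option also creates any missing parent folders.

```shell
# Hypothetical stand-in for the root directory "/" (a temporary folder).
demo=$(mktemp -d)

# Recreate the Figure 7 layout: home, data, and data/P with P_in and P_out.
mkdir -p "$demo/home" "$demo/data/P/P_in" "$demo/data/P/P_out"

# A path written out in full works no matter where we currently are.
cd "$demo/home"
ls "$demo/data/P"            # lists P_in and P_out even though we are in home

cd "$demo/data/P/P_out"
pwd                          # prints the full path to P_out
```

On Biowulf itself the same idea applies with the real root, for example ls /data/P if such a folder existed.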

"An absolute path is defined as specifying the location of a file or directory from the root directory (/). In other words,we can say that an absolute path is a complete path from start of actual file system from / directory." -- https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/

Getting help with Unix commands

Any time we are unsure of how a command works, we can print the manual for the command using the man command followed by the command we want to learn about.

For instance, if we do not know how to use pwd, then we can do the following and this prints out the pwd manual (Figure 8).

man pwd

Figure 8: The manual for the pwd command

To exit the manual page, hit q.

Changing directory

Recall that upon logging into Biowulf, you land in the home directory, which is limited to only 16 GB of storage space. Thus, you should work in your data folder. Recall that the path to the data folder is /data/username. To change into this directory, we will use a command called cd.

cd /data/username

In my case, since my username is wuz8, I can do the following.

cd /data/wuz8

Note that once we have changed into the data directory, the "~" in the prompt (indicating home) is replaced by the name of the current directory, which in this case is our username.

[username@biowulf username]

Now, use pwd to confirm that you are in your data directory.

pwd
/data/username

To go back to the home directory, we can do any of the following. For now, though, let's stay in the data folder.

cd ~

or

cd

or

cd /home/username
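The forms above are equivalent because "~" expands to the value of the HOME variable, and cd with no argument also goes to HOME. The sketch below illustrates this with stand-in directories so that it can be run anywhere; the demo folder layout and the variable names are hypothetical, and HOME is changed only for the duration of the demonstration. It also shows cd -, which returns to the previous directory.

```shell
# Stand-in directories so the sketch runs anywhere (hypothetical layout).
demo=$(mktemp -d)
mkdir -p "$demo/home/username" "$demo/data/username"

saved_home=${HOME:-}              # remember the real value of HOME
HOME="$demo/home/username"        # demo only: pretend this is our home directory

cd "$demo/data/username"          # start in the stand-in data directory
cd ~                              # "~" expands to the value of HOME
home_via_tilde=$(pwd)

cd "$demo/data/username"
cd                                # cd with no argument also goes to HOME
home_via_plain_cd=$(pwd)

cd -                              # "-" returns to the previous directory and prints it
back_in_data=$(pwd)

HOME=$saved_home                  # restore the real value of HOME
```

On Biowulf there is no need to set HOME; it already points to /home/username when we log in.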

Go back to your data directory and make a new folder called lesson_2.

cd /data/username
mkdir lesson_2

Next, change into lesson_2 from your data directory. Because lesson_2 sits inside the present working directory (/data/username), we can give just its name, without a leading "/" and without spelling out the full path from the root. In essence, we are providing a relative path.

"Relative path is defined as the path related to the present working directory(pwd). It starts at your current directory and never starts with a / ." -- https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/

cd lesson_2
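To see the difference between the two kinds of path, here is a small sketch that reaches the same folder both ways. The demo variable is a hypothetical stand-in for /data/username so that the commands can be tried outside Biowulf.

```shell
# Hypothetical stand-in for /data/username.
demo=$(mktemp -d)
mkdir "$demo/lesson_2"

# Relative path: resolved against the present working directory.
cd "$demo"
cd lesson_2
relative_result=$(pwd)

# From an unrelated directory, the same relative path no longer resolves
# (assuming there is no folder named lesson_2 directly under "/").
cd /
cd lesson_2 2>/dev/null || echo "lesson_2 is not inside $(pwd)"

# Absolute path: works from anywhere.
cd "$demo/lesson_2"
absolute_result=$(pwd)

echo "$relative_result"
echo "$absolute_result"           # both lines show the same directory
```

The relative form is shorter to type, while the absolute form is safer in scripts because it does not depend on where the script is run from.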

Note that Unix uses "." to denote the present working directory and ".." to refer to the directory one level up. For instance, the following command prints the absolute path to our present working directory, the same as running pwd on its own (pwd does not need a path argument, so the "." here is purely illustrative).

pwd .
/data/wuz8/lesson_2

Let's make a directory in the lesson_2 folder called unix_on_biowulf_2023. We use mkdir to make a new directory.

mkdir unix_on_biowulf_2023

Change into the unix_on_biowulf_2023 directory.

cd unix_on_biowulf_2023

To go back up one directory to the lesson_2 folder, use the cd command with "..".

cd ..
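The "." and ".." shortcuts can be used anywhere a path is expected, and ".." can be chained to climb more than one level. The following is a sketch using a temporary folder (the hypothetical demo variable) in place of /data/username.

```shell
# Hypothetical stand-in for /data/username, with the folders from this lesson.
demo=$(mktemp -d)
mkdir -p "$demo/lesson_2/unix_on_biowulf_2023"
cd "$demo/lesson_2/unix_on_biowulf_2023"

cd ..                        # up one level, into lesson_2
pwd
ls .                         # same as plain ls: shows unix_on_biowulf_2023
ls ..                        # contents of the directory above: shows lesson_2

cd unix_on_biowulf_2023      # back down again
cd ../..                     # up two levels in one step
pwd                          # now in the stand-in data directory
```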

Listing directory contents

We use ls to list the contents of a directory. Staying in the lesson_2 folder, we can use ls to see what is in it. It should show only the unix_on_biowulf_2023 directory that we just created, because we have not placed anything else in it.

ls
unix_on_biowulf_2023

If you wanted to check the content of a folder other than the present working directory, but do not want to leave the directory you are currently in, you can provide a path to ls.

ls /data/classes/BTEP/unix_on_biowulf_2023_documents/
SRR1553606_fastqc  unix_on_biowulf_2023  unix_on_biowulf_2023.zip

Remember that you can include options in Unix commands, which alter how the command runs. In the above, ls just printed the names of the items in the folder, but we do not see details such as file size or the date and time each file or folder was last modified. To see more details we can append -l to ls.

ls -l /data/classes/BTEP/unix_on_biowulf_2023_documents/
total 56
drwxrwsrwx. 2 wuz8 GAU  4096 Jan  5 10:48 SRR1553606_fastqc
drwxrwsrwx. 2 wuz8 GAU  4096 Jan 12 17:46 unix_on_biowulf_2023
-rwxrwxrwx. 1 wuz8 GAU 41734 Jan  5 10:48 unix_on_biowulf_2023.zip
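In the long listing, each line shows the permissions, the number of links, the owner, the group, the size in bytes, the time of last modification, and the name. A first character of "d" marks a directory and "-" marks a regular file. The sketch below tries a few other commonly used ls options on a temporary folder; the folder and file names are hypothetical and exist only for this illustration.

```shell
# Temporary folder with a few hypothetical items to list.
demo=$(mktemp -d)
cd "$demo"
mkdir results
touch notes.txt .hidden_config

ls                 # names only; entries starting with "." are hidden
ls -a              # also shows ".", "..", and .hidden_config
ls -l              # long listing: permissions, owner, group, size, date, name
ls -lh             # long listing with sizes in human-readable units
ls -ld results     # describe the results directory itself, not its contents
```

Options can be combined after a single dash, which is why -lh is the same as -l -h.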

Earlier, we used the man command to view the manual for pwd. With ls, we can also append the --help option to pull up help documentation (Figure 9).

ls --help

Figure 9: Getting help with the ls command using the --help option.
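Because the help text is long, it can be handy to search it for a single option. The sketch below assumes the GNU version of ls found on Linux systems such as Biowulf; the ls that ships with macOS does not accept --help. The grep command, which prints only the lines that match a pattern, has not been introduced in this lesson and is used here purely for illustration.

```shell
# Print only the lines of the ls help text that mention the --all option.
# The -e flag tells grep that the next word is the pattern, even though
# it begins with dashes.
ls --help | grep -e '--all'
```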

Biowulf status

You can use the Biowulf status page to check for outages.