Lesson 2: Overview of Biowulf environment and navigating Unix file systems
Quick review
In lesson 1, we saw an overview of the course series and learned of the rationale for using Biowulf. Importantly, we learned to connect to our Biowulf accounts from our local machine using the ssh command found in the Windows Command Prompt or Mac Terminal.
Lesson objectives
After this lesson, we should
- Understand the limitations of what we can do in the various spaces within Biowulf, including
  - Login node
  - Home directory
  - Data directory
  - Scratch space
- Understand Unix directory path structure
- Know how to get help with Unix commands
- Be able to navigate directories and list directory contents
Note: do not store Personally Identifiable Information on Biowulf
Unix commands that we will visit in this lesson
- pwd (to print present working directory)
- ls (to list directory content)
- cd (to change directory)
Overview of Biowulf environment
Biowulf user dashboard
A useful feature on the Biowulf website is the user dashboard. See Figure 1. For those using student accounts, please use the student dashboard.
Figure 1: The user dashboard on Biowulf provides useful information for the user's account.
Clicking on the User Dashboard tab will take you to an authentication page (Figure 2). Use your NIH credentials to log in.
Figure 2: Use your NIH credentials to sign into the user dashboard (even if you are using a student account).
Once logged in, we will be presented with our account information including group affiliations (Figure 3).
Figure 3: User account information on the Biowulf user dashboard.
Disk quota and usage information is also available in the user dashboard. Note that we can request a quota increase for our data directory (Figure 4) and that the home directory only has 16 GB of space.
Figure 4: User disk quota and usage shown in the Biowulf user dashboard.
We can also view information and status for the jobs that we have submitted (Figure 5).
Figure 5: Information and status for jobs submitted to Biowulf.
Connecting to Biowulf
To get started, open the Command Prompt (Windows) or the Terminal (Mac) and connect to Biowulf. Remember that you need to be connected to the NIH network, either by being on campus or through VPN. Recall from lesson 1 that you use the ssh command below to connect to Biowulf, where username is the username you use to sign in. When prompted for your password, nothing will appear on the screen as you type; keep typing and then press Enter.
ssh username@biowulf.nih.gov
Figure 6: Upon logging in, users will see a prompt where we will interact with Biowulf. The prompt tells us a couple of things that help orient us to where we are. First is what the user is connected to (Biowulf in this case as denoted by username@biowulf). Second, once logged in, we land in our home directory (denoted by ~).
Log in node
We land on the log in node once we connect to Biowulf.
"The log in node is your point of access to the Biowulf cluster" -- Biowulf accounts and log in node
The log in node is meant for the following (Source: Biowulf accounts and log in node)
- Submitting jobs (main purpose)
- Editing/compiling code
- File management
- File transfer
- Brief testing of code or debugging (under 20 minutes)
There are many users signed on to the log in node at the same time, so do not perform anything that is compute intensive in this space. Request an interactive session or submit a job instead. We will talk about interactive sessions in another lesson.
Home directory
Recall from Figure 4 that users only have 16 GB (gigabytes) of storage space in their home directory. The home directory is your landing spot upon connecting to Biowulf. At the prompt (see Figure 6) it is denoted by "~" and the full directory path is /home/username. As an example, my username is wuz8, so the path to my home directory on Biowulf is /home/wuz8. The home directory does not have much storage space and users cannot request a quota increase for it; thus, do not store data or write analysis outputs to the home directory. See the quote below on what the home directory should be used for.
"Each user has a home directory called /home/username which is accessible from every HPC system. The /home area has a quota of 16 GB which cannot be increased. It is commonly used for config files (aka dotfiles), code, notes, executables, state files, and caches." -- Biowulf
If we use the pwd command, we can identify the present working directory that we are in.
pwd
/home/username
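The "~" shorthand, the HOME environment variable, and the full /home/username path all refer to the same place. The lines below are a minimal sketch of how to check this, assuming a bash-like shell where HOME is set at login (the usual case); the exact path printed depends on your account.

```shell
# Change to the home directory using the ~ shorthand.
cd ~

# pwd prints the absolute path of the directory we are now in.
pwd

# The HOME environment variable stores the path to the same directory.
echo "$HOME"
```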
Data directory
The data directory is much larger, and its quota can be increased. The path to the data directory is /data/username. My username is wuz8, so when I run pwd from inside my data directory, I should see /data/wuz8. We can use the data directory to store our analysis input and output.
pwd
/data/username
lscratch
In Biowulf, lscratch is local storage space available on individual nodes. It is useful for jobs that read or write many temporary files. We will further discuss lscratch in a future lesson.
scratch
The scratch area is a shared storage space accessible to users for storing temporary files. The path to this is /scratch/username where username is the username you use to log into Biowulf. The path to my scratch directory is /scratch/wuz8 where wuz8 is my NIH username. A word of caution: files in scratch are deleted after 10 days. While each user can store up to 10 TB (terabytes) of data in scratch, it is not guaranteed that this amount will always be available. Finally, Biowulf staff will delete files if scratch becomes more than 80% full.
Snapshots
When working in Unix, we need to keep in mind that there is no Recycle Bin (Windows) or Trash (Mac) that holds deleted items and allows us to recover them. Once we delete something in Unix, it is gone. Fortunately, Biowulf keeps snapshots, which are read-only copies of data at a certain time, and we can use these to restore content that we deleted. See here for snapshots on Biowulf.
Navigating directories, creating and removing directories, and getting help
Unix directory path structure
Figure 7 shows an example of the file system hierarchy structure in Unix, which starts with the root folder (denoted by /). The root directory is the one that the other directories branch off from. In Figure 7, we see that the home and data directories branch off the root directory. As a matter of fact, if we run ls / in Biowulf, we will see that the home and data directories are inside the root folder. In Figure 7, the data directory also contains a subfolder, P, which in turn has a folder for the project input (P_in) and a folder for the project output (P_out).
Figure 7: Example of file system hierarchy structure.
ls /
data
home
Note that because the home and data directories are both branches of the root, the path to these will be /home and /data respectively. For the P_out folder, the path is /data/P/P_out. Any time we start a path from the root (or "/"), we call it an absolute path. Note that each section of a path is separated by "/". This differs from Windows where parts of a path are separated by "\".
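The leading "/" is what marks a path as absolute; any path without it is interpreted starting from the present working directory. As an illustration, the sketch below classifies a path by looking only at its first character. The path_type function is a hypothetical helper written for this example, not a standard Unix command.

```shell
# Classify a path as absolute or relative by checking whether it starts with "/".
# path_type is a hypothetical helper defined only for this example.
path_type() {
    case "$1" in
        /*) echo "absolute" ;;
        *)  echo "relative" ;;
    esac
}

path_type /data/P/P_out   # prints: absolute (starts at the root)
path_type P/P_out         # prints: relative (starts at the present working directory)
```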
"An absolute path is defined as specifying the location of a file or directory from the root directory (/). In other words,we can say that an absolute path is a complete path from start of actual file system from / directory." -- https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/
Getting help with Unix commands
Any time we are unsure of how a command works, we can print the manual for the command using the man command followed by the command we want to learn about.
For instance, if we do not know how to use pwd, then we can do the following and this prints out the pwd manual (Figure 8).
man pwd
Figure 8: The manual for the pwd command
To exit the manual page, hit q.
Changing directory
Recall that upon logging into Biowulf, you land in the home directory, which is limited to only 16 GB of storage space. Thus, you should work in your data folder. Recall that the path to the data folder is /data/username. To change into this directory, we will use a command called cd.
cd /data/username
In my case, since my username is wuz8, I can do the following.
cd /data/wuz8
Note that once we have changed into the data directory, the "~" (indicating home) is replaced by your username.
[username@biowulf username]
Now, use pwd to confirm that you are in your data directory.
pwd
/data/username
To go back to the home directory, we can use any of the following. For now, though, let's stay in the data folder.
cd ~
or
cd
or
cd /home/username
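A related shortcut is "cd -", which returns to whichever directory we were in before the most recent cd. The sketch below uses two throwaway directories made with mktemp -d as stand-ins for the home and data directories, so it can be tried without touching real data; the variable names first and second are just for illustration.

```shell
# Create two temporary practice directories to move between.
first=$(mktemp -d)
second=$(mktemp -d)

cd "$first"
cd "$second"

# "cd -" returns to the directory we were in just before the last cd
# and prints that directory's path.
cd -

pwd   # we are back in the first directory
```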
Go back to your data directory and make a new folder called lesson_2.
cd /data/username
mkdir lesson_2
Next, change into lesson_2 from your data directory. Because lesson_2 sits directly inside the present working directory (/data/username), we can refer to it by name alone, without a leading "/". In essence, we are providing a relative path.
"Relative path is defined as the path related to the present working directory(pwd). It starts at your current directory and never starts with a / ." -- https://www.geeksforgeeks.org/absolute-relative-pathnames-unix/
cd lesson_2
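To see that a relative path and an absolute path can lead to the same directory, here is a small sketch that uses a temporary practice directory (created with mktemp -d) as a stand-in for /data/username. The variable names base, abs_result, and rel_result are only for illustration.

```shell
# Build a small practice tree in a temporary directory.
base=$(mktemp -d)
mkdir "$base/lesson_2"

# Absolute path: works no matter which directory we are currently in.
cd "$base/lesson_2"
abs_result=$(pwd)

# Relative path: works only because we first move into the parent directory.
cd "$base"
cd lesson_2
rel_result=$(pwd)

# Both routes end in the same directory, so the two lines printed are identical.
echo "$abs_result"
echo "$rel_result"
```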
Note that Unix uses "." to denote the present working directory and ".." to refer to one directory up. For instance, if we run the following command, we will just get the absolute path to our present working directory (the same as running pwd on its own).
pwd .
/data/wuz8/lesson_2
Let's make a directory in the lesson_2 folder called unix_on_biowulf_2023. We use mkdir to make a new directory.
mkdir unix_on_biowulf_2023
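When a directory and its parent both need to be created, mkdir accepts the -p option to make the whole chain in one step. The sketch below runs inside a temporary directory so it does not touch real data; it reuses the folder names from this lesson purely as an example.

```shell
# Work inside a temporary practice directory.
cd "$(mktemp -d)"

# -p creates any missing parent directories along the path
# and does not report an error if a directory already exists.
mkdir -p lesson_2/unix_on_biowulf_2023

# Running the same command a second time is harmless.
mkdir -p lesson_2/unix_on_biowulf_2023

ls lesson_2   # prints: unix_on_biowulf_2023
```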
Change into unix_on_biowulf_2023.
cd unix_on_biowulf_2023
To go back up one directory to the lesson_2 folder, use the cd command with "..".
cd ..
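The ".." notation can be chained and can be used with commands other than cd. Here is a minimal sketch, again using a temporary practice directory (the variable name top is only for illustration) that mirrors the lesson_2/unix_on_biowulf_2023 layout.

```shell
# Build lesson_2/unix_on_biowulf_2023 inside a temporary practice directory.
top=$(mktemp -d)
mkdir -p "$top/lesson_2/unix_on_biowulf_2023"
cd "$top/lesson_2/unix_on_biowulf_2023"

# List the parent directory (lesson_2) without leaving the current one.
ls ..

# Move up two levels in one step: first to lesson_2, then to its parent.
cd ../..
pwd
```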
Listing directory contents
We use ls to list the contents of a directory. Staying in the lesson_2 folder, we can use ls to see what is in it. It should show only the unix_on_biowulf_2023 directory that we just created, because we have not placed anything else in it.
ls
If you want to check the content of a folder other than the present working directory, but do not want to leave the directory you are currently in, you can provide a path to ls.
ls /data/classes/BTEP/unix_on_biowulf_2023_documents/
SRR1553606_fastqc unix_on_biowulf_2023 unix_on_biowulf_2023.zip
Remember that you can include options in Unix commands, which alter how the command runs. In the above, ls just printed the names of the items in the folder, but we do not see details such as file size or the date and time a file or folder was last modified. To see more details, we can append -l to ls.
ls -l /data/classes/BTEP/unix_on_biowulf_2023_documents/
total 56
drwxrwsrwx. 2 wuz8 GAU 4096 Jan 5 10:48 SRR1553606_fastqc
drwxrwsrwx. 2 wuz8 GAU 4096 Jan 12 17:46 unix_on_biowulf_2023
-rwxrwxrwx. 1 wuz8 GAU 41734 Jan 5 10:48 unix_on_biowulf_2023.zip
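Other ls options work the same way and can be combined. The sketch below creates a temporary practice directory containing one regular file and one hidden file (both file names are made up for this example) and shows how -a, -h, and -t change what ls reports.

```shell
# Create a practice directory with one regular file and one hidden file.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/notes.txt"
printf 'secret\n' > "$demo/.hidden_config"

# Plain ls skips names that start with a dot.
ls "$demo"

# -a also shows hidden entries, including "." and "..".
ls -a "$demo"

# Options can be combined: -l for details, -h for human-readable sizes,
# -t to sort by modification time (newest first).
ls -lht "$demo"
```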
Earlier, we used the man command to view the manual for pwd. With ls, we can also append the --help option to pull up help documentation (Figure 9).
ls --help
Figure 9: Getting help with the ls command using the --help option.
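The help text is long, so it can be handy to search it for one option rather than read all of it. One way to do this, assuming the GNU version of ls found on Linux systems, is to pass the output to grep:

```shell
# Search the ls help text for lines that mention -a.
# The "--" tells grep that the pattern that follows is not one of its own options.
ls --help | grep -- '-a'
```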
Biowulf status
You can use the Biowulf status page to check for outages.