Lesson 2: Navigating file systems with Unix

Lesson 1 Review

Biowulf is the high performance computing cluster at NIH.
When you apply for a Biowulf account you will be issued two primary storage spaces: 1) /home/$USER and 2) /data/$USER, with 16 GB and 100 GB of default disk space, respectively.
Hundreds of pre-installed bioinformatics programs are available through the module system.
Computational tasks on Biowulf should be submitted as a job (sbatch, swarm) or run through an interactive session (sinteractive).
Connect to Biowulf using HPC OnDemand or ssh.
Do not run computational tasks on the login node.

Lesson Objectives

In lesson 1, you learned about the NIH HPC cluster, Biowulf. Biowulf nodes use a Unix-like (Linux) operating system (distributions RHEL8/Rocky8), which requires knowledge and use of the command line interface (shell) to direct computational functionality. The purpose of today's lesson is to get you familiar with working on the command line. To this end, we will...

Learn the basic structure of a unix command.
Learn how to navigate your file system, including absolute vs relative directories.
Learn unix commands related to navigating directories, creating and removing files or directories, and getting help.

A word about mistakes

YOU WILL MAKE MISTAKES...but, it is okay. We all make mistakes, and mistakes are how we learn. Remember, existing safeguards make it nearly impossible for individual Biowulf users to irreparably mess up the system for others. However, you can make your life difficult, for example, by misusing commands, ignoring existing tools, overwriting files, failing to redirect or output results, disregarding warnings, and not seeking help.

How can we overcome mistakes?

We practice. The more you use unix and bash scripting the better you will become.

You will need to learn how to troubleshoot error messages. Often this will involve googling the error message together with the command you entered. Many forums and sites post help regarding specific errors (e.g., Stack Overflow, or program repositories such as GitHub).

File system

We manage files and directories through the operating system's file system. A directory is synonymous with a "folder", which is used to organize files, other directories, executables, etc.

On Windows or a Mac, we usually open and scroll through our directories and files using a GUI. For example, Finder is the default file management GUI from which we can access files or launch programs on a Mac.

Finder Example

This same file system can be accessed and navigated via command line from the unix shell.

Some useful unix commands to navigate our file system and tell us some things about our files

pwd (print working directory)
ls (list)
touch (creates an empty file)
nano (basic editor for creating small text files)
rm (remove files; be careful!)
mkdir (make a directory) and rmdir (remove a directory, must be empty of all files)
cd (change directory), by itself will take you home, cd .. (will take you up one directory), cd /results_dir/exp1 (go directly to this directory)
mv (for renaming files or moving files)
less (for viewing files, or more)
man (for viewing the man pages when you need help on a command)
cp (copy) for copying files

Getting Started

We have already seen some unix commands relevant to Biowulf. For example, we learned about ssh. The ssh command is used to securely log into a remote machine and execute commands on that machine.

ssh is the command and username@biowulf.nih.gov is a command line argument, where username is the username that you wish to connect to on the remote system and biowulf.nih.gov is the hostname of the remote machine. For this lesson and the lessons that follow, we will use NIH HPC student accounts to connect to Biowulf.

More about student accounts

At the beginning of each class you must sign up for a student account. You can sign up for a student account using a Google spreadsheet, the link for which will be supplied at the beginning of each class via Webex. Click on the supplied link, and find an empty slot under the "Name" column. Type your name in the empty slot. The username under "Account Username" will now serve as your username for logging in to Biowulf.

This task will be repeated at the beginning of each lesson to allow students the option of flexible attendance.

Let's go ahead and get connected. Open a terminal and type the following:

ssh username@biowulf.nih.gov

username = NIH/Biowulf login username. Remember to use the student account username here.

Note

If this is your first time logging into Biowulf, you will see a warning statement with a yes/no choice. Type "yes".

Type in your password at the prompt. The cursor will not move as you type your password!

Success

We will connect to Biowulf at the beginning of each session.

We are now on the login node. Remember, you should not do work on the login node. However, you can do basic file management on the login node and edit and compile code. For now, we will stay on the login node.

Our Second Unix Command (`ls`)

Let's continue learning about the structure of Unix commands using another common command, ls. The ls command "lists" the contents of the directory you are in. You may see files and other directories here.

ls

At this point, you are in your home directory, and so you will see whatever files and directories are located here. For example, if I had logged in to my Biowulf account, I would see the following:

Desktop  R  bin  ncbi_error_report.txt

However, since this is a student account, I do not see anything, as I have not yet added any files.

How can you tell the difference between a file and a directory?

We can add some additional options (flags) to our command.

ls -lh

will show permissions and indicate directories (d). The -lh are flags. -l refers to listing in long format, while -h provides human readable file sizes.

Many systems also distinguish directories from files using colors (e.g., blue for directories). If you don't see colorized output, try ls --color=auto on Linux (including Biowulf) or the -G flag on macOS.

We can also label output by adding a marker to indicate files, directories, links, and executables using the -F flag.

ls -F

a trailing / = directory
a @ = link
a * = executable
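
To see these markers in practice, here is a minimal sketch that builds a throwaway directory and lists it with -F. The scratch directory and the names subdir, notes.txt, run.sh, and link are hypothetical and chosen only for illustration; chmod and ln -s appear here only to set up an executable and a link.

```shell
# Build a small scratch directory (hypothetical names, safe to delete afterwards)
demo=$(mktemp -d)
mkdir "$demo/subdir"            # a directory
touch "$demo/notes.txt"         # a regular file
touch "$demo/run.sh"
chmod +x "$demo/run.sh"         # mark run.sh as executable
ln -s notes.txt "$demo/link"    # a symbolic link pointing at notes.txt

# List the contents with a marker appended to each special entry
listing=$(ls -F "$demo")
echo "$listing"
```

In the output, the directory ends in /, the link ends in @, the executable ends in *, and the regular file has no marker.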

Anatomy of a command

Using ls as an example, we can get an idea of the overall structure of a unix command.

Image inspired by *Learn Enough Command Line to Be Dangerous*

The first thing we see is the command line prompt, usually $ or %, which will vary by operating system. The prompt lets us know that the computer is waiting for a command. Next we see the actual command, in this case, ls, telling the computer to list the files and directories. Most commands have various options / flags that can be included to modify the command's behavior. We can also supply an argument, which in the case of ls is optional. For example, here we supplied an alternative directory whose files and directories we want to list. We hit enter or return after each command, and when the command has finished running, the command prompt will reappear, prompting us to enter more commands.
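
As a small sketch of this anatomy, the commands below run ls with options and a directory argument. The scratch directory stored in target and the file example.txt are hypothetical stand-ins for any directory you want to list.

```shell
# A hypothetical directory to use as the argument
target=$(mktemp -d)
touch "$target/example.txt"

# command  options  argument
ls -l -h "$target"

# Single-letter options can usually be combined behind one dash
ls -lh "$target"

# Capture both listings so they can be compared
separate=$(ls -l -h "$target")
combined=$(ls -lh "$target")
```

Both forms produce the same listing, so which one you type is a matter of preference.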

Where am I? (`pwd`)

pwd

pwd stands for "print working directory". When you run this, you should see something like this.

/home/username

where username is your name or student account. This is your home directory - where you start from when you open a terminal. This is an example of a "path". The path tells us the location of a file or directory.

Note

While Windows computers use a \ as a path separator, unix systems use a /.

Therefore, the pwd command is very helpful for figuring out where you are in the directory structure. If you are getting "file not found" errors while trying to run something, it is a good idea to pwd and see if you are where you think you are. Type the pwd command and make a note of the directory you are in.

More on the home directory (`~`)

We see that we are in our home directory. But where is that exactly?

The file system on any computer is hierarchical. On a Unix system, the top level of the file system, or root directory, is denoted by /. All subdirectories on the file system branch from this root directory. See the below example.

Example of file system hierarchy structure.

In our example hierarchy, we have subdirectories /home and /data, and within data, we see additional subdirectories, P_in and P_out. Only the first / denotes a directory (root). All other /s in the path serve as separating characters.

Absolute vs Relative directories

A file path that starts with the root or / is known as an absolute path. A path that does not start with the root directory is called a relative path. For example, in Unix, . denotes the present working directory and .. denotes the directory one level up. Thus, a path that starts with . or .. is a relative path. Returning to pwd: its output (/home/username) is an absolute file path. Hard-coding absolute file paths can break scripts when collaborating, because it is unlikely that a collaborator's file system matches yours.
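
The difference can be sketched with a throwaway directory tree. The names project, raw, and analysis are hypothetical; the point is only that a relative path and an absolute path can lead to the same place.

```shell
# Hypothetical layout: <scratch>/project/raw and <scratch>/project/analysis
base=$(mktemp -d)
mkdir -p "$base/project/raw" "$base/project/analysis"

# Start in raw, then move using a RELATIVE path (up one level, then into analysis)
cd "$base/project/raw"
cd ../analysis
via_relative=$(pwd -P)

# Reach the same directory using an ABSOLUTE path (starts at the root, /)
cd "$base/project/analysis"
via_absolute=$(pwd -P)

echo "$via_relative"
echo "$via_absolute"
```

The relative form works from raw no matter where project lives on the file system, which is why relative paths tend to travel better between collaborators.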

We can use the tree command to get an idea of the structure of our home directory on Biowulf.
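
A minimal sketch of tree, assuming it is installed on your system (the fallback to find is there in case it is not). The scratch layout below is hypothetical and simply mirrors the example hierarchy; on Biowulf you would point tree at your own home directory instead.

```shell
# Hypothetical scratch layout mirroring the example hierarchy
top=$(mktemp -d)
mkdir -p "$top/home/username" "$top/data/P_in" "$top/data/P_out"

# -L 2 limits the display to two levels below the starting directory
if command -v tree >/dev/null 2>&1; then
    layout=$(tree -L 2 "$top")
else
    # Fallback if tree is not available: a flat listing to the same depth
    layout=$(find "$top" -maxdepth 2)
fi
echo "$layout"
```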

Getting around the Unix directory structure (`cd`)

How do we navigate this directory tree? We use cd, which means "change directory". Let's change directory to our data directory, which is the larger of the two allocations we are allotted on Biowulf.

cd /data/$USER # change to your data directory
pwd # print working directory
ls # list the contents of /data/$USER

$USER and other environment variables

$USER is an example of an environment variable.

Environment variables contain user-specific or system-wide values that either reflect simple pieces of information (your username), or lists of useful locations on the file system. --- Griffith Lab

We can display these variables using echo.

echo $USER  
echo $HOME

$PATH is an important environment variable.

echo $PATH

This results in a colon separated list of directories containing programs that you can run without specifying those directories each time you run the program.

You will likely need to add to your $PATH at some point in the future.

To do this use:

export PATH=$PATH:/path/to/folder

This change will not remain when you close the terminal. To permanently add a location to your path, add the above line to your bash shell configuration file, ~/.bashrc.
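
Here is a minimal sketch of why $PATH matters. The directory stored in tools and the script my_tool are hypothetical; the sketch creates a tiny program, adds its directory to $PATH, and then runs the program by name alone.

```shell
# Create a hypothetical directory containing one small executable script
tools=$(mktemp -d)
printf '#!/bin/sh\necho "hello from my_tool"\n' > "$tools/my_tool"
chmod +x "$tools/my_tool"

# Append the directory to PATH for the current shell session only
export PATH="$PATH:$tools"

# The shell can now find my_tool without being given its full path
result=$(my_tool)
echo "$result"
```

Appending to the end of $PATH, as here, means that existing programs with the same name are still found first.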

By itself, the cd command takes you home. Let's try that, and then do a pwd to see where we are.

cd
pwd

We are back in our home directory.

/home/username

How can we go back to the /data/$USER directory? We need to give the "path" to that directory.

cd /data/$USER
pwd
ls

Check where you are with pwd and look at the contents of the directory with ls. What do you see?

Home shortcut

We can also use ~ as a shortcut for our home directory.

ls ~
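
As a quick sketch, ~ expands to the same path that $HOME holds, and it can be used as the start of a longer path. The directory name Module_1 below is only illustrative; the path does not need to exist for the expansion to happen.

```shell
echo ~          # the shell replaces ~ with the path of your home directory
echo "$HOME"    # the same path, read from the environment variable

# ~ can begin a longer path (Module_1 is a hypothetical directory name)
expanded=~/Module_1
echo "$expanded"
```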

Once we create more files and directories, we can learn a bit more about the directory structure and absolute vs relative file paths.

Creating files (touch)

The touch command creates an empty file (or, if the file already exists, updates its timestamp). It is not a command you will use very often, but it is good to know about.

touch file1.txt
touch file2.txt
ls

Now we see something like this.

file1.txt  file2.txt

The `nano` editor is a text editor useful for small files.

nano file2.txt

Let's put something in this file.

Unix is an operating system, just like Windows or MacOS. 
Linux is a Unix like operating system; 
sometimes the names are used interchangeably.

Nano commands for saving your file and exiting nano:

control O - write file (equivalent to save as)
File name to write: file2.txt (Hit return/enter on your keyboard to save the file with this name).
control X - to Exit

This brings us to our next topic which is very important!

Avoid spaces in file names and directories.

Avoid spaces in Unix file and directory names. The shell treats a space as a separator between arguments, so a name containing spaces must be quoted or escaped every time you use it. There are many strategies for avoiding spaces; whichever you choose, be consistent. One good method is snake_case, in which words are separated by an underscore (_).

For example, we can use the underscore (_) where a space would go, like this, to name a directory for module 1.

Module_1

To use snake_case with file names, we may see something like this:

brain_rna.fastq
liver_rna.fastq

The first part of the file name provides info about the file, and the extension (.fastq) tells what kind of file it is. (Examples of file extensions are .csv, .txt, .fastq, .fasta and many more.)
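
To illustrate why spaces cause trouble, here is a small sketch run in a throwaway directory. The file names are hypothetical. Without quotes, the shell splits the name at the space and touch receives two separate arguments.

```shell
# Work in a scratch directory so nothing important is touched
scratch=$(mktemp -d)
cd "$scratch"

# Unquoted: the shell sees TWO arguments and creates two files, "my" and "file.txt"
touch my file.txt

# Quoted: one file whose name contains a space (works, but must be quoted every time)
touch "my file.txt"

# snake_case avoids the problem entirely
touch brain_rna.fastq

ls -1
```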

Removing files with `rm`

Warning

A Unix system will delete something when you ask to delete it and there is usually no way of getting it back. Be extremely careful when removing files and directories.

Adding the -i option makes rm ask for confirmation before deleting each file, which gives you a chance to catch a mistake. Without it, a deleted file is simply gone.

rm -i file1.txt

will remove a file we created.

Creating directories (`mkdir`)

A couple things to note - this is a good time to give your directories meaningful names, which will help you keep things organized. Organization is key. I generally like to have a new directory per project, and within that directory, subdirectories separating raw data from analysis files. From there, each analysis would also get its own subdirectory. However, there are many ways to organize files and you should do whatever makes sense for your data and helps you (and others) stay organized.

Keep raw data raw

Always keep your raw data raw, and save outputs to new files. Do not overwrite raw data! Consider setting the permissions on these files to "read only". More on permissions later.
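
As a preview, one way to make a file read only is with chmod, which is covered properly with permissions later. This is a minimal sketch under stated assumptions: the directory raw_data and the file sample.txt are hypothetical, and a-w means "remove write permission for all users".

```shell
# Hypothetical project layout with one raw data file
project=$(mktemp -d)
mkdir "$project/raw_data"
echo "ACGT" > "$project/raw_data/sample.txt"

# Remove write permission for everyone (user, group, and others)
chmod a-w "$project/raw_data/sample.txt"

# The first column of ls -l shows the permissions; no "w" should remain
perms=$(ls -l "$project/raw_data/sample.txt" | cut -c1-10)
echo "$perms"
```

Write permission can be restored for yourself later with chmod u+w if you ever need it.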

For now, let's create a directory called Module_1, where we can store Module 1 lesson content. To create a directory, we use mkdir.

mkdir Module_1

Removing directories (`rmdir`)

Directories must be completely empty of all files and other contents before you can delete them with rmdir. There are ways to "recursively" remove files and directories using the -r option of rm, for example (rm -r directory). This would remove all of the files and subdirectories in our hypothetical directory. Keep in mind that once these files are deleted they are gone for good. Be extremely careful with the -r option. As beginners it can be safer to navigate to the directory and remove content directly.

Let's take a quick second to apply some of the things we have learned and create more content to work with.

Navigate to the directory we just made (Module_1), and make another directory within it called directory_to_delete.

cd Module_1 #change directory
pwd #print working directory (wd)
mkdir directory_to_delete #make a new directory
ls #list the contents of wd

Now let's move to directory_to_delete and create a file, myseq.txt.

cd directory_to_delete #change directory
touch myseq.txt #Create a new file

Let's check the contents of our directory and see where we are located in our directory tree.

ls
pwd

To summarize what we have done:

We've moved to the Module_1 directory, checked our directory with pwd, created a directory called directory_to_delete, and listed the contents of Module_1, so we can see the directory we just created. We then navigated to directory_to_delete using cd, created a file with touch, listed the contents with ls, and printed our working directory (pwd).

Let's move up one directory back to Module_1.

cd ..

Test Your Knowledge: Question 1

How could you move back to your home directory?

Answer Question 1

cd
pwd

/home/username

Test Your Knowledge: Question 2

If you changed to your home directory, how can you return to directory_to_delete?

Answer Question 2

We need to give the "path" to that directory.

cd /data/$USER/Module_1/directory_to_delete
pwd
ls

Getting back to removing directories (`rmdir`)

Now that we have created some directories, let's use rmdir to remove one.

What do you see when you try to remove this directory?

rmdir directory_to_delete

What should we do? We need to remove the contents of a directory before we can remove the directory. Here's one safe option.

cd directory_to_delete
ls
rm myseq.txt
ls
cd ..
ls
rmdir directory_to_delete
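
The same sequence can be sketched as a self-contained example in a throwaway location, so that it is safe to rerun. The variable first_try is a hypothetical name used only to record whether rmdir refused to remove the non-empty directory.

```shell
# Recreate the situation in a scratch directory
scratch=$(mktemp -d)
mkdir "$scratch/directory_to_delete"
touch "$scratch/directory_to_delete/myseq.txt"

# rmdir refuses to remove a directory that still has contents
if rmdir "$scratch/directory_to_delete" 2>/dev/null; then
    first_try=removed
else
    first_try=refused
fi
echo "$first_try"

# Empty the directory first, then rmdir succeeds
rm "$scratch/directory_to_delete/myseq.txt"
rmdir "$scratch/directory_to_delete"
```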

Moving and renaming files and directories, all with one command (`mv`)

The mv command is a handy way to rename files if you've created them with a typo or decide to use a more descriptive name. For example:

cd ..
mv file2.txt README.txt
ls

We can also move this file into a different location using the same command.

mv README.txt Module_1
cd Module_1
ls

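Be careful when moving files: a mistake in the command can yield unexpected results. In particular, mv will by default replace an existing file of the same name without warning. The -i (interactive) option helps keep you safe by asking before it overwrites anything.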

For example:

mkdir dir1 
touch dir1/hello.txt
touch hello.txt
mv -i dir1/hello.txt hello.txt
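
To see what -i protects against, here is a sketch of the same move without it, run in a scratch directory. The file contents "old" and "new" are hypothetical markers that show which file survives.

```shell
# Set up two files with the same name but different contents
scratch=$(mktemp -d)
cd "$scratch"
mkdir dir1
echo "new" > dir1/hello.txt
echo "old" > hello.txt

# Without -i, mv replaces the existing hello.txt and asks nothing
mv dir1/hello.txt hello.txt

# The original contents of hello.txt are gone
content=$(cat hello.txt)
echo "$content"
```

With mv -i, the shell would instead stop and ask whether to overwrite hello.txt.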

Copying files (`cp`)

This is similar to mv but will create an actual copy of a file. You will need to specify what you are copying (the source) and where you want to make the copy (the target).

For example, let's copy a file from the BTEP teaching materials to Module_1.

cp  /data/classes/BTEP/B4B_2025/Module_1/sample.fast* .

Remember, the . is a relative path shortcut denoting our current directory, so we are copying this into our current working directory.

* is a wildcard that matches zero or more characters, including spaces. Here we use it to copy two files whose extensions differ only in the last letter (for example, .fasta and .fastq). More on wildcards in Lesson 3.

We can also copy an entire directory using the recursive flag (cp -r). For example, let's copy the directory Practice_Sessions from the BTEP teaching materials to our current directory.

cp -r /data/classes/BTEP/B4B_2025/Module_1/Practice_Sessions .
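
If you want to try these two forms of cp without access to the class directory, the sketch below recreates the idea with stand-in files. The directories src and dest and every file in them are hypothetical; only the cp syntax mirrors the commands above.

```shell
# Hypothetical source directory standing in for the class materials
src=$(mktemp -d)
dest=$(mktemp -d)
touch "$src/sample.fasta" "$src/sample.fastq" "$src/other.txt"
mkdir "$src/Practice_Sessions"
touch "$src/Practice_Sessions/README.txt"

cd "$dest"

# The wildcard matches sample.fasta and sample.fastq, but not other.txt
cp "$src"/sample.fast* .

# -r copies the directory together with everything inside it
cp -r "$src/Practice_Sessions" .

ls -R
```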

Viewing file content

There are several ways to view files. We can use the less command to view the contents of a file like this.

less sample.fasta

You'll need to type q to get out of less and back to the command line. Before the less command was available, the more command was commonly used to look at file content. The less command has more options for scrolling through files, so it is now the preferred command.

Another command for reading files is cat, which prints the entire contents of a file to the terminal at once, so it is best suited to small files.

Help! (`man`)

Lastly, most Unix commands have a man or "manual" page that describes how to use them. If you need help remembering how to use the command ls, you would type:

man ls

To exit man, again use q.

There are quite a few flags/options that we can use with the ls command, and we can learn all about them on the man page. My favorite flags for ls are -l and -h. We will use flags often, and you won't get far in Unix without knowing about them. Try this:

cd
ls -lh

We have already seen these flags, but as a reminder...
-h when used with the -l option, use unit suffixes (Byte, Kilobyte, Megabyte, Gigabyte, Terabyte and Petabyte) in order to reduce the number of digits to three or less using base 2 for sizes.

-l (The lowercase letter "ell".) List in long format. (See below). If the output is to a terminal, a total sum for all the file sizes is output on a line before the long listing.

Additional Resources

Software Carpentry: The Unix Shell

Help Session

Practice navigating the file system and creating files. Instructions are here.
