Lesson 2: Unix command structure, navigating Biowulf directories, and tools for data transfer

After this lesson, participants will

  • Know how to get help with Unix commands
  • Know the tools for transferring data from local computer to the cluster
  • Be able to navigate the Unix file systems
  • Be able to list directory content
  • Be able to describe file and directory permissions as well as know how to modify them

Connecting to Biowulf

To get started, open the Command Prompt (Windows) or the Terminal (Mac) and connect to Biowulf. Remember that you need to be connected to the NIH network, either by being on campus or through VPN. Recall from lesson 1 that you use the ssh command below to connect to Biowulf, where username is the student account ID that was assigned to you (see student assignments). When prompted to enter your password, the characters will not appear on screen as you type; keep typing and then press Enter.

ssh username@biowulf.nih.gov

Unix file system hierarchy

Figure 1 shows an example of a Unix file system hierarchy. At the very top is the root folder, and every subfolder branches off from it. The root folder is denoted as /.

Figure 1: Example of file system hierarchy structure.

On Biowulf, the home and data folders stem from the root, which you can see by typing ls / at the command line. The ls command is used to list directory content.

data
home

Note

A file path that starts with the root (/) is known as an absolute path. One that does not start with the root is called a relative path. In Unix, . denotes the present working directory and .. denotes the parent directory (one level up). Thus, a path that starts with . or .. is a relative path, as is a path that starts with a file or directory name.

Recall that upon signing on to Biowulf, you will land in the home directory (/home/username or ~). Use pwd to confirm which directory you are in.

pwd

This should return /home/username. Again, replace username with the student account ID that was assigned to you.

To change into the data directory, use cd /data/username (note the absolute path to the data folder was provided to the cd command).

Make a new directory

Once in the data folder, use the mkdir command to create a directory called lesson2.

mkdir lesson2

Then change into it. Because we are already in the data folder, we can just do cd lesson2 without providing the absolute path to the directory. Note that cd ./lesson2 works as well, where . denotes the present working directory (i.e. the data folder), but the ./ prefix is not needed. Parts of a Unix file path are separated by a forward slash (/).

cd lesson2

To go back to the data folder, which is one directory up, use cd .. (two dots).

cd ..
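
The navigation steps above, together with the earlier note on absolute and relative paths, can be tried safely in a throwaway directory. The sketch below is only an illustration: it assumes a temporary directory created with mktemp stands in for /data/username, and the variable names are hypothetical.

```shell
# A temporary sandbox stands in for /data/username (assumption for illustration)
sandbox=$(mktemp -d)
cd "$sandbox"
base=$(pwd -P)              # absolute path of the sandbox, symlinks resolved

mkdir lesson2

cd lesson2                  # relative path: resolved from the current directory
rel=$(pwd -P)

cd ..                       # ".." moves one directory up
up=$(pwd -P)

cd ./lesson2                # "." is the current directory, so this is the same target
dot=$(pwd -P)

cd "$base/lesson2"          # absolute path: starts with /
abs=$(pwd -P)

echo "relative cd landed in: $rel"
echo "./ cd landed in:       $dot"
echo "absolute cd landed in: $abs"
echo "cd .. landed in:       $up"
```

All three ways of entering lesson2 should print the same location, and cd .. should print the sandbox itself.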

Listing directory content

The ls command is used to list directory content.

ls
lesson2

Make a new directory called lesson2a.

mkdir lesson2a
ls
lesson2
lesson2a

Tip

If there are many items in a directory, use the -1 (the digit one) option in ls to list the items one per line.

To get a detailed view of directory content, use the -l option with ls.

ls -l
drwxr-x---. 2 wuz8     wuz8            4096 Jan 17 17:18 lesson2
drwxr-x---. 2 wuz8     wuz8            4096 Jan 17 17:24 lesson2a

Unix file and directory permissions

The first column ("drwxr-x---") in the above results from ls -l tells us the permissions (i.e. who can read - r, write - w, or execute - x the contents of the file or directory), which is an important aspect of working on Unix systems like Biowulf. The leading d indicates a directory. Figures 1 and 2 give a breakdown of the information provided in the permission block.

Figure 1

Figure 2
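
As a rough companion to the figures, the sketch below splits a permission block into its four parts. It assumes a temporary directory as a stand-in for the lesson2 folder and sets its permissions with chmod (covered in the next part of this lesson) so that the block matches the drwxr-x--- shown above. The variable names are hypothetical.

```shell
# Temporary stand-in for the lesson2 folder (assumption for illustration)
demo=$(mktemp -d)
mkdir "$demo/lesson2"
chmod u=rwx,g=rx,o-rwx "$demo/lesson2"     # make it match drwxr-x---

# -d lists the directory itself; the first 10 characters are the permission block
perm=$(ls -ld "$demo/lesson2" | cut -c1-10)

ftype=$(printf '%s' "$perm" | cut -c1)     # d = directory, - = regular file
user=$(printf '%s' "$perm" | cut -c2-4)    # permissions for the owner
group=$(printf '%s' "$perm" | cut -c5-7)   # permissions for the group
other=$(printf '%s' "$perm" | cut -c8-10)  # permissions for everyone else

echo "block:  $perm"
echo "type:   $ftype"
echo "user:   $user"
echo "group:  $group"
echo "others: $other"
```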

The command for modifying permissions is chmod. If we append --help to chmod, then we can see how to use it.

chmod --help
Usage: chmod [OPTION]... MODE[,MODE]... FILE...
  or:  chmod [OPTION]... OCTAL-MODE FILE...
  or:  chmod [OPTION]... --reference=RFILE FILE...
Change the mode of each FILE to MODE.
With --reference, change the mode of each FILE to that of RFILE.

  -c, --changes   like verbose but report only when a change is made
  -f, --silent, --quiet  suppress most error messages
  -v, --verbose          output a diagnostic for every file processed
      --no-preserve-root  do not treat '/' specially (the default)
      --preserve-root    fail to operate recursively on '/'
      --reference=RFILE  use RFILE's mode instead of MODE values
  -R, --recursive        change files and directories recursively
      --help     display this help and exit
      --version  output version information and exit

Each MODE is of the form 
'[ugoa]*([-+=]([rwxXst]*|[ugo]))+|[-+=][0-7]+'.

GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'chmod invocation'

Tip

The man command can be used to pull up manuals for various Unix commands (for example, man chmod). The --help option may sometimes be shortened to -h, but this is command specific (i.e. not every Unix command will provide help documentation using --help and/or -h).
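
Because support for --help varies between commands and systems, one hedged way to look for help is to try --help first and fall back to a pointer to man. The show_help function below is a hypothetical helper written for this illustration, not a standard Unix command. Since it runs the named command with --help, use it only with commands you trust.

```shell
# show_help is a hypothetical helper, not a standard command
show_help() {
    # If the command accepts --help, print the first few lines of its help text
    if "$1" --help >/dev/null 2>&1; then
        "$1" --help 2>&1 | head -n 5
    else
        # Otherwise point the reader to the manual page
        echo "$1 does not support --help here; try: man $1"
    fi
}

show_help chmod
```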

To use chmod, we need to be aware that

  • u is user or owner
  • g is group
  • o is others
  • "-" is used to remove a permission
  • "+" is used to add a permission
  • "=" sets permission

We can also numerically set permissions where

  • 0: No permission
  • 1: Execute permission
  • 2: Write permission
  • 3: Execute and write permission (1+2=3)
  • 4: Read permission
  • 5: Read and execute permission (1+4=5)
  • 6: Read and write permission (2+4=6)
  • 7: All permission (1+2+4=7)
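
The arithmetic above can be checked directly. In the sketch below, one digit is built for each of user, group, and others by adding the values from the list, and the resulting three-digit mode is passed to chmod. The temporary directory is an assumed stand-in for the lesson2 folder.

```shell
# Temporary stand-in for the lesson2 folder (assumption for illustration)
demo=$(mktemp -d)
mkdir "$demo/lesson2"

# One digit each for user, group, and others
u=$((4 + 2 + 1))     # read + write + execute = 7
g=$((4 + 1))         # read + execute         = 5
o=0                  # no permission          = 0
mode="$u$g$o"
echo "mode: $mode"

chmod "$mode" "$demo/lesson2"
perm_750=$(ls -ld "$demo/lesson2" | cut -c1-10)
echo "$perm_750"     # drwxr-x---

# Giving the group write permission as well changes the middle digit to 7
chmod 770 "$demo/lesson2"
perm_770=$(ls -ld "$demo/lesson2" | cut -c1-10)
echo "$perm_770"     # drwxrwx---
```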

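For instance, to make the lesson2 folder group writable, run chmod g+w and then list the folder with ls -ld to confirm the change (the -d option lists the directory itself rather than its contents).

chmod g+w lesson2
ls -ld lesson2
drwxrwx---. 2 wuz8     wuz8            4096 Jan 17 17:18 lesson2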
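
The symbolic operators listed earlier (+, -, and =) can be compared side by side. This is a minimal sketch on an assumed temporary stand-in for the lesson2 folder. The show function is a hypothetical helper that prints only the permission block.

```shell
# Temporary stand-in for the lesson2 folder (assumption for illustration)
demo=$(mktemp -d)
mkdir "$demo/lesson2"

# show is a hypothetical helper: print only the 10-character permission block
show() { ls -ld "$1" | cut -c1-10; }

chmod u=rwx,g=rx,o-rwx "$demo/lesson2"   # "=" sets exactly these permissions
start=$(show "$demo/lesson2")
echo "start:    $start"                  # drwxr-x---

chmod g+w "$demo/lesson2"                # "+" adds write for the group
added=$(show "$demo/lesson2")
echo "g+w:      $added"                  # drwxrwx---

chmod g-w "$demo/lesson2"                # "-" removes it again
removed=$(show "$demo/lesson2")
echo "g-w:      $removed"                # drwxr-x---

chmod o+rx "$demo/lesson2"               # let others read and enter the directory
others=$(show "$demo/lesson2")
echo "o+rx:     $others"                 # drwxr-xr-x
```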

Tools for transferring data between local computer and Biowulf

See the Biowulf guide for transferring data to and from the cluster for options.

Globus

The preferred method for transferring large data files (e.g. FASTQ files generated from high throughput sequencing) is Globus, and instructions can be found at https://hpc.nih.gov/docs/globus/. Users will need to download a Globus desktop client and set up the appropriate endpoints for data transfer.

Helix

Definition

"Helix (helix.nih.gov) is the interactive data transfer and file management node for the NIH HPC Systems." -- Biowulf.

Tip

"Interactive Data Transfers should be performed on helix.nih.gov, the designated system for interactive data transfers and large-scale file manipulation. (An interactive session on a Biowulf compute node is also appropriate). Such processes should not be run on the Biowulf login node. For example, tarring and gzipping a large directory, or rsyncing data to another server, are examples of such interactive data transfer tasks" -- Biowulf.

To sign on to Helix, do

ssh username@helix.nih.gov

See https://bioinformatics.ccr.cancer.gov/docs/intro-to-bioinformatics-ss2023/Lesson4/HPCintro/ for useful tips on when to use Helix. Examples include:

  • Transferring >100 GB using scp
  • gzipping a directory containing >5K files, or > 50 GB
  • copying > 150 GB of data from one directory to another
  • uploading or downloading data from the cloud
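
To judge whether a directory crosses the gzipping thresholds in the list above (more than 5K files, or more than 50 GB), its file count and size can be checked before starting. The sketch below is only an illustration: needs_helix is a hypothetical helper, 50 GB is assumed to mean 50 x 1024 x 1024 KB, and the temporary directory with two empty files stands in for real data.

```shell
# Thresholds taken from the list above; the GB-to-KB conversion is an assumption
max_files=5000
max_kb=$((50 * 1024 * 1024))

# needs_helix is a hypothetical helper: succeeds if either threshold is exceeded
needs_helix() {
    dir=$1
    n_files=$(find "$dir" -type f | wc -l | tr -d ' ')   # number of regular files
    size_kb=$(du -sk "$dir" | cut -f1)                   # total size in KB
    [ "$n_files" -gt "$max_files" ] || [ "$size_kb" -gt "$max_kb" ]
}

# Small demo directory standing in for real data
demo=$(mktemp -d)
touch "$demo/sample1.fastq" "$demo/sample2.fastq"

if needs_helix "$demo"; then
    echo "$demo exceeds a threshold: gzip it on Helix"
else
    echo "$demo is below the thresholds listed above"
fi
```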

SCP

The scp command can be used to securely copy files between a local computer and Biowulf. For instance, the command below can be used to securely copy the lesson2 folder from Biowulf to the current local directory. Note that the connection goes through Helix and that you will be prompted to enter your Biowulf password.

scp -r username@helix.nih.gov:/data/username/lesson2 .

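To copy the lesson2 folder from local back to Biowulf, do the following. The destination is the parent directory /data/username/ because, if /data/username/lesson2 already exists, giving it as the destination would place the copy inside it as lesson2/lesson2.

scp -r ./lesson2 username@helix.nih.gov:/data/username/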
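
To avoid retyping the account ID in each transfer, the two scp commands can be assembled from variables. The sketch below only prints the commands rather than running them. The variable names are hypothetical, username remains a placeholder for your own account ID, and the upload targets the parent directory /data/username/, which is an assumption made here so that an existing lesson2 folder on the cluster is not nested.

```shell
# Placeholders: replace "username" with your own account ID
user="username"
host="helix.nih.gov"
remote_dir="/data/$user"

# Download: cluster -> current local directory
download_cmd="scp -r $user@$host:$remote_dir/lesson2 ."

# Upload: local -> parent directory on the cluster (assumption, see above)
upload_cmd="scp -r ./lesson2 $user@$host:$remote_dir/"

# Dry run: print the commands for review instead of executing them
echo "$download_cmd"
echo "$upload_cmd"
```

Once the printed commands look right, they can be pasted into a local terminal.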