Lesson 2: Unix command structure, navigating Biowulf directories, and tools for data transfer
After this lesson, participants will
- Know how to get help with Unix commands
- Know the tools for transferring data from local computer to the cluster
- Be able to navigate the Unix file systems
- Be able to list directory content
- Be able to describe file and directory permissions as well as know how to modify them
Connecting to Biowulf
To get started, open the Command Prompt (Windows) or the Terminal (Mac) and connect to Biowulf. Remember that you need to be connected to the NIH network, either by being on campus or through VPN. Recall from lesson 1 that you use the ssh command below to connect to Biowulf, where username is the student account ID that was assigned to you (see student assignments). When prompted for your password, nothing will appear on the screen as you type; keep typing and press Enter.
ssh username@biowulf.nih.gov
Navigating the Biowulf directory structure
Unix file system hierarchy
Figure 1 shows an example of a Unix file system hierarchy. At the very top is the root folder, and every subfolder branches off from it. The root folder is denoted by a single forward slash (/).
Figure 1: Example of file system hierarchy structure.
On Biowulf, the home and data folders stem from the root, which you can see by typing ls / at the command line. The ls command is used to list directory content.
data
home
Note
A file path that starts with the root (/) is known as an absolute path. One that does not start with the root is called a relative path. For example, in Unix, a single dot (.) denotes the present working directory and two dots (..) denote the parent directory, one level up. Thus, a path that starts with . or .. is a relative path.
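The difference between absolute and relative paths can be sketched as follows. This is a minimal illustration, not part of the lesson's own exercises: the scratch directory created by mktemp and the folder names project and results are hypothetical, chosen so that nothing under /home or /data is touched.

```shell
# Work in a throwaway directory (hypothetical names, for illustration only).
scratch=$(mktemp -d)
mkdir -p "$scratch/project/results"

# Absolute path: starts at the root (/), so it works from any location.
cd "$scratch/project/results"
abs_location=$(pwd -P)

# Relative path: ".." means one directory up from the current directory.
cd ..
parent_location=$(pwd -P)

# Relative path: "./results" means "results inside the current directory".
cd ./results
rel_location=$(pwd -P)

echo "absolute cd landed in: $abs_location"
echo "cd .. landed in:       $parent_location"
echo "cd ./results landed in: $rel_location"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

The first and last locations printed are the same directory, reached once by an absolute path and once by a relative one.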
Recall that upon signing on to Biowulf, you land in your home directory (/home/username or ~). Use pwd to confirm which directory you are in.
pwd
This should return /home/username. Again, replace username with the student account ID that was assigned to you.
To change into the data directory, use cd /data/username (note that the absolute path to the data folder is given to the cd command).
Make a new directory
Once in the data folder, use the mkdir command to create a directory called lesson2.
mkdir lesson2
Then change into it. Because we are already in the data folder, we can run cd lesson2 without providing the absolute path to the directory. Note that cd ./lesson2 works as well, where . denotes the present working directory (i.e., the data folder), but it is not needed. Parts of a Unix file path are separated by a forward slash (/).
cd lesson2
To go back to the data folder, which is one directory up, use .. as the target of cd.
cd ..
Listing directory content
The ls command is used to list directory content.
ls
lesson2
Make a new directory called lesson2a.
mkdir lesson2a
ls
lesson2
lesson2a
Tip
If there are many items in a directory, use the -1 option (the digit one) with ls to list one item per line.
To get a detailed view of directory content, use the -l option (lowercase L) with ls.
ls -l
drwxr-x---. 2 wuz8 wuz8 4096 Jan 17 17:18 lesson2
drwxr-x---. 2 wuz8 wuz8 4096 Jan 17 17:24 lesson2a
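Because -1 (digit one) and -l (lowercase L) look alike, it may help to run them side by side. The sketch below recreates the lesson2 and lesson2a folders inside a scratch directory made with mktemp (an assumption made here so the example can be run anywhere without touching /data).

```shell
# Recreate the two lesson folders in a throwaway location.
scratch=$(mktemp -d)
cd "$scratch"
mkdir lesson2 lesson2a

# -1 (the digit one): names only, one entry per line.
names=$(ls -1)
echo "$names"

# -l (lowercase L): long format with permissions, owner, group, size, and date.
long=$(ls -l)
echo "$long"

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

The long listing also starts with a "total" line, which reports the disk blocks used by the listed entries.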
Unix file and directory permissions
The column "drwxr-x---" in the above results from ls -l tells us the permissions (i.e., who can read - r, write - w, or execute - x the contents of the file or directory), which are an important aspect of working on Unix systems like Biowulf. The leading d indicates that the entry is a directory. Figures 1 and 2 give a breakdown of the information provided in the permission block.
Figure 1
Figure 2
The command for modifying permissions is chmod. If we append --help to chmod, we can see how to use it.
chmod --help
Usage: chmod [OPTION]... MODE[,MODE]... FILE...
or: chmod [OPTION]... OCTAL-MODE FILE...
or: chmod [OPTION]... --reference=RFILE FILE...
Change the mode of each FILE to MODE.
With --reference, change the mode of each FILE to that of RFILE.
-c, --changes like verbose but report only when a change is made
-f, --silent, --quiet suppress most error messages
-v, --verbose output a diagnostic for every file processed
--no-preserve-root do not treat '/' specially (the default)
--preserve-root fail to operate recursively on '/'
--reference=RFILE use RFILE's mode instead of MODE values
-R, --recursive change files and directories recursively
--help display this help and exit
--version output version information and exit
Each MODE is of the form
'[ugoa]*([-+=]([rwxXst]*|[ugo]))+|[-+=][0-7]+'.
GNU coreutils online help: <http://www.gnu.org/software/coreutils/>
For complete documentation, run: info coreutils 'chmod invocation'
Tip
The man command can be used to pull up manuals for various Unix commands (for example, man chmod). The --help option may sometimes be shortened to -h, but this is command specific (i.e., not every Unix command provides help documentation through --help and/or -h).
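As a quick sketch of getting help at the command line, the block below prints only the first few lines of the chmod help text. It assumes GNU coreutils, as in the output shown above; other systems may word the usage message differently.

```shell
# Show just the first lines of the built-in help instead of the full text.
help_text=$(chmod --help 2>&1 | head -n 3)
echo "$help_text"

# For the full manual, run "man chmod" in an interactive terminal.
# The manual opens in a pager: use the arrow keys to scroll and q to quit.
```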
To use chmod
, we need to be aware that
- u is user or owner
- g is group
- o is others
- "-" is used to remove a permission
- "+" is used to add a permission
- "=" sets the permissions exactly, replacing whatever was there before
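The symbolic notation above can be tried out safely on a scratch directory. In this sketch, demo_dir and the mktemp scratch location are hypothetical names used only for illustration; cut -c1-10 keeps just the ten-character permission block from the ls output.

```shell
# Create a throwaway directory to experiment on.
scratch=$(mktemp -d)
cd "$scratch"
mkdir demo_dir

# "=" sets permissions exactly: user rwx, group r-x, others none.
chmod u=rwx,g=rx,o= demo_dir
start=$(ls -ld demo_dir | cut -c1-10)
echo "$start"      # drwxr-x---

# "+" adds a permission: give the group write access.
chmod g+w demo_dir
added=$(ls -ld demo_dir | cut -c1-10)
echo "$added"      # drwxrwx---

# "-" removes a permission: take group write access away again.
chmod g-w demo_dir
removed=$(ls -ld demo_dir | cut -c1-10)
echo "$removed"    # drwxr-x---

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```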
We can also set permissions numerically, using one digit each for user, group, and others (in that order), where
- 0: No permission
- 1: Execute permission
- 2: Write permission
- 3: Execute and write permission (1+2=3)
- 4: Read permission
- 5: Read and execute permission (1+4=5)
- 6: Read and write permission (2+4=6)
- 7: All permission (1+2+4=7)
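The numeric form combines three of these digits, one each for user, group, and others. The sketch below applies two common combinations to hypothetical scratch items (demo_dir and demo_file, created under a mktemp directory for illustration only).

```shell
# Create a throwaway directory and file to experiment on.
scratch=$(mktemp -d)
cd "$scratch"
mkdir demo_dir
touch demo_file

# 750 = user 7 (rwx), group 5 (r-x), others 0 (---)
chmod 750 demo_dir
dir_mode=$(ls -ld demo_dir | cut -c1-10)
echo "$dir_mode"    # drwxr-x---

# 640 = user 6 (rw-), group 4 (r--), others 0 (---)
chmod 640 demo_file
file_mode=$(ls -l demo_file | cut -c1-10)
echo "$file_mode"   # -rw-r-----

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

Note that 750 reproduces the drwxr-x--- permission block seen in the ls -l output for lesson2.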
For instance, to make the lesson2 folder group writable, do the following. Listing the directory with ls -l afterwards shows the added w in the group block.
chmod g+w lesson2
drwxrwx---. 2 wuz8 wuz8 4096 Jan 17 17:18 lesson2
Tools for transferring data between local computer and Biowulf
See the Biowulf guide for transferring data to and from the cluster for options.
Globus
The preferred method for transferring large data files (e.g., FASTQ files generated from high-throughput sequencing) is Globus; instructions can be found at https://hpc.nih.gov/docs/globus/. Users will need to download a Globus desktop client and set up the appropriate endpoints for data transfer.
Helix
Definition
"Helix (helix.nih.gov) is the interactive data transfer and file management node for the NIH HPC Systems." -- Biowulf.
Tip
"Interactive Data Transfers should be performed on helix.nih.gov, the designated system for interactive data transfers and large-scale file manipulation. (An interactive session on a Biowulf compute node is also appropriate). Such processes should not be run on the Biowulf login node. For example, tarring and gzipping a large directory, or rsyncing data to another server, are examples of such interactive data transfer tasks" -- Biowulf.
To sign on to Helix, do
ssh username@helix.nih.gov
See https://bioinformatics.ccr.cancer.gov/docs/intro-to-bioinformatics-ss2023/Lesson4/HPCintro/ for useful tips on when to use Helix.
- transferring > 100 GB using scp
- gzipping a directory containing > 5K files or > 50 GB
- copying > 150 GB of data from one directory to another
- uploading or downloading data from the cloud
SCP
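The scp command can be used to securely copy files between a local computer and Biowulf. For instance, the command below can be used to securely copy the lesson2 folder from Biowulf to the local computer; the -r option copies the folder recursively. Run it from a terminal on the local computer, not from within a Biowulf session. Note that the connection goes through Helix and that you will be prompted to enter your Biowulf password.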
scp -r username@helix.nih.gov:/data/username/lesson2 .
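To copy the lesson2 folder from the local computer back to Biowulf, do the following. The destination is the parent directory /data/username/ because, if /data/username/lesson2 already exists, giving it as the destination would place the copy inside it as lesson2/lesson2.
scp -r ./lesson2 username@helix.nih.gov:/data/username/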