Lesson 1: Getting connected to Biowulf
Lesson objectives
After this lesson, we should be able to
- Describe the Unix operating system
- Describe Biowulf
- Connect to Biowulf from a local computer
Unix commands that we will visit in this lesson
- ssh (to connect to Biowulf)
- id (to check user id and group affiliation)
- mkdir (to create directories)
Overview of Unix
In Windows and MacOS, we interact with the computer through a graphical user interface (GUI). In Unix, by contrast, we interact with the computer by typing commands.
Basic Unix command syntax
The Unix command syntax is composed of
- The command
- Option(s) that will alter how a command functions
- Argument(s), what you want the command to operate on
command options argument
For instance, to make a new folder in Unix, we use the command mkdir. Here, we enter the command followed by the argument(s) that we want the command to operate on. In this case, the argument is the name of the folder that we would like to create. This differs from the graphical approach that we use to create new folders in Windows or MacOS.
mkdir new_folder
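The full command, options, argument pattern can also be sketched with mkdir itself. In this minimal example, -p and -v are standard mkdir options, while project and raw_data are hypothetical folder names chosen only for illustration.

```shell
# command:  mkdir
# options:  -p (also create any missing parent folders)
#           -v (report each folder as it is created)
# argument: project/raw_data (hypothetical folder names)
mkdir -p -v project/raw_data
```

Without the -p option, this command would stop with an error, because the parent folder project does not exist yet.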
Above, we just learned our first Unix command, which is just one of many. Before moving further, we should clarify the rationale for using Unix. While there is a steep learning curve, once we have mastered working in Unix, we can perform many of our computing tasks from the command line. Unix makes it easy to manage files, edit text files, and view tabular data that is too large for Excel. Further, many of the applications used in bioinformatics are made to work in Unix.
Overview of Biowulf
Biowulf is the Unix-based high performance computing system at NIH. Below are some reasons for using Biowulf.
- Biowulf offers more computing power and space for data storage compared to our local machine.
- Biowulf also houses many applications for bioinformatics, which are installed and updated by the Biowulf staff.
- The GUI-based bioinformatics package Partek Flow runs on Biowulf.
Visit https://hpc.nih.gov/docs/accounts.html to learn how to obtain a Biowulf account, which requires PI approval and costs $35 a month. Our Biowulf accounts will need to be renewed each year.
Figure 1 shows the hierarchical architecture of Biowulf. Understanding this hierarchy helps us know what we are asking for when requesting compute resources.
Figure 1: In Biowulf, many computers make up a cluster. Each individual computer or node has disk space for storage and random access memory (RAM) for running tasks. The individual computer is composed of processors, which are further divided into cores, and cores are divided into CPUs. In this example, the individual computer has 2 processors, 4 cores, and 8 CPUs.
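The counts in the Figure 1 example can be worked through with a little shell arithmetic. This is only a sketch: the per-level breakdown (2 cores per processor and 2 CPUs per core) is an assumption inferred from the totals in the caption, and the variable names are hypothetical.

```shell
# Numbers taken from the Figure 1 example; the per-level split is assumed
processors=2            # processors in one node
cores_per_processor=2   # assumed, so that the node has 4 cores in total
cpus_per_core=2         # assumed, so that the node has 8 CPUs in total

# Multiply down the hierarchy: processors -> cores -> CPUs
total_cores=$((processors * cores_per_processor))
total_cpus=$((total_cores * cpus_per_core))

echo "cores: $total_cores, CPUs: $total_cpus"
```

Under these assumptions the node has 4 cores and 8 CPUs, which matches the figure.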
Biowulf accounts
If you already have a Biowulf account, please use it for this course series. For those who do not have a Biowulf account, we have access to 30 student accounts.
Signing onto Biowulf
When working on Biowulf, we are working on a remote computer; thus, we need a way to connect to it. We can use the Secure Shell protocol (ssh) to connect to Biowulf. To connect, we need to be on the NIH network, either by being on campus or through VPN.
Signing onto Biowulf with a PC
For those using Windows 10 or newer, ssh is built into the command prompt (Figure 2 and Figure 3).
Figure 2: At the search box next to the Windows start menu, type cmd and click on the command prompt application.
Figure 3: When the command prompt opens, you can type ssh to confirm that it is available.
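Another way to confirm that ssh is available is to ask it for its version. The sketch below assumes the OpenSSH client, which is what the Windows Command Prompt and the Mac Terminal provide; -V is the OpenSSH option that prints the version, and the fallback message is only illustrative.

```shell
# Print the version of the installed OpenSSH client.
# If ssh cannot be run, print a short message instead.
ssh -V || echo "ssh is not available on this computer"
```

If a version string is printed, ssh is installed and ready to use.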
Signing onto Biowulf with a Mac
The best way to sign onto Biowulf from a Mac is to use the built-in Terminal (Figure 4). Use the Spotlight search in the Mac menu bar to search for the Terminal application. Click on it to open the Terminal.
Figure 4: Use the Mac Spotlight search to find the Terminal.
Connect to Biowulf
Remember that if you are not on campus, you need to connect to the NIH network through VPN. Regardless of whether you are using the Windows Command Prompt or the Mac Terminal, the ssh command for connecting to Biowulf takes the same form (see Figure 5).
The username in the ssh command is either
- your NIH username if you are using your own Biowulf account for this course OR
- one of the student accounts
ssh username@biowulf.nih.gov
For first time users, when connecting you may see the message below. Respond with yes.
The authenticity of host 'biowulf.nih.gov (128.231.2.9)' can't be established.
ECDSA key fingerprint is SHA256:BoP/KLS17g+gUuQ7mrCHa9oPPO+MHi/h8WML44iA1dw.
Are you sure you want to continue connecting (yes/no)? yes
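Answering yes stores Biowulf's host key on the local computer, so ssh does not ask this question again on later connections. As a minimal sketch, assuming the OpenSSH ssh-keygen tool is installed alongside ssh, the stored key can be looked up from the local computer; the fallback message is only illustrative.

```shell
# Search the local known_hosts file for a stored key for Biowulf.
# Nothing is found until the first connection has been accepted.
ssh-keygen -F biowulf.nih.gov || echo "no stored key for biowulf.nih.gov yet"
```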
Next, you will see a message warning you that you are accessing a government computer system and that you should not do anything suspicious. At the end of the message, you will be asked to enter your password, which is either your NIH password (if you are using your own Biowulf account) or the password for the student accounts. The cursor will not move and nothing will be displayed when entering your password, but keep typing.
Warning: Permanently added 'biowulf.nih.gov' (ED25519) to the
list of known hosts.
***WARNING***
You are accessing a U.S. Government information system, which
includes (1) this computer, (2) this computer network, (3) all
computers connected to this network, and (4) all devices
and storage media attached to this network or to a computer on
this network. This information system is provided for U.S.
Government-authorized use only.
Unauthorized or improper use of this system may result in
disciplinary action, as well as civil and criminal penalties.
By using this information system, you understand and consent to the
following:
* You have no reasonable expectation of privacy regarding any
communications or data transiting or stored on this information
system. At any time, and for any lawful Government purpose,
the government may monitor, intercept, record, and search and
seize any communication or data transiting or stored on this
information system.
* Any communication or data transiting or stored on this information
system may be disclosed or used for any lawful Government purpose.
--
Notice to users: This system is rebooted for patches and
maintenance on the first Sunday of every month at 8:00 pm unless
Monday is a holiday, in which case it is rebooted the following
Sunday evening at 8:00 pm. Running cluster jobs are not
affected by the monthly reboot.
username@biowulf.nih.gov's password:
You will be taken to the prompt after successfully entering your password (see below). It is at the prompt where we type commands and interact with Biowulf.
[username8@biowulf ~]$
The id command reports the groups that a user is affiliated with. This is important when collaborating with others on Biowulf, because our group affiliations indicate which shared data we have access to.
id
Running the id command, we see my user id (uid) and primary group id (gid). We also see that I am a part of the GAU and LCP_Omics groups (only GAU appears in the sample output below).
uid=58740(wuz8) gid=58740(wuz8) groups=58740(wuz8),57888(GAU)
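When only one piece of this information is needed, id accepts options that narrow the output. The sketch below uses the standard -u, -g, -G, and -n options; the names printed depend on the account that runs the commands.

```shell
# Print only the user name
id -un
# Print only the name of the primary group
id -gn
# Print the names of all groups the user belongs to
id -Gn
```

Dropping the n from each option prints the numeric ids instead of the names.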
Lesson wrap up
In this lesson, we were presented with a high-level overview of Unix and why it is used in bioinformatics. We also learned about the NIH high performance computing system (Biowulf), which runs Unix and why it would be useful to work in this environment for bioinformatics. Finally, we learned how to connect to Biowulf from our local computers.
Even though this was the first lesson, we already learned two Unix commands:
- mkdir, which is used to make a new directory
- id, which tells users their group affiliations within a high performance computing system
We also learned the ssh command, which is used to connect to Biowulf from either the Windows Command Prompt or the Mac Terminal.