Lesson 1: Short introduction to Python, signing onto Biowulf, and starting Jupyter Lab
Learning objectives
After this lesson, participants will
- Be able to describe Python and provide rationale for using it
- Know how to start a Jupyter Lab session on Biowulf (Jupyter Lab will be used to interact with Python throughout this course)
- Be familiar with places for getting Python packages
- Become familiar with navigating the Jupyter Lab environment
- Be able to describe Python command syntax
- Know how to find help for Python commands
- Become familiar with continuing and self-learning resources
What is Python and why use it?
- Scripting language
- Facilitates reuse and reproducibility
- Can be used to analyze large datasets
- Extensive external packages that can be used for
  - Data wrangling
  - Data visualization
  - Single cell RNA sequencing analysis
  - Working with biological sequences
  - Interfacing with bioinformatics databases
- Strong support community
- Easy to learn
Note
Python packages can be found at The Python Package Index.
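Before looking for a package on The Python Package Index, it can help to check whether it is already available in the current environment. The following is a minimal sketch using only the standard library. The helper name describe_package and the package names in the loop are chosen here for illustration, and the results depend on what is installed in the environment where it runs.

```python
from importlib import metadata
from importlib.util import find_spec


def describe_package(name):
    """Return a short status string for an importable package name (illustrative helper)."""
    if find_spec(name) is None:
        return f"{name}: not installed"
    try:
        return f"{name}: version {metadata.version(name)}"
    except metadata.PackageNotFoundError:
        # Standard-library modules such as json can be imported
        # but carry no Python Package Index metadata.
        return f"{name}: available (no distribution metadata)"


# Example names only; the last one is deliberately made up.
for package in ("json", "pandas", "not_a_real_package_xyz"):
    print(describe_package(package))
```

Note that the name used to import a package can differ from the name it is published under, so a lookup by import name may not always find a version number.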
Signing onto Biowulf
In this course series, participants will interact with Python through Jupyter Lab on Biowulf. Thus, the first step is to sign onto Biowulf using ssh. Replace username with the participant's own Biowulf username.
ssh username@biowulf.nih.gov
- Mac: use ssh through the Terminal
- Windows: use ssh through the command prompt
Change into Biowulf data directory
Use cd to change into the participant's data directory on Biowulf. Again, replace username with the participant's Biowulf username.
cd /data/username
Request an interactive session
Request an interactive session using sinteractive with the following options.
- --gres=lscratch:5: to allocate 5 GB of local temporary/scratch storage space
- --mem=2g: to request 2 GB of memory (RAM)
- --tunnel: to open up a channel of communication between the local machine and Biowulf to allow interaction with applications like Jupyter Lab
sinteractive --gres=lscratch:5 --mem=2g --tunnel
After resources for the interactive session have been granted, users will see information similar to that shown in Figure 1.
Figure 1: After interactive session resources have been allocated, users will see an ssh command that looks like the one enclosed in the red rectangle. Open a new terminal (if working on a Mac) or command prompt (if working on a Windows computer) and then copy and paste this ssh command into the new terminal.
After copying and pasting the ssh command shown in Figure 1 into a new terminal or command prompt, press Enter, then supply the password to log in to Biowulf. This will complete the tunnel.
Figure 2: Press Enter after copying and pasting the ssh command into a new terminal, then provide the password to log into Biowulf. This will complete the tunnel.
Figure 3: In the ssh command shown in Figure 1 and Figure 2, the numbers preceding and following "localhost" will differ depending on the user. Also, the Biowulf username will differ for each user (wuz8 is the instructor's Biowulf username).
Load Jupyter
After the tunnel has been created, go back to the terminal (Mac) or command prompt (Windows) with the Biowulf interactive session and load the Jupyter module (see Figure 4).
module load jupyter
Figure 4: Go back to the terminal (Mac) or command prompt (Windows) with the interactive session (look for cn#### at the prompt). Run module load jupyter from here.
Start Jupyter Lab
Use the command below to start a Jupyter Lab session. Copy and paste either of the http links to a local browser to interact with Jupyter (see Figure 5).
jupyter lab --ip localhost --port $PORT1 --no-browser
Figure 5: Start a Jupyter Lab session using jupyter lab --ip localhost --port $PORT1 --no-browser and copy and paste either one of the http links into a local browser.
Warning
The URLs change with each Jupyter Lab session, so please do not copy from the examples shown below. Copy from the URLs provided in the Biowulf interactive session terminal instead.
Jupyter Lab - file explorer and launcher
- File explorer
- Launcher for starting language specific notebooks (for this course series, choose the python/3.10 notebook)
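Once a notebook has been started from the launcher, a first cell can confirm which Python interpreter the notebook is running. The sketch below uses only the standard library. The variable name version_text is chosen here for illustration, and the values printed depend on the notebook kernel that was selected.

```python
import sys

# sys.version_info holds the interpreter version as numbers (major, minor, ...)
version_text = f"{sys.version_info.major}.{sys.version_info.minor}"
print("Python version:", version_text)

# sys.executable is the path of the Python program running this notebook
print("Interpreter path:", sys.executable)
```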
Jupyter Notebook - cells
Python education resources
- Coursera
  - Programming for Everybody (Getting Started with Python)
    - Instructor: Charles Severance, PhD (University of Michigan)
  - Data Analysis with Python
    - Instructor: IBM staff
    - Includes data wrangling and regression analysis
  - Data Visualization with Python
    - Instructor: IBM staff
    - Introduces data visualization using packages such as Matplotlib and Seaborn
- Dataquest
Visit the self learning resources page to request a Dataquest or Coursera license.
Python command syntax
Python command syntax is composed of the
- Command
- Argument, which is enclosed in the parentheses and is what the command will act on
- Options, which are also enclosed in the parentheses and alter the way the command runs
command(argument, options)
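The same pattern applies to other built-in commands. The sketch below is an illustration using sorted and round, which are standard Python built-ins. The variable names and values are made up for this example.

```python
# Argument only: sorted() acts on the list and returns a new, ordered list
numbers = [3, 1, 2]
ascending = sorted(numbers)
print(ascending)   # [1, 2, 3]

# Argument plus an option: reverse=True alters how sorted() runs
descending = sorted(numbers, reverse=True)
print(descending)  # [3, 2, 1]

# round() takes the value to act on, then an optional number of decimal places
rounded = round(3.14159, 2)
print(rounded)     # 3.14
```

In each call the first item in the parentheses is what the command acts on, and anything after it adjusts the behavior.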
Example of a Python command with and without options
print("Hello", "welcome to Python")
Hello welcome to Python
Include the sep option to place a comma and a space between "Hello" and "welcome to Python".
print("Hello", "welcome to Python", sep=", ")
Hello, welcome to Python
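The print command accepts other options besides sep. As a small sketch, the example below tries the end and file options, which are part of the standard print command. The in-memory buffer and the variable names are assumptions made for illustration only.

```python
import io

# end changes what is written after the last value (the default is a newline),
# so the next print continues on the same line
print("Hello", end=", ")
print("welcome to Python")

# file sends the output to a file-like object instead of the screen;
# here an in-memory text buffer stands in for a real file
buffer = io.StringIO()
print("Hello", "welcome to Python", sep=", ", file=buffer)

# The buffer now holds exactly what print would have displayed
captured = buffer.getvalue()
print(repr(captured))
```

Options are written as name=value pairs after the arguments, and more than one option can be supplied in the same call.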
Finding help for Python commands
The help command can be used to view documentation for Python commands. It follows the Python command syntax. Insert the command for which help is needed into the parentheses.
help()
Example of using help
help(print)
Help on built-in function print in module builtins:
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
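The help command is not limited to print. The sketch below shows a few related ways to look up documentation using only built-in tools. The variable names are chosen here for illustration, and the exact help text differs between Python versions.

```python
import inspect

# help() also works on methods that belong to a data type, such as str.upper
help(str.upper)

# inspect.getdoc() returns the same documentation as text instead of displaying it
print_doc = inspect.getdoc(print)
print(print_doc)

# dir() lists the names available on an object; filtering out names that start
# with an underscore leaves the commonly used string methods
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods[:5])
```

In Jupyter, typing a command followed by a question mark (for example print?) in a cell is another way to display its documentation.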
Copy class data to data directory
The example datasets used for this course series reside in /data/classes/BTEP/pies_2023_data. Make a copy in your data directory.
cp -r /data/classes/BTEP/pies_2023_data ./pies_2023
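A similar copy could also be made from within a notebook. The following is a minimal sketch using the standard library shutil module, not the method used in this course. The helper name copy_class_data is hypothetical, and the sketch assumes the notebook was started from the participant's data directory. Unlike cp -r, it leaves an existing destination folder untouched.

```python
import shutil
from pathlib import Path


def copy_class_data(source, destination):
    """Recursively copy the source folder to destination (illustrative helper)."""
    source = Path(source)
    destination = Path(destination)
    if destination.exists():
        # shutil.copytree raises an error if the destination already exists,
        # so return early rather than overwrite an earlier copy
        return destination
    shutil.copytree(source, destination)
    return destination


# Paths taken from the lesson; the source folder only exists on Biowulf.
class_data = Path("/data/classes/BTEP/pies_2023_data")
if class_data.exists():
    copied_to = copy_class_data(class_data, Path.cwd() / "pies_2023")
    print("Class data available at:", copied_to)
else:
    print("Class data folder not found; run this on Biowulf.")
```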