PIES Lesson 1 slides
Python Introductory Course Series
Joe Wu, PhD, NCI CCR Bioinformatics Training and Education Program, ncibtep@nih.gov
Learning objectives
After this lesson, participants will
- Be able to describe Python and provide rationale for using it
- Know how to start a Jupyter Lab session on Biowulf (Jupyter Lab will be used to interact with Python throughout this course)
- Know where to find Python packages
- Become familiar with navigating the Jupyter Lab environment
- Be able to describe Python command syntax
- Know how to find help for Python commands
- Become familiar with continuing and self-learning resources
What is Python and why use it?
- General-purpose programming language, widely used for scripting
- Facilitates reuse and reproducibility
- Can be used to analyze large datasets
- Extensive external packages that can be used for
  - Data wrangling
  - Data visualization
  - Single cell RNA sequencing analysis
  - Working with biological sequences
  - Interfacing with bioinformatics databases
- Strong support community
- Easy to learn
- Python packages can be found at the Python Package Index (PyPI, https://pypi.org)
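Packages from the Python Package Index are usually installed with pip (for example, pip install pandas). Before installing, it can help to check whether a package is already available in the current environment. The sketch below uses the standard library's importlib; the helper name is_installed and the package names checked are illustrative only.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package or module can be imported.

    is_installed is a hypothetical helper written for this example.
    """
    # find_spec returns None when no module with this name can be found
    return importlib.util.find_spec(package_name) is not None


# Check a few packages commonly used for data wrangling and visualization
for name in ["json", "pandas", "matplotlib"]:
    status = "available" if is_installed(name) else "not installed"
    print(f"{name}: {status}")
```

Note that the name used to import a package can differ from its PyPI name (for example, scikit-learn is imported as sklearn), so check the import name.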
Signing onto Biowulf
In this course series, participants will interact with Python through Jupyter Lab on Biowulf. Thus, the first step is to sign onto Biowulf using ssh. Replace username with your own Biowulf username.
ssh username@biowulf.nih.gov
- Mac: use ssh through the Terminal
- Windows: use ssh through the command prompt
Change into Biowulf data directory
Use cd to change into your data directory on Biowulf. Again, replace username with your Biowulf username.
cd /data/username
Request an interactive session
Request an interactive session using sinteractive with the following options.
- --gres=lscratch:5: to allocate 5 GB of local temporary (scratch) storage space
- --mem=2g: to request 2 GB of memory (RAM)
- --tunnel: to open a channel of communication between the local machine and Biowulf, allowing interaction with applications like Jupyter Lab
sinteractive --gres=lscratch:5 --mem=2g --tunnel
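Complete the tunneling process
The sinteractive output (example below) ends with an ssh command for creating the tunnel. Open a new terminal on the local machine (Windows users should follow the linked instructions) and run the command printed in your own session; the port number and username will differ from the example.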
salloc: Pending job allocation 3448569
salloc: job 3448569 queued and waiting for resources
salloc: job 3448569 has been allocated resources
salloc: Granted job allocation 3448569
salloc: Waiting for resource configuration
salloc: Nodes cn4335 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.3448569.0
slurmstepd: error: x11: unable to read DISPLAY value
Created 1 generic SSH tunnel(s) from this compute node to
biowulf for your use at port numbers defined
in the $PORTn ($PORT1, ...) environment variables.
Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:
ssh -L 35671:localhost:35671 wuz8@biowulf.nih.gov
For Windows instructions, see https://hpc.nih.gov/docs/tunneling
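The tunnel command printed above forwards a port on the local workstation to the same port on Biowulf: in ssh -L 35671:localhost:35671, the first number is the local port, localhost is the destination as seen from the Biowulf side, and the last number is the remote port. Because the port and username differ for every session and participant, the small sketch below assembles the command from those two values to make its structure explicit. The function tunnel_command is hypothetical and not part of any Biowulf tool; always run the command printed in your own sinteractive output.

```python
def tunnel_command(port: int, username: str, host: str = "biowulf.nih.gov") -> str:
    """Build an ssh local port-forwarding command (hypothetical helper).

    Connections to `port` on the workstation are forwarded through `host`
    to localhost:`port` on the Biowulf side.
    """
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid port number: {port}")
    return f"ssh -L {port}:localhost:{port} {username}@{host}"


# Reproduces the example command from the output above
print(tunnel_command(35671, "wuz8"))
# prints: ssh -L 35671:localhost:35671 wuz8@biowulf.nih.gov
```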
Load Jupyter
After the tunnel has been created, return to the Biowulf interactive session and load the Jupyter module.
module load jupyter
Start Jupyter Lab
Use the command below to start a Jupyter Lab session.
jupyter lab --ip localhost --port $PORT1 --no-browser
Jupyter Lab prints three addresses (example below): a file path and two http:// URLs. Copy and paste either of the http:// URLs into a web browser on the local machine to interact with Jupyter Lab; the file path refers to a file on Biowulf and will generally not open from a local browser.
Warning: The URLs change with each Jupyter Lab session, so do not copy them from the example below. Copy the URLs printed in your own Biowulf interactive session terminal instead.
To access the server, open this file in a browser:
file:///spin1/home/linux/wuz8/.local/share/jupyter/runtime/jpserver-275748-open.html
Or copy and paste one of these URLs:
http://localhost:35671/lab?token=c09d4f0483ed4780912198ed0d8a657d93ca4d01999d545b
or http://127.0.0.1:35671/lab?token=c09d4f0483ed4780912198ed0d8a657d93ca4d01999d545b
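Each http:// URL bundles several pieces of information: the address and port the browser connects to (here localhost and the tunneled port 35671, which Jupyter Lab took from $PORT1) and a token that authenticates the browser to this particular Jupyter server. The short sketch below pulls those pieces apart with the standard library's urllib.parse, using the example URL above purely for illustration.

```python
from urllib.parse import parse_qs, urlparse

# Example URL copied from the Jupyter Lab output above; yours will differ
url = "http://localhost:35671/lab?token=c09d4f0483ed4780912198ed0d8a657d93ca4d01999d545b"

parts = urlparse(url)
token = parse_qs(parts.query)["token"][0]

print(f"host:  {parts.hostname}")  # address the local browser connects to
print(f"port:  {parts.port}")      # must match the tunneled port ($PORT1)
print(f"path:  {parts.path}")      # /lab opens the Jupyter Lab interface
print(f"token: {token}")           # authentication token for this server session
```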
Jupyter Lab - file explorer and launcher
- File explorer
- Launcher for starting language-specific notebooks (for this course series, choose the python/3.10 notebook)
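After opening a notebook from the launcher, one way to confirm which Python version the notebook is running is to query the standard library's sys module in a cell. With the python/3.10 notebook recommended for this course, the reported version would be expected to start with 3.10; the check below is a sketch, and the exact patch version will vary.

```python
import sys

# Full version string of the interpreter running this notebook
print(sys.version)

# version_info supports tuple-style access, handy for quick checks
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")
if (major, minor) != (3, 10):
    print("Note: this course series assumes the python/3.10 notebook.")
```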
Jupyter Notebook - cells
Python education resources
- Coursera
- Dataquest
Visit the self-learning resources page to request a Dataquest or Coursera license.
Coursera recommendations
- Programming for Everybody (Getting Started with Python)
  - Instructor: Charles Severance, PhD (University of Michigan)
- Data Analysis with Python
  - Instructor: IBM staff
  - Includes data wrangling and regression analysis
- Data Visualization with Python
  - Instructor: IBM staff
  - Introduces data visualization using packages such as Matplotlib and Seaborn
Dataquest recommendations
- https://www.dataquest.io/course/introduction-to-python/
- https://www.dataquest.io/path/data-scientist/
- https://www.dataquest.io/path/data-analyst/
Python command syntax
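The command syntax for Python is composed of:
- Command: the name of the command (in Python terms, a function) to run
- Argument: enclosed in parentheses; what the command will act on
- Options: also enclosed in parentheses, after the arguments; alter the way the command runs (in Python terminology these are keyword arguments, written as name=value)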
command(argument, options)
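The same pattern applies to other built-in commands. In the sketch below, the list of numbers is the argument that sorted acts on, while reverse=True is an option that changes how it runs; similarly, round takes a number as its argument and the option ndigits to control the number of decimal places. These particular functions are chosen only as illustrations.

```python
numbers = [3, 1, 2]

# Command with an argument only: sort in ascending order (the default)
print(sorted(numbers))                # [1, 2, 3]

# Same command with the option reverse=True: sort in descending order
print(sorted(numbers, reverse=True))  # [3, 2, 1]

# round() without and with the ndigits option
print(round(3.14159))                 # 3
print(round(3.14159, ndigits=2))      # 3.14
```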
Example of a Python command with and without options
print("Hello", "welcome to Python")
Hello welcome to Python
Include the option sep to place a comma and a space between "Hello" and "welcome to Python".
print("Hello", "welcome to Python", sep=", ")
Hello, welcome to Python
Finding help for Python commands
The help command can be used to view documentation for Python commands. It follows the Python command syntax: place the name of the command you need help with inside the parentheses.
help()
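help() prints documentation drawn largely from an object's docstring. When only a quick reminder is needed, the docstring can also be read directly through the __doc__ attribute, as in the short sketch below. In Jupyter Lab, appending ? to a name (for example, print?) is another option, provided by IPython rather than by Python itself.

```python
# __doc__ holds the docstring that help() draws on
doc = print.__doc__

# Show only the first few lines to keep the output short
for line in doc.splitlines()[:3]:
    print(line)
```

The exact wording of the docstring varies between Python versions.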
Example of using help
help(print)
Help on built-in function print in module builtins:
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
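The help output lists the options that print accepts. The sketch below combines three of them: sep and end control the separator and the ending, and file redirects output to any file-like object. An io.StringIO buffer is used here only so that the printed text can be inspected as a string.

```python
import io

# Default behavior: values separated by a space, line ended with "\n"
print("A", "B", "C")                      # A B C

# sep changes the separator; end changes what follows the last value
print("A", "B", "C", sep="-", end="!\n")  # A-B-C!

# file sends the output to another stream instead of the screen
buffer = io.StringIO()
print("Hello", "welcome to Python", sep=", ", file=buffer)
captured = buffer.getvalue()
print(repr(captured))                     # 'Hello, welcome to Python\n'
```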
Copy class data to data directory
The example datasets used for this course series reside in /data/classes/BTEP/pies_2023_data. Make a copy in your data directory (run the command below from /data/username).
cp -r /data/classes/BTEP/pies_2023_data ./pies_2023
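The cp -r command above runs in the Biowulf terminal. To make the copy from a Jupyter notebook instead, a roughly equivalent sketch uses shutil.copytree from the standard library. It assumes the notebook's working directory is your data directory (/data/username). Unlike cp -r, it stops with an error if the destination folder already exists rather than copying into it. The helper name copy_class_data is hypothetical.

```python
import shutil
from pathlib import Path

# Source and destination paths from the cp -r command above
SOURCE = Path("/data/classes/BTEP/pies_2023_data")
DESTINATION = Path("pies_2023")  # relative to the current working directory


def copy_class_data(source: Path, destination: Path) -> Path:
    """Recursively copy `source` to `destination` (similar to cp -r).

    Raises FileExistsError if `destination` already exists, so an earlier
    copy is never silently overwritten.
    """
    if not source.is_dir():
        raise FileNotFoundError(f"source directory not found: {source}")
    return Path(shutil.copytree(source, destination))


# Only attempt the copy where the class data is present and not yet copied
if SOURCE.is_dir() and not DESTINATION.exists():
    copied = copy_class_data(SOURCE, DESTINATION)
    print(f"Copied class data to {copied.resolve()}")
```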