Python Introductory Course Series

Joe Wu, PhD
NCI CCR Bioinformatics Training and Education Program
ncibtep@nih.gov

Learning objectives

After this lesson, participants will

  • Be able to describe Python and provide rationale for using it
  • Know how to start a Jupyter Lab session on Biowulf (Jupyter Lab will be used to interact with Python throughout this course)
  • Be familiar with places for getting Python packages
  • Become familiar with navigating the Jupyter Lab environment
  • Be able to describe Python command syntax
  • Know how to find help for Python commands
  • Become familiar with continuing and self-learning resources

What is Python and why use it?

  • Scripting language
    • Facilitates reuse and reproducibility
  • Can be used to analyze large datasets
  • Extensive external packages that can be used for
    • Data wrangling
    • Data visualization
    • Single cell RNA sequencing analysis
    • Working with biological sequences
    • Interfacing with bioinformatics databases
  • Strong support community
  • Easy to learn
  • Python packages can be found at the Python Package Index (PyPI)

Signing onto Biowulf

In this course series, participants will interact with Python through Jupyter Lab on Biowulf. The first step is therefore to sign onto Biowulf using ssh. Replace username with the participant's own Biowulf username.

ssh username@biowulf.nih.gov
  • Mac: use ssh through the Terminal
  • Windows: use ssh through the command prompt

Change into Biowulf data directory

Use cd to change into the participant's data directory on Biowulf. Again, replace username with the participant's Biowulf username.

cd /data/username

Request an interactive session

Request an interactive session using sinteractive with the following options.

  • --gres=lscratch:5: to allocate 5 GB of local temporary/scratch storage space
  • --mem=2g: to request 2 GB of memory (RAM)
  • --tunnel: to open a channel of communication between the local machine and Biowulf, allowing interaction with applications like Jupyter Lab

sinteractive --gres=lscratch:5 --mem=2g --tunnel

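Complete the tunneling process

Once the job has been allocated, Biowulf prints instructions for creating the tunnel (example output below). Open a new terminal on the local machine and run the ssh -L command shown in that output, using the port number and username from your own session.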

salloc: Pending job allocation 3448569
salloc: job 3448569 queued and waiting for resources
salloc: job 3448569 has been allocated resources
salloc: Granted job allocation 3448569
salloc: Waiting for resource configuration
salloc: Nodes cn4335 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.3448569.0
slurmstepd: error: x11: unable to read DISPLAY value

Created 1 generic SSH tunnel(s) from this compute node to 
biowulf for your use at port numbers defined 
in the $PORTn ($PORT1, ...) environment variables.


Please create a SSH tunnel from your workstation to these ports on biowulf.
On Linux/MacOS, open a terminal and run:

    ssh  -L 35671:localhost:35671 wuz8@biowulf.nih.gov

For Windows instructions, see https://hpc.nih.gov/docs/tunneling

Load Jupyter

After the tunnel has been created, go back to the Biowulf interactive session and load the Jupyter module.

module load jupyter

Start Jupyter Lab

Use the command below to start a Jupyter Lab session.

jupyter lab --ip localhost --port $PORT1 --no-browser

Jupyter Lab prints a local file path and two http URLs (example below). Copy and paste either of the two http URLs into a web browser on the local machine to interact with Jupyter Lab.

Warning
The URLs change with each Jupyter Lab session, so please do not copy from the examples shown below. Copy from the URLs provided in the Biowulf interactive session terminal instead.

To access the server, open this file in a browser:
        file:///spin1/home/linux/wuz8/.local/share/jupyter/runtime/jpserver-275748-open.html
    Or copy and paste one of these URLs:
        http://localhost:35671/lab?token=c09d4f0483ed4780912198ed0d8a657d93ca4d01999d545b
     or http://127.0.0.1:35671/lab?token=c09d4f0483ed4780912198ed0d8a657d93ca4d01999d545b
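The pieces of such a URL can be pulled apart with Python's standard urllib.parse module, which makes it easy to see that the port in the URL matches the tunnel port. The sketch below is only an illustration: it uses the sample URL from this page, the variable names are made up for the example, and a real session prints a different port and token.

```python
from urllib.parse import urlparse, parse_qs

# Sample URL from this page; a real session prints a different one
url = "http://localhost:35671/lab?token=c09d4f0483ed4780912198ed0d8a657d93ca4d01999d545b"

parts = urlparse(url)

# The query string (everything after "?") holds the access token
token = parse_qs(parts.query)["token"][0]

print("host: ", parts.hostname)  # machine the browser connects to
print("port: ", parts.port)      # should match the tunnel port
print("path: ", parts.path)      # /lab opens the Jupyter Lab interface
print("token:", token)           # grants access to this session only
```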

Jupyter Lab - file explorer and launcher

  • File explorer
  • Launcher for starting language-specific notebooks (for this course series, choose the python/3.10 notebook)

[Figure: Jupyter Lab]

Jupyter Notebook - cells

[Figure: Jupyter Notebook]

Python education resources

  • Coursera
  • Dataquest

Visit the self-learning resources page to request a Dataquest or Coursera license.

Coursera recommendations

  • Programming for Everybody (Getting Started with Python)
    • Instructor: Charles Severance, PhD (University of Michigan)
  • Data Analysis with Python
    • Instructor: IBM staff
    • Includes data wrangling and regression analysis
  • Data Visualization with Python
    • Instructor: IBM staff
    • Introduces data visualization using packages such as Matplotlib and Seaborn

Dataquest recommendations

Python command syntax

A Python command is composed of the

  • Command
  • Argument, which is enclosed in parentheses and is what the command acts on
  • Options, which are also enclosed in parentheses and alter the way the command runs

command(argument, options)
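As a small illustration of this pattern, the sketch below uses the built-in sorted command, which is not otherwise covered in this lesson. The list and variable names are chosen only for this example.

```python
# Command: sorted
# Argument: the list to act on
# Option: reverse=True changes how the command runs
numbers = [3, 1, 2]

ascending = sorted(numbers)                 # no options: smallest to largest
descending = sorted(numbers, reverse=True)  # with option: largest to smallest

print(ascending)
print(descending)
```

Options are written as name=value and come after the argument, separated by commas.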

Example of a Python command with and without options

print("Hello", "welcome to Python")
Hello welcome to Python

Include the sep option to place a comma and a space between "Hello" and "welcome to Python".

print("Hello", "welcome to Python", sep=", ")
Hello, welcome to Python
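print accepts other options as well, such as end and file. The sketch below is one possible way to try them out; the variable names buffer and captured are made up for this example, and io.StringIO is used only as a convenient stand-in for a file.

```python
import io

# end replaces the newline that print normally adds,
# so the next print continues on the same line
print("Hello", end=" ")
print("welcome to Python")

# file sends the text to a file-like object instead of the screen
buffer = io.StringIO()
print("Hello", "welcome to Python", sep=", ", end="!\n", file=buffer)

# Retrieve what was written and show it
captured = buffer.getvalue()
print(captured, end="")
```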

Finding help for Python commands

The help command can be used to view documentation for Python commands. It follows the Python command syntax. Insert the command for which help is needed into the parentheses.

help(command)

Example of using help

help(print)
Help on built-in function print in module builtins:

print(...)
    print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)
    
    Prints the values to a stream, or to sys.stdout by default.
    Optional keyword arguments:
    file:  a file-like object (stream); defaults to the current sys.stdout.
    sep:   string inserted between values, default a space.
    end:   string appended after the last value, default a newline.
    flush: whether to forcibly flush the stream.
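help works the same way for other commands, and the text it displays is also stored on the command itself in an attribute named __doc__. The following is a minimal sketch; len is chosen only as a second example, and doc_text is a made-up variable name.

```python
# View the documentation for another built-in command
help(len)

# The same kind of text is available as a string through __doc__,
# which is handy for searching the documentation
doc_text = print.__doc__
print("sep" in doc_text)  # checks whether the sep option is documented
```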

Copy class data to data directory

The example datasets used for this course series reside in /data/classes/BTEP/pies_2023_data. Make a copy in your data directory.

cp -r /data/classes/BTEP/pies_2023_data ./pies_2023
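The same kind of recursive copy can also be done from within Python using the standard shutil module. The sketch below is an illustration only: copy_folder is a hypothetical helper, and it is demonstrated on a throwaway folder rather than on the class data so that it runs anywhere.

```python
import shutil
import tempfile
from pathlib import Path


def copy_folder(source, destination):
    """Copy a folder and its contents, similar to cp -r source destination."""
    return Path(shutil.copytree(source, destination))


# Build a small temporary folder to copy (stand-in for the class data)
with tempfile.TemporaryDirectory() as workspace:
    source = Path(workspace) / "example_data"
    source.mkdir()
    (source / "sample.txt").write_text("hello\n")

    # The destination must not exist yet; copytree creates it
    copied = copy_folder(source, Path(workspace) / "example_copy")

    # List the copied files to confirm the copy worked
    copied_names = sorted(item.name for item in copied.iterdir())
    print(copied_names)
```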