Getting Started with Python

Joe Wu, PhD
NCI/CCR Bioinformatics Training and Education Program
ncibtep@nih.gov

Lesson 1 learning objectives

After this class, participants will be able to:

Describe Python and provide rationale for using Python
List tools for interacting with Python
Sign in to Biowulf, start a Jupyter Lab session, and become familiar with the Jupyter Notebook interface
Describe Python command syntax
Describe where to get and how to install external packages
Get help for Python commands

Why use Python?

General purpose scripting language
- Analyze and visualize large datasets
- Reusability and reproducibility
- Versioning and keeping track of changes is possible when analyzing data using scripts
- Easy to learn
External packages that enhance functionality
Large community support

Python enables elegant data visualization

An abundance of external packages makes scientific computing and data presentation easy. For instance, the packages matplotlib and seaborn are good tools for generating data visualizations. With a few lines of code, scientists can generate scatter plots to view the relationship between variables or heatmaps that reveal distinct clusters in a dataset.

Generating a scatter plot using Matplotlib

import matplotlib.pyplot as plt
import numpy

x=numpy.array([0,1,2,3,4,5,6,7,8])
y=numpy.array([0.5,2,5,6,7,10,13,14,16])
plt.scatter(x,y)
# Fit a straight line (degree 1) to the points
slope, intercept=numpy.polyfit(x,y,1)
plt.plot(x,slope*x+intercept)
# Label the plot with the fitted equation
plt.text(1,14,'y='+str(round(slope,3))+'x + '+str(round(intercept,3)))
plt.xlabel('x')
plt.ylabel('y')
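Called with a degree of 1, numpy.polyfit fits a straight line by least squares. The sketch below reproduces that calculation in plain Python for the same x and y values, to show where the slope and intercept come from. It is an illustration only, and the variable names (mean_x, sxy, sxx) are chosen here and are not part of the lesson.

```python
# Same data as the scatter plot example
x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [0.5, 2, 5, 6, 7, 10, 13, 14, 16]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Least-squares slope: covariance of x and y divided by variance of x
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx

# The fitted line passes through the point (mean_x, mean_y)
intercept = mean_y - slope * mean_x

print(round(slope, 3), round(intercept, 3))  # 1.967 0.3
```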

Generating a gene expression heatmap using Seaborn

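import matplotlib.pyplot as plt
import pandas
import seaborn

# Read normalized counts, using the first column as row labels
counts1=pandas.read_csv("../data/hbr_uhr_normalized_counts.csv", index_col=[0])
# z_score=0 standardizes each row before clustering
seaborn.clustermap(counts1,z_score=0,cmap="viridis", figsize=(5,5))
plt.suptitle("Gene expression heatmap",y=1.1)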

Tools for interacting with Python

Python can be run at the command prompt
IPython
Run a Python script at the command prompt
Integrated Development Environments such as:
- Spyder
- PyCharm
Visual Studio Code from Microsoft has extensions that support Python scripting
RStudio
Jupyter Lab/Notebook

Python at the command prompt

Assuming Python is installed, just type python at the command prompt to start using Python. Hit control-d to exit back to the command prompt. The downside to this is that users cannot save the commands into a script.

IPython

IPython enables users to run Python commands interactively at the terminal. It features autocompletion of commands and allows commands to be saved to a Python script using %save followed by a file name and the input line numbers to save. In the class example, some commands are saved to a file called pies_class_2025_ipython.py in the /data/$USER/pies_class_2025 directory on Biowulf.

Hit control-d to exit IPython and return to the command prompt.

Stay in /data/$USER/pies_class_2025 and list its contents to make sure that pies_class_2025_ipython.py is there.

ls

pies_class_2025_ipython.py  pies_data

While using IPython is better than just running commands at the terminal, it is still not very efficient for saving work. Also, users will not be able to view plots in a terminal session on HPC systems such as Biowulf, since these sessions do not support inspection of graphical output.
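When plots cannot be displayed, they can still be written to an image file and viewed later. The sketch below assumes matplotlib is installed and uses its non-interactive Agg backend. The data points and the file name scatter_example.png are made up for illustration and are not part of the class materials.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no display is needed
import matplotlib.pyplot as plt

# Small made-up dataset, for illustration only
fig, ax = plt.subplots()
ax.scatter([0, 1, 2, 3], [0.5, 2, 5, 6])
ax.set_xlabel("x")
ax.set_ylabel("y")

# Hypothetical output location; any writable path works
out_path = os.path.join(tempfile.gettempdir(), "scatter_example.png")
fig.savefig(out_path, dpi=150)
plt.close(fig)
print("Saved plot to", out_path)
```

The saved file can then be copied to a local computer or opened in a tool that displays images.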

Note

The pies_class_2025_ipython.py script can be run from the command line. To run a Python script from the command line, type python followed by the name of the script. Python scripts can also be submitted as jobs to the Biowulf batch system.

python pies_class_2025_ipython.py

hello
3.141592653589793
5.0
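The contents of pies_class_2025_ipython.py are not reproduced in these notes. As a rough guess, a script along the following lines would print the three lines of output shown above. The exact commands used in class are an assumption here.

```python
# Hypothetical contents; the actual class script may differ.
import math

greeting = "hello"
value = 25

print(greeting)          # a text string
print(math.pi)           # the constant pi from the math module
print(math.sqrt(value))  # square root of 25
```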

Using Python through an IDE

Integrated Development Environments (IDEs) are ideal for scripting in Python as well as other languages. See https://ritza.co/comparisons/pycharm-vs-spyder-vs-jupyter-vs-visual-studio-vs-anaconda-vs-intellij.html for a breakdown of common ones such as Spyder, PyCharm, VS Code, RStudio, and Jupyter Lab. Essentially, IDEs enable users to write scripts, access and view data, and view plots. Some also enable users to generate an analysis report that details the steps of an analysis as well as the tools and code used.

Accessing Python at NIH

Biowulf (HPC OnDemand is recommended).
Use Python locally on a government-furnished personal computer via the NIH Anaconda Professional License. This requires users to install Anaconda on the local computer.
NCI scientists can also use Python through Posit Workbench. Fill out the form at https://forms.office.com/pages/responsepage.aspx?id=eHW3FHOX1UKFByUcotwrBnYgWNrH6QdOsCsoiQ9eiaZUQ1ZZODJKT0FERUdHOVZYUkJaMzA2UDAxSi4u&route=shorturl to request access.

Using Python through Biowulf

This class will use Jupyter Lab installed on Biowulf for interactions with Python. To get started, open a Terminal (if working on a Mac) or a Command Prompt (if working on Windows) and sign in to the participant's Biowulf account.

In the ssh command construct below, be sure to replace user with the participant's own Biowulf login ID.

ssh user@biowulf.nih.gov

Next, change into the participant's Biowulf data directory. Remember to replace user with the participant's own Biowulf login ID.

cd /data/user

In the participant's data directory, create a folder called pies_class_2025.

mkdir pies_class_2025

Finally, change into pies_class_2025 and copy the pies_data directory from /data/classes/BTEP on Biowulf into it.

cd pies_class_2025
cp -r /data/classes/BTEP/pies_data .

Spin up Jupyter Lab in HPC OnDemand.

Open a web browser on the local computer (Google Chrome is recommended) and go to https://hpcondemand.nih.gov/, which is the URL for Biowulf's HPC OnDemand.
Once at HPC OnDemand, sign in with the participant's NIH credentials.
After signing in, users will see quick links to applications available through HPC OnDemand. Click on the one for Jupyter.

The subsequent page allows users to specify compute resources. Leave these as is for this class.

Make sure to specify for Jupyter to start in the /data/$USER/pies_class_2025 directory.

Click on "Connect to Jupyter" when the Jupyter Lab session has been granted.

Users will see the Jupyter Lab interface. The left-hand panel is the file explorer, where users can navigate through the files and folders available in the directory in which Jupyter Lab was started. The launcher panel contains quick links for initiating a Jupyter Notebook in the user's language of choice.

Create a new Jupyter Notebook

Create a new Jupyter Notebook in Python 3.12 (click on the "python/3.12" tile). The new notebook has the name "Untitled.ipynb". Click on the disk icon in the notebook menu bar to rename it pies_class_2025.
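As an optional first cell, the short sketch below confirms which Python the notebook is running. It uses only the standard sys module. The variable names python_major and python_minor are chosen here for illustration.

```python
import sys

# Full version string of the interpreter behind this notebook
print(sys.version)

# Path to the Python executable in use
print(sys.executable)

# Major and minor version numbers, e.g. 3 and 12 for Python 3.12
python_major, python_minor = sys.version_info[:2]
print(python_major, python_minor)
```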

Tip

For a detailed overview of Jupyter Lab, see BTEP's Documenting Analysis Steps using Jupyter Lab.

Python Command Syntax

Arguments and options for Python commands are enclosed in parentheses. In general, the anatomy is command(argument, option).

For example, the command below is print and it will display the argument, "Hello BTEP".

print("Hello BTEP")

Hello BTEP
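The command(argument, option) pattern applies to other built-in commands as well. The sketch below uses round and sorted, two built-ins chosen here for illustration, to show an option passed by position and an option passed by name.

```python
# round(number, ndigits): the second value is an option given by position
rounded = round(3.14159, 2)
print(rounded)  # 3.14

# sorted(iterable, reverse=...): the option is given by name (keyword)
descending = sorted([3, 1, 2], reverse=True)
print(descending)  # [3, 2, 1]
```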

To get help for a Python command, use help.

For instance:

help(print)
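Beyond help, two other ways to look up information are sketched below: reading a command's documentation string through its __doc__ attribute, and listing what an object offers with dir. In IPython and Jupyter, typing a name followed by a question mark (for example, len?) also displays help. The variable names below are illustrative.

```python
# help(len) prints the help page; the same text is stored as a string here
doc_text = len.__doc__
print(doc_text)

# dir() lists the names available on an object, here the string type;
# names starting with an underscore are filtered out for readability
string_methods = [name for name in dir(str) if not name.startswith("_")]
print(string_methods[:5])
```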

As the help information for print shows, line breaks can be added using \n. Try the following to print three sentences, one on each line.

print("University of Florida is in Gainesville, Florida.\n"
"Their mascot is the Gators.\n"
"The Gators men's basketball team won the national championship in 2025, 2007, and 2006.")

University of Florida is in Gainesville, Florida.
Their mascot is the Gators.
The Gators men's basketball team won the national championship in 2025, 2007, and 2006.
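The help page for print also lists the sep and end options. The sketch below, using two of the sentences above, shows how sep controls what goes between arguments and how end controls what is printed after the last one. The variable names are chosen for illustration.

```python
sentences = [
    "University of Florida is in Gainesville, Florida.",
    "Their mascot is the Gators.",
]

# sep is placed between arguments; "\n" puts each one on its own line
print(*sentences, sep="\n")

# end replaces the default trailing newline
print("This line has no newline at the end,", end="")
print(" so this text continues on the same line.")

# The same line-per-sentence text built as a single string
joined = "\n".join(sentences)
```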

Installing external packages

Python external packages are found at the Python Package Index (PyPI). To install a package from PyPI, use pip install package_name, where package_name can be any package of choice. For instance, to install scipy, do:

pip install scipy

pip is the package installer for Python. If pip is not available with the user's Python installation, see https://pip.pypa.io/en/stable/installation/ to learn how to get it.

To uninstall, do pip uninstall package_name.

To update a package, use pip install --upgrade package_name.

pip freeze lists the packages currently installed via pip, along with their versions.
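Whether a package is available can also be checked from within Python. The sketch below uses the standard library modules importlib.util and importlib.metadata. The helper names is_importable and installed_version are made up for this example and are not part of pip.

```python
import importlib.util
from importlib import metadata


def is_importable(module_name):
    """Return True if Python can find a module with this name."""
    return importlib.util.find_spec(module_name) is not None


def installed_version(package_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


print(is_importable("json"))       # json ships with Python
print(installed_version("scipy"))  # a version string, or None if absent
```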

Those who choose to use the package manager Anaconda can install packages from the command line using conda install package_name. Again, package_name is the user's package of choice. Package managers help reduce problems with versioning, dependencies, and security when installing software. See https://docs.conda.io/projects/conda/en/stable/user-guide/tasks/manage-pkgs.html to learn more about installing, updating, and uninstalling packages using Conda. For working locally on a government-furnished personal computer, researchers are encouraged to use the NIH Anaconda Professional License. Biowulf also has a guide on managing Anaconda environments on the cluster. See https://hpc.nih.gov/docs/diy_installation/conda.html.
