Getting Started with Python
Joe Wu, PhD
NCI/CCR Bioinformatics Training and Education Program
ncibtep@nih.gov
Lesson 1 learning objectives
After this class, participants will be able to:
- Describe Python and provide rationale for using Python
- List tools for interacting with Python
- Sign in to Biowulf, start a Jupyter Lab session, and become familiar with the Jupyter Notebook interface
- Describe Python command syntax
- Describe where to get and how to install external packages
- Get help for Python commands
Why use Python?
- General purpose scripting language
- Analyze and visualize large datasets
- Reusability and reproducibility
- Versioning and keeping track of changes is possible when analyzing data using scripts
- Easy to learn
- External packages that enhance functionality
- Large community support
Python enables elegant data visualization
An abundance of external packages makes scientific computing and data presentation easy. For instance, the packages matplotlib and seaborn are good tools for generating data visualizations. With a few lines of code, scientists can generate scatter plots to view the relationship between variables, or heatmaps that can reveal distinct clusters in a dataset.
Generating a scatter plot using Matplotlib
import matplotlib.pyplot as plt
import numpy
# Example data
x=numpy.array([0,1,2,3,4,5,6,7,8])
y=numpy.array([0.5,2,5,6,7,10,13,14,16])
plt.scatter(x,y)
# Fit a straight line (degree-1 polynomial) and draw it over the points
slope, intercept=numpy.polyfit(x,y,1)
plt.plot(x,slope*x+intercept)
# Annotate the plot with the fitted equation
plt.text(1,14,f"y={round(slope,3)}x + {round(intercept,3)}")
plt.xlabel('x')
plt.ylabel('y')
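numpy.polyfit(x, y, 1) performs an ordinary least-squares fit of a straight line. As a rough illustration of what that fit computes, the sketch below recomputes the slope and intercept for the same x and y values using only the Python standard library. The function name fit_line is made up for this example and is not part of NumPy.

```python
# Illustrative sketch only: fit_line is a hypothetical helper, not a NumPy function.
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = sum of co-deviations of x and y / sum of squared deviations of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return slope, intercept

x = [0, 1, 2, 3, 4, 5, 6, 7, 8]
y = [0.5, 2, 5, 6, 7, 10, 13, 14, 16]
slope, intercept = fit_line(x, y)
print(round(slope, 3), round(intercept, 3))
```

For this data the slope comes out at about 1.967 and the intercept at about 0.3, which is the equation written onto the plot.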
Generating a gene expression heatmap using Seaborn
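import matplotlib.pyplot as plt
import pandas
import seaborn
# Load normalized counts; the first column becomes the row index
counts1=pandas.read_csv("../data/hbr_uhr_normalized_counts.csv", index_col=[0])
# z_score=0 standardizes each row before clustering
seaborn.clustermap(counts1,z_score=0,cmap="viridis", figsize=(5,5))
plt.suptitle("Gene expression heatmap",y=1.1)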
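The z_score=0 argument asks seaborn.clustermap to standardize each row before clustering, so that rows with very different magnitudes become comparable. Below is a minimal sketch of that per-row standardization, using made-up expression values and the standard library only. It assumes the sample standard deviation, so seaborn's exact numbers may differ if it follows another convention.

```python
# Illustrative sketch: the values below are invented, not taken from the class data.
from statistics import mean, stdev

def z_score_row(values):
    """Standardize one row: subtract the row mean, divide by the row standard deviation."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (an assumption)
    return [(v - mu) / sd for v in values]

row = [10.0, 12.0, 50.0, 48.0]  # hypothetical normalized counts for one gene
scaled = z_score_row(row)
print([round(z, 2) for z in scaled])
```

After standardization the row has mean 0 and standard deviation 1, so the heatmap colors reflect the pattern across samples rather than the absolute expression level.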
Tools for interacting with Python
- Python can be run at the command prompt
- IPython
- Run a Python script at the command prompt
- Integrated Development Environments such as:
  - Visual Studio Code from Microsoft, which has extensions that support Python scripting
  - RStudio
  - Jupyter Lab/Notebook
Python at the command prompt
Assuming Python is installed, type python at the command prompt to start using Python. Press Ctrl-D to exit back to the command prompt. The downside of this approach is that users cannot save the commands into a script.
IPython
IPython enables users to run Python commands interactively at the terminal. It features autocompletion of commands and can save commands to a Python script using %save followed by a file name and the input line numbers to save. In the class example, some commands are saved to a file called pies_class_2025_ipython.py in the /data/$USER/pies_class_2025 directory on Biowulf.
Press Ctrl-D to exit IPython and return to the command prompt.
Stay in /data/$USER/pies_class_2025 and list the contents to make sure that pies_class_2025_ipython.py is there.
ls
pies_class_2025_ipython.py pies_data
While using IPython is better than running commands at the plain Python prompt, it is still not an efficient way to save work. Also, plots cannot be viewed from a terminal session on HPC systems such as Biowulf, since the terminal does not display graphical output.
Note
The pies_class_2025_ipython.py script can be run from the command line. To run a Python script from the command line, type python followed by the name of the script. Python scripts can also be submitted as jobs to the Biowulf batch system.
python pies_class_2025_ipython.py
hello
3.141592653589793
5.0
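The contents of pies_class_2025_ipython.py are not shown in this lesson. Purely as an assumption, a script along the following lines would print the three lines of output above. The actual commands saved in class may differ.

```python
# Hypothetical contents for pies_class_2025_ipython.py (an assumption, not the class file)
import math

greeting = "hello"
root = math.sqrt(25)

print(greeting)   # a plain string
print(math.pi)    # the constant pi from the math module
print(root)       # square root of 25, returned as a float
```

Because a script is just a text file of commands, the same file can be rerun later or shared with a collaborator to reproduce the result.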
Using Python through an IDE
Integrated Development Environments (IDEs) are ideal for scripting in Python as well as other languages. See https://ritza.co/comparisons/pycharm-vs-spyder-vs-jupyter-vs-visual-studio-vs-anaconda-vs-intellij.html for a breakdown of common ones such as Spyder, PyCharm, VS Code, RStudio, and Jupyter Lab. Essentially, IDEs enable users to write scripts, access and view data, and view plots. They also enable users to generate analysis reports that detail the steps of an analysis as well as the tools and code used.
Accessing Python at NIH
- Biowulf (HPC OnDemand is recommended).
- Use Python locally on a government-furnished personal computer via the NIH Anaconda Professional License. This requires users to install Anaconda on the local computer.
- NCI scientists can also use Python through Posit Workbench. Fill out the form at https://forms.office.com/pages/responsepage.aspx?id=eHW3FHOX1UKFByUcotwrBnYgWNrH6QdOsCsoiQ9eiaZUQ1ZZODJKT0FERUdHOVZYUkJaMzA2UDAxSi4u&route=shorturl to request access.
Using Python through Biowulf
This class will use Jupyter Lab installed on Biowulf for interactions with Python. To get started, open a Terminal (if working on a Mac) or a Command Prompt (if working on Windows) and sign in to the participant's Biowulf account.
In the ssh command below, be sure to replace user with the participant's own Biowulf login ID.
ssh user@biowulf.nih.gov
Next, change into the participant's Biowulf data directory. Remember to replace user with the participant's own Biowulf login ID.
cd /data/user
In the participant's data directory, create a folder called pies_class_2025.
mkdir pies_class_2025
Finally, change into pies_class_2025 and copy the pies_data directory from /data/classes/BTEP on Biowulf into it.
cd pies_class_2025
cp -r /data/classes/BTEP/pies_data .
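The same folder setup can also be scripted. The sketch below mirrors the mkdir and cp -r steps with Python's pathlib and shutil modules. It runs against a temporary directory so that it can be tried anywhere. The function name prepare_class_dir and the stand-in paths are assumptions for illustration. On Biowulf the real locations would be the participant's data directory and /data/classes/BTEP/pies_data.

```python
# Illustrative sketch: prepare_class_dir is a hypothetical helper name.
import shutil
import tempfile
from pathlib import Path

def prepare_class_dir(data_dir, source_dir, class_name="pies_class_2025"):
    """Create the class folder inside data_dir and copy source_dir into it."""
    class_dir = Path(data_dir) / class_name
    class_dir.mkdir(exist_ok=True)  # like: mkdir pies_class_2025
    target = class_dir / Path(source_dir).name
    # like: cp -r <source_dir> . (run from inside the class folder)
    shutil.copytree(source_dir, target, dirs_exist_ok=True)
    return class_dir

# Stand-in directories so the example does not touch real data
with tempfile.TemporaryDirectory() as tmp:
    fake_source = Path(tmp) / "BTEP" / "pies_data"
    fake_source.mkdir(parents=True)
    (fake_source / "example.txt").write_text("placeholder\n")
    fake_data = Path(tmp) / "user"
    fake_data.mkdir()
    class_dir = prepare_class_dir(fake_data, fake_source)
    copied = sorted(p.name for p in (class_dir / "pies_data").iterdir())
    print(copied)
```

Using exist_ok and dirs_exist_ok makes the function safe to run more than once, which the plain mkdir command is not.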
Spin up Jupyter Lab in HPC OnDemand.
- Open a web browser on the local computer (Google Chrome is recommended) and go to https://hpcondemand.nih.gov/, which is the URL for Biowulf's HPC OnDemand.
- Once at HPC OnDemand, sign in with the participant's NIH credentials.
- After signing in, users will see quick links to applications available through HPC OnDemand. Click on the one for Jupyter.
- The subsequent page allows users to specify compute resources. Leave these as is for this class.
- Make sure to specify that Jupyter should start in the /data/$USER/pies_class_2025 directory.
Click on "Connect to Jupyter" when the Jupyter Lab session has been granted.
Users will see the Jupyter Lab interface. The left-hand panel is the file explorer, where users can navigate through the files and folders available in the directory in which Jupyter Lab was started. The launcher panel contains quick links for initiating a Jupyter Notebook in the user's language of choice.
Create a new Jupyter Notebook
Create a new Jupyter Notebook in Python 3.12 (click on the "python/3.12" tile). The new notebook has the name "Untitled.ipynb". Click on the disk icon in the notebook menu bar to rename it pies_class_2025.
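A quick first cell can confirm which Python the notebook is running. The sketch below only reads information from the standard sys module. The exact version and path printed will depend on the kernel that was selected.

```python
import sys

# version_info is a tuple-like object: (major, minor, micro, ...)
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")
print(sys.executable)  # path of the interpreter behind this notebook
```

If the reported version is not the one expected, the kernel can be changed from the notebook's kernel menu.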
Tip
For a detailed overview of Jupyter Lab, see BTEP's Documenting Analysis Steps using Jupyter Lab
Python Command Syntax
Arguments and options for Python commands are enclosed in parentheses. In general, the anatomy is command(argument, option).
For example, the command below is print, and it will display the argument, "Hello BTEP".
print("Hello BTEP")
Hello BTEP
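To illustrate the difference between arguments and options, the sketch below passes two arguments to print and then changes its sep and end options, which control the text placed between arguments and after the last one.

```python
first = "Hello"
second = "BTEP"

# Two arguments, default options: separated by a space, followed by a newline
print(first, second)            # Hello BTEP

# sep changes what is placed between the arguments
print(first, second, sep="-")   # Hello-BTEP

# end changes what is written after the last argument,
# so the next print continues on the same line
print(first, end=" ")
print(second)                   # Hello BTEP
```

Options such as sep and end are passed by name, which is why they can be given in any order after the arguments.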
To get help for a Python command, use help.
For instance:
help(print)
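help works on other objects as well, not only print. Much of the text it displays comes from the object's docstring, which can also be retrieved as a plain string. Below is a short sketch using the built-in round function and a string method, chosen only as examples.

```python
import inspect

# In an interactive session, help(round) displays this documentation.
# inspect.getdoc returns the cleaned-up docstring as a plain string instead.
doc = inspect.getdoc(round)
print(doc)

# Any object with a docstring works the same way, including methods of a value
upper_doc = inspect.getdoc("text".upper)
print(upper_doc)
```

Having the documentation as a string is handy in notebooks, where it can be searched or printed alongside results.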
The help information for print shows that \n represents a line break. Try the following to print three sentences, one per line.
print("University of Florida is in Gainesville, Florida.\n"
"Their mascot is the Gators.\n"
"The Gators men's basketball team won the national championship in 2025, 2007, and 2006.")
University of Florida is in Gainesville, Florida.
Their mascot is the Gators.
The Gators men's basketball team won the national championship in 2025, 2007, and 2006.
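The print call above relies on a Python feature that is easy to miss: string literals written next to each other, with only whitespace between them, are joined into a single string. The sketch below shows this along with two alternative ways to produce the same three-line text, using short placeholder sentences.

```python
# Adjacent string literals are concatenated into one string
joined = ("Line one.\n"
          "Line two.\n"
          "Line three.")

# The same text from three separate arguments, with sep set to a newline
parts = ["Line one.", "Line two.", "Line three."]
print(*parts, sep="\n")

# A triple-quoted string keeps the line breaks typed in the source
block = """Line one.
Line two.
Line three."""

print(joined == block)  # True
```

Which form to use is mostly a matter of readability. Adjacent literals keep long sentences within a comfortable line width, while triple quotes suit longer blocks of text.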
Installing external packages
Python external packages are found at the Python Package Index (PyPI). To install a package from PyPI, use pip install package_name, where package_name can be any package of choice. For instance, to install scipy, run:
pip install scipy
pip is the package installer for Python. If pip is not available with the user's Python installation, see https://pip.pypa.io/en/stable/installation/ to learn how to get it.
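After installing a package, it can be useful to confirm from within Python that it is importable and to see which version was installed. The sketch below uses only the standard library. The helper name describe_package is made up for this example, and scipy appears simply because it is the package installed above.

```python
# Illustrative sketch: describe_package is a hypothetical helper name.
from importlib import metadata
from importlib.util import find_spec

def describe_package(name):
    """Return a short status string for an importable module or installed package."""
    if find_spec(name) is None:
        return f"{name}: not installed"
    try:
        return f"{name}: version {metadata.version(name)}"
    except metadata.PackageNotFoundError:
        # Importable, but not installed as a distribution (e.g. standard library)
        return f"{name}: available (no package metadata)"

print(describe_package("scipy"))
print(describe_package("json"))
```

Note that the name used with import and the name used with pip are not always the same, so this check is only a first approximation for packages where the two differ.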
To uninstall a package, use pip uninstall package_name.
To update a package, use pip install --upgrade package_name.
pip freeze prints a list of the currently installed packages and their versions.
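A similar listing can be produced from inside Python with the standard importlib.metadata module. This is a sketch of the idea rather than a replacement for pip freeze, whose output can include extra details such as editable installs.

```python
from importlib import metadata

# Collect "name==version" strings for every distribution Python can find
installed = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
    if dist.metadata["Name"]  # skip entries with missing metadata
)

for line in installed[:5]:  # show only the first few
    print(line)
```

Recording such a list alongside an analysis helps others recreate the same software environment.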
Those who choose to use the package manager Anaconda can install packages from the command line using conda install package_name. Again, package_name is the user's package of choice. Package managers offer the benefit of reducing issues that arise from versioning, dependencies, and security when installing software. See https://docs.conda.io/projects/conda/en/stable/user-guide/tasks/manage-pkgs.html to learn more about installing, updating, and uninstalling packages using Conda. For working locally on a government-furnished personal computer, researchers are encouraged to use the NIH Anaconda Professional License. Biowulf also has a guide on managing Anaconda environments on the cluster. See https://hpc.nih.gov/docs/diy_installation/conda.html.