Course Overview

The BTEP Bioinformatics Summer Series is five classes that introduce novices and new NIH scientists to the essentials for getting started with bioinformatics analysis of next-generation sequencing (NGS) data. Topics include a broad survey of bioinformatics resources available to all scientists at NIH, an overview of the NIH High Performance Computing System (Biowulf), a comparison of the programming languages R and Python, and tips for making data FAIR (findable, accessible, interoperable, and reusable) and data analyses reproducible. These classes are open to the NIH audience only.

August 7, 2025 (Thursday, 1 PM – 2 PM):
- Introduction to Bioinformatics Resources will inform participants of software (commercial and open-source), self-learning tools, and resources for bioinformatics and data science.
August 14, 2025 (Thursday, 1 PM – 2 PM):
- Introduction to Unix and Biowulf will serve as a crash course for using the NIH Unix-based High Performance Computing system (Biowulf). Participants will learn to navigate through directories, work with files, and use bioinformatics applications that are installed on the system.
August 21, 2025 (Thursday, 1 PM – 2 PM):
- Overview of R and Python will discuss the benefits that these two popular programming languages can bring to a bioinformatics project. After this session, participants should be able to decide which language to use for a given data analysis.
September 4, 2025 (Thursday, 1 PM – 2 PM):
- Managing Data Analysis Projects using Jupyter Lab will introduce participants to Jupyter Lab, a tool for keeping data, code, output, and descriptions of analyses in one place, which facilitates transparent and reproducible data analysis.

Course Expectations / Learning Objectives

This course will make participants aware of the bioinformatics training resources and software available at NIH. Participants will leave these classes with the knowledge and confidence needed to pursue bioinformatics and continual self-learning. In sum, after this course series, participants will:

Be able to describe the training resources and software available for bioinformatics at NIH.
Know the rationale for, and the basics of, working on Unix-based high performance computing platforms.
Know when to use Python or R in a bioinformatics project.
Understand best practices for managing and organizing data.
Be familiar with the rationale and tools used to make data analysis reproducible.

Required Course Materials

Participants do not need bioinformatics experience to attend, and there are no required course materials. However, the following may help participants follow along.

Biowulf account. An account is helpful for following along in the Unix and Biowulf class. Biowulf will also provide a limited number (40) of student/temporary accounts for participants who would like one. Student/temporary accounts are meant for use during teaching activities and must be approved and supplied by Biowulf.
R, Python, and/or Jupyter Lab installed on the participant's government-furnished computer. Reach out to your institutional computing help desk to request software installation. NCI CCR affiliates can submit a ticket at service.cancer.gov for software installation.
To install R, see https://cran.r-project.org/
To install Python, see https://www.python.org/downloads/
To install Jupyter Lab, see https://jupyter.org/install
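
Once the software has been installed, participants may want to confirm that it can be found from the command line before class. The following is a minimal sketch, not an official BTEP or Biowulf check. It assumes the executables are named R, python3, and jupyter, which may differ by operating system or installation method (on Windows, for example, Python is often installed as python). The function name find_tools is hypothetical.

```python
import shutil
import sys

# Assumed executable names; adjust these if your installation uses different ones.
TOOLS = {"R": "R", "Python": "python3", "Jupyter": "jupyter"}


def find_tools(tools=TOOLS):
    """Map each tool label to the path of its executable, or None if it is not on PATH."""
    return {label: shutil.which(exe) for label, exe in tools.items()}


if __name__ == "__main__":
    # Report the version of the Python interpreter running this script.
    print(f"Running Python {sys.version.split()[0]}")
    for label, path in find_tools().items():
        print(f"{label}: {path or 'not found on PATH'}")
```

A tool reported as "not found on PATH" may still be installed under a different name or location, so treat the result as a prompt to check with the help desk, not as proof that installation failed.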