Introducing Bioinformatics for Beginners

December 2, 2024

What is bioinformatics?

Bioinformatics integrates biology, statistics, and computer science to develop and apply theory, methods, and tools for the collection, storage, and analysis of biological and related data. Bioinformatics plays a critical role in cancer research, including research on the origin, evolution, progression, and treatment of cancer.

Why learn bioinformatics?

Bioinformatics gives you the power to analyze your own data.
Learning bioinformatics will enhance other scientific skills. For example, by understanding the principles involved with data generation and analysis, you’ll be better equipped to design robust experiments and interpret their results effectively.
Some knowledge of bioinformatics can open doors to exciting career opportunities.
A solid foundation in bioinformatics will allow you to gain a deeper appreciation for how others analyze biological data, fostering collaboration and critical thinking.

What is Bioinformatics for Beginners?

The mission of the Bioinformatics Training and Education Program (BTEP) is to enable scientists to understand and analyze their own experimental data. BTEP does this year-round by providing instruction and training in bioinformatics methods, tools, databases, and emerging technologies. We also offer a multi-lesson course, held once or twice a year, devoted to teaching introductory bioinformatics: Bioinformatics for Beginners.

Bioinformatics for Beginners was designed to teach the basic skills needed for bioinformatics in the context of next generation sequencing experiments:

Working on the command line
Working with a high-performance computing (HPC) cluster
Introduction to key bioinformatics tools and software suites
Getting started with workflows and pipelines
Quality control analysis
Sequence alignment and mapping
Statistical analysis and visualization (Intro to R Programming)
Functional enrichment analysis or pathway analysis

This course teaches these skills by focusing on a single workflow: bulk RNA-Seq. All steps needed to complete an RNA-Seq workflow, from raw data to differential expression and gene ontology analysis, will be covered. Many of the skills learned are foundational to most bioinformatics analyses and can be applied to the analysis of other types of next generation sequencing experiments.

How will the course be organized?

This course will be divided into three modules, each with several lessons. Module 1 will focus on Unix and Biowulf. Specifically, lessons will focus on developing command line skills, getting started with and working on Biowulf (the NIH HPC cluster), and downloading and working with data from publicly available databases (e.g., SRA, GEO). Module 2 will focus on the primary steps involved in RNA-Seq analysis, including quality control analysis, adapter trimming, alignment-based methods, classification-based methods, feature counts, and differential expression analysis. Lastly, Module 3 will focus on gene ontology and pathway analysis.

Who can take this course?

There are no prerequisites for this course. It is open to NIH researchers interested in learning bioinformatics skills, especially those relevant to analyzing bulk RNA sequencing data. Lessons will be taught on Biowulf, the NIH HPC cluster. Learners are encouraged to obtain their own Biowulf account, though a limited number of student accounts will be available.

When does the course begin?

The course will begin with Module 1 in January 2025. Researchers with the Center for Cancer Research will receive a registration announcement toward the end of December / beginning of January. Please email ncibtep@nih.gov if you have any comments, questions, or concerns.

Bioinformatics Training and Education Program