Supported by CCR Office of Science and Technology Resources (OSTR)
ncibtep@nih.gov

Bioinformatics Training and Education Program

Featured

Upcoming Classes & Events

August

No scheduled events

September

Organized by
BTEP
Description

Jupyter Lab is a platform for organizing code and analysis steps in one place. It lets users keep track of every step taken in an analysis, which facilitates collaboration and research presentation. This class is a demo and not hands-on. Participants will learn how to access Jupyter Lab and the steps involved in producing reproducible analysis reports with it. No prior experience with or installation of Jupyter Lab is needed to participate. Attendance is restricted to NIH staff. The meeting link will be provided upon approval of registration. This session is part of the BTEP Introduction to Bioinformatics Summer Series.

 

Registration: https://cbiit.webex.com/weblink/register/r815ad6bb5bed14200c89b48c92134b82 

Organized by
NHLBI
Description

Join the National Heart, Lung, and Blood Institute (NHLBI) for a hybrid workshop to explore how medicine will be transformed by the current artificial intelligence (AI) revolution. Participants will engage with leading experts to learn the current state and visionary outlook of the field and to identify research gaps and opportunities in AI, focusing on clinical decision support.

The primary aim of this workshop is to explore how AI can be used to aid in diagnosing and treating heart, lung, blood, and sleep (HLBS) disorders. AI is a computer science field focused on creating systems that perform tasks requiring human intelligence. These tasks include learning, reasoning, problem-solving, perception, language understanding, and interaction. AI includes subfields like machine learning, natural language processing, robotics, and computer vision. In biomedical research and health care, AI analyzes complex datasets, enhances diagnostic accuracy, personalizes treatment plans, and improves healthcare delivery.

Additionally, the workshop aligns with the broader mission of NHLBI to promote the prevention and treatment of heart, lung, and blood diseases, and enhance the health of all individuals so that they can live longer and more fulfilling lives.

Organized by
NIH Research Festival
Description

https://researchfestival.nih.gov/2025-nih-research-festival

Organized by
BTEP
Description

This event will be in-person only, with no Webex/hybrid option. 

We will record the sessions and make them available in the BTEP Video Archive.

There is no registration necessary. 

Morning Session

(10 – 10:30 AM) Resources for Bioinformatics Data Analysis at NIH (Amy Stonelake, BTEP)

(10:30 – 11:00 AM) Supercharge your Data Analysis with Biowulf (Antonio Ulloa, CIT)

(11 – 11:30 AM) From Data to Discovery: Highlights from the Computational Genomics and Bioinformatics Branch (CGBB/CBIIT/NCI) (Daoud Meerzaman, CBIIT)

(11:30 – 12 noon) Gen AI Community of Practice Tools and Training (Nick Weber, CIT)

(12 noon – 1 PM) Bioinformatics Q & A and Lunch Break

Afternoon Session

(1 – 1:30 PM) GeneAgent: An LLM Powered Tool for Gene Set Analysis (Zhiyong Lu, NCBI/NLM)

(1:30 – 2 PM) AI in Clinical Pathology Image Analysis (Baris Turkbey, NCI/CCR/MIB)

(2 – 2:30 PM) SCassist: An AI-Powered Workflow Assistant for Single-Cell Analysis (Vijay Nagarajan, NEI)

(2:30 – 3 PM) Single Cell and Spatial Transcriptomics SIG (Stefan Cordes, NHLBI)

(3 – 3:30 PM) NIH Artificial Intelligence Interest Group (Ryan O’Neill, NHLBI)

(3:30 – 4 PM) Bioinformatics Academic Offerings at FAES (Morgan Merriman, FAES)

 

Organized by
Rare Disease Informatics SIG
Description

The Rare Disease Informatics SIG will host a hybrid workshop at the NIH Research Festival to discuss the current research landscape, key challenges, and potential solutions in rare disease informatics across NIH.

Agenda:

1. Opening Remarks (5 min)

2. Scientific Talks (20 min each session) 

  • Session A: AI applications in RD Research
  • Session B: Standards, Resources & Ontologies
  • Session C: LLMs and Generative AI

3. Roundtable Discussion (25 min)

Organized by
NCBI
Description

The National Center for Biotechnology Information (NCBI) is hosting a workshop at the NIH Research Festival 2025! This hands-on workshop, led by experts from the NIH Sequence Read Archive (SRA), will guide researchers through the latest tools and formats for accessing and working with large-scale genomic data.

Learning Objectives:

  • Explore cloud-optimized formats like SRA Lite
  • Learn how to efficiently search and retrieve data
  • Gain practical insights from real-world case studies

Skill level: This workshop is suitable for both new and experienced users. The session includes interactive exercises and strategies for sustainable data management.

 

Organized by
NCI
Description

Federated data analysis allows for collaboration and analysis across institutions without physically moving individual-level data to a central location, thus protecting sensitive data and maintaining data security. Join the NCI Cohort Consortium for an insightful webinar on federated data analysis methods and tools, featuring advances in Privacy-Preserving Data Analysis (PDA) and practical applications of Stata for registry data.

The session will include an overview of the PDA toolbox, real-world use cases, and guidance on when to use federated vs. metadata-based approaches. The session will also showcase how Stata can be leveraged to manage, harmonize, and analyze large-scale registry datasets, with practical examples and best practices in the context of epidemiologic research.

Organized by
NIH Library
Description

This one-hour online training will provide a high-level overview of Python coding concepts, as well as some of the integrated development environments (IDEs, such as Jupyter notebooks) used for Python coding. Python is a programming language widely used in data science, specifically for data analysis, statistical analysis, and visualization of results. The training will feature the following IDEs: Google Colaboratory’s Jupyter Notebook, and Anaconda’s Spyder, Jupyter Notebook, and JupyterLab. This overview training will demonstrate how these skills can boost productivity, rigor, and transparency in reporting research findings.

By the end of the training, attendees will be able to: 

  • Recognize four freely available IDEs for Python coding
  • Identify fundamental components of Python code
  • Understand how and why notebooks support rigor and transparency in analysis

Attendees are not expected to have any prior knowledge of Python coding or the IDEs to be successful in this training.

If you choose to follow along with Google Colab or Jupyter Notebooks, these IDEs should be installed and ready to go. Code will be provided during the training for this option.

Distinguished Speakers Seminar Series

Organized by
BTEP
Description

The ability to measure gene expression levels for individual cells (vs. pools of cells) and with spatial resolution is crucial to address many important biological and medical questions, such as the study of stem cell differentiation, the discovery of cellular subtypes in the brain, and cancer diagnosis and treatment. Single-cell transcriptome sequencing (RNA-Seq) allows the high-throughput measurement of gene expression levels for entire genomes at the resolution of single cells. Spatially-resolved transcriptomics further allows the measurement of gene expression levels along with the location of the RNA molecules within a tissue. Transcriptomics exemplifies the range of issues one encounters in a data science workflow, where the data are complex in a variety of ways, questions are not always clearly formulated, there are multiple analysis steps, and drawing on rigorous statistical principles and methods is essential to derive meaningful and reliable biological results. 

In this talk, Dr. Dudoit will provide a survey of statistical questions related to the analysis of single-cell transcriptome sequencing data to investigate the differentiation of stem cells in the brain, including exploratory data analysis, expression quantitation, cluster analysis, and the inference of cellular lineages. She will also address differential expression analysis in spatial transcriptomics.

Single Cell Seminar Series

Organized by
BTEP
Description
Over the past decade, the field of computational cell biology has undergone a transformation, from cataloging cell types to modeling how cells behave, interact, and respond to perturbations. In this talk, Dr. Theis will review and explore how machine learning is enabling this shift, focusing on two converging frontiers: integrated cellular mapping and actionable generative models.

He'll begin with a brief overview of recent advances in representation learning for atlas-scale integration, highlighting work across the Human Cell Atlas and beyond. These efforts aim to unify diverse single-cell and spatial modalities into shared manifolds of cellular identity and state. As one example, he will present his group's recent multimodal atlas of human brain organoids, which integrates transcriptomic variation across development and lab protocols.

From there, he'll review the emerging landscape of foundation models in single-cell genomics, including his group's work on Nicheformer, a transformer trained on millions of spatial and dissociated cells. These models offer generalizable embeddings for a range of tasks but, more importantly, they set the stage for predictive modeling of biological responses.

He'll close by introducing perturbation models that leverage generative AI to model interventions on these systems. As an example, he will show Cellflow, a generative framework that learns how perturbations such as drugs, cytokines, or gene edits shift cellular phenotypes. It enables virtual experimental design, including in silico protocol screening for brain organoid differentiation. This exemplifies a move toward models that not only interpret biological systems but help shape them.
Organized by
NIH Library
Description

The "Data Visualization in R" series focuses on using ggplot2 and the broader tidyverse ecosystem to create visualizations. Attendees will progress from foundational plotting techniques to advanced customization, learning to create multi-faceted displays and apply professional styling. The series emphasizes ggplot's flexibility and power within a tidy data workflow. By the end of the series, attendees will have a solid foundation in building effective visualizations using the tidyverse ecosystem.

This one-and-a-half-hour online training will explore perception and cognition and how they apply to data visualization. It will also teach you how to visualize your data using ggplot2. We will start by creating a simple scatterplot and use it to introduce aesthetic mappings and geometric objects, the fundamental building blocks of ggplot2. You must have taken the Introduction to R and RStudio training to be successful in this training.

By the end of this training, participants should be able to: 

  • Distinguish between aesthetic mappings and geometric objects, the fundamental building blocks of ggplot.
  • Create a simple scatterplot.
  • Create a plot and save it in a high-resolution format.

Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:

Organized by
NCI Cancer AI Conversations Series
Description

There are many challenges associated with moving cancer AI originally developed in a research setting into a clinical setting, including into clinical trials. During this event, participants will discuss the integration of AI in the clinic and in clinical trials for oncology.

Organized by
NIH Library
Description

This one-hour online training covers various aspects of sharing code using MATLAB community tools like File Exchange and GitHub. Well-documented methods and workflows enable reproducible research by helping scientists follow each other’s experimental logic and interpret results.

By the end of this training, attendees will be able to:

  • Share code with collaborators and the scientific community
  • Create notebook-style Live Scripts using MATLAB Live Editor
  • Leverage MATLAB Community Resources to make code, projects, and toolboxes available
  • Access MATLAB through the browser and share licenses with collaborators

This is an introductory-level training taught by MathWorks. No installation of MATLAB is necessary.

Organized by
NCI Office of Data Sharing
Description

Please use this link to access overview, registration, and other information:

https://events.cancer.gov/nci/ods-data-jamboree

Childhood cancer is a rare disease with ~15,000 cases diagnosed annually in the United States in individuals younger than 20 years. Despite extensive efforts made over the last two decades by programs such as the National Institutes of Health (NIH) Gabriella Miller Kids First Program and NCI's Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Childhood Cancer Data Initiative (CCDI) to generate, collect, and share the data, pediatric and adolescent and young adult (AYA) cancer datasets remain underutilized. Finding and accessing datasets, building specific pediatric cancer cohorts, and aggregating or linking datasets from various data systems still present tremendous challenges for the wider community. To overcome these barriers and raise awareness of existing childhood cancer data resources to inform better diagnosis and treatment options for children, this data jamboree will bring together researchers and citizen scientists with diverse expertise and experience to collaborate and explore scientific or other questions using childhood cancer data. The goals of the jamboree include:

  • Promoting access and reuse of pediatric cancer data and raising awareness about the availability of these datasets.
  • Promoting interdisciplinary collaborations to expand the size, technical, and scientific diversity of the pediatric cancer research community.
  • Promoting development of new methods and tools for data analysis.
  • Identifying gaps and limitations of existing data and resources, including barriers to real-time access to the data.

Organized by
NIH Library
Description

The "Data Visualization in R" series focuses on using ggplot and the broader tidyverse ecosystem to create insightful and customizable visualizations. It covers key principles of data visualization, from basic plots to advanced techniques, emphasizing the flexibility and power of ggplot within a tidy data workflow. By the end of the series, participants will be proficient in building plots using the tidyverse ecosystem.

This one-and-a-half-hour online training builds on the topics covered in the Data Visualization in ggplot training. It emphasizes advanced customization techniques in ggplot to create effective and clear visualizations. Participants will build on the foundational skills learned in Part 1 of the series and apply various customization options, such as faceting, labeling, themes, and color scales. You must have taken the Data Visualization in R: Introduction to ggplot: Part 1 of 2 training to be successful in this training.

By the end of this training, attendees should be able to:

  • Create a scatterplot in ggplot
  • Facet a plot
  • Demonstrate options for customizing the title and axis
  • Apply different ggplot themes

Attendees should have done the following before the training:

  • Installed R and RStudio.
  • Gained a basic understanding of R and RStudio.
  • Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist, if new to R.

October

No scheduled events