Supported by CCR Office of Science and Technology Resources (OSTR)
ncibtep@nih.gov

Bioinformatics Training and Education Program

Upcoming Classes & Events

January

No scheduled events

February

Description

Don’t miss this brand-new double feature: two essential AI courses in one! We will start with AI Done Right: Ethics and Privacy, which sets the stage for understanding the AI landscape and the rules of the road at NIH, and then you will start building your skills with Prompt Like a Pro: Getting the Most from AI with Effective Prompt Engineering.

AI Done Right: Ethics and Privacy
Just because you can use AI doesn’t mean you should. This session focuses on how to use Copilot and other AI tools responsibly at NIH: protecting privacy, avoiding sensitive data issues, and understanding boundaries. Learn what’s allowed, what’s not, and how to model ethical AI use.

Prompt Like a Pro: Getting the Most from AI with Effective Prompt Engineering
What’s the secret to great AI results? Great prompts. This hands-on class teaches you how to craft clear, specific, and effective instructions for Copilot and other AI tools. Practice with real-world examples and take away a toolkit of reusable prompt templates you can start using right away.

Organized by
NIH Library
Description

This one-hour online training provides researchers with an overview of online resources for locating research datasets, data repositories, and data publications for data sharing and re-use. Participants will learn search strategies for locating datasets through federated data search portals and generalist data repositories, including directories for locating discipline-specific and institutional data repositories. The training will also cover key issues to consider when re-using datasets or choosing a data repository for sharing and preservation.

By the end of this training, attendees will be able to:  

  • Locate different types of data repositories and datasets
  • Identify issues to consider with data repositories
  • Discuss how data repositories can improve reproducibility
  • Identify issues to consider when re-using datasets
  • Describe guidelines and resources for citing datasets

Attendees are not expected to have any prior knowledge of these resources to be successful in this training. 

Organized by
NIH Library
Description

This hour-and-a-half online training will examine how humans process and encode visual information and how visual attributes can be used to create effective visualizations. The session will focus on enhancing graphic literacy, exploring methods for making better visualizations, and using stakeholder needs to guide your design choices.

By the end of this training, attendees will be able to:

  • Analyze how different visual encodings affect the accuracy of data interpretation.
  • Use Gestalt principles and preattentive attributes to design visualizations that improve clarity, grouping, and rapid perception.
  • Evaluate the appropriateness of color scales.
  • Identify and correct common visualization pitfalls.
Organized by
ABCS/FNLCR
Description

In his seminal 2001 paper, the statistician Leo Breiman contrasted the inferential and predictive approaches to statistical analysis. I will begin this lecture with an overview of these two approaches and then illustrate the difference between them using examples from bioinformatics and biomedical diagnostics. Next, I will examine their similarities and differences in more depth, using regression as the main example. I will emphasize that, although some statistical models can be both predictive and inferential, the recommended methodological approach is to choose the analysis goal (inference or prediction) in advance and then plan the data collection and analysis accordingly. I will conclude by making the point that both statistical approaches, inferential and predictive, are useful tools in biological data analysis. Beginner-level knowledge of statistics is required; intermediate or advanced knowledge is preferred.
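The contrast described in this abstract can be made concrete with a small regression. The Python sketch below is not from the lecture: it simulates data (an assumption), fits ordinary least squares with NumPy, and reads the same model two ways. The inferential reading looks at coefficients, standard errors, and t-statistics; the predictive reading looks only at mean squared error on a held-out split. The helper names `fit_ols` and `predict` and the 350/150 split are illustrative choices.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns coefficients and standard errors."""
    X1 = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    dof = X1.shape[0] - X1.shape[1]                   # residual degrees of freedom
    sigma2 = resid @ resid / dof                      # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X1.T @ X1)           # classical OLS covariance matrix
    return beta, np.sqrt(np.diag(cov))

def predict(beta, X):
    return beta[0] + X @ beta[1:]

# Hypothetical simulated data: y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] + 0.0 * X[:, 1] + rng.normal(scale=1.0, size=n)

# Inferential reading: estimate effects on all data and quantify their uncertainty.
beta, se = fit_ols(X, y)
t_stats = beta / se

# Predictive reading: fit on a training split, judge only by error on unseen data.
idx = rng.permutation(n)
train, test = idx[:350], idx[350:]
beta_tr, _ = fit_ols(X[train], y[train])
test_mse = np.mean((y[test] - predict(beta_tr, X[test])) ** 2)

print("coefficients:", np.round(beta, 2))
print("t-statistics:", np.round(t_stats, 1))
print("held-out MSE:", round(float(test_mse), 2))
```

The same fitted model serves both readings here, but they ask different questions: whether the x2 effect is distinguishable from zero versus how well the fit generalizes. That difference is consistent with the lecture's advice to fix the goal before planning data collection.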

Organized by
NIH Library
Description

This 90-minute online training is Part 1 of an introductory two-part series for those who want to learn about research data management and sharing or who want a refresher. The series provides detailed information on managing and sharing data from the initial data planning stage, through the data life cycle, to data archiving, and finally to selecting an appropriate repository for data preservation.

By the end of Part 1 of this training series, attendees will:

  • Understand data management best practices

  • Be familiar with data management tools

  • Have a solid knowledge of the resources that enable data sharing

During Part 2, attendees will learn about sharing and archiving data. You must register separately for Part 2 of this training. This training is introductory; no prior knowledge is required.

Organized by
NIH GREI
Description

Streamlining Data Sharing: Practical Tools and Researcher Stories from the NIH GREI

Webinar 2

This session will help you make your dataset more FAIR (Findable, Accessible, Interoperable, and Reusable). We will walk through GREI's Data Submission Checklist (https://doi.org/10.5281/zenodo.14278906), which provides practical guidance for planning, preparing, and publishing your data in a generalist repository. The session will also highlight stories from NIH-funded researchers that show how sharing and reusing data through GREI repositories can increase research impact and visibility.

Organized by
NIH Library
Description

This 90-minute online training is Part 2 of an introductory two-part series for those who want to learn about research data management and sharing or who want a refresher. The series provides detailed information on managing and sharing data from the initial data planning stage, through the data life cycle, to data archiving, and finally to selecting an appropriate repository for data preservation.

By the end of Part 2 of this training series, attendees will:

  • Have a solid knowledge of the resources that enable data sharing
  • Understand how data is archived and preserved

Part 1 of this training covers understanding research data, managing research data, and working with data. During Part 2, attendees learn about sharing and archiving data. This training is introductory; no prior knowledge is required.
Organized by
NLM
Description

The NLM Division of Intramural Research (DIR) is pleased to welcome Tiarnán Keenan, MD, PhD, Stadtman Tenure-Track Investigator and Director of the Medical Retina Fellowship Program at the National Eye Institute to present his lecture entitled "Diverse Applications of Computational Research and Artificial Intelligence in Ophthalmology".

Ophthalmology is ideally positioned to benefit from recent advances in computational data science and artificial intelligence. As a highly image-based specialty, it offers non-invasive, high-resolution views of the microvascular circulation and the central nervous system, creating rich opportunities for computational analysis with direct clinical relevance. This seminar will present diverse applications of advanced biostatistics, computational research, and machine learning techniques in ophthalmology, with a focus on age-related macular degeneration, the leading cause of blindness in industrialized countries, and cataract, the leading cause of blindness worldwide. Topics will include automated disease detection, quantitative severity classification, and prognostic prediction of disease progression from retinal imaging data, with and without the integration of genetic information. Methodological themes will span deep feature extraction, label transfer, and multi-modal, multi-task learning frameworks.

The NLM Colloquia on Biomedical Data Science and Computational Biology Research is a series of scientific lectures featuring experts from across the bioinformatics community who present their research and discuss how it contributes to advancing biomedical discovery. This series is presented by NLM’s DIR, a premier hub of innovation for computational biology and biomedical data science.

Organized by
NIH Library
Description

In partnership with the NIH Clinical Center's Biostatistics and Clinical Epidemiology Service (BCES), the NIH Library is offering several trainings that cover general concepts behind statistics and epidemiology. These trainings will help participants better understand and prepare data, interpret results and findings, design and prepare studies, and understand the results in published literature.

This four-hour online training will address fundamental statistical concepts including hypothesis testing, p-values and confidence intervals, types of data and their distributional importance, and bias and confounding. Time will be devoted to questions from attendees and references will be provided for in-depth self-study.    

By the end of this training, attendees will be able to:  

  • Describe key concepts in statistical procedures

  • Understand the steps involved in hypothesis testing 

  • Define p-values and be familiar with their appropriate uses 

  • Describe confidence intervals and their uses

  • Understand differences in types of data and how to summarize them 

  • Describe bias and confounding
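As a rough illustration of the hypothesis-testing steps, p-values, and confidence intervals listed above, the Python sketch below compares two groups of simulated measurements (hypothetical data). To stay within the standard library, it uses a large-sample normal (z) approximation rather than a t-test; for small samples, a t-based test such as `scipy.stats.ttest_ind(a, b, equal_var=False)` is the more usual choice. The function name `two_sample_z_test` is illustrative.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def two_sample_z_test(a, b, alpha=0.05):
    """Large-sample test of H0: mean(a) == mean(b), plus a (1 - alpha) CI for the difference."""
    diff = fmean(a) - fmean(b)                                        # observed effect
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))   # unpooled standard error
    z = diff / se                                                     # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))                      # two-sided p-value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)                      # about 1.96 for alpha = 0.05
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, z, p_value, ci

# Hypothetical data: two groups of 200 whose true means differ by 0.5.
rng = random.Random(42)
group_a = [rng.gauss(10.5, 2.0) for _ in range(200)]
group_b = [rng.gauss(10.0, 2.0) for _ in range(200)]

diff, z, p, ci = two_sample_z_test(group_a, group_b)
print(f"difference = {diff:.2f}, z = {z:.2f}, p = {p:.4f}")
print(f"95% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
print("reject H0 at 0.05" if p < 0.05 else "fail to reject H0 at 0.05")
```

Because the test and the interval share the same standard error and critical value, they agree by construction: the 95% interval excludes 0 exactly when the two-sided p-value is below 0.05.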

Organized by
BTEP
Description

This session introduces NIH STRIDES and Cloud Lab resources for bioinformatics and generative AI workflows in the cloud. Participants will tour key tutorial libraries, including the STRIDES Cloud Lab GitHub repositories (AWS, GCP, and Azure notebooks) and the NIGMS GitHub. The session concludes with a live cloud demonstration that builds a grounded chatbot using a Snakemake-based datastore and shows how its responses draw on the indexed sources.

Organized by
NIH Library
Description

This one-hour online training will cover the fundamentals, applications, and ethical considerations of Artificial Intelligence (AI). Attendees will explore key topics such as machine learning, deep learning, data handling, and real-world AI applications across various industries. The session will also delve into the ethical implications of AI and provide insights on becoming AI literate. Whether you're a seasoned professional or just starting your AI journey, this session will equip you with essential knowledge to navigate the AI landscape effectively and make informed decisions in our data-driven world.

By the end of this training, attendees will be able to: 

  • Understand the core concepts of AI 
  • Recognize the significance of ethical considerations in AI 
  • Begin the journey toward AI literacy

Attendees are not expected to have any prior knowledge of AI to be successful in this training. 

Organized by
NIH Library
Description

This 90-minute online training equips participants with powerful data wrangling techniques using R and the tidyverse, a cohesive ecosystem of R packages designed to make data science workflows more intuitive and efficient through consistent syntax and design principles. Designed for both beginners and those looking to refine their skills, this training addresses the challenges posed by messy datasets.

By the end of this training, attendees will be able to:

  • Diagnose and address common data quality issues in clinical datasets.
  • Apply systematic approaches to clean and standardize text, dates, and numerical values.
  • Transform messy data and handle missing values using tidyverse functions, including appropriate imputation strategies.
  • Design reproducible, automated data-cleaning workflows with tidyverse tools for transformation and aggregation.

Requirements 

Attendees are expected to have a basic understanding of R and RStudio. Before the training, attendees should have:

  • Installed R and RStudio.
  • Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist, if they are new to R.

Organized by
NIH GREI
Description

Streamlining Data Sharing: Practical Tools and Researcher Stories from the NIH GREI

A clear and comprehensive Data Management and Sharing (DMS) Plan is essential for meeting NIH policy requirements. This session introduces GREI’s guide to help you incorporate generalist repositories into your DMS Plan (https://doi.org/10.5281/zenodo.14278957), offering recommended language and concrete examples. Learn how to write a stronger, more compliant plan and hear stories from researchers benefiting from sharing data via GREI repositories.

Organized by
NIH Library
Description

This 90-minute online training will equip attendees with essential knowledge and skills for effective interactions with Large Language Model (LLM) AI chatbots. Explore the intricacies of prompt engineering and its pivotal role in optimizing the conversational capabilities of LLMs. Emphasizing best practices and practical applications, this training features live demonstrations and provides valuable skills for the effective use of LLMs.

By the end of this training, attendees will be able to:  

  • Define LLMs, prompt patterns, and prompt engineering
  • Identify potential uses and issues to consider when using LLMs in the biomedical research field
  • Use a selection of prompt patterns to improve generated output from LLMs
  • Identify resources for learning more about prompt engineering in LLMs 

Attendees are not expected to have any prior knowledge of AI chatbots to be successful in this training.

March

Organized by
NIH Library
Description

This 45-minute online Lunch and Learn training will help attendees develop their own customized strategy for responsibly incorporating generative artificial intelligence (AI) tools, such as ChatGPT, into their workflows. 

By the end of this training, attendees will be able to: 

  • Assess appropriate use cases for generative AI tools within their specific research/work context 

  • Develop a customized generative AI usage strategy 

  • Document their approach for using generative AI tools 

Attendees are not expected to have any prior knowledge of generative AI tools to be successful in this training. 

Organized by
NIH Library
Description

This one-hour online training, the first of a two-part series, introduces participants to cleaning and exploring a patient health dataset using Python and pandas. Attendees will load tabular data, inspect its structure and data types, summarize columns, and identify common data quality problems such as missing values, inconsistent formats, and duplicate records. They will then apply practical fixes, including standardizing height and weight units, parsing and normalizing dates of birth, splitting combined fields, and using Boolean masks to flag or correct implausible values.

By the end of this session, attendees will be able to:

  • Import CSV data into pandas DataFrames and quickly understand column types, basic statistics, and overall data quality.
  • Identify duplicate or repeated patient records and decide whether to keep, correct, or remove them.
  • Detect and handle missing or inconsistent values using methods such as isna, fillna, filtering, and conditional replacement.
  • Standardize mixed formats (for example, heights with and without units, date strings in different formats, and numeric values embedded in text).
  • Create derived columns such as systolic and diastolic blood pressure, and use logical conditions to flag questionable or out-of-range values.
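The cleaning steps described for Part 1 might look roughly like the following pandas sketch. It is not the course script: the column names, the small inline DataFrame standing in for the course CSV, the two accepted date formats, the median imputation, and the plausibility thresholds are all hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the course's patient CSV; in practice: df = pd.read_csv("patients.csv")
df = pd.DataFrame({
    "patient_id": [1, 2, 2, 3, 4],
    "height": ["170 cm", "1.82 m", "1.82 m", "65 in", "17 cm"],
    "weight_kg": [68.0, None, None, 55.5, 80.2],
    "dob": ["1980-04-12", "03/15/1975", "03/15/1975", "1990-11-02", "not recorded"],
    "blood_pressure": ["120/80", "135/88", "135/88", "118/76", "250/95"],
})

# 1. Inspect structure and count missing values per column.
df.info()
print(df.isna().sum())

# 2. Remove exact duplicate patient records.
df = df.drop_duplicates().reset_index(drop=True)

# 3. Standardize heights to centimeters: extract the number and the unit, then convert.
parts = df["height"].str.extract(r"([\d.]+)\s*(cm|m|in)")
to_cm = {"cm": 1.0, "m": 100.0, "in": 2.54}
df["height_cm"] = parts[0].astype(float) * parts[1].map(to_cm)

# 4. Parse dates of birth written in two formats; anything else becomes NaT.
dob_iso = pd.to_datetime(df["dob"], format="%Y-%m-%d", errors="coerce")
dob_us = pd.to_datetime(df["dob"], format="%m/%d/%Y", errors="coerce")
df["dob"] = dob_iso.fillna(dob_us)

# 5. Split "systolic/diastolic" text into two numeric columns.
bp = df["blood_pressure"].str.split("/", expand=True)
df["systolic"] = pd.to_numeric(bp[0], errors="coerce")
df["diastolic"] = pd.to_numeric(bp[1], errors="coerce")

# 6. Fill missing weight with the column median (one possible imputation choice).
df["weight_kg"] = df["weight_kg"].fillna(df["weight_kg"].median())

# 7. Boolean masks flag implausible values for review instead of silently dropping them.
df["flag_height"] = ~df["height_cm"].between(100, 230)
df["flag_bp"] = (df["systolic"] > 220) | (df["diastolic"] > 140)
print(df)
```

Flagging rather than deleting keeps questionable rows visible, so you can decide later whether to correct, impute, or drop them.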

Attendees are expected to have:

  • Basic Python coding knowledge
  • Familiarity with an IDE (e.g., Colab or Jupyter Notebooks) and with loading script and data files into it

Requirements: 

  • Participants will receive a script file and data files prior to the training. These should be loaded and ready to use before the training session begins. 

You can register for Part 2 in this series via the link below: 

https://www.nihlibrary.nih.gov/training/introduction-data-wrangling-using-python-part-2-2

Organized by
NIH Library
Description

This one-hour online training, the second session of the two-part series, focuses on reshaping and enriching the cleaned patient dataset to prepare it for analysis and reporting. Attendees will practice splitting and recombining columns (for example, separating full names into first and last names), converting columns to appropriate data types, and engineering new fields such as outlier indicators and blood pressure status labels. The session also covers merging multiple tables (patient details, contact information, and subsets of records) and filtering or subsetting data to answer specific analytical questions.

By the end of this training, attendees will be able to:

  • Reshape and restructure data by splitting and combining columns, changing data types, and reordering or selecting relevant fields.
  • Engineer clinically useful features, including z-score–based outlier flags, hypertension indicators, and combined status columns for downstream models or dashboards.
  • Merge and join DataFrames using common keys (such as patient ID) to bring together core data with supplemental tables like contact information.
  • Filter and subset records based on multiple conditions (for example, patients with diabetes and abnormal blood pressure) to create analysis-ready datasets.
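A compact pandas sketch of the Part 2 operations (splitting a column, type conversion, a z-score outlier flag, a hypertension indicator, a merge on patient ID, and multi-condition filtering) follows. The tables, column names, the 140/90 mmHg threshold, and the z-score cutoff are illustrative assumptions, not the course materials or clinical guidance.

```python
import pandas as pd

# Hypothetical cleaned patient table and a separate contact table.
patients = pd.DataFrame({
    "patient_id": [1, 2, 3, 4, 5],
    "full_name": ["Ana Lopez", "Ben Okafor", "Chen Wei", "Dana Smith", "Eli Park"],
    "glucose": ["95", "180", "102", "88", "99"],       # stored as text; needs conversion
    "systolic": [118, 150, 124, 142, 119],
    "diastolic": [76, 95, 82, 85, 79],
    "diabetes": [False, True, False, True, False],
})
contacts = pd.DataFrame({
    "patient_id": [1, 2, 4, 5],
    "email": ["ana@example.org", "ben@example.org", "dana@example.org", "eli@example.org"],
})

# Split one column into two; n=1 splits only at the first space.
patients[["first_name", "last_name"]] = patients["full_name"].str.split(" ", n=1, expand=True)

# Convert text to numbers so arithmetic works.
patients["glucose"] = pd.to_numeric(patients["glucose"])

# z-score outlier flag: distance from the mean in standard deviations.
# Cutoff lowered to 1.5 for this 5-row demo; 2 or 3 is more common on real data.
z = (patients["glucose"] - patients["glucose"].mean()) / patients["glucose"].std()
patients["glucose_outlier"] = z.abs() > 1.5

# Hypertension indicator from a simple threshold rule (assumed for illustration).
patients["hypertension"] = (patients["systolic"] >= 140) | (patients["diastolic"] >= 90)

# Left join keeps every patient, even those without contact details.
merged = patients.merge(contacts, on="patient_id", how="left")

# Subset on multiple conditions: diabetes AND abnormal blood pressure.
subset = merged.loc[merged["diabetes"] & merged["hypertension"],
                    ["patient_id", "last_name", "systolic", "diastolic", "email"]]
print(subset)
```

The left join is a deliberate choice here: an inner join would silently drop patient 3, who has no contact record.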

Attendees are expected to have:

  • Attended Intro to Data Wrangling Using Python - Part 1 of the series
  • Basic Python coding knowledge
  • Familiarity with an IDE (e.g., Colab or Jupyter Notebooks) and with loading script and data files into it

Requirements: 

  • Participants will receive a script file and data files prior to the training. These should be loaded and ready to use before the training session begins. 

You can register for Part 1 in this series via the link below: 

https://www.nihlibrary.nih.gov/training/introduction-data-wrangling-using-python-part-1-2

Organized by
NIH Library
Description

This 45-minute online training provides a high-level overview of recent developments in artificial intelligence (AI). Each session highlights emerging trends, tools, and use cases in the evolving AI landscape, with an emphasis on practical relevance and responsible use. Whether you're just getting started or looking to stay current, this training offers timely insights in a concise format.  

By the end of this training, attendees will be able to:   

  • Summarize key trends and developments in AI 

  • Identify new tools, capabilities, or applications relevant to their work 

  • Describe considerations for ethical and responsible use of AI technologies 

Attendees are not expected to have any prior knowledge to be successful in this training. 

Distinguished Speakers Seminar Series

Organized by
BTEP
Description

In this talk, Dr. Carey will describe how Bioconductor approaches new challenges in supporting open method development and reproducible analyses in genomic data science. He will discuss aspects of the project that bear on education in cancer epidemiology and computational cancer genomics, and on emerging topics in software and data engineering for scalable omics analyses.

Organized by
NCI
Description
Overview

This 3-day virtual workshop will explore how foundation models, a powerful class of advanced AI models, can transform cancer research and clinical care. We will focus on their potential to improve diagnosis, prognosis, and treatment response, with a strong emphasis on clinical translation and technology development.

Key Topics:
  • Foundation Model Primer: A high-level introduction to foundation models.
  • Multimodal Data: Combining pathology, radiology, omics, and patient data into unified models.
  • Prediction: Predicting therapeutic response, resistance, and patient outcomes.
  • Validation and Reproducibility: Ensuring model results are consistent and reliable for real-world clinical performance and use.
  • Diagnostic Case Studies: Real-world applications for early detection and automated diagnostics.
  • Federated Learning: Approaches to training robust models across multiple institutions without sharing sensitive patient data.
  • Challenges, Risk, and Regulation: Addressing model interpretability and regulatory considerations for clinical adoption.

Agenda: https://events.cancer.gov/dctd/foundationmodel/agenda
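Of the workshop's key topics, federated learning is the most directly algorithmic. One common baseline, federated averaging (FedAvg), has each site run a few local training steps and send back only model parameters, which a coordinator averages, weighted by site sample size. The Python sketch below applies that idea to linear regression on three simulated "institutions". It is illustrative and not drawn from the workshop; the site sizes, learning rate, epoch count, and number of rounds are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([1.5, -2.0])

def make_site(n):
    """Simulated private dataset for one hypothetical institution."""
    X = rng.normal(size=(n, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

sites = [make_site(n) for n in (200, 50, 120)]

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few steps of full-batch gradient descent on mean squared error, run at the site."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

w_global = np.zeros(2)
sizes = np.array([len(y) for _, y in sites])
for _ in range(20):                                            # communication rounds
    local_ws = [local_update(w_global, X, y) for X, y in sites]  # raw data never leaves a site
    w_global = np.average(local_ws, axis=0, weights=sizes)       # size-weighted parameter average

print("federated estimate:", np.round(w_global, 3))
```

Only parameter vectors cross site boundaries in this sketch; real deployments often add protections such as secure aggregation or differential privacy, which are omitted here.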