Supported by CCR Office of Science and Technology Resources (OSTR)
ncibtep@nih.gov

Bioinformatics Training and Education Program

Featured

Upcoming Classes & Events

June

No scheduled events

July

Organized by
CBIIT
Description

Learn how to visualize sequencing and analysis results effectively.

This session describes how to use the web-based, interactive OmicCircos in R Shiny to construct circular plots with desired biological features. Example data from human and mouse genomes will be used to demonstrate over thirty plot functions, along with color selection, annotation, labeling, and zoom capabilities. A user guide, a take-home video, and sample plots from publications will be provided.

Organized by
NIH Library
Description

In this hour-and-a-half online training, attendees will learn how to call MATLAB from Python and how to call Python libraries from MATLAB. Attendees will use MATLAB’s Python integration to improve the compatibility and usability of their code.

By the end of this training, attendees will be able to:  

  • Call Python libraries  
  • Call user-defined Python commands, scripts, and modules  
  • Package MATLAB algorithms to be called from Python  

Attendees are expected to have some prior knowledge of Python libraries and/or MATLAB. This introductory training is taught by MathWorks. Installing MATLAB is not required.

Organized by
BTEP
Description

This lesson will introduce the tidyverse package, dplyr. Attendees will primarily learn how to filter rows and select columns from data frames.
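
For readers who have not yet worked with data frames, the two operations named above can be sketched in plain Python. This is only an analogy and not dplyr itself: the sample rows and the helper names `filter_rows` and `select_columns` are hypothetical and were made up for illustration.

```python
# Hypothetical sample data: each dict plays the role of one data-frame row.
rows = [
    {"gene": "TP53", "chrom": "chr17", "expression": 12.5},
    {"gene": "BRCA1", "chrom": "chr17", "expression": 3.1},
    {"gene": "MYC", "chrom": "chr8", "expression": 20.4},
]


def filter_rows(rows, predicate):
    """Keep only the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def select_columns(rows, columns):
    """Keep only the named columns in each row."""
    return [{col: row[col] for col in columns} for row in rows]


# Filter first (rows with expression above 10), then select two columns.
high = filter_rows(rows, lambda r: r["expression"] > 10)
result = select_columns(high, ["gene", "expression"])
print(result)
```

Filtering reduces the number of rows while selecting reduces the number of columns; the two steps are independent and can be chained in either order as long as the filter condition only uses columns that are still present.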

 

Organized by
NIH Library
Description

This 45-minute online training provides a high-level overview of recent developments in artificial intelligence (AI). Each session highlights emerging trends, tools, and use cases in the evolving AI landscape, with an emphasis on practical relevance and responsible use. Whether you're just getting started or looking to stay current, this training offers timely insights in a concise format.  

By the end of this training, attendees will be able to:   

  • Summarize key trends and developments in AI 
  • Identify new tools, capabilities, or applications relevant to their work 
  • Describe considerations for ethical and responsible use of AI technologies 

Attendees are not expected to have any prior knowledge to be successful in this training.

Organized by
NIH Library
Description

This hour-and-a-half online training will examine how humans process and encode visual information and how visual attributes can be used to create effective visualizations. The session will focus on enhancing graphic literacy, exploring methods for making better visualizations, and using stakeholder needs to guide your design choices.

By the end of this training, attendees will be able to:

  • Discuss the value of data visualization and key visualization goals
  • Provide an introduction to human perception and its role in visualization
  • Describe the principles of visual encoding
  • Provide an overview of core visualization techniques
  • Outline the steps for effectively presenting your visualizations to different audiences

Organized by
CBIIT
Description

Join Drs. Eytan Ruppin (presenter) and Timothy Shaw (moderator) as they present on four approaches for predicting how patients respond to checkpoint immunotherapy.

  • Approach #1: Predicting patient response from the tumor bulk transcriptome
  • Approach #2: Predicting response directly from the blood via simple routine lab tests and the tumor mutational burden
  • Approach #3: Predicting patient immunotherapy response from the tumor histopathological images
  • Approach #4: Building predictors of the tumor microenvironment and developing spatially grounded biomarkers of treatment response

Organized by
BTEP
Description

This lesson will introduce the "split-apply-combine" approach to data analysis and the key players in the dplyr package used to implement this type of workflow.  
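
As a rough illustration of what "split-apply-combine" means, here is a minimal sketch in plain Python rather than dplyr. The sample records and the function name `split_apply_combine` are hypothetical; the lesson itself uses dplyr, which is not shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample records, invented for this sketch.
samples = [
    {"tissue": "liver", "count": 10},
    {"tissue": "lung", "count": 4},
    {"tissue": "liver", "count": 14},
    {"tissue": "lung", "count": 8},
]


def split_apply_combine(rows, key, value, func):
    """Group rows by `key`, apply `func` to each group's `value` list,
    and combine the per-group results into one dict."""
    # Split: collect the values belonging to each group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Apply and combine: one summary value per group.
    return {group: func(values) for group, values in groups.items()}


mean_counts = split_apply_combine(samples, "tissue", "count", mean)
print(mean_counts)
```

The same three steps apply whatever the summary function is; passing `max` or `len` instead of `mean` would give the per-group maximum or the group sizes.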

Coding Club Seminar Series

Organized by
BTEP
Description
This session of the BTEP Coding Club will demonstrate the use of R programming to perform decision tree analysis, survival tree analysis, and random forest. This event complements a Statistics for Lunch event, "Decision Trees, Survival Trees, and Random Forest", organized by the Advanced Biomedical Computational Science group at the Frederick National Laboratory for Cancer Research. The Statistics for Lunch event will provide a theoretical introduction to these topics, while this coding club session will focus on practical implementation using R.
The session will cover the following:

1. Decision Tree Analysis
The decision tree analysis will use the “kyphosis” dataset to predict the absence or presence of kyphosis (a type of deformation) following corrective spinal surgery.
2. Survival Tree Analysis
The survival tree analysis uses the recurrence-free survival time from a prospective randomized clinical trial conducted by the German Breast Cancer Study Group.
3. Random Forest
Random forest will be applied to the German Credit Data set, which contains 20 variables for 1000 individuals, to determine whether they should or should not receive a loan of a given amount.

This class requires knowledge and experience with R programming.
Organized by
CDSL
Description

From image analysis to federated learning and multimodal modeling, generative AI can be used to study observational succession across scales and data types. At DCEG's Data Science and Engineering Research Group (DSERG), we are developing user-facing, FAIR, privacy-preserving infrastructure for cancer research based on numerical embeddings shared across, and between, data types. This presentation is configured as a show-and-tell activity with live applications ranging from digital pathology to real-time epidemiology trackers. Particular focus will be placed on showing where and how genAI modeling exposes a numeric representation of the underlying latent space.

Dr. Jonas Almeida leads a multidisciplinary program of data science and engineering research that combines systems biology, computational statistics, and software engineering for biomedical applications. The primary focus of his research is to accelerate the investigation of epidemiologic and genetic causes of cancer by developing innovative digital methods that advance the computational research infrastructure for precision prevention.

Organized by
BTEP
Description

This is the final lesson in the course Introductory R for Novices: Introduction to Data Wrangling. This lesson will show attendees how to join multiple data frames and transform and create new variables using dplyr.

Organized by
NIDDK
Description

NIDDK Biostats Seminar Series: From Research Study Design to Collecting, Managing, and Analyzing Data.

Learning Objectives

1. The learner should know the difference between observational studies, clinical trials (drug and non-drug studies), and secondary data (new data from stored samples, existing data) as defined for the NIH Clinical Center and how study development differs for each.

2. The learner should understand the development process, know the timeline, and know the resources available for successful protocol development.

3. The learner should understand the purpose and scope of ClinicalTrials.gov.

4. The learner should be able to identify and understand key data elements and each step of trial registration and reporting.

5. The learner should be able to understand the differences between a scientific hypothesis and a statistical hypothesis.

6. The learner should be able to translate scientific hypotheses into statistical design elements: study design, primary outcomes, statistical hypotheses, sample size calculation, and statistical analysis plan.

 

Tentative Webinar Outline:

2:30-3:00pm – Dr. Paige Studlack (Clinical Protocol Coordinator, NIDDK)

Research study types, timelines, and process for successful protocol development, IRB approval, and study initiation at the NIH, with particular emphasis on NIDDK resources and processes.

3:00-3:30pm – Dr. Elizabeth Wright (Mathematical Statistician, Biostatistics Program Office, NIDDK)

Understanding ClinicalTrials.gov elements and how they are used in trial registration and reporting for studies at the NIH.

3:30-4:00pm – Dr. Sungyoung Auh (Mathematical Statistician, Biostatistics Program Office, NIDDK)

Translating scientific questions to needed statistical design elements for research study planning, documentation, completion, and reporting.

Organized by
NCI
Description

Protecting private data when training AI models is a topic of increasing concern. During this event, participants will discuss the use of synthetic data for privacy-preserving AI.

Organized by
NIH Library
Description

This one-and-a-half-hour online training covers the basic principles of FAIR (Findable, Accessible, Interoperable, Reusable) data and why it is important to make your data FAIR. This is an introductory-level training.

By the end of this training, attendees will be able to:

  • Define FAIR data
  • Explain what purpose FAIR data serves
  • Apply FAIR data principles to make data findable, accessible, interoperable, and reusable

Organized by
NIH Library
Description

This one-hour training, provided by a presenter from SAS, will demonstrate tips and tricks to make your SAS code run more efficiently. There are at least six ways to do most things in SAS, so understanding some coding guidelines can help to guide efficient decisions. Attendees are expected to have some working experience with SAS 9.4 or to have attended an introductory SAS class, such as SAS® Programming 1: Essentials.  

By the end of this training, attendees will be able to:

  • Measure performance of SAS code
  • Describe how to create readable code
  • Discuss tips for basic coding recommendations and developing code

Organized by
NIH Library
Description

The "Data Visualization in R" series focuses on using ggplot and the broader tidyverse ecosystem to create insightful and customizable visualizations. It covers key principles of data visualization, from basic plots to advanced techniques, emphasizing the flexibility and power of ggplot within a tidy data workflow. By the end of the series, participants will be proficient in building plots using the tidyverse ecosystem. 

This hour-and-a-half online training will explore the topics of perception and cognition, and how these apply to data visualization. This training will also teach you how to visualize your data using ggplot2. We will start by creating a simple scatterplot and use that to introduce aesthetic mappings and geometric objects, the fundamental building blocks of ggplot2. You must have taken the Introduction to R and RStudio training to be successful in this training.

You can register for the other training in this series via the link below.

By the end of this training, participants should be able to: 

  • Describe how perception and cognition inform visualizations.
  • Distinguish between aesthetic mappings and geometric objects, the fundamental building blocks of ggplot.
  • Create a simple scatterplot.
  • Create a plot and save it in a high-resolution format.

Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:

  • Installed R and RStudio.
  • Gained a basic understanding of R and RStudio.
  • Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist, if you are new to R.

Organized by
NIDDK
Description

NIDDK Biostats Seminar Series: From Research Study Design to Collecting, Managing, and Analyzing Data.

Learning Objectives:

1. To delineate features of REDCap to support project management for research studies.

2. To outline steps to create detailed data collection plans which fulfill regulatory requirements.

3. To identify principled approaches to data collection and management.

4. To explain the connections between research rigor and reproducibility.

 

Outline:

2:30-3:00pm – Matthew Breymaier (Informatics Specialist, Office of the Clinical Director, NIDDK) and Sai Theja (Senior Data Analyst, Office of the Clinical Director, NIDDK)

REDCap – functionality and basics of setup, and how different types of studies can be designed in REDCap (longitudinal vs. cross-sectional, etc.), with emphasis on NIDDK REDCap.

3:00-4:00pm – Dr. Kenneth Wilkins (Mathematical Statistician, Biostatistics Program Office, NIDDK)

Document organization and access as part of study planning: regulatory, clinical, and case report forms 

Data Management and Sharing Plans

Data Management for Reproducibility

Organized by
NIH Library
Description

The "Data Visualization in R" series focuses on using ggplot and the broader tidyverse ecosystem to create insightful and customizable visualizations. It covers key principles of data visualization, from basic plots to advanced techniques, emphasizing the flexibility and power of ggplot within a tidy data workflow. By the end of the series, participants will be proficient in building plots using the tidyverse ecosystem. 

This hour-and-a-half online training builds on the topics covered in the Data Visualization in ggplot training. This training emphasizes advanced customization techniques in ggplot to create effective and clear visualizations. Participants will build on the foundational skills learned in Part 1 of the series and apply various customization options, such as faceting, labeling, themes, and color scales. You must have taken the Data Visualization in R: Introduction to ggplot: Part 1 of 2 training to be successful in this training.

By the end of this training, attendees should be able to:  

  • Create a scatterplot in ggplot
  • Facet a plot
  • Demonstrate options for customizing the title and axis
  • Apply different ggplot themes

Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:

  • Installed R and RStudio.
  • Gained a basic understanding of R and RStudio.
  • Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist, if you are new to R.

You can register for the other training in this series via the link below:

  • Data Visualization in R: Introduction to ggplot Part 1 of 2

Organized by
BTEP
Description

Globus is GUI-based software suitable for efficiently transferring large datasets, such as those generated from Next Generation Sequencing (NGS), to and from high performance computing systems such as NIH’s Biowulf. This demonstration-only class will show participants how to access and set up Globus for transferring data to Biowulf from a local computer as well as from the sequencing facility data management environment (DME). This class is open only to NIH staff. The meeting link will be provided upon approval of registration.

Registration link: https://cbiit.webex.com/weblink/register/re8dc373ea594662a5e3b0e92a71582fb

Distinguished Speakers Seminar Series

Organized by
BTEP
Description

The role of computational science in biomedical research has typically been downstream of experiments, where it plays important roles in signal processing, data integration, pattern detection, and hypothesis testing. But this is changing, and predictive models are now being used to generate and test hypotheses in silico. In this talk, Dr. Pollard will share examples from human genetics, where they have built deep learning models of 3D chromatin interactions that take only sequence as input and then used them to interpret disease variants. This strategy leads to causal hypotheses and enables them to prioritize variants with predicted functional effects. Experiments designed using model outputs are accelerating the rate of discoveries, shedding light on genetic mechanisms in cancer and developmental disorders. This prediction-first strategy exemplifies Dr. Pollard's vision for a more proactive, rather than reactive, role for computational science in biomedical research.

Organized by
NIDDK
Description

NIDDK Biostats Seminar Series: From Research Study Design to Collecting, Managing, and Analyzing Data.

Learning Objectives:

Be able to identify, load, and use R resources/packages based upon needs and experience level with R.  

1. For beginners, know how to load R Commander, import data, and navigate the GUI.  

2. For those interested in learning more about coding/functions, how to use R Swirl to learn foundations for functions and coding higher level operations (loops, combining functions, and building new functions).  

3. For regular users of R, how to use tidyverse for data manipulation, organization, and preparation for analysis.  

4. For those using R for research work, how to utilize R Markdown for appropriate and thorough project documentation and management.  

 

Outline:

2:30-3:00pm – Beginner level (Dr. Wilkins, Mathematical Statistician, Biostatistics Program Office, NIDDK)

How to get the basics accomplished: load data, navigate the R Commander GUI, and export data.

3:00-3:30pm – Intermediate level (Dr. Leary, Chief, Biostatistics Program Office, NIDDK)

Data manipulation and organization for analysis with focus on tools for more complex coding and functionality.

3:30-4:00pm – Advanced topics (Dr. Leary)

Leveraging R Markdown and other resources for project management, documentation, and archiving.

August

Organized by
NIH Library
Description

This one-hour online training will provide a high-level overview of Python coding concepts, as well as some of the integrated development environments (IDEs, such as Jupyter notebooks) used for Python coding. Python is a programming language used for data science, specifically for data analysis, statistical analysis, and visualization of results. The training will feature the following IDEs: Google Colaboratory's Jupyter Notebook, and Anaconda’s Spyder, Jupyter Notebook, and JupyterLab. This overview training will demonstrate how these skills can boost productivity, rigor, and transparency in reporting research findings.

By the end of the training, attendees will be able to: 

  • Recognize four freely available IDEs for Python coding

  • Identify fundamental components of Python code

  • Understand how and why notebooks support rigor and transparency in analysis

Attendees are not expected to have any prior knowledge of Python coding or the IDEs to be successful in this training.

If you choose to follow along with Google Colab or Jupyter Notebooks, these IDEs should be installed and ready to go. Code will be provided during the training for this option.