Supported by CCR Office of Science and Technology Resources (OSTR)
ncibtep@nih.gov

Bioinformatics Training and Education Program

Upcoming Classes & Events

August

Organized by
BTEP
Description

This class provides an overview of the R and Python programming languages and how each is used in bioinformatics research. Participants will learn the advantages of each language and how to choose the one best suited to a given data analysis. Learning resources for beginners will be provided and questions answered. Attendance is restricted to NIH staff. This class is not hands-on. A meeting link will be provided upon approval of registration.

Registration: https://cbiit.webex.com/weblink/register/rd1761836833d5b5790d978418e4eecf2

Organized by
NIH Library
Description

This one-and-a-half-hour online training is designed for those who create reproducible projects and want to extend the basics of R Markdown and apply those skills in Quarto. Quarto is an open-source scientific and technical publishing system that supports multiple programming languages and can create documents, books, presentations, blogs, and other online resources.

By the end of this training, attendees will be able to: 

  • Define reproducibility from a data science perspective
  • Distinguish between R Markdown and Quarto
  • Identify publishing workflows using markdown
  • Demonstrate the differences between the visual and source editors
  • Create basic markdown elements
  • Create and run code blocks
  • Render a markdown document

View more details at https://www-nihlibrary-nih-gov.ezproxy.nihlibrary.nih.gov/training/introduction-quarto-scientific-writing-1.

 

September

Organized by
BTEP
Description

JupyterLab is a platform for organizing code and analysis steps in one place, allowing users to easily keep track of all steps taken in an analysis, thereby facilitating collaboration and research presentation. This class is a demo and not hands-on. Participants will learn how to access JupyterLab and the steps involved in producing reproducible analysis reports using this software. No prior experience with JupyterLab, and no installation of it, is needed to participate. Attendance is restricted to NIH staff. A meeting link will be provided upon approval of registration. This session is a part of the BTEP Introduction to Bioinformatics Summer Series.

 

Registration: https://cbiit.webex.com/weblink/register/re16ac11f4d295ca6f9c6cd061790316c

Organized by
NHLBI
Description

Join the National Heart, Lung, and Blood Institute (NHLBI) for a hybrid workshop to explore how medicine will be transformed by the current artificial intelligence (AI) revolution. Participants will engage with leading experts to learn the current state and visionary outlook of the field and to identify research gaps and opportunities in AI, focusing on clinical decision support.

The primary aim of this workshop is to explore how AI can be utilized to aid in diagnosing and treating heart, lung, blood, and sleep disorders (HLBS). AI is a computer science field focused on creating systems that perform tasks requiring human intelligence. These tasks include learning, reasoning, problem-solving, perception, language understanding, and interaction. AI includes subfields like machine learning, natural language processing, robotics, and computer vision. In biomedical research and health care, AI analyzes complex datasets, enhances diagnostic accuracy, personalizes treatment plans, and improves healthcare delivery. 

Additionally, the workshop aligns with the broader mission of NHLBI to promote the prevention and treatment of heart, lung, and blood diseases, and enhance the health of all individuals so that they can live longer and more fulfilling lives.

Organized by
BTEP
Description

Morning Session

(10 – 10:25 AM) Resources for Bioinformatics Data Analysis at NIH (Amy Stonelake, BTEP)

(10:30 – 10:55 AM) Supercharge your Data Analysis with Biowulf (Antonio Ulloa, CIT)

(11 AM – 11:25 AM) From Data to Discovery: Highlights from the Computational Genomics and Bioinformatics Branch (CGBB/CBIIT/NCI) (Daoud Meerzaman, CBIIT)

(11:30 – 12 noon) Gen AI Community of Practice Tools and Training (Nick Weber, CIT)

(12 noon – 1 PM) Bioinformatics Q & A and Lunch Break

Afternoon Session

(1 – 1:30 PM) GeneAgent: An LLM Powered Tool for Gene Set Analysis (Zhiyong Lu, NCBI/NLM)

(1:30 – 2 PM) AI Clinical Pathology Image Analysis in Prostate Cancer (Baris Turkbey, NIH CC)

(2 – 2:30 PM) SCassist: An AI-Powered Workflow Assistant for Single-Cell Analysis (Vijay Nagarajan, NEI)

(2:30 – 3 PM) Single Cell and Spatial Transcriptomics SIG (Stefan Cordes, NHLBI)

(3 – 3:30 PM) NIH Artificial Intelligence Interest Group (Ryan O’Neill, NHLBI)

No registration, in-person only, will be recorded (no hybrid option). 

Organized by
NCI
Description

Federated data analysis allows for collaboration and analysis across institutions without physically moving individual-level data to a central location, thus protecting sensitive data and maintaining data security. Join the NCI Cohort Consortium for an insightful webinar on federated data analysis methods and tools, featuring advances in Privacy-Preserving Data Analysis (PDA) and practical applications of Stata for registry data.

The session will include an overview of the PDA toolbox, real-world use cases, and guidance on when to use federated vs. metadata-based approaches. The session will also showcase how Stata can be leveraged to manage, harmonize, and analyze large-scale registry datasets, with practical examples and best practices in the context of epidemiologic research.

Organized by
NIH Library
Description

This one-hour online training will provide a high-level overview of Python coding concepts, as well as some of the integrated development environments (IDEs), such as Jupyter notebooks, used for Python coding. Python is a programming language widely used in data science for data analysis, statistical analysis, and visualization of results. The training will feature the following IDEs: Jupyter Notebook in Google Colaboratory, and Spyder, Jupyter Notebook, and JupyterLab in Anaconda. This overview training will demonstrate how these skills can boost productivity, rigor, and transparency in reporting research findings.

By the end of the training, attendees will be able to: 

  • Recognize four freely available IDEs for Python coding
  • Identify fundamental components of Python code
  • Understand how and why notebooks support rigor and transparency in analysis

Attendees are not expected to have any prior knowledge of Python coding or the IDEs to be successful in this training.

If you choose to follow along with Google Colab or Jupyter Notebooks, these IDEs should be installed and ready to go. Code will be provided during the training for this option.

Single Cell Seminar Series

Organized by
BTEP
Description
Over the past decade, the field of computational cell biology has undergone a transformation, from cataloging cell types to modeling how cells behave, interact, and respond to perturbations. In this talk, Dr. Theis will review and explore how machine learning is enabling this shift, focusing on two converging frontiers: integrated cellular mapping and actionable generative models.

He'll begin with a brief overview of recent advances in representation learning for atlas-scale integration, highlighting work across the Human Cell Atlas and beyond. These efforts aim to unify diverse single-cell and spatial modalities into shared manifolds of cellular identity and state. As one example, he will present his group's recent multimodal atlas of human brain organoids, which integrates transcriptomic variation across development and lab protocols.

From there, he'll review the emerging landscape of foundation models in single-cell genomics, including his group's work on Nicheformer, a transformer trained on millions of spatial and dissociated cells. These models offer generalizable embeddings for a range of tasks, but more importantly, they set the stage for predictive modeling of biological responses.

He'll close by introducing perturbation models that leverage generative AI to model interventions on these systems. As an example, he will show Cellflow, a generative framework that learns how perturbations such as drugs, cytokines, or gene edits shift cellular phenotypes. It enables virtual experimental design, including in silico protocol screening for brain organoid differentiation. This exemplifies a move toward models that not only interpret biological systems but help shape them.
Organized by
NIH Library
Description

The "Data Visualization in R" series focuses on using ggplot2 and the broader tidyverse ecosystem to create visualizations. Attendees will progress from foundational plotting techniques to advanced customization, learning to create multi-faceted displays and apply professional styling. The series emphasizes ggplot's flexibility and power within a tidy data workflow. By the end of the series, attendees will have a solid foundation in building effective visualizations using the tidyverse ecosystem.

This one-and-a-half-hour online training will explore the topics of perception and cognition and how they apply to data visualization. This training will also teach you how to visualize your data using ggplot2. We will start by creating a simple scatterplot and use it to introduce aesthetic mappings and geometric objects, the fundamental building blocks of ggplot2. You must have taken the Introduction to R and RStudio training to be successful in this training.

By the end of this training, participants should be able to: 

  • Distinguish between aesthetic mappings and geometric objects, the fundamental building blocks of ggplot.
  • Create a simple scatterplot.
  • Create a plot and save it in a high-resolution format.

Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:

    Organized by
    NIH Library
    Description

    This one-hour online training covers various aspects of sharing code using MATLAB community tools like File Exchange and GitHub. Well-documented methods and workflows enable reproducible research by helping scientists follow each other’s experimental logic and interpret results.

    By the end of this training, attendees will be able to: 

    • Share code with collaborators and the scientific community
    • Create notebook-style Live Scripts using MATLAB Live Editor 
    • Leverage MATLAB Community Resources to make code, projects, and toolboxes available 
    • Access MATLAB through the browser and share licenses with collaborators 

    This is an introductory-level training taught by MathWorks. No installation of MATLAB is necessary.

    Organized by
    NCI Office of Data Sharing
    Description

    Please use this link to access overview, registration, and other information:

    https://events.cancer.gov/nci/ods-data-jamboree

    Childhood cancer is a rare disease with ~15,000 cases diagnosed annually in the United States in individuals younger than 20 years. Despite extensive efforts over the last two decades by programs such as the National Institutes of Health (NIH)'s Gabriella Miller Kids First Program and NCI's Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Childhood Cancer Data Initiative (CCDI) to generate, collect, and share data, pediatric and adolescent and young adult (AYA) cancer datasets remain underutilized. Finding and accessing datasets, building specific pediatric cancer cohorts, and aggregating or linking datasets from various data systems still present tremendous challenges for the wider community. To overcome these barriers and raise awareness of existing childhood cancer data resources that can inform better diagnosis and treatment options for children, this data jamboree will bring together researchers and citizen scientists with diverse expertise and experience to collaborate and explore scientific or other questions using childhood cancer data. The goals of the jamboree include:

    • Promoting access and reuse of pediatric cancer data and raising awareness about the availability of these datasets.
    • Promoting interdisciplinary collaborations to expand the size, technical, and scientific diversity of the pediatric cancer research community.
    • Promoting development of new methods and tools for data analysis.
    • Identifying gaps and limitations of existing data and resources, including barriers to real-time access to the data.
     
    Organized by
    NIH Library
    Description

    The "Data Visualization in R" series focuses on using ggplot and the broader tidyverse ecosystem to create insightful and customizable visualizations. It covers key principles of data visualization, from basic plots to advanced techniques, emphasizing the flexibility and power of ggplot within a tidy data workflow. By the end of the series, participants will be proficient in building plots using the tidyverse ecosystem. 

    This one-and-a-half-hour online training builds on the topics covered in the Data Visualization in ggplot training. It emphasizes advanced customization techniques in ggplot for creating clear, effective visualizations. Participants will build on the foundational skills learned in Part 1 of the series and apply various customization options, such as faceting, labeling, themes, and color scales. You must have taken the Data Visualization in R: Introduction to ggplot: Part 1 of 2 training to be successful in this training.

    By the end of this training, attendees should be able to:  

    • Create a scatterplot in ggplot 
    • Facet a plot 
    • Demonstrate options for customizing the title and axis 
    • Apply different ggplot themes 

    Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:

    • Installed R and RStudio.
    • Gained a basic understanding of R and RStudio.
    • Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist, if you are new to R.

    October

    No scheduled events