Supported by CCR Office of Science and Technology Resources (OSTR)
ncibtep@nih.gov

Bioinformatics Training and Education Program

Featured

Upcoming Classes & Events

September

Organized by
NCI
Description

Federated data analysis allows for collaboration and analysis across institutions without physically moving individual-level data to a central location, thus protecting sensitive data and maintaining data security. Join the NCI Cohort Consortium for an insightful webinar on federated data analysis methods and tools, featuring advances in Privacy-Preserving Data Analysis (PDA) and practical applications of Stata for registry data.

The session will include an overview of the PDA toolbox, real-world use cases, and guidance on when to use federated vs. metadata-based approaches. The session will also showcase how Stata can be leveraged to manage, harmonize, and analyze large-scale registry datasets, with practical examples and best practices in the context of epidemiologic research.
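
To make the idea concrete, here is a minimal Python sketch of one simple federated pattern: each site computes summary statistics locally and shares only those aggregates, from which a coordinator derives the pooled mean and variance. The site names and values are invented for illustration, and this is not a description of how the PDA toolbox itself works; real deployments typically add further safeguards.

```python
from dataclasses import dataclass
from statistics import mean, pvariance


@dataclass
class SiteSummary:
    """Aggregates a site is willing to share (no individual-level rows)."""
    n: int
    total: float
    total_sq: float


def summarize(values):
    """Runs locally at each site; only the returned aggregates leave the site."""
    return SiteSummary(n=len(values),
                       total=sum(values),
                       total_sq=sum(v * v for v in values))


def pooled_mean_variance(summaries):
    """Runs at the coordinator, using only the shared aggregates."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    pooled_mean = total / n
    pooled_var = total_sq / n - pooled_mean ** 2  # population variance
    return pooled_mean, pooled_var


# Hypothetical individual-level measurements held at three sites.
site_data = {
    "site_a": [5.1, 6.0, 5.5, 6.2],
    "site_b": [4.8, 5.9, 6.1],
    "site_c": [5.4, 5.6, 6.3, 5.0, 5.8],
}

summaries = [summarize(values) for values in site_data.values()]
fed_mean, fed_var = pooled_mean_variance(summaries)

# Centralized answer for comparison (only possible here because the data are toy).
all_values = [v for values in site_data.values() for v in values]
print(f"federated mean={fed_mean:.3f}, centralized mean={mean(all_values):.3f}")
print(f"federated var={fed_var:.3f}, centralized var={pvariance(all_values):.3f}")
```

The coordinator never sees the per-person values, yet the pooled results match what a centralized analysis of the same data would give.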

Organized by
NIH Library
Description

This one-hour online training will provide a high-level overview of Python coding concepts, as well as some of the integrated development environments (IDEs, such as Jupyter notebooks) used for Python coding. Python is a programming language used for data science, specifically data analysis, statistical analysis, and visualization of results. The training will feature the following IDEs: Jupyter Notebook in Google Colaboratory; and Spyder, Jupyter Notebook, and JupyterLab in Anaconda. This overview training will demonstrate how these skills can boost productivity, rigor, and transparency in reporting research findings.

By the end of the training, attendees will be able to:

  • Recognize four freely available IDEs for Python coding
  • Identify fundamental components of Python code
  • Understand how and why notebooks support rigor and transparency in analysis

Attendees are not expected to have any prior knowledge of Python coding or the IDEs to be successful in this training.

If you choose to follow along with Google Colab or Jupyter Notebooks, these IDEs should be installed and ready to go. Code will be provided during the training for this option.
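
As a rough illustration of what "fundamental components of Python code" might cover, the short sketch below shows variables, a list, a function, a loop, and a conditional. It is not the code distributed in the training; the names and values are invented.

```python
# Variables and a list (a basic data structure).
sample_name = "control"
measurements = [2.5, 3.1, 4.8, 3.9]


def average(values):
    """A function: reusable code that takes input and returns a result."""
    return sum(values) / len(values)


# A loop with a conditional: collect the values above a threshold.
threshold = 3.0
above = []
for value in measurements:
    if value > threshold:
        above.append(value)

mean_value = average(measurements)
print(f"{sample_name}: mean={mean_value:.2f}, {len(above)} values above {threshold}")
```

Run in a notebook cell, the code, its output, and any explanatory notes sit side by side, which is part of why notebooks support transparent reporting.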

Organized by
CDSL
Description

The rapid advancement of large language models is enabling a new generation of domain-specific agents capable of reasoning, retrieving, and acting in complex biomedical contexts. In this talk, Dr. Lu will present two representative systems: GeneAgent (Nature Methods, 2025), an AI agent for self-verified gene set analysis, and TrialGPT (Nature Communications, 2024), an LLM-powered tool for accelerating patient-to-trial matching. Through real-world use cases, he will discuss the design principles behind these biomedical AI tools, how such AI agents can support biomedical discovery and clinical practice, and the challenges and limitations of medical agentic AI research.

Organized by
HPC Biowulf
Description

Examples of combining visualization and computation in brain imaging. 

Coding Club Seminar Series

Organized by
BTEP
Description

In this BTEP Coding Club, participants will see how Pandas, a data wrangling package for Python, helps extract insights from data and tell a cogent story with them. Topics include importing tabular data, subsetting, sorting, performing mathematical operations, and creating visualizations, all steps involved in drawing and conveying conclusions from data. After attending, participants will be able to apply the skills learned to their own research. This class is a demonstration and not hands-on. No experience is needed to participate, and attendance is restricted to NIH staff. The meeting link will be provided upon approval of registration.
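
The steps listed above can be sketched in a few lines of Pandas. The table below is invented for illustration and is not the dataset used in the class; it assumes only that the pandas package is installed.

```python
import io

import pandas as pd

# Hypothetical tabular data; in practice this would come from a CSV file on disk.
csv_text = """gene,sample,expression
TP53,tumor,12.5
TP53,normal,8.0
BRCA1,tumor,3.2
BRCA1,normal,6.4
MYC,tumor,20.1
MYC,normal,9.7
"""

# Import tabular data.
df = pd.read_csv(io.StringIO(csv_text))

# Subset rows with a boolean mask, then sort by expression.
tumor = df[df["sample"] == "tumor"].sort_values("expression", ascending=False)

# Mathematical operation: tumor-to-normal expression ratio per gene.
wide = df.pivot(index="gene", columns="sample", values="expression")
wide["ratio"] = wide["tumor"] / wide["normal"]

print(tumor)
print(wide.round(2))

# A visualization would typically follow, for example wide["ratio"].plot.bar(),
# which additionally requires matplotlib.
```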

Organized by
NIH Library
Description

This hour and a half online training covers how to analyze and model data using interactive tools in MATLAB. Through live demonstrations and examples, attendees will learn to solve many steps in a data analysis workflow without writing any code. The interactive tools can generate the MATLAB code needed to reproduce the work programmatically. 

By the end of this training, attendees will be able to:

  • Use interactive tools for data visualization, cleaning, and modeling
  • Automatically generate code to replicate interactive work
  • Capture work in easy-to-write scripts and functions
  • Share results by automatically creating reports

This training is taught by MathWorks. Attendees are not expected to have any prior knowledge of MATLAB, but experienced users will also benefit from new tools, tips, and tricks from the latest releases. This is an introductory-level training; no software installation is required.

Organized by
HPC Biowulf
Description

Next edition of the NIH HPC Virtual Walk-in Consults!

All Biowulf users, and all those interested in using the systems, are invited to call in to our Virtual Walk-in Consult to discuss problems and concerns, from scripting problems to node allocation, to strategies for a particular project, to anything else that is affecting your use of the HPC systems. Users will be assigned to a breakout session with a member of the HPC staff to discuss the problem one-on-one. We'll try to address simpler issues on the spot and follow up on more complex questions after the session.

Please email staff@hpc.nih.gov for the meeting link.

Organized by
NIH Library
Description

This one-hour online training, provided by SAS, will demonstrate the basics of the Structured Query Language (SQL) procedure in SAS.

By the end of this training, attendees will be able to:

  • Discuss the basics of the SQL procedure in SAS, including syntax and joins
  • Compare the SQL procedure in SAS with the SAS DATA step

Attendees are expected to have some working experience with SAS 9.4 or to have attended an introductory SAS class, such as SAS® Programming 1: Essentials.
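
For readers without SAS at hand, the sketch below uses Python's built-in sqlite3 module as a stand-in to show the kind of SELECT with an inner join that the SQL procedure also expresses. The table and column names are hypothetical, and PROC SQL syntax differs in places, so treat this as an illustration of the SQL concepts rather than of SAS itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE visits (visit_id INTEGER PRIMARY KEY, patient_id INTEGER, weight_kg REAL);
    INSERT INTO patients VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO visits VALUES (10, 1, 61.0), (11, 1, 60.2), (12, 2, 82.5);
""")

# Inner join: only patients with at least one visit appear in the result.
rows = conn.execute("""
    SELECT p.name, COUNT(*) AS n_visits, AVG(v.weight_kg) AS mean_weight
    FROM patients AS p
    INNER JOIN visits AS v ON v.patient_id = p.patient_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()

for name, n_visits, mean_weight in rows:
    print(name, n_visits, round(mean_weight, 1))
conn.close()
```

The patient with no visits is dropped by the inner join; a left join would keep that row with missing values instead.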

Distinguished Speakers Seminar Series

Organized by
BTEP
Description

The ability to measure gene expression levels for individual cells (vs. pools of cells) and with spatial resolution is crucial to address many important biological and medical questions, such as the study of stem cell differentiation, the discovery of cellular subtypes in the brain, and cancer diagnosis and treatment. Single-cell transcriptome sequencing (RNA-Seq) allows the high-throughput measurement of gene expression levels for entire genomes at the resolution of single cells. Spatially-resolved transcriptomics further allows the measurement of gene expression levels along with the location of the RNA molecules within a tissue. Transcriptomics exemplifies the range of issues one encounters in a data science workflow, where the data are complex in a variety of ways, questions are not always clearly formulated, there are multiple analysis steps, and drawing on rigorous statistical principles and methods is essential to derive meaningful and reliable biological results. 

In this talk, Dr. Dudoit will survey statistical questions related to the analysis of single-cell transcriptome sequencing data to investigate the differentiation of stem cells in the brain, including exploratory data analysis, expression quantitation, cluster analysis, and the inference of cellular lineages. She will also address differential expression analysis in spatial transcriptomics.

Single Cell Seminar Series

Organized by
BTEP
Description
Over the past decade, the field of computational cell biology has undergone a transformation, from cataloging cell types to modeling how cells behave, interact, and respond to perturbations. In this talk, Dr. Theis will review and explore how machine learning is enabling this shift, focusing on two converging frontiers: integrated cellular mapping and actionable generative models.

He'll begin with a brief overview of recent advances in representation learning for atlas-scale integration, highlighting work across the Human Cell Atlas and beyond. These efforts aim to unify diverse single-cell and spatial modalities into shared manifolds of cellular identity and state. As one example, he will present his group's recent multimodal atlas of human brain organoids, which integrates transcriptomic variation across development and lab protocols.

From there, he'll review the emerging landscape of foundation models in single-cell genomics, including the group's work on Nicheformer, a transformer trained on millions of spatial and dissociated cells. These models offer generalizable embeddings for a range of tasks; more importantly, they set the stage for predictive modeling of biological responses.

He'll close by introducing perturbation models that leverage generative AI to model interventions on these systems. As an example, he will show Cellflow, a generative framework that learns how perturbations such as drugs, cytokines, or gene edits shift cellular phenotypes. It enables virtual experimental design, including in silico protocol screening for brain organoid differentiation. This exemplifies a move toward models that not only interpret biological systems but help shape them.
Organized by
AI Club
Description

This is the second part of a four-part workshop series on Parallel Machine Learning Model Training, presented by AIIG co-chair Samar Samarjeet, PhD (NHLBI). Over the course of the workshop, Samar will cover topics in data and model parallelism, including pipeline and looping parallelism, profiling, data sharding, using JAX and PyTorch, and specific tools like DeepSpeed and PyTorch Lightning.
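
The core idea behind data parallelism can be sketched without any ML framework: split (shard) a batch across workers, let each compute a gradient on its own shard, and average the results before updating the shared parameters. The toy model, data, learning rate, and worker count below are assumptions for illustration; the workshop itself uses JAX and PyTorch, which are not shown here.

```python
def gradient(w, xs, ys):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def shard(items, n_workers):
    """Split a list into n_workers equal, contiguous shards."""
    size = len(items) // n_workers
    return [items[i * size:(i + 1) * size] for i in range(n_workers)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]  # roughly y = 2x
n_workers = 4
w = 0.0

for step in range(200):
    # Each "worker" sees only its shard; in a real setup these run on separate devices.
    shard_grads = [gradient(w, sx, sy)
                   for sx, sy in zip(shard(xs, n_workers), shard(ys, n_workers))]
    # All-reduce step: average the per-shard gradients.
    avg_grad = sum(shard_grads) / n_workers
    w -= 0.01 * avg_grad

print(f"learned w = {w:.3f}")
```

Because the shards here are equal in size, the averaged shard gradient equals the full-batch gradient, so the sharded training loop follows the same trajectory as single-device training.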

Organized by
NIH Library
Description

The "Data Visualization in R" series focuses on using ggplot2 and the broader tidyverse ecosystem to create visualizations. Attendees will progress from foundational plotting techniques to advanced customization, learning to create multi-faceted displays and apply professional styling. The series emphasizes ggplot's flexibility and power within a tidy data workflow. By the end of the series, attendees will have a solid foundation in building effective visualizations using the tidyverse ecosystem.

This hour-and-a-half online training will explore the topics of perception and cognition, and how these apply to data visualization. It will also teach you how to visualize your data using ggplot2. We will start by creating a simple scatterplot and use that to introduce aesthetic mappings and geometric objects, the fundamental building blocks of ggplot2. You must have taken the Introduction to R and RStudio training to be successful in this training.

By the end of this training, participants should be able to: 

  • Distinguish between aesthetic mappings and geometric objects, the fundamental building blocks of ggplot.
  • Create a simple scatterplot.
  • Create a plot and save it in a high-resolution format.

Attendees are expected to have a basic understanding of R and RStudio.

    Organized by
    NIH Library
    Description

    This one-hour online training introduces applying data science and artificial intelligence (AI) techniques to signals and time-series datasets using MATLAB. The training will cover the entire AI pipeline, from signal exploration to deployment. Participants will explore the fundamentals of processing, analyzing, and visualizing signal data, as well as implementing machine learning and AI algorithms tailored for time-series datasets. This training is designed for researchers, engineers, and data scientists who work with signals or temporal data and seek to enhance their analytical capabilities through MATLAB's data science and AI functionalities. 

    By the end of this training, attendees will be able to: 

    • Understand the unique challenges and opportunities in analyzing signals and time-series data. 

    • Import, preprocess, and visualize signal and time-series datasets in MATLAB. 

    • Apply machine learning techniques, including supervised and unsupervised algorithms, to create predictive models for time-series data. 

    • Explore deep learning approaches, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, for advanced time-series analysis. 

    • Deploy trained AI models and automate workflows to integrate insights into research or operational pipelines. 

    • Utilize MATLAB’s documentation, online resources, and toolboxes to extend their data science and AI capabilities. 

    Attendees are expected to be familiar with the basic functions of MATLAB to be successful in this training.

    Organized by
    Cancer Diagnosis Program
    Description

    Dr. Church's clinical and research work focuses on bringing molecular testing to the clinical care of children with cancer. Through institutional projects (the Profile study, GAIN consortium study), she has profiled thousands of children's tumors and has used these results to make real-time impacts on their diagnoses and treatments. She is involved in national initiatives to improve the quality of and access to molecular testing for children with cancer, including the NCI-funded Count Me In Study (Dana-Farber, Broad Institute), the National Comprehensive Cancer Network, NIH, and the Children's Oncology Group.

    Organized by
    NCI Cancer AI Conversations Series
    Description

    There are many challenges associated with moving cancer AI originally developed in a research setting into a clinical setting, including into clinical trials. During this event, participants will discuss the integration of AI in the clinic and in clinical trials for oncology.

    Organized by
    NIH Library
    Description

    This one-hour online training covers various aspects of sharing code using MATLAB community tools like File Exchange and GitHub. Well-documented methods and workflows enable reproducible research by helping scientists follow each other’s experimental logic and interpret results.

    By the end of this training, attendees will be able to: 

    • Share code with collaborators and the scientific community
    • Create notebook-style Live Scripts using MATLAB Live Editor 
    • Leverage MATLAB Community Resources to make code, projects, and toolboxes available 
    • Learn how to access MATLAB through the browser and share licenses with collaborators 

    This is an introductory-level training taught by MathWorks. No installation of MATLAB is necessary.

    Organized by
    CIT
    Description

    Don’t miss this brand-new double-feature of two essential AI courses in one! We will start with AI Done Right: Ethics and Privacy to set the stage for understanding the AI landscape and the rules of the road at NIH and then start building your skills with Prompt Like a Pro: Getting the Most from AI with Effective Prompt Engineering. What’s the secret to great AI results? Great prompts. This hands-on class teaches you how to craft clear, specific, and effective instructions for Copilot and other AI tools. Practice real-world examples and get a toolkit of reusable prompt templates you can start using right away.

    Organized by
    NCI Office of Data Sharing
    Description

    Please use this link to access overview, registration, and other information:

    https://events.cancer.gov/nci/ods-data-jamboree

    Childhood cancer is a rare disease with ~15,000 cases diagnosed annually in the United States in individuals younger than 20 years. Despite extensive efforts made over the last two decades by programs such as the National Institutes of Health (NIH)'s Gabriella Miller Kids First Program and NCI's Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Childhood Cancer Data Initiative (CCDI) to generate, collect, and share data, pediatric and adolescent and young adult (AYA) cancer datasets remain underutilized. Finding and accessing datasets, building specific pediatric cancer cohorts, and aggregating or linking datasets from various data systems still present tremendous challenges for the wider community. To overcome these barriers and raise awareness of existing childhood cancer data resources that can inform better diagnosis and treatment options for children, this data jamboree will bring together researchers and citizen scientists with diverse expertise and experience to collaborate and explore scientific or other questions using childhood cancer data. The goals of the jamboree include:

    • Promoting access and reuse of pediatric cancer data and raising awareness about the availability of these datasets.
    • Promoting interdisciplinary collaborations to expand the size, technical, and scientific diversity of the pediatric cancer research community.
    • Promoting development of new methods and tools for data analysis.
    • Identifying gaps and limitations of existing data and resources including barriers to real time access to the data.
     
    Organized by
    AI Club
    Description

    This is the third part of a four-part workshop series on Parallel Machine Learning Model Training, presented by AIIG co-chair Samar Samarjeet, PhD (NHLBI). Over the course of the workshop, Samar will cover topics in data and model parallelism, including pipeline and looping parallelism, profiling, data sharding, using JAX and PyTorch, and specific tools like DeepSpeed and PyTorch Lightning.

    Organized by
    NIH Library
    Description

    The "Data Visualization in R" series focuses on using ggplot and the broader tidyverse ecosystem to create insightful and customizable visualizations. It covers key principles of data visualization, from basic plots to advanced techniques, emphasizing the flexibility and power of ggplot within a tidy data workflow. By the end of the series, participants will be proficient in building plots using the tidyverse ecosystem. 

    This hour-and-a-half online training builds on the topics covered in Part 1 of the series and emphasizes advanced customization techniques in ggplot for creating clear, effective visualizations. Participants will apply various customization options, such as faceting, labeling, themes, and color scales. You must have taken the Data Visualization in R: Introduction to ggplot: Part 1 of 2 training to be successful in this training.

    By the end of this training, attendees should be able to:  

    • Create a scatterplot in ggplot 
    • Learn how to facet a plot 
    • Demonstrate options for customizing the title and axis 
    • Apply different ggplot themes 

    Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:

    • Installed R and RStudio.
    • Have a basic understanding of R and RStudio.
    • Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist, if you are new to R.

    October

    Organized by
    NIH Library
    Description

    This one-hour online training, provided by SAS, will review multiple ways to combine SAS data sets. 

    By the end of this training, attendees will be able to:    

    • Utilize Concatenation on SAS data sets (SET Statement, PROC SQL, PROC APPEND) 

    • Use Interleaving on SAS data sets (SET Statement with BY Statement) 

    • Merge SAS data sets (MERGE Statement, PROC SQL, etc.) 

    • Update SAS data sets (UPDATE, MODIFY Statements, etc.) 

    Attendees are expected to have some working experience with SAS 9.4 or to have attended an introductory SAS class, such as SAS® Programming 1: Essentials.   

    Organized by
    AI Club
    Description

    This is the fourth and final part of a four-part workshop series on Parallel Machine Learning Model Training, presented by AIIG co-chair Samar Samarjeet, PhD (NHLBI). Over the course of the workshop, Samar will cover topics in data and model parallelism, including pipeline and looping parallelism, profiling, data sharding, using JAX and PyTorch, and specific tools like DeepSpeed and PyTorch Lightning.

    Organized by
    NIH Library
    Description

    This one-hour online training equips participants with powerful data wrangling techniques using R and the tidyverse ecosystem. The tidyverse is a cohesive ecosystem of R packages designed to make data science workflows more intuitive and efficient through consistent syntax and design principles. Designed for both beginners and those looking to refine their skills, this training tackles the challenges of messy datasets.  

    By the end of this training, attendees will be able to:

    • Demonstrate how to clean messy clinical data using R 

    • Implement methods for standardizing text, dates, and numerical values 

    • Discuss the different ways to automate data transformations and aggregations using tidyverse functions 

    • Transform and organize data using the dplyr and tidyr packages 

    • Reshape data, handle missing values, create calculated fields, and prepare clean datasets ready for visualization and analysis

    Requirements 

    Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:

    • Installed R and RStudio.
    • Have a basic understanding of R and RStudio.
    • Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist, if you are new to R.
    Organized by
    NIH Library
    Description

    This one-hour online training introduces attendees to modeling and simulation of biological systems using MATLAB’s SimBiology and BioPipeline Designer toolboxes. SimBiology is a versatile toolbox for modeling, simulating, and analyzing dynamic biological systems such as metabolic pathways, signaling cascades, and pharmacokinetics/pharmacodynamics (PK/PD) models. BioPipeline Designer complements this by streamlining workflows for integrating biological data and automating computational analyses. 

    By the end of this training, attendees will be able to: 

    • Describe the capabilities and applications of SimBiology and BioPipeline Designer for modeling and analyzing biological systems. 

    • Construct and parameterize basic models of biological processes using SimBiology’s graphical and programmatic interfaces. 

    • Simulate dynamic behaviors of biological systems, such as time-course analyses, and interpret simulation results. 

    • Automate and streamline data integration workflows using BioPipeline Designer to enhance reproducibility and efficiency. 

    • Access and utilize resources for further learning, including tutorials, user guides, and MATLAB community forums 

    Attendees are expected to be familiar with the basic functions of MATLAB to be successful in this training.
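
    To give a feel for what a time-course simulation of a simple PK model involves, here is a minimal sketch in plain Python rather than MATLAB or SimBiology. It assumes a one-compartment model with first-order elimination, dC/dt = -k_elim * C, integrated with a fixed-step Euler scheme. The function name, parameter names, and values are hypothetical and are not taken from the training materials.

```python
import math


def simulate_one_compartment(c0, k_elim, t_end, dt):
    """Euler integration of dC/dt = -k_elim * C (hypothetical helper)."""
    n_steps = int(round(t_end / dt))
    times = [0.0]
    concs = [c0]
    c = c0
    for i in range(1, n_steps + 1):
        c += dt * (-k_elim * c)  # one explicit Euler step
        times.append(i * dt)
        concs.append(c)
    return times, concs


# Illustrative values only: initial concentration 10, elimination rate 0.2 per hour.
times, concs = simulate_one_compartment(c0=10.0, k_elim=0.2, t_end=24.0, dt=0.01)

# The closed-form solution C(t) = c0 * exp(-k_elim * t) serves as a sanity check.
exact_final = 10.0 * math.exp(-0.2 * 24.0)
print(f"Euler: {concs[-1]:.4f}  exact: {exact_final:.4f}")
```

    Comparing the numerical result with the closed-form solution is a quick way to confirm that the step size is small enough before interpreting a simulated time course.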

    Coding Club Seminar Series

    Organized by
    BTEP
    Description
    Scikit-learn is a free and open-source Python library for machine learning. It is built on top of other fundamental Python libraries like NumPy, SciPy, and Matplotlib. Users will be introduced to scikit-learn and its usage, followed by the basic machine learning pipeline and a simple classification example using scikit-learn on a publicly available diabetes dataset.
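
    As a rough illustration of the kind of pipeline described above, the sketch below splits data, scales features, fits a classifier, and scores it on held-out samples. It is not the seminar's own code: the listing does not say which diabetes dataset is used, so this example assumes a synthetic stand-in generated with make_classification, and the choice of logistic regression is likewise an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a tabular dataset: 500 samples, 8 numeric features, binary label.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Hold out 25% of the samples for evaluation, preserving the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chain feature scaling and the classifier so both are fit on training data only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Held-out accuracy: {accuracy:.2f}")
```

    Wrapping the scaler and the classifier in one pipeline keeps the test data out of the preprocessing fit, which is a common source of overly optimistic scores.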
    Distinguished Speakers Seminar Series

    Organized by
    BTEP
    Description

    In this talk, Dr. Carey will describe how Bioconductor approaches new challenges in supporting open method development and reproducible
    analyses in genomic data science. He will discuss aspects of the project that bear on education in cancer epidemiology and
    computational cancer genomics, and on emerging topics in software and data engineering for scalable omics analyses.

    Organized by
    NIH Library
    Description

    This one-hour online training will cover the fundamentals, applications, and ethical considerations of Artificial Intelligence (AI). Attendees will explore key topics such as machine learning, deep learning, data handling, and real-world AI applications across various industries. The session will also delve into the ethical implications of AI and provide insights on becoming AI literate. Whether you're a seasoned professional or just starting your AI journey, this session will equip you with essential knowledge to navigate the AI landscape effectively and make informed decisions in our data-driven world.

    By the end of this training, attendees will be able to: 

    • Understand the core concepts of AI 
    • Recognize the significance of ethical considerations in AI 
    • Begin the journey toward AI literacy

    Attendees are not expected to have any prior knowledge of AI to be successful in this training. 

    Organized by
    BTEP
    Description

    Qiagen CLC Genomics Workbench is point-and-click software that runs on a personal computer and supports analysis of bulk RNA sequencing, ChIP sequencing, long-read, and variant data. It is available to NCI scientists; submit a ticket at https://service.cancer.gov/ncisp to have it installed on a personal computer. This Qiagen scientist-led training will show participants how to analyze bulk RNA sequencing data, starting from FASTQ files and ending with differential expression analysis and the construction of visualizations (i.e., PCA and heatmap). Neither prior experience with nor installation of CLC Genomics Workbench is required for participation. This session is a demonstration and not hands-on. Attendance is restricted to NIH staff.

    Organized by
    NIH Library
    Description

    This 90-minute online training is part one of an introductory two-part series for those who want to learn about research data management and sharing, or for those who are interested in a refresher. The series provides detailed information on managing and sharing data from the first data planning stage, through the data life cycle, to data archiving, and finally to selecting an appropriate repository for data preservation.

    By the end of part one of this training series, attendees will be able to:   

    • Understand data management best practices   

    • Become familiar with data management tools  

    • Have a solid knowledge of the resources that enable data sharing

    During Part 2, attendees will learn about sharing and archiving data. You must register separately for Part 2 of this training. This training is introductory; no prior knowledge is required.

    Organized by
    NIH Library
    Description

    This 75-minute online training is part two of an introductory two-part series for those who want to learn about research data management and sharing, or for those who are interested in a refresher. The series provides detailed information on managing and sharing data from the first data planning stage, through the data life cycle, to data archiving, and finally to selecting an appropriate repository for data preservation.

    By the end of part two of this training series, attendees will be able to:   

    • Have a solid knowledge of the resources that enable data sharing

    • Understand how data is archived and preserved  

    Part 1 of this training covers understanding research data, how to manage research data, and how to work with data. During Part 2, attendees learn about sharing and archiving data. This training is introductory; no prior knowledge is required.

    You must register separately for Part 1 of this training.  

    Organized by
    BTEP
    Description
    • Intro to STRIDES and Cloud Lab
    • Tour the tutorial libraries: Overview of STRIDES Cloud Lab GitHub (AWS/GCP/Azure notebooks) and the NIGMS GitHub.
    • Cloud demo: Build a chatbot with grounding using a Snakemake datastore. Configure datastore, query through the chatbot, and show responses based on the indexed sources.
    Organized by
    NIH Library
    Description

    This one-hour online training provides researchers with an overview of online resources for locating research datasets, data repositories, and data publications for data sharing and re-use. Participants will learn search strategies for locating datasets through federated data search portals and generalist data repositories, including directories for locating discipline-specific and institutional data repositories. The training also covers key issues to consider when re-using datasets or when locating a data repository for sharing and preservation purposes.

    By the end of this training, attendees will be able to:  

    • Locate different types of data repositories and datasets 

    • Identify issues to consider with data repositories 

    • Discuss how data repositories can improve reproducibility
    • Identify issues to consider when re-using datasets 

    • Describe guidelines and resources for citing datasets 

    Attendees are not expected to have any prior knowledge of these resources to be successful in this training. 

    Organized by
    NIH Library
    Description

    This one-and-a-half-hour online training will equip attendees with essential knowledge and skills for effective interactions with Large Language Model (LLM) AI chatbots. Attendees will explore prompt engineering and its role in optimizing the conversational capabilities of LLMs. Emphasizing best practices and practical applications, this training features live demonstrations and provides valuable skills for the effective use of LLMs.

    By the end of this training, attendees will be able to:  

    • Define LLMs, prompt patterns, and prompt engineering
    • Identify potential uses and issues to consider when using LLMs in the biomedical research field
    • Use a selection of prompt patterns to improve generated output from LLMs
    • Identify resources for learning more about prompt engineering in LLMs 

    Attendees are not expected to have any prior knowledge of AI chatbots to be successful in this training. 
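
    For readers unfamiliar with the idea of a prompt pattern, the sketch below shows one way a reusable template might be expressed in Python. It assumes a simple "persona" style template (role, optional context, task, requested output format). The function name, field labels, and example text are hypothetical and are not drawn from the training itself.

```python
def build_persona_prompt(persona, task, output_format, context=""):
    """Assemble a prompt from a persona, optional context, a task, and a format (hypothetical helper)."""
    parts = [f"Act as {persona}."]
    if context:
        # Context is optional; it is omitted entirely when empty.
        parts.append(f"Context: {context}")
    parts.append(f"Task: {task}")
    parts.append(f"Respond using this format: {output_format}")
    return "\n".join(parts)


prompt = build_persona_prompt(
    persona="a biomedical librarian",
    task="Summarize the main limitations of bulk RNA sequencing for a new trainee.",
    output_format="three short bullet points",
)
print(prompt)
```

    Keeping the template in a function makes it easy to vary one element at a time, such as the persona or the output format, and compare the responses.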

    Organized by
    BTEP
    Description

    Qiagen Ingenuity Pathway Analysis (IPA) is point-and-click software that enables scientists to discern how genomic, transcriptomic, proteomic, and metabolomic changes influence molecular biology pathways and networks. This software is available to NCI investigators. Submit a ticket with the NCI computing help desk (https://service.cancer.gov/ncisp) to have it installed on a personal computer. In this Qiagen scientist-led training, participants will learn to conduct pathway analysis from bulk RNA sequencing differential expression results using this software. Neither prior experience with nor installation of IPA is required for participation. This class is a demonstration and not hands-on. Attendance is restricted to NIH staff.

    November

    Single Cell Seminar Series

    Organized by
    BTEP
    Description

    This talk delves into the innovative utilization of generative AI in propelling biomedical research forward. By harnessing single-cell sequencing data, we developed scGPT, a foundational model that extracts biological insights from an extensive dataset of over 33 million cells. Analogous to how words form text, genes define cells, effectively bridging the technological and biological realms. The strategic application of scGPT via transfer learning significantly boosts its efficacy in diverse applications such as cell-type annotation, multi-batch integration, and gene network inference.

    Additionally, the talk will spotlight MedSAM, a state-of-the-art segmentation foundational model. Designed for universal application, MedSAM excels across various medical imaging tasks and modalities. It showcased unprecedented advancements in 30 segmentation tasks, outperforming existing models considerably. Notably, MedSAM possesses the unique ability for zero-shot and few-shot segmentation, enabling it to identify previously unseen tumor types and swiftly adapt to novel imaging modalities. Collectively, these breakthroughs emphasize the importance of developing versatile and efficient foundational models. These models are poised to address the expanding needs of imaging and omics data, thus driving continuous innovation in biomedical analysis.

    Organized by
    BTEP
    Description

    Qlucore Omics Explorer is point-and-click software that enables analysis of RNA sequencing (bulk and single cell), proteomics, and metabolomics data. Its machine learning capabilities allow for cell type classification. This software is available to NCI CCR scientists. Submit a ticket at https://service.cancer.gov/ncisp to have it installed. This session, which covers bulk RNA sequencing, introduces participants to experimental design, data import, normalization, differential expression analysis, visualizations, and biological interpretation (i.e., GSEA, pathway visualization, biological networks, GO enrichment). Neither prior experience with nor installation of this software is required for attendance. This class is a demonstration and not hands-on. Participation is restricted to NIH staff. A meeting link will be provided upon approval of registration.

    Organized by
    BTEP
    Description

    Qlucore Omics Explorer is a point-and-click package available to NCI CCR scientists that enables visualization-based analysis of multi-omics data, including RNA-seq, scRNA-seq, proteomics, and metabolomics, as well as machine learning classification of cell types. Submit a ticket at https://service.cancer.gov/ncisp to have it installed. In this session, participants will learn to apply regression approaches to identify correlation between bulk RNA and protein expression using this software. Neither prior experience with nor installation of Qlucore Omics Explorer is needed to attend. Attendance is restricted to NIH staff. A meeting link will be provided upon approval of registration.