Upcoming Classes & Events
September
Organized by
NCI
Description
Federated data analysis allows for collaboration and analysis across institutions without physically moving individual-level data to a central location, thus protecting sensitive data and maintaining data security. Join the NCI Cohort Consortium for an insightful webinar on federated data analysis methods and tools, featuring advances in Privacy-Preserving Data Analysis (PDA) and practical applications of Stata for registry data. The session will include an overview of the PDA toolbox, real-world use cases, and guidance on when to use federated vs. metadata-based approaches. The session will also showcase how Stata can be leveraged to manage, harmonize, and analyze large-scale registry datasets, with practical examples and best practices in the context of epidemiologic research.
Organized by
NIH Library
Description
This one-hour online training will provide a high-level overview of Python coding concepts, as well as some of the integrated development environments (IDEs, such as Jupyter notebooks) used for Python coding. Python is a programming language used for data science, specifically data analysis, statistical analysis, and visualization of results. The training will feature the following IDEs: Jupyter Notebook in Google Colaboratory, and Spyder, Jupyter Notebook, and JupyterLab in Anaconda. This overview training will demonstrate how these skills can boost productivity, rigor, and transparency in reporting research findings.
By the end of the training, attendees will be able to:
- Recognize four freely available IDEs for Python coding
- Identify fundamental components of Python code
- Understand how and why notebooks support rigor and transparency in analysis
Attendees are not expected to have any prior knowledge of Python coding or the IDEs to be successful in this training.
If you choose to follow along with Google Colab or Jupyter Notebooks, these IDEs should be installed and ready to go. Code will be provided during the training for this option.
Description
The rapid advancement of large language models is enabling a new generation of domain-specific agents capable of reasoning, retrieving, and acting in complex biomedical contexts. In this talk, Dr. Lu will present two representative systems: GeneAgent (Nature Methods, 2025), an AI agent for self-verified gene set analysis, and TrialGPT (Nature Communications, 2024), a new LLM-powered tool for accelerating patient-to-trial matching. Through real-world use cases, he will discuss the design principles behind these biomedical AI tools, how such AI agents can support biomedical discovery and clinical practice, and the challenges and limitations in medical agentic AI research.
Description
Examples of combining visualization and computation in brain imaging.
Coding Club Seminar Series
Organized by
BTEP
Description
In this BTEP Coding Club, participants will see how pandas, a data wrangling package for Python, helps extract insights from data and tell a cogent story with them. Topics to be discussed include importing tabular data, subsetting, sorting, performing mathematical operations, and creating visualizations, which are steps involved in drawing and conveying conclusions from data. After attending, participants will be able to apply the skills learned to their own research. This class is a demonstration and not hands-on. No experience is needed to participate, and attendance is restricted to NIH staff. A meeting link will be provided upon approval of registration.
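For readers who want a preview of the listed topics, the following is a minimal sketch of importing, subsetting, computing on, and sorting tabular data with pandas. It is not taken from the class materials; the table, its column names, and its values are hypothetical and exist only for illustration.

```python
import io

import pandas as pd

# Hypothetical expression table standing in for a real CSV file.
CSV_TEXT = """gene,sample,count,length_kb
TP53,tumor,120,2.5
TP53,normal,80,2.5
BRCA1,tumor,40,8.0
BRCA1,normal,60,8.0
MYC,tumor,300,4.0
MYC,normal,100,4.0
"""

# Import tabular data (pd.read_csv also accepts a file path).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Subset: keep only the tumor rows.
tumor = df[df["sample"] == "tumor"].copy()

# Mathematical operation: counts normalized by gene length.
tumor["count_per_kb"] = tumor["count"] / tumor["length_kb"]

# Sort: highest normalized count first.
ranked = tumor.sort_values("count_per_kb", ascending=False).reset_index(drop=True)

# Summarize: mean count per gene across both samples.
mean_by_gene = df.groupby("gene")["count"].mean()

print(ranked[["gene", "count_per_kb"]])
print(mean_by_gene)
```

If matplotlib is installed, a call such as `ranked.plot.bar(x="gene", y="count_per_kb")` would turn the ranked table into a bar chart, which covers the visualization step.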
Organized by
NIH Library
Description
This hour and a half online training covers how to analyze and model data using interactive tools in MATLAB. Through live demonstrations and examples, attendees will learn to solve many steps in a data analysis workflow without writing any code. The interactive tools can generate the MATLAB code needed to reproduce the work programmatically.
By the end of this training, attendees will be able to:
- Use interactive tools for data visualization, cleaning, and modeling
- Automatically generate code to replicate interactive work
- Capture work in easy-to-write scripts and functions
- Share results by automatically creating reports
This training is taught by MathWorks. Attendees are not expected to have any prior knowledge of MATLAB, but experienced users will also benefit from new tools, tips, and tricks from the latest releases. This is an introductory-level training; no software installation is required.
Organized by
HPC Biowulf
Description
Next edition of the NIH HPC Virtual Walk-in Consults!
All Biowulf users, and all those interested in using the systems, are invited to call in to our Virtual Walk-in Consult to discuss problems and concerns, from scripting problems to node allocation, to strategies for a particular project, to anything that is affecting your use of the HPC systems. Users will be assigned to a breakout session with a member of the HPC staff to discuss the problem one-on-one. We'll try to address simpler issues on the spot and follow up on more complex questions after the session.
Please email staff@hpc.nih.gov for the meeting link.
Organized by
NIH Library
Description
This one-hour online training, provided by SAS, will demonstrate the basics of the Structured Query Language (SQL) procedure in SAS.
By the end of this training, attendees will be able to:
- Discuss the basics of SQL procedure in SAS, including syntax and joins
- Compare SQL procedure in SAS with SAS Data step
Attendees are expected to have some working experience with SAS 9.4 or to have attended an introductory SAS class, such as SAS® Programming 1: Essentials.
Distinguished Speakers Seminar Series
Organized by
BTEP
Description
The ability to measure gene expression levels for individual cells (vs. pools of cells) and with spatial resolution is crucial to address many important biological and medical questions, such as the study of stem cell differentiation, the discovery of cellular subtypes in the brain, and cancer diagnosis and treatment. Single-cell transcriptome sequencing (RNA-Seq) allows the high-throughput measurement of gene expression levels for entire genomes at the resolution of single cells. Spatially-resolved transcriptomics further allows the measurement of gene expression levels along with the location of the RNA molecules within a tissue. Transcriptomics exemplifies the range of issues one encounters in a data science workflow, where the data are complex in a variety of ways, questions are not always clearly formulated, there are multiple analysis steps, and drawing on rigorous statistical principles and methods is essential to derive meaningful and reliable biological results.
In this talk, Dr. Dudoit will survey statistical questions that arise when analyzing single-cell transcriptome sequencing data to investigate the differentiation of stem cells in the brain, including exploratory data analysis, expression quantitation, cluster analysis, and the inference of cellular lineages. She will also address differential expression analysis in spatial transcriptomics.
Single Cell Seminar Series
Organized by
BTEP
Description
Description
This is the second part of a four-part workshop series on Parallel Machine Learning Model Training, presented by AIIG co-chair Samar Samarjeet, PhD (NHLBI). Over the course of the workshop, Samar will cover topics on data and model parallelism including pipeline and looping parallelism, profiling, data sharding, using JAX and PyTorch, and specific tools like DeepSpeed and PyTorch Lightning.
Organized by
NIH Library
Description
The "Data Visualization in R" series focuses on using ggplot2 and the broader tidyverse ecosystem to create visualizations. Attendees will progress from foundational plotting techniques to advanced customization, learning to create multi-faceted displays and apply professional styling. The series emphasizes ggplot's flexibility and power within a tidy data workflow. By the end of the series, attendees will have a solid foundation in building effective visualizations using the tidyverse ecosystem.
This hour-and-a-half online training will explore the topics of perception and cognition, and how these apply to data visualization. This training will also teach you how to visualize your data using ggplot2. We will start by creating a simple scatterplot and use that to introduce aesthetic mappings and geometric objects, the fundamental building blocks of ggplot2. You must have taken the Introduction to R and RStudio training to be successful in this training.
By the end of this training, participants should be able to:
Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:
- Installed R and RStudio.
- Have a basic understanding of R and RStudio.
- Reviewed our R basics training on the NIH Data Services: On Demand Content YouTube Playlist if you are new to R, especially Introduction to RStudio Projects.
Organized by
NIH Library
Description
This one-hour online training introduces applying data science and artificial intelligence (AI) techniques to signals and time-series datasets using MATLAB. The training will cover the entire AI pipeline, from signal exploration to deployment. Participants will explore the fundamentals of processing, analyzing, and visualizing signal data, as well as implementing machine learning and AI algorithms tailored for time-series datasets. This training is designed for researchers, engineers, and data scientists who work with signals or temporal data and seek to enhance their analytical capabilities through MATLAB's data science and AI functionalities.
By the end of this training, attendees will be able to:
- Understand the unique challenges and opportunities in analyzing signals and time-series data.
- Import, preprocess, and visualize signal and time-series datasets in MATLAB.
- Apply machine learning techniques, including supervised and unsupervised algorithms, to create predictive models for time-series data.
- Explore deep learning approaches, such as recurrent neural networks (RNNs) and long short-term memory (LSTM) networks, for advanced time-series analysis.
- Deploy trained AI models and automate workflows to integrate insights into research or operational pipelines.
- Utilize MATLAB’s documentation, online resources, and toolboxes to extend their data science and AI capabilities.
Attendees are expected to be familiar with the basic functions of MATLAB to be successful in this training.
Description
Dr. Church's clinical and research work focuses on bringing molecular testing to the clinical care of children with cancer. Through institutional projects (the Profile study, GAIN consortium study) she has profiled thousands of children's tumors and has used these results to make real-time impacts on their diagnoses and treatments. She is involved in national initiatives to improve the quality of and access to molecular testing for children with cancer, including the NCI-funded Count Me In Study (Dana-Farber, Broad Institute), the National Comprehensive Cancer Network, NIH, and the Children's Oncology Group.
Organized by
NCI Cancer AI Conversations Series
Description
There are many challenges associated with moving cancer AI originally developed in a research setting into a clinical setting, including into clinical trials. During this event, participants will discuss the integration of AI in the clinic and in clinical trials for oncology.
Organized by
NIH Library
Description
This one-hour online training covers various aspects of sharing code using MATLAB community tools like File Exchange and GitHub. Well-documented methods and workflows enable reproducible research by helping scientists follow each other’s experimental logic and interpret results.
By the end of this training, attendees will be able to:
- Share code with collaborators and the scientific community
- Create notebook-style Live Scripts using MATLAB Live Editor
- Leverage MATLAB Community Resources to make code, projects, and toolboxes available
- Learn how to access MATLAB through the browser and share licenses with collaborators
This is an introductory-level training taught by MathWorks. No installation of MATLAB is necessary.
Description
Don’t miss this brand-new double-feature of two essential AI courses in one! We will start with AI Done Right: Ethics and Privacy to set the stage for understanding the AI landscape and the rules of the road at NIH and then start building your skills with Prompt Like a Pro: Getting the Most from AI with Effective Prompt Engineering. What’s the secret to great AI results? Great prompts. This hands-on class teaches you how to craft clear, specific, and effective instructions for Copilot and other AI tools. Practice real-world examples and get a toolkit of reusable prompt templates you can start using right away.
Organized by
NCI Office of Data Sharing
Description
Please use this link to access overview, registration, and other information:
https://events.cancer.gov/nci/ods-data-jamboree
Childhood cancer is a rare disease with ~15,000 cases diagnosed annually in the United States in individuals younger than 20 years. Despite extensive efforts made over the last two decades by programs such as the National Institutes of Health (NIH)'s Gabriella Miller Kids First Program and NCI's Therapeutically Applicable Research to Generate Effective Treatments (TARGET) and Childhood Cancer Data Initiative (CCDI) to generate, collect, and share data, pediatric and AYA cancer datasets remain underutilized. Finding and accessing datasets, building specific pediatric cancer cohorts, and aggregating or linking datasets from various data systems still present tremendous challenges for the wider community. To overcome these barriers and raise awareness of existing childhood cancer data resources that can inform better diagnosis and treatment options for children, this data jamboree will bring together researchers and citizen scientists with diverse expertise and experience to collaborate and explore scientific or other questions using childhood cancer data. The goals of the jamboree include:
- Promoting access and reuse of pediatric cancer data and raising awareness about the availability of these datasets.
- Promoting interdisciplinary collaborations to expand the size, technical, and scientific diversity of the pediatric cancer research community.
- Promoting development of new methods and tools for data analysis.
- Identifying gaps and limitations of existing data and resources including barriers to real time access to the data.
Description
This is the third part of a four-part workshop series on Parallel Machine Learning Model Training, presented by AIIG co-chair Samar Samarjeet, PhD (NHLBI). Over the course of the workshop, Samar will cover topics on data and model parallelism including pipeline and looping parallelism, profiling, data sharding, using JAX and PyTorch, and specific tools like DeepSpeed and PyTorch Lightning.
Organized by
NIH Library
Description
The "Data Visualization in R" series focuses on using ggplot and the broader tidyverse ecosystem to create insightful and customizable visualizations. It covers key principles of data visualization, from basic plots to advanced techniques, emphasizing the flexibility and power of ggplot within a tidy data workflow. By the end of the series, participants will be proficient in building plots using the tidyverse ecosystem.
This one-and-a-half-hour online training builds on the topics covered in the Data Visualization in ggplot training. This training emphasizes advanced customization techniques in ggplot to create effective and clear visualizations. Participants will build on the foundational skills learned in Part 1 of the series and apply various customization options, such as faceting, labeling, themes, and color scales. You must have taken the Data Visualization in R: Introduction to ggplot: Part 1 of 2 training to be successful in this training.
By the end of this training, attendees should be able to:
- Create a scatterplot in ggplot
- Learn how to facet a plot
- Demonstrate options for customizing the title and axis
- Apply different ggplot themes
Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:
October
Organized by
NIH Library
Description
This one-hour online training, provided by SAS, will review multiple ways to combine SAS data sets.
By the end of this training, attendees will be able to:
- Utilize Concatenation on SAS data sets (SET Statement, PROC SQL, PROC APPEND)
- Use Interleaving on SAS data sets (SET Statement with BY Statement)
- Merge SAS data sets (MERGE Statement, PROC SQL, etc.)
- Update SAS data sets (UPDATE, MODIFY Statements, etc.)
Attendees are expected to have some working experience with SAS 9.4 or to have attended an introductory SAS class, such as SAS® Programming 1: Essentials.
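For readers more familiar with Python than SAS, the four ways of combining data sets listed above have rough analogues in pandas. The sketch below is only an illustration of the concepts, not SAS code and not material from the training; the tables, column names, and values are hypothetical.

```python
import pandas as pd

# Hypothetical visit tables; the names and values are invented.
site_a = pd.DataFrame({"id": [1, 3], "weight": [70.0, 82.0]})
site_b = pd.DataFrame({"id": [2, 4], "weight": [65.0, 90.0]})

# Concatenation: stack all rows of one table under the other.
stacked = pd.concat([site_a, site_b], ignore_index=True)

# Interleaving: combine the rows and keep them ordered by the key.
interleaved = stacked.sort_values("id", kind="stable").reset_index(drop=True)

# Merging: match rows from two tables on a shared key.
labs = pd.DataFrame({"id": [1, 2, 3], "glucose": [5.1, 6.3, 4.8]})
merged = interleaved.merge(labs, on="id", how="left")

# Updating: apply non-missing corrections to a master table by key.
master = interleaved.set_index("id")
corrections = pd.DataFrame({"id": [3], "weight": [80.5]}).set_index("id")
master.update(corrections)  # modifies master in place

print(merged)
print(master)
```

The left join keeps every row of the interleaved table, so an id with no lab record gets a missing glucose value rather than being dropped.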
Description
This is the fourth and final part of a four-part workshop series on Parallel Machine Learning Model Training, presented by AIIG co-chair Samar Samarjeet, PhD (NHLBI). Over the course of the workshop, Samar will cover topics on data and model parallelism including pipeline and looping parallelism, profiling, data sharding, using JAX and PyTorch, and specific tools like DeepSpeed and PyTorch Lightning.
Organized by
NIH Library
Description
This one-hour online training equips participants with powerful data wrangling techniques using R and the tidyverse ecosystem. The tidyverse is a cohesive ecosystem of R packages designed to make data science workflows more intuitive and efficient through consistent syntax and design principles. Designed for both beginners and those looking to refine their skills, this training tackles the challenges of messy datasets.
By the end of this training, attendees will be able to:
- Demonstrate how to clean messy clinical data using R
- Implement methods for standardizing text, dates, and numerical values
- Discuss the different ways to automate data transformations and aggregations using tidyverse functions
- Transform and organize data using the dplyr and tidyr packages
- Reshape data, handle missing values, create calculated fields, and prepare clean datasets ready for visualization and analysis
Requirements
Attendees are expected to have a basic understanding of R and RStudio. To proceed, attendees should have done the following:
Organized by
NIH Library
Description
This one-hour online training introduces attendees to modeling and simulation of biological systems using MATLAB’s SimBiology and BioPipeline Designer toolboxes. SimBiology is a versatile toolbox for modeling, simulating, and analyzing dynamic biological systems such as metabolic pathways, signaling cascades, and pharmacokinetics/pharmacodynamics (PK/PD) models. BioPipeline Designer complements this by streamlining workflows for integrating biological data and automating computational analyses.
By the end of this training, attendees will be able to:
- Describe the capabilities and applications of SimBiology and BioPipeline Designer for modeling and analyzing biological systems.
- Construct and parameterize basic models of biological processes using SimBiology’s graphical and programmatic interfaces.
- Simulate dynamic behaviors of biological systems, such as time-course analyses, and interpret simulation results.
- Automate and streamline data integration workflows using BioPipeline Designer to enhance reproducibility and efficiency.
- Access and utilize resources for further learning, including tutorials, user guides, and MATLAB community forums.
Attendees are expected to be familiar with the basic functions of MATLAB to be successful in this training.
Coding Club Seminar Series
Organized by
BTEP
Description
Distinguished Speakers Seminar Series
Organized by
BTEP
Description
In this talk, Dr. Carey will describe how Bioconductor approaches new challenges in supporting open method development and reproducible analyses in genomic data science. He will discuss aspects of the project that bear on education in cancer epidemiology and computational cancer genomics, and on emerging topics in software and data engineering for scalable omics analyses.
Organized by
BTEP
Description
Qiagen CLC Genomics Workbench is point-and-click software that runs on a personal computer and supports bulk RNA sequencing, ChIP sequencing, long-read, and variant analysis. It is available to NCI scientists. Submit a ticket at https://service.cancer.gov/ncisp to get it installed on a personal computer. This training, led by a Qiagen scientist, will show participants how to analyze bulk RNA sequencing data, starting from FASTQ files and ending with differential expression analysis and the construction of visualizations (i.e., PCA and heatmap). Experience using CLC Genomics Workbench, or having it installed, is not required for participation. This session is a demonstration and not hands-on. Attendance is restricted to NIH staff.
Organized by
NIH Library
Description
This one-hour-and-thirty-minute online training is part one of an introductory two-part series for those who want to learn about research data management and sharing, or for those who are interested in a refresher. The series provides detailed information on managing and sharing data from the first data planning stage, through the data life cycle, to data archiving, and finally to selecting an appropriate repository for data preservation.
By the end of part one of this training series, attendees will be able to:
- Understand data management best practices
- Become familiar with data management tools
- Have a solid knowledge of the resources enabling data sharing
During Part 2, attendees will learn about sharing and archiving data. You must register separately for Part 2 of this training. This training is introductory; no prior knowledge is required.
Organized by
NIH LibraryDescription
This one-hour-and-fifteen-minute online training is part two of an introductory two-part series for those who want to learn about research data management and sharing, or for those who are interested in a refresher. The series provides detailed information on managing and sharing data from the first data planning stage, through the data life cycle, to data archiving, and finally to selecting an appropriate repository for data preservation.
By the end of part two of this training series, attendees will be able to:
- Have a solid knowledge of the resources enabling data sharing
- Understand how data is archived and preserved
Part 1 of this training covers understanding research data, how to manage research data, and how to work with data. During Part 2, attendees learn about sharing and archiving data. This training is introductory; no prior knowledge is required.
You must register separately for Part 1 of this training.
Organized by
BTEPDescription
- Intro to STRIDES and Cloud Lab
- Tour the tutorial libraries: Overview of STRIDES Cloud Lab GitHub (AWS/GCP/Azure notebooks) and the NIGMS GitHub.
- Cloud demo: Build a chatbot with grounding using a Snakemake datastore. Configure datastore, query through the chatbot, and show responses based on the indexed sources.
Organized by
NIH LibraryDescription
This one-hour online training provides researchers with an overview of online resources for locating research datasets, data repositories, and data publications for data sharing and re-use. Participants will learn search strategies for locating datasets through federated data search portals and generalist data repositories, including directories for locating discipline-specific and institutional data repositories. The training will also cover key issues to consider when re-using datasets or when locating a data repository for sharing and preservation purposes.
By the end of this training, attendees will be able to:
- Locate different types of data repositories and datasets
- Identify issues to consider with data repositories
- Discuss how data repositories can improve reproducibility
- Identify issues to consider when re-using datasets
- Describe guidelines and resources for citing datasets
Attendees are not expected to have any prior knowledge of these resources to be successful in this training.
Organized by
NIH LibraryDescription
This one-and-a-half-hour online training will equip attendees with essential knowledge and skills for effective interactions with Large Language Model (LLM) AI chatbots. Explore the intricacies of prompt engineering and its pivotal role in optimizing the conversational capabilities of LLMs. Emphasizing best practices and practical applications, this training features live demonstrations and provides valuable skills for the effective use of LLMs.
By the end of this training, attendees will be able to:
- Define LLMs, prompt patterns, and prompt engineering
- Identify potential uses and issues to consider when using LLMs in the biomedical research field
- Use a selection of prompt patterns to improve generated output from LLMs
- Identify resources for learning more about prompt engineering in LLMs
Attendees are not expected to have any prior knowledge of AI chatbots to be successful in this training.
Organized by
BTEPDescription
Qiagen Ingenuity Pathway Analysis (IPA) is point-and-click software that enables scientists to discern how genomic, transcriptomic, proteomic, and metabolomic changes influence molecular biology pathways and networks. This software is available to NCI investigators. Submit a ticket with the NCI computing help desk (https://service.cancer.gov/ncisp) to get it installed on a personal computer. In this Qiagen scientist-led training, participants will learn to conduct pathway analysis from bulk RNA sequencing differential expression results using this software. Prior experience with, or installation of, IPA is not required for participation. This class is a demonstration and not hands-on. Attendance is restricted to NIH staff.
November
Single Cell Seminar Series
Organized by
BTEPDescription
This talk delves into the innovative utilization of generative AI in propelling biomedical research forward. By harnessing single-cell sequencing data, we developed scGPT, a foundational model that extracts biological insights from an extensive dataset of over 33 million cells. Analogous to how words form text, genes define cells, effectively bridging the technological and biological realms. The strategic application of scGPT via transfer learning significantly boosts its efficacy in diverse applications such as cell-type annotation, multi-batch integration, and gene network inference.
Additionally, the talk will spotlight MedSAM, a state-of-the-art segmentation foundational model. Designed for universal application, MedSAM excels across various medical imaging tasks and modalities. It showcased unprecedented advancements in 30 segmentation tasks, outperforming existing models considerably. Notably, MedSAM possesses the unique ability for zero-shot and few-shot segmentation, enabling it to identify previously unseen tumor types and swiftly adapt to novel imaging modalities. Collectively, these breakthroughs emphasize the importance of developing versatile and efficient foundational models. These models are poised to address the expanding needs of imaging and omics data, thus driving continuous innovation in biomedical analysis.
Organized by
BTEPDescription
Qlucore Omics Explorer is point-and-click software that enables analysis of RNA sequencing (bulk and single cell), proteomics, and metabolomics data. Its machine learning capabilities allow for cell type classification. This software is available to NCI CCR scientists. Submit a ticket at https://service.cancer.gov/ncisp to get it installed. This session covering bulk RNA sequencing introduces participants to experimental design, data import, normalization, differential expression analysis, visualizations, and biological interpretation (i.e. GSEA, pathway visualization, biological networks, GO enrichment). Prior experience with, or installation of, this software is not required for attendance. This class is a demonstration and not hands-on. Participation is restricted to NIH staff. A meeting link will be provided upon approval of registration.
Organized by
BTEPDescription
Qlucore Omics Explorer is a point-and-click package available to NCI CCR scientists that enables visualization-based analysis of multi-omics data, including RNA-seq, scRNA-seq, proteomics, and metabolomics, as well as machine learning classification of cell types. Submit a ticket at https://service.cancer.gov/ncisp to get it installed. In this session, participants will learn to apply regression approaches to identify correlation between bulk RNA and protein expression using this software. Prior experience with, or installation of, Qlucore Omics Explorer is not needed to attend. Attendance is restricted to NIH staff. A meeting link will be provided upon approval of registration.