Upcoming Classes & Events
March
Description
Partek Flow provides a single web-based, point-and-click environment for analyzing and visualizing high-dimensional multi-omics sequencing data, making bioinformatics easily accessible to all researchers. Partek Flow software is available to NCI researchers. Join us for this online webinar session, where a Partek scientist will show you how to perform start-to-finish analysis of RNA-Seq data with the point-and-click user interface in Partek Flow.
RNA-Seq example data will be used to illustrate the analysis steps from FASTQ files to biological interpretation.
Agenda:
· Data QA/QC
· Alignment
· Quantification and filtering
· Normalization
· Differential expression detection
· Biological interpretation
· Visualization (PCA, dot plot, volcano plot, hierarchical clustering, etc.)
Details
Organizer
CBIIT
When
Thu, Mar 30, 2023 - 11:00 am - 12:00 pm
Where
Online Webinar
Distinguished Speakers Seminar Series
Description
AI Models of Cancer and Precision Medicine: Building a Mind for Cancer
The long-term objective of the Ideker Lab is to create artificially intelligent, mechanistic models of cancer and neurodegenerative diseases for translation of patient data to precision diagnosis and treatment. We seek to advance this goal by addressing fundamental questions in the field: What are the genetic and molecular networks that promote disease, and how do we best chart these? How do we use knowledge of these networks in intelligent systems for predicting the effects of genotype on phenotype? – Ideker Lab, https://idekerlab.ucsd.edu/research/cancer/
This webinar will be recorded and made available on the BTEP web site: https://bioinformatics.ccr.cancer.gov/btep/btep-video-archive-of-past-classes/ within 48 hours after the event ends.
Register
April
Description
Workshop Description: The application of AI to cancer research holds promise to accelerate new discoveries, enable early detection, improve diagnosis, and spur development of new therapies for cancer. Machine learning and other forms of AI have made a significant impact in some areas of cancer research, but the full promise of data-driven approaches has been elusive. While there are important ongoing efforts to collect and produce large, well-annotated datasets to support the training of robust deep learning models, the heterogeneity and complexity of cancer, along with privacy and bias concerns, continue to limit the application of AI methods to many critical areas of cancer research. There is a need for foundational advances in machine learning that can operate on incomplete, noisy, unbalanced, and/or biased data across the cancer research continuum.
The goals of this workshop are to (1) examine the state of the science for AI methods designed to operate on noisy, complex, or low-dimensional data, (2) explore how these methods may be applied to key areas of cancer research, and (3) discuss processes for identifying the biological questions that will motivate further advances in machine learning. This workshop will highlight the importance of leveraging advances across fields to accelerate cancer research and discovery through AI.
Workshop Chairs:
Caroline Uhler, Ph.D. (MIT and Broad Institute)
Olivier Gevaert, Ph.D. (Stanford University)
NCI Planning Committee:
Juli Klemm, Ph.D.
Jennifer Couch, Ph.D.
Sean Hanlon, Ph.D.
Natalie Abrams, Ph.D.
Keyvan Farahani, Ph.D.
Emily Greenspan, Ph.D.
Paul Han, M.D., M.A., M.P.H.
Roxanne Jensen, Ph.D.
Jerry Li, M.D., Ph.D.
Agenda
A summary of the planned workshop sessions and participants is provided below. A detailed agenda with speakers and presentation titles will be posted ahead of the meeting.
DAY 1, April 3, 2023 (11 am to 4:30 pm EDT)
Welcome and Opening Comments
- National Cancer Institute
- Caroline Uhler, MIT and Broad Institute
Session 1: Integrating classical structure prediction with machine learning towards drug discovery
Session Chair: Trey Ideker, UCSD
This session will focus on expanding the field of structure prediction to incorporate multiple data modalities and layers of biological structure beyond the protein, as well as meta-learning for identifying targets for drug discovery.
Speakers:
- Anima Anandkumar, Caltech and NVIDIA
- Andrej Sali, UCSF
- Jure Leskovec, Stanford
Panelists:
- Rick Stevens, Argonne National Laboratory
- Sergey Ovchinnikov, Harvard
Session 2: Chemical, genetic, and mechanical perturbations for understanding mechanisms in cancer: Extrapolating beyond existing data
Session Chair: Fabian Theis, Helmholtz Munich
In this session, researchers will discuss the use of large-scale perturbation data for causal modeling, combining representation learning with perturbation approaches, and methods to extrapolate beyond existing perturbation data.
Speakers:
- Yoshua Bengio, Université de Montréal
- GV Shivashankar, ETH Zurich
- Smita Krishnaswamy, Yale
Panelists:
- Paquita Vazquez, Broad Institute
- Byung-Jun Yoon, Texas A&M University and Brookhaven National Laboratory
Session 3: Multimodal learning in data-limited contexts: Leveraging tissue-level data for understanding cell-cell interactions in cancer
Session Chair: Dana Pe’er, Memorial Sloan Kettering
This session will focus on multimodal learning in data-limited contexts, including cell-cell interactions and predicting outcomes. Dealing with imbalances across multimodal data sets and foundational models will also be discussed.
Speakers:
- Elena Fertig, Johns Hopkins
- Elham Azizi, Columbia
- Livnat Jerby, Stanford
Panelists:
- Marianna Rapsomaniki, IBM Research
- Arjun Krishnan, University of Colorado
Session 4: Making use of large-scale, structured clinical research data and image repositories
Session Chair: Ziad Obermeyer, UC Berkeley
In this session, researchers will discuss the use of large-scale clinical research data for machine learning models. Discussion topics include the use of synthetic data, considerations of bias, generalizable models, and development of digital twins.
Speakers:
- Chris Probert, InSitro
- James Zou, Stanford
- Mihaela van der Schaar, University of Cambridge
Panelists:
- Lily Peng, Verily
- Matthew Lungren, Microsoft/UCSF
Session 5: Improving modeling of real-world evidence data in clinical research and clinical trial design
Session Chair: Tianxi Cai, Harvard
This session will focus on real-world evidence (RWE) data modeling, including issues associated with RWE data such as electronic health record coding and unbalanced data, towards the development of clinical trials.
Speakers:
- Sean Khozin, MIT
- Limor Appelbaum, Beth Israel Deaconess
- Ryan Copping, Genentech
Panelists:
- Donna Rivera, FDA
- Khaled El Emam, University of Ottawa
Session 6: Cross-cutting discussion with session chairs
Session Chair: Olivier Gevaert, Stanford University
Discussion of the approaches and challenges identified during the workshop and opportunities for the future.
Panelists:
- Caroline Uhler, MIT and Broad Institute
- Trey Ideker, UCSD
- Dana Pe’er, Memorial Sloan Kettering
- Ziad Obermeyer, UC Berkeley
- Tianxi Cai, Harvard
Register
Organizer
NCI
When
Mon, Apr 03 - Tue, Apr 04, 2023 - 11:00 am - 5:00 pm
Where
Online
Description
This class provides a basic overview of the methods used to visualize the association among two or more quantitative variables. This class will focus on scatterplots, scatterplot matrices, and visualizing paired data. Participants are expected to have taken the Introduction to Data Visualization in R: ggplot class. Participants are encouraged to install R, RStudio, and the tidyverse package before the webinar so that they can follow along with the instructor.
Details
Organizer
NIH Library
When
Tue, Apr 04, 2023 - 11:00 am - 11:00 am
Where
Online Webinar
Description
Python is a programming language used for data science, specifically data analysis, statistical analysis, and visualization of results. This class will demonstrate integrated development environment (IDE) platforms for learning Python, the fundamentals of Python coding, and why it is advantageous to develop these skills. The session will feature the following IDEs: Google Colaboratory (Jupyter Notebook) and Anaconda’s Spyder, Jupyter Notebook, and JupyterLab. It will also provide an overview of programming constructs needed to learn Python. Finally, this class will demonstrate why these skills can boost productivity, rigor, and transparency in reporting research findings.
Details
Organizer
NIH Library
When
Tue, Apr 04, 2023 - 2:00 pm - 3:00 pm
Where
Online Webinar
Description
Azimuth is a web application that uses an annotated reference dataset to automate the processing, analysis, and interpretation of a new single-cell RNA-seq experiment. Azimuth leverages a 'reference-based mapping' pipeline that inputs a counts matrix of gene expression in single cells, and performs normalization, visualization, cell annotation, and differential expression (biomarker discovery). All results can be explored within the app, and easily downloaded for additional downstream analysis. - Satija Lab
The development of Azimuth is led by the New York Genome Center Mapping Component as part of the NIH Human Biomolecular Atlas Project (HuBMAP).
This webinar will be recorded and made available on the BTEP web site: https://bioinformatics.ccr.cancer.gov/btep/btep-video-archive-of-past-classes/ within 48 hours after the event ends.
Register
Description
Wondering why you should use R for data visualization? Lesson 1 of the Data Visualization with R course series will address this question and introduce the various plot types that will be generated throughout the course. Lesson 1 will also showcase related plots that you will be able to create in the future using the foundational skills gained over the next 5 lessons.
This will not be a hands-on lesson so no coding just yet. The hands-on portion of this series will start with lesson 2, Getting Started with ggplot2.
This lesson is the first lesson of a multi-lesson course series. Registering here will register you for the entire course series.
IMPORTANT: You do not need to download or install any software to participate in the course. This course will be taught on the DNAnexus platform. Every learner will need to create a free DNAnexus account at https://dnanexus.com. After you have created your DNAnexus account, please complete this form. If you fail to complete the form, we will not be able to give you access to the course on DNAnexus.
Register
Description
Partek Flow is point-and-click software that allows users to analyze high-dimensional multi-omics sequencing data. This software runs on NIH’s high-performance computing cluster (Biowulf), which allows users to take advantage of abundant computing power while avoiding the steep learning curve associated with analyzing sequencing data programmatically. Users interact with Partek Flow through a web browser, which eliminates the need to install software on a personal computer. In this training session, you will learn to analyze 10x Visium spatial transcriptomics data using Partek Flow.
Register
Description
In this hands-on virtual lab, the participants will familiarize themselves with Deep Learning concepts and techniques, using MATLAB Online to train deep neural networks on GPUs in the cloud, create deep learning models from scratch for images and signal data, explore pretrained models and use transfer learning, import and export models from Python frameworks such as Keras and PyTorch, as well as automatically generate code for embedded targets. Deep Learning can achieve state-of-the-art accuracy when it comes to complex problems such as image classification or developing predictive models for signal processing applications. Deep Learning outperforms humans in some tasks like classifying objects in images. Cancer researchers are using deep learning to automatically detect cancer cells, for example.
Details
Organizer
NIH Library
When
Thu, Apr 13, 2023 - 12:00 pm - 1:30 pm
Where
Online Webinar
Description
Lesson 2 of the Data Visualization with R course series will focus on the basics of ggplot2, including the grammar of graphics philosophy and its application. This lesson will provide a hands-on introduction to the ggplot2 syntax, geom functions, mapping and aesthetics, and plot layering.
Registering for lesson 1 of this course series will enroll you in the entire course series.
Register
Description
This seminar will introduce the NIH Comparative Genomics Resource (CGR), an NIH-funded, multi-year NLM project to establish an ecosystem to facilitate reliable comparative genomics analyses for all eukaryotic organisms in collaboration with the genomics community. The project’s vision is to maximize the biomedical impact of eukaryotic research organisms and their genomic data resources to meet emerging research needs for human health. To achieve this, NCBI is providing high-value data and assorted tools compatible with community-provided resources.
Details
Organizer
NIH Office of Data Science Strategy (ODSS)
Where
Online Webinar
Description
Lesson 3 of the Data Visualization with R course series will continue the discussion on the grammar of graphics, with a focus on ggplot2 plot customization, including axis labels, coordinate systems, axis scales, and themes. This hands-on lesson will showcase these features of plot building through the generation of increasingly complex scatter plots using data included with a base R installation as well as RNA-seq data.
Registering for lesson 1 of this course series will enroll you in the entire course series.
Register
Description
Register by April 03, 2023. Using information gleaned from a person’s genome can assist clinicians in customizing their patient’s case management and increase the likelihood of a positive outcome. While NCBI has long had resources for biologists to explore what is known about genomes, genes, and genetic variations, we have also added resources designed to assist the clinical community in understanding the impact of genetic variations in their patients. Using real-world cases, this workshop will show you how to use free, high-quality online resources to assist you with your patient care. See more here.
Details
Organizer
NCBI
When
Tue, Apr 18, 2023 - 1:00 pm - 3:00 pm
Where
Online
Description
Big data, such as those derived from genomic studies, are often analyzed programmatically using languages such as Python or R. Tools such as JupyterLab (https://jupyter.org) make it easy for researchers to document the programs written for their data analysis and thus facilitate reproducibility. In addition to code and inline output, JupyterLab allows inclusion of text to help researchers better communicate their analysis. JupyterLab can be used with many programming languages, including Python and R. We will demonstrate how to use JupyterLab to document genomic data analysis in this BTEP Coding Club. This session is not hands-on, and you do not need to install anything to attend.
Register
Description
It is common to obtain summary statistics for a dataset to understand parameters like mean, standard deviation, and distribution. In Lesson 4 of the Data Visualization with R course series, we will learn to generate plots that help visualize summary statistics, including bar plots with error bars, histograms, and box-and-whisker plots.
Registering for lesson 1 of this course series will enroll you in the entire course series.
Register
Organizer
BTEP
When
Thu, Apr 20, 2023 - 1:00 pm - 2:15 pm
Where
Online
Description
Macros are ways to use code to substitute in a value, and using macros makes SAS code easier to read and edit, less prone to errors, and more efficient to run. This 90-minute advanced class will provide an in-depth look at using and writing macros in SAS. Topics covered in this class include macro functions, using SQL and the DATA step to create macro variables, indirect references to macro variables, defining and calling a macro, macro variable scope, conditional processing, and iterative processing.
Details
Organizer
NIH Library
When
Tue, Apr 25, 2023 - 12:00 pm - 1:30 pm
Where
Online Webinar
Description
Lesson 5 of the Data Visualization with R course series will introduce the heatmap and dendrogram as tools for visualizing clusters in data. This lesson will primarily use the R package pheatmap.
Registering for lesson 1 of this course series will enroll you in the entire course series.
Register
Description
Register by April 10, 2023. This workshop is for biological researchers who would like to incorporate NCBI command-line clients into their workflows to access and process NCBI molecular data and metadata. In this workshop you will learn to use both the EDirect suite and the Datasets command-line interface (CLI) to download gene sequences, genome assemblies, and their associated metadata, and to create custom reports that cross-reference biological features and sequence data. You do not need to have prior experience with EDirect or the Datasets CLI tools (datasets and dataformat), but you will need to be familiar with NCBI databases and comfortable using the Unix/Linux shell to get the most out of this workshop. See more information here.
Details
Organizer
NCBI
When
Tue, Apr 25, 2023 - 1:00 pm - 3:00 pm
Where
Online
Description
Scientific journals almost always have limits on the number of figures that can be included in a publication. Don't fret: in the sixth and final lesson of the Data Visualization with R course series, we will focus on generating sub-plots and multi-plot figure panels using ggplot2-associated packages.
Registering for lesson 1 of this course series will enroll you in the entire course series.
Register
May
Description
Galaxy is a scientific workflow, data integration, data analysis, and publishing platform that makes computational biology accessible to research scientists who do not have computer programming experience. This workshop will introduce RNA-seq data analysis, followed by tutorials that show the use of popular RNA-seq analysis packages and prepare participants to independently run basic RNA-seq analysis for expression profiling. The hands-on exercises will run on the Galaxy platform using Illumina paired-end RNA-seq data. The workshop will be taught by NCI staff and is open to NIH and HHS staff.