Bioinformatics Training and Education Program

Cancer AI Research: Computational Approaches Addressing Imperfect Data

When: Apr. 3, 2023 - Apr. 4, 2023, 11:00 am - 5:00 pm

Learning Level: Any

To Know

Online Webinar
This class has ended.

About this Class

Workshop Description: The application of AI to cancer research holds promise to accelerate new discoveries, enable early detection, improve diagnosis, and spur development of new therapies for cancer. Machine learning and other forms of AI have made a significant impact in some areas of cancer research, but the full promise of data-driven approaches has been elusive. While there are important ongoing efforts to collect and produce large, well-annotated datasets to support the training of robust deep learning models, the heterogeneity and complexity of cancer, along with privacy and bias concerns, continue to limit the application of AI methods to many critical areas of cancer research. There is a need for foundational advances in machine learning that can operate on incomplete, noisy, unbalanced, and/or biased data across the cancer research continuum.

The goals of this workshop are to (1) examine the state of the science for AI methods designed to operate on noisy, complex, or low-dimensional data, (2) explore how these methods may be applied to key areas of cancer research, and (3) discuss processes for identifying the biological questions that will motivate further advances in machine learning. This workshop will highlight the importance of leveraging advances across fields to accelerate cancer research and discovery through AI.

Workshop Chairs:

Caroline Uhler, Ph.D. (MIT and Broad Institute)

Olivier Gevaert, Ph.D. (Stanford University)

NCI Planning Committee:

Juli Klemm, Ph.D.

Jennifer Couch, Ph.D.

Sean Hanlon, Ph.D.

Natalie Abrams, Ph.D.

Keyvan Farahani, Ph.D.

Emily Greenspan, Ph.D.

Paul Han, M.D., M.A., M.P.H.

Roxanne Jensen, Ph.D.

Jerry Li, M.D., Ph.D.


A summary of the planned workshop sessions and participants is provided below. A detailed agenda with speakers and presentation titles will be posted ahead of the meeting.

DAY 1, April 3, 2023 (11 am to 4:30 pm EDT) 

Welcome and Opening Comments

  • National Cancer Institute
  • Caroline Uhler, MIT and Broad Institute


Session 1: Integrating classical structure prediction with machine learning towards drug discovery

Session Chair: Trey Ideker, UCSD

This session will focus on expanding the field of structure prediction to incorporate multiple data modalities and layers of biological structure beyond the protein, as well as meta-learning for identifying targets for drug discovery.


  • Anima Anandkumar, Caltech and NVIDIA
  • Andrej Sali, UCSF
  • Jure Leskovec, Stanford


  • Rick Stevens, Argonne National Laboratory
  • Sergey Ovchinnikov, Harvard


Session 2: Chemical, genetic, and mechanical perturbations for understanding mechanisms in cancer: Extrapolating beyond existing data

Session Chair: Fabian Theis, Helmholtz Munich

In this session, researchers will discuss the use of large-scale perturbation data for causal modeling, combining representation learning with perturbation approaches, and methods to extrapolate beyond existing perturbation data.


  • Yoshua Bengio, Université de Montréal
  • GV Shivashankar, ETH Zurich
  • Smita Krishnaswamy, Yale


  • Paquita Vazquez, Broad Institute
  • Byung-Jun Yoon, Texas A&M University and Brookhaven National Laboratory


Session 3: Multimodal learning in data-limited contexts: Leveraging tissue-level data for understanding cell-cell interactions in cancer

Session Chair: Dana Pe’er, Memorial Sloan Kettering

This session will focus on multimodal learning in data-limited contexts, including modeling cell-cell interactions and predicting outcomes. Handling imbalances across multimodal data sets and the role of foundational models will also be discussed.


  • Elena Fertig, Johns Hopkins
  • Elham Azizi, Columbia
  • Livnat Jerby, Stanford


  • Marianna Rapsomaniki, IBM Research
  • Arjun Krishnan, University of Colorado


DAY 2, April 4, 2023 (11 am to 3:30 pm EDT) 

Session 4: Making use of large-scale, structured clinical research data and image repositories

Session Chair: Ziad Obermeyer, UC Berkeley

In this session, researchers will discuss the use of large-scale clinical research data for machine learning models. Discussion topics include the use of synthetic data, considerations of bias, generalizable models, and development of digital twins.


  • Chris Probert, InSitro
  • James Zou, Stanford
  • Mihaela van der Schaar, University of Cambridge


  • Lily Peng, Verily
  • Matthew Lungren, Microsoft/UCSF


Session 5: Improving modeling of real-world evidence data in clinical research and clinical trial design

Session Chair: Tianxi Cai, Harvard

This session will focus on modeling real-world evidence (RWE) data to support the development of clinical trials, including issues associated with RWE data such as electronic health record coding and unbalanced data.


  • Sean Khozin, MIT
  • Limor Appelbaum, Beth Israel Deaconess
  • Ryan Copping, Genentech


  • Donna Rivera, FDA
  • Khaled El Emam, University of Ottawa


Session 6: Cross-cutting discussion with session chairs

Session Chair: Olivier Gevaert, Stanford University

Discussion of the approaches and challenges identified during the workshop and opportunities for the future.


  • Caroline Uhler, MIT and Broad Institute
  • Trey Ideker, UCSD
  • Dana Pe’er, Memorial Sloan Kettering
  • Ziad Obermeyer, UC Berkeley
  • Tianxi Cai, Harvard