Applications of Deep Learning in Computational Biology
When: Oct. 24th, 2024 2:00 pm - 3:00 pm
Learning Level: Any
About this Class
The NIH Text Mining and Natural Language Processing SIG is pleased to welcome you to this special event, featuring two extraordinary speakers on applications of deep learning in computational biology.
Speaker: Dr. Lauren Porter, Principal Investigator at the Division of Intramural Research, NLM, NIH
Title: Predicting unknown regions of protein fold space
Speaker: Dr. Ivan Ovcharenko, Principal Investigator at the Division of Intramural Research, NLM, NIH
Title: Deep Learning Models Accurately Identify Disease-Causal Regulatory Variants
Predicting unknown regions of protein fold space
Recent work suggests that AlphaFold (AF), a deep learning-based model that can accurately infer protein structure from sequence, may discern important features of folded protein energy landscapes, defined by the diversity and frequency of different conformations in the folded state. Here, we test the limits of its predictive power on fold-switching proteins, which assume two structures with regions of distinct secondary and/or tertiary structure. We find that (1) AF is a weak predictor of fold switching and (2) some of its successes result from memorization of training-set structures rather than learned protein energetics. Combining >280,000 models from several implementations of AF2 and AF3 yielded a 35% success rate for fold switchers likely in AF’s training sets. AF2’s confidence metrics selected against models consistent with experimentally determined fold-switching structures and failed to discriminate between low- and high-energy conformations. Further, AF captured only one of seven experimentally confirmed fold switchers outside of its training sets, despite extensive sampling of an additional ~280,000 models. Several observations indicate that AF2 has memorized structural information during training and that AF3 misassigns coevolutionary restraints. These limitations constrain the scope of successful predictions, highlighting the need for physically based methods that readily predict multiple protein conformations.
Deep Learning Models Accurately Identify Disease-Causal Regulatory Variants
Genetic association studies have identified thousands of independent signals associated with a wide range of complex human diseases. Despite these successes, pinpointing the specific causal variants underlying a genetic association signal remains challenging. In this presentation, I will introduce a deep learning (DL) model designed to accurately predict disease-causal variants in the noncoding regions of the human genome. By applying this model to enhancers, we identify a specific set of causal variants linked to type 2 diabetes, several of which have been confirmed biochemically. When extending the model to silencers, we find that candidate silencers exhibit strong enrichment in disease-associated variants, with certain diseases showing a significantly stronger association with silencer variants than with enhancer variants. Nearly 52% of candidate silencers cluster together, forming silencer-rich loci. In the loci of Parkinson's disease hallmark genes TRIM31 and MAL, the associated SNPs densely populate these clustered candidate silencers rather than enhancers, showing an overall twofold enrichment of silencers compared to enhancers. The disruption of apoptosis in neuronal cells is associated with both schizophrenia and bipolar disorder and can largely be attributed to variants within candidate silencers. Our model allows for a mechanistic explanation of causative SNP effects by identifying altered binding of tissue-specific repressors and activators, validated with 70% directional concordance using SNP-SELEX. Focusing on individual silencer variants, experimental data confirm the roles of the rs62055708 SNP in Parkinson's disease, rs2535629 in schizophrenia, and rs6207121 in type 1 diabetes. In summary, our results suggest that advancements in deep learning models for discovering disease-causal variants can provide a foundation for explaining mechanisms of action and designing novel diagnostics and therapeutics.