Automated Machine Learning Analysis of Metabolomic Data
When: Aug. 21st, 2020 12:00 pm - 1:00 pm
About this Class
Abstract: Machine learning (ML) has emerged as an essential tool for building models that can predict clinical outcomes for age-related diseases. A significant challenge of ML is knowing which algorithms and parameter settings are appropriate for a given data set and for the hidden patterns to be discovered. Automated ML (AutoML) takes the guesswork out of selecting an ML method by letting the computer optimize the choice of method and parameters, which makes ML more accessible to non-experts. We introduce here the tree-based pipeline optimization tool (TPOT) for the automated discovery of ML pipelines. We applied TPOT to predicting coronary artery disease (CAD) phenotypes using 73 nuclear magnetic resonance (NMR)-derived lipoprotein and metabolite profiles and 27 demographic and clinical features from the Angiography and Genes Study (ANGES), with a sample size of 925 subjects. We show that TPOT outperforms a standard grid search approach for predicting CAD outcomes and identifies pipelines unlikely to be selected by human experts. TPOT is written in Python and is freely available as open-source software on GitHub (https://github.com/EpistasisLab/tpot).
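For readers unfamiliar with the grid search baseline mentioned above, here is a minimal sketch of one, assuming scikit-learn is installed. It is neither TPOT nor the study's analysis: the data are synthetic (`make_classification`), sized at 925 samples by 100 features only to mirror the 73 + 27 features and 925 subjects described for ANGES, and the candidate models and grid values are arbitrary illustrative choices. The analyst fixes the pipeline structure and lists the candidates; the search then evaluates every combination. That fixed, human-chosen search space is what AutoML tools aim to automate.

```python
# Baseline sketch: cross-validated grid search over a fixed, human-chosen set of
# models and hyperparameters. Synthetic data only: 925 samples x 100 features
# mirrors the shape described in the abstract, not its content.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for a binary CAD phenotype dataset.
X, y = make_classification(n_samples=925, n_features=100, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The pipeline structure (scale -> classify) is fixed by the analyst;
# only the classifier and its listed hyperparameters are searched.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.01, 0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier(random_state=0)],
     "clf__n_estimators": [100, 300], "clf__max_depth": [None, 5]},
]

# Every combination (4 + 2 * 2 = 8 candidates) is scored with 5-fold CV,
# then the best one is refit on the full training split.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

print("best CV AUC:", round(search.best_score_, 3))
print("best params:", search.best_params_)
print("held-out AUC:", round(search.score(X_test, y_test), 3))
```

Assuming the classic TPOT 0.x API, the same training split could instead be passed to `TPOTClassifier(generations=5, population_size=20, cv=5, random_state=0).fit(X_train, y_train)`, which uses genetic programming to evolve whole pipelines (preprocessing, feature selection, and model) rather than scanning a fixed grid; its `export` method then writes the best pipeline found to a Python file. Later TPOT releases changed the constructor arguments, so check the current documentation before relying on these names.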
Speaker: Jason Moore, Ph.D., Director of the Penn Institute for Biomedical Informatics, Philadelphia, PA
Register here with NIH Webex Events:
https://nih.webex.com/nih/onstage/g.php?MTID=e7b2a0e8c2dd5ddf486316a551fe555d4