penetrance
Extending Models Via Gradient Boosting: An Application to Mendelian Models
Huang, Theodore, Idos, Gregory, Hong, Christine, Gruber, Stephen, Parmigiani, Giovanni, Braun, Danielle
Improving existing, widely adopted prediction models is often a more efficient and robust path to progress than training new models from scratch. Existing models may (a) incorporate complex mechanistic knowledge, (b) leverage proprietary information, and (c) have surmounted barriers to adoption. Compared to model training, model improvement and modification receive little attention. In this paper we propose a general approach to model improvement: we combine gradient boosting with any previously developed model to improve model performance while retaining important existing characteristics. As an example, we consider Mendelian models, which estimate the probability of carrying genetic mutations that confer susceptibility to disease by using family pedigrees and health histories of family members. Via simulations we show that integrating gradient boosting with an existing Mendelian model can produce an improved model that outperforms both that model and a model built using gradient boosting alone. We illustrate the approach on genetic testing data from the USC-Stanford Cancer Genetics Hereditary Cancer Panel (HCP) study.
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
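The abstract above does not spell out how the boosting is combined with the existing model. One common way to do this, shown here only as a minimal sketch and not necessarily the authors' procedure, is to start gradient boosting from the existing model's log-odds and let each round fit a small learner to the residuals of the log-loss. The function names, the single-feature regression stumps, and the learning rate are all assumptions made for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares regression stump on one feature.

    Returns (threshold, left_value, right_value); points with x <= threshold
    receive left_value, all others right_value.
    """
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:
        # No valid split (all x equal): fall back to a constant update.
        m = sum(residuals) / len(residuals)
        return (float("inf"), m, m)
    return best[1:]


def boost_from_existing(xs, ys, base_probs, n_rounds=50, learning_rate=0.1):
    """Gradient boosting on log-loss, initialised at the existing model.

    base_probs are the existing model's predicted probabilities (assumed to
    lie strictly inside (0, 1)). Returns the fitted stumps and final scores.
    """
    scores = [logit(p) for p in base_probs]
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the score.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        scores = [s + learning_rate * (lv if x <= t else rv)
                  for x, s in zip(xs, scores)]
    return stumps, scores


def predict(x, base_prob, stumps, learning_rate=0.1):
    """Probability for a new point: existing model's score plus stump updates."""
    score = logit(base_prob)
    for t, lv, rv in stumps:
        score += learning_rate * (lv if x <= t else rv)
    return sigmoid(score)


def log_loss(ys, scores):
    """Mean negative log-likelihood of binary labels given scores."""
    total = 0.0
    for y, s in zip(ys, scores):
        p = sigmoid(s)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(ys)
```

Because the boosting starts from the existing model's predictions rather than from a constant, a model fitted with zero rounds reproduces the existing model exactly, and each round only adjusts it where the data disagree.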
Empowering individual trait prediction using interactions
One component of precision medicine is the construction of prediction models with the highest possible predictive ability, e.g. to enable individual risk prediction. In genetic epidemiology, complex diseases have a polygenic basis, and a common assumption is that biological and genetic features affect the outcome under consideration via interactions. In the case of omics data, standard approaches such as generalized linear models may be suboptimal, and machine learning methods are appealing for making individual predictions. However, most of these algorithms focus on main or marginal effects of the single features in a dataset. On the other hand, the detection of interacting features is an active area of research in genetic epidemiology. One large class of algorithms for detecting interacting features is based on multifactor dimensionality reduction (MDR). Here, we extend the model-based MDR (MB-MDR), a powerful extension of the original MDR algorithm, to enable interaction-empowered individual prediction. Using a comprehensive simulation study, we show that our new algorithm can use information hidden in interactions more efficiently than two other state-of-the-art algorithms, namely Random Forest and Elastic Net, and clearly outperforms them if interactions are present. The performance of the three algorithms is comparable if no interactions are present. Further, we show that our new algorithm is applicable to real data by comparing the performance of the three algorithms on a dataset of rheumatoid arthritis cases and healthy controls. As our new algorithm is applicable not only to biological and genetic data but to all datasets with discrete features, it may have practical implications in other applications as well, and we have made our method available as an R package.
- Europe > Germany > Schleswig-Holstein > Lübeck (0.04)
- Asia > Middle East > Jordan (0.04)
- Health & Medicine > Epidemiology (1.00)
- Health & Medicine > Therapeutic Area > Musculoskeletal (0.48)
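For readers unfamiliar with MDR, the core idea of the original algorithm can be sketched as follows: every combination of two discrete features forms a cell, and a cell is labelled high-risk when its case-to-control ratio exceeds the overall ratio. This is a rough illustration of classical MDR only; it is not MB-MDR and not the authors' R package, and the function names are hypothetical.

```python
from collections import Counter


def mdr_cell_labels(feature_a, feature_b, is_case):
    """Label each observed (a, b) cell as 'high' or 'low' risk.

    A cell is high-risk when cases/controls in the cell exceeds the overall
    cases/controls ratio. The comparison is cross-multiplied so that cells
    without controls do not cause a division by zero.
    """
    cases, controls = Counter(), Counter()
    for a, b, y in zip(feature_a, feature_b, is_case):
        (cases if y else controls)[(a, b)] += 1
    n_cases = sum(cases.values())
    n_controls = sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        high = cases[cell] * n_controls > controls[cell] * n_cases
        labels[cell] = "high" if high else "low"
    return labels


def mdr_predict(labels, a, b, default="low"):
    """Predict the risk class of a new sample; unseen cells get `default`."""
    return labels.get((a, b), default)
```

The pooled high/low label is a single derived feature, which is how a purely interactive effect (one with no marginal signal in either feature alone) becomes usable for prediction.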
The Genetics Counselor
G. Hunn and J. Lederberg
The Genetics Counselor is a computer program, written in LISP, designed to handle problems of medical genetics counseling. It is an attempt to apply the methods of artificial intelligence research to medical diagnostic problems. The program attempts to map the data space of a family-tree structure into the hypothesis space of classical Mendelian genetics by use of a heuristic search. The input data are the family members along with their children (or parents), and phenotype. The program generates a family tree and searches for consanguinity.
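The last step described above, building a family tree from parent records and searching it for consanguinity, can be illustrated with a short sketch. It is written in Python rather than LISP, it is not a reconstruction of the original program or its heuristic search, and the data layout (a mapping from each child to a tuple of parents) is an assumption made for the example.

```python
def ancestors(person, parents):
    """Return the set of all known ancestors of `person`.

    `parents` maps a child's name to a tuple of that child's parents;
    people without an entry are treated as founders of the pedigree.
    """
    seen = set()
    stack = list(parents.get(person, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen


def consanguineous_couples(parents):
    """Find couples (pairs that share a child) with a common ancestor.

    Returns a dict mapping each such couple, as a sorted tuple, to the set
    of ancestors the two partners share. Cases where one partner is a
    direct ancestor of the other are not covered by this sketch.
    """
    couples = {tuple(sorted(ps)) for ps in parents.values() if len(ps) == 2}
    result = {}
    for a, b in sorted(couples):
        shared = ancestors(a, parents) & ancestors(b, parents)
        if shared:
            result[(a, b)] = shared
    return result
```

In a pedigree where two first cousins have a child, the function reports that couple together with the grandparents they share.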
AN APPROACH TO AUTOMATIC PROBLEM-SOLVING
A digital computer program, the Graph Traverser (Doran & Michie 1966), can seek a solution to any problem which may be interpreted as that of finding a path from one specified node of a graph to another. Emphasis is placed upon the evaluation of intermediate states of the problem (nodes of the graph) according to the extent to which they resemble the 'goal' state. Sample results from first applications of the program, and possible future developments, are discussed. The program is related to other problem-solving programs.
INTRODUCTION: PROBLEMS AND PROBLEM-SOLVING PROGRAMS
How to travel from London to Birmingham may, in some circumstances, be a 'problem'.
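The path-finding idea in the abstract, in which intermediate nodes are ranked by how closely they resemble the goal, corresponds to what is now usually called best-first search. The following is a simplified sketch of that idea, not a reconstruction of the Graph Traverser itself; the function name and the `successors` and `evaluate` callbacks are assumptions made for illustration.

```python
import heapq


def best_first_path(start, goal, successors, evaluate):
    """Search for a path from `start` to `goal`.

    `successors(node)` returns the nodes reachable in one step, and
    `evaluate(node)` returns a number that is smaller the more the node
    resembles the goal. The node with the lowest evaluation is always
    developed next. Returns the path as a list, or None if no path exists.
    """
    counter = 0  # tie-breaker so the heap never has to compare nodes
    frontier = [(evaluate(start), counter, start)]
    parent = {start: None}
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors(node):
            if nxt not in parent:  # skip states that were already generated
                parent[nxt] = node
                counter += 1
                heapq.heappush(frontier, (evaluate(nxt), counter, nxt))
    return None
```

Because the search follows the evaluation function greedily, it finds some path when one exists in a finite graph, but the path is not guaranteed to be the shortest.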