Machine learning models, now commonly developed to screen, diagnose, or predict health conditions, are evaluated with a variety of performance metrics. An important first step in assessing the practical utility of a model is to evaluate its average performance over an entire population of interest. In many settings, it is also critical that the model makes good predictions within predefined subpopulations. For instance, showing that a model is fair or equitable requires evaluating the model's performance in different demographic subgroups. However, subpopulation performance metrics are typically computed using only data from that subgroup, resulting in higher variance estimates for smaller groups. We devise a procedure to measure subpopulation performance that can be more sample-efficient than the typical subsample estimates. We propose using an evaluation model, a model that describes the conditional distribution of the predictive model score, to form model-based metric (MBM) estimates. Our procedure incorporates model checking and validation, and we propose a computationally efficient approximation of the traditional nonparametric bootstrap to form confidence intervals. We evaluate MBMs on two main tasks: a semi-synthetic setting where ground truth metrics are available and a real-world hospital readmission prediction task. We find that MBMs consistently produce more accurate and lower variance estimates of model performance for small subpopulations.
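The idea of replacing a raw subsample estimate with one read off a fitted "evaluation model" can be illustrated with a toy binormal AUC estimator. This is only a sketch of the general principle, not the authors' MBM procedure: the Gaussian evaluation model, the simulated scores, and the subgroup sizes below are all assumptions for illustration.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def empirical_auc(pos, neg):
    # plain subsample estimate: fraction of correctly ordered score pairs
    wins = sum(1 for p in pos for n in neg if p > n)
    ties = sum(1 for p in pos for n in neg if p == n)
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def binormal_auc(pos, neg):
    # "evaluation model" estimate: fit a Gaussian to each class's scores
    # and read the AUC off the fitted model instead of the raw pairs
    d = mean(pos) - mean(neg)
    s = math.sqrt(var(pos) + var(neg))
    return 0.5 * (1.0 + math.erf(d / (s * math.sqrt(2))))

random.seed(0)
# a small subgroup: 15 positives and 25 negatives; true binormal AUC ~0.76
pos = [random.gauss(1.0, 1.0) for _ in range(15)]
neg = [random.gauss(0.0, 1.0) for _ in range(25)]
print(empirical_auc(pos, neg), binormal_auc(pos, neg))
```

Because the model-based estimate pools information through two fitted parameters per class rather than all pairwise comparisons, it tends to be less variable for small subgroups, at the cost of bias when the model is misspecified; this is exactly the trade-off that motivates the model checking and validation steps described above.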
Interpretability in machine learning (ML) is crucial for high stakes decisions and troubleshooting. In this work, we provide fundamental principles for interpretable ML, and dispel common misunderstandings that dilute the importance of this crucial topic. We also identify 10 technical challenge areas in interpretable machine learning and provide history and background on each problem. Some of these problems are classically important, and some are recent problems that have arisen in the last few years. These problems are: (1) Optimizing sparse logical models such as decision trees; (2) Optimization of scoring systems; (3) Placing constraints into generalized additive models to encourage sparsity and better interpretability; (4) Modern case-based reasoning, including neural networks and matching for causal inference; (5) Complete supervised disentanglement of neural networks; (6) Complete or even partial unsupervised disentanglement of neural networks; (7) Dimensionality reduction for data visualization; (8) Machine learning models that can incorporate physics and other generative or causal constraints; (9) Characterization of the "Rashomon set" of good models; and (10) Interpretable reinforcement learning. This survey is suitable as a starting point for statisticians and computer scientists interested in working in interpretable machine learning.
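Challenge (2) above concerns scoring systems: sparse point-based models that a clinician can evaluate by hand. A minimal sketch of what such a model looks like at prediction time follows; the items, point values, and risk bands are hypothetical, not from any validated clinical score.

```python
# A minimal sketch of an interpretable point-based "scoring system".
# Items, points, and thresholds below are hypothetical illustrations.
SCORE_ITEMS = [
    ("age >= 65",    lambda p: p["age"] >= 65,    2),
    ("hypertension", lambda p: p["hypertension"], 1),
    ("diabetes",     lambda p: p["diabetes"],     1),
    ("prior event",  lambda p: p["prior_event"],  2),
]

def risk_score(patient):
    # sum the points of every item whose condition holds
    return sum(pts for _, test, pts in SCORE_ITEMS if test(patient))

def risk_band(score):
    return "low" if score <= 1 else "medium" if score <= 3 else "high"

p = {"age": 70, "hypertension": True, "diabetes": False, "prior_event": False}
s = risk_score(p)      # 2 (age) + 1 (hypertension) = 3
print(s, risk_band(s))
```

The optimization challenge the survey describes is choosing the items and integer point values from data, which is a hard discrete problem; the prediction step itself is this simple, which is the source of the model's interpretability.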
Machine learning algorithms increasingly support the diagnosis of chronic diseases and assist in medical decision making. In this paper, we review the classification algorithms used in the health care system for chronic diseases and present a neural network-based ensemble learning method. We briefly describe the commonly used algorithms and their critical properties. Materials and Methods: In this study, we survey modern classification algorithms used in healthcare, examine the principles underlying these methods, and combine high-performing machine learning algorithms through neural network-based ensemble learning to accurately diagnose and predict chronic diseases. To do this, we use real data on chronic patients (diabetes, heart disease, cancer) available from the UCI repository. Results: We found that ensemble algorithms designed to diagnose chronic diseases can be more effective than baseline algorithms. We also identify several challenges to further advancing machine learning classification in the diagnosis of chronic diseases. Conclusion: The results show the high performance of the neural network-based ensemble learning approach for the diagnosis and prediction of chronic diseases, which in this study reached 98.5%, 99%, and 100% accuracy on the three datasets, respectively.
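The basic reason ensembles can beat their base learners is that a majority vote corrects the uncorrelated mistakes of individual classifiers. The toy sketch below illustrates this with three weak threshold classifiers ("stumps") combined by hard voting; the data, thresholds, and feature meanings are synthetic assumptions, not the paper's neural network-based ensemble.

```python
# Toy illustration of majority-vote ensembling: three weak threshold
# classifiers on different features, combined by hard voting.
# Data and thresholds are synthetic assumptions for illustration.
DATA = [  # (feature vector, label)
    ((5.1, 140, 0.9), 0), ((6.8, 150, 2.1), 1),
    ((7.2, 160, 2.5), 1), ((4.9, 155, 1.0), 0),
    ((6.1, 135, 2.2), 1), ((5.3, 138, 2.4), 0),
    ((6.5, 130, 0.5), 0), ((5.5, 150, 2.2), 1),
]

STUMPS = [
    lambda x: int(x[0] > 6.0),   # e.g. a glucose-like feature
    lambda x: int(x[1] > 145),   # e.g. a blood-pressure-like feature
    lambda x: int(x[2] > 2.0),   # e.g. a biomarker-like feature
]

def accuracy(clf):
    return sum(clf(x) == y for x, y in DATA) / len(DATA)

def ensemble(x):
    votes = sum(stump(x) for stump in STUMPS)
    return int(votes >= 2)       # majority vote of the three stumps

base_accs = [accuracy(s) for s in STUMPS]
print(base_accs, accuracy(ensemble))
```

On this tiny dataset each stump errs on different points, so the vote is right even where every individual stump is sometimes wrong; that diversity of errors is the condition under which ensembling helps.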
Background and purpose: Heart disease has been one of the leading causes of death over the last 10 years, so the use of classification methods to diagnose and predict heart disease is very important. If the disease is predicted before its onset, the high mortality associated with it can be prevented and more accurate and efficient treatment methods can be provided. Materials and Methods: Because of the number of candidate input features, the use of basic algorithms can be very time-consuming. Reducing dimensionality, or choosing a good subset of features without sacrificing accuracy, is of great importance for basic algorithms to be used successfully in this domain. In this paper, we propose an ensemble-genetic learning method using wrapper feature reduction to select features for disease classification. Findings: A medical diagnosis system based on ensemble learning for predicting heart disease provides a more accurate diagnosis than the traditional method and reduces the cost of treatment. Conclusion: The results showed that the thallium scan and vascular occlusion were the most important features in the diagnosis of heart disease and can distinguish between sick and healthy people with 97.57% accuracy.
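Wrapper feature selection with a genetic algorithm can be sketched in a few lines: chromosomes are binary feature masks, and fitness is the held-out accuracy of a classifier trained on the selected features. The sketch below uses a synthetic dataset and a nearest-centroid classifier as stand-ins; the paper's actual data, classifier, and genetic operators are not reproduced here.

```python
import random

# Sketch of wrapper-style feature selection with a genetic algorithm:
# chromosomes are feature masks, fitness is validation accuracy of a
# simple nearest-centroid classifier. Dataset and classifier are toy
# assumptions, not the paper's setup.
random.seed(1)
N_FEATURES = 6

def make_point(label):
    # features 0 and 1 carry signal; features 2-5 are pure noise
    x = [label + random.gauss(0, 0.5), label - random.gauss(0, 0.5)]
    x += [random.gauss(0, 1) for _ in range(N_FEATURES - 2)]
    return x, label

TRAIN = [make_point(i % 2) for i in range(60)]
VALID = [make_point(i % 2) for i in range(60)]

def centroid(points, mask, label):
    sel = [x for x, y in points if y == label]
    return [sum(x[j] for x in sel) / len(sel)
            for j in range(N_FEATURES) if mask[j]]

def fitness(mask):
    if not any(mask):
        return 0.0
    c0, c1 = centroid(TRAIN, mask, 0), centroid(TRAIN, mask, 1)
    def dist(v, c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    correct = 0
    for x, y in VALID:
        v = [x[j] for j in range(N_FEATURES) if mask[j]]
        correct += (0 if dist(v, c0) < dist(v, c1) else 1) == y
    return correct / len(VALID)

def mutate(mask):
    j = random.randrange(N_FEATURES)
    return mask[:j] + [1 - mask[j]] + mask[j + 1:]

pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(10)]
for _ in range(30):  # evolve: keep the fitter half, refill by mutation
    pop.sort(key=fitness, reverse=True)
    pop = pop[:5] + [mutate(random.choice(pop[:5])) for _ in range(5)]
best = max(pop, key=fitness)
print(best, fitness(best))
```

Because fitness is measured by actually training and validating the downstream classifier, this is a "wrapper" method; the search typically converges on masks that keep the informative features and drop the noise dimensions.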
Falls are a common problem affecting older adults and a major public health issue. The Centers for Disease Control and Prevention and the World Health Organization report that one in three adults over the age of 65, and half of adults over 80, fall each year. In recent years, an ever-increasing range of applications has been developed to help deliver more effective fall prevention interventions. All these applications rely on large personal databases of older adults collected from hospitals, mutual health funds, and other organizations that care for the elderly. The information describing an older adult is continually evolving; it may become obsolete at a given moment and contradict what we already know about the same person. It therefore needs to be continuously checked and updated in order to restore database consistency and provide better service. This paper outlines an Obsolete personal Information Update System (OIUS) designed in the context of an elderly fall prevention project. Our OIUS aims to control and update in real time the information acquired about each older adult, provide consistent information on demand, and supply tailored interventions to caregivers and fall-risk patients. The approach outlined for this purpose is based on a polynomial-time algorithm built on top of a causal Bayesian network representing the elderly data. The result is given as a recommendation tree with an associated accuracy level. We conduct a thorough empirical study of this model on an older adults' personal information base. Experiments confirm the viability and effectiveness of our OIUS.
The use of genetic variants as instrumental variables - an approach known as Mendelian randomization - is a popular epidemiological method for estimating the causal effect of an exposure (phenotype, biomarker, risk factor) on a disease or health-related outcome from observational data. Instrumental variables must satisfy strong, often untestable assumptions, which means that finding good genetic instruments among a large list of potential candidates is challenging. This difficulty is compounded by the fact that many genetic variants influence more than one phenotype through different causal pathways, a phenomenon called horizontal pleiotropy. This leads to errors not only in estimating the magnitude of the causal effect but also in inferring the direction of the putative causal link. In this paper, we propose a Bayesian approach called BayesMR that is a generalization of the Mendelian randomization technique in which we allow for pleiotropic effects and, crucially, for the possibility of reverse causation. The output of the method is a posterior distribution over the target causal effect, which provides an immediate and easily interpretable measure of the uncertainty in the estimation. More importantly, we use Bayesian model averaging to determine how much more likely the inferred direction is relative to the reverse direction.
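The core Mendelian randomization estimand that BayesMR generalizes can be illustrated with the simple ratio (Wald) estimator: with a valid instrument G, the causal effect of exposure X on outcome Y is cov(G, Y) / cov(G, X), even when X and Y share a hidden confounder. The simulation below is an illustrative assumption, not the paper's Bayesian model, and it deliberately omits the pleiotropy and reverse-causation complications the paper addresses.

```python
import random

# Simulated instrumental-variable estimation with a genetic instrument G:
# exposure X and outcome Y share an unobserved confounder U, so the naive
# regression slope of Y on X is biased, while the ratio (Wald) estimator
# cov(G, Y) / cov(G, X) recovers the causal effect. All coefficients are
# illustrative assumptions.
random.seed(0)
n = 20000
TRUE_EFFECT = 0.5

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

G = [random.gauss(0, 1) for _ in range(n)]   # instrument (affects Y only via X)
U = [random.gauss(0, 1) for _ in range(n)]   # hidden confounder
X = [0.8 * g + u + random.gauss(0, 0.5) for g, u in zip(G, U)]
Y = [TRUE_EFFECT * x + u + random.gauss(0, 0.5) for x, u in zip(X, U)]

naive = cov(X, Y) / cov(X, X)   # confounded regression slope, biased upward
wald = cov(G, Y) / cov(G, X)    # instrumental-variable (Wald ratio) estimate
print(round(naive, 3), round(wald, 3))
```

Horizontal pleiotropy breaks this picture by giving G a second path to Y that does not pass through X, which is why methods like BayesMR must model pleiotropic effects, and possible reverse causation, explicitly.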
AIMS. This study compared the performance of deep learning extensions of survival analysis models with traditional Cox proportional hazards (CPH) models for deriving cardiovascular disease (CVD) risk prediction equations in national health administrative datasets. METHODS. Using individual person linkage of multiple administrative datasets, we constructed a cohort of all New Zealand residents aged 30-74 years who interacted with publicly funded health services during 2012, and identified hospitalisations and deaths from CVD over five years of follow-up. After excluding people with prior CVD or heart failure, sex-specific deep learning and CPH models were developed to estimate the risk of fatal or non-fatal CVD events within five years. The proportion of explained time-to-event occurrence, calibration, and discrimination were compared between models across the whole study population and in specific risk groups. FINDINGS. First CVD events occurred in 61,927 of 2,164,872 people. Among diagnoses and procedures, the largest 'local' hazard ratios were associated by the deep learning models with tobacco use in women (2.04, 95%CI: 1.99-2.10) and with chronic obstructive pulmonary disease with acute lower respiratory infection in men (1.56, 95%CI: 1.50-1.62). Other identified predictors (e.g. hypertension, chest pain, diabetes) aligned with current knowledge about CVD risk predictors. The deep learning models significantly outperformed the CPH models on the basis of proportion of explained time-to-event occurrence (Royston and Sauerbrei's R-squared: 0.468 vs. 0.425 in women and 0.383 vs. 0.348 in men), calibration, and discrimination (all p<0.0001). INTERPRETATION. Deep learning extensions of survival analysis models can be applied to large health administrative databases to derive interpretable CVD risk prediction equations that are more accurate than traditional CPH models.
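Discrimination comparisons of the kind reported above are typically based on a concordance index. A minimal sketch of Harrell's C-index for right-censored survival data follows; the toy follow-up times, event indicators, and risk scores are hypothetical, not from the study's cohort.

```python
# Minimal sketch of Harrell's concordance index (C-index), a standard
# discrimination measure for survival models such as the CPH and deep
# learning models compared in the study. Data below are hypothetical.
def c_index(times, events, risks):
    """times: follow-up times; events: 1 if the event was observed,
    0 if censored; risks: model risk scores (higher = earlier event)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is comparable if i had an observed event before j's time
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1      # model ordered the pair correctly
                elif risks[i] == risks[j]:
                    concordant += 0.5    # ties count as half
    return concordant / comparable

times = [2, 4, 5, 7, 9]
events = [1, 1, 0, 1, 0]      # 0 = censored before any event was seen
risks = [0.9, 0.7, 0.6, 0.8, 0.2]
print(c_index(times, events, risks))
```

Censored subjects contribute only as the later member of a pair, since their true event time is unknown; a C-index of 0.5 is chance-level ordering and 1.0 is perfect discrimination.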
Modern statistical learning addresses datasets that are increasingly massive in volume and diverse in the variables they record, and it aims to leverage these datasets for knowledge discovery and decision support. In many problem settings, auxiliary outcomes are available to decision makers, but one outcome is of primary interest. For instance, e-commerce retailers like Amazon own customer profile data and purchase histories for several types of products. These products can share similar features, but only one product is of interest to the retailer, whose goal is to decide whether or not to recommend this product to a particular customer. The purchase behavior associated with this specific product is the target outcome, and that of the remaining products constitutes the auxiliary outcomes. In healthcare, many clinical outcomes are measured and recorded for individual patients, and some of them are closely related to one another.
Chronic kidney disease (CKD) is a gradual loss of renal function over time, and it increases the risk of mortality, reduced quality of life, and serious complications. The prevalence of CKD has been increasing over the last couple of decades, partly due to the increased prevalence of diabetes and hypertension. To accurately detect CKD in diabetic patients, we propose a novel framework to learn sparse longitudinal representations of patients' medical records. The proposed method is compared with widely used baselines such as Aggregated Frequency Vector and Bag-of-Pattern in Sequences on real EHR data, and the experimental results indicate that the proposed model achieves higher predictive performance. Additionally, the learned representations are interpreted and visualized to yield clinical insights.
Clinical decision making about treatments and interventions based on personal characteristics leads to effective health improvement. Machine learning (ML) has been central to diagnosis support and disease prediction based on comprehensive patient information. Because the black-box problem in ML is serious for medical applications, explainable artificial intelligence (XAI) techniques that explain the reasons for ML model predictions have received much attention. An important remaining issue in clinical settings is the discovery of concrete and realistic treatment processes. This paper proposes an innovative framework to plan concrete treatment processes based on an ML model. A key point of our proposed framework is to evaluate the "actionability" of a treatment process using a stochastic surrogate model constructed through hierarchical Bayesian modeling. Actionability is an essential concept for suggesting realistic treatment processes, which leads to clinical applications for personal health improvement. This paper also presents two experiments to evaluate our framework. We first demonstrate the feasibility of the framework's methodology using a synthetic dataset. Subsequently, the framework is applied to an actual health checkup dataset comprising 3,132 participants, considering an application to improve systolic blood pressure values at the personal level. We confirmed that the computed treatment processes are actionable and consistent with clinical knowledge for lowering blood pressure. These results demonstrate that our framework can contribute to decision making in the medical field and can be expected to provide clinicians with deeper insights by proposing concrete and actionable treatment processes based on an ML model.
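The general shape of scoring candidate treatment actions through a surrogate outcome model can be sketched simply: predict the outcome before and after each action, and trade the predicted benefit off against how actionable the step is. The linear surrogate, the features, and the penalty values below are hypothetical assumptions, not the paper's hierarchical Bayesian model.

```python
# Sketch of ranking candidate treatment actions through a surrogate
# model of an outcome (here, systolic blood pressure in mmHg).
# The linear surrogate, features, and "actionability" penalties are
# hypothetical assumptions, not the paper's stochastic surrogate model.
SURROGATE = {"weight_kg": 0.5, "salt_g_day": 2.0, "exercise_min_day": -0.1}
BASELINE_SBP = 90.0  # intercept of the toy surrogate

def predict_sbp(features):
    return BASELINE_SBP + sum(SURROGATE[k] * v for k, v in features.items())

# candidate actions: (name, feature deltas, actionability penalty in mmHg)
ACTIONS = [
    ("lose 5 kg",         {"weight_kg": -5},        3.0),
    ("halve salt intake", {"salt_g_day": -4},       1.0),
    ("walk 30 min/day",   {"exercise_min_day": 30}, 0.5),
]

def score(patient, action):
    name, deltas, penalty = action
    after = dict(patient)
    for k, d in deltas.items():
        after[k] += d
    drop = predict_sbp(patient) - predict_sbp(after)
    return drop - penalty   # predicted benefit net of effort

patient = {"weight_kg": 80, "salt_g_day": 10, "exercise_min_day": 10}
ranked = sorted(ACTIONS, key=lambda a: score(patient, a), reverse=True)
print([(a[0], round(score(patient, a), 1)) for a in ranked])
```

A hierarchical Bayesian surrogate would replace the fixed coefficients with posterior distributions, so the actionability score becomes a distribution rather than a point value; the ranking logic, however, stays the same.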