Multivariate Anomaly Detection in Medicare using Model Residuals and Probabilistic Programming

AAAI Conferences

Anomalies in healthcare claims data can indicate possible fraudulent activities, which contribute to a significant portion of overall healthcare costs. Medicare is a large government-run healthcare program that serves the needs of the elderly in the United States. The increasing elderly population and their reliance on the Medicare program create an environment with rising costs and increased risk of fraud. The detection of these potentially fraudulent activities can recover costs and lessen the overall impact of fraud on the Medicare program. In this paper, we propose a new method to detect fraud by discovering outliers, or anomalies, in payments made to Medicare providers. We employ a multivariate outlier detection method split into two parts. In the first part, we create a multivariate regression model and generate corresponding residuals. In the second part, these residuals are used as inputs into a generalizable univariate probability model. We create this Bayesian probability model using probabilistic programming. Our results indicate that our model is robust and less dependent on underlying data distributions than Mahalanobis distance. Moreover, we are able to demonstrate successful anomaly detection within Medicare specialties, providing meaningful results for further investigation.
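The two-part idea in this abstract (fit a regression, then score its residuals with a univariate model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are made up, the regression uses a single predictor rather than the multivariate model the paper describes, and a median/MAD robust z-score stands in for the paper's Bayesian probability model, which is not reproduced.

```python
from statistics import mean, median

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x (single predictor)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def residuals(x, y, a, b):
    """Part one: observed value minus the regression prediction."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def robust_z(values):
    """Part two (stand-in): robust z-scores based on the median and MAD.

    The paper uses a Bayesian univariate probability model instead;
    this simpler score is only an illustrative substitute.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    if scale == 0:
        return [0.0 for _ in values]
    return [(v - med) / scale for v in values]

def flag_outliers(x, y, threshold=3.5):
    """Return indices whose residual has |robust z| above the threshold."""
    a, b = fit_line(x, y)
    z = robust_z(residuals(x, y, a, b))
    return [i for i, zi in enumerate(z) if abs(zi) > threshold]

# Hypothetical data: services rendered vs. payment amount per provider.
services = [10, 20, 30, 40, 50, 60, 70, 80]
payments = [105, 198, 310, 395, 1500, 602, 699, 805]
print(flag_outliers(services, payments))
```

Scoring residuals rather than raw payments means a provider is flagged for deviating from what the predictors explain, not simply for billing a large amount.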


Evolutionary Clustering via Message Passing

arXiv.org Artificial Intelligence

We are often interested in clustering objects that evolve over time and identifying solutions to the clustering problem for every time step. Evolutionary clustering provides insight into cluster evolution and temporal changes in cluster memberships while enabling performance superior to that achieved by independently clustering data collected at different time points. In this paper we introduce evolutionary affinity propagation (EAP), an evolutionary clustering algorithm that groups data points by exchanging messages on a factor graph. EAP promotes temporal smoothness of the solution to clustering time-evolving data by linking the nodes of the factor graph that are associated with adjacent data snapshots, and introduces consensus nodes to enable cluster tracking and identification of cluster births and deaths. Unlike existing evolutionary clustering methods that require additional processing to approximate the number of clusters or match them across time, EAP determines the number of clusters and tracks them automatically. A comparison with existing methods on simulated and experimental data demonstrates the effectiveness of the proposed EAP algorithm.
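For readers unfamiliar with the message-passing idea that EAP builds on, the following is a minimal sketch of standard (static) affinity propagation on a single data snapshot. It is not EAP: the temporal links between adjacent snapshots and the consensus nodes described in the abstract are not implemented. The 1-D points, the negative squared distance similarity, the median preference, and the damping and iteration settings are all assumptions chosen for illustration.

```python
from statistics import median

def similarity_matrix(points):
    """Negative squared distance; diagonal set to the median similarity."""
    n = len(points)
    S = [[-(points[i] - points[j]) ** 2 for j in range(n)] for i in range(n)]
    off_diag = [S[i][j] for i in range(n) for j in range(n) if i != j]
    preference = median(off_diag)  # controls how many exemplars emerge
    for i in range(n):
        S[i][i] = preference
    return S

def affinity_propagation(S, damping=0.7, iterations=200):
    """Standard affinity propagation; returns each point's exemplar index."""
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                new_r = S[i][k] - best_other
                R[i][k] = damping * R[i][k] + (1 - damping) * new_r
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k), i' not in {i,k})
        # a(k,k) = sum of positive r(i',k), i' != k
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                if i == k:
                    new_a = total
                else:
                    new_a = min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new_a
    # Each point picks the candidate exemplar maximizing a(i,k) + r(i,k).
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Hypothetical snapshot with two well-separated groups.
points = [0.0, 1.0, 2.5, 10.0, 11.0, 12.5]
labels = affinity_propagation(similarity_matrix(points))
print(labels)
```

Note that the number of clusters is not passed in; it emerges from the preference values on the diagonal, which is the property the abstract highlights when it says EAP determines the number of clusters automatically.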


The Detection of Medicare Fraud Using Machine Learning Methods with Excluded Provider Labels

AAAI Conferences

With the overall increase in the elderly population come additional medical needs and costs. Medicare is a U.S. healthcare program that provides insurance, primarily to individuals 65 years or older, to offload some of the financial burden associated with medical care. Even so, healthcare costs are high and continue to increase. Fraud is a major contributor to these inflating healthcare expenses. Our paper provides a comprehensive study leveraging machine learning methods to detect fraudulent Medicare providers. We use publicly available Medicare data and provider exclusions for fraud labels to build and assess three different learners. To lessen the impact of class imbalance, given so few actual fraud labels, we employ random undersampling, creating four class distributions. Our results show that the C4.5 decision tree and logistic regression learners have the best fraud detection performance, particularly for the 80:20 class distribution, with average AUC scores of 0.883 and 0.882, respectively, and low false negative rates. We successfully demonstrate the efficacy of employing machine learning with random undersampling to detect Medicare fraud.
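Random undersampling to a target class distribution, as mentioned in this abstract, can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the function name, the record format, the seed, and the synthetic labels are assumptions. Only the 80:20 distribution is shown because it is the one the abstract names.

```python
import random

def undersample(records, labels, majority_parts=80, minority_parts=20, seed=0):
    """Keep every minority (label 1) record and randomly drop majority
    (label 0) records until the majority:minority ratio matches
    majority_parts:minority_parts."""
    minority = [r for r, y in zip(records, labels) if y == 1]
    majority = [r for r, y in zip(records, labels) if y == 0]
    # Number of majority records needed for the requested ratio.
    target = len(minority) * majority_parts // minority_parts
    target = min(target, len(majority))  # cannot keep more than exist
    rng = random.Random(seed)            # fixed seed for repeatability
    kept = rng.sample(majority, target)
    sampled = [(r, 0) for r in kept] + [(r, 1) for r in minority]
    rng.shuffle(sampled)
    return [r for r, _ in sampled], [y for _, y in sampled]

# Hypothetical data: 1000 non-fraud providers and 20 excluded (fraud) ones.
records = list(range(1020))
labels = [0] * 1000 + [1] * 20
X, y = undersample(records, labels)
print(len(X), sum(y))
```

Undersampling discards majority-class information, so in practice it is common to repeat the sampling with different seeds and average the resulting metrics.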


Does Machine Learning Improve Prediction of VA Primary Care Reliance?

#artificialintelligence

Machine learning models used to predict future use of primary care services from the Veterans Affairs (VA) Health Care System did not outperform traditional regression models. Objectives: The Veterans Affairs (VA) Health Care System is among the largest integrated health systems in the United States. Many VA enrollees are dual users of Medicare, and little research has examined methods to most accurately predict which veterans will be mostly reliant on VA services in the future. This study examined whether machine learning methods can better predict future reliance on VA primary care compared with traditional statistical methods. Study Design: Observational study of 83,143 VA patients dually enrolled in fee-for-service Medicare using VA and Medicare administrative databases and the 2012 Survey of Healthcare Experiences of Patients.


Deep Learning Approach for Predicting 30 Day Readmissions after Coronary Artery Bypass Graft Surgery

arXiv.org Machine Learning

Hospital readmissions within 30 days after discharge following Coronary Artery Bypass Graft (CABG) surgery are substantial contributors to healthcare costs. Many predictive models have been developed to identify risk factors for readmissions. However, the majority of the existing models use statistical analysis techniques with data available at discharge. We propose an ensembled model to predict CABG readmissions using pre-discharge perioperative data and machine learning survival analysis techniques. First, we applied fifty-one potential readmission risk variables to Cox Proportional Hazard (CPH) survival regression univariate analysis. Fourteen of them turned out to be significant (with p-value < 0.05), contributing to readmissions. Subsequently, we applied these 14 predictors to a multivariate CPH model and to DeepSurv, a Deep Learning Neural Network (NN) representation of the CPH model. We validated this new ensembled model with 453 isolated adult CABG cases. Nine of the fourteen perioperative risk variables were identified as the most significant, with Hazard Ratios (HR) greater than 1.0. The concordance index metrics for the CPH, DeepSurv, and ensembled models were then evaluated with training and validation datasets. Our ensembled model yielded promising results in terms of c-statistics as we raised the number of iterations and data set sizes. 30-day all-cause readmissions among isolated CABG patients can be predicted more effectively with perioperative pre-discharge data, using machine learning survival analysis techniques. Prediction accuracy levels could be improved further with deep learning algorithms.
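The concordance index (c-statistic) used to compare the models in this abstract can be sketched as below. This is a minimal illustration of Harrell's concordance index under stated assumptions: the follow-up times, event flags, and risk scores are made up, ties in risk count as half-concordant, and the paper's CPH and DeepSurv models are not reproduced.

```python
def concordance_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time per patient (e.g., days until readmission)
    events: 1 if the event was observed, 0 if the patient was censored
    risks:  predicted risk score (higher means earlier event expected)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # model ranked the pair correctly
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied prediction counts as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Hypothetical patients: days to readmission, event flag, model risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.6, 0.4, 0.2]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the metric is a convenient common scale for comparing a regression-based model with a neural network.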