Multivariate Anomaly Detection in Medicare using Model Residuals and Probabilistic Programming

AAAI Conferences

Anomalies in healthcare claims data can be indicative of possible fraudulent activities, which contribute to a significant portion of overall healthcare costs. Medicare is a large government-run healthcare program that serves the needs of the elderly in the United States. The increasing elderly population and their reliance on the Medicare program create an environment with rising costs and increased risk of fraud. The detection of these potentially fraudulent activities can recover costs and lessen the overall impact of fraud on the Medicare program. In this paper, we propose a new method to detect fraud by discovering outliers, or anomalies, in payments made to Medicare providers. We employ a multivariate outlier detection method split into two parts. In the first part, we create a multivariate regression model and generate corresponding residuals. In the second part, these residuals are used as inputs into a generalizable univariate probability model. We create this Bayesian probability model using probabilistic programming. Our results indicate that our model is robust and less dependent on the underlying data distributions than Mahalanobis distance. Moreover, we are able to demonstrate successful anomaly detection within Medicare specialties, providing meaningful results for further investigation.
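
The two-part structure described above (regression residuals first, then a univariate model applied to those residuals) can be illustrated with a minimal sketch. This is not the authors' model: the second stage here is a robust z-score based on the median and the median absolute deviation, swapped in as a simple stand-in for the Bayesian probability model, and the function name, threshold, and data layout are assumptions made for illustration only.

```python
import numpy as np

def residual_outlier_flags(X, y, threshold=3.5):
    """Flag rows whose regression residual is unusually large.

    X: (n, p) array of predictors; y: (n,) array of payment amounts.
    Returns (residuals, robust_scores, flags).
    NOTE: hypothetical helper; the threshold of 3.5 is an assumed default.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Part 1: ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    # Part 2 (stand-in, not the Bayesian model): robust z-score
    # computed from the median and the median absolute deviation.
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    scale = 1.4826 * mad if mad > 0 else np.finfo(float).eps
    scores = (residuals - med) / scale
    return residuals, scores, np.abs(scores) > threshold
```

Because the second stage only sees one-dimensional residuals, it could in principle be replaced by any univariate model, which is the property the abstract describes as generalizable.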


Artificial Intelligence Effectively Assesses Cell Therapy Functionality

#artificialintelligence

A fully automated artificial intelligence (AI)-based multispectral absorbance imaging system effectively classified the function and potency of induced pluripotent stem cell-derived retinal pigment epithelial cells (iPSC-RPE) from patients with age-related macular degeneration (AMD). The findings from the system could be applied to assessing future cellular therapies, according to research presented at the 2018 ARVO annual meeting. The software, which uses convolutional neural network (CNN) deep learning algorithms, effectively evaluated release criteria for the iPSC-RPE cell-based therapy in a standard, reproducible, and cost-effective fashion. The AI-based analysis was as specific and sensitive as traditional molecular and physiological assays, without the need for human intervention. "Cells can be classified with high accuracy using nothing but absorbance images," wrote lead investigator Nathan Hotaling and colleagues from the National Institutes of Health in their poster.


DeepAISE -- An End-to-End Development and Deployment of a Recurrent Neural Survival Model for Early Prediction of Sepsis

arXiv.org Machine Learning

Abstract: Sepsis, a dysregulated immune system response to infection, is among the leading causes of morbidity, mortality, and cost overruns in the Intensive Care Unit (ICU). Early prediction of sepsis can improve situational awareness amongst clinicians and facilitate timely, protective interventions. While the application of predictive analytics in ICU patients has shown early promising results, much of the work has been encumbered by high false-alarm rates. Efforts to improve specificity have been limited by several factors, most notably the difficulty of labeling sepsis onset time and the low prevalence of septic events in the ICU. We show that by coupling a clinical criterion for defining sepsis onset time with a treatment policy (e.g., initiation of antibiotics within one hour of meeting the criterion), one may rank the relative utility of various criteria through offline policy evaluation. Given the optimal criterion, DeepAISE automatically learns predictive features related to higher-order interactions and temporal patterns among clinical risk factors that maximize the data likelihood of observed time to septic events. DeepAISE has been incorporated into a clinical workflow, which provides real-time hourly sepsis risk scores. A comparative study of four baseline models indicates that DeepAISE produces the most accurate predictions (AUC 0.90 and 0.87) and the lowest false alarm rates (FAR 0.20 and 0.26) in two separate cohorts (internal and external, respectively), while simultaneously producing interpretable representations of the clinical time series and risk factors.

Introduction: Sepsis is a syndromic, life-threatening condition that arises when the body's response to infection injures its own internal organs (1). Though the condition lacks the same public notoriety as other conditions like heart attacks, 6% of all hospitalized patients in the United States carry a primary diagnosis of sepsis as compared to 2.5% for the latter (2). When all hospital deaths are ultimately considered, nearly 35% are attributable to sepsis (2). This condition stands in stark contrast to heart attacks, which have a mortality rate of 2.7-9.6% and only cost the US $12.1 billion annually, roughly half of the cost of sepsis (3).


Deep Learning Approach for Predicting 30 Day Readmissions after Coronary Artery Bypass Graft Surgery

arXiv.org Machine Learning

Hospital readmissions within 30 days after discharge following Coronary Artery Bypass Graft (CABG) surgery are substantial contributors to healthcare costs. Many predictive models have been developed to identify risk factors for readmissions. However, the majority of the existing models use statistical analysis techniques with data available at discharge. We propose an ensembled model to predict CABG readmissions using pre-discharge perioperative data and machine learning survival analysis techniques. First, we applied fifty-one potential readmission risk variables to Cox Proportional Hazard (CPH) survival regression univariate analysis. Fourteen of them turned out to be significant (with p value < 0.05), contributing to readmissions. Subsequently, we applied these 14 predictors to a multivariate CPH model and to DeepSurv, a Deep Learning Neural Network (NN) representation of the CPH model. We validated this new ensembled model with 453 isolated adult CABG cases. Nine of the fourteen perioperative risk variables were identified as the most significant, with Hazard Ratios (HR) greater than 1.0. The concordance index metrics for the CPH, DeepSurv, and ensembled models were then evaluated with training and validation datasets. Our ensembled model yielded promising results in terms of c-statistics as we raised the number of iterations and data set sizes. 30-day all-cause readmissions among isolated CABG patients can be predicted more effectively with perioperative pre-discharge data, using machine learning survival analysis techniques. Prediction accuracy levels could be improved further with deep learning algorithms.
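
The concordance index (c-statistic) used to compare the models can be made concrete with a short sketch. The function below computes Harrell's C by direct pair counting. It is an illustration written for this listing, not code from the paper; the function name and the convention that a higher score means a higher predicted risk are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly.

    times: observed follow-up times (e.g. days to readmission or censoring)
    events: 1 if the readmission was observed, 0 if censored
    risk_scores: higher value means higher predicted risk (assumed convention)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ordering, and censored subjects contribute only as the later member of a pair.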


Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values

arXiv.org Machine Learning

The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
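
The Benjamini-Yekutieli step mentioned above can be sketched in a few lines. The function below implements only the standard step-up procedure on a list of p-values that are assumed to be already computed; it does not reproduce PC-p's edge-specific p-value bounds, and the function name and default `alpha` are illustrative assumptions.

```python
def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic correction c(m) = sum_{i=1}^{m} 1/i keeps the step-up
    # procedure valid under arbitrary dependence between the tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / (m * c(m)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

With five p-values the correction factor is c(5) ≈ 2.283, so the rank-k threshold at `alpha=0.05` is roughly k × 0.0044, which is noticeably stricter than the Benjamini-Hochberg threshold of k × 0.01.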