Audits as Evidence: Experiments, Ensembles, and Enforcement

arXiv.org Machine Learning

We develop tools for utilizing correspondence experiments to detect illegal discrimination by individual employers. Employers violate US employment law if their propensity to contact applicants depends on protected characteristics such as race or sex. We establish identification of higher moments of the causal effects of protected characteristics on callback rates as a function of the number of fictitious applications sent to each job ad. These moments are used to bound the fraction of jobs that illegally discriminate. Applying our results to three experimental datasets, we find evidence of significant employer heterogeneity in discriminatory behavior, with the standard deviation of gaps in job-specific callback probabilities across protected groups averaging roughly twice the mean gap. In a recent experiment manipulating racially distinctive names, we estimate that at least 85% of jobs that contact both of two white applications and neither of two black applications are engaged in illegal discrimination. To assess the tradeoff between type I and II errors presented by these patterns, we consider the performance of a series of decision rules for investigating suspicious callback behavior under a simple two-type model that rationalizes the experimental data. Though, in our preferred specification, only 17% of employers are estimated to discriminate on the basis of race, we find that an experiment sending 10 applications to each job would enable accurate detection of 7-10% of discriminators while falsely accusing fewer than 0.2% of non-discriminators. A minimax decision rule acknowledging partial identification of the joint distribution of callback rates yields higher error rates but more investigations than our baseline two-type model. Our results suggest illegal labor market discrimination can be reliably monitored with relatively small modifications to existing audit designs.
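The tradeoff between type I and type II errors described above can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a simplified two-type world in which every callback is an independent Bernoulli draw, and the callback probabilities, the 5-and-5 split of the 10 applications, and the threshold `min_gap` are hypothetical values chosen only for illustration.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def investigation_prob(n_per_group, p_white, p_black, min_gap):
    """P(white callbacks - black callbacks >= min_gap) for one job.

    Assumes callbacks are independent across applications, which is a
    simplification made for this sketch.
    """
    total = 0.0
    for w in range(n_per_group + 1):
        for b in range(n_per_group + 1):
            if w - b >= min_gap:
                total += binom_pmf(w, n_per_group, p_white) * binom_pmf(
                    b, n_per_group, p_black
                )
    return total


# Hypothetical callback rates (not taken from the paper).
N_PER_GROUP = 5   # 10 applications per job, split evenly
MIN_GAP = 4       # investigate when whites receive at least 4 more callbacks

false_accusation = investigation_prob(N_PER_GROUP, 0.10, 0.10, MIN_GAP)
detection = investigation_prob(N_PER_GROUP, 0.30, 0.10, MIN_GAP)
print(f"share of non-discriminators investigated: {false_accusation:.5f}")
print(f"share of discriminators investigated:     {detection:.5f}")
```

Raising `min_gap` lowers both rates, so the threshold controls how much detection is given up to avoid false accusations.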


Learning Probabilistic Models of Word Sense Disambiguation

arXiv.org Artificial Intelligence

This dissertation presents several new methods of supervised and unsupervised learning of word sense disambiguation models. The supervised methods focus on performing model searches through a space of probabilistic models, and the unsupervised methods rely on the use of Gibbs Sampling and the Expectation Maximization (EM) algorithm. In both the supervised and unsupervised case, the Naive Bayesian model is found to perform well. An explanation for this success is presented in terms of learning rates and bias-variance decompositions.
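As a rough illustration of the supervised case, the sketch below trains a Naive Bayes sense classifier on bag-of-words contexts with add-one smoothing. The toy training data for the word "bank", the sense labels, and the function names `train` and `predict` are hypothetical and are not drawn from the dissertation.

```python
from collections import Counter, defaultdict
from math import log


def train(examples):
    """Count senses and context words from (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context_words, sense in examples:
        sense_counts[sense] += 1
        for word in context_words:
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def predict(model, context_words):
    """Return the sense with the highest posterior log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, count in sense_counts.items():
        score = log(count / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context_words:
            if word in vocab:  # ignore words never seen in training
                # add-one (Laplace) smoothed log likelihood
                score += log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical contexts for the ambiguous word "bank".
model = train([
    (["money", "deposit", "loan"], "finance"),
    (["account", "money", "interest"], "finance"),
    (["river", "water", "fishing"], "river"),
    (["muddy", "river", "shore"], "river"),
])
print(predict(model, ["loan", "interest"]))
print(predict(model, ["water", "shore"]))
```

The model treats context words as conditionally independent given the sense, which is the Naive Bayes assumption the abstract refers to.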


Multivariate Anomaly Detection in Medicare using Model Residuals and Probabilistic Programming

AAAI Conferences

Anomalies in healthcare claims data can indicate possible fraudulent activities, which contribute to a significant portion of overall healthcare costs. Medicare is a large government-run healthcare program that serves the needs of the elderly in the United States. The increasing elderly population and their reliance on the Medicare program create an environment with rising costs and increased risk of fraud. The detection of these potentially fraudulent activities can recover costs and lessen the overall impact of fraud on the Medicare program. In this paper, we propose a new method to detect fraud by discovering outliers, or anomalies, in payments made to Medicare providers. We employ a multivariate outlier detection method split into two parts. In the first part, we create a multivariate regression model and generate corresponding residuals. In the second part, these residuals are used as inputs into a generalizable univariate probability model. We create this Bayesian probability model using probabilistic programming. Our results indicate that our model is robust and less dependent on underlying data distributions than the Mahalanobis distance. Moreover, we demonstrate successful anomaly detection within Medicare specialties, providing meaningful results for further investigation.
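The two-part structure can be sketched as follows. This is only an illustration under stated assumptions: the first stage is an ordinary least-squares fit with a single response, and the second stage swaps the paper's Bayesian probability model for a much simpler robust z-score based on the median absolute deviation. The synthetic data, the 3.5 threshold, and all function names are hypothetical.

```python
import statistics


def fit_least_squares(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_val = A[col][col]
        A[col] = [v / pivot_val for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def residuals(X, y, beta):
    """Part 1 output: observed value minus fitted value for each row."""
    return [
        yi - (beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))
        for x, yi in zip(X, y)
    ]


def flag_outliers(res, threshold=3.5):
    """Part 2 stand-in: flag residuals with a large robust z-score."""
    med = statistics.median(res)
    mad = statistics.median([abs(r - med) for r in res])
    if mad == 0:
        return []
    return [i for i, r in enumerate(res) if 0.6745 * abs(r - med) / mad > threshold]


# Synthetic data: two predictors, a linear payment amount, one inflated payment.
X = [(i, (i * 7) % 5) for i in range(20)]
y = [5 + 2 * x1 + 3 * x2 + ((i * 3) % 7 - 3) * 0.1 for i, (x1, x2) in enumerate(X)]
y[10] += 50  # hypothetical anomalous provider

beta = fit_least_squares(X, y)
flagged = flag_outliers(residuals(X, y, beta))
print(flagged)
```

Working on residuals rather than raw payments means a provider is flagged for deviating from what the regression predicts, not simply for billing a large amount.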


The Detection of Medicare Fraud Using Machine Learning Methods with Excluded Provider Labels

AAAI Conferences

With the overall increase in the elderly population comes additional, necessary medical needs and costs. Medicare is a U.S. healthcare program that provides insurance, primarily to individuals 65 years or older, to offload some of the financial burden associated with medical care. Even so, healthcare costs are high and continue to increase. Fraud is a major contributor to these inflating healthcare expenses. Our paper provides a comprehensive study leveraging machine learning methods to detect fraudulent Medicare providers. We use publicly available Medicare data and provider exclusions for fraud labels to build and assess three different learners. In order to lessen the impact of class imbalance, given so few actual fraud labels, we employ random undersampling creating four class distributions. Our results show that the C4.5 decision tree and logistic regression learners have the best fraud detection performance, particularly for the 80:20 class distribution with average AUC scores of 0.883 and 0.882, respectively, and low false negative rates. We successfully demonstrate the efficacy of employing machine learning with random undersampling to detect Medicare fraud.
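Random undersampling to a target class distribution can be sketched as below. The sketch assumes that label 1 marks the rare fraud class, that label 0 marks the majority class, and that "80:20" means 80 majority rows for every 20 minority rows. The function name, its parameters, and the synthetic data are hypothetical.

```python
import random


def undersample(records, labels, majority_parts, minority_parts, seed=0):
    """Randomly drop majority rows (label 0) to reach the requested ratio.

    All minority rows (label 1) are kept. A fixed seed keeps the draw
    reproducible.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(labels) if label == 1]
    majority = [i for i, label in enumerate(labels) if label == 0]
    n_major = min(len(majority), len(minority) * majority_parts // minority_parts)
    kept = sorted(minority + rng.sample(majority, n_major))
    return [records[i] for i in kept], [labels[i] for i in kept]


# Synthetic example: 10 fraud labels among 1,000 providers.
records = list(range(1000))
labels = [1] * 10 + [0] * 990
sampled_records, sampled_labels = undersample(records, labels, 80, 20)
print(len(sampled_labels), sum(sampled_labels))
```

Only the training data would normally be resampled this way, so that evaluation still reflects the original class imbalance.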


DeepAISE -- An End-to-End Development and Deployment of a Recurrent Neural Survival Model for Early Prediction of Sepsis

arXiv.org Machine Learning

Abstract: Sepsis, a dysregulated immune system response to infection, is among the leading causes of morbidity, mortality, and cost overruns in the Intensive Care Unit (ICU). Early prediction of sepsis can improve situational awareness amongst clinicians and facilitate timely, protective interventions. While the application of predictive analytics in ICU patients has shown early promising results, much of the work has been encumbered by high false-alarm rates. Efforts to improve specificity have been limited by several factors, most notably the difficulty of labeling sepsis onset time and the low prevalence of septic events in the ICU. We show that by coupling a clinical criterion for defining sepsis onset time with a treatment policy (e.g., initiation of antibiotics within one hour of meeting the criterion), one may rank the relative utility of various criteria through offline policy evaluation. Given the optimal criterion, DeepAISE automatically learns predictive features related to higher-order interactions and temporal patterns among clinical risk factors that maximize the data likelihood of observed time to septic events. DeepAISE has been incorporated into a clinical workflow, which provides real-time hourly sepsis risk scores. A comparative study of four baseline models indicates that DeepAISE produces the most accurate predictions (AUC 0.90 and 0.87) and the lowest false alarm rates (FAR 0.20 and 0.26) in two separate cohorts (internal and external, respectively), while simultaneously producing interpretable representations of the clinical time series and risk factors.

Introduction: Sepsis is a syndromic, life-threatening condition that arises when the body's response to infection injures its own internal organs (1). Though the condition lacks the same public notoriety as other conditions like heart attacks, 6% of all hospitalized patients in the United States carry a primary diagnosis of sepsis as compared to 2.5% for the latter (2). When all hospital deaths are ultimately considered, nearly 35% are attributable to sepsis (2). This condition stands in stark contrast to heart attacks which have a mortality rate of 2.7-9.6% and only cost the US $12.1 billion annually, roughly half of the cost of sepsis (3).