 Accuracy


Modeling Mistrust in End-of-Life Care

arXiv.org Artificial Intelligence

In this work, we characterize the doctor-patient relationship using a machine learning-derived trust score. We show that this score has statistically significant racial associations, and that by modeling trust directly we find stronger disparities in care than by stratifying on race. We further demonstrate that mistrust is indicative of worse outcomes, but is only weakly associated with physiologically-created severity scores. Finally, we describe sentiment analysis experiments indicating patients with higher levels of mistrust have worse experiences and interactions with their caregivers. This work is a step towards measuring fairer machine learning in the healthcare domain.


Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints

arXiv.org Machine Learning

Classifiers can be trained with data-dependent constraints to satisfy fairness goals, reduce churn, achieve a targeted false positive rate, or other policy goals. We study the generalization performance for such constrained optimization problems, in terms of how well the constraints are satisfied at evaluation time, given that they are satisfied at training time. To improve generalization performance, we frame the problem as a two-player game where one player optimizes the model parameters on a training dataset, and the other player enforces the constraints on an independent validation dataset. We build on recent work in two-player constrained optimization to show that if one uses this two-dataset approach, then constraint generalization can be significantly improved. As we illustrate experimentally, this approach works not only in theory, but also in practice.
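The two-dataset idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' algorithm: the one-parameter logistic model, the constraint (mean predicted positive rate at most `target`), the synthetic data, and every name below (`two_dataset_training`, `make_data`, the step sizes) are hypothetical. One player takes gradient steps on the model parameter using the training set; the other raises or lowers a Lagrange multiplier using the constraint value measured on a separate validation set.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_data(n, seed):
    # Hypothetical synthetic data: label y in {0, 1}, feature x ~ N(2y - 1, 1).
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = i % 2
        data.append((rng.gauss(2 * y - 1, 1.0), y))
    return data


def loss_grad(theta, data):
    # Gradient w.r.t. theta of the mean log loss, with p = sigmoid(x - theta).
    return sum(y - sigmoid(x - theta) for x, y in data) / len(data)


def constraint(theta, data, target):
    # g(theta) = mean predicted positive rate - target; satisfied when g <= 0.
    return sum(sigmoid(x - theta) for x, _ in data) / len(data) - target


def constraint_grad(theta, data):
    total = 0.0
    for x, _ in data:
        p = sigmoid(x - theta)
        total += -p * (1.0 - p)
    return total / len(data)


def two_dataset_training(train, val, target, steps=500, lr=0.1, lr_lam=0.1):
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        # Player 1: descend the Lagrangian on the TRAINING set.
        theta -= lr * (loss_grad(theta, train) + lam * constraint_grad(theta, train))
        # Player 2: adjust the multiplier using the VALIDATION set only.
        lam = max(0.0, lam + lr_lam * constraint(theta, val, target))
    return theta, lam


train = make_data(200, seed=0)
val = make_data(200, seed=1)

# A target of 1.0 can never be violated, so the multiplier stays at zero.
theta_unc, lam_unc = two_dataset_training(train, val, target=1.0)
# A target of 0.3 is binding, so the threshold is pushed upward.
theta_con, lam_con = two_dataset_training(train, val, target=0.3)
```

The design point is that the multiplier never sees the training set, so the model cannot satisfy the constraint merely by overfitting the data it is optimized on. Plain gradient descent-ascent like this can oscillate, so the last iterate is not guaranteed to satisfy the constraint.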


How AI Can Help Prevent Fraud

#artificialintelligence

One of the most pressing concerns that keeps retail professionals up at night is how to combat fraud. Retailers could lose upwards of $71 billion from fraudulent online transactions over the next few years, yet some executives feel that publicly acknowledging a fraud issue would harm their brand. One of the most significant fraud concerns merchants face today is false positives -- i.e., transactions attempted by legitimate customers that are tagged as suspicious by fraud prevention systems, ultimately leaving money on the table. Because their effect is so difficult to measure accurately, false positives are often ignored and their cost greatly underestimated. However, a majority of retailers say that undetected fraudulent transactions cost more than legitimate transactions that are wrongly declined, despite some evidence that the opposite is true. What's more, relatively few companies track false positives.


Choosing the Right Metric for Evaluating Machine Learning Models -- Part 2

#artificialintelligence

In the first blog post, we discussed some important metrics used in regression, their pros and cons, and their use cases. This part focuses on commonly used metrics in classification and on why, depending on context, we should prefer some over others. Let's first understand the basic terminology used in classification problems before going through the pros and cons of each method. You can skip this section if you are already familiar with the terminology. The probabilistic interpretation of the ROC-AUC score is that if you randomly choose a positive case and a negative case, the probability that the positive case outranks the negative case according to the classifier is given by the AUC.
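The probabilistic reading of ROC-AUC can be checked directly by counting pairs. The sketch below is illustrative only: the function name `pairwise_auc` and the toy labels and scores are made up, and counting a tie as half a win is a common convention assumed here rather than something stated in the post.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive scores higher."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0   # positive outranks negative
        elif p == n:
            wins += 0.5   # assumed convention: a tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 positives, 2 negatives, so 6 pairs in total.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = pairwise_auc(labels, scores)
print(round(auc, 4))  # 0.8333 (5 of the 6 pairs are ranked correctly)
```

This brute-force version compares every positive with every negative, so it is quadratic in the number of examples and is meant for intuition rather than for large datasets.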


Proxy Fairness

arXiv.org Machine Learning

We consider the problem of improving fairness when one lacks access to a dataset labeled with protected groups, making it difficult to take advantage of strategies that can improve fairness but require protected group labels, either at training or runtime. To address this, we investigate improving fairness metrics for proxy groups, and test whether doing so results in improved fairness for the true sensitive groups. Results on benchmark and real-world datasets demonstrate that such a proxy fairness strategy can work well in practice. However, we caution that the effectiveness likely depends on the choice of fairness metric, as well as how aligned the proxy groups are with the true protected groups in terms of the constrained model parameters.


Recursive Neural Networks in Quark/Gluon Tagging

arXiv.org Machine Learning

As machine learning techniques improve rapidly, it has been shown that image recognition techniques from deep neural networks can be used to detect jet substructure, and that deep neural networks can match or outperform the traditional approach of expert features. However, there are disadvantages, such as the sparseness of jet images. Based on the natural tree-like structure of jet sequential clustering, recursive neural networks (RecNNs), which embed the jet clustering history recursively as in natural language processing, behave better when confronted with these problems. We therefore explore the performance of RecNNs in quark/gluon discrimination. The results show that RecNNs outperform the baseline boosted decision tree (BDT) by a few percent in gluon rejection rate. However, additionally implementing particle flow identification increases the performance only slightly. We also experimented with some relevant aspects that might influence the performance of the networks. We find that even taking only particle flow identification as the input feature, without any extra information on momentum or angular position, already gives a fairly good result, which indicates that most of the information for quark/gluon discrimination is already contained in the tree structure itself. As a bonus, a rough discrimination between up and down quark jets is also explored.


Beyond One-hot Encoding: lower dimensional target embedding

arXiv.org Artificial Intelligence

Target encoding plays a central role when learning Convolutional Neural Networks. In this realm, one-hot encoding is the most prevalent strategy due to its simplicity. However, this widespread encoding scheme assumes a flat label space, thus ignoring rich relationships among labels that can be exploited during training. In large-scale datasets, data does not span the full label space, but instead lies in a low-dimensional output manifold. Following this observation, we embed the targets into a low-dimensional space, drastically improving convergence speed while preserving accuracy. Our contribution is twofold: (i) we show that random projections of the label space are a valid tool for finding such lower-dimensional embeddings, dramatically boosting convergence rates at zero computational cost; and (ii) we propose a normalized eigenrepresentation of the class manifold that encodes the targets with minimal information loss, improving the accuracy of random-projection encoding while enjoying the same convergence rates. Experiments on CIFAR-100, CUB200-2011, ImageNet, and MIT Places demonstrate that the proposed approach drastically improves convergence speed while reaching very competitive accuracy rates.
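As a rough illustration of contribution (i), a random projection of one-hot targets can be written without any libraries. Everything here is an assumption made for the sketch: the Gaussian projection matrix, the 1/sqrt(dim) scaling, the nearest-neighbor decoding, and the names `random_label_embedding` and `decode` do not come from the abstract, which does not specify these details. Multiplying a one-hot vector for class c by a random matrix simply selects row c, so the embedding table is the matrix itself.

```python
import random


def random_label_embedding(num_classes, dim, seed=0):
    """Rows of a random Gaussian matrix; row c is the projected one-hot target of class c."""
    rng = random.Random(seed)
    scale = dim ** -0.5  # assumed scaling so that rows have roughly unit norm
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)]
            for _ in range(num_classes)]


def decode(vec, table):
    """Map a predicted low-dimensional vector back to the nearest class embedding."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(table)), key=lambda c: sqdist(vec, table[c]))


# 100 classes (as in CIFAR-100) embedded into 16 dimensions instead of 100.
table = random_label_embedding(num_classes=100, dim=16)

# A network would regress onto table[label]; at test time its output is decoded.
noisy_prediction = [v + 0.01 for v in table[42]]
predicted_class = decode(noisy_prediction, table)
```

The table is fixed before training and needs no learning, which is one way to read the abstract's "zero computational cost"; the network's output layer then has `dim` units rather than one per class.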


Request-and-Reverify: Hierarchical Hypothesis Testing for Concept Drift Detection with Expensive Labels

arXiv.org Artificial Intelligence

One important assumption underlying common classification models is the stationarity of the data. However, in real-world streaming applications, the data concept indicated by the joint distribution of feature and label is not stationary but drifting over time. Concept drift detection aims to detect such drifts and adapt the model so as to mitigate any deterioration in the model's predictive performance. Unfortunately, most existing concept drift detection methods rely on a strong and over-optimistic condition that the true labels are available immediately for all already classified instances. In this paper, a novel Hierarchical Hypothesis Testing framework with Request-and-Reverify strategy is developed to detect concept drifts by requesting labels only when necessary. Two methods, namely Hierarchical Hypothesis Testing with Classification Uncertainty (HHT-CU) and Hierarchical Hypothesis Testing with Attribute-wise "Goodness-of-fit" (HHT-AG), are proposed respectively under the novel framework. In experiments with benchmark datasets, our methods demonstrate overwhelming advantages over state-of-the-art unsupervised drift detectors. More importantly, our methods even outperform DDM (the widely used supervised drift detector) when we use significantly fewer labels.


This Japanese AI security camera shows the future of surveillance will be automated

#artificialintelligence

The world of automated surveillance is booming, with new machine learning techniques giving CCTV cameras the ability to spot troubling behavior without human supervision. And sooner or later, this tech will be coming to a store near you -- as illustrated by a new AI security cam built by Japanese telecom giant NTT East and startup Earth Eyes Corp. The security camera is called the "AI Guardman" and is designed to help shop owners in Japan spot potential shoplifters. It uses open source technology developed by Carnegie Mellon University to scan live video streams and estimate the poses of any bodies it can see. The system then tries to match this pose data to predefined 'suspicious' behavior. If it sees something noteworthy, it alerts shopkeepers via a connected app.


A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis

arXiv.org Artificial Intelligence

Online symptom checkers have significant potential to improve patient care; however, their reliability and accuracy remain variable. We hypothesised that an artificial intelligence (AI) powered triage and diagnostic system would compare favourably with human doctors with respect to triage and diagnostic accuracy. We performed a prospective validation study of the accuracy and safety of an AI powered triage and diagnostic system. Identical cases were evaluated by both an AI system and human doctors. Differential diagnoses and triage outcomes were evaluated by an independent judge, who was blinded to the source (AI system or human doctor) of the outcomes. Independently of these cases, vignettes from publicly available resources were also assessed to provide a benchmark against previous studies and the diagnostic component of the MRCGP exam. Overall, we found that the Babylon AI powered Triage and Diagnostic System was able to identify the condition modelled by a clinical vignette with accuracy comparable to human doctors (in terms of precision and recall). In addition, we found that the triage advice recommended by the AI System was, on average, safer than that of human doctors, when compared to the ranges of acceptable triage provided by independent expert judges, with only a minimal reduction in appropriateness.