Accuracy
Beginner's guide to machine learning in R (with step-by-step tutorial)
If you're a graduate of economics, psychology, sociology, medicine, biostatistics, ecology, or related fields, you probably have received some training in statistics, but much less likely in machine learning. This is a problem because machine-learning algorithms are much better capable to solve many real-world applications compared with the procedures we learned in statistics class (randomized experiments, significance tests, correlation, ANOVA, linear regression, and so on). In all of these examples, statistical models are used to solve the problem, but in a different way than how you learned it in "Introduction to Statistics". In this post I want to give you a brief introduction what "machine learning" means, what the differences to "classical" statistical procedures are, and how you can train a machine learning model in R for your own use case in 8 simple steps. Think of a facial-recognition app. How does the app know whether it's John or rather Jane it's looking at? A conventional approach would be: Create an exhaustive list of features about John which can be quantitatively measured for the computer to memorize. E.g.: Look for short, brown hair, a three-day beard, a prominent nose, a scar on the left forehead, the distance between his eyes is 10.4 centimeters, he often wears a black hat, etc., that's John. The machine-learning approach works differently: You feed a computer many pictures labelled "John" or "Jane", and that's it, you don't provide any additional information โ rather, you let the machine infer the important features which best discern John from Jane. It might be that the form of the cheek bones are actually a better predictor of whether or not it's John on the image, rather than the hair color or the distance between the eyes. You don't care, you let the machine figure it out. Thus, this is a data-driven (inductive) approach, where a machine *learns* the rules how to classify faces (e.g., if X1 and X2 are present, then it's likely John) from a set of training data. You don't specify these rules manually. This is why machine learning is considered (a subfield of) artificial intelligence: The machine carries out tasks without being explicitly told what to do.
Naive Bayes Classifier -- How to Successfully Use It in Python?
The probability of randomly picking a red ball out of the bucket is 7/15. You can write it as P(red) 7/15. If we were to draw balls one at a time without replacing them, what is the probability of getting a black ball on a second attempt after drawing a red one on the first attempt? You can see that the above question is worded to provide us with the condition that needs to be satisfied first before the second attempt is made. That condition says that a red ball must be drawn during the first attempt.
Fair When Trained, Unfair When Deployed: Observable Fairness Measures are Unstable in Performative Prediction Settings
Mishler, Alan, Dalmasso, Niccolรฒ
Many popular algorithmic fairness measures depend on the joint distribution of predictions, outcomes, and a sensitive feature like race or gender. These measures are sensitive to distribution shift: a predictor which is trained to satisfy one of these fairness definitions may become unfair if the distribution changes. In performative prediction settings, however, predictors are precisely intended to induce distribution shift. For example, in many applications in criminal justice, healthcare, and consumer finance, the purpose of building a predictor is to reduce the rate of adverse outcomes such as recidivism, hospitalization, or default on a loan. We formalize the effect of such predictors as a type of concept shift--a particular variety of distribution shift--and show both theoretically and via simulated examples how this causes predictors which are fair when they are trained to become unfair when they are deployed. We further show how many of these issues can be avoided by using fairness definitions that depend on counterfactual rather than observable outcomes.
Off-Policy Fitted Q-Evaluation with Differentiable Function Approximators: Z-Estimation and Inference Theory
Zhang, Ruiqi, Zhang, Xuezhou, Ni, Chengzhuo, Wang, Mengdi
Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially deep neural networks, has gained practical success. While statistical analysis has proved FQE to be minimax-optimal with tabular, linear and several nonparametric function families, its practical performance with more general function approximator is less theoretically understood. We focus on FQE with general differentiable function approximators, making our theory applicable to neural function approximations. We approach this problem using the Z-estimation theory and establish the following results: The FQE estimation error is asymptotically normal with explicit variance determined jointly by the tangent space of the function class at the ground truth, the reward structure, and the distribution shift due to off-policy learning; The finite-sample FQE error bound is dominated by the same variance term, and it can also be bounded by function class-dependent divergence, which measures how the off-policy distribution shift intertwines with the function approximator. In addition, we study bootstrapping FQE estimators for error distribution inference and estimating confidence intervals, accompanied by a Cramer-Rao lower bound that matches our upper bounds. The Z-estimation analysis provides a generalizable theoretical framework for studying off-policy estimation in RL and provides sharp statistical theory for FQE with differentiable function approximators.
Heterogeneous Calibration: A post-hoc model-agnostic framework for improved generalization
Durfee, David, Gupta, Aman, Basu, Kinjal
We introduce the notion of heterogeneous calibration that applies a post-hoc model-agnostic transformation to model outputs for improving AUC performance on binary classification tasks. We consider overconfident models, whose performance is significantly better on training vs test data and give intuition onto why they might under-utilize moderately effective simple patterns in the data. We refer to these simple patterns as heterogeneous partitions of the feature space and show theoretically that perfectly calibrating each partition separately optimizes AUC. This gives a general paradigm of heterogeneous calibration as a post-hoc procedure by which heterogeneous partitions of the feature space are identified through tree-based algorithms and post-hoc calibration techniques are applied to each partition to improve AUC. While the theoretical optimality of this framework holds for any model, we focus on deep neural networks (DNNs) and test the simplest instantiation of this paradigm on a variety of open-source datasets. Experiments demonstrate the effectiveness of this framework and the future potential for applying higher-performing partitioning schemes along with more effective calibration techniques.
FCM-DNN: diagnosing coronary artery disease by deep accuracy Fuzzy C-Means clustering model
Joloudari, Javad Hassannataj, Saadatfar, Hamid, GhasemiGol, Mohammad, Alizadehsani, Roohallah, Sani, Zahra Alizadeh, Hasanzadeh, Fereshteh, Hassannataj, Edris, Sharifrazi, Danial, Mansor, Zulkefli
Cardiovascular disease is one of the most challenging diseases in middle-aged and older people, which causes high mortality. Coronary artery disease (CAD) is known as a common cardiovascular disease. A standard clinical tool for diagnosing CAD is angiography. The main challenges are dangerous side effects and high angiography costs. Today, the development of artificial intelligence-based methods is a valuable achievement for diagnosing disease. Hence, in this paper, artificial intelligence methods such as neural network (NN), deep neural network (DNN), and Fuzzy C-Means clustering combined with deep neural network (FCM-DNN) are developed for diagnosing CAD on a cardiac magnetic resonance imaging (CMRI) dataset. The original dataset is used in two different approaches. First, the labeled dataset is applied to the NN and DNN to create the NN and DNN models. Second, the labels are removed, and the unlabeled dataset is clustered via the FCM method, and then, the clustered dataset is fed to the DNN to create the FCM-DNN model. By utilizing the second clustering and modeling, the training process is improved, and consequently, the accuracy is increased. As a result, the proposed FCM-DNN model achieves the best performance with a 99.91% accuracy specifying 10 clusters, i.e., 5 clusters for healthy subjects and 5 clusters for sick subjects, through the 10-fold cross-validation technique compared to the NN and DNN models reaching the accuracies of 92.18% and 99.63%, respectively. To the best of our knowledge, no study has been conducted for CAD diagnosis on the CMRI dataset using artificial intelligence methods. The results confirm that the proposed FCM-DNN model can be helpful for scientific and research centers.
Prediction Sensitivity: Continual Audit of Counterfactual Fairness in Deployed Classifiers
Maughan, Krystal, Ngong, Ivoline C., Near, Joseph P.
As AI-based systems increasingly impact many areas of our lives, auditing these systems for fairness is an increasingly high-stakes problem. Traditional group fairness metrics can miss discrimination against individuals and are difficult to apply after deployment. Counterfactual fairness describes an individualized notion of fairness but is even more challenging to evaluate after deployment. We present prediction sensitivity, an approach for continual audit of counterfactual fairness in deployed classifiers. Prediction sensitivity helps answer the question: would this prediction have been different, if this individual had belonged to a different demographic group -- for every prediction made by the deployed model. Prediction sensitivity can leverage correlations between protected status and other features and does not require protected status information at prediction time. Our empirical results demonstrate that prediction sensitivity is effective for detecting violations of counterfactual fairness.
Fair SA: Sensitivity Analysis for Fairness in Face Recognition
Joshi, Aparna R., Suau, Xavier, Sivakumar, Nivedha, Zappella, Luca, Apostoloff, Nicholas
As the use of deep learning in high impact domains becomes ubiquitous, it is increasingly important to assess the resilience of models. One such high impact domain is that of face recognition, with real world applications involving images affected by various degradations, such as motion blur or high exposure. Moreover, images captured across different attributes, such as gender and race, can also challenge the robustness of a face recognition algorithm. While traditional summary statistics suggest that the aggregate performance of face recognition models has continued to improve, these metrics do not directly measure the robustness or fairness of the models. Visual Psychophysics Sensitivity Analysis (VPSA) [1] provides a way to pinpoint the individual causes of failure by way of introducing incremental perturbations in the data. However, perturbations may affect subgroups differently. In this paper, we propose a new fairness evaluation based on robustness in the form of a generic framework that extends VPSA. With this framework, we can analyze the ability of a model to perform fairly for different subgroups of a population affected by perturbations, and pinpoint the exact failure modes for a subgroup by measuring targeted robustness. With the increasing focus on the fairness of models, we use face recognition as an example application of our framework and propose to compactly visualize the fairness analysis of a model via AUC matrices. We analyze the performance of common face recognition models and empirically show that certain subgroups are at a disadvantage when images are perturbed, thereby uncovering trends that were not visible using the model's performance on subgroups without perturbations.
A new perspective on classification: optimally allocating limited resources to uncertain tasks
Vanderschueren, Toon, Baesens, Bart, Verdonck, Tim, Verbeke, Wouter
A central problem in business concerns the optimal allocation of limited resources to a set of available tasks, where the payoff of these tasks is inherently uncertain. In credit card fraud detection, for instance, a bank can only assign a small subset of transactions to their fraud investigations team. Typically, such problems are solved using a classification framework, where the focus is on predicting task outcomes given a set of characteristics. Resources are then allocated to the tasks that are predicted to be the most likely to succeed. However, we argue that using classification to address task uncertainty is inherently suboptimal as it does not take into account the available capacity. Therefore, we first frame the problem as a type of assignment problem. Then, we present a novel solution using learning to rank by directly optimizing the assignment's expected profit given limited, stochastic capacity. This is achieved by optimizing a specific instance of the net discounted cumulative gain, a commonly used class of metrics in learning to rank. Empirically, we demonstrate that our new method achieves higher expected profit and expected precision compared to a classification approach for a wide variety of application areas and data sets. This illustrates the benefit of an integrated approach and of explicitly considering the available resources when learning a predictive model.
Mithu
Detecting intrusions and anomalies in Industrial Control Systems at early stages is important to prevent process failure. Operator errors, device or equipment failures, and other non-network events could lead to a critical state. As a result, these events can indirectly lead to anomalous network traffic, and, thus, a manually configured IDS that uses network traffic alone can generate false positives and false negatives. In this paper, we propose a novel approach that uses multimodal machine learning and incorporates both network data and device state information to improve the detection accuracy. Our methodology can detect anomalies as well as their root causes, which is essential.