Accuracy
Does mitigating ML's impact disparity require treatment disparity?
Lipton, Zachary C., Chouldechova, Alexandra, McAuley, Julian
Following related work in law and policy, two notions of disparity have come to shape the study of fairness in algorithmic decision-making. Algorithms exhibit treatment disparity if they formally treat members of protected subgroups differently; algorithms exhibit impact disparity when outcomes differ across subgroups, even if the correlation arises unintentionally. Naturally, we can achieve impact parity through purposeful treatment disparity. In one thread of technical work, papers aim to reconcile the two forms of parity proposing disparate learning processes (DLPs). Here, the learning algorithm can see group membership during training but produce a classifier that is group-blind at test time. In this paper, we show theoretically that: (i) When other features correlate to group membership, DLPs will (indirectly) implement treatment disparity, undermining the policy desiderata they are designed to address; (ii) When group membership is partly revealed by other features, DLPs induce within-class discrimination; and (iii) In general, DLPs provide a suboptimal trade-off between accuracy and impact parity. Based on our technical analysis, we argue that transparent treatment disparity is preferable to occluded methods for achieving impact parity. Experimental results on several real-world datasets highlight the practical consequences of applying DLPs vs. per-group thresholds.
Detecting Statistical Interactions from Neural Network Weights
Tsang, Michael, Cheng, Dehua, Liu, Yan
Interpreting neural networks is a crucial and challenging task in machine learning. In this paper, we develop a novel framework for detecting statistical interactions captured by a feedforward multilayer neural network by directly interpreting its learned weights. Depending on the desired interactions, our method can achieve significantly better or similar interaction detection performance compared to the state-of-the-art without searching an exponential solution space of possible interactions. We obtain this accuracy and efficiency by observing that interactions between input features are created by the non-additive effect of nonlinear activation functions, and that interacting paths are encoded in weight matrices. We demonstrate the performance of our method and the importance of discovered interactions via experimental results on both synthetic datasets and real-world application datasets.
Replacement AutoEncoder: A Privacy-Preserving Algorithm for Sensory Data Analysis
Malekzadeh, Mohammad, Clegg, Richard G., Haddadi, Hamed
An increasing number of sensors on mobile, Internet of things (IoT), and wearable devices generate time-series measurements of physical activities. Though access to the sensory data is critical to the success of many beneficial applications such as health monitoring or activity recognition, a wide range of potentially sensitive information about the individuals can also be discovered through access to sensory data and this cannot easily be protected using traditional privacy approaches. In this paper, we propose a privacy-preserving sensing framework for managing access to time-series data in order to provide utility while protecting individuals' privacy. We introduce Replacement AutoEncoder, a novel algorithm which learns how to transform discriminative features of data that correspond to sensitive inferences, into some features that have been more observed in non-sensitive inferences, to protect users' privacy. This efficiency is achieved by defining a user-customized objective function for deep autoencoders. Our replacement method will not only eliminate the possibility of recognizing sensitive inferences, it also eliminates the possibility of detecting the occurrence of them. That is the main weakness of other approaches such as filtering or randomization. We evaluate the efficacy of the algorithm with an activity recognition task in a multi-sensing environment using extensive experiments on three benchmark datasets. We show that it can retain the recognition accuracy of state-of-the-art techniques while simultaneously preserving the privacy of sensitive information. Finally, we utilize the GANs for detecting the occurrence of replacement, after releasing data, and show that this can be done only if the adversarial network is trained on the users' original data.
Testing for Feature Relevance: The HARVEST Algorithm
Weisberg, Herbert, Pontes, Victor, Thoma, Mathis
Feature selection with high-dimensional data and a very small proportion of relevant features poses a severe challenge to standard statistical methods. We have developed a new approach (HARVEST) that is straightforward to apply, albeit somewhat computer-intensive. This algorithm can be used to pre-screen a large number of features to identify those that are potentially useful. The basic idea is to evaluate each feature in the context of many random subsets of other features. HARVEST is predicated on the assumption that an irrelevant feature can add no real predictive value, regardless of which other features are included in the subset. Motivated by this idea, we have derived a simple statistical test for feature relevance. Empirical analyses and simulations produced so far indicate that the HARVEST algorithm is highly effective in predictive analytics, both in science and business.
Linear Optimal Low Rank Projection for High-Dimensional Multi-Class Data
Vogelstein, Joshua T., Tang, Minh, Bridgeford, Eric, Zheng, Da, Burns, Randal, Maggioni, Mauro
Classifying samples into categories becomes intractable when a single sample can have millions to billions of features, such as in genetics or imaging data. Principal Components Analysis (PCA) is widely used to identify a low-dimensional representation of such features for further analysis. However, PCA ignores class labels, such as whether or not a subject has cancer, thereby discarding information that could substantially improve downstream classification performance. We describe an approach, "Linear Optimal Low-rank" projection (LOL), which extends PCA by incorporating the class labels in a fashion that is advantageous over existing supervised dimensionality reduction techniques. We prove, and substantiate with synthetic experiments, that LOL leads to a better representation of the data for subsequent classification than other linear approaches, while adding negligible computational cost. We then demonstrate that LOL substantially outperforms PCA in differentiating cancer patients from healthy controls using genetic data, and in differentiating gender using magnetic resonance imaging data with $>$500 million features and 400 gigabytes of data. LOL therefore allows the solution of previous intractable problems, yet requires only a few minutes to run on a desktop computer.
DropLasso: A robust variant of Lasso for single cell RNA-seq data
Khalfaoui, Beyrem, Vert, Jean-Philippe
Single-cell RNA sequencing (scRNA-seq) is a fast growing approach to measure the genome-wide transcriptome of many individual cells in parallel, but results in noisy data with many dropout events. Existing methods to learn molecular signatures from bulk transcriptomic data may therefore not be adapted to scRNA-seq data, in order to automatically classify individual cells into predefined classes. We propose a new method called DropLasso to learn a molecular signature from scRNA-seq data. DropLasso extends the dropout regularisation technique, popular in neural network training, to esti- mate sparse linear models. It is well adapted to data corrupted by dropout noise, such as scRNA-seq data, and we clarify how it relates to elastic net regularisation. We provide promising results on simulated and real scRNA-seq data, suggesting that DropLasso may be better adapted than standard regularisa- tions to infer molecular signatures from scRNA-seq data.
On b-bit min-wise hashing for large-scale regression and classification with sparse data
Shah, Rajen D., Meinshausen, Nicolai
Large-scale regression problems where both the number of variables, $p$, and the number of observations, $n$, may be large and in the order of millions or more, are becoming increasingly more common. Typically the data are sparse: only a fraction of a percent of the entries in the design matrix are non-zero. Nevertheless, often the only computationally feasible approach is to perform dimension reduction to obtain a new design matrix with far fewer columns and then work with this compressed data. $b$-bit min-wise hashing (Li and Konig, 2011) is a promising dimension reduction scheme for sparse matrices which produces a set of random features such that regression on the resulting design matrix approximates a kernel regression with the resemblance kernel. In this work, we derive bounds on the prediction error of such regressions. For both linear and logistic models we show that the average prediction error vanishes asymptotically as long as $q \|\beta^*\|_2^2 /n \rightarrow 0$, where $q$ is the average number of non-zero entries in each row of the design matrix and $\beta^*$ is the coefficient of the linear predictor. We also show that ordinary least squares or ridge regression applied to the reduced data can in fact allow us fit more flexible models. We obtain non-asymptotic prediction error bounds for interaction models and for models where an unknown row normalisation must be applied in order for the signal to be linear in the predictors.
Enhancing The Reliability of Out-of-distribution Image Detection in Neural Networks
Liang, Shiyu, Li, Yixuan, Srikant, R.
We consider the problem of detecting out-of-distribution images in neural networks. We propose ODIN, a simple and effective method that does not require any change to a pre-trained neural network. Our method is based on the observation that using temperature scaling and adding small perturbations to the input can separate the softmax score distributions between in-and out-of-distribution images, allowing for more effective detection. We show in a series of experiments that ODIN is compatible with diverse network architectures and datasets. It consistently outperforms the baseline approach (Hendrycks & Gimpel, 2017) by a large margin, establishing a new state-of-the-art performance on this task. For example, ODIN reduces the false positive rate from the baseline 34.7% to 4.3% on the DenseNet (applied to CIFAR-10 and Tiny-ImageNet) when the true positive rate is 95%.
PSO-based Fuzzy Markup Language for Student Learning Performance Evaluation and Educational Application
Lee, Chang-Shing, Wang, Mei-Hui, Wang, Chi-Shiang, Teytaud, Olivier, Liu, Jialin, Lin, Su-Wei, Hung, Pi-Hsia
This paper proposes an agent with particle swarm optimization (PSO) based on a Fuzzy Markup Language (FML) for students learning performance evaluation and educational applications, and the proposed agent is according to the response data from a conventional test and an item response theory. First, we apply a GS-based parameter estimation mechanism to estimate the items parameters according to the response data, and then to compare its results with those of an IRT-based Bayesian parameter estimation mechanism. In addition, we propose a static-IRT test assembly mechanism to assemble a form for the conventional test. The presented FML-based dynamic assessment mechanism infers the probability of making a correct response to the item for a student with various abilities. Moreover, this paper also proposes a novel PFML learning mechanism for optimizing the parameters between items and students. Finally, we adopt a K-fold cross validation mechanism to evaluate the performance of the proposed agent. Experimental results show that the novel PFML learning mechanism for the parameter estimation and learning optimization performs favorably. We believe the proposed PFML will be a reference for education research and pedagogy and an important co-learning mechanism for future human-machine educational applications.
An investigation of the classifiers to detect android malicious apps
Sharma, Ashu, Sahay, Sanjay K.
Android devices are growing exponentially and are connected through the internet accessing billion of online websites. The popularity of these devices encourages malware developer to penetrate the market with malicious apps to annoy and disrupt the victim. Although, for the detection of malicious apps different approaches are discussed. However, proposed approaches are not suffice to detect the advanced malware to limit/prevent the damages. In this, very few approaches are based on opcode occurrence to classify the malicious apps. Therefore, this paper investigates the five classifiers using opcodes occurrence as the prominent features for the detection of malicious apps. For the analysis, we use WEKA tool and found that FT detection accuracy ( 79.27%) is best among the investigated classifiers. However, true positives rate i.e. malware detection rate is highest ( 99.91%) by RF and fluctuate least with the different number of prominent features compared to other studied classifiers. The analysis shows that overall accuracy is majorly affected by the false positives of the classifier.