Accuracy
GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings
Sikora, Marek, Wróbel, Łukasz, Gudyś, Adam
Marek Sikora (a, b), Łukasz Wróbel (a, b), Adam Gudyś (a). (a) Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland; (b) Institute of Innovative Technologies, EMAG, Leopolda 31, 40-189 Katowice, Poland.
Abstract
This article presents GuideR, a user-guided rule induction algorithm that overcomes the largest limitation of existing methods: the inability to introduce user preferences or domain knowledge into the rule learning process. Automatic selection of attributes and attribute ranges often produces rules that contain no interesting information. We propose an induction algorithm that takes the user's requirements into account. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool.
Introduction
Sequential covering rule induction algorithms can be used for both predictive and descriptive purposes [1, 2, 3, 4]. Despite the development of increasingly sophisticated versions of those algorithms [5, 6], the main principle remains unchanged and involves two phases: rule growing and rule pruning. In the latter, some of the conditions added during growing are removed. Compared with other machine learning methods, rule sets obtained by sequential covering, also known as the separate-and-conquer (SnC) strategy, are characterized by good predictive as well as descriptive capabilities. Considering predictive capabilities alone, superior results can often be obtained using other methods. However, data models obtained this way are much less comprehensible than rule sets.
For descriptive rule learning, association rule induction [12, 13, 14] or subgroup discovery [15, 6] algorithms are applied. The former leads to a very large number of rules, which must then be limited by filtering according to rule interestingness measures [16, 17, 18]. Nevertheless, rule sets obtained by subgroup discovery are characterized by worse predictive abilities than those generated by the standard sequential covering approach. Therefore, if creating a prediction system with a comprehensible data model is the main objective, sequential covering rule induction algorithms are the most sensible solution.
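The separate-and-conquer principle described above (grow a rule, prune it, remove the covered examples, repeat) can be illustrated with a minimal sketch. This is a plain, unguided baseline for binary classification with categorical attributes, not GuideR itself; the function names, the precision-based quality measure, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], int]  # (attribute values, label: 1 = positive, 0 = negative)
Rule = Dict[str, str]                 # conjunction of "attribute == value" conditions


def covers(rule: Rule, x: Dict[str, str]) -> bool:
    """A rule covers an example if every one of its conditions holds."""
    return all(x.get(a) == v for a, v in rule.items())


def precision(rule: Rule, data: List[Example]) -> float:
    """Fraction of positives among covered examples (assumed quality measure)."""
    labels = [y for x, y in data if covers(rule, x)]
    return sum(labels) / len(labels) if labels else 0.0


def grow(data: List[Example]) -> Rule:
    """Growing phase: greedily add the condition that most improves precision."""
    rule: Rule = {}
    while precision(rule, data) < 1.0:
        # Candidate conditions come from examples the current rule still covers.
        candidates = sorted({(a, v) for x, _ in data if covers(rule, x)
                             for a, v in x.items() if a not in rule})
        if not candidates:
            break
        a, v = max(candidates, key=lambda c: precision({**rule, c[0]: c[1]}, data))
        if precision({**rule, a: v}, data) <= precision(rule, data):
            break  # no candidate improves the rule
        rule[a] = v
    return rule


def prune(rule: Rule, data: List[Example]) -> Rule:
    """Pruning phase: drop conditions whose removal does not lower precision."""
    pruned = dict(rule)
    for a in list(rule):
        trial = {k: v for k, v in pruned.items() if k != a}
        if precision(trial, data) >= precision(pruned, data):
            pruned = trial
    return pruned


def sequential_covering(data: List[Example]) -> List[Rule]:
    """Learn rules one at a time, removing covered examples after each rule."""
    rules: List[Rule] = []
    remaining = list(data)
    while any(y == 1 for _, y in remaining):
        rule = prune(grow(remaining), remaining)
        if not any(y == 1 and covers(rule, x) for x, y in remaining):
            break  # the rule covers no positive example; stop
        rules.append(rule)
        remaining = [(x, y) for x, y in remaining if not covers(rule, x)]
    return rules


data = [
    ({"outlook": "sunny", "windy": "no"}, 1),
    ({"outlook": "sunny", "windy": "yes"}, 1),
    ({"outlook": "rainy", "windy": "no"}, 1),
    ({"outlook": "rainy", "windy": "yes"}, 0),
]
rules = sequential_covering(data)
print(rules)
```

On this toy data the sketch learns two single-condition rules. A guided variant would additionally constrain which attributes or conditions the growing phase may or must use; that part is not shown here.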
ML-Leaks: Model and Data Independent Membership Inference Attacks and Defenses on Machine Learning Models
Salem, Ahmed, Zhang, Yang, Humbert, Mathias, Fritz, Mario, Backes, Michael
Machine learning (ML) has become a core component of many real-world applications, and training data is a key factor that drives current progress. This huge success has led Internet companies to deploy machine learning as a service (MLaaS). Recently, the first membership inference attack has shown that extraction of information on the training set is possible in such MLaaS settings, which has severe security and privacy implications. However, the early demonstrations of the feasibility of such attacks make many assumptions about the adversary, such as using multiple so-called shadow models, knowledge of the target model structure, and having a dataset from the same distribution as the target model's training data. We relax all 3 key assumptions, thereby showing that such attacks are very broadly applicable at low cost and pose a more severe risk than previously thought. We present the most comprehensive study so far on this emerging and developing threat using eight diverse datasets, which show the viability of the proposed attacks across domains. In addition, we propose the first effective defense mechanisms against such a broader class of membership inference attacks that maintain a high level of utility of the ML model.
Intentional Control of Type I Error over Unconscious Data Distortion: a Neyman-Pearson Approach to Text Classification
Xia, Lucy, Zhao, Richard, Wu, Yanhui, Tong, Xin
Digital texts have become an increasingly important source of data for social studies. However, textual data from open platforms are vulnerable to manipulation (e.g., censorship and information inflation), often leading to bias in subsequent empirical analysis. This paper investigates the problem of data distortion in text classification when controlling type I error (a relevant textual message is classified as irrelevant) is the priority. The default classical classification paradigm that minimizes the overall classification error can yield an undesirably large type I error, and data distortion exacerbates this situation. As a solution, we propose the Neyman-Pearson (NP) classification paradigm which minimizes type II error under a user-specified type I error constraint. Theoretically, we show that while the classical oracle (i.e., optimal classifier) cannot be recovered under unknown data distortion even if one has the entire post-distortion population, the NP oracle is unaffected by data distortion and can be recovered under the same condition. Empirically, we illustrate the advantage of NP classification methods in a case study that classifies posts about strikes and corruption published on a leading Chinese blogging platform.
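To make the Neyman-Pearson idea concrete, the following minimal sketch picks a score threshold so that the empirical type I error on held-out class-0 examples does not exceed a user-specified level alpha. It is a simple plug-in rule written for illustration, not the authors' method, and it gives no high-probability guarantee on the population type I error. The function names and the convention that a higher score means "predict class 1" are assumptions.

```python
import math
from typing import List


def np_threshold(class0_scores: List[float], alpha: float) -> float:
    """Smallest observed class-0 score t such that at most a fraction
    alpha of the class-0 scores lies strictly above t."""
    s = sorted(class0_scores)
    n = len(s)
    allowed = math.floor(alpha * n)   # class-0 examples we may misclassify
    k = max(n - allowed, 1)           # at least k scores must be <= t
    return s[k - 1]


def predict(score: float, threshold: float) -> int:
    """Predict class 1 only when the score exceeds the threshold."""
    return 1 if score > threshold else 0


def empirical_type1(class0_scores: List[float], threshold: float) -> float:
    """Fraction of class-0 examples wrongly predicted as class 1."""
    errors = sum(predict(s, threshold) for s in class0_scores)
    return errors / len(class0_scores)


class0 = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
t = np_threshold(class0, alpha=0.2)
print(t, empirical_type1(class0, t))
```

Choosing the smallest admissible threshold keeps the type II error as low as the constraint allows, since every class-1 example scoring above it is still detected.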
Accounting for the Neglected Dimensions of AI Progress
Martínez-Plumed, Fernando, Avin, Shahar, Brundage, Miles, Dafoe, Allan, hÉigeartaigh, Sean Ó, Hernández-Orallo, José
We analyze and reframe AI progress. In addition to the prevailing metrics of performance, we highlight the usually neglected costs paid in the development and deployment of a system, including: data, expert knowledge, human oversight, software resources, computing cycles, hardware and network facilities, development time, etc. These costs are paid throughout the life cycle of an AI system, fall differentially on different individuals, and vary in magnitude depending on the replicability and generality of the AI solution. The multidimensional performance and cost space can be collapsed to a single utility metric for a user with transitive and complete preferences. Even absent a single utility function, AI advances can be generically assessed by whether they expand the Pareto (optimal) surface. We explore a subset of these neglected dimensions using the two case studies of Alpha* and ALE. This broadened conception of progress in AI should lead to novel ways of measuring success in AI, and can help set milestones for future progress.
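The Pareto criterion mentioned above can be sketched in a few lines. The example below assumes each system is summarized as a tuple in which every dimension is oriented so that higher is better (costs are negated); the function names and the numbers are hypothetical and serve only to illustrate the dominance check.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]  # e.g. (performance, -compute_cost); higher is better


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b if it is no worse everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def expands_front(existing: List[Point], candidate: Point) -> bool:
    """A new system expands the front if no existing system dominates or equals it."""
    return not any(dominates(q, candidate) or tuple(q) == tuple(candidate)
                   for q in existing)


# Hypothetical systems: (accuracy, negated cost)
systems = [(0.90, -100.0), (0.80, -10.0), (0.70, -50.0)]
print(pareto_front(systems))
print(expands_front(systems, (0.85, -20.0)))
print(expands_front(systems, (0.75, -30.0)))
```

Under this view a system with lower accuracy than the best existing one can still count as progress, provided it is cheaper than every system that matches or beats its accuracy.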
What is ROC and AUC? – Vikrant Jain – Medium
ROC stands for Receiver Operating Characteristic. It originated in signal detection theory and is now heavily used by data miners, economists, and machine learning practitioners. It shows the trade-off between the true positive rate (TPR) and the false positive rate (FPR). We compare the actual labels against the predicted ones, compute the TPR and FPR at each decision threshold, and plot the resulting points. The curve we get is called the ROC curve.
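The procedure above can be sketched in plain Python: sort the examples by predicted score, sweep the threshold from high to low, and record one (FPR, TPR) point per distinct score. The area under the curve (AUC) then follows from the trapezoidal rule. The function names and the toy data are assumptions, and the sketch assumes both classes are present.

```python
from typing import List, Tuple


def roc_points(y_true: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last example sharing this score.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

For this toy data, 8 of the 9 positive-negative pairs are ranked correctly, so the AUC equals 8/9.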
Variable Selection for Nonparametric Learning with Power Series Kernels
Matsui, Kota, Kumagai, Wataru, Kanamori, Kenta, Nishikimi, Mitsuaki, Kanamori, Takafumi
In this paper, we propose a variable selection method for general nonparametric kernel-based estimation. The proposed method consists of two estimation stages: (1) construct a consistent estimator of the target function, and (2) approximate the estimator using a few variables by l1-type penalized estimation. We show that the proposed method can be applied to various kernel-based nonparametric estimators, such as kernel ridge regression and kernel-based density and density-ratio estimation. We prove that the proposed method is variable selection consistent when the power series kernel is used. This result can be regarded as an extension of the variable selection consistency of the non-negative garrote to kernel-based estimators. Several experiments, including simulation studies and real data applications, show the effectiveness of the proposed method.
A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models
Wang, Beilun, Sekhon, Arshdeep, Qi, Yanjun
We consider the problem of including additional knowledge in estimating sparse Gaussian graphical models (sGGMs) from aggregated samples, which arises often in bioinformatics and neuroimaging applications. Previous joint sGGM estimators either fail to use existing knowledge or cannot scale up to many tasks (large $K$) in a high-dimensional (large $p$) situation. In this paper, we propose a novel Joint Elementary Estimator incorporating additional Knowledge (JEEK) to infer multiple related sparse Gaussian graphical models from large-scale heterogeneous data. Using domain knowledge as weights, we design a novel hybrid norm as the minimization objective to enforce the superposition of two weighted sparsity constraints, one on the shared interactions and the other on the task-specific structural patterns. This enables JEEK to consider various forms of existing knowledge based on the domain at hand and avoids the need to design knowledge-specific optimization. JEEK is solved through a fast and entry-wise parallelizable solution that reduces the computational cost from the state-of-the-art $O(p^5K^4)$ to $O(p^2K^4)$. We conduct a rigorous statistical analysis showing that JEEK achieves the same convergence rate $O(\log(Kp)/n_{tot})$ as the state-of-the-art estimators that are much harder to compute. Empirically, on multiple synthetic datasets and two real-world datasets, JEEK is significantly faster than the state-of-the-art methods while achieving the same level of prediction accuracy.
Multi-layer Kernel Ridge Regression for One-class Classification
Gautam, Chandan, Tiwari, Aruna, Suresh, Sundaram, Iosifidis, Alexandros
In this paper, a multi-layer architecture for one-class classification is proposed, built by hierarchically stacking Kernel Ridge Regression (KRR) based Auto-Encoders and referred to as MKOC. MKOC uses multiple layers of Auto-Encoders to project the input features into a new feature space, and its last layer is a regression-based one-class classifier. The Auto-Encoders are trained in an unsupervised manner, while the final layer is trained in a semi-supervised manner (using only positive samples). The proposed MKOC is experimentally evaluated on 15 publicly available benchmark datasets. Experimental results verify the effectiveness of the proposed approach over 11 existing state-of-the-art kernel-based one-class classifiers. A Friedman test is also performed to verify the statistical significance of the claimed superiority of the proposed one-class classifier over the existing state-of-the-art methods.
Multiaccuracy: Black-Box Post-Processing for Fairness in Classification
Kim, Michael P., Ghorbani, Amirata, Zou, James
Machine learning predictors are successfully deployed in applications ranging from disease diagnosis, to predicting credit scores, to image recognition. Even when the overall accuracy is high, the predictions often have systematic biases that harm specific subgroups, especially subgroups that are minorities in the training data. We develop a rigorous framework of multiaccuracy auditing and post-processing to improve predictor accuracies across identifiable subgroups. Our algorithm, MultiaccuracyBoost, works in any setting where we have black-box access to a predictor and a relatively small set of labeled data for auditing. We prove guarantees on the convergence rate of the algorithm and show that it improves overall accuracy at each step. Importantly, if the initial model is accurate on an identifiable subgroup, then the post-processed model will be as well. We demonstrate the effectiveness of this approach on diverse applications in image classification, finance, and population health. MultiaccuracyBoost can improve subpopulation accuracy (e.g. for 'black women') even when the sensitive features (e.g. 'race', 'gender') are not known to the algorithm.
Assessing Generative Models via Precision and Recall
Sajjadi, Mehdi S. M., Bachem, Olivier, Lucic, Mario, Bousquet, Olivier, Gelly, Sylvain
Sylvain Gelly (Google Brain), Mario Lucic (Google Brain).
Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as Fréchet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since they yield one-dimensional scores. We propose a novel definition of precision and recall for distributions which disentangles the divergence into two separate dimensions. The proposed notion is intuitive, retains desirable properties, and naturally leads to an efficient algorithm that can be used to evaluate generative models. We relate this notion to total variation as well as to recent evaluation metrics such as Inception Score and FID. To demonstrate the practical utility of the proposed approach we perform an empirical study on several variants of Generative Adversarial Networks and the Variational Autoencoder. In an extensive set of experiments we show that the proposed metric is able to disentangle the quality of generated samples from the coverage of the target distribution.