Accuracy
Classification Uncertainty of Deep Neural Networks Based on Gradient Information
Oberdiek, Philipp, Rottmann, Matthias, Gottschalk, Hanno
We study the quantification of uncertainty of Convolutional Neural Networks (CNNs) based on gradient metrics. Unlike the classical softmax entropy, such metrics gather information from all layers of the CNN. We show for the (E)MNIST data set that for several such metrics we achieve the same meta classification accuracy as for entropy thresholding, where meta classification is the task of classifying correctly predicted labels as correct and incorrectly predicted ones as incorrect without knowing the actual label. Meta classification rates for out-of-sample images can be increased when using entropy together with several gradient-based metrics as input quantities for a meta classifier. This shows that our gradient-based metrics do not contain the same information as the entropy. We also apply meta classification to concepts not used during training: EMNIST/Omniglot letters, CIFAR10 and noise. Meta classifiers trained only on the uncertainty metrics of classes available during training usually do not perform equally well on all of the unknown concepts (letters, CIFAR10 and uniform noise). However, if we allow the meta classifier to be trained on uncertainty metrics that include some samples of some or all of these categories, meta classification for concepts remote from MNIST digits can be improved considerably.
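The two kinds of uncertainty metric compared above can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the authors' code: the gradient metric is taken only with respect to the final logits (the abstract describes gradients gathered from all layers), the predicted label stands in for the unknown true label, and the toy logits and the 0.5 entropy threshold are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def logit_gradient_norm(probs: Sequence[float]) -> float:
    """L2 norm of the gradient of -log p_k with respect to the logits, where k is the
    predicted class (the true label is unknown at test time). That gradient is p - e_k."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return math.sqrt(sum((p - (1.0 if i == k else 0.0)) ** 2 for i, p in enumerate(probs)))


def meta_accuracy(scores: Sequence[float], correct: Sequence[bool], threshold: float) -> float:
    """Thresholding meta classifier: call a prediction 'correct' when its uncertainty
    score is below the threshold; return the fraction of predictions judged rightly."""
    hits = sum((s < threshold) == c for s, c in zip(scores, correct))
    return hits / len(scores)


# Toy example: four hypothetical logit vectors and their true labels.
batch = [[5.0, 0.0, 0.0], [0.0, 4.0, 0.0], [1.0, 1.0, 0.9], [0.2, 0.0, 0.1]]
labels = [0, 1, 2, 0]
probs = [softmax(z) for z in batch]
preds = [max(range(len(p)), key=p.__getitem__) for p in probs]
correct = [pred == y for pred, y in zip(preds, labels)]
ent = [entropy(p) for p in probs]
grad = [logit_gradient_norm(p) for p in probs]
print(meta_accuracy(ent, correct, threshold=0.5))  # 0.75
```

A meta classifier in the abstract's sense would instead feed several such metrics (for example `ent` and `grad` together) into a learned model rather than thresholding a single one.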
The Best of AI: New Articles Published This Month (April 2018)
We begin this journey through our favorite articles of the month by discovering and reading one of the most influential papers in natural language processing (NLP). Papers can be intimidating to read, which is why I liked the author's idea of combining screenshots from the original paper, explanations in plain English, and code snippets with an actual implementation of the paper in Python. The article can be read at different levels: from the presentation of encoders, decoders and the attention function to real-world examples, including the use of regularization and GPU training. Berkeley, one of California's world-famous universities, launches its new data science program.
Why I've lost faith in p values
There has been a lot written over the past decade (and even longer) about problems associated with null hypothesis statistical testing (NHST) and p values. Personally, I have found most of these arguments unconvincing. However, one of the problems with p values has been gnawing at me for the past couple of years, and it has finally gotten to the point that I'm thinking about abandoning p values. Note: this has nothing to do with p-hacking (which is a huge but separate issue). Here's the problem in a nutshell: if you run 1000 experiments over the course of your career, and you get a significant effect (p < .05) in 95 of those experiments, you might expect that 5% of these 95 significant effects would be false positives.
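How many of the significant results are false positives depends on more than the significance level: it also depends on how many of the tested effects are real and on statistical power. The sketch below makes that arithmetic explicit. The share of real effects (100 of 1000) and the power (0.5) are hypothetical values, chosen only because they yield exactly 95 expected significant results; they are not taken from the post.

```python
def expected_outcomes(n_experiments: int, n_true_effects: int, alpha: float, power: float):
    """Expected counts of significant results coming from true nulls (false positives)
    and from real effects (true positives), plus the share of significant results
    that are false (the false discovery rate)."""
    n_null = n_experiments - n_true_effects
    false_pos = alpha * n_null          # nulls that reach p < alpha by chance
    true_pos = power * n_true_effects   # real effects that are detected
    false_discovery_rate = false_pos / (false_pos + true_pos)
    return false_pos, true_pos, false_discovery_rate


# Hypothetical scenario: 100 of 1000 experiments test a real effect, detected with power 0.5.
fp, tp, fdr = expected_outcomes(1000, 100, alpha=0.05, power=0.5)
print(f"{fp + tp:.0f} significant, {fp:.0f} false positives, FDR = {fdr:.3f}")
# 95 significant, 45 false positives, FDR = 0.474
```

Under these assumed numbers, nearly half of the 95 significant effects are false positives, not 5% of them.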
Minimax Lower Bounds for Cost Sensitive Classification
Kamalaruban, Parameswaran, Williamson, Robert C.
The central problem of this paper is the cost-sensitive binary classification problem, where different costs are associated with different types of mistakes. Several important machine learning applications, such as medical decision making, targeted marketing, and intrusion detection, can be naturally formalized as cost-sensitive classification problems [1]. In these domains, the cost of missing a target is much higher than that of a false positive, and classifiers that do not take misclassification costs into account perform poorly. The cost-sensitive classification problem has been studied extensively, and efficient algorithms with provable guarantees on the (generalization) error have been developed [6, 9, 26, 27, 11, 4]. These methods primarily take existing classification methods based on empirical risk minimization and adapt them in various ways to be sensitive to misclassification costs. Despite all these efforts, an understanding of the fundamental limits of this problem is still missing. In this paper, we study the hardness of this problem by obtaining minimax lower bounds. In particular, we are interested in understanding how the cost parameter influences the hardness, or complexity, of cost-sensitive classification.
Minimax Lower Bounds
Understanding the hardness or fundamental limits of a learning problem is important in practice for the following reasons:
- They give an estimate of the number of samples required for a learning algorithm to perform well.
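To make the role of the cost parameter concrete, here is a minimal Python sketch of one common parametrization (an assumption here, not necessarily the paper's): a false positive costs c and a false negative costs 1 - c, with c in (0, 1). Under this loss, the Bayes-optimal rule predicts the positive class exactly when eta(x) = P(Y = 1 | x) exceeds c, so a small c (cheap false alarms, expensive misses) lowers the decision threshold.

```python
from typing import Sequence


def cost_sensitive_loss(y_true: int, y_pred: int, c: float) -> float:
    """Assumed loss: c for a false positive, 1 - c for a false negative, 0 if correct."""
    if y_true == 0 and y_pred == 1:
        return c
    if y_true == 1 and y_pred == 0:
        return 1.0 - c
    return 0.0


def empirical_risk(y_true: Sequence[int], y_pred: Sequence[int], c: float) -> float:
    """Average cost-sensitive loss over a labelled sample."""
    return sum(cost_sensitive_loss(t, p, c) for t, p in zip(y_true, y_pred)) / len(y_true)


def bayes_decision(eta: float, c: float) -> int:
    """Predict 1 iff the expected cost of saying 1, (1 - eta) * c, is below the expected
    cost of saying 0, eta * (1 - c); this simplifies to eta > c."""
    return 1 if eta > c else 0


# Missing a target (false negative) costs 0.9, a false alarm costs 0.1.
c = 0.1
print(empirical_risk([1, 0, 1, 0], [0, 1, 1, 0], c))  # 0.25
print(bayes_decision(0.2, c), bayes_decision(0.2, 0.5))  # 1 0
```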
Datasheets for Datasets
Gebru, Timnit, Morgenstern, Jamie, Vecchione, Briana, Vaughan, Jennifer Wortman, Wallach, Hanna, Daumé, Hal III, Crawford, Kate
Currently there is no standard way to identify how a dataset was created, and what characteristics, motivations, and potential skews it represents. To begin to address this issue, we propose the concept of a datasheet for datasets, a short document to accompany public datasets, commercial APIs, and pretrained models. The goal of this proposal is to enable better communication between dataset creators and users, and help the AI community move toward greater transparency and accountability. By analogy, in computer hardware, it has become industry standard to accompany everything from the simplest components (e.g., resistors), to the most complex microprocessor chips, with datasheets detailing standard operating characteristics, test results, recommended usage, and other information. We outline some of the questions a datasheet for datasets should answer. These questions focus on when, where, and how the training data was gathered, its recommended use cases, and, in the case of human-centric datasets, information regarding the subjects' demographics and consent as applicable. We develop prototypes of datasheets for two well-known datasets: Labeled Faces in the Wild and the Pang & Lee Polarity Dataset.
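As a rough illustration of how the questions listed above could be tracked in machine-readable form, here is a Python sketch. The schema, its field names and the example values are hypothetical; the abstract proposes a datasheet as a short document answering these questions, not this data structure.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Datasheet:
    """Hypothetical datasheet schema mirroring the questions named in the abstract."""
    name: str
    motivation: str = ""            # why the dataset was created
    collected_when: str = ""        # when the data was gathered
    collected_where: str = ""       # where it was gathered
    collection_process: str = ""    # how it was gathered
    recommended_uses: List[str] = field(default_factory=list)
    # Only applicable to human-centric datasets:
    subject_demographics: Optional[str] = None
    consent_obtained: Optional[bool] = None

    def unanswered(self, human_centric: bool) -> List[str]:
        """Names of questions left blank; the demographic and consent fields are
        only required when the dataset is human-centric."""
        human_only = {"subject_demographics", "consent_obtained"}
        return [
            key
            for key, value in asdict(self).items()
            if value in (None, "", []) and (human_centric or key not in human_only)
        ]


sheet = Datasheet(name="toy-dataset", motivation="illustration only",
                  collection_process="manual annotation")
print(sheet.unanswered(human_centric=False))
# ['collected_when', 'collected_where', 'recommended_uses']
```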
Change Point Methods on a Sequence of Graphs
Zambon, Daniele, Alippi, Cesare, Livi, Lorenzo
The present paper considers a finite sequence of graphs, e.g., coming from technological, biological, and social networks, each of which is modelled as a realization of a graph-valued random variable, and proposes a methodology to identify possible changes in stationarity in its generating stochastic process. In order to cover a large class of applications, we consider a general family of attributed graphs, characterized by a possibly variable topology (edges and vertices) even in the stationary case. We propose a Change Point Method (CPM) approach that (i) maps graphs into a vector domain; (ii) applies a suitable statistical test; (iii) detects the change (if any) according to a confidence level and provides an estimate of its time of occurrence. Two specific CPMs are proposed: one detecting shifts in the distribution mean, the other addressing generic changes affecting the distribution. We ground our proposal with theoretical results showing how to relate the inference attained in the numerical vector space to the graph domain, and vice versa. Finally, simulations on epileptic-seizure detection problems are conducted on real-world data, providing evidence for the CPMs' effectiveness.
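The three CPM steps can be sketched on a toy problem. This Python sketch makes several assumptions that are not the paper's: each graph is mapped to a single number (its edge density) rather than to a learned vector embedding, the test statistic is a standardized difference of means maximized over candidate split points (so it only targets mean shifts), and the confidence level is enforced with a permutation test. The simulated stream is made up.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Graph = Tuple[int, List[Tuple[int, int]]]  # (number of vertices, undirected edge list)


def embed(graph: Graph) -> float:
    """Step (i), hypothetical map to a vector domain: edge density of a simple graph."""
    n, edges = graph
    possible = n * (n - 1) / 2
    return len(edges) / possible if possible else 0.0


def max_mean_shift(x: Sequence[float], min_segment: int = 2) -> Tuple[float, int]:
    """Step (ii): largest standardized before/after difference of means over all splits."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / (n - 1)
    best_stat, best_t = 0.0, min_segment
    for t in range(min_segment, n - min_segment + 1):
        diff = sum(x[:t]) / t - sum(x[t:]) / (n - t)
        stat = abs(diff) / math.sqrt(var * (1 / t + 1 / (n - t))) if var > 0 else 0.0
        if stat > best_stat:
            best_stat, best_t = stat, t
    return best_stat, best_t


def detect_change(graphs: Sequence[Graph], alpha: float = 0.05,
                  n_perm: int = 500, seed: int = 0) -> Tuple[Optional[int], float]:
    """Step (iii): return the estimated change time if the permutation p-value is below
    alpha (otherwise None), together with the p-value."""
    x = [embed(g) for g in graphs]
    observed, t_hat = max_mean_shift(x)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(x)
        rng.shuffle(shuffled)
        if max_mean_shift(shuffled)[0] >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)
    return (t_hat if p_value < alpha else None), p_value


def random_graph(n: int, p: float, rng: random.Random) -> Graph:
    """Erdos-Renyi graph G(n, p), used here only to simulate a stream."""
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]
    return n, edges


rng = random.Random(1)
stream = [random_graph(15, 0.1, rng) for _ in range(20)] + [random_graph(15, 0.4, rng) for _ in range(20)]
print(detect_change(stream))  # the simulated change is at index 20
```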
Spectral feature scaling method for supervised dimensionality reduction
Matsuda, Momo, Morikuni, Keiichi, Sakurai, Tetsuya
Spectral dimensionality reduction methods enable linear separations of complex data with high-dimensional features in a reduced space. However, these methods do not always give the desired results due to irregularities or uncertainties of the data. Thus, we consider aggressively modifying the scales of the features to obtain the desired classification. Using prior knowledge of the labels of some of the samples to specify the Fiedler vector, we formulate an eigenvalue problem of a linear matrix pencil whose eigenvector contains the feature scaling factors. The resulting factors can modify the features of all samples so that they form clusters in the reduced space according to the known labels. In this study, we propose new supervised dimensionality reduction methods that use the feature scaling associated with spectral clustering. Numerical experiments show that the proposed methods outperform well-established supervised methods on toy problems with more samples than features, and are more robust than existing methods with respect to clustering. The proposed methods also outperform existing methods in classification on real-world problems with more features than samples, namely gene expression profiles of cancer diseases. Furthermore, the feature scaling tends to improve the clustering and classification accuracies of existing unsupervised methods as the proportion of training data increases.
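For readers unfamiliar with the Fiedler vector: it is the eigenvector of the graph Laplacian L = D - W belonging to the second-smallest eigenvalue, and its signs split the samples into two clusters. The Python sketch below computes it for a small, made-up point set using a Gaussian similarity W and power iteration on cI - L with the constant vector projected out. It illustrates only this unsupervised building block; the paper's supervised feature-scaling eigenproblem (the matrix pencil) is not reproduced, and sigma, the iteration count and the points are arbitrary choices.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_similarity(points: Sequence[Sequence[float]], sigma: float = 1.0) -> Matrix:
    """Dense similarity W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) with a zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return W


def laplacian(W: Matrix) -> Matrix:
    """Unnormalized graph Laplacian L = D - W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)] for i in range(n)]


def fiedler_vector(L: Matrix, iters: int = 500) -> List[float]:
    """Power iteration on M = cI - L, restricted to vectors orthogonal to the all-ones vector.
    With c = 2 * max degree (a Gershgorin bound on the largest eigenvalue of L), M is positive
    semidefinite and its dominant eigenvector in that subspace is L's Fiedler vector."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))
    v = [i - (n - 1) / 2 for i in range(n)]  # deterministic start vector with zero mean
    for _ in range(iters):
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # project out the constant eigenvector
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]
f = fiedler_vector(laplacian(gaussian_similarity(points)))
print([1 if x > 0 else 0 for x in f])  # [0, 0, 0, 1, 1, 1]
```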
Using A Personalized Anomaly Detection Approach with Machine Learning to Detect Stolen Phones
Hu, Huizhong (Florida Institute of Technology) | Chan, Philip K. (Florida Institute of Technology)
We devise an anomaly detection system that detects stolen phones. In this system, we use a mining algorithm to extract sequential patterns from a user's past behavior to construct a personalized model. We then put forward scoring functions and threshold setting strategies to detect stealing events. We evaluate our approach with a data set from the MIT Reality Mining project. Experimental results indicate that our approach can detect 87% of simulated stealing events with an average false positive rate of 0.9%.
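The personalized-model idea can be illustrated with a deliberately simple stand-in. In this Python sketch, contiguous event bigrams counted from the owner's history replace the paper's sequential pattern mining algorithm, the score is the fraction of a window's bigrams that the owner has never produced, and the threshold (the owner's highest score on their own windows plus a margin) is one hypothetical threshold-setting strategy. The event names are invented.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def bigrams(events: Sequence[str]) -> List[Tuple[str, str]]:
    """Consecutive event pairs, a simplified stand-in for mined sequential patterns."""
    return list(zip(events, events[1:]))


class PersonalModel:
    """Per-user model built only from that user's past behavior."""

    def __init__(self, history: Sequence[str]):
        self.patterns = Counter(bigrams(history))

    def score(self, window: Sequence[str]) -> float:
        """Anomaly score in [0, 1]: share of the window's bigrams never seen for this user."""
        pairs = bigrams(window)
        if not pairs:
            return 0.0
        return sum(1 for p in pairs if p not in self.patterns) / len(pairs)


def set_threshold(model: PersonalModel, own_windows: Sequence[Sequence[str]],
                  margin: float = 0.1) -> float:
    """Hypothetical strategy: flag anything scoring above the user's own worst window plus a margin."""
    return max(model.score(w) for w in own_windows) + margin


routine = ["home", "bus", "work", "cafe", "work", "bus"] * 10
model = PersonalModel(routine)
threshold = set_threshold(model, [routine[i:i + 4] for i in range(0, len(routine) - 3, 4)])
for window in (["home", "bus", "work", "cafe"], ["mall", "station", "mall", "pawnshop"]):
    print(window, "flagged:", model.score(window) > threshold)
```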
The Detection of Medicare Fraud Using Machine Learning Methods with Excluded Provider Labels
Bauder, Richard A. (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University)
With the overall increase in the elderly population come additional, necessary medical needs and costs. Medicare is a U.S. healthcare program that provides insurance, primarily to individuals 65 years or older, to offload some of the financial burden associated with medical care. Even so, healthcare costs are high and continue to increase. Fraud is a major contributor to these rising healthcare expenses. Our paper provides a comprehensive study leveraging machine learning methods to detect fraudulent Medicare providers. We use publicly available Medicare data and provider exclusions for fraud labels to build and assess three different learners. To lessen the impact of class imbalance, given so few actual fraud labels, we employ random undersampling to create four class distributions. Our results show that the C4.5 decision tree and logistic regression learners have the best fraud detection performance, particularly for the 80:20 class distribution, with average AUC scores of 0.883 and 0.882, respectively, and low false negative rates. We successfully demonstrate the efficacy of employing machine learning with random undersampling to detect Medicare fraud.
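Random undersampling to a target class distribution is simple to express. This Python sketch assumes binary labels with 1 for fraud (the minority class) and 0 otherwise, and reads a distribution such as 80:20 as majority:minority in the sampled training set; the toy provider data and the seed are arbitrary.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_undersample(samples: Sequence[T], labels: Sequence[int], majority_pct: int = 80,
                       minority_pct: int = 20, seed: int = 0) -> Tuple[List[T], List[int]]:
    """Keep every minority (label 1) sample and a random subset of majority (label 0)
    samples so that the result has a majority_pct:minority_pct class ratio."""
    minority = [s for s, y in zip(samples, labels) if y == 1]
    majority = [s for s, y in zip(samples, labels) if y == 0]
    n_keep = min(len(majority), len(minority) * majority_pct // minority_pct)
    rng = random.Random(seed)
    kept = rng.sample(majority, n_keep)
    data = [(s, 0) for s in kept] + [(s, 1) for s in minority]
    rng.shuffle(data)
    return [s for s, _ in data], [y for _, y in data]


# Toy data: 1000 providers, 10 of them labelled as fraudulent.
providers = [f"provider_{i}" for i in range(1000)]
is_fraud = [1 if i % 100 == 0 else 0 for i in range(1000)]
X, y = random_undersample(providers, is_fraud)
print(len(y), sum(y))  # 50 10
```

Every fraud label is kept, so the imbalance is reduced only by discarding non-fraud samples.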
Learning Graph Embeddings on Constant-Curvature Manifolds for Change Detection in Graph Streams
Grattarola, Daniele, Zambon, Daniele, Alippi, Cesare, Livi, Lorenzo
The space of graphs is characterized by a non-trivial geometry, which often complicates performing inference in practical applications. A common approach is to use embedding techniques to represent graphs as points in a conventional Euclidean space, but non-Euclidean spaces are often better suited for embedding graphs. Among these, constant curvature manifolds (CCMs), like hyperspheres and hyperboloids, offer a computationally tractable way to compute metric, yet non-Euclidean, geodesic distances. In this paper, we introduce a novel adversarial graph embedding technique to represent graphs on CCMs, and exploit such a mapping for detecting changes in stationarity in a graph-generating process. To this end, we introduce a novel family of change detection tests operating by means of distances on CCMs. We perform experiments on synthetic graph streams, and on sequences of functional networks extracted from iEEG data with the aim of detecting the onset of epileptic seizures. We show that our methods are able to detect extremely small changes in the graph-generating process, consistently outperforming solutions based on Euclidean embeddings. The general nature of our framework highlights its potential to be extended to other applications characterized by graph data or non-Euclidean geometries.
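The geodesic distances on constant curvature manifolds have closed forms. This Python sketch uses the standard models, which may differ in detail from the paper's: a hypersphere of radius r with d(x, y) = r * arccos(<x, y> / r^2), and the hyperboloid model of hyperbolic space with d(x, y) = r * arccosh(-<x, y>_L / r^2), where <x, y>_L = -x0*y0 + sum_i xi*yi is the Minkowski inner product. The function that lifts a Euclidean vector onto the hyperboloid is an illustrative addition.

```python
import math
from typing import List, Sequence


def sphere_distance(x: Sequence[float], y: Sequence[float], r: float = 1.0) -> float:
    """Great-circle distance between two points on the sphere of radius r (curvature 1/r^2)."""
    cos_angle = sum(a * b for a, b in zip(x, y)) / r ** 2
    return r * math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp against rounding error


def minkowski_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Lorentzian inner product -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperboloid_distance(x: Sequence[float], y: Sequence[float], r: float = 1.0) -> float:
    """Geodesic distance on the hyperboloid -x0^2 + sum_i xi^2 = -r^2 (curvature -1/r^2)."""
    return r * math.acosh(max(1.0, -minkowski_dot(x, y) / r ** 2))


def lift_to_hyperboloid(v: Sequence[float], r: float = 1.0) -> List[float]:
    """Place a Euclidean vector v on the upper sheet of the hyperboloid by solving for x0."""
    return [math.sqrt(r ** 2 + sum(a * a for a in v))] + list(v)


print(sphere_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # pi / 2, about 1.5708
origin = lift_to_hyperboloid([0.0, 0.0])
point = [math.cosh(1.5), math.sinh(1.5), 0.0]
print(hyperboloid_distance(origin, point))  # about 1.5
```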