Performance Analysis
A Large Deviation Bound for the Area Under the ROC Curve
Agarwal, Shivani, Graepel, Thore, Herbrich, Ralf, Roth, Dan
The area under the ROC curve (AUC) has been advocated as an evaluation criterion for the bipartite ranking problem. We study large deviation properties of the AUC; in particular, we derive a distribution-free large deviation bound for the AUC which serves to bound the expected accuracy of a ranking function in terms of its empirical AUC on an independent test sequence. A comparison of our result with a corresponding large deviation result for the classification error rate suggests that the test sample size required to obtain an ε-accurate estimate of the expected accuracy of a ranking function with δ-confidence is larger than that required to obtain an ε-accurate estimate of the expected error rate of a classification function with the same confidence. A simple application of the union bound allows the large deviation bound to be extended to learned ranking functions chosen from finite function classes.
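For orientation, the classification side of that comparison can be sketched with the standard two-sided Hoeffding inequality, which gives n ≥ ln(2/δ) / (2ε²) test examples for an ε-accurate error-rate estimate with confidence 1 − δ. This is a minimal sketch of that textbook bound only; it is not the AUC bound derived in the paper, and the function name `hoeffding_sample_size` is a hypothetical label chosen for illustration.

```python
import math


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Smallest n with 2 * exp(-2 * n * eps**2) <= delta.

    Standard Hoeffding bound for the empirical error rate of a fixed
    classifier on n i.i.d. test examples (illustrative helper, not the
    paper's AUC bound).
    """
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("eps and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # eps = 0.05, delta = 0.05: ln(40) / 0.005 is about 737.8, so n = 738
    print(hoeffding_sample_size(0.05, 0.05))
```

According to the abstract, the corresponding requirement for the AUC is larger than this figure at the same ε and δ.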
Supervised Graph Inference
Vert, Jean-philippe, Yamanishi, Yoshihiro
We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data.
Synergistic Face Detection and Pose Estimation with Energy-Based Models
Osadchy, Margarita, Miller, Matthew L., LeCun, Yann
We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets - one for frontal pose, one for rotated faces, and one for profiles - and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system's accuracy on both face detection and pose estimation is improved by training for the two tasks together.
Face Detection --- Efficient and Rank Deficient
Kienzle, Wolf, Franz, Matthias O., Schölkopf, Bernhard, Bakir, Gökhan H.
This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning large images, this decreases the computational complexity by a significant amount. Experimental results show that in face detection, rank deficient approximations are 4 to 6 times faster than unconstrained reduced set systems.
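The saving from separable filters can be illustrated with a small NumPy sketch. It assumes the simplest case, a rank-one h-by-w kernel written as an outer product of two 1-D vectors, and checks that two 1-D passes reproduce the full 2-D cross-correlation while needing roughly h + w rather than h * w multiplications per output pixel. The function names are hypothetical and the code is not taken from the paper; it only demonstrates the general separability property the abstract relies on.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Naive 'valid' 2-D cross-correlation: h*w multiplications per output pixel."""
    h, w = kernel.shape
    rows = image.shape[0] - h + 1
    cols = image.shape[1] - w + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(image[i:i + h, j:j + w] * kernel)
    return out


def correlate_separable_valid(image, u, v):
    """Same result for the rank-one kernel np.outer(u, v), using two 1-D passes."""
    # Horizontal pass: correlate every row with v.
    horiz = np.apply_along_axis(
        lambda row: np.correlate(row, v, mode="valid"), 1, image)
    # Vertical pass: correlate every column of the intermediate result with u.
    return np.apply_along_axis(
        lambda col: np.correlate(col, u, mode="valid"), 0, horiz)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.standard_normal((12, 10))
    u = rng.standard_normal(3)   # vertical 1-D filter (assumed example values)
    v = rng.standard_normal(4)   # horizontal 1-D filter (assumed example values)
    full = correlate2d_valid(image, np.outer(u, v))
    fast = correlate_separable_valid(image, u, v)
    print(np.allclose(full, fast))
```

A kernel of rank r can be handled the same way by summing r such separable passes, which pays off when r is small relative to the kernel size.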
Result Analysis of the NIPS 2003 Feature Selection Challenge
Guyon, Isabelle, Gunn, Steve, Ben-Hur, Asa, Dror, Gideon
The NIPS 2003 workshops included a feature selection competition organized by the authors. We provided participants with five datasets from different application domains and called for classification results using a minimal number of features. The competition took place over a period of 13 weeks and attracted 78 research groups. Participants were asked to make online submissions on the validation and test sets, with performance on the validation set being presented immediately to the participant and performance on the test set presented to the participants at the workshop. In total 1863 entries were made on the validation sets during the development period and 135 entries on all test sets for the final competition. The winners used a combination of Bayesian neural networks with ARD priors and Dirichlet diffusion trees. Other top entries used a variety of methods for feature selection, which combined filters and/or wrapper or embedded methods using Random Forests, kernel methods, or neural networks as a classification engine. The results of the benchmark (including the predictions made by the participants and the features they selected) and the scoring software are publicly available. The benchmark is available at www.nipsfsc.ecs.soton.ac.uk for post-challenge submissions to stimulate further research.
Confidence Intervals for the Area Under the ROC Curve
Cortes, Corinna, Mohri, Mehryar
In many applications, good ranking is a highly desirable property of a classifier. The criterion commonly used to measure the ranking quality of a classification algorithm is the area under the ROC curve (AUC). To report it properly, it is crucial to determine a confidence interval for its value. This paper provides confidence intervals for the AUC based on a statistical and combinatorial analysis using only simple parameters such as the error rate and the number of positive and negative examples. The analysis is distribution-independent: it makes no assumption about the distribution of the scores of negative or positive examples. The results are of practical use and can be viewed as the equivalent for AUC of the standard confidence intervals given in the case of the error rate. They are compared with previous approaches in several standard classification tasks, demonstrating the benefits of our analysis.
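As a reminder of the quantity these intervals are built around, the empirical AUC can be computed as the Wilcoxon-Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive example receives the higher score. The sketch below assumes the common convention of counting ties as one half; it computes only the point estimate, not the confidence intervals proposed in the paper, and `empirical_auc` is a hypothetical name.

```python
def empirical_auc(pos_scores, neg_scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 1/2)."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # tie: assumed convention, half credit
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # 5 of the 6 pairs are ordered correctly, so the AUC is 5/6.
    print(empirical_auc([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise loop costs O(mn) for m positives and n negatives; a rank-based computation after sorting gives the same value in O((m + n) log(m + n)).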
Distributed Information Regularization on Graphs
Corduneanu, Adrian, Jaakkola, Tommi S.
We provide a principle for semi-supervised learning based on optimizing the rate of communicating labels for unlabeled points with side information. The side information is expressed in terms of identities of sets of points or regions with the purpose of biasing the labels in each region to be the same. The resulting regularization objective is convex, has a unique solution, and the solution can be found with a pair of local propagation operations on graphs induced by the regions. We analyze the properties of the algorithm and demonstrate its performance on document classification tasks.