Accuracy
Result Analysis of the NIPS 2003 Feature Selection Challenge
Guyon, Isabelle, Gunn, Steve, Ben-Hur, Asa, Dror, Gideon
The NIPS 2003 workshops included a feature selection competition organized by the authors. We provided participants with five datasets from different application domains and called for classification results using a minimal number of features. The competition took place over a period of 13 weeks and attracted 78 research groups. Participants were asked to make online submissions on the validation and test sets, with performance on the validation set being presented immediately to the participant and performance on the test set presented to the participants at the workshop. In total, 1863 entries were made on the validation sets during the development period and 135 entries on all test sets for the final competition. The winners used a combination of Bayesian neural networks with ARD priors and Dirichlet diffusion trees. Other top entries used a variety of methods for feature selection, which combined filters and/or wrapper or embedded methods using Random Forests, kernel methods, or neural networks as a classification engine. The results of the benchmark (including the predictions made by the participants and the features they selected) and the scoring software are publicly available. The benchmark is available at www.nipsfsc.ecs.soton.ac.uk for post-challenge submissions to stimulate further research.
Synergistic Face Detection and Pose Estimation with Energy-Based Models
Osadchy, Margarita, Miller, Matthew L., LeCun, Yann
We describe a novel method for real-time, simultaneous multi-view face detection and facial pose estimation. The method employs a convolutional network to map face images to points on a manifold, parametrized by pose, and non-face images to points far from that manifold. This network is trained by optimizing a loss function of three variables: image, pose, and face/non-face label. We test the resulting system, in a single configuration, on three standard data sets - one for frontal pose, one for rotated faces, and one for profiles - and find that its performance on each set is comparable to previous multi-view face detectors that can only handle one form of pose variation. We also show experimentally that the system's accuracy on both face detection and pose estimation is improved by training for the two tasks together.
Large-Scale Prediction of Disulphide Bond Connectivity
Cheng, Jianlin, Vullo, Alessandro, Baldi, Pierre F.
The formation of disulphide bridges among cysteines is an important feature of protein structures. Here we develop new methods for the prediction of disulphide bond connectivity. We first build a large curated data set of proteins containing disulphide bridges and then use 2-Dimensional Recursive Neural Networks to predict bonding probabilities between cysteine pairs. These probabilities in turn lead to a weighted graph matching problem that can be addressed efficiently. We show how the method consistently achieves better results than previous approaches on the same validation data. In addition, the method can easily cope with chains with arbitrary numbers of bonded cysteines. Therefore, it overcomes one of the major limitations of previous approaches, which restrict predictions to chains containing no more than 10 oxidized cysteines. The method can be applied both when the bonded state of each cysteine is known and when it is unknown; in the latter case, the bonded state can be predicted with 85% precision and 90% recall. The method also yields an estimate for the total number of disulphide bridges in each chain.
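The matching step described above can be illustrated with a small sketch. It assumes the pairwise bonding probabilities are already available (here they are made-up numbers, not results from the paper) and finds the set of disjoint cysteine pairs with the largest total probability by exhaustive recursion. This brute-force search is only a stand-in for illustration; it is not the efficient matching procedure the abstract refers to, and the names `best_matching` and `example_probs` are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]


def best_matching(nodes: List[int], weight: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Return (total weight, pairs) of a maximum-weight matching.

    Exhaustive recursion: the first node is either left unmatched or
    paired with one of the remaining nodes. Exponential time, so this
    is suitable only for a handful of cysteines.
    """
    if len(nodes) < 2:
        return 0.0, []
    first, rest = nodes[0], nodes[1:]
    # Option 1: leave the first cysteine unbonded.
    best_score, best_pairs = best_matching(rest, weight)
    # Option 2: bond the first cysteine with each possible partner.
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_matching(remaining, weight)
        score += weight.get((first, partner), 0.0)
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


# Hypothetical bonding probabilities for four cysteines, indexed 0..3.
example_probs: Dict[Pair, float] = {
    (0, 1): 0.9, (0, 2): 0.2, (0, 3): 0.1,
    (1, 2): 0.3, (1, 3): 0.2, (2, 3): 0.8,
}

total, bonds = best_matching([0, 1, 2, 3], example_probs)
print(bonds)  # [(0, 1), (2, 3)]
```

For realistic chain lengths, a polynomial-time maximum-weight matching algorithm would replace the recursion; the input (pairwise probabilities) and output (a set of disjoint pairs) stay the same.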
Face Detection --- Efficient and Rank Deficient
Kienzle, Wolf, Franz, Matthias O., Schölkopf, Bernhard, Bakir, Gökhan H.
This paper proposes a method for computing fast approximations to support vector decision functions in the field of object detection. In the present approach we are building on an existing algorithm where the set of support vectors is replaced by a smaller, so-called reduced set of synthesized input space points. In contrast to the existing method that finds the reduced set via unconstrained optimization, we impose a structural constraint on the synthetic points such that the resulting approximations can be evaluated via separable filters. For applications that require scanning large images, this decreases the computational complexity by a significant amount. Experimental results show that in face detection, rank deficient approximations are 4 to 6 times faster than unconstrained reduced set systems.
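The benefit of separable filters can be seen in a minimal sketch, assuming the simplest case of a rank-one filter: if a 2D kernel is the outer product of a column vector `u` and a row vector `v`, sliding it over an image gives the same result as a horizontal pass with `v` followed by a vertical pass with `u`. For an h-by-w kernel this costs roughly h + w multiplications per output pixel instead of h * w. The function names and the toy image are hypothetical, and the code shows only this identity, not the paper's reduced set construction.

```python
from typing import List

Matrix = List[List[float]]


def correlate2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide a full 2D kernel over the image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def correlate_separable(image: Matrix, u: List[float], v: List[float]) -> Matrix:
    """Apply the rank-one kernel outer(u, v) as two 1D passes."""
    kh, kw = len(u), len(v)
    # Horizontal pass: filter every row with v.
    rows = [
        [sum(row[c + j] * v[j] for j in range(kw))
         for c in range(len(row) - kw + 1)]
        for row in image
    ]
    # Vertical pass: filter every column of the intermediate result with u.
    return [
        [sum(rows[r + i][c] * u[i] for i in range(kh))
         for c in range(len(rows[0]))]
        for r in range(len(rows) - kh + 1)
    ]


# Toy integer data so that both paths agree exactly.
image = [
    [1, 2, 0, 3],
    [4, 1, 1, 0],
    [2, 5, 3, 1],
    [0, 2, 4, 2],
]
u = [1, 2, 1]
v = [1, 0, -1]
kernel = [[a * b for b in v] for a in u]  # outer product of u and v

assert correlate2d_valid(image, kernel) == correlate_separable(image, u, v)
```

A kernel of rank r can be written as a sum of r such outer products, so the same two-pass trick applies term by term.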
Supervised Graph Inference
Vert, Jean-Philippe, Yamanishi, Yoshihiro
We formulate the problem of graph inference, where part of the graph is known, as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data.
Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)
Chellapilla, Kumar, Simard, Patrice Y.
Machine learning is often used to automatically solve human tasks. In this paper, we look for tasks where machine learning algorithms are not as good as humans with the hope of gaining insight into their current limitations. We studied various Human Interactive Proofs (HIPs) on the market, because they are systems designed to tell computers and humans apart by posing challenges presumably too hard for computers. We found that most HIPs are pure recognition tasks which can easily be broken using machine learning.
Confidence Intervals for the Area Under the ROC Curve
Cortes, Corinna, Mohri, Mehryar
In many applications, good ranking is a highly desirable performance for a classifier. The criterion commonly used to measure the ranking quality of a classification algorithm is the area under the ROC curve (AUC). To report it properly, it is crucial to determine an interval of confidence for its value. This paper provides confidence intervals for the AUC based on a statistical and combinatorial analysis using only simple parameters such as the error rate and the number of positive and negative examples. The analysis is distribution-independent: it makes no assumption about the distribution of the scores of negative or positive examples. The results are of practical use and can be viewed as the equivalent for AUC of the standard confidence intervals given in the case of the error rate. They are compared with previous approaches in several standard classification tasks, demonstrating the benefits of our analysis.
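As a reminder of the quantity being bounded, the AUC can be computed as the fraction of positive-negative pairs that the classifier ranks correctly (the Wilcoxon-Mann-Whitney statistic). The sketch below shows only this point estimate, with ties counted as half; it does not reproduce the confidence intervals derived in the paper. The function name `auc` and the scores are hypothetical.

```python
from typing import Sequence


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked in the right order.

    A pair counts 1 if the positive example scores higher, 0.5 on a tie,
    and 0 otherwise.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up classifier scores: 8 of the 9 pairs are ordered correctly.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]
print(auc(positives, negatives))
```

Because the estimate depends only on how positives and negatives are ordered relative to each other, it is natural for an analysis of it to avoid assumptions about the score distributions themselves.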
Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains
There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e., semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets, and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.