Goto

Collaborating Authors

 Statistical Learning


Optimal Kernel Shapes for Local Linear Regression

Neural Information Processing Systems

Local linear regression performs very well in many low-dimensional forecasting problems. In high-dimensional spaces, its performance typically decays due to the well-known "curse-of-dimensionality". A possible way to approach this problem is by varying the "shape" of the weighting kernel. In this work we suggest a new, data-driven method to estimating the optimal kernel shape. Experiments using anartificially generated data set and data from the UC Irvine repository show the benefits of kernel shaping. 1 Introduction Local linear regression has attracted considerable attention in both statistical and machine learning literature as a flexible tool for nonparametric regression analysis [Cle79, FG96, AMS97]. Like most statistical smoothing approaches, local modeling suffers from the so-called "curse-of-dimensionality", the well-known fact that the proportion of the training data that lie in a fixed-radius neighborhood of a point decreases to zero at an exponential rate with increasing dimension of the input space.


Large Margin DAGs for Multiclass Classification

Neural Information Processing Systems

We present a new learning architecture: the Decision Directed Acyclic Graph (DDAG), which is used to combine many two-class classifiers into a multiclass classifier. For an N -class problem, the DDAG contains N(N-1)/2 classifiers, one for each pair of classes. We present a VC analysis of the case when the node classifiers are hyperplanes; the resulting boundon the test error depends on N and on the margin achieved at the nodes, but not on the dimension of the space. This motivates an algorithm, DAGSVM, which operates in a kernel-induced feature space and uses two-class maximal margin hyperplanes at each decision-node of the DDAG. The DAGSVM is substantially faster to train and evaluate thaneither the standard algorithm or Max Wins, while maintaining comparable accuracy to both of these algorithms. 1 Introduction The problem of multiclass classificatIon, especially for systems like SVMs, doesn't present an easy solution. It is generally simpler to construct classifier theory and algorithms for two mutually-exclusive classes than for N mutually-exclusive classes.


Maximum Entropy Discrimination

Neural Information Processing Systems

We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involvedistributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under thisclass and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework.


A Geometric Interpretation of v-SVM Classifiers

Neural Information Processing Systems

We show that the recently proposed variant of the Support Vector machine (SVM) algorithm, known as v-SVM, can be interpreted as a maximal separation between subsets of the convex hulls of the data, which we call soft convex hulls. The soft convex hulls are controlled by choice of the parameter v. If the intersection of the convex hulls is empty, the hyperplane is positioned halfway between them such that the distance between convex hulls, measured along the normal, is maximized; and if it is not, the hyperplane's normal is similarly determined by the soft convex hulls, but its position (perpendicular distance from the origin) is adjusted to minimize the error sum. The proposed geometric interpretation of v-SVM also leads to necessary and sufficient conditions for the existence of a choice of v for which the v-SVM solution is nontrivial. 1 Introduction Recently, SchOlkopf et al. [I) introduced a new class of SVM algorithms, called v-SVM, for both regression estimation and pattern recognition. The basic idea is to remove the user-chosen error penalty factor C that appears in SVM algorithms by introducing a new variable p which, in the pattern recognition case, adds another degree of freedom to the margin.


Support Vector Method for Novelty Detection

Neural Information Processing Systems

Suppose you are given some dataset drawn from an underlying probability distributionP and you want to estimate a "simple" subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified



Constructing Heterogeneous Committees Using Input Feature Grouping: Application to Economic Forecasting

Neural Information Processing Systems

Yuansong Liao and John Moody Department of Computer Science, Oregon Graduate Institute, P.O.Box 91000, Portland, OR 97291-1000 Abstract The committee approach has been proposed for reducing model uncertainty and improving generalization performance. The advantage ofcommittees depends on (1) the performance of individual members and (2) the correlational structure of errors between members. This paper presents an input grouping technique for designing aheterogeneous committee. With this technique, all input variables are first grouped based on their mutual information. Statistically similarvariables are assigned to the same group.



Image Representations for Facial Expression Coding

Neural Information Processing Systems

The Facial Action Coding System (FACS) (9) is an objective method for quantifying facial movement in terms of component actions. This system is widely used in behavioral investigations of emotion, cognitive processes, and social interaction. The coding ispresently performed by highly trained human experts. This paper explores and compares techniques for automatically recognizing facialactions in sequences of images. These methods include unsupervised learning techniques for finding basis images such as principal component analysis, independent component analysis and local feature analysis, and supervised learning techniques such as Fisher's linear discriminants.


A SNoW-Based Face Detector

Neural Information Processing Systems

A novel learning approach for human face detection using a network of linear units is presented. The SNoW learning architecture is a sparse network of linear functions over a predefined or incrementally learnedfeature space and is specifically tailored for learning in the presence of a very large number of features. A wide range of face images in different poses, with different expressions and under different lighting conditions are used as a training set to capture the variations of human faces. Experimental results on commonly used benchmark data sets of a wide range of face images show that the SNoW-based approach outperforms methods that use neural networks, Bayesian methods, support vector machines and others. Furthermore,learning and evaluation using the SNoW-based method are significantly more efficient than with other methods. 1 Introduction Growing interest in intelligent human computer interactions has motivated a recent surge in research on problems such as face tracking, pose estimation, face expression and gesture recognition. Most methods, however, assume human faces in their input images have been detected and localized.