Inductive Learning
Semi-supervised Learning with Penalized Probabilistic Clustering
While clustering is usually an unsupervised operation, there are circum- stances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is proba- bilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the prefer- ences.
Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine la- beled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonpara- metric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is compu- tationally feasible for large datasets.
Learning Hyper-Features for Visual Identification
We address the problem of identifying specific instances of a class (cars) from a set of images all belonging to that class. Although we cannot build a model for any particular instance (as we may be provided with only one "training" example of it), we can use information extracted from observ- ing other members of the class. We pose this task as a learning problem, in which the learner is given image pairs, labeled as matching or not, and must discover which image features are most consistent for matching in- stances and discriminative for mismatches. We explore a patch based representation, where we model the distributions of similarity measure- ments defined on the patches. Finally, we describe an algorithm that selects the most salient patches based on a mutual information criterion.
Maximum Margin Semi-Supervised Learning for Structured Variables
Many real-world classification problems involve the prediction of multiple inter-dependent variables forming some structural depen- dency. Recent progress in machine learning has mainly focused on supervised classification of such structured variables. In this paper, we investigate structured classification in a semi-supervised setting. We present a discriminative approach that utilizes the intrinsic ge- ometry of input patterns revealed by unlabeled data points and we derive a maximum-margin formulation of semi-supervised learning for structured variables. Unlike transductive algorithms, our for- mulation naturally extends to new test points.
Structured Prediction via the Extragradient Method
We present a simple and scalable algorithm for large-margin estima- tion of structured models, including an important class of Markov net- works and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gra- dient and projection calculations. The projection step can be solved us- ing combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
Fast Gaussian Process Regression using KD-Trees
We consider (regression) estimation of a function x u(x) from noisy observations. If the data-generating process is not well understood, simple parametric learning algorithms, for example ones from the generalized linear model (GLM) family, may be hard to apply because of the difficulty of choosing good features. In contrast, the nonparametric Gaussian process (GP) model [19] offers a flexible and powerful alternative. However, a major drawback of GP models is that the computational cost of learning is about O(n 3), and the cost of making a single prediction is O(n), where n is the number of training examples. This high computational complexity severely limits its scalability to large problems, and we believe has proved a significant barrier to the wider adoption of the GP model.
Combining Graph Laplacians for Semi--Supervised Learning
A foundational problem in semi-supervised learning is the construction of a graph underlying the data. We propose to use a method which optimally combines a number of differently constructed graphs. We then compute an optimal combined kernel. This kernel solves an extended regularization problem which requires a joint minimization over both the data and the set of graph kernels. We present encouraging results on different OCR tasks where the optimal combined kernel is computed from graphs constructed with a variety of distances functions and the k ' in nearest neighbors.
Analysis of Spectral Kernel Design based Semi-supervised Learning
We consider a framework for semi-supervised learning using spectral decomposition based un-supervised kernel design. This approach sub- sumes a class of previously proposed semi-supervised learning methods on data graphs. We examine various theoretical properties of such meth- ods. In particular, we derive a generalization performance bound, and obtain the optimal kernel design by minimizing the bound. Based on the theoretical analysis, we are able to demonstrate why spectral kernel design based methods can often improve the predictive performance.
Active Learning for Misspecified Models
Active learning is the problem in supervised learning to design the loca- tions of training input points so that the generalization error is minimized. Existing active learning methods often assume that the model used for learning is correctly specified, i.e., the learning target function can be ex- pressed by the model at hand. In many practical situations, however, this assumption may not be fulfilled. In this paper, we first show that the ex- isting active learning method can be theoretically justified under slightly weaker condition: the model does not have to be correctly specified, but slightly misspecified models are also allowed. However, it turns out that the weakened condition is still restrictive in practice. To cope with this problem, we propose an alternative active learning method which can be theoretically justified for a wider class of misspecified models.
Boosting Structured Prediction for Imitation Learning
The Maximum Margin Planning (MMP) (Ratliff et al., 2006) algorithm solves imitation learning problems by learning linear mappings from features to cost functions in a planning domain. The learned policy is the result of minimum-cost planning using these cost functions. These mappings are chosen so that example policies (or trajectories) given by a teacher appear to be lower cost (with a lossscaled margin) than any other policy for a given planning domain. We provide a novel approach, M M P B O O S T, based on the functional gradient descent view of boosting (Mason et al., 1999; Friedman, 1999a) that extends MMP by "boosting" in new features. This approach uses simple binary classification or regression to improve performance of MMP imitation learning, and naturally extends to the class of structured maximum margin prediction problems.