Unsupervised or Indirectly Supervised Learning
Learning with Local and Global Consistency
We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive in- ference. A principled approach to semi-supervised learning is to design a classifying function which is suf(cid:2)ciently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of clas- si(cid:2)cation problems and demonstrates effective use of unlabeled data.
Semi-Supervised Learning with Trees
We describe a nonparametric Bayesian approach to generalizing from few labeled examples, guided by a larger set of unlabeled objects and the assumption of a latent tree-structure to the domain. The tree (or a distribution over trees) may be inferred using the unlabeled data. A prior over concepts generated by a mutation process on the inferred tree(s) allows efficient computation of the optimal Bayesian classification func- tion from the labeled examples. We test our approach on eight real-world datasets.
A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
We consider the situation in semi-supervised learning, where the "label sampling" mechanism stochastically depends on the true response (as well as potentially on the features). We suggest a method of moments for estimating this stochastic dependence using the unlabeled data. This is potentially useful for two distinct purposes: a. As an input to a super- vised learning procedure which can be used to "de-bias" its results using labeled data only and b. We present several examples to illustrate the practical usefulness of our method.
Sampling Methods for Unsupervised Learning
We present an algorithm to overcome the local maxima problem in es- timating the parameters of mixture models. It combines existing ap- proaches from both EM and a robust fitting algorithm, RANSAC, to give a data-driven stochastic learning scheme. Minimal subsets of data points, sufficient to constrain the parameters of the model, are drawn from pro- posal densities to discover new regions of high likelihood. The proposal densities are learnt using EM and bias the sampling toward promising solutions. The algorithm is computationally efficient, as well as effective at escaping from local maxima. We compare it with alternative methods, including EM and RANSAC, on both challenging synthetic data and the computer vision problem of alpha-matting.
Semi-supervised Learning via Gaussian Processes
We present a probabilistic approach to learning a Gaussian Process classifier in the presence of unlabeled data. Our approach involves a "null category noise model" (NCNM) inspired by ordered cate- gorical noise models. The noise model reflects an assumption that the data density is lower between the class-conditional densities. We illustrate our approach on a toy problem and present compar- ative results for the semi-supervised classification of handwritten digits.
Semi-supervised Learning on Directed Graphs
Given a directed graph in which some of the nodes are labeled, we inves- tigate the question of how to exploit the link structure of the graph to infer the labels of the remaining unlabeled nodes. To that extent we propose a regularization framework for functions de(cid:2)ned over nodes of a directed graph that forces the classi(cid:2)cation function to change slowly on densely linked subgraphs. A powerful, yet computationally simple classi(cid:2)cation algorithm is derived within the proposed framework. The experimental evaluation on real-world Web classi(cid:2)cation problems demonstrates en- couraging results that validate our approach.
Semi-supervised Learning by Entropy Minimization
We consider the semi-supervised learning problem, where a decision rule is to be learned from labeled and unlabeled data. In this framework, we motivate minimum entropy regularization, which enables to incorporate unlabeled data in the standard supervised learning. A series of experiments illustrates that the proposed solu- tion benefits from unlabeled data. The method challenges mixture mod- els when the data are sampled from the distribution class spanned by the generative model. The performances are definitely in favor of minimum entropy regularization when generative models are misspecified, and the weighting of unlabeled data provides robustness to the violation of the "cluster assumption".
Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms
In the context of binary classification, we define disagreement as a mea- sure of how often two independently-trained models differ in their clas- sification of unlabeled data. We explore the use of disagreement for error estimation and model selection. We call the procedure co-validation, since the two models effectively (in)validate one another by comparing results on unlabeled data, which we assume is relatively cheap and plen- tiful compared to labeled data. We show that per-instance disagreement is an unbiased estimate of the variance of error for that instance. We also show that disagreement provides a lower bound on the prediction (gen- eralization) error, and a tight upper bound on the "variance of prediction error", or the variance of the average error across instances, where vari- ance is measured across training sets.
Semi-supervised Learning with Penalized Probabilistic Clustering
While clustering is usually an unsupervised operation, there are circum- stances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is proba- bilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the prefer- ences.
Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine la- beled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonpara- metric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is compu- tationally feasible for large datasets.