 Unsupervised or Indirectly Supervised Learning


Unsupervised Learning of Mixtures of Multiple Causes in Binary Data

Neural Information Processing Systems

This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the standard mixture model, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. A crucial issue is the mixing function for combining beliefs from different cluster centers in order to generate data reconstructions whose errors are minimized both during recognition and learning. We demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer an alternative form of the nonlinearity. Results are presented demonstrating the algorithm's ability to successfully discover coherent multiple causal representations of noisy test data and in images of printed characters.
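To make the idea of a mixing function concrete, here is a minimal sketch contrasting a weighted sum followed by a sigmoid with a noisy-OR combination. The abstract does not name the alternative nonlinearity, so the noisy-OR form, the function names, and the zero bias are assumptions chosen for illustration only; this is not a reproduction of the paper's argument or algorithm.

```python
import math

def sigmoid_mix(m, c, bias=0.0):
    """Weighted sum of cause assertions followed by sigmoid squashing.

    m: activity of each hidden cause (hypothetical name)
    c: c[k][j] is the assertion of cause k about observable dimension j
    """
    n_dims = len(c[0])
    out = []
    for j in range(n_dims):
        total = sum(mk * ck[j] for mk, ck in zip(m, c)) + bias
        out.append(1.0 / (1.0 + math.exp(-total)))
    return out

def noisy_or_mix(m, c):
    """Noisy-OR mixing (an assumed illustrative alternative).

    A dimension is predicted 'on' unless every cause fails to turn it on.
    Expects m[k] and c[k][j] to lie in [0, 1].
    """
    n_dims = len(c[0])
    out = []
    for j in range(n_dims):
        p_off = 1.0
        for mk, ck in zip(m, c):
            p_off *= 1.0 - mk * ck[j]
        out.append(1.0 - p_off)
    return out

# Two causes, each asserting one of two binary dimensions.
causes = [[1.0, 0.0], [0.0, 1.0]]
both_on = noisy_or_mix([1.0, 1.0], causes)
none_on_sigmoid = sigmoid_mix([0.0, 0.0], causes)
```

With no cause active and a zero bias, the sigmoid form predicts 0.5 for every dimension, whereas the noisy-OR form predicts 0. Whether this matches the specific weakness the paper analyzes is not stated in the abstract.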


Using Unlabeled Data for Supervised Learning

Neural Information Processing Systems

Many classification problems have the property that the only costly part of obtaining examples is the class label. This paper suggests a simple method for using distribution information contained in unlabeled examples to augment labeled examples in a supervised training framework. Empirical tests show that the technique described in this paper can significantly improve the accuracy of a supervised learner when the learner is well below its asymptotic accuracy level.


Unsupervised Learning by Convex and Conic Coding

Neural Information Processing Systems

Unsupervised learning algorithms based on convex and conic encoders are proposed. The encoders find the closest convex or conic combination of basis vectors to the input. The learning algorithms produce basis vectors that minimize the reconstruction error of the encoders. The convex algorithm develops locally linear models of the input, while the conic algorithm discovers features. Both algorithms are used to model handwritten digits and compared with vector quantization and principal component analysis.
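As a rough illustration of what a conic encoder computes, the sketch below finds non-negative coefficients whose combination of fixed basis vectors is closest to an input in squared error. Projected gradient descent, the function name, the step size, and the iteration count are all assumptions made for this example; the abstract does not say how the encoders are optimized. A convex encoder would additionally constrain the coefficients to sum to one, which is not shown here.

```python
def conic_encode(x, basis, steps=500, lr=0.1):
    """Approximate the closest conic combination of basis vectors to x.

    Minimizes 0.5 * ||sum_i w[i] * basis[i] - x||^2 subject to w[i] >= 0
    using projected gradient descent (an assumed, simple solver).
    lr must be small enough for the given basis to converge.
    """
    k, d = len(basis), len(x)
    w = [0.0] * k
    for _ in range(steps):
        # Current reconstruction and residual.
        recon = [sum(w[i] * basis[i][j] for i in range(k)) for j in range(d)]
        resid = [recon[j] - x[j] for j in range(d)]
        # Gradient of the squared error with respect to each coefficient.
        grad = [sum(resid[j] * basis[i][j] for j in range(d)) for i in range(k)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[i] - lr * grad[i]) for i in range(k)]
    return w

# The second coordinate of the input is negative, so the non-negativity
# constraint keeps the second coefficient at zero.
coeffs = conic_encode([2.0, -1.0], [[1.0, 0.0], [0.0, 1.0]])
```

Learning the basis vectors themselves, which is the subject of the paper, would alternate an encoding step like this with an update of `basis` to reduce reconstruction error.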


Probabilistic Modeling for Face Orientation Discrimination: Learning from Labeled and Unlabeled Data

Neural Information Processing Systems

This paper presents probabilistic modeling methods to solve the problem of discriminating between five facial orientations with very little labeled data. The first model maintains no inter-pixel dependencies, the second model is capable of modeling a set of arbitrary pair-wise dependencies, and the last model allows dependencies only between neighboring pixels. We show that for all three of these models, the accuracy of the learned models can be greatly improved by augmenting a small number of labeled training images with a large set of unlabeled images using Expectation-Maximization. This is important because it is often difficult to obtain image labels, while many unlabeled images are readily available. Through a large set of empirical tests, we examine the benefits of unlabeled data for each of the models.
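The general pattern of combining labeled and unlabeled examples with Expectation-Maximization can be sketched for a model without inter-feature dependencies. The code below assumes binary features, a Bernoulli naive Bayes model, Laplace smoothing, and hypothetical function names; it is a generic illustration and not the paper's models or image representation.

```python
import math

def fit_nb(X, R, n_classes, alpha=1.0):
    """M-step: estimate priors and Bernoulli parameters from soft labels R."""
    d = len(X[0])
    counts = [sum(r[c] for r in R) for c in range(n_classes)]
    total = sum(counts)
    prior = [(counts[c] + alpha) / (total + alpha * n_classes)
             for c in range(n_classes)]
    theta = [[(sum(r[c] * x[j] for x, r in zip(X, R)) + alpha)
              / (counts[c] + 2.0 * alpha)
              for j in range(d)]
             for c in range(n_classes)]
    return prior, theta

def posterior(x, prior, theta):
    """E-step for one example: class posterior under independent features."""
    logp = []
    for c in range(len(prior)):
        lp = math.log(prior[c])
        for xj, t in zip(x, theta[c]):
            lp += math.log(t if xj else 1.0 - t)
        logp.append(lp)
    m = max(logp)  # subtract the max for numerical stability
    w = [math.exp(v - m) for v in logp]
    s = sum(w)
    return [v / s for v in w]

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, iters=20):
    """Fit on labeled data, then refine with soft labels for unlabeled data."""
    R_lab = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in y_lab]
    prior, theta = fit_nb(X_lab, R_lab, n_classes)
    for _ in range(iters):
        R_unl = [posterior(x, prior, theta) for x in X_unl]
        prior, theta = fit_nb(X_lab + X_unl, R_lab + R_unl, n_classes)
    return prior, theta

# Toy data: two labeled examples and a few unlabeled ones.
X_lab = [[1, 1, 0, 0], [0, 0, 1, 1]]
y_lab = [0, 1]
X_unl = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
prior, theta = semi_supervised_em(X_lab, y_lab, X_unl, n_classes=2)
```

Labeled examples keep their hard labels in every iteration, while unlabeled examples contribute fractional counts weighted by their current class posteriors.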


Generalized Model Selection for Unsupervised Learning in High Dimensions

Neural Information Processing Systems

We describe a Bayesian approach to model selection in unsupervised learning that determines both the feature set and the number of clusters. We then evaluate this scheme (based on marginal likelihood) and one based on cross-validated likelihood. For the Bayesian scheme we derive a closed-form solution of the marginal likelihood by assuming appropriate forms of the likelihood function and prior. Extensive experiments compare these approaches and all results are verified by comparison against ground truth. In these experiments the Bayesian scheme using our objective function gave better results than cross-validation.


Unsupervised Learning of Human Motion Models

Neural Information Processing Systems

This paper presents an unsupervised learning algorithm that can derive the probabilistic dependence structure of parts of an object (a moving human body in our examples) automatically from unlabeled data. The distinguishing feature of this work is that it is based on unlabeled data, i.e., the training features include both useful foreground parts and background clutter, and the correspondence between the parts and detected features is unknown. We use decomposable triangulated graphs to depict the probabilistic independence of parts, but the unsupervised technique is not limited to this type of graph. In the new approach, labeling of the data (part assignments) is taken as hidden variables and the EM algorithm is applied. A greedy algorithm is developed to select parts and to search for the optimal structure based on the differential entropy of these variables.


Semi-supervised MarginBoost

Neural Information Processing Systems

In many discrimination problems a large amount of data is available but only a small portion of it is labeled. This provides a strong motivation to improve or develop methods for semi-supervised learning. In this paper, boosting is generalized to this task within the optimization framework of MarginBoost. We extend the margin definition to unlabeled data and develop the gradient descent algorithm that corresponds to the resulting margin cost function. This meta-learning scheme can be applied to any base classifier able to benefit from unlabeled data.


Iterative Double Clustering for Unsupervised and Semi-Supervised Learning

Neural Information Processing Systems

We present a powerful meta-clustering technique called Iterative Double Clustering (IDC). The IDC method is a natural extension of the recent Double Clustering (DC) method of Slonim and Tishby that exhibited impressive performance on text categorization tasks [12]. Using synthetically generated data we empirically find that whenever the DC procedure is successful in recovering some of the structure hidden in the data, the extended IDC procedure can incrementally compute a significantly more accurate classification. IDC is especially advantageous when the data exhibits high attribute noise. Our simulation results also show the effectiveness of IDC in text categorization problems.


Probabilistic principles in unsupervised learning of visual structure: human data and a model

Neural Information Processing Systems

To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow's criterion of "suspicious coincidence" (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow's criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites.
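The ratio mentioned in the abstract, joint probability divided by the product of marginals, can be estimated from co-occurrence counts. The sketch below assumes each observation is represented as a set of fragment labels and uses a hypothetical function name; the data are made up for illustration and are not the experiment's stimuli.

```python
def suspicious_coincidence_ratio(observations, a, b):
    """Estimate P(a, b) / (P(a) * P(b)) from a list of fragment sets.

    Values above 1 mean a and b co-occur more often than independence
    would predict; a value of 1 corresponds to independence.
    """
    n = len(observations)
    p_a = sum(1 for obs in observations if a in obs) / n
    p_b = sum(1 for obs in observations if b in obs) / n
    p_ab = sum(1 for obs in observations if a in obs and b in obs) / n
    if p_a == 0.0 or p_b == 0.0:
        raise ValueError("both fragments must occur at least once")
    return p_ab / (p_a * p_b)

# A and B always appear together in half of the images.
images = [{"A", "B"}, {"A", "B"}, {"C"}, {"D"}]
ratio = suspicious_coincidence_ratio(images, "A", "B")  # 0.5 / (0.5 * 0.5) = 2.0
```

In this toy set the two fragments are perfectly predictive of each other, and the ratio of 2.0 reflects that they co-occur twice as often as independent fragments with the same marginals would.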


Adaptation and Unsupervised Learning

Neural Information Processing Systems

Adaptation is a ubiquitous neural and psychological phenomenon, with a wealth of instantiations and implications. Although a basic form of plasticity, it has, bar some notable exceptions, attracted computational theory of only one main variety. In this paper, we study adaptation from the perspective of factor analysis, a paradigmatic technique of unsupervised learning. We use factor analysis to re-interpret a standard view of adaptation, and apply our new model to some recent data on adaptation in the domain of face discrimination.