Goto

Collaborating Authors

 Unsupervised or Indirectly Supervised Learning


A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context

Neural Information Processing Systems

Integrating semantic and syntactic analysis is essential for document analysis. Using an analogous reasoning, we present an approach that combines bag-of-words and spatial models to perform semantic and syntactic analysis for recognition of an object based on its internal appearance and its context. We argue that while object recognition requires modeling relative spatial locations of image features within the object, a bag-of-word is sufficient for representing context. Learning such a model from weakly labeled data involves labeling of features into two classes: foreground(object) or ''informative'' background(context). labeling. We present a ''shape-aware'' model which utilizes contour information for efficient and accurate labeling of features in the image. Our approach iterates between an MCMC-based labeling and contour based labeling of features to integrate co-occurrence of features and shape similarity.


A Rate Distortion Approach for Semi-Supervised Conditional Random Fields

Neural Information Processing Systems

We propose a novel information theoretic approach for semi-supervised learning of conditional random fields. Our approach defines a training objective that combines the conditional likelihood on labeled data and the mutual information on unlabeled data. Different from previous minimum conditional entropy semi-supervised discriminative learning methods, our approach can be naturally cast into the rate distortion theory framework in information theory. We analyze the tractability of the framework for structured prediction and present a convergent variational training algorithm to defy the combinatorial explosion of terms in the sum over label configurations. Our experimental results show that the rate distortion approach outperforms standard $l_2$ regularization and minimum conditional entropy regularization on both multi-class classification and sequence labeling problems.


Semi-supervised Learning using Sparse Eigenfunction Bases

Neural Information Processing Systems

We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices. It turns out that when the \emph{cluster assumption} holds, that is, when the high density regions are sufficiently separated by low density valleys, each high density area corresponds to a unique representative eigenvector. Linear combination of such eigenvectors (or, more precisely, of their Nystrom extensions) provide good candidates for good classification functions. By first choosing an appropriate basis of these eigenvectors from unlabeled data and then using labeled data with Lasso to select a classifier in the span of these eigenvectors, we obtain a classifier, which has a very sparse representation in this basis. Importantly, the sparsity appears naturally from the cluster assumption. Experimental results on a number of real-world data-sets show that our method is competitive with the state of the art semi-supervised learning algorithms and outperforms the natural base-line algorithm (Lasso in the Kernel PCA basis).


Augmenting Feature-driven fMRI Analyses: Semi-supervised learning and resting state activity

Neural Information Processing Systems

Resting state activity is brain activation that arises in the absence of any task, and is usually measured in awake subjects during prolonged fMRI scanning sessions where the only instruction given is to close the eyes and do nothing. It has been recognized in recent years that resting state activity is implicated in a wide variety of brain function. While certain networks of brain areas have different levels of activation at rest and during a task, there is nevertheless significant similarity between activations in the two cases. This suggests that recordings of resting state activity can be used as a source of unlabeled data to augment discriminative regression techniques in a semi-supervised setting. We evaluate this setting empirically yielding three main results: (i) regression tends to be improved by the use of Laplacian regularization even when no additional unlabeled data are available, (ii) resting state data may have a similar marginal distribution to that recorded during the execution of a visual processing task reinforcing the hypothesis that these conditions have similar types of activation, and (iii) this source of information can be broadly exploited to improve the robustness of empirical inference in fMRI studies, an inherently data poor domain.


On the numeric stability of the SFA implementation sfa-tk

arXiv.org Machine Learning

Slow feature analysis (SFA) is an information processing method proposed by Wiskott and Sejnowski (WS02) which allows to extract slowly varying signals from complex multidimensional time series. Wiskott (Wis98) formulated a similar idea already before as a model of unsupervised learning of invariances in the visual system of vertebrates. SFA has been applied successfully to numerous different tasks: to reproduce a wide range of properties of complex cells in primary visual cortex (BW05), to model the self-organized formation of place cells in the hippocampus (FSW07), to classify handwritten digits (Ber05) and to extract driving forces from nonstationary time series (Wis03). The analysis of nonstationary time series plays an important role in the data understanding of various phenomena such as temperature drift in experimental setup, global warming in climate data or varying heart rate in cardiology. Such nonstationarities can be modeled by underlying parameters, referred to as driving forces, that change the dynamics of the system smoothly on a slow time scale or abruptly but rarely, e.g. if the dynamics switches between different discrete states.


Semi-Supervised Learning -- A Statistical Physics Approach

arXiv.org Artificial Intelligence

We present a novel approach to semi-supervised learning which is based on statistical physics. Most of the former work in the field of semi-supervised learning classifies the points by minimizing a certain energy function, which corresponds to a minimal k-way cut solution. In contrast to these methods, we estimate the distribution of classifications, instead of the sole minimal k-way cut, which yields more accurate and robust results. Our approach may be applied to all energy functions used for semi-supervised learning. The method is based on sampling using a Mul-ticanonical Markov chain Monte-Carlo algorithm, and has a straightforward probabilistic interpretation, which allows for soft assignments of points to classes, and also to cope with yet unseen class types. The suggested approach is demonstrated on a toy data set and on two real-life data sets of gene expression.


Likelihood-based semi-supervised model selection with applications to speech processing

arXiv.org Machine Learning

In conventional supervised pattern recognition tasks, model selection is typically accomplished by minimizing the classification error rate on a set of so-called development data, subject to ground-truth labeling by human experts or some other means. In the context of speech processing systems and other large-scale practical applications, however, such labeled development data are typically costly and difficult to obtain. This article proposes an alternative semi-supervised framework for likelihood-based model selection that leverages unlabeled data by using trained classifiers representing each model to automatically generate putative labels. The errors that result from this automatic labeling are shown to be amenable to results from robust statistics, which in turn provide for minimax-optimal censored likelihood ratio tests that recover the nonparametric sign test as a limiting case. This approach is then validated experimentally using a state-of-the-art automatic speech recognition system to select between candidate word pronunciations using unlabeled speech data that only potentially contain instances of the words under test. Results provide supporting evidence for the utility of this approach, and suggest that it may also find use in other applications of machine learning.


Semi-Supervised Learning Using Sparse Eigenfunction Bases

AAAI Conferences

We present a new framework for semi-supervised learning with sparse eigenfunction bases of kernel matrices. It turns out that when the cluster assumption holds, that is, when the high density regions are sufficiently separated by low density valleys, each high density area corresponds to a unique representative eigenvector. Linear combination of such eigenvectors (or, more precisely, of their Nystrom extensions) provide good candidates for good classification functions. By first choosing an appropriate basis of these eigenvectors from unlabeled data and then using labeled data with Lasso to select a classifier in the span of these eigenvectors, we obtain a classifier, which has a very sparse representation in this basis. Importantly, the sparsity appears naturally from the cluster assumption. Experimental results on a number of real-world datasets show that our method is competitive with the state of the art semi-supervised learning algorithms and out-performs the natural base-line algorithm (Lasso in the Kernel PCA basis).


Sparse Geodesic Paths

AAAI Conferences

In this paper we propose a new distance metric for signals that admit a sparse representation in a known basis or dictionary. The metric is derived as the length of the sparse geodesic path between two points, by which we mean the shortest path between the points that is itself sparse. We show that the distance can be computed via a simple formula and that the entire geodesic path can be easily generated. The distance provides a natural similarity measure that can be exploited as a perceptually meaningful distance metric for natural images. Furthermore, the distance has applications in supervised, semi-supervised, and unsupervised learning settings.


Non-Metric Label Propagation

AAAI Conferences

In many applications non-metric distances are better than metric distances in reflecting the perceptual distances of human beings. Previous studies on non-metric distances mainly focused on supervised setting and did not consider the usefulness of unlabeled data. In this paper, we present probably the first study of label propagation on graphs induced from non-metric distances. The challenge here lies in the fact that the triangular inequality does not hold for non-metric distances and therefore, a direct application of existing label propagation methods will lead to inconsistency and conflict. We show that by applying spectrum transformation, non-metric distances can be converted into metric ones, and thus label propagation can be executed. Such methods, however, suffer from the change of original semantic relations. As a main result of this paper, we prove that any non-metric distance matrix can be decomposed into two metric distance matrices containing different information of the data. Based on this recognition, our proposed NMLP method derives two graphs from the original non-metric distance and performs a joint label propagation on the joint graph. Experiments validate the effectiveness of the proposed NMLP method.