Unsupervised or Indirectly Supervised Learning
Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario
Yan, Yan, Rosales, Romer, Fung, Glenn, Dy, Jennifer
Learning algorithms normally assume that there is at most one annotation or label per data point. However, in some scenarios, such as medical diagnosis and on-line collaboration, multiple annotations may be available. In either case, obtaining labels for data points can be expensive and time-consuming (and in some circumstances ground truth may not even exist). Semi-supervised learning approaches have shown that utilizing unlabeled data is often beneficial in these cases. This paper presents a probabilistic semi-supervised model and algorithm that allows for learning from both unlabeled and labeled data in the presence of multiple annotators. We assume that it is known which annotator labeled which data points. The proposed approach produces annotator models that allow us to provide (1) estimates of the true label and (2) estimates of each annotator's varying expertise, for both labeled and unlabeled data. We provide numerical comparisons under various scenarios and with respect to standard semi-supervised learning. Experiments show that the presented approach provides clear advantages over multi-annotator methods that do not use the unlabeled data and over methods that do not use multi-labeler information.
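To make the modeling idea concrete, here is a minimal sketch of how multiple annotations and unlabeled data can combine in a single posterior, assuming a constant per-annotator accuracy alpha[j] and a logistic classifier. The paper's actual model is richer (annotator expertise varies with the input), so treat this as illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def e_step(X, Y, w, alpha):
    """Posterior over the true binary label z for each point.

    X: (n, d) features; Y: (n, m) annotations in {0, 1} (NaN = missing).
    w: classifier weights; alpha: assumed per-annotator accuracy in (0, 1).
    """
    p1 = sigmoid(X @ w)                      # classifier prior p(z=1 | x)
    like1, like0 = np.ones(len(X)), np.ones(len(X))
    for j in range(Y.shape[1]):
        seen = ~np.isnan(Y[:, j])
        yj = Y[seen, j]
        # annotator j is correct with prob alpha[j], independently
        like1[seen] *= np.where(yj == 1, alpha[j], 1 - alpha[j])
        like0[seen] *= np.where(yj == 0, alpha[j], 1 - alpha[j])
    post = like1 * p1 / (like1 * p1 + like0 * (1 - p1) + 1e-12)
    return post  # all-NaN (unlabeled) rows fall back to the classifier prior
```

An M-step would then refit w on the soft labels and re-estimate each alpha[j] from posterior agreement; note that unlabeled rows automatically reduce to the classifier's own prediction, which is how the unlabeled data participates.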
Active Semi-Supervised Learning using Submodular Functions
Guillory, Andrew, Bilmes, Jeff A.
We consider active, semi-supervised learning in an offline transductive setting. We show that a previously proposed error bound for active learning on undirected weighted graphs can be generalized by replacing graph cut with an arbitrary symmetric submodular function. Arbitrary non-symmetric submodular functions can be used via symmetrization. Different choices of submodular functions give different versions of the error bound that are appropriate for different kinds of problems. Moreover, the bound is deterministic and holds for adversarially chosen labels. We show that exactly minimizing this error bound is NP-complete. However, we also introduce, for any submodular function, an associated active semi-supervised learning method that approximately minimizes the corresponding error bound. We show that the error bound is tight in the sense that no other bound of the same form is better. Our theoretical results are supported by experiments on real data.
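The symmetrization step mentioned here can be done with the standard construction G(S) = F(S) + F(V \ S) - F(V) - F(∅), which is symmetric, normalized, and preserves submodularity. The sketch below is generic and not specific to the paper's graph setting.

```python
def symmetrize(F, V):
    """Turn a submodular F on ground set V into a symmetric one.

    G(S) = F(S) + F(V \\ S) - F(V) - F({}) satisfies G(S) == G(V - S)
    and G({}) == 0, and stays submodular (sums and set reflection
    both preserve submodularity).
    """
    V = frozenset(V)
    base = F(V) + F(frozenset())
    def G(S):
        S = frozenset(S)
        return F(S) + F(V - S) - base
    return G

# usage: concave-of-cardinality is submodular but not symmetric
F = lambda S: min(len(S), 2)
G = symmetrize(F, range(5))
assert G({0, 1}) == G({2, 3, 4})   # symmetry holds after the transform
```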
Semi-supervised Regression via Parallel Field Regularization
Lin, Binbin, Zhang, Chiyuan, He, Xiaofei
This paper studies the problem of semi-supervised learning from the vector field perspective. Much of the existing work uses the graph Laplacian to ensure the smoothness of the prediction function on the data manifold. However, beyond smoothness, recent theoretical work suggests that we should also ensure second order smoothness in order to achieve faster rates of convergence for semi-supervised regression problems. To achieve this goal, we show that second order smoothness measures the linearity of the function, and that the gradient field of a linear function must be a parallel vector field. Consequently, we propose to find a function which minimizes the empirical error and simultaneously requires its gradient field to be as parallel as possible. We give a continuous objective function on the manifold and discuss how to discretize it using random points. The discretized optimization problem turns out to be a sparse linear system which can be solved very efficiently. Experimental results demonstrate the effectiveness of the proposed approach.
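In symbols, the objective described couples empirical error with gradient-field parallelism. The following is a sketch in the spirit of the abstract, where V is an auxiliary vector field and λ₁, λ₂ are trade-off weights (our notation, not necessarily the paper's):

```latex
\min_{f,\, V} \; \frac{1}{l}\sum_{i=1}^{l}\bigl(f(x_i) - y_i\bigr)^2
  \;+\; \lambda_1 \int_{\mathcal{M}} \lVert \nabla f - V \rVert^2
  \;+\; \lambda_2 \int_{\mathcal{M}} \lVert \nabla V \rVert_F^2
```

The second term ties the gradient field of f to V, while the third penalizes the covariant derivative of V, so driving it toward zero pushes V (and hence ∇f) toward a parallel field; discretizing both integrals over the sampled points yields the sparse linear system mentioned above.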
Selecting Receptive Fields in Deep Networks
Coates, Adam, Ng, Andrew Y.
Recent deep learning and unsupervised feature learning systems that learn from unlabeled data have achieved high performance in benchmarks by using extremely large architectures with many features (hidden units) at each layer. Unfortunately, for such large architectures the number of parameters usually grows quadratically in the width of the network, thus necessitating hand-coded "local receptive fields" that limit the number of connections from lower-level features to higher ones (e.g., based on spatial locality). In this paper we propose a fast method to choose these connections that may be incorporated into a wide variety of unsupervised training methods. Specifically, we choose local receptive fields that group together those low-level features that are most similar to each other according to a pairwise similarity metric. This approach allows us to harness the advantages of local receptive fields (such as improved scalability and reduced data requirements) when we do not know how to specify such receptive fields by hand or when our unsupervised training algorithm has no obvious generalization to a topographic setting. We produce results showing how this method allows us to use even simple unsupervised training algorithms to train successful multi-layered networks that achieve state-of-the-art results on the CIFAR and STL datasets: 82.0% and 60.1% accuracy, respectively.
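A minimal sketch of the grouping idea, assuming feature activations H and squared correlation as the pairwise similarity metric (the paper's exact metric and grouping procedure may differ):

```python
import numpy as np

def select_receptive_fields(H, n_groups, group_size, rng=None):
    """Group low-level features by similarity of their responses.

    H: (n_examples, n_features) matrix of feature activations.
    Returns a list of index arrays, one receptive field per group.
    A sketch of similarity-based grouping; assumed, not the authors'
    exact procedure.
    """
    rng = rng or np.random.default_rng(0)
    # pairwise similarity: squared correlation between feature responses
    C = np.corrcoef(H.T) ** 2
    fields = []
    for _ in range(n_groups):
        seed = rng.integers(C.shape[0])            # random seed feature
        field = np.argsort(-C[seed])[:group_size]  # its nearest features
        fields.append(field)
    return fields
```

Each returned index set defines the inputs a higher-layer unit is allowed to connect to, so the parameter count grows with group_size rather than with the full layer width.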
Multi-View Learning of Word Embeddings via CCA
Dhillon, Paramveer, Foster, Dean P., Ungar, Lyle H.
Recently, there has been substantial interest in using large amounts of unlabeled data to learn word representations which can then be used as features in supervised classifiers for NLP tasks. However, most current approaches are slow to train, do not model the context of the word, and lack theoretical grounding. In this paper, we present a new learning method, Low Rank Multi-View Learning (LR-MVL), which uses a fast spectral method to estimate low dimensional context-specific word representations from unlabeled data. These representation features can then be used with any supervised learner. LR-MVL is extremely fast, gives guaranteed convergence to a global optimum, is theoretically elegant, and achieves state-of-the-art performance on named entity recognition (NER) and chunking problems.
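The spectral core of such multi-view methods is a CCA between two views of a word (e.g., its left and right contexts). Below is a generic whitening-plus-SVD implementation of linear CCA, not the LR-MVL algorithm itself; eps is a ridge term we add for numerical stability:

```python
import numpy as np

def cca(X, Z, k, eps=1e-8):
    """Minimal linear CCA between two views via whitening + SVD.

    X: (n, d1) view one (e.g. left context), Z: (n, d2) view two
    (e.g. right context). Returns rank-k canonical direction matrices.
    """
    X = X - X.mean(0); Z = Z - Z.mean(0)
    Cxx = X.T @ X / len(X) + eps * np.eye(X.shape[1])
    Czz = Z.T @ Z / len(Z) + eps * np.eye(Z.shape[1])
    Cxz = X.T @ Z / len(X)
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))   # whitening transforms
    Wz = np.linalg.inv(np.linalg.cholesky(Czz))
    U, s, Vt = np.linalg.svd(Wx @ Cxz @ Wz.T)
    return Wx.T @ U[:, :k], Wz.T @ Vt[:k].T       # canonical directions
```

Projecting each view onto its returned directions gives the low dimensional, context-aware embeddings that are then fed to a downstream supervised learner.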
Unsupervised Learning of Human Behaviours
Chua, Sook-Ling (Massey University) | Marsland, Stephen (Massey University) | Guesgen, Hans W. (Massey University)
Behaviour recognition is the process of inferring the behaviour of an individual from a series of observations acquired from sensors, such as those in a smart home. The majority of existing behaviour recognition systems are based on supervised learning algorithms, which means that training them requires a preprocessed, annotated dataset. Unfortunately, annotating a dataset is a tedious and error-prone process. In this paper we suggest a way to identify structure in the data based on text compression and the edit distance between words, without any prior labelling. We demonstrate that this method can automatically identify recurring patterns and segment the data into subsequences that correspond to human behaviours. To evaluate the effectiveness of our proposed method, we use a dataset from a smart home and compare the labels produced by our approach with the labels assigned by a human to the activities in the dataset. We find that the results are promising and show significant improvement in recognition accuracy over Self-Organising Maps (SOMs).
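The two building blocks named in the abstract are standard and easy to sketch. Here is generic Python for a compression-based distance and the edit distance, with the caveat that the paper's actual pipeline (how sensor events become "words" and how segments are found) is more involved:

```python
import zlib

def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two sensor strings."""
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)

def edit_distance(s: str, t: str) -> int:
    """Classic Levenshtein distance between two symbol sequences."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (cs != ct)))  # substitution
        prev = cur
    return prev[-1]
```

Intuitively, sequences that compress well together (low NCD) or differ by few edits are candidates for being instances of the same underlying behaviour.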
Co-Training as a Human Collaboration Policy
Zhu, Xiaojin (University of Wisconsin-Madison) | Gibson, Bryan R. (University of Wisconsin-Madison) | Rogers, Timothy T. (University of Wisconsin-Madison)
We consider the task of human collaborative category learning, where two people work together to classify test items into appropriate categories based on what they learn from a training set. We propose a novel collaboration policy based on the Co-Training algorithm in machine learning, in which the two people play the role of the base learners. The policy restricts each learner's view of the data and limits their communication to only the exchange of their labelings on test items. In a series of empirical studies, we show that the Co-Training policy leads collaborators to jointly produce unique and potentially valuable classification outcomes that are not generated under other collaboration policies. We further demonstrate that these observations can be explained with appropriate machine learning models.
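For reference, the machine-learning algorithm the policy mirrors is classic co-training (Blum & Mitchell): two learners, each restricted to one view, repeatedly teach each other their most confident labels on unlabeled items. A textbook sketch, using scikit-learn's LogisticRegression as a stand-in base learner:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, labeled, unlabeled, rounds=10, k=1):
    """Classic co-training sketch (Blum & Mitchell style).

    X1, X2: two feature views of the same items; y: labels, trusted
    only on the `labeled` indices. Each round, each view's learner
    labels its k most confident unlabeled items for the other learner.
    """
    L, U = list(labeled), list(unlabeled)
    labels = np.array(y)
    for _ in range(rounds):
        for Xv in (X1, X2):
            if not U:
                return labels, L
            clf = LogisticRegression(max_iter=1000).fit(Xv[L], labels[L])
            proba = clf.predict_proba(Xv[U])
            picks = np.argsort(-proba.max(axis=1))[:k]   # most confident
            for i in picks:
                labels[U[i]] = clf.classes_[proba[i].argmax()]
            moved = {U[i] for i in picks}
            L += sorted(moved)
            U = [u for u in U if u not in moved]
    return labels, L
```

In the human-collaboration analogue, each person plays the role of one base learner, with the view restriction and the label-only communication enforced by the policy.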
Co-regularization Based Semi-supervised Domain Adaptation
Kumar, Abhishek, Saha, Avishek, Daume, Hal
This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of an augmented feature space (introduced in EASYADAPT (EA) [1]) and harnesses unlabeled data in the target domain to further enable the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ shows that the hypothesis class of EA++ has lower complexity than that of EA and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as a few other baseline approaches.
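The augmentation underlying EA and EA++ is compact enough to state directly. A sketch, assuming the EASYADAPT encoding [x, x, 0] / [x, 0, x] for labeled source/target points and an EA++-style encoding [0, x, -x] with target value 0 for unlabeled target points (which rewards source/target agreement under margin losses); consult the paper for the exact loss:

```python
import numpy as np

def ea_augment(X, domain):
    """EASYADAPT augmentation: blocks are [shared, source-only, target-only]."""
    Z = np.zeros_like(X)
    return np.hstack([X, X, Z]) if domain == "source" else np.hstack([X, Z, X])

def eapp_augment_unlabeled(X):
    """EA++-style augmentation of unlabeled target data (a sketch).

    Encoding an unlabeled target point as [0, x, -x] and regressing it
    toward 0 penalizes disagreement between the source and target parts
    of the learned weight vector: the co-regularization idea.
    """
    Z = np.zeros_like(X)
    return np.hstack([Z, X, -X])
```

Because everything happens in the features, any off-the-shelf supervised learner can be trained on the stacked augmented data, which is what makes the method a pre-processing step.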
Exploiting Unlabeled Data to Enhance Ensemble Diversity
Zhang, Min-Ling, Zhou, Zhi-Hua
Ensemble learning aims to improve generalization ability by using multiple base learners. It is well known that to construct a good ensemble, the base learners should be accurate as well as diverse. In this paper, unlabeled data is exploited to facilitate ensemble learning by helping augment the diversity among the base learners. Specifically, a semi-supervised ensemble method named UDEED is proposed. Unlike existing semi-supervised ensemble methods, where error-prone pseudo-labels are estimated for unlabeled data to enlarge the labeled training set and thereby improve accuracy, UDEED works by maximizing the accuracy of the base learners on labeled data while maximizing the diversity among them on unlabeled data. Experiments show that UDEED can effectively utilize unlabeled data for ensemble learning and is highly competitive with well-established semi-supervised ensemble methods.
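To make the "accuracy on labeled, diversity on unlabeled" trade-off concrete, here is a sketch of an objective in that spirit; the logistic loss and the pairwise-agreement diversity term are our assumed stand-ins for the paper's exact formulation:

```python
import numpy as np

def udeed_style_objective(preds_labeled, y, preds_unlabeled, gamma=1.0):
    """Sketch of a UDEED-flavored objective for m base learners.

    preds_labeled: (m, l) real-valued outputs on labeled data,
    preds_unlabeled: (m, u) outputs on unlabeled data, y in {-1, +1}.
    Minimizing the second term lowers pairwise agreement on unlabeled
    data, i.e. raises diversity. Assumed form, not the paper's loss.
    """
    m = preds_labeled.shape[0]
    acc = np.mean(np.log1p(np.exp(-preds_labeled * y)))  # logistic loss
    pair = 0.0
    for p in range(m):
        for q in range(p + 1, m):
            pair += np.mean(preds_unlabeled[p] * preds_unlabeled[q])
    pair /= m * (m - 1) / 2
    return acc + gamma * pair
```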
Transductive Learning on Adaptive Graphs
Zhang, Yan-Ming (Chinese Academy of Sciences) | Zhang, Yu (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology) | Liu, Cheng-Lin (Chinese Academy of Sciences) | Hou, Xinwen (Chinese Academy of Sciences)
Graph-based semi-supervised learning methods rely on a smoothness assumption about the data. As a discrete approximation of the data manifold, the graph plays a crucial role in the success of such methods. In most existing methods, graph construction uses a predefined weighting function without utilizing label information even when it is available. In this work, by incorporating label information, we seek to enhance the performance of graph-based semi-supervised learning by learning the graph and the label inference simultaneously. In particular, we consider the transductive setting of semi-supervised learning. Using the LogDet divergence to define the objective function, we propose an iterative algorithm to solve the optimization problem, each step of which has a closed-form solution. We perform experiments on both synthetic and real data to demonstrate improvements both in the learned graph and in classification accuracy.
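For reference, the LogDet (Burg) matrix divergence used to define such objectives is, for positive definite n × n matrices A and B:

```latex
D_{\mathrm{ld}}(A, B) \;=\; \operatorname{tr}\!\bigl(A B^{-1}\bigr)
  \;-\; \log\det\!\bigl(A B^{-1}\bigr) \;-\; n
```

It is zero if and only if A = B, and it suits alternating schemes well because minimizing it over one argument with the other fixed typically admits a closed form, matching the closed-form-per-step structure described above.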