Unsupervised or Indirectly Supervised Learning
Ranking Tweets by Labeled and Collaboratively Selected Pairs with Transitive Closure
Liu, Shenghua (Chinese Academy of Sciences) | Cheng, Xueqi (Chinese Academy of Sciences) | Li, Fangtao (Google Inc.)
Tweets ranking is important for information acquisition in Microblog. Due to the content sparsity and lackof labeled data, it is better to employ semi-supervisedlearning methods to utilize the unlabeled data. However,most of previous semi-supervised learning methods donot consider the pair conflict problem, which means thatthe new selected unlabeled data may conflict with the labeled and previously selected data. It will hurt the learning performance a lot, if the training data contains manyconflict pairs. In this paper, we propose a new collaborative semi-supervised SVM ranking model (CSR-TC)with consideration of the order conflict. The unlabeleddata is selected based on a dynamically maintained transitive closure graph to avoid pair conflict. We also investigate the two views of features, intrinsic and contentrelevant features, for the proposed model. Extensive experiments are conducted on TREC Microblogging corpus. The results demonstrate that our proposed methodachieves significant improvement, compared to severalstate-of-the-art models.
Graph Approximation and Clustering on a Budget
Fetaya, Ethan, Shamir, Ohad, Ullman, Shimon
Shimon Ullman Weizmann Institute of Science shimon.ullman@weizmann.ac.il Abstract We consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding), when computing pairwise similarities are costly, and only a limited number of entries can be observed. We provide a theoretical analysis using standard notions of graph approximation, significantly generalizing previous results (which focused on spectral clustering with two clusters). We also propose a new algorithmic approach based on adaptive sampling, which experimentally matches or improves on previous methods, while being considerably more general and computationally cheaper. 1 Introduction Many unsupervised learning algorithms, such as spectral clustering [18], [2] and low-dimensional embedding via Laplacian eigenmaps and diffusion maps [3],[16], need as input a matrix of pairwise similaritiesW between the different objects in our data. In some cases, obtaining the full matrix can be a costly matter. For example, w ij may be based on some expensive-to-compute metric such as W2D [5]; based on some physical measurement (such as in some computational biology applications); or is given by a human annotator. In such cases, we would like to have a good approximation of the (initially unknown) matrix, while querying only a limited number of entries. An alternative but equivalent viewpoint is the problem of approximating an unknown weighted undirected graph, by querying a limited number of edges. This question has received previous attention in works such as [17] and [9], which focus on the task of spectral clustering into two clusters, and assuming two such distinct clusters indeed exist (i.e. that there is a big gap between the second and third eigenvalues of the Laplacian matrix). In this work we consider, both theoretically and algorithmically, the question of query-based graph approximation more generally, obtaining results relevant beyond two clusters and beyond spectral clustering. When considering graph approximations, the first question is what notion of approximation to consider. One important notion is cut approximation [14] where we wish for every cut in the approximated graph to have weight close to the weight of the cut in the original graph up to a multiplicative factor. Many machine learning algorithms (and many more general algorithms) such as cut based clustering [11], energy minimization [22], etc. [1] are based on cuts, so this notion of approximation is natural for these uses.
Active Semi-Supervised Learning Using Sampling Theory for Graph Signals
Gadde, Akshay, Anis, Aamir, Ortega, Antonio
We consider the problem of offline, pool-based active semi-supervised learning on graphs. This problem is important when the labeled data is scarce and expensive whereas unlabeled data is easily available. The data points are represented by the vertices of an undirected graph with the similarity between them captured by the edge weights. Given a target number of nodes to label, the goal is to choose those nodes that are most informative and then predict the unknown labels. We propose a novel framework for this problem based on our recent results on sampling theory for graph signals. A graph signal is a real-valued function defined on each node of the graph. A notion of frequency for such signals can be defined using the spectrum of the graph Laplacian matrix. The sampling theory for graph signals aims to extend the traditional Nyquist-Shannon sampling theory by allowing us to identify the class of graph signals that can be reconstructed from their values on a subset of vertices. This approach allows us to define a criterion for active learning based on sampling set selection which aims at maximizing the frequency of the signals that can be reconstructed from their samples on the set. Experiments show the effectiveness of our method.
Justifying Information-Geometric Causal Inference
Janzing, Dominik, Steudel, Bastian, Shajarisales, Naji, Schölkopf, Bernhard
Information Geometric Causal Inference (IGCI) is a new approach to distinguish between cause and effect for two variables. It is based on an independence assumption between input distribution and causal mechanism that can be phrased in terms of orthogonality in information space. We describe two intuitive reinterpretations of this approach that makes IGCI more accessible to a broader audience. Moreover, we show that the described independence is related to the hypothesis that unsupervised learning and semi-supervised learning only works for predicting the cause from the effect and not vice versa.
Correlated random features for fast semi-supervised learning
McWilliams, Brian, Balduzzi, David, Buhmann, Joachim M.
This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, multiview regression, using Canonical Correlation Analysis (CCA) on unlabeled data, biases the regression towards useful features. It has been shown that CCA regression can substantially reduce variance with a minimal increase in bias if the views contains accurate estimators. Recent theoretical and empirical work shows that regression with random features closely approximates kernel regression, implying that the accuracy requirement holds for random views. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude.
Transfer Learning in a Transductive Setting
Rohrbach, Marcus, Ebert, Sandra, Schiele, Bernt
Category models for objects or activities typically rely on supervised learning requiring sufficiently large training sets. Transferring knowledge from known categories to novel classes with no or only a few labels however is far less researched even though it is a common scenario. In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances. Our proposed approach Propagated Semantic Transfer combines three main ingredients. First, we transfer information from known to novel categories by incorporating external knowledge, such as linguistic or expert-specified information, e.g., by a mid-level layer of semantic attributes. Second, we exploit the manifold structure of novel classes. More specifically we adapt a graph-based learning algorithm - so far only used for semi-supervised learning - to zero-shot and few-shot learning. Third, we improve the local neighborhood in such graph structures by replacing the raw feature-based representation with a mid-level object- or attribute-based representation. We evaluate our approach on three challenging datasets in two different applications, namely on Animals with Attributes and ImageNet for image classification and on MPII Composites for activity recognition. Our approach consistently outperforms state-of-the-art transfer and semi-supervised approaches on all datasets.
Unsupervised learning human's activities by overexpressed recognized non-speech sounds
Smidtas, Serge, Peyrot, Magalie
Human activity and environment produces sounds such as, at home, the noise produced by water, cough, or television. These sounds can be used to determine the activity in the environment. The objective is to monitor a person's activity or determine his environment using a single low cost microphone by sound analysis. The purpose is to adapt programs to the activity or environment or detect abnormal situations. Some patterns of over expressed repeatedly in the sequences of recognized sounds inter and intra environment allow to characterize activities such as the entrance of a person in the house, or a tv program watched. We first manually annotated 1500 sounds of daily life activity of old persons living at home recognized sounds. Then we inferred an ontology and enriched the database of annotation with a crowed sourced manual annotation of 7500 sounds to help with the annotation of the most frequent sounds. Using learning sound algorithms, we defined 50 types of the most frequent sounds. We used this set of recognizable sounds as a base to tag sounds and put tags on them. By using over expressed number of motifs of sequences of the tags, we were able to categorize using only a single low-cost microphone, complex activities of daily life of a persona at home as watching TV, entrance in the apartment of a person, or phone conversation including detecting unknown activities as repeated tasks performed by users.
Correlated random features for fast semi-supervised learning
McWilliams, Brian, Balduzzi, David, Buhmann, Joachim M.
This paper presents Correlated Nystrom Views (XNV), a fast semi-supervised algorithm for regression and classification. The algorithm draws on two main ideas. First, it generates two views consisting of computationally inexpensive random features. Second, XNV applies multiview regression using Canonical Correlation Analysis (CCA) on unlabeled data to bias the regression towards useful features. It has been shown that, if the views contains accurate estimators, CCA regression can substantially reduce variance with a minimal increase in bias. Random views are justified by recent theoretical and empirical work showing that regression with random features closely approximates kernel regression, implying that random views can be expected to contain accurate estimators. We show that XNV consistently outperforms a state-of-the-art algorithm for semi-supervised learning: substantially improving predictive performance and reducing the variability of performance on a wide variety of real-world datasets, whilst also reducing runtime by orders of magnitude.
Randomized co-training: from cortical neurons to machine learning and back again
Despite its size and complexity, the human cortex exhibits striking anatomical regularities, suggesting there may simple meta-algorithms underlying cortical learning and computation. We expect such meta-algorithms to be of interest since they need to operate quickly, scalably and effectively with little-to-no specialized assumptions. This note focuses on a specific question: How can neurons use vast quantities of unlabeled data to speed up learning from the comparatively rare labels provided by reward systems? As a partial answer, we propose randomized co-training as a biologically plausible meta-algorithm satisfying the above requirements. As evidence, we describe a biologically-inspired algorithm, Correlated Nystrom Views (XNV) that achieves state-of-the-art performance in semi-supervised learning, and sketch work in progress on a neuronal implementation.
Semi-Supervised Learning for Integration of Aerosol Predictions from Multiple Satellite Instruments
Djuric, Nemanja (Temple University) | Kansakar, Lakesh (Temple University) | Vucetic, Slobodan (Temple University)
Aerosol Optical Depth (AOD), recognized as one of the most important quantities in understanding and predicting the Earth's climate, is estimated daily on a global scale by several Earth-observing satellite instruments. Each instrument has different coverage and sensitivity to atmospheric and surface conditions, and, as a result, the quality of AOD estimated by different instruments varies across the globe. We present a method for learning how to aggregate AOD estimations from multiple satellite instruments into a more accurate estimation. The proposed method is semi-supervised, as it is able to learn from a small number of labeled data, where labels come from a few accurate and expensive ground-based instruments, and a large number of unlabeled data. The method uses a latent variable to partition the data, so that in each partition the expert AOD estimations are aggregated in a different, optimal way. We applied the method to combine AOD estimations from 5 instruments aboard 4 satellites, and the results indicate that it can successfully exploit labeled and unlabeled data to produce accurate aggregated AOD estimations.