Unsupervised or Indirectly Supervised Learning
Representation Learning for Words and Entities
This thesis presents new methods for unsupervised learning of distributed representations of words and entities from text and knowledge bases. The first algorithm presented in the thesis is a multi-view algorithm for learning representations of words called Multiview Latent Semantic Analysis (MVLSA). By incorporating up to 46 different types of co-occurrence statistics for the same vocabulary of english words, I show that MVLSA outperforms other state-of-the-art word embedding models. Next, I focus on learning entity representations for search and recommendation and present the second method of this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an unsupervised learning method, but it is based on the Variational Autoencoder framework. Evaluations with human annotators show that NVSE can facilitate better search and recommendation of information gathered from noisy, automatic annotation of unstructured natural language corpora. Finally, I move from unstructured data and focus on structured knowledge graphs. I present novel approaches for learning embeddings of vertices and edges in a knowledge graph that obey logical constraints.
Manifold Graph with Learned Prototypes for Semi-Supervised Image Classification
Kuo, Chia-Wen, Ma, Chih-Yao, Huang, Jia-Bin, Kira, Zsolt
Recent advances in semi-supervised learning methods rely on estimating the categories of unlabeled data using a model trained on the labeled data (pseudo-labeling) and using the unlabeled data for various consistency-based regularization. In this work, we propose to explicitly leverage the structure of the data manifold based on a Manifold Graph constructed over the image instances within the feature space. Specifically, we propose an architecture based on graph networks that jointly optimizes feature extraction, graph connectivity, and feature propagation and aggregation to unlabeled data in an end-to-end manner. Further, we present a novel Prototype Generator for producing a diverse set of prototypes that compactly represent each category, which supports feature propagation. To evaluate our method, we first contribute a strong baseline that combines two consistency-based regularizers that already achieves state-of-the-art results especially with fewer labels. We then show that when combined with these regularizers, the proposed method facilitates the propagation of information from generated prototypes to image data to further improve results. We provide extensive qualitative and quantitative experimental results on semi-supervised benchmarks demonstrating the improvements arising from our design and show that our method achieves state-of-the-art performance when compared with existing methods using a single model and comparable with ensemble methods. Specifically, we achieve error rates of 3.35% on SVHN, 8.27% on CIFAR-10, and 33.83% on CIFAR-100. With much fewer labels, we surpass the state of the arts by significant margins of 41% relative error decrease on average.
New DeepMind Unsupervised Image Model Challenges AlexNet
While supervised learning has tremendously improved AI performance in image classification, a major drawback is its reliance on large-scale labeled datasets. This has prompted researchers to explore the potential of unsupervised learning and semi-supervised learning -- techniques that forego data annotation but have their own drawback: diminished accuracy. A new paper from Google's UK-based research company DeepMind addresses this with a model based on Contrastive Predictive Coding (CPC) that outperforms the fully-supervised AlexNet model in Top-1 and Top-5 accuracy on ImageNet. CPC was introduced by DeepMind in 2018. The unsupervised learning approach uses a powerful autoregressive model to extract representations of high-dimensional data to predict future samples.
Data Sampling for Graph Based Unsupervised Learning: Convex and Greedy Optimization
Vahidian, Saeed, Cloninger, Alexander, Mirzasoleiman, Baharan
In a number of situations, collecting a function value for every data point may be prohibitively expensive, and random sampling ignores any structure in the underlying data. We introduce a scalable optimization algorithm with no correction steps (in contrast to Frank-Wolfe and its variants), a variant of gradient ascent for coreset selection in graphs, that greedily selects a weighted subset of vertices that are deemed most important to sample. Our algorithm estimates the mean of the function by taking a weighted sum only at these vertices, and we provably bound the estimation error in terms of the location and weights of the selected vertices in the graph. In addition, we consider the case where nodes have different selection costs and provide bounds on the quality of the low-cost selected coresets. We demonstrate the benefits of our algorithm on point clouds and structured graphs, as well as sensor placement where the cost of placing sensors depends on the location of the placement. We also elucidate that the empirical convergence of our proposed method is faster than random selection and various clustering methods while still respecting sensor placement cost. The paper concludes with validation of the developed algorithm on both synthetic and real datasets, demonstrating that it performs very well compared to the current state of the art.
From Caesar Cipher to Unsupervised Learning: A New Method for Classifier Parameter Estimation
Liu, Yu, Deng, Li, Chen, Jianshu, Chen, Chang Wen
Many important classification problems, such as object classification, speech recognition, and machine translation, have been tackled by the supervised learning paradigm in the past, where training corpora of parallel input-output pairs are required with high cost. To remove the need for the parallel training corpora has practical significance for real-world applications, and it is one of the main goals of unsupervised learning. Recently, encouraging progress in unsupervised learning for solving such classification problems has been made and the nature of the challenges has been clarified. In this article, we review this progress and disseminate a class of promising new methods to facilitate understanding the methods for machine learning researchers. In particular, we emphasize the key information that enables the success of unsupervised learning - the sequential statistics as the distributional prior in the labels. Exploitation of such sequential statistics makes it possible to estimate parameters of classifiers without the need of paired input-output data. In this paper, we first introduce the concept of Caesar Cipher and its decryption, which motivated the construction of the novel loss function for unsupervised learning we use throughout the paper. Then we use a simple but representative binary classification task as an example to derive and describe the unsupervised learning algorithm in a step-by-step, easy-to-understand fashion. We include two cases, one with Bigram language model as the sequential statistics for use in unsupervised parameter estimation, and another with a simpler Unigram language model. For both cases, detailed derivation steps for the learning algorithm are included. Further, a summary table compares computational steps of the two cases in executing the unsupervised learning algorithm for learning binary classifiers.
Discriminative adversarial networks for positive-unlabeled learning
Liu, Fangqing, Chen, Hui, Zhao, Liyue, Wu, Hao
As an important semi-supervised learning task, positive-unlabeled (PU) learning aims to learn a binary classifier only from positive and unlabeled data. In this article, we develop a novel PU learning framework, called discriminative adversarial networks, which contains two discriminative models represented by deep neural networks. One model $\Phi$ predicts the conditional probability of the positive label for a given sample, which defines a Bayes classifier after training, and the other model $D$ distinguishes labeled positive data from those identified by $\Phi$. The two models are simultaneously trained in an adversarial way like generative adversarial networks, and the equilibrium can be achieved when the output of $\Phi$ is close to the exact posterior probability of the positive class. In contrast with existing deep PU learning approaches, DAN does not require the class prior estimation, and its consistency can be proved under very general conditions. Numerical experiments demonstrate the effectiveness of the proposed framework.
Unsupervised learning and its role in the knowledge discovery process
Unlike supervised learning, unsupervised learning not working with labeled data, it is not showing the machine the correct answer. Instead, it is using different algorithms to let the machine create connections by studying and observing the data. Learning and improving by trial and error is the key to unsupervised learning. However, the Knowledge Discovery process is the field of data mining is concerned with the development of methods, techniques and algorithm which can make sense of the available data. It is useful in finding trends, patterns, correlations and anomalies in the databases which is helpful to make accurate decisions for the future.
K-Means Clustering: Unsupervised Learning for Recommender Systems
Unsupervised Learning has been called the closest thing we have to "actual" Artificial Intelligence, in the sense of General AI, with K-Means Clustering one of its simplest, but most powerful applications. I am not here to discuss whether those claims are true or not, as I am not an expert nor a philosopher. I will however state, that I am often amazed by how well unsupervised learning techniques, even the most rudimentary, capture patterns in the data that I would expect only people to find. Today we'll apply unsupervised learning on a Dataset I gathered myself. It's a database of professional Magic: The Gathering decks that I crawled from mtgtop8.com, an awesome website if you're into Magic: the Gathering. I scraped the MtgTop8 data from a few years of tournaments, and they're all available to be consumed in this GitHub repository.
Uncoupled Regression from Pairwise Comparison Data
Xu, Liyuan, Honda, Junya, Niu, Gang, Sugiyama, Masashi
Uncoupled regression is the problem to learn a model from unlabeled data and the set of target values while the correspondence between them is unknown. Such a situation arises in predicting anonymized targets that involve sensitive information, e.g., one's annual income. Since existing methods for uncoupled regression often require strong assumptions on the true target function, and thus, their range of applications is limited, we introduce a novel framework that does not require such assumptions in this paper. Our key idea is to utilize pairwise comparison data, which consists of pairs of unlabeled data that we know which one has a larger target value. Such pairwise comparison data is easy to collect, as typically discussed in the learning-to-rank scenario, and does not break the anonymity of data. We propose two practical methods for uncoupled regression from pairwise comparison data and show that the learned regression model converges to the optimal model with the optimal parametric convergence rate when the target variable distributes uniformly. Moreover, we empirically show that for linear models the proposed methods are comparable to ordinary supervised regression with labeled data.