Unsupervised or Indirectly Supervised Learning
Machine Learning - Introduction to Unsupervised Learning Vinod Sharma's Blog
Unsupervised learning helps to find a hidden jewel in data by grouping similar things together. The data have no target attribute; the algorithm takes training examples as sets of attributes/features alone. In this post, I have summarised my whole upcoming book "Unsupervised Learning - The Unlabelled Data Treasure" on one page. This one-page guide aims to give a high-level overview of unsupervised learning.
Large-Scale Semi-Supervised Learning via Graph Structure Learning over High-Dense Points
Wang, Zitong, Wang, Li, Chan, Raymond, Zeng, Tieyong
We focus on developing a novel scalable graph-based semi-supervised learning (SSL) method for a small number of labeled data and a large amount of unlabeled data. Due to the lack of labeled data and the availability of large-scale unlabeled data, existing SSL methods usually encounter either suboptimal performance because of an improper graph or the high computational complexity of the large-scale optimization problem. In this paper, we propose to address both challenging problems by constructing a proper graph for graph-based SSL methods. Different from existing approaches, we simultaneously learn a small set of vertices to characterize the high-density regions of the input data and a graph to depict the relationships among these vertices. A novel approach is then proposed to construct the graph of the input data from the learned graph of a small number of vertices with some preferred properties. Without explicitly calculating the constructed graph of inputs, two transductive graph-based SSL approaches are presented with computational complexity linear in the number of input data points. Extensive experiments on synthetic data and real datasets of varied sizes demonstrate that the proposed method is not only scalable to large-scale data, but also achieves good classification performance, especially for an extremely small number of labels.
CyberPoint · Blog · Using Compression to Compare Objects
In my previous blog post, I discussed our endeavor to benefit from unsupervised learning on CyberPoint's malware dataset. One of the more intriguing tools I played with during that effort was the normalized compression distance (NCD). It achieves this by approximating the normalized Kolmogorov distance. The Kolmogorov distance between two objects is actually pretty easy to conceptualize -- it is the length of the shortest program that can transform one object into the other. Unlike many popular similarity measures, this provides a universal notion of similarity by quantifying the difference between two objects without restricting the type of difference.
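As a rough illustration of the idea, the sketch below computes the NCD using the commonly cited formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. The choice of zlib as the compressor and the helper names `compressed_size` and `ncd` are assumptions made for this example; the post does not say which compressor or code was used.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input, standing in for C(x)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the objects share most of their structure;
    values near 1 suggest they share little. With a real compressor the
    result is only an approximation and can slightly exceed 1.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # C(xy): compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Illustrative inputs only: two near-identical texts and one unrelated blob.
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    c = bytes(range(256)) * 4
    print("similar pair:  ", ncd(a, b))
    print("unrelated pair:", ncd(a, c))
```

Because the measure depends only on compressed lengths, the same function can be applied to any pair of byte strings, such as text or binaries. The quality of the approximation depends on the compressor and on input size.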
Flow Contrastive Estimation of Energy-Based Models
Gao, Ruiqi, Nijkamp, Erik, Kingma, Diederik P., Xu, Zhen, Dai, Andrew M., Wu, Ying Nian
This paper studies a training method to jointly estimate an energy-based model and a flow-based model, in which the two models are iteratively updated based on a shared adversarial value function. This joint training method has the following traits. (1) The update of the energy-based model is based on noise contrastive estimation, with the flow model serving as a strong noise distribution. (2) The update of the flow model approximately minimizes the Jensen-Shannon divergence between the flow model and the data distribution. (3) Unlike generative adversarial networks (GANs), which estimate an implicit probability distribution defined by a generator model, our method estimates two explicit probabilistic distributions on the data. Using the proposed method, we demonstrate a significant improvement in the synthesis quality of the flow model, and show the effectiveness of unsupervised feature learning by the learned energy-based model. Furthermore, the proposed training method can be easily adapted to semi-supervised learning. We achieve results competitive with state-of-the-art semi-supervised learning methods.
Is Discriminator a Good Feature Extractor?
Mao, Xin, Su, Zhaoyu, Tan, Pin Siang, Chow, Jun Kang, Wang, Yu-Hsing
The discriminator from generative adversarial nets (GANs) has been used in some research as a feature extractor for transfer learning, and it has worked well. Other studies, however, argue that this is the wrong research direction: intuitively, the discriminator's task is to separate real samples from generated ones, which would make features extracted this way useless for most downstream tasks. In this work, we find that the connection between the discriminator's task and its features is not as strong as previously thought. The main factor restricting the features learned by the discriminator is not the discriminator's task itself, but the need to prevent the entire GAN model from mode collapse during training. From this perspective, combined with further analyses, we find that to avoid mode collapse during GAN training, the features extracted by the discriminator are not guided to be different for the real samples, but divergence without noise is indeed allowed and occupies a large proportion of the feature space. This makes the learned features more robust and helps explain why the discriminator can succeed as a feature extractor in related research. We then analyze the counterpart of the discriminator extractor: the classifier extractor, which assigns the target samples to different categories. We find that the discriminator extractor may perform worse than a classifier-based extractor when the source classification task is similar to the target task, which is a common case. But the ability to avoid noise prevents the discriminator from being replaced by a classifier. Last but not least, our research also reveals a ratio that plays an important role in preventing mode collapse during GAN training, which may contribute to basic GAN research.
Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels
Song, Shuang, Berthelot, David, Rostamizadeh, Afshin
We propose using active-learning-based techniques to further improve the state-of-the-art semi-supervised learning MixMatch algorithm. We provide a thorough empirical evaluation of several active-learning and baseline methods, which successfully demonstrate a significant improvement on the benchmark CIFAR-10, CIFAR-100, and SVHN datasets (as much as 1.5% in absolute accuracy). We also provide an empirical analysis of the cost trade-off between incrementally gathering more labeled versus unlabeled data. This analysis can be used to measure the relative value of labeled/unlabeled data at different points of the learning curve, where we find that although the incremental value of labeled data can be as much as 20x that of unlabeled data, it quickly diminishes to less than 3x once more than 2,000 labeled examples are observed. Code can be found at https://github.com/google-research/mma.
Introduction Generative Adversarial Networks Google Developers
Generative adversarial networks (GANs) are an exciting recent innovation in machine learning. GANs are generative models: they create new data instances that resemble your training data. For example, GANs can create images that look like photographs of human faces, even though the faces don't belong to any real person. Figure 1: Images generated by a GAN created by NVIDIA. GANs achieve this level of realism by pairing a generator, which learns to produce the target output, with a discriminator, which learns to distinguish true data from the output of the generator.
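The generator/discriminator pairing described above is usually formalized as a two-player minimax game. The objective below is the standard formulation from the original GAN paper (Goodfellow et al., 2014), not something stated in this excerpt. Here D(x) is the discriminator's estimated probability that x is real, G(z) is the generator's output for a noise sample z, and p_data and p_z denote the data and noise distributions.

```latex
\min_{G} \max_{D} \; V(D, G)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \bigl[ \log D(x) \bigr]
  + \mathbb{E}_{z \sim p_{z}(z)} \bigl[ \log \bigl( 1 - D(G(z)) \bigr) \bigr]
```

The discriminator is trained to increase V by scoring real data high and generated data low, while the generator is trained to decrease it by producing samples the discriminator scores as real.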
Deep Learning -- Generative Adversarial Network(GAN's)
GANs are a revolution in the field of deep learning. They were introduced by Ian Goodfellow and others in the paper titled "Generative Adversarial Networks", which is available at https://arxiv.org/abs/1406.2661. Let's try to understand what a GAN is and how it works. To understand GANs, we first need to understand the difference between supervised and unsupervised learning techniques and the issues with each. Supervised learning models are developed based on large quantities of "labeled" samples. Supervised learning requires large datasets containing explainable features with respect to their labels.
DeepMimic: Mentor-Student Unlabeled Data Based Training
Mosafi, Itay, David, Eli, Netanyahu, Nathan S.
In this paper, we present a deep neural network (DNN) training approach called the "DeepMimic" training method. Enormous amounts of data are available nowadays for training usage. Yet, only a tiny portion of these data is manually labeled, whereas almost all of the data are unlabeled. The training approach presented utilizes, in a most simplified manner, the unlabeled data to the fullest, in order to achieve remarkable (classification) results. Our DeepMimic method uses a small portion of labeled data and a large amount of unlabeled data for the training process, as expected in a real-world scenario. It consists of a mentor model and a student model. Employing a mentor model trained on a small portion of the labeled data and then feeding it only with unlabeled data, we show how to obtain a (simplified) student model that reaches the same accuracy and loss as the mentor model, on the same test set, without using any of the original data labels in the training of the student model. Our experiments demonstrate that even on challenging classification tasks the student network architecture can be simplified significantly with a minor influence on the performance, i.e., we need not even know the original network architecture of the mentor. In addition, the time required for training the student model to reach the mentor's performance level is shorter, as a result of a simplified architecture and more available data. The proposed method highlights the disadvantages of regular supervised training and demonstrates the benefits of a less traditional training approach.
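The mentor-student flow described in the abstract can be illustrated with a deliberately tiny sketch. This is not the paper's DNN setup: the one-dimensional data, the nearest-centroid mentor, the threshold student, and the function names `train_mentor` and `train_student` are all assumptions chosen to keep the example self-contained. It only shows the data flow: the mentor is fit on a few labels, and the student is fit purely on the mentor's outputs over unlabeled inputs.

```python
from statistics import mean


def train_mentor(labeled):
    """Fit a nearest-centroid classifier on a few (x, label) pairs."""
    c0 = mean(x for x, y in labeled if y == 0)
    c1 = mean(x for x, y in labeled if y == 1)
    return lambda x: 0 if abs(x - c0) <= abs(x - c1) else 1


def train_student(unlabeled, mentor):
    """Fit a simpler threshold model using only the mentor's predictions.

    No original labels are used here. The student sees unlabeled inputs
    and the mentor's outputs for them. Assumes class 1 lies above class 0.
    """
    pseudo = [(x, mentor(x)) for x in unlabeled]
    xs = sorted(unlabeled)
    # Candidate thresholds: midpoints between neighbouring unlabeled points.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def disagreements(t):
        return sum((1 if x > t else 0) != y for x, y in pseudo)

    best = min(candidates, key=disagreements)
    return lambda x: 1 if x > best else 0


if __name__ == "__main__":
    labeled = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]  # small labeled set
    unlabeled = [i * 0.25 for i in range(21)]           # larger unlabeled set
    mentor = train_mentor(labeled)
    student = train_student(unlabeled, mentor)
    agree = sum(student(x) == mentor(x) for x in unlabeled)
    print(f"student agrees with mentor on {agree}/{len(unlabeled)} points")
```

In the paper's setting both models are neural networks and the student can have a simpler architecture than the mentor. The threshold model here plays that "simpler student" role only by analogy.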