If you aren't familiar with Generative Adversarial Networks (GANs), they are a massively popular generative modeling technique formed by pitting two Deep Neural Networks, a generator and a discriminator, against each other. This adversarial loss has sparked the interest of many Deep Learning and Artificial Intelligence researchers. However, despite the beauty of the GAN formulation and the eye-opening results of the state-of-the-art architectures, GANs are generally very difficult to train. One of the best ways to get better results with GANs are to provide class labels. This is the basis of the conditional-GAN model.
If I understand correctly, both CPC and AlexNet used the same set of training images. CPC just didn't use labels, while AlexNet did. So, what about instances where a self-supervised network can be trained on 10,000x as much data as would be economically feasible to label? In these cases, are supervised learning's days numbered? The application I'm personally most interested in is self-driving cars.
Unsupervised learning is an old and well-understood problem in machine learning; LeCun's choice to replace it as the star in his cake analogy is not something he should take lightly! If you dive into the definition of self-supervised learning, you'll begin to see that it's really just an approach to unsupervised learning. Since many of the breakthroughs in machine learning this decade have been based on supervised learning techniques, successes in unsupervised problems tend to emerge when researchers re-frame an unsupervised problem as a supervised problem. Specifically, in self-supervised learning, we find a clever way to generate labels without human annotators. An easy example is a technique called next-step prediction.
Deep learning methods are successfully used in applications pertaining to ubiquitous computing, health, and well-being. Specifically, the area of human activity recognition (HAR) is primarily transformed by the convolutional and recurrent neural networks, thanks to their ability to learn semantic representations from raw input. However, to extract generalizable features, massive amounts of well-curated data are required, which is a notoriously challenging task; hindered by privacy issues, and annotation costs. Therefore, unsupervised representation learning is of prime importance to leverage the vast amount of unlabeled data produced by smart devices. In this work, we propose a novel self-supervised technique for feature learning from sensory data that does not require access to any form of semantic labels. We learn a multi-task temporal convolutional network to recognize transformations applied on an input signal. By exploiting these transformations, we demonstrate that simple auxiliary tasks of the binary classification result in a strong supervisory signal for extracting useful features for the downstream task. We extensively evaluate the proposed approach on several publicly available datasets for smartphone-based HAR in unsupervised, semi-supervised, and transfer learning settings. Our method achieves performance levels superior to or comparable with fully-supervised networks, and it performs significantly better than autoencoders. Notably, for the semi-supervised case, the self-supervised features substantially boost the detection rate by attaining a kappa score between 0.7-0.8 with only 10 labeled examples per class. We get similar impressive performance even if the features are transferred from a different data source. While this paper focuses on HAR as the application domain, the proposed technique is general and could be applied to a wide variety of problems in other areas.
We present a novel reinforcement learning-based natural media painting algorithm. Our goal is to reproduce a reference image using brush strokes and we encode the objective through observations. Our formulation takes into account that the distribution of the reward in the action space is sparse and training a reinforcement learning algorithm from scratch can be difficult. We present an approach that combines self-supervised learning and reinforcement learning to effectively transfer negative samples into positive ones and change the reward distribution. We demonstrate the benefits of our painting agent to reproduce reference images with brush strokes. The training phase takes about one hour and the runtime algorithm takes about 30 seconds on a GTX1080 GPU reproducing a 1000 800 image with 20,000 strokes.