If you aren't familiar with Generative Adversarial Networks (GANs), they are a massively popular generative modeling technique formed by pitting two Deep Neural Networks, a generator and a discriminator, against each other. This adversarial loss has sparked the interest of many Deep Learning and Artificial Intelligence researchers. However, despite the beauty of the GAN formulation and the eye-opening results of the state-of-the-art architectures, GANs are generally very difficult to train. One of the best ways to get better results with GANs are to provide class labels. This is the basis of the conditional-GAN model.
Researchers from Facebook and the French National Institute for Research in Digital Science and Technology (Inria) have developed a new technique for self-supervised training of convolutional networks used for image classification and other computer vision tasks. The proposed method surpasses supervised techniques on most transfer tasks and outperforms previous self-supervised approaches. "Our approach allows researchers to train efficient, high-performance image classification models with no annotations or metadata," the researchers write in a Facebook blog post. "More broadly, we believe that self-supervised learning is key to building more flexible and useful AI." Recent improvements in self-supervised training methods have established them as a serious alternative to traditional supervised training. Self-supervised approaches however are significantly slower to train compared to their supervised counterparts.
If I understand correctly, both CPC and AlexNet used the same set of training images. CPC just didn't use labels, while AlexNet did. So, what about instances where a self-supervised network can be trained on 10,000x as much data as would be economically feasible to label? In these cases, are supervised learning's days numbered? The application I'm personally most interested in is self-driving cars.
We present an electrocardiogram (ECG) -based emotion recognition system using self-supervised learning. Our proposed architecture consists of two main networks, a signal transformation recognition network and an emotion recognition network. First, unlabelled data are used to successfully train the former network to detect specific pre-determined signal transformations in the self-supervised learning step. Next, the weights of the convolutional layers of this network are transferred to the emotion recognition network, and two dense layers are trained in order to classify arousal and valence scores. We show that our self-supervised approach helps the model learn the ECG feature manifold required for emotion recognition, performing equal or better than the fully-supervised version of the model. Our proposed method outperforms the state-of-the-art in ECG-based emotion recognition with two publicly available datasets, SWELL and AMIGOS. Further analysis highlights the advantage of our self-supervised approach in requiring significantly less data to achieve acceptable results.
Unsupervised learning is an old and well-understood problem in machine learning; LeCun's choice to replace it as the star in his cake analogy is not something he should take lightly! If you dive into the definition of self-supervised learning, you'll begin to see that it's really just an approach to unsupervised learning. Since many of the breakthroughs in machine learning this decade have been based on supervised learning techniques, successes in unsupervised problems tend to emerge when researchers re-frame an unsupervised problem as a supervised problem. Specifically, in self-supervised learning, we find a clever way to generate labels without human annotators. An easy example is a technique called next-step prediction.