Supervised deep learning is analogous to concept learning in humans and animals, except that the student is a computational network. Supervised deep learning frameworks are trained on well-labelled data: the learning algorithm is taught to generalise from the training data and to apply what it has learned to unseen situations. After training, the model is evaluated on a held-out test set to predict outputs. Datasets containing inputs paired with correct outputs are therefore critical, as they help the model learn efficiently. Regression and classification are the two main subfields of supervised machine learning.
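As a minimal sketch of this train-then-test workflow, the toy regression below fits a linear model to labelled pairs and then predicts outputs for inputs it never saw; the data and model here are illustrative assumptions, not from the original text.

```python
import numpy as np

# Hypothetical toy data: inputs x paired with correct outputs y (the labels).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(100, 1))
y = 3.0 * x[:, 0] + 0.5 + rng.normal(scale=0.1, size=100)  # noisy line

# "Training": fit weights on the labelled data with ordinary least squares.
X = np.hstack([x, np.ones((100, 1))])          # add a bias column
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# "Testing": predict outputs for unseen inputs the model never saw.
x_new = np.array([[0.25], [-0.75]])
X_new = np.hstack([x_new, np.ones((2, 1))])
y_pred = X_new @ w
```

The fitted weights recover the slope and intercept that generated the labels, which is exactly the generalisation step the paragraph describes.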
Generally, self-supervised representation learning trains a feature extractor by solving a pretext task constructed on a large unlabeled dataset. The learned feature extractor yields generic feature representations for other machine learning tasks such as classification. Recent algorithms allow a linear classifier to attain classification accuracy comparable to a fully supervised method trained from scratch, especially when labeled data are scarce [Newell and Deng, 2020, Hénaff et al., 2020, Chen et al., 2020b]. For example, SwAV [Caron et al., 2020] with ResNet-50 reaches a top-1 validation accuracy of 75.3% on ImageNet-1K classification [Deng et al., 2009], compared with 76.5% for the fully supervised method. InfoNCE [van den Oord et al., 2018], or a modification of it, is the de facto standard objective in many state-of-the-art self-supervised methods [Logeswaran and Lee, 2018, Bachman et al., 2019, He et al., 2020, Chen et al., 2020a, Hénaff et al., 2020, Caron et al., 2020].
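A minimal numpy sketch of the InfoNCE objective may help make this concrete: each query is scored against a batch of keys, where the matching key is the positive and all others act as in-batch negatives. The batch sizes, temperature, and synthetic embeddings below are illustrative assumptions.

```python
import numpy as np

def info_nce(queries, keys, temperature=0.1):
    """Minimal InfoNCE: row i of `keys` is the positive for row i of
    `queries`; the remaining rows serve as in-batch negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature               # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # -log p(positive | query)

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 16))
# Positives built as slightly perturbed copies (mimicking augmented views)
loss_matched = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))
loss_random = info_nce(z, rng.normal(size=(8, 16)))
# Matched pairs should give a much lower loss than unrelated pairs.
```

Minimising this loss pulls the two views of the same example together while pushing apart views of different examples, which is the mechanism the contrastive methods cited above share.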
We use a contrastive self-supervised learning framework to estimate distances to galaxies from their photometric images. We incorporate data augmentations from computer vision as well as an application-specific augmentation accounting for galactic dust. We find that the resulting visual representations of galaxy images are semantically useful, allow for fast similarity searches, and can be successfully fine-tuned for the task of redshift estimation. We show that (1) pretraining on a large corpus of unlabeled data followed by fine-tuning on some labels can match the accuracy of a fully supervised model that requires 2-4x more labeled data, and (2) by fine-tuning our self-supervised representations on all available data labels in the Main Galaxy Sample of the Sloan Digital Sky Survey (SDSS), we outperform the state-of-the-art supervised learning method.
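One way such a dust augmentation could look, sketched under stated assumptions: foreground dust dims each photometric band by a band-dependent extinction factor. The per-band coefficients, the E(B-V) range, and the channel layout below are hypothetical placeholders, not values from the original work.

```python
import numpy as np

# Hypothetical per-band extinction coefficients, one per image channel;
# real values would come from the survey's filter curves (e.g. SDSS ugriz).
R_BAND = np.array([4.2, 3.2, 2.3])

def dust_augment(image, rng, max_ebv=0.1):
    """Dim each channel as if extra foreground dust were present.
    `image` has shape (H, W, C); flux in band b is scaled by
    10**(-0.4 * R_b * E(B-V)) for a randomly drawn reddening E(B-V)."""
    ebv = rng.uniform(0.0, max_ebv)
    atten = 10.0 ** (-0.4 * R_BAND * ebv)
    return image * atten  # broadcasts over the channel axis

rng = np.random.default_rng(2)
galaxy = rng.uniform(0.0, 1.0, size=(32, 32, 3))
augmented = dust_augment(galaxy, rng)  # same galaxy, extra simulated dust
```

Treating the dusty and original images as two views of the same galaxy lets the contrastive objective learn representations that are invariant to this nuisance effect.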
Learning a joint representation of these modalities should yield deeper and more useful representations. Previous generative approaches to multimodal input either do not learn a joint distribution or require additional computation to handle missing data. Here, we introduce a multimodal variational autoencoder (MVAE) that uses a product-of-experts inference network and a sub-sampled training paradigm to solve the multimodal inference problem. Notably, our model shares parameters to efficiently learn under any combination of missing modalities. We apply the MVAE to four datasets and match state-of-the-art performance using many fewer parameters. In addition, we show that the MVAE is directly applicable to weakly-supervised learning and is robust to incomplete supervision.
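The product-of-experts idea admits a compact closed form for Gaussian posteriors: multiplying per-modality Gaussians gives another Gaussian whose precisions add and whose mean is precision-weighted, so a missing modality is handled by simply omitting its expert. A minimal sketch, with illustrative two-expert inputs:

```python
import numpy as np

def product_of_experts(mus, logvars):
    """Combine per-modality Gaussian posteriors q_i(z|x_i) by
    multiplication: precisions add, and the mean is precision-weighted.
    Missing modalities are handled by leaving their expert out."""
    precisions = np.exp(-np.asarray(logvars))        # 1 / sigma_i^2
    var = 1.0 / precisions.sum(axis=0)
    mu = var * (np.asarray(mus) * precisions).sum(axis=0)
    return mu, var

# Two hypothetical "experts" (e.g. image and text encoders), 4-dim latent.
mu_img, lv_img = np.zeros(4), np.zeros(4)                # N(0, 1)
mu_txt, lv_txt = np.ones(4), np.log(0.25) * np.ones(4)   # N(1, 0.25)
mu, var = product_of_experts([mu_img, mu_txt], [lv_img, lv_txt])
# The product is pulled toward the more confident (lower-variance) expert.
```

Here the text expert has four times the precision of the image expert, so the combined mean lands at 0.8 rather than the midpoint 0.5, illustrating the precision weighting.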
Technically speaking, the terms supervised and unsupervised learning refer to whether the raw data used to create algorithms has been prelabeled or not. In supervised learning, data scientists feed algorithms labeled training data and define the variables they want the algorithm to assess for correlations. Both the input and the output of the algorithm are specified in the training data. For example, to train an algorithm to infer whether a picture contains a cat using supervised learning, data scientists create a label for each picture in the training data indicating whether or not the image contains a cat. In an unsupervised learning approach, the algorithm is trained on unlabeled data.
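As a minimal sketch of what "prelabeled" means for the cat example, supervised training data pairs each input with its correct output; the filenames and labels below are made up for illustration.

```python
# Each input (an image, represented here only by its filename) is paired
# with a human-assigned label: 1 = contains a cat, 0 = no cat.
labeled_data = [
    ("photo_001.jpg", 1),
    ("photo_002.jpg", 0),
    ("photo_003.jpg", 1),
]

inputs = [image for image, _ in labeled_data]   # what the model sees
labels = [label for _, label in labeled_data]   # what the model must predict

# Unsupervised learning would receive only `inputs`, with no `labels`.
```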