SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer
Rojas-Gomez, Renan A., Singhal, Karan, Etemad, Ali, Bijamov, Alex, Morningstar, Warren R., Mansfield, Philip Andrew
Self-supervised learning relies heavily on data augmentation to extract meaningful representations from unlabeled images. While existing state-of-the-art augmentation pipelines incorporate a wide range of primitive transformations, these often disregard natural image structure. Thus, augmented samples can exhibit degraded semantic information and low stylistic diversity, affecting downstream performance of self-supervised representations. To overcome this, we propose SASSL: Style Augmentations for Self Supervised Learning, a novel augmentation technique based on Neural Style Transfer. The method decouples semantic and stylistic attributes in images and applies transformations exclusively to the style while preserving content, generating diverse augmented samples that better retain their semantic properties. Experimental results show our technique achieves a top-1 classification performance improvement of more than 2% on ImageNet compared to the well-established MoCo v2. Our experiments indicate that decoupling style from content information and transferring style across datasets to diversify augmentations can significantly improve downstream performance of self-supervised representations.
Data labeling is a challenging and expensive process, which often serves as a barrier to building machine learning models that solve real-world problems. Self-supervised learning (SSL) is an emerging machine learning paradigm that helps alleviate the challenges of data labeling by using large corpora of unlabeled data to pre-train models that learn robust and general representations. These representations can be efficiently transferred to downstream tasks, resulting in performant models that can be constructed without access to large pools of labeled data. SSL methods have shown promising results in recent years, matching and in some cases exceeding the performance of bespoke supervised models when only small amounts of labeled data are available.
Given the lack of labels, SSL relies on pretext tasks, i.e., predefined tasks where pseudo-labels can be generated. Some examples include contrastive learning (Chen et al., 2020a; He et al., 2020), clustering (Caron et al., 2021; 2020; Assran et al., 2022), and generative modeling (He et al., 2022; Devlin et al., 2018). Many of these pretext tasks involve training the model to distinguish between different views of the same input and inputs corresponding to different samples. For these tasks, the way input data is augmented is crucial for the network to learn useful invariances and extract robust representations (Chen et al., 2020a). While state-of-the-art augmentations incorporate a wide range of primitive color, spectral and spatial transformations, they often disregard the natural structure of an image.
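As an illustration of the view-discrimination pretext task described above, the following is a minimal sketch of an InfoNCE-style contrastive loss in plain Python. It is not the SASSL or MoCo v2 implementation: the function names `normalize` and `info_nce_loss`, the temperature value, and the use of pre-computed embedding vectors are all assumptions made for illustration. Row i of each batch is assumed to hold the embedding of a different augmented view of the same input i, and every other row acts as a negative.

```python
import math


def normalize(vector):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(views_a, views_b, temperature=0.1):
    """InfoNCE-style loss between two batches of view embeddings.

    views_a[i] and views_b[i] are assumed to be embeddings of two augmented
    views of the same input (the positive pair); views_b[j] for j != i are
    treated as negatives. The temperature is a hypothetical default.
    """
    a = [normalize(v) for v in views_a]
    b = [normalize(v) for v in views_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Similarity of the anchor to every candidate in the other batch.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature
                  for other in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits))
        # Cross-entropy with the matching view as the correct "class".
        total += log_denominator - logits[i]
    return total / len(a)
```

The loss is low when each embedding is most similar to the other view of its own input and high when it is closer to a view of a different sample, which is why the choice of augmentations that produce these views shapes the invariances the network learns.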
Jan-1-2024