RoPAWS: Robust Semi-supervised Representation Learning from Uncurated Data

Mo, Sangwoo, Su, Jong-Chyi, Ma, Chih-Yao, Assran, Mido, Misra, Ishan, Yu, Licheng, Bell, Sean

arXiv.org Artificial Intelligence 

Semi-supervised learning aims to train a model using limited labels. State-of-the-art semi-supervised methods for image classification such as PAWS rely on self-supervised representations learned with large-scale unlabeled but curated data. However, PAWS is often less effective when using real-world unlabeled data that is uncurated, e.g., contains out-of-class data. We propose RoPAWS, a robust extension of PAWS that can work with real-world unlabeled data. Interpreting PAWS as a generative classifier from a probabilistic perspective, we calibrate its prediction based on the densities of labeled and unlabeled data, which leads to a simple closed-form solution from Bayes' rule.

Semi-supervised learning aims to address the fundamental challenge of training models with limited labeled data by leveraging large-scale unlabeled data. Recent works exploit the success of self-supervised learning (He et al., 2020; Chen et al., 2020a) in learning representations from unlabeled data for training large-scale semi-supervised models (Chen et al., 2020b; Cai et al., 2022). Instead of self-supervised pre-training followed by semi-supervised fine-tuning, PAWS (Assran et al., 2021) proposed a single-stage approach that combines supervised and self-supervised learning and achieves state-of-the-art accuracy and convergence speed. While PAWS can leverage curated unlabeled data, we empirically show that it is not robust to real-world uncurated data, which often contains out-of-class data. A common approach to tackling uncurated data in semi-supervised learning is to filter unlabeled data using out-of-distribution (OOD) classification (Chen et al., 2020d; Saito et al., 2021; Liu et al., 2022). However, OOD filtering methods do not fully utilize OOD data, which could benefit representation learning, especially on large-scale realistic datasets. Furthermore, filtering OOD data can be ineffective since in-class and out-of-class data are often hard to discriminate in practical scenarios.
To this end, we propose RoPAWS, a robust semi-supervised learning method that can leverage uncurated unlabeled data. PAWS predicts out-of-class data overconfidently as belonging to the known classes, since it assigns pseudo-labels based solely on similarity to the labeled data. To handle this, RoPAWS regularizes the pseudo-labels by measuring the similarities between labeled and unlabeled data. These pseudo-labels are further calibrated by label propagation between unlabeled data. Figure 1 shows a conceptual illustration of RoPAWS, and Figure 4 visualizes the learned representations. We first introduce a new interpretation of PAWS as a generative classifier, modeling densities over representations by kernel density estimation (KDE) (Rosenblatt, 1956).
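The density-based calibration described above can be sketched numerically. The snippet below is a minimal illustration, not the paper's exact formulation: it models each class-conditional density p(z | y = c) by a similarity-kernel KDE over labeled representations, adds a density over unlabeled data as a single "out-of-class" mass, and applies Bayes' rule to obtain a calibrated posterior. The function names, the temperature `tau`, and the in-class prior `prior_in` are illustrative assumptions; representations are assumed L2-normalized.

```python
import numpy as np

def kde_density(query, support, tau=0.1):
    # KDE with an exponential kernel over cosine similarity;
    # query: (d,), support: (n, d), both assumed L2-normalized.
    sims = support @ query
    return np.exp(sims / tau).mean()

def calibrated_posterior(query, labeled_z, labels, unlabeled_z,
                         num_classes, tau=0.1, prior_in=0.5):
    # p(z | y = c): KDE over labeled representations of class c.
    class_dens = np.array([
        kde_density(query, labeled_z[labels == c], tau)
        for c in range(num_classes)
    ])
    # Density over unlabeled data, used as an "out-of-class" mass that
    # regularizes over-confident in-class predictions.
    ood_dens = kde_density(query, unlabeled_z, tau)
    # Bayes' rule with a uniform prior over the known classes.
    joint = np.concatenate([prior_in * class_dens / num_classes,
                            [(1.0 - prior_in) * ood_dens]])
    return joint / joint.sum()  # last entry: probability of out-of-class
```

A query far from every labeled cluster but near the unlabeled mass receives a high out-of-class probability, which is the qualitative behavior the calibration is meant to produce.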
