mp-ssl
Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples
Semi-supervised learning (SSL) is a promising approach for training deep classification models on labeled and unlabeled data. However, existing SSL methods rely on large unlabeled datasets, which may not be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the following research question: can we train SSL models without real unlabeled datasets? Instead of using real unlabeled data, we propose an SSL method that uses synthetic datasets generated by generative foundation models trained on millions of samples from diverse domains (e.g., ImageNet). Our main idea is to identify synthetic samples from generative foundation models that emulate unlabeled samples, and to train classifiers using these synthetic samples. To this end, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables so that the generated samples resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines that use generative foundation models for SSL. We also demonstrate that our method outperforms SSL with real unlabeled datasets in scenarios with extremely few labeled samples. This suggests that synthetic samples have the potential to provide performance gains more efficiently than real unlabeled data.
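The alternating optimization in step (i) can be illustrated with a toy sketch. This is a minimal, hypothetical illustration, not the paper's implementation: the "generative foundation model" is replaced by a fixed linear map, and the meta-learning objective is reduced to making generated samples resemble the mean of the real labeled samples via gradient descent on the latent variables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a frozen generative foundation model: G(z) = A @ z.
# (The actual method would use a pretrained generator, e.g. one trained
# on ImageNet; A, X_labeled, and all dimensions here are hypothetical.)
A = rng.normal(size=(8, 4))

def generate(z):
    return A @ z

# A handful of real labeled samples (synthetic toy data).
X_labeled = rng.normal(loc=1.0, size=(16, 8))
target = X_labeled.mean(axis=0)

def latent_loss(z):
    # Proxy for the meta-learning objective: generated samples
    # should resemble the real labeled samples.
    diff = generate(z) - target
    return float(diff @ diff)

def optimize_latent(z, lr=0.01, steps=200):
    # Gradient descent on the latent variable z;
    # the gradient of ||A z - t||^2 w.r.t. z is 2 A^T (A z - t).
    for _ in range(steps):
        grad = 2.0 * A.T @ (generate(z) - target)
        z = z - lr * grad
    return z

z0 = rng.normal(size=4)
z_opt = optimize_latent(z0)
# generate(z_opt) would then serve as a synthetic "unlabeled" sample
# for step (ii), the SSL training of the classifier.
```

In the full method, step (ii) would alternate with this latent optimization, training the classifier on real labeled samples plus the synthetic unlabeled samples under the proposed unsupervised regularization loss.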
Multiple Pretext-Task for Self-Supervised Learning via Mixing Multiple Image Transformations
Yamaguchi, Shin'ya, Kanai, Sekitoshi, Shioda, Tetsuya, Takeda, Shoichiro
NTT, Tokyo, Japan
{shinya.yamaguchi.mw,sekitoshi.kanai.fu,tetsuya.shioda.yf,shoichiro.takeda.us}@hco.ntt.co.jp

Abstract

Self-supervised learning is one of the most promising approaches to learning representations that capture semantic features in images without any manual annotation cost. To learn useful representations, a self-supervised model solves a pretext-task, which is defined by the data itself. Among the many pretext-tasks, the rotation prediction task (Rotation) achieves better representations for solving various target tasks despite the simplicity of its implementation. However, we found that Rotation can fail to capture semantic features related to image textures and colors. To tackle this problem, we introduce a learning technique called multiple pretext-task for self-supervised learning (MP-SSL), which solves multiple pretext-tasks in addition to Rotation simultaneously. To capture features of textures and colors, we employ transformations of image enhancements (e.g., sharpening and solarizing) as the additional pretext-tasks. MP-SSL efficiently trains a model by leveraging a Frank-Wolfe based multi-task training algorithm. Our experimental results show that MP-SSL models outperform Rotation on multiple standard benchmarks and achieve state-of-the-art performance on Places-205.

1. Introduction

Convolutional neural networks (CNNs) [27, 16, 44] are widely adopted to solve many target tasks in computer vision applications such as object recognition [30], semantic segmentation [4], and object detection [42]. However, these successes depend on supervised training of CNNs with vast amounts of labeled data [43], which is expensive and impractical because of the manual annotation cost.
Since the cost of labeled data limits the practical applications of CNNs, many studies have focused on training techniques that alleviate the need for large amounts of labeled data; these techniques include transfer learning, semi-supervised learning, and self-supervised learning.

[Figure: A demonstration describing our motivation to modify self-supervised learning by predicting rotations of images (Rotation).]
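A common way to balance multiple pretext-tasks that share one encoder is to solve a min-norm problem over the task gradients with a Frank-Wolfe iteration, as in standard multi-task learning formulations. The sketch below is an illustration of that general technique under simplified assumptions (toy random gradients, three hypothetical pretext heads), not a reproduction of the paper's exact MP-SSL algorithm.

```python
import numpy as np

def frank_wolfe_weights(grads, iters=50):
    """Find task weights w on the simplex minimizing ||sum_i w_i g_i||^2,
    the min-norm formulation commonly used in multi-task learning.
    grads: (num_tasks, dim) array of per-task encoder gradients."""
    n = len(grads)
    w = np.full(n, 1.0 / n)
    G = grads @ grads.T                    # Gram matrix of task gradients
    for _ in range(iters):
        # Frank-Wolfe direction: the vertex (single task) that most
        # decreases the linearized objective, i.e. argmin of (G w)_t.
        t = int(np.argmin(G @ w))
        e = np.zeros(n)
        e[t] = 1.0
        d = e - w
        # Exact line search for the quadratic objective along d.
        dGd = float(d @ G @ d)
        gamma = 0.0 if dGd <= 1e-12 else float(np.clip(-(w @ G @ d) / dGd, 0.0, 1.0))
        w = w + gamma * d
    return w

# Hypothetical per-task gradients: e.g. rotation, sharpening, and
# solarization prediction heads sharing one encoder.
rng = np.random.default_rng(0)
grads = rng.normal(size=(3, 10))
w = frank_wolfe_weights(grads)
combined = w @ grads  # weighted shared-encoder update direction
```

The resulting weights lie on the probability simplex, and the weighted gradient sum has a norm no larger than that of naive uniform averaging, which is what makes the min-norm direction a useful common update for all pretext-tasks.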