Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where
Chin, Zhi-Yi; Jiang, Chieh-Ming; Huang, Ching-Chun; Chen, Pin-Yu; Chiu, Wei-Chen
–arXiv.org Artificial Intelligence
While image data starts to enjoy the simple-but-effective self-supervised learning scheme built upon masking and self-reconstruction objective thanks to the introduction of tokenization procedure and vision transformer backbone, convolutional neural networks as another important and widely-adopted architecture for image data, though having contrastive-learning techniques to drive the self-supervised learning, still face the difficulty of leveraging such straightforward and general masking operation to benefit their learning process significantly. In this work, we aim to alleviate the burden of including masking operation into the contrastive-learning framework for convolutional neural networks as an extra augmentation method. In addition to the additive but unwanted edges (between masked and unmasked regions) as well as other adverse effects caused by the masking operations for ConvNets, which have been discussed by prior works, we particularly identify the potential problem where for one view in a contrastive sample-pair the randomly-sampled masking regions could be overly …

The recent renaissance of deep learning techniques has brought a magic leap to various fields, such as computer vision, natural language processing, and robotics. Learning from a large-scale labeled/supervised dataset, which is one of the key factors leading to the success of deep learning, however, has now turned out to be a significant limitation on its extensions to more fields. In addition to the expensive cost of time and human resources to collect training datasets for different tasks and their corresponding labels, the supervised learning scenario typically would suffer from the issue of overfitting on the training dataset, thus leading to worse generalizability of the learnt models. These problems bring challenges for the application of deep learning techniques but also give rise to the research topic of self-supervised learning, wherein it aims to learn to extract informative feature representations from an unlabelled dataset via leveraging the underlying structure of data and building the supervisory signals from the data itself. The discovered representations are typically more general and can be further utilized or fine-tuned to various downstream tasks.
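The abstract is truncated before the saliency-guided fix hinted at by the title, but the setup it describes, masking as an extra augmentation whose randomly sampled regions should not pile onto salient content, can be illustrated briefly. Below is a minimal, hedged sketch in PyTorch: the saliency map source, the inverse-saliency sampling rule, and the `patch`/`mask_ratio` parameters are illustrative assumptions, not the paper's actual recipe.

```python
import torch
import torch.nn.functional as F

def saliency_guided_mask(saliency: torch.Tensor, patch: int = 16, mask_ratio: float = 0.5) -> torch.Tensor:
    """Sample a patch-wise binary mask (1 = keep, 0 = masked out).

    Patches with higher average saliency are less likely to be masked, so the
    randomly sampled masked regions do not concentrate on salient objects.
    `saliency` is an (H, W) map with values in [0, 1]; H and W are assumed to
    be divisible by `patch`.
    """
    H, W = saliency.shape
    gh, gw = H // patch, W // patch
    # Average saliency per patch on the (gh, gw) grid.
    patch_sal = F.avg_pool2d(saliency[None, None], kernel_size=patch).flatten()
    # Sampling weight for masking: inversely related to saliency.
    weights = (1.0 - patch_sal).clamp(min=1e-6)
    n_mask = int(mask_ratio * gh * gw)
    masked_idx = torch.multinomial(weights, n_mask, replacement=False)
    keep = torch.ones(gh * gw)
    keep[masked_idx] = 0.0
    # Expand the patch grid back to pixel resolution.
    keep = keep.view(1, 1, gh, gw)
    return F.interpolate(keep, scale_factor=patch, mode="nearest")[0, 0]

# Mask only one view of a contrastive pair (the other stays unmasked).
img = torch.rand(3, 224, 224)
saliency = torch.rand(224, 224)        # stand-in for a real saliency predictor
mask = saliency_guided_mask(saliency)  # (224, 224)
view_a = img                           # unmasked view
view_b = img * mask                    # masked view fed to the ConvNet branch
```

Sampling whole patches rather than scattered pixels is one common choice; note that the abstract also points to the spurious edges masking introduces between masked and unmasked regions, which this sketch does not attempt to address.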
Sep-22-2023