Is Pre-training Applicable to the Decoder for Dense Prediction?
Ning, Chao, Gan, Wanshui, Xuan, Weihao, Yokoya, Naoto
–arXiv.org Artificial Intelligence
Is Pre-training Applicable to the Decoder for Dense Prediction?

Chao Ning, The University of Tokyo
Wanshui Gan, The University of Tokyo
Weihao Xuan, The University of Tokyo
Naoto Yokoya, The University of Tokyo

Abstract

Encoder-decoder networks are commonly used model architectures for dense prediction tasks, where the encoder typically employs a model pre-trained on upstream tasks, while the decoder is often either randomly initialized or pre-trained on other tasks. In this paper, we introduce Net, a novel framework that leverages a model pre-trained on upstream tasks as the decoder, fostering a "pre-trained encoder, pre-trained decoder" collaboration within the encoder-decoder network. Net effectively addresses the challenges associated with using pre-trained models in the decoder, applying their learned representations to enhance the decoding process. This enables the model to produce more precise, higher-quality dense predictions. Remarkably, it achieves this without relying on decoding-specific structures or task-specific algorithms. Despite its streamlined design, Net outperforms advanced methods on tasks such as monocular depth estimation and semantic segmentation, achieving state-of-the-art performance, particularly in monocular depth estimation.

1. Introduction

In 2015, Long et al. [35] reinterpreted classification networks as fully convolutional architectures, fine-tuning these models on top of their pre-learned representations. Pre-trained models excel at extracting features across multiple scales, from fine to coarse, effectively capturing both local and global information from images.
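The "pre-trained encoder, pre-trained decoder" idea described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the paper's actual Net architecture; it only shows the wiring under the assumption that ImageNet-pretrained torchvision ResNets stand in for the pre-trained encoder and decoder, with the decoder built from the pre-trained residual stages of a second backbone instead of randomly initialized layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import (resnet18, resnet34,
                                ResNet18_Weights, ResNet34_Weights)


class PretrainedEncoderPretrainedDecoder(nn.Module):
    """Encoder-decoder for dense prediction in which the decoder is also
    initialized from pre-trained weights rather than from scratch.
    Conceptual sketch only; backbone choices are illustrative assumptions."""

    def __init__(self, num_outputs: int = 1):
        super().__init__()
        # Pre-trained encoder: produces coarse, semantically rich features.
        # (Pass weights=None to skip the ImageNet weight download.)
        self.encoder = resnet34(weights=ResNet34_Weights.DEFAULT)

        # Pre-trained decoder: residual stages of another pre-trained backbone,
        # reused to refine upsampled encoder features.
        decoder_backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.proj = nn.Conv2d(512, 64, kernel_size=1)           # match channels
        self.decoder_stage = decoder_backbone.layer1             # pre-trained blocks, stride 1
        self.head = nn.Conv2d(64, num_outputs, kernel_size=1)    # dense prediction head

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        e = self.encoder
        x = e.maxpool(e.relu(e.bn1(e.conv1(x))))
        x = e.layer4(e.layer3(e.layer2(e.layer1(x))))
        return x  # (B, 512, H/32, W/32)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = self.proj(self.encode(x))                         # (B, 64, H/32, W/32)
        feats = F.interpolate(feats, scale_factor=4,
                              mode="bilinear", align_corners=False)
        feats = self.decoder_stage(feats)                         # pre-trained decoding blocks
        out = self.head(feats)
        return F.interpolate(out, size=(h, w),
                             mode="bilinear", align_corners=False)


# Example: one-channel output, e.g. monocular depth estimation.
model = PretrainedEncoderPretrainedDecoder(num_outputs=1)
pred = model(torch.randn(1, 3, 224, 224))   # -> torch.Size([1, 1, 224, 224])
```

The design choice the sketch highlights is simply that the decoder's weights come from an upstream pre-trained model, so its learned representations participate in decoding from the first training step, in contrast to the common practice of random decoder initialization.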
Mar-15-2025