Is Pre-training Applicable to the Decoder for Dense Prediction?
Ning, Chao, Gan, Wanshui, Xuan, Weihao, Yokoya, Naoto
–arXiv.org Artificial Intelligence
Is Pre-training Applicable to the Decoder for Dense Prediction?

Chao Ning, The University of Tokyo
Wanshui Gan, The University of Tokyo
Weihao Xuan, The University of Tokyo
Naoto Yokoya, The University of Tokyo

Abstract

Encoder-decoder networks are commonly used model architectures for dense prediction tasks, where the encoder typically employs a model pre-trained on upstream tasks, while the decoder is often either randomly initialized or pre-trained on other tasks. In this paper, we introduce Net, a novel framework that leverages a model pre-trained on upstream tasks as the decoder, fostering a "pre-trained encoder, pre-trained decoder" collaboration within the encoder-decoder network. Net effectively addresses the challenges associated with using pre-trained models in the decoder, applying their learned representations to enhance the decoding process. This enables the model to produce more precise, higher-quality dense predictions. Remarkably, it achieves this without relying on decoding-specific structures or task-specific algorithms. Despite its streamlined design, Net outperforms advanced methods on tasks such as monocular depth estimation and semantic segmentation, achieving state-of-the-art performance, particularly in monocular depth estimation.

1. Introduction

In 2015, Long et al. [35] reinterpreted classification networks as fully convolutional architectures, fine-tuning these models on top of their pre-learned representations. Pre-trained models excel at extracting features across multiple scales, from fine to coarse, effectively capturing both local and global information from images.
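The "pre-trained encoder, pre-trained decoder" idea described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the paper's actual Net architecture; it only shows the wiring under the assumption that ImageNet-pretrained torchvision ResNets stand in for the pre-trained encoder and decoder, with the decoder built from the pre-trained residual stages of a second backbone instead of randomly initialized layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import (resnet18, resnet34,
                                ResNet18_Weights, ResNet34_Weights)


class PretrainedEncoderPretrainedDecoder(nn.Module):
    """Encoder-decoder for dense prediction in which the decoder is also
    initialized from pre-trained weights rather than from scratch.
    Conceptual sketch only; backbone choices are illustrative assumptions."""

    def __init__(self, num_outputs: int = 1):
        super().__init__()
        # Pre-trained encoder: produces coarse, semantically rich features.
        # (Pass weights=None to skip the ImageNet weight download.)
        self.encoder = resnet34(weights=ResNet34_Weights.DEFAULT)

        # Pre-trained decoder: residual stages of another pre-trained backbone,
        # reused to refine upsampled encoder features.
        decoder_backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        self.proj = nn.Conv2d(512, 64, kernel_size=1)           # match channels
        self.decoder_stage = decoder_backbone.layer1             # pre-trained blocks, stride 1
        self.head = nn.Conv2d(64, num_outputs, kernel_size=1)    # dense prediction head

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        e = self.encoder
        x = e.maxpool(e.relu(e.bn1(e.conv1(x))))
        x = e.layer4(e.layer3(e.layer2(e.layer1(x))))
        return x  # (B, 512, H/32, W/32)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = self.proj(self.encode(x))                         # (B, 64, H/32, W/32)
        feats = F.interpolate(feats, scale_factor=4,
                              mode="bilinear", align_corners=False)
        feats = self.decoder_stage(feats)                         # pre-trained decoding blocks
        out = self.head(feats)
        return F.interpolate(out, size=(h, w),
                             mode="bilinear", align_corners=False)


# Example: one-channel output, e.g. monocular depth estimation.
model = PretrainedEncoderPretrainedDecoder(num_outputs=1)
pred = model(torch.randn(1, 3, 224, 224))   # -> torch.Size([1, 1, 224, 224])
```

The design choice the sketch highlights is simply that the decoder's weights come from an upstream pre-trained model, so its learned representations participate in decoding from the first training step, in contrast to the common practice of random decoder initialization.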
Mar-15-2025