Visual Fourier Prompt Tuning

May-28-2025, 09:48:43 GMT–Neural Information Processing Systems

With the scale of Transformer-based vision models continuing to grow, finetuning these large-scale pretrained models for new tasks has become increasingly parameter-intensive. Visual prompt tuning was introduced as a parameter-efficient finetuning (PEFT) method in response to this trend. Despite its successes, a notable research challenge persists within almost all PEFT approaches: significant performance degradation is observed when there is a substantial disparity between the datasets used in the pretraining and finetuning phases. To address this challenge, we draw inspiration from human visual cognition and propose the Visual Fourier Prompt Tuning (VFPT) method as an effective and efficient solution for adapting large-scale Transformer-based models. Our approach incorporates the Fast Fourier Transform into prompt embeddings, integrating both spatial and frequency domain information. Apart from its inherent simplicity and intuitiveness, VFPT exhibits superior performance across various tasks, offering a general solution to the data disparity challenge. Empirical results demonstrate that our approach outperforms several state-of-the-art baselines on two benchmarks, with low parameter usage (e.g., 0.57% of model parameters on VTAB-1k) and notable performance gains (e.g., 73.20% mean accuracy on VTAB-1k). Our code is available at https://github.com/runtsang/VFPT.
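The idea of applying the Fast Fourier Transform to prompt embeddings can be illustrated with a small NumPy sketch. It is not taken from the released code and rests on several assumptions: that the transform is a 2D FFT over the prompt-sequence and hidden dimensions, that only the real part is kept, and that a fraction of the prompts is transformed while the rest stay in the spatial domain. The names `fourier_prompts`, `prepend_prompts`, and `fourier_fraction` are hypothetical.

```python
import numpy as np


def fourier_prompts(prompts: np.ndarray, fourier_fraction: float = 0.5) -> np.ndarray:
    """Transform a fraction of the prompt embeddings with a 2D FFT.

    prompts: real array of shape (num_prompts, hidden_dim).
    fourier_fraction: hypothetical knob, share of prompts moved to the
        frequency domain (assumption, not confirmed by the abstract).
    """
    num_prompts = prompts.shape[0]
    n_fourier = int(round(num_prompts * fourier_fraction))
    if n_fourier == 0:
        # Nothing to transform: plain visual prompt tuning.
        return prompts.copy()

    # Assumed design: 2D FFT across sequence and hidden axes, real part only,
    # so the result can be mixed with ordinary real-valued tokens.
    transformed = np.fft.fft2(prompts[:n_fourier], axes=(0, 1)).real

    # Frequency-domain prompts first, untouched spatial prompts after them.
    return np.concatenate([transformed, prompts[n_fourier:]], axis=0)


def prepend_prompts(patch_tokens: np.ndarray, prompts: np.ndarray) -> np.ndarray:
    """Prepend prompts to the patch-token sequence of one Transformer layer."""
    return np.concatenate([prompts, patch_tokens], axis=0)
```

In a prompt-tuning setup of this kind, only the prompt embeddings (and typically a task head) would be trained while the pretrained backbone stays frozen, which is what keeps the share of tuned parameters small.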

large language model, machine learning, natural language

Country:
- North America > United States
  - California (0.14)
  - Missouri (0.14)

Genre:
- Research Report
  - Experimental Study > Negative Result (0.45)
  - New Finding (1.00)

Industry:
- Health & Medicine (0.68)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
    - Natural Language > Large Language Model (0.86)
    - Representation & Reasoning (0.92)
    - Vision (1.00)
  - Data Science > Data Quality
    - Data Transformation (0.88)
  - Sensing and Signal Processing > Image Processing (1.00)