EMVP: Embracing Visual Foundation Model for Visual Place Recognition with Centroid-Free Probing

Shun Zhang, Haiming Gao, Honghui Yang

Neural Information Processing Systems 

Visual Place Recognition (VPR) is essential for mobile robots, as it enables them to retrieve database images closest to their current location. The progress of Visual Foundation Models (VFMs) has significantly advanced VPR by capturing representative descriptors in images. However, existing fine-tuning efforts for VFMs often overlook the crucial role of probing in adapting these descriptors for improved image representation. In this paper, we propose the Centroid-Free Probing (CFP) stage, which makes novel use of second-order features to exploit descriptors from VFMs more effectively. Moreover, to adaptively control the preservation of task-specific information based on the context of the VPR task, we introduce the Dynamic Power Normalization (DPN) module in both the recalibration and CFP stages, forming a novel Parameter-Efficient Fine-Tuning (PEFT) pipeline (EMVP) tailored for VPR. Extensive experiments demonstrate the superiority of the proposed CFP over existing probing methods. Moreover, the EMVP pipeline further enhances fine-tuning performance in terms of accuracy and efficiency. Specifically, it achieves 93.9%, 96.5%, and 94.6% Recall@1 on the MSLS Validation, Pitts250k-test, and SPED datasets, respectively, while saving 64.3% of trainable parameters compared with the existing SOTA PEFT method.
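To make the two ingredients named above concrete, the sketch below shows a generic probing head that aggregates frozen VFM patch descriptors with second-order (covariance) pooling followed by power normalization. It is an illustration of the standard formulation of these techniques under assumed shapes and names (SecondOrderProbe, proj_dim, alpha are all hypothetical), not the paper's actual CFP or DPN implementation; in particular, DPN learns the normalization adaptively, whereas the exponent here is fixed.

# Illustrative sketch only: second-order pooling + power normalization over
# frozen VFM patch tokens. Names, shapes, and the fixed exponent are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SecondOrderProbe(nn.Module):
    def __init__(self, feat_dim: int, proj_dim: int = 128, alpha: float = 0.5):
        super().__init__()
        # Low-dimensional projection keeps the d x d covariance tractable.
        self.proj = nn.Linear(feat_dim, proj_dim, bias=False)
        # Fixed power-normalization exponent; the paper's DPN makes this dynamic.
        self.alpha = alpha

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D) descriptors from a frozen visual foundation model.
        x = self.proj(patch_tokens)                             # (B, N, d)
        x = x - x.mean(dim=1, keepdim=True)                     # center over patches
        cov = torch.einsum("bnd,bne->bde", x, x) / x.shape[1]   # (B, d, d) second-order statistics
        feat = cov.flatten(1)                                   # (B, d*d) global descriptor
        # Signed power normalization, then L2 normalization for retrieval.
        feat = torch.sign(feat) * feat.abs().pow(self.alpha)
        return F.normalize(feat, dim=-1)


if __name__ == "__main__":
    probe = SecondOrderProbe(feat_dim=768)
    tokens = torch.randn(2, 256, 768)   # e.g., ViT patch tokens from a frozen backbone
    print(probe(tokens).shape)          # torch.Size([2, 16384])

Because the descriptor is built from patch covariances rather than distances to learned cluster centroids (as in NetVLAD-style heads), no centroid parameters are required, which is the general motivation behind a centroid-free probe.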