PanMatch: Unleashing the Potential of Large Vision Models for Unified Matching Models

Zhang, Yongjian, Wang, Longguang, Li, Kunhong, Zhang, Ye, Wang, Yun, Lin, Liang, Guo, Yulan

arXiv.org Artificial Intelligence 

Abstract: This work presents PanMatch, a versatile foundation model for robust correspondence matching. Unlike previous methods that rely on task-specific architectures and domain-specific fine-tuning to support tasks like stereo matching, optical flow, or feature matching, our key insight is that any two-frame correspondence matching task can be addressed within a 2D displacement estimation framework using the same model weights. Such a formulation eliminates the need for designing specialized unified architectures or task-specific ensemble models. Instead, it achieves multi-task integration by endowing displacement estimation algorithms with unprecedented generalization capabilities. To this end, we highlight the importance of a robust feature extractor applicable across multiple domains and tasks, and propose a feature transformation pipeline that leverages all-purpose features from Large Vision Models to endow matching baselines with zero-shot cross-view matching capabilities. Furthermore, we assemble a cross-domain dataset with nearly 1.8 million samples from the stereo matching, optical flow, and feature matching domains to pretrain PanMatch. We demonstrate the versatility of PanMatch across a wide range of domains and downstream tasks using the same model weights. Our model outperforms UniMatch and Flow-Anything on cross-task evaluations, and achieves performance comparable to most state-of-the-art task-specific algorithms on task-oriented benchmarks. Additionally, PanMatch presents unprecedented zero-shot performance in abnormal scenarios, such as rainy days and satellite imagery, where most existing robust algorithms fail to yield meaningful results. This technique serves as the foundation for various real-world applications, including stereo matching for driving and navigation, optical flow for video editing and action recognition, and feature matching for 3D reconstruction.
Previous research developed specialized architectures and model weights for specific correspondence tasks due to significant differences in task settings, as outlined in Table 1. For instance, stereo matching operates on a pair of synchronized, rectified images and identifies correspondences along horizontal epipolar lines. Feature matching focuses on finding reliable correspondences in rigid scenes captured from varying camera poses and at different times. Optical flow estimates pixel-wise displacements in dynamic scenes over consecutive frames. Using task-specific priors to construct models simplifies network design and enhances inference efficiency. However, these individual pipelines inherently limit the adaptability of the algorithms across tasks, resulting in numerous specialized architectures and weights for different scenarios, which complicates real-world deployment.
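To make the relationship between these task settings concrete, the following minimal sketch shows how a stereo disparity map can be read as a special case of a 2D displacement field, which is the representation that optical flow already uses. It is an illustration only, not the PanMatch implementation. The function names are hypothetical, and the sign convention is an assumption: the left image is taken as the reference view and disparity is positive, so a left pixel at column x corresponds to column x - d in the right image. Plain Python lists are used to keep the example dependency-free.

```python
from typing import List, Tuple

# One 2D displacement vector (dx, dy), in pixels.
Displacement = Tuple[float, float]


def disparity_to_displacement(
    disparity: List[List[float]],
) -> List[List[Displacement]]:
    """Express a left-view disparity map as a 2D displacement field.

    Assumption (not taken from the paper): rectified pair, left image as
    reference, positive disparity d, so x_right = x_left - d. Rectification
    places correspondences on the same row, hence dy is always 0.
    """
    return [[(-float(d), 0.0) for d in row] for row in disparity]


def corresponding_point(
    x: int, y: int, field: List[List[Displacement]]
) -> Tuple[float, float]:
    """Return the location in the second image matched to pixel (x, y)."""
    dx, dy = field[y][x]
    return (x + dx, y + dy)


# Hypothetical 1x4 disparity map: only the last pixel has nonzero disparity.
disparity_map = [[0.0, 0.0, 0.0, 2.0]]
field = disparity_to_displacement(disparity_map)
match = corresponding_point(3, 0, field)  # pixel at column 3, row 0
```

Under this view, a stereo pair is a displacement problem whose vertical component happens to be zero, while optical flow and feature matching leave both components unconstrained. A single 2D displacement estimator can therefore, in principle, serve all three settings.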
