ETO: Efficient Transformer-based Local Feature Matching by Organizing Multiple Homography Hypotheses

Junjie Ni, Guofeng Zhang, Guanglin Li, Yijin Li

Neural Information Processing Systems 

Recent developments have led to the emergence of transformer-based approaches for local feature matching, resulting in more accurate matches. However, transformer-based feature enhancement takes excessively long, which limits the practical application of these approaches. In this paper, we propose methods to reduce the computational load of transformers during both the coarse matching and refinement stages. During the coarse matching stage, we organize multiple homography hypotheses to approximate continuous matches. Each hypothesis encompasses several features to be matched, significantly reducing the number of features that require enhancement via transformers. In the refinement stage, we reduce the bidirectional self-attention and cross-attention mechanisms to unidirectional cross-attention, thereby substantially decreasing the cost of computation. Overall, our method is at least 4 times faster than other transformer-based feature matching algorithms. Comprehensive evaluations on open datasets such as MegaDepth, YFCC100M, ScanNet, and HPatches demonstrate our method's efficacy, highlighting its potential to significantly enhance a wide array of downstream applications.
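As a rough illustration of why a single homography hypothesis can stand in for several feature matches, the following minimal sketch maps every pixel of one patch through one 3x3 homography. It is not the authors' implementation: the function names `apply_homography` and `patch_matches`, the patch size, and the example matrix are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Homography = Sequence[Sequence[float]]  # 3x3 matrix, row-major


def apply_homography(H: Homography, p: Point) -> Point:
    """Map a 2D point through a 3x3 homography via homogeneous coordinates."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


def patch_matches(
    H: Homography, top_left: Tuple[int, int], patch_size: int
) -> List[Tuple[Point, Point]]:
    """Return the matches implied by one hypothesis for a square patch.

    Hypothetical helper: one hypothesis (H) yields patch_size**2 matches,
    so only one unit per patch would need transformer enhancement.
    """
    x0, y0 = top_left
    matches = []
    for dy in range(patch_size):
        for dx in range(patch_size):
            src = (float(x0 + dx), float(y0 + dy))
            matches.append((src, apply_homography(H, src)))
    return matches


# Assumed example hypothesis: a pure translation by (5, -2).
H_shift = [[1.0, 0.0, 5.0], [0.0, 1.0, -2.0], [0.0, 0.0, 1.0]]
matches = patch_matches(H_shift, top_left=(8, 8), patch_size=4)
```

Here one hypothesis on a 4x4 patch produces 16 point matches. A general homography with a non-trivial last row would give a locally varying mapping across the patch, which is one way to read "approximating continuous matches"; how the hypotheses are actually predicted and organized is described in the paper itself, not in this sketch.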