TextMamba: Scene Text Detector with Mamba
Zhao, Qiyan, Yan, Yue, Wang, Da-Han
–arXiv.org Artificial Intelligence
In scene text detection, Transformer-based methods have addressed the global feature extraction limitations inherent in traditional convolutional neural network-based methods. However, most directly rely on native Transformer attention layers as encoders without evaluating their cross-domain limitations and inherent shortcomings: forgetting important information or focusing on irrelevant representations when modeling long-range dependencies for text detection. The recently proposed state space model Mamba has demonstrated better long-range dependency modeling through a linear-complexity selection mechanism. Therefore, we propose a novel scene text detector based on Mamba that integrates the selection mechanism with attention layers, enhancing the encoder's ability to extract relevant information from long sequences. We adopt the Top_k algorithm to explicitly select key information and reduce the interference of irrelevant information in Mamba modeling. Additionally, we design a dual-scale feed-forward network and an embedding pyramid enhancement module to facilitate high-dimensional hidden state interactions and multi-scale feature fusion. Our method achieves state-of-the-art or competitive performance on various benchmarks, with F-measures of 89.7%, 89.2%, and 78.5% on CTW1500, TotalText, and ICDAR19ArT, respectively. Code will be available.
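The abstract does not give implementation details of the Top_k selection, but the general idea of keeping only the k highest-scoring positions and suppressing the rest before normalization can be sketched as follows. This is a minimal illustrative sketch with numpy, not the paper's code; the function name and shapes are assumptions.

```python
import numpy as np

def topk_select(scores, k):
    """Hypothetical sketch of top-k selection: keep only the k largest
    scores per row and mask the rest to -inf before softmax, so
    irrelevant positions receive (near-)zero weight."""
    # indices of the k largest entries along the last axis
    idx = np.argpartition(scores, -k, axis=-1)[..., -k:]
    # start from -inf everywhere, then restore the top-k scores
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, idx,
                      np.take_along_axis(scores, idx, axis=-1), axis=-1)
    # numerically stable softmax over the masked scores
    exp = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return exp / exp.sum(axis=-1, keepdims=True)

scores = np.array([[3.0, 1.0, 0.2, 2.5]])
weights = topk_select(scores, k=2)
# only the two highest-scoring positions (0 and 3) keep nonzero weight
```

In a detector encoder, a mask of this kind would be applied to attention or selection scores per query, so that each position attends only to its k most relevant counterparts.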
Dec-9-2025