
Jun-17-2026, 08:12:42 GMT–Neural Information Processing Systems

Scene text retrieval has made significant progress with the assistance of accurate text localization. However, existing approaches typically require costly bounding box annotations for training. Besides, they mostly adopt a customized retrieval strategy but struggle to unify various types of queries to meet diverse retrieval needs. To address these issues, we introduce Multi-query Scene Text retrieval with Attention Recycling (MSTAR), a box-free approach for scene text retrieval. It incorporates progressive vision embedding to dynamically capture the multi-grained representation of texts and harmonizes free-style text queries with style-aware instructions. Additionally, a multi-instance matching module is integrated to enhance vision-language alignment. Furthermore, we build the Multi-Query Text Retrieval (MQTR) dataset, the first benchmark designed to evaluate the multi-query scene text retrieval capability of models, comprising four query types and 16k images. Extensive experiments demonstrate the superiority of our method across seven public datasets and the MQTR dataset.

data mining, machine learning, natural language

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.93)

Technology:
- Information Technology
  - Information Management > Search (0.66)
  - Data Science > Data Mining (0.66)
  - Artificial Intelligence
    - Natural Language (1.00)
    - Machine Learning > Neural Networks (0.93)
    - Vision (0.69)
