Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Cao, Haoyu, Bao, Changcun, Liu, Chaohu, Chen, Huang, Yin, Kun, Liu, Hao, Liu, Yinsong, Jiang, Deqiang, Sun, Xing
arXiv.org Artificial Intelligence
We propose SeRum (SElective Region Understanding Model), a novel end-to-end document understanding model for extracting meaningful information from document images, with applications including document analysis, retrieval, and office automation. Unlike state-of-the-art approaches that rely on computationally expensive multi-stage technical schemes, SeRum converts document image understanding and recognition tasks into a local decoding process over the visual tokens of interest, using a content-aware token merge module. This mechanism lets the model pay more attention to the regions of interest generated by the query decoder, which improves its effectiveness and speeds up decoding in the generative scheme. We also design several pre-training tasks to enhance the model's understanding and local awareness. Experimental results demonstrate that SeRum achieves state-of-the-art performance on document understanding tasks and competitive results on text spotting tasks. SeRum represents a substantial advance towards efficient and effective end-to-end document understanding.
Sep-3-2023
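The abstract names a content-aware token merge module but does not spell out its merge rule, so the following is only a minimal sketch of the general idea, not SeRum's implementation. It assumes each visual token comes with a positive relevance score (for example, an attention weight from a query decoder), keeps the highest-scoring tokens, and folds every other token into its most similar kept token by score-weighted averaging. The names `merge_tokens`, `cosine`, `scores`, and `keep` are hypothetical and chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def merge_tokens(tokens, scores, keep):
    """Illustrative token merge (an assumption, not the paper's exact rule).

    tokens: list of feature vectors, one per visual token
    scores: positive relevance score per token (hypothetical input)
    keep:   number of tokens to retain
    Returns (kept_indices, merged_tokens).
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0 < keep <= len(tokens):
        raise ValueError("keep must be between 1 and len(tokens)")
    if any(s <= 0 for s in scores):
        raise ValueError("scores must be strictly positive")

    # Rank tokens by score; the top `keep` are retained, the rest are merged.
    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(order[:keep])  # restore original (spatial) order
    dropped = order[keep:]

    # Running score-weighted sums and total weights for each kept token.
    sums = {i: [scores[i] * x for x in tokens[i]] for i in kept}
    weights = {i: scores[i] for i in kept}

    for j in dropped:
        # Fold each dropped token into the kept token it resembles most.
        target = max(kept, key=lambda i: cosine(tokens[i], tokens[j]))
        sums[target] = [s + scores[j] * x for s, x in zip(sums[target], tokens[j])]
        weights[target] += scores[j]

    merged = [[s / weights[i] for s in sums[i]] for i in kept]
    return kept, merged
```

Under these assumptions, a decoder would attend only to the `keep` merged tokens rather than the full token grid, which is one way a selective scheme could shorten decoding; how SeRum actually scores and merges tokens is described in the paper itself.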
- Country:
- Oceania > Australia
- New South Wales > Sydney (0.04)
- North America
- United States
- Washington > King County
- Seattle (0.14)
- New York > New York County
- New York City (0.04)
- Nevada > Clark County
- Las Vegas (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- California > Los Angeles County
- Long Beach (0.04)
- Canada > Quebec
- Montreal (0.04)
- Europe
- Germany (0.04)
- Switzerland > Vaud
- Lausanne (0.04)
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Portugal > Lisbon
- Lisbon (0.04)
- Middle East > Malta
- Port Region > Southern Harbour District > Valletta (0.04)
- Italy > Lombardy
- Milan (0.04)
- Belgium > Brussels-Capital Region
- Brussels (0.04)
- Asia
- China (0.04)
- South Korea > Seoul
- Seoul (0.04)
- Middle East > Israel
- Tel Aviv District > Tel Aviv (0.04)
- Africa > Central African Republic
- Ombella-M'Poko > Bimbo (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Technology: