DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs
–Neural Information Processing Systems
These gains are particularly pronounced on high-resolution tasks, e.g ., 4.2, 11.0, and 4.0 improvements on TextVQA, DocVQA, and InfoVQA compared to LLaV A-1.5-7B, respectively.
Neural Information Processing Systems
Nov-15-2025, 05:21:37 GMT
- Country:
- Asia > China
- North America > United States
- California > Santa Clara County > Palo Alto (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Technology: