MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations

May-27-2025, 12:36:51 GMT–Neural Information Processing Systems

Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLONGBENCH-DOC, a long-context, multi-modal benchmark comprising 1,082 expert-annotated questions. Distinct from previous datasets, it is constructed upon 135 lengthy PDF-formatted documents with an average of 47.5 pages and 21,214 textual tokens.

lvlm, mmlongbench-doc, visualization

Conferences Web Page

Technology:
- Information Technology > Artificial Intelligence > Natural Language (0.90)