Don't Just Chase " Highlighted Tokens " in MLLMs: Revisiting Visual Holistic Context Retention

Jun-16-2026, 09:37:29 GMT–Neural Information Processing Systems

Despite their powerful capabilities, Multimodal Large Language Models (MLLMs) suffer from considerable computational overhead due to their reliance on massive visual tokens. Recent studies have explored token pruning to alleviate this problem, which typically uses text-vision cross-attention or [CLS] attention to assess and discard redundant visual tokens. In this work, we identify a critical limitation of such attention-first pruning approaches, i.e., they tend to preserve semantically similar tokens, resulting in pronounced performance drops under high pruning ratios. To this end, we propose HoloV, a simple yet effective, plug-and-play visual token pruning framework for efficient inference.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

Neural Information Processing Systems

Jun-16-2026, 09:37:29 GMT

Conferences PDF

Add feedback

Country:
- Asia > China (0.46)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Information Technology (0.46)
- Health & Medicine (0.46)
- Education (0.45)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found