Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings

Jun-14-2026, 07:41:45 GMT–Neural Information Processing Systems

In this paper, we study the visual redundancy problem of multimodal large language models (MLLMs) from the perspective of attention behaviors. Through extensive empirical experiments, we identify three main inference stages in MLLMs: (i) Early fusion between tokens is first accomplished quickly.
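The abstract frames the analysis in terms of attention behaviors but does not say how those behaviors are measured. As a hypothetical illustration only, and not the authors' procedure, one simple way to probe cross-modal fusion is to track, layer by layer, how much attention mass text-token queries place on visual-token keys. The sketch assumes the attention weights have already been extracted as a NumPy array of shape (layers, heads, sequence, sequence) whose query rows each sum to 1; the function name `visual_attention_share` and its index arguments are our own.

```python
import numpy as np


def visual_attention_share(attn, visual_idx, text_idx):
    """Average fraction of attention mass that text queries put on visual keys.

    Hypothetical analysis helper (not from the paper).
    attn: array of shape (num_layers, num_heads, seq_len, seq_len) holding
          attention weights whose rows (queries) each sum to 1.
    visual_idx, text_idx: positions of visual and text tokens in the sequence.
    Returns an array of shape (num_layers,).
    """
    attn = np.asarray(attn, dtype=float)
    # Keep only text-token query rows, then only visual-token key columns.
    text_rows = attn[:, :, text_idx, :]          # (L, H, n_text, S)
    to_visual = text_rows[:, :, :, visual_idx]   # (L, H, n_text, n_visual)
    # Mass each text query assigns to all visual tokens combined.
    per_query = to_visual.sum(axis=-1)           # (L, H, n_text)
    # Average over heads and text queries, leaving one value per layer.
    return per_query.mean(axis=(1, 2))


# Toy example: 2 layers, 1 head, 4 tokens (positions 0-1 visual, 2-3 text).
attn = np.zeros((2, 1, 4, 4))
attn[0, 0, :, 0] = 1.0   # layer 0: every query attends only to token 0 (visual)
attn[1, 0, :, 3] = 1.0   # layer 1: every query attends only to token 3 (text)
print(visual_attention_share(attn, visual_idx=[0, 1], text_idx=[2, 3]))  # prints [1. 0.]
```

Under this hypothetical metric, a sharp and lasting drop in the per-layer share would be one signal that text tokens have largely stopped consulting the image, which is the kind of layer-wise pattern an attention-based redundancy analysis looks for.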
