Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
Neural Information Processing Systems
In this paper, we study the visual redundancy problem of multimodal large language models (MLLMs) from the perspective of attention behaviors. Through extensive experiments, we identify three main inference stages of MLLMs: (i) Early fusion between tokens is first accomplished quickly.
Jun-14-2026, 07:41:45 GMT