Accelerating Multimodal Large Language Models via Dynamic Visual-Token Exit and the Empirical Findings
