Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
