Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
