Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models