SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodal LLMs

Neural Information Processing Systems 

Multimodal Large Language Models (MLLMs) typically process a large number of visual tokens, many of which are redundant, leading to considerable computational overhead. Existing visual token pruning methods primarily select the most salient tokens based on attention scores, which leaves the selected token set semantically incomplete.
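
To make the saliency-only baseline described above concrete, the following is a minimal sketch of attention-based top-k pruning. It is not the SCOPE method or any specific published implementation. The function name `prune_by_saliency`, the `keep` parameter, and the score values are hypothetical, chosen only for illustration.

```python
def prune_by_saliency(attention_scores, keep):
    """Return indices of the `keep` highest-scoring tokens, in original order.

    Illustrative saliency-only baseline: tokens are ranked purely by their
    attention score, with no regard for what the kept set covers.
    """
    if not 0 <= keep <= len(attention_scores):
        raise ValueError("keep must be between 0 and the number of tokens")
    # Rank token indices from highest to lowest score.
    ranked = sorted(range(len(attention_scores)),
                    key=lambda i: attention_scores[i], reverse=True)
    # Keep the top `keep` indices and restore their original order.
    return sorted(ranked[:keep])


# Hypothetical scores for 8 visual tokens (assumed values, not measured data):
# tokens 0-2 lie on one highly attended object, tokens 3-7 on the rest of the image.
scores = [0.30, 0.28, 0.25, 0.05, 0.04, 0.03, 0.03, 0.02]
kept = prune_by_saliency(scores, keep=3)
print(kept)  # [0, 1, 2]
```

In this toy case every retained token comes from the same highly attended region, so the remaining image content is not represented at all. This is one way the semantic incompleteness noted above can arise when selection depends on saliency alone.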