HoliTom: Holistic Token Merging for Fast Video Large Language Models
Neural Information Processing Systems
Video large language models (video LLMs) excel at video comprehension but suffer from significant computational inefficiency caused by redundant video tokens.
Jun-22-2026, 17:07:00 GMT