Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs