Exploring the Role of Explicit Temporal Modeling in Multimodal Large Language Models for Video Understanding
