VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
