Slot-VLM: Object-Event Slots for Video-Language Modeling

Neural Information Processing Systems 

Video-Language Models (VLMs), powered by advances in Large Language Models (LLMs), are charting new frontiers in video understanding.