Temporally-Grounded Language Generation: A Benchmark for Real-Time Vision-Language Models
