SPOT! Revisiting Video-Language Models for Event Understanding
