Perception Test: A Diagnostic Benchmark for Multimodal Video Models

Neural Information Processing Systems 

We propose a novel multimodal video benchmark, the Perception Test, to evaluate the perception and reasoning skills of pre-trained multimodal models (e.g.