A Benchmark for Modeling Violation-of-Expectation in Physical Reasoning Across Event Categories
