A Control-Centric Benchmark for Video Prediction