MARPLE: A Benchmark for Long-Horizon Inference