Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models
