idea of the MAGICAL benchmark ("working on better IL benchmarks is a great idea", "sorely needed", "a lot of 2