AITopics | magical benchmark

d464b5ac99e74462f321c06ccacc4bff-AuthorFeedback.pdf

Neural Information Processing Systems, Feb-10-2026, 13:44:25 GMT

The MAGICAL Benchmark for Robust Imitation

Neural Information Processing Systems, Dec-24-2025, 16:42:57 GMT

Imitation Learning (IL) algorithms are typically evaluated in the same environment that was used to create demonstrations. This rewards precise reproduction of demonstrations in one particular environment, but provides little information about how robustly an algorithm can generalise the demonstrator's intent to substantially different deployment settings. This paper presents the MAGICAL benchmark suite, which permits systematic evaluation of generalisation by quantifying robustness to different kinds of distribution shift that an IL algorithm is likely to encounter in practice. Using the MAGICAL suite, we confirm that existing IL algorithms overfit significantly to the context in which demonstrations are provided. We also show that standard methods for reducing overfitting are effective at creating narrow perceptual invariances, but are not sufficient to enable transfer to contexts that require substantially different behaviour, which suggests that new approaches will be needed in order to robustly generalise demonstrator intent. Code and data for the MAGICAL suite are available at https://github.com/qxcv/magical/

idea of the MAGICAL benchmark ("working on better IL benchmarks is a great idea", "sorely needed", "a lot of

Neural Information Processing Systems, Aug-16-2025, 15:03:11 GMT

We're glad that R3 likes the idea of the paper and believes that the design process and methods are sound.

Review for NeurIPS paper: The MAGICAL Benchmark for Robust Imitation

Neural Information Processing Systems, Feb-6-2025, 16:52:49 GMT

It is not clear to me whether the proposed benchmarks are evaluating imitation learning (IL) or robust imitation learning (robust IL). The difference is that standard IL assumes the expert data is obtained from an MDP with exactly the same dynamics as the test MDP. Robust IL assumes that we will get a perturbed MDP at test time (where the definition of the perturbation changes depending on the meaning of "robust"). Currently, the paper seems to argue that it is testing imitation learning but is actually testing robust imitation learning. This has consequences in the experiments section.
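
The distinction drawn in this review can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in chosen for illustration: the `ChainMDP` corridor, the tabular cloning in `clone`, and the choice of a shifted start position as the "perturbation". None of it comes from the MAGICAL suite or from the paper's experiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainMDP:
    """Hypothetical 1-D corridor: start at `start`, succeed on reaching `goal`."""
    length: int
    start: int
    goal: int

    def step(self, state: int, action: int) -> int:
        # action is -1 (left) or +1 (right); the walls clamp the position
        return max(0, min(self.length - 1, state + action))


def expert_action(env: ChainMDP, state: int) -> int:
    """The demonstrator's intent: always move towards the goal."""
    return 1 if env.goal > state else -1


def collect_demo(env: ChainMDP) -> list:
    """Roll out the expert and record (state, action) pairs."""
    demo, state = [], env.start
    while state != env.goal:
        action = expert_action(env, state)
        demo.append((state, action))
        state = env.step(state, action)
    return demo


def clone(demo: list):
    """Tabular behavioural cloning: memorise the demonstrated action per state."""
    table = dict(demo)
    # Assumption: states never seen in the demo fall back to +1 (move right).
    return lambda state: table.get(state, 1)


def succeeds(env: ChainMDP, policy, max_steps: int = 50) -> bool:
    """Run the policy and report whether it reaches the goal in time."""
    state = env.start
    for _ in range(max_steps):
        if state == env.goal:
            return True
        state = env.step(state, policy(state))
    return state == env.goal


# Standard IL evaluation: test in the environment that produced the demos.
demo_env = ChainMDP(length=10, start=2, goal=7)
policy = clone(collect_demo(demo_env))
same_env_success = succeeds(demo_env, policy)

# Robust IL evaluation: same goal, but a start position the demos never covered.
shifted_env = ChainMDP(length=10, start=9, goal=7)
shifted_success = succeeds(shifted_env, policy)

print("same environment:", same_env_success)
print("shifted start:   ", shifted_success)
```

In this sketch the cloned policy reaches the goal in the demonstration environment but not from the shifted start, because it has memorised actions rather than the intent of moving towards the goal. Which of the two evaluations a benchmark reports is the question the review raises.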
