RLiable: towards reliable evaluation and reporting in reinforcement learning


Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville and Marc G. Bellemare won an outstanding paper award at NeurIPS 2021 for their paper Deep Reinforcement Learning at the Edge of the Statistical Precipice. In this blog post, Rishabh Agarwal and Pablo Samuel Castro explain this work.

Reinforcement learning (RL) is an area of machine learning that focuses on learning from experience to solve decision-making tasks. The field has made great progress, with impressive empirical results on complex tasks such as playing video games, flying stratospheric balloons and designing hardware chips. However, it is becoming increasingly apparent that the current standards for empirical evaluation might give a false sense of fast scientific progress while actually slowing it down. To that end, in "Deep Reinforcement Learning at the Edge of the Statistical Precipice", given as an oral presentation at NeurIPS 2021, we discuss how the statistical uncertainty of results needs to be considered, especially when only a few training runs are used, in order for evaluation in deep RL to be reliable.
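To make the point about few training runs concrete, here is a minimal sketch of how one might quantify the uncertainty of a mean score with a generic percentile bootstrap. It is an illustration under stated assumptions, not the procedure from the paper: the function name `bootstrap_mean_ci` is hypothetical, and the scores are made-up placeholder numbers rather than real results.

```python
import random


def bootstrap_mean_ci(scores, num_resamples=2000, confidence=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `scores`.

    Hypothetical helper for illustration only; `scores` holds one final
    score per independent training run.
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample the runs with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(num_resamples)
    )
    # Read off the lower and upper percentiles of the resampled means.
    alpha = (1.0 - confidence) / 2.0
    lower_idx = int(alpha * num_resamples)
    upper_idx = int((1.0 - alpha) * num_resamples) - 1
    return means[lower_idx], means[upper_idx]


# Made-up placeholder scores, not results from any experiment.
few_runs = [0.42, 0.55, 0.31]
# The same values repeated, standing in for a larger number of runs.
many_runs = few_runs * 10

few_low, few_high = bootstrap_mean_ci(few_runs)
many_low, many_high = bootstrap_mean_ci(many_runs)

print(f"3 runs:  mean={sum(few_runs) / len(few_runs):.3f}, "
      f"95% CI=({few_low:.3f}, {few_high:.3f})")
print(f"30 runs: mean={sum(many_runs) / len(many_runs):.3f}, "
      f"95% CI=({many_low:.3f}, {many_high:.3f})")
```

Both samples have the same mean, but the interval computed from three runs is much wider than the one computed from thirty. A point estimate reported without such an interval hides exactly this difference.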

