Replicable Reinforcement Learning

Eric Eaton, Marcel Hussing, Michael Kearns, Jessica Sorrell

arXiv.org Artificial Intelligence 

The growing prominence of machine learning (ML) and its widespread adoption across industries underscore the need for replicable research [Wagstaff, 2012, Pineau et al., 2021]; many scientific fields have suffered from a similar inability to reproduce the results of published studies [Begley and Ellis, 2012]. Replicability in ML requires not only the ability to reproduce published results [Wagstaff, 2012], as may be partially addressed by sharing code and data [Stodden et al., 2014], but also consistency in the results obtained from successive deployments of an ML algorithm in the same environment. The inherent variability and randomness of ML pose challenges to achieving replicability, as these factors may cause significant variation in results. Building upon the foundations of algorithmic stability [Bousquet and Elisseeff, 2002], recent work in learning theory has established rigorous definitions for provably replicable supervised learning [Impagliazzo et al., 2022] and bandit algorithms [Esfandiari et al., 2023a], meaning that the algorithms produce identical outputs (with high probability) when executed on distinct data samples from the same underlying distribution. However, these results have not been extended to control problems such as reinforcement learning (RL), which have long been known to suffer from stability issues [White and Eldeib, 1994, Mannor et al., 2004, Islam et al., 2017, Henderson et al., 2018]; those issues have already sparked research into robustness for control problems including RL [Khalil et al., 1996, Nilim and Ghaoui, 2005, Iyengar, 2005]. Non-deterministic environments and evaluation benchmarks, the randomness of the exploration process, and the sequential interaction of an RL agent with its environment all complicate the effort to make RL replicable.
Our work is orthogonal to the robustness literature: our goal is not to reduce the effect of these inherent characteristics, such as by decreasing the amount of exploration an agent performs, but to develop replicable RL algorithms that accommodate them.
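The notion of replicability referenced above, that an algorithm returns the identical output (with high probability) on two independent samples from the same distribution, can be illustrated with a toy example outside the paper itself: replicable mean estimation via randomized rounding, in the spirit of the statistical-query constructions of Impagliazzo et al. [2022]. The function name, grid width `alpha`, and seed below are illustrative assumptions, not artifacts of this work; the idea is that both runs share internal randomness (the grid offset), so nearby empirical means snap to the same grid point.

```python
import random

def replicable_mean(samples, alpha=0.2, shared_seed=0):
    """Toy sketch of a replicable estimator: round the empirical mean
    to a grid of width alpha whose offset is drawn from *shared*
    internal randomness. Two runs with the same shared_seed, on
    independent samples from the same distribution, return the exact
    same value with high probability (whenever both empirical means
    land in the same grid cell)."""
    # Shared internal randomness: identical across runs with the same seed.
    offset = random.Random(shared_seed).uniform(0.0, alpha)
    empirical = sum(samples) / len(samples)
    # Snap the empirical mean to the nearest point of the offset grid.
    return offset + alpha * round((empirical - offset) / alpha)

# Two independent samples from the same distribution:
rng = random.Random(1)
s1 = [rng.gauss(0.5, 0.01) for _ in range(1000)]
s2 = [rng.gauss(0.5, 0.01) for _ in range(1000)]
# With high probability the two outputs are identical, not merely close.
print(replicable_mean(s1, shared_seed=42) == replicable_mean(s2, shared_seed=42))
```

The shared seed is the crux: without the common random offset, deterministic rounding would fail exactly when the true mean sits near a grid boundary, whereas randomizing the grid makes that bad event unlikely for every distribution. Extending such guarantees from one-shot estimation to the sequential, exploratory setting of RL is precisely what the stated complications make difficult.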
