Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

Lingwei Zhu, Takamitsu Matsubara

arXiv.org Artificial Intelligence 

Reinforcement Learning (RL) (Sutton and Barto 2018) has recently achieved impressive successes in fields such as robotic manipulation (OpenAI 2019), video game playing (Mnih et al. 2015), and the game of Go (Silver et al. 2016). However, compared with supervised learning, which has a wide range of practical applications, RL applications have primarily been limited to casual game playing or laboratory-based robotics. A crucial reason for limiting applications to these environments is that it is not guaranteed that the …

A significant factor causing the complexity might be its excessive generality (Kakade and Langford 2002; Pirotta et al. 2013); those bounds do not focus on any particular class of value-based RL algorithms. In this paper, in order to develop more tractable bounds, we focus on an RL class known as entropy-regularized value-based methods (Azar, Gómez, and Kappen 2012; Fox, Pakman, and Tishby 2016; Haarnoja et al. 2017, 2018), where the entropies of policies are introduced …
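For context, a standard formulation of the entropy-regularized objective used by this family of methods is sketched below; the choice of Shannon entropy as the regularizer and the temperature $\tau$ are assumptions of this sketch, not details taken from the excerpt above. The return is augmented with the entropy of the policy:

$$ V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t,a_t) + \tau\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\Big) \,\Big|\, s_0 = s\right], $$

and the optimal (soft) value function and policy then take the log-sum-exp / softmax form

$$ V^{*}(s) \;=\; \tau \log \sum_{a} \exp\!\big(Q^{*}(s,a)/\tau\big), \qquad \pi^{*}(a\mid s) \;=\; \frac{\exp\!\big(Q^{*}(s,a)/\tau\big)}{\sum_{a'} \exp\!\big(Q^{*}(s,a')/\tau\big)}. $$

Here $\tau > 0$ weights the entropy bonus; related methods in the cited line of work (e.g., Azar, Gómez, and Kappen 2012; Fox, Pakman, and Tishby 2016) instead regularize with a KL divergence to the previous policy, which yields analogous soft backups.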
