Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

Lingwei Zhu, Takamitsu Matsubara

arXiv.org Artificial Intelligence 

Reinforcement Learning (RL) (Sutton and Barto 2018) has recently achieved impressive successes in fields such as robotic manipulation (OpenAI 2019), video game playing (Mnih et al. 2015), and the game of Go (Silver et al. 2016). However, compared with supervised learning, which has a wide range of practical applications, RL applications have primarily been limited to casual game playing or laboratory-based robotics. A crucial reason for limiting applications to these environments is that it is not guaranteed that the …

A significant factor causing the complexity might be its excessive generality (Kakade and Langford 2002; Pirotta et al. 2013); those bounds do not focus on any particular class of value-based RL algorithms. In this paper, in order to develop more tractable bounds, we focus on an RL class known as entropy-regularized value-based methods (Azar, Gómez, and Kappen 2012; Fox, Pakman, and Tishby 2016; Haarnoja et al. 2017, 2018), where the entropies of policies are introduced …
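For context, a standard formulation of the entropy-regularized objective used by this family of methods is sketched below; the choice of Shannon entropy as the regularizer and the temperature $\tau$ are assumptions of this sketch, not details taken from the excerpt above. The return is augmented with the entropy of the policy:

$$ V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\Big(r(s_t,a_t) + \tau\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big)\Big) \,\Big|\, s_0 = s\right], $$

and the optimal (soft) value function and policy then take the log-sum-exp / softmax form

$$ V^{*}(s) \;=\; \tau \log \sum_{a} \exp\!\big(Q^{*}(s,a)/\tau\big), \qquad \pi^{*}(a\mid s) \;=\; \frac{\exp\!\big(Q^{*}(s,a)/\tau\big)}{\sum_{a'} \exp\!\big(Q^{*}(s,a')/\tau\big)}. $$

Here $\tau > 0$ weights the entropy bonus; related methods in the cited line of work (e.g., Azar, Gómez, and Kappen 2012; Fox, Pakman, and Tishby 2016) instead regularize with a KL divergence to the previous policy, which yields analogous soft backups.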
