Real-Time Reinforcement Learning

Nov-14-2019–arXiv.org Machine Learning

Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongly assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world, safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework in which states and actions evolve simultaneously, and we show how it is related to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor-Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm, Soft Actor-Critic, in both real-time and non-real-time settings. Code and videos can be found at https://github.com/rmst/rtrl.
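
To make the idea of states and actions evolving simultaneously more concrete, here is a minimal sketch of one way to emulate it on top of an ordinary turn-based environment. It assumes that an action selected during one time step only takes effect in the next step, and that the agent observes the pair (state, pending action). The names `CounterEnv` and `RealTimeWrapper` are hypothetical and written for this illustration; they are not taken from the authors' code at the repository above.

```python
from typing import Tuple


class CounterEnv:
    """Hypothetical toy environment: the state is an integer and the
    action is added to it. Reward is higher the closer the state is to 3."""

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float]:
        self.state += action
        reward = -float(abs(self.state - 3))
        return self.state, reward


class RealTimeWrapper:
    """Delays every action by one step (an assumption of this sketch).

    While the agent is computing its next action, the environment advances
    using the action chosen one step earlier. The observation is augmented
    with that pending action so the agent knows what is already in flight.
    """

    def __init__(self, env: CounterEnv, default_action: int) -> None:
        self.env = env
        self.default_action = default_action
        self.pending = default_action

    def reset(self) -> Tuple[int, int]:
        # No action has been selected yet, so a default one is in flight.
        self.pending = self.default_action
        state = self.env.reset()
        return state, self.pending

    def step(self, new_action: int) -> Tuple[Tuple[int, int], float]:
        # The environment moves with the previously selected action ...
        state, reward = self.env.step(self.pending)
        # ... and the newly selected action becomes the pending one.
        self.pending = new_action
        return (state, self.pending), reward


if __name__ == "__main__":
    env = RealTimeWrapper(CounterEnv(), default_action=0)
    print(env.reset())   # initial (state, pending action)
    print(env.step(5))   # the 5 is not applied yet
    print(env.step(1))   # now the 5 is applied, the 1 is pending
```

In this sketch the reward returned by `step` reflects the action chosen one step earlier, which is the kind of mismatch that a method designed for the turn-based setting does not account for.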

Country:
- North America > Canada
  - Quebec > Montreal (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.51)

Industry:
- Leisure & Entertainment > Games (0.93)

Technology:
- Information Technology
  - Architecture > Real Time Systems (1.00)
  - Artificial Intelligence > Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks (0.68)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.36)
