Real-Time Reinforcement Learning

Neural Information Processing Systems

Markov Decision Processes (MDPs), the mathematical framework underlying most algorithms in Reinforcement Learning (RL), are often used in a way that wrongly assumes that the state of an agent's environment does not change during action selection. As RL systems based on MDPs begin to find application in real-world, safety-critical situations, this mismatch between the assumptions underlying classical MDPs and the reality of real-time computation may lead to undesirable outcomes. In this paper, we introduce a new framework in which states and actions evolve simultaneously, and show how it is related to the classical MDP formulation. We analyze existing algorithms under the new real-time formulation and show why they are suboptimal when used in real time. We then use those insights to create a new algorithm, Real-Time Actor Critic (RTAC), that outperforms the existing state-of-the-art continuous control algorithm Soft Actor Critic (SAC) in both real-time and non-real-time settings.
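To make the abstract's core idea concrete, here is a minimal Python sketch. It is our own toy illustration, not the authors' code and not RTAC. It assumes a hypothetical scalar environment with dynamics s' = s + a and reward -|s'|, and a hypothetical `RealTimeWrapper` class: while the environment advances under the action already in flight, the agent selects its next action, which only takes effect on the following step. The agent therefore observes the pair (state, action currently being applied).

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Obs = Tuple[float, float]  # (state, action currently being applied)


def toy_step(state: float, action: float) -> Tuple[float, float]:
    """Hypothetical turn-based dynamics: s' = s + a, reward = -|s'|."""
    next_state = state + action
    return next_state, -abs(next_state)


@dataclass
class RealTimeWrapper:
    """Toy real-time view of a turn-based environment (illustrative only)."""
    step_fn: Callable[[float, float], Tuple[float, float]]
    state: float
    pending_action: float = 0.0  # action already in flight at time t

    def observe(self) -> Obs:
        return (self.state, self.pending_action)

    def step(self, new_action: float) -> Tuple[Obs, float]:
        # The environment advances under the action selected on the previous step...
        next_state, reward = self.step_fn(self.state, self.pending_action)
        self.state = next_state
        # ...while the action chosen now is only applied on the next step.
        self.pending_action = new_action
        return self.observe(), reward


def run(policy: Callable[[Obs], float], steps: int = 6, start: float = 1.0) -> float:
    """Total reward collected by `policy` in the wrapped toy environment."""
    env = RealTimeWrapper(step_fn=toy_step, state=start)
    total = 0.0
    for _ in range(steps):
        _, reward = env.step(policy(env.observe()))
        total += reward
    return total


# Ignores the in-flight action, as a policy designed for a turn-based MDP would.
def naive(obs: Obs) -> float:
    return -obs[0]


# Predicts where the in-flight action will take the state and acts on that.
def aware(obs: Obs) -> float:
    state, in_flight = obs
    return -(state + in_flight)


print(run(naive), run(aware))  # -4.0 -1.0
```

With these toy dynamics, the policy a = -s, which zeroes the state in one step in a turn-based setting, oscillates once its actions arrive one step late (total reward -4.0 over six steps), whereas a policy that also uses the in-flight action settles after one step (total -1.0). This only illustrates why ignoring the in-flight action can be suboptimal; it does not reproduce the paper's analysis.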



Reviews: Real-Time Reinforcement Learning

Positive:
- Overall, I feel that the paper provides an interesting contribution that may help to work toward applying RL to real-world problems where an agent interacts with the physical world, e.g. in robots.

Negative:
- One problem I see with the paper is that it is unclear at this point whether this line of work is necessary, because with increased computing power on embedded devices such as robots, the inference time of most methods turns out to be negligible in practice (millisecond range or faster). I feel that this point might be alleviated by providing a series of experiments (e.g. in the driving experiment proposed in the paper) where the agent is assumed to be super fast, very fast, fast, not fast, or really slow, and showing how that impacts the performance of the SAC method.
- Maybe just referring to the figure inline here would already make this much clearer and prepare the reader better for the rest of the paper. Maybe stick with a?
- Lines 69ff: t_\pi is not defined (I read it as the time it takes to evaluate the policy).
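The reviewer's suggested sweep over agent speeds can be sketched as a toy experiment. Everything here is a hypothetical assumption rather than the paper's setup: the 1-D task s' = s + a, a proportional controller a = -0.5 s, an environment step `dt`, and the simplification that an action computed in time t_\pi reaches the environment ceil(t_\pi / dt) steps later. It does not reproduce the driving experiment or SAC.

```python
import math
from collections import deque


def action_delay_steps(t_pi: float, dt: float) -> int:
    """Hypothetical model: whole environment steps that pass while the policy is evaluated."""
    return math.ceil(t_pi / dt)


def rollout_cost(delay: int, steps: int = 20, gain: float = 0.5, start: float = 1.0) -> float:
    """Accumulated |state| on the toy task s' = s + a when actions arrive `delay` steps late."""
    state = start
    # Actions chosen but not yet applied; zero actions fill the pipeline at the start.
    in_flight = deque([0.0] * delay)
    cost = 0.0
    for _ in range(steps):
        in_flight.append(-gain * state)  # proportional controller acting on the current state
        state += in_flight.popleft()     # the oldest pending action reaches the environment
        cost += abs(state)
    return cost


dt = 0.05  # assumed environment step of 50 ms
for t_pi in (0.0, 0.01, 0.06):  # "infinitely fast", fast, and slow agents (hypothetical values)
    d = action_delay_steps(t_pi, dt)
    print(f"t_pi={t_pi:.2f}s  delay={d}  cost={rollout_cost(d):.3f}")
```

Printing the cost for each inference time shows how a fixed controller degrades as its actions arrive later; the same loop could wrap a learned policy in place of the proportional controller.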

This paper received two positive reviews and one negative review, and the negative one was very short and non-specific, so normally it would be an accept (which is what I will ultimately recommend). A weakness of the paper not noted by the reviewers is that the authors are apparently unaware of the related paper by Travnik et al. (see below). The Travnik paper slightly reduces the novelty of the current work but adds to the recognition of the importance of real-time issues. On balance, I don't think that the existence of this prior work diminishes the case for publication of this paper.
