 Reinforcement Learning




On Learning Intrinsic Rewards for Policy Gradient Methods

Neural Information Processing Systems

In this paper we build on the Optimal Rewards Framework of Singh et al. [2010], which defines the optimal intrinsic reward function as one that, when used by an RL agent, achieves behavior that optimizes the


A Lyapunov-based Approach to Safe Reinforcement Learning

Neural Information Processing Systems

In many real-world reinforcement learning (RL) problems, besides optimizing the main objective function, an agent must concurrently avoid violating a number of constraints. In particular, besides optimizing performance, it is crucial to guarantee the safety of an agent during training as well as deployment (e.g., a robot should avoid taking actions, exploratory or not, that irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision processes (CMDPs), an extension of standard Markov decision processes (MDPs) augmented with constraints on expected cumulative costs.




Exponentially Weighted Imitation Learning for Batched Historical Data

Neural Information Processing Systems

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings.