Goto

Collaborating Authors

 Reinforcement Learning


Reinforcement Learning with Convex Constraints

Neural Information Processing Systems

In standard reinforcement learning (RL), a learning agent seeks to optimize the overall reward. However, many key aspects of a desired behavior are more naturally expressed as constraints.









Temporal Regularization for Markov Decision Process

Neural Information Processing Systems

Yetinreinforcementlearning,duetothenatureofthe Bellman equation, there isanopportunity toalsoexploit temporal regularization based on smoothness in value estimates over trajectories. This paper explores a class of methods for temporal regularization.


Exponentially Weighted Imitation Learning for Batched Historical Data

Neural Information Processing Systems

We consider deep policy learning with only batched historical trajectories. The main challenge of this problem is that the learner no longer has a simulator or "environment oracle" as in most reinforcement learning settings.