 Reinforcement Learning



A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

Neural Information Processing Systems

The proximal policy optimization (PPO) algorithm is one of the most widely used methods in reinforcement learning (RL). Despite its empirical success, the theoretical understanding of PPO remains limited. Specifically, it is unclear whether PPO or its optimistic variants can effectively solve linear Markov decision processes (MDPs), which are arguably the simplest models in RL with function approximation.






Successor-Predecessor Intrinsic Exploration

Changmin Yu, Neil Burgess

Neural Information Processing Systems

Exploration is essential in reinforcement learning, particularly in environments where external rewards are sparse. Here we focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.