Goto

Collaborating Authors

 linear function approximation


A Theoretical Analysis of Optimistic Proximal Policy Optimization in Linear Markov Decision Processes

Neural Information Processing Systems

The proximal policy optimization (PPO) algorithm stands as one of the most prosperous methods in the field of reinforcement learning (RL). Despite its success, the theoretical understanding of PPO remains deficient. Specifically, it is unclear whether PPO or its optimistic variants can effectively solve linear Markov decision processes (MDPs), which are arguably the simplest models in RL with function approximation.


Dynamic Regret of Adversarial Linear Mixture MDPs

Neural Information Processing Systems

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-information rewards and the unknown transition kernel. We consider the linear mixture MDPs whose transition kernel is a linear mixture model and choose the dynamic regret as the performance measure.