Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator

Karl Krauth, Stephen Tu, Benjamin Recht

Neural Information Processing Systems 

We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work that uses LQR as a testbed for understanding the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in sample complexity.
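For context, here is a minimal sketch of the LQR problem and the policy-iteration loop the abstract refers to; the notation below is illustrative and not necessarily that of the paper.

\[
x_{t+1} = A x_t + B u_t + w_t, \qquad
J(K) \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1} \big( x_t^\top S\, x_t + u_t^\top R\, u_t \big) \right], \quad u_t = K x_t .
\]

Approximate PI alternates two steps: policy evaluation estimates the Q-function \(Q_K\) of the current linear gain \(K\) from trajectory data, and policy improvement replaces \(K\) with the greedy policy \(x \mapsto \arg\min_u Q_K(x, u)\), which is again linear in \(x\) whenever \(Q_K\) is quadratic. The abstract's finding concerns how the sample requirements of these two steps compare, with the evaluation step dominating.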