Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator
Karl Krauth, Stephen Tu, Benjamin Recht
Neural Information Processing Systems
We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in terms of sample complexity.
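To make the two alternating steps concrete, here is a minimal sketch of exact, model-based policy iteration for a discrete-time LQR problem. It is not the approximate, sample-based method analyzed in the paper: it assumes the dynamics matrices are known, ignores process noise, and solves the evaluation step exactly. The system matrices, the cost matrices, the initial gain `K0`, and all function names are illustrative assumptions.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K'RK + (A+BK)' P (A+BK) for P."""
    L = A + B @ K                      # closed-loop dynamics under u = K x
    M = Q + K.T @ R @ K                # per-step cost matrix under u = K x
    n = A.shape[0]
    # Row-major vectorization: vec(L' P L) = (L' kron L') vec(P).
    lhs = np.eye(n * n) - np.kron(L.T, L.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)             # symmetrize against round-off

def improve_policy(A, B, R, P):
    """Policy improvement: greedy gain with respect to the value matrix P."""
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration(A, B, Q, R, K0, num_iters=50):
    """Alternate evaluation and improvement, starting from a stabilizing K0."""
    K = K0
    for _ in range(num_iters):
        P = evaluate_policy(A, B, Q, R, K)
        K = improve_policy(A, B, R, P)
    return K, P

# Illustrative double-integrator-like system (assumed, not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
K0 = np.array([[-1.0, -2.0]])          # assumed stabilizing initial gain

K, P = policy_iteration(A, B, Q, R, K0)
print("gain K:", K)
print("closed-loop spectral radius:", np.max(np.abs(np.linalg.eigvals(A + B @ K))))
```

In this sketch the evaluation step is a single linear solve. In the approximate setting the abstract describes, that step must instead be estimated from sampled trajectories, which is where the sample cost of policy evaluation enters.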
Jan-26-2025, 09:03:59 GMT