Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator

Karl Krauth, Stephen Tu, Benjamin Recht

Neural Information Processing Systems 

We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work that uses LQR as a testbed for understanding the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in sample complexity.
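For context, here is a minimal sketch of the LQR problem and the policy-iteration loop the abstract refers to; the notation below is illustrative and not necessarily that of the paper.

\[
x_{t+1} = A x_t + B u_t + w_t, \qquad
J(K) \;=\; \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\!\left[ \sum_{t=0}^{T-1} \big( x_t^\top S\, x_t + u_t^\top R\, u_t \big) \right], \quad u_t = K x_t .
\]

Approximate PI alternates two steps: policy evaluation estimates the Q-function \(Q_K\) of the current linear gain \(K\) from trajectory data, and policy improvement replaces \(K\) with the greedy policy \(x \mapsto \arg\min_u Q_K(x, u)\), which is again linear in \(x\) whenever \(Q_K\) is quadratic. The abstract's finding concerns how the sample requirements of these two steps compare, with the evaluation step dominating.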