Regret Bounds for Model-Free Linear Quadratic Control

Abbasi-Yadkori, Yasin, Lazic, Nevena, Szepesvari, Csaba

Apr-16-2018–arXiv.org Machine Learning

Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based approaches. In this work, we present a model-free algorithm for controlling linear quadratic (LQ) systems, which is the simplest setting for continuous control and widely used in practice. Our approach is based on a reduction of the control of Markov decision processes to an expert prediction problem. We show that the algorithm regret scales as $O(T^{3/4})$, where $T$ is the number of rounds.

algorithm, artificial intelligence, reinforcement learning, (16 more...)

arXiv.org Machine Learning

Apr-16-2018

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found