Regret Minimization in Partially Observable Linear Quadratic Control

Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar

Jan-31-2020–arXiv.org Machine Learning

Controlling unknown discrete-time systems is a fundamental problem in adaptive control and reinforcement learning. In this problem, an agent interacts with an environment with unknown dynamics and aims to minimize the overall average regulating cost. To achieve this goal, the agent must explore the environment to gain a better understanding of its dynamics, a task often called system identification. The agent then uses this understanding to design a set of improved controllers that simultaneously reduce the possible future costs and enable the agent to explore the important and unknown aspects of the system. In recent decades, this challenging problem has been extensively studied, resulting in a set of foundational steps for studying the stability of, and asymptotic convergence to, optimal controllers [Lai et al., 1982, Lai and Wei, 1987]. While asymptotic analyses set the ground for the design of optimal control, understanding the finite-time behavior of adaptive algorithms is critical for real-world applications. In practice, one might prefer an algorithm that guarantees better performance on a much shorter horizon. Recent developments in the fields of statistics and machine learning, along with control theory [Van Der Vaart and Wellner, 1996, Peña et al., 2009, Lai et al., 1982], empower us not only to advance the study of the asymptotic efficiency of algorithms but also to analyze their finite-time behavior [Fiechter, 1997, Abbasi-Yadkori and Szepesvári, 2011]. In partially observable linear quadratic control, if the agent is handed the system dynamics a priori, the optimal control/policy has a closed form in the presence of Gaussian disturbances.
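To make the closed-form claim concrete, the following is a minimal sketch for a scalar system, not the paper's algorithm. It assumes known dynamics x_{t+1} = a x_t + b u_t + w_t with observations y_t = c x_t + z_t, Gaussian noise variances w_var and z_var, and a per-step cost q x_t^2 + r u_t^2 (the state-based cost and all function and parameter names are assumptions chosen for illustration). Under these assumptions the optimal policy is u_t = -k x̂_t, where x̂_t is the Kalman filter estimate, and both gains come from Riccati fixed points.

```python
def solve_scalar_riccati(a, b, q, r, iters=1000):
    """Iterate the scalar discrete-time Riccati recursion to its fixed point.

    Fixed point solves: p = q + a^2 p - (a b p)^2 / (r + b^2 p).
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def lqg_gains(a, b, c, q, r, w_var, z_var):
    """Return (k, l): the feedback gain and the Kalman (measurement-update) gain.

    Illustrative scalar setting only; every name here is hypothetical.
    """
    # Control side: cost-to-go coefficient p and feedback gain k, with u = -k * x_hat.
    p = solve_scalar_riccati(a, b, q, r)
    k = a * b * p / (r + b * b * p)

    # Estimation side: by duality, the prediction-error variance s solves the
    # same recursion with (b, q, r) replaced by (c, w_var, z_var).
    s = solve_scalar_riccati(a, c, w_var, z_var)
    l = s * c / (c * c * s + z_var)
    return k, l


def lqg_step(x_pred, y, a, b, c, k, l):
    """One step of the controller: correct the estimate, act, then predict."""
    x_hat = x_pred + l * (y - c * x_pred)   # measurement update
    u = -k * x_hat                          # certainty-equivalent control
    x_pred_next = a * x_hat + b * u         # time update for the next step
    return u, x_pred_next


if __name__ == "__main__":
    k, l = lqg_gains(a=1.0, b=1.0, c=1.0, q=1.0, r=1.0, w_var=1.0, z_var=1.0)
    print("feedback gain:", k, "Kalman gain:", l)
```

The separation between the two Riccati solves reflects the fact that, with known dynamics, estimation and control can be designed independently; the difficulty the paper addresses arises when a, b, and c are unknown and must be learned while controlling.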


