Regret Bounds for Model-Free Linear Quadratic Control
Abbasi-Yadkori, Yasin; Lazic, Nevena; Szepesvari, Csaba
Reinforcement learning (RL) algorithms have recently shown impressive performance in many challenging decision-making problems, including game playing and various robotic tasks. Model-based RL approaches estimate a model of the transition dynamics and rely on that model to plan future actions using approximate dynamic programming. Model-free approaches aim to find an optimal policy without explicitly modeling the system transitions; they estimate state-action value functions or directly optimize a parameterized policy based only on interactions with the environment. Model-free RL is appealing for a number of reasons: (1) it is an "end-to-end" approach that directly optimizes the cost function of interest, (2) it can be used in settings where a model is not available and the agent only has access to a simulator, and (3) it is easy to implement. However, while model-based algorithms have been studied extensively in the RL and control literature and can provide strong theoretical guarantees, model-free algorithms are not as well understood.
Apr-16-2018