Online Sparse Reinforcement Learning
Hao, Botao, Lattimore, Tor, Szepesvári, Csaba, Wang, Mengdi
Sparse models in classical statistics often yield the best of both worlds: high representation power is achieved by including many features while sparsity leads to efficient estimation. There is a growing interest in applying the tools developed by statisticians to sequential settings such as contextual bandits and reinforcement learning (RL). As we now explore, in online RL this leads to a number of delicate tradeoffs between assumptions and sample complexity. The use of sparsity in reinforcement learning (RL) has been explored before in the context of policy evaluation or policy optimization in the batch setting [Kolter and Ng, 2009, Geist and Scherrer, 2011, Hoffman et al., 2011, Painter-Wakefield and Parr, 2012, Ghavamzadeh et al., 2011]. As far as we know, there has been very little work on the role of sparsity in online RL. In batch RL, the dataset is given a priori and the focus is typically on evaluating a given target policy or learning a near-optimal policy. By contrast, the central question in online RL is how to sequentially interact with the environment to balance the tradeoff between exploration and exploitation, measured here by the cumulative regret. We ask the following question: Under what circumstances does sparsity help when minimizing regret in online RL?
Nov-8-2020
- Country:
- North America > Canada
- Alberta (0.14)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
- Jordan (0.04)
- North America > Canada
- Genre:
- Research Report (0.50)
- Technology: