Online Sparse Reinforcement Learning

Hao, Botao, Lattimore, Tor, Szepesvári, Csaba, Wang, Mengdi

Nov-8-2020–arXiv.org Machine Learning

Sparse models in classical statistics often yield the best of both worlds: high representation power is achieved by including many features while sparsity leads to efficient estimation. There is a growing interest in applying the tools developed by statisticians to sequential settings such as contextual bandits and reinforcement learning (RL). As we now explore, in online RL this leads to a number of delicate tradeoffs between assumptions and sample complexity. The use of sparsity in reinforcement learning (RL) has been explored before in the context of policy evaluation or policy optimization in the batch setting [Kolter and Ng, 2009, Geist and Scherrer, 2011, Hoffman et al., 2011, Painter-Wakefield and Parr, 2012, Ghavamzadeh et al., 2011]. As far as we know, there has been very little work on the role of sparsity in online RL. In batch RL, the dataset is given a priori and the focus is typically on evaluating a given target policy or learning a near-optimal policy. By contrast, the central question in online RL is how to sequentially interact with the environment to balance the tradeoff between exploration and exploitation, measured here by the cumulative regret. We ask the following question: Under what circumstances does sparsity help when minimizing regret in online RL?

exploratory policy, learner, probability, (14 more...)

arXiv.org Machine Learning

Nov-8-2020

arXiv.org PDF

Add feedback

Country:
- North America > Canada
  - Alberta (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found