Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation

Neural Information Processing Systems

We consider the problem of learning an $\varepsilon$-optimal policy in controlled dynamical systems with low-rank latent structure.