Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation

May-26-2025, 21:13:39 GMT–Neural Information Processing Systems

We consider the problem of learning an ε-optimal policy in controlled dynamical systems with low-rank latent structure. In the policy evaluation step, the algorithm estimates the low-rank matrix corresponding to the (state, action) value function of the current policy using the following two-phase procedure. The entries of the matrix are first sampled uniformly at random to estimate, via a spectral method, the *leverage scores* of its rows and columns. These scores are then used to extract a few important rows and columns whose entries are further sampled. The algorithm exploits these new samples to complete the matrix estimation using a CUR-like method.
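The two-phase procedure described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (`leverage_scores`, `cur_complete`), the matrix sizes, the sampling rate `p`, and the choice to keep the top-`k` rows and columns by score are all assumptions made for illustration. The sketch also reads the selected entries exactly, whereas the algorithm in the abstract works from sampled entries.

```python
import numpy as np

def leverage_scores(M_obs, rank):
    """Spectral estimate of row/column leverage scores from a zero-filled,
    rescaled observation matrix (hypothetical helper)."""
    U, _, Vt = np.linalg.svd(M_obs, full_matrices=False)
    row = np.sum(U[:, :rank] ** 2, axis=1)   # squared row norms of top-r left vectors
    col = np.sum(Vt[:rank, :] ** 2, axis=0)  # same for the right singular vectors
    return row, col

def cur_complete(Q, rows, cols, rank):
    """CUR-style completion from the selected rows and columns.
    Assumption: entries are read exactly from Q (no sampling noise)."""
    C = Q[:, cols]                 # selected columns
    R = Q[rows, :]                 # selected rows
    W = Q[np.ix_(rows, cols)]      # intersection block
    # Rank-truncated pseudo-inverse of the intersection block.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W_pinv = (Vt[:rank].T / s[:rank]) @ U[:, :rank].T
    return C @ W_pinv @ R

rng = np.random.default_rng(0)
n_states, n_actions, rank, k = 60, 40, 3, 6  # illustrative sizes only

# Synthetic rank-3 stand-in for the (state, action) value matrix.
Q = rng.standard_normal((n_states, rank)) @ rng.standard_normal((rank, n_actions))

# Phase 1: observe entries uniformly at random, then estimate leverage scores.
p = 0.5
mask = rng.random(Q.shape) < p
Q_phase1 = np.where(mask, Q, 0.0) / p
row_scores, col_scores = leverage_scores(Q_phase1, rank)

# Phase 2: keep the k highest-scoring rows and columns (a simplification),
# then complete the matrix with the CUR-like formula.
rows = np.argsort(row_scores)[-k:]
cols = np.argsort(col_scores)[-k:]
Q_hat = cur_complete(Q, rows, cols, rank)

rel_err = np.linalg.norm(Q_hat - Q) / np.linalg.norm(Q)
print(f"relative error: {rel_err:.2e}")
```

In this noise-free setting the completion is exact up to floating-point error whenever the intersection block has rank `rank`. With noisy samples, as in the reinforcement learning setting, the quality of the selected rows and columns matters, which is where the leverage scores come in.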

artificial intelligence, machine learning, model-free low-rank reinforcement learning

