Model-free Low-Rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation

Neural Information Processing Systems 

In the latter, the algorithm estimates the low-rank matrix corresponding to the (state, action) value function of the current policy using the following two-phase procedure.
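The excerpt above does not spell out the two phases, so the following is only a generic sketch of the idea suggested by the title, not the paper's algorithm. It assumes a coarse estimate of the (state, action) value matrix is already available, uses it to compute row and column leverage scores, and then resamples the highest-scoring rows and columns to build a CUR-style low-rank estimate. The names `leverage_scores`, `cur_style_estimate`, and `sample_entry`, the toy matrix, and the noise-free resampling are all illustrative assumptions.

```python
import numpy as np

def leverage_scores(matrix, rank):
    """Row and column leverage scores from the top-`rank` singular vectors."""
    U, _, Vt = np.linalg.svd(matrix, full_matrices=False)
    row_scores = np.sum(U[:, :rank] ** 2, axis=1)   # squared row norms of U_r
    col_scores = np.sum(Vt[:rank, :] ** 2, axis=0)  # squared column norms of V_r^T
    return row_scores, col_scores

def cur_style_estimate(coarse, sample_entry, rank, k):
    """Hypothetical two-step estimate: score rows/columns, then resample them.

    coarse       : rough estimate of the value matrix (assumed given)
    sample_entry : callable (state, action) -> fresh value estimate (assumed)
    k            : number of rows and columns to keep (k >= rank)
    """
    n_states, n_actions = coarse.shape
    row_scores, col_scores = leverage_scores(coarse, rank)
    rows = np.argsort(row_scores)[-k:]  # k rows with the largest leverage
    cols = np.argsort(col_scores)[-k:]  # k columns with the largest leverage
    # Resample every entry of the selected columns (C) and rows (R).
    C = np.array([[sample_entry(s, a) for a in cols] for s in range(n_states)])
    R = np.array([[sample_entry(s, a) for a in range(n_actions)] for s in rows])
    W = C[rows, :]  # intersection block of the selected rows and columns
    # CUR-style reconstruction; rcond drops numerically zero singular values.
    return C @ np.linalg.pinv(W, rcond=1e-8) @ R

# Toy rank-2 "value matrix" with 6 states and 5 actions (made-up numbers).
u1, u2 = np.arange(1.0, 7.0), np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
v1, v2 = np.ones(5), np.arange(1.0, 6.0)
Q_true = np.outer(u1, v1) + np.outer(u2, v2)

rng = np.random.default_rng(0)
coarse = Q_true + 0.01 * rng.standard_normal(Q_true.shape)  # noisy first pass
Q_hat = cur_style_estimate(coarse, lambda s, a: Q_true[s, a], rank=2, k=3)
print(np.max(np.abs(Q_hat - Q_true)))  # reconstruction error on the toy example
```

In this toy setting the resampled entries are exact, so the reconstruction error comes only from floating-point arithmetic; with noisy samples the error would depend on how well the selected rows and columns span the matrix.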
