Review for NeurIPS paper: Sample Efficient Reinforcement Learning via Low-Rank Matrix Estimation

Neural Information Processing Systems

Weaknesses: Below I list my concerns regarding the setup and the reported results. In the finite case, devising an algorithm for the online setting posed more serious challenges than for the generative setting. Restricting the results to the generative setting hides the price paid for having to navigate the MDP. Could you at least elaborate on the potential difficulties and challenges involved in extending the results to the online setting? Could one hope for a similar gain in sample complexity over structure-oblivious algorithms?