Review for NeurIPS paper: Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Neural Information Processing Systems 

The paper presents a model-free algorithm with an improved regret bound for finite-state, finite-horizon MDPs. The new bound closes the gap with the best model-based result. This is a nice theoretical contribution.