Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Neural Information Processing Systems 

Our regret bound improves upon the results of [Jin et al., 2018] and
