Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs

Neural Information Processing Systems 

More specifically, the discounted MDP is one of the standard MDP models in reinforcement learning, used to describe sequential tasks that run without interruption or restart.
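As a rough illustration of what "discounted" means here, the sketch below computes the discounted return of a reward sequence, i.e. the sum of gamma^t * r_t over time steps t. It is not taken from the paper: the function name `discounted_return`, the discount factor 0.9, and the constant-reward trajectory are assumptions chosen for illustration only.

```python
def discounted_return(rewards, gamma):
    """Return sum_t gamma**t * rewards[t] for a finite reward sequence.

    In a discounted MDP the trajectory never restarts, so a finite list
    here is only a truncated prefix of an infinite sequence (assumption
    made for illustration).
    """
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# Hypothetical example: a constant reward of 1 at every step.
gamma = 0.9
prefix_value = discounted_return([1.0] * 500, gamma)

# For constant reward 1, the infinite-horizon discounted sum is the
# geometric series 1 / (1 - gamma).
limit_value = 1.0 / (1.0 - gamma)
```

Because gamma is strictly less than 1, the contribution of late rewards shrinks geometrically, so the truncated sum `prefix_value` is already very close to `limit_value`. This is why the quantity 1 / (1 - gamma) is often treated as an effective horizon for a task with no episode boundaries.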
