Nearly Minimax Optimal Reinforcement Learning for Discounted MDPs
