Achieving Tractable Minimax Optimal Regret in Average Reward MDPs
