Variance-Aware Regret Bounds for Undiscounted Reinforcement Learning in MDPs
