Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs
