Slowly Changing Adversarial Bandit Algorithms are Efficient for Discounted MDPs
