Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs
