Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection
