Achieving Constant Regret in Linear Markov Decision Processes
