Warm-up Free Policy Optimization: Improved Regret in Linear Markov Decision Processes
