Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
