Provably Efficient Reinforcement Learning for Infinite-Horizon Average-Reward Linear MDPs
