Learning Infinite-Horizon Average-Reward Linear Mixture MDPs of Bounded Span
