Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation
