Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
