A Batch, Off-Policy, Actor-Critic Algorithm for Optimizing the Average Reward
