Critic-Actor for Average Reward MDPs with Function Approximation: A Finite-Time Analysis
