Convergent Actor-Critic Algorithms Under Off-Policy Training and Function Approximation
