Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality
