On the Convergence Rate of Off-Policy Policy Optimization Methods with Density-Ratio Correction
