Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces
