Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
