Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
Neural Information Processing Systems
In this paper, we propose several doubly robust off-policy value and gradient estimators for deterministic policies in a reinforcement learning (RL) setting.
Oct-3-2025, 06:46:54 GMT
- Country:
  - North America
    - Canada (0.04)
    - United States
      - Florida > Palm Beach County
        - Boca Raton (0.04)
      - New Jersey > Mercer County
        - Princeton (0.04)
      - Wisconsin (0.04)
- Technology: