CoinDICE: Off-Policy Confidence Interval Estimation
Neural Information Processing Systems
One of the major barriers to applying reinforcement learning (RL) is the difficulty of evaluating new policies reliably before deployment, a problem generally known as off-policy evaluation (OPE).
Oct-3-2025, 03:57:32 GMT
- Country:
  - Asia > Middle East
    - Jordan (0.04)
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.04)
  - North America
    - Canada > Alberta (0.14)
    - United States
      - California > San Francisco County
        - San Francisco (0.14)
      - Massachusetts (0.04)
- Genre:
  - Research Report (0.68)
- Technology: