CoinDICE: Off-Policy Confidence Interval Estimation
Neural Information Processing Systems
A major barrier to applying reinforcement learning (RL) is the difficulty of evaluating new policies reliably before deployment, a problem generally known as off-policy evaluation (OPE).