Off-Policy Evaluation via the Regularized Lagrangian
Neural Information Processing Systems
Although there are many commonalities between the various DICE estimators, their derivations are distinct and seemingly incompatible.
Feb-8-2026, 07:36:40 GMT