Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning
–Neural Information Processing Systems
In this paper, we analyze the theoretical properties of the IS estimator by deriving a novel anti-concentration bound that formalizes the intuition behind its undesired behavior. Then, we propose a new class of IS transformations, based on the notion of power mean. To the best of our knowledge, the resulting estimator is the first to achieve, under certain conditions, two key properties: (i) it displays a subgaussian concentration rate; (ii) it preserves the differentiability in the target distribution.
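As a rough illustration of what a power-mean transformation of importance weights could look like, the sketch below assumes the corrected weight is the weighted power mean of the vanilla weight w and the constant 1, with exponent s and mixing parameter lam. The abstract does not state this formula; the function names, the parameters `lam` and `s`, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def power_mean_weights(w, lam, s=-1.0):
    """Weighted power mean of the importance weight w and 1.

    ASSUMED form (not given in the abstract):
        w_corr = ((1 - lam) * w**s + lam) ** (1 / s)
    lam in [0, 1] is a hypothetical mixing parameter; lam = 0 returns w.
    """
    w = np.asarray(w, dtype=float)
    if s == -1.0:
        # Closed form of the harmonic case; also well defined at w = 0.
        return w / ((1.0 - lam) + lam * w)
    # General case, valid for strictly positive weights and s != 0.
    return ((1.0 - lam) * w**s + lam) ** (1.0 / s)

def is_estimate(rewards, weights):
    """Plain importance-sampling estimate: mean of weight * reward."""
    return float(np.mean(np.asarray(weights) * np.asarray(rewards)))

# Toy data (made up): one very large weight dominates the vanilla estimate.
w = np.array([0.5, 1.0, 2.0, 50.0])
r = np.array([1.0, 0.0, 1.0, 1.0])
lam = 0.1

w_corr = power_mean_weights(w, lam)   # harmonic case, s = -1
vanilla = is_estimate(r, w)
corrected = is_estimate(r, w_corr)
```

Under this assumed form with s = -1 and lam > 0, the corrected weight never exceeds 1/lam, which tames the heavy tail of the vanilla weights, and it is a smooth function of w, so gradients with respect to the parameters of the target distribution can still flow through it.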
Feb-8-2026, 10:06:36 GMT