Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning

Neural Information Processing Systems 

In this paper, we analyze the theoretical properties of the IS estimator by deriving a novel anticoncentration bound that formalizes the intuition behind its undesired behavior. Then, we propose a new class of IS transformations, based on the notion of power mean. To the best of our knowledge, the resulting estimator is the first to achieve, under certain conditions, two key properties: (i) it displays a subgaussian concentration rate; (ii) it preserves the differentiability in the target distribution.
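As a rough illustration of what a power-mean transformation of importance weights can look like, the sketch below takes the weighted power mean of each weight w and the constant 1. The abstract does not state the exact form, so the formula ((1 - lam) * w**s + lam)**(1/s), the parameter names lam and s, and the helper names are all assumptions made for illustration, not the paper's definitions. With s < 0 the transformed weight is bounded above by lam**(1/s) (1/lam for s = -1) while remaining a smooth function of w.

```python
def power_mean_weight(w, lam, s=-1.0):
    """Weighted power mean of an importance weight w >= 0 and the constant 1.

    Assumed form (not taken from the abstract):
        ((1 - lam) * w**s + lam) ** (1 / s)
    lam in [0, 1] is the mass placed on the constant 1; s is the exponent.
    """
    if w < 0.0:
        raise ValueError("importance weights must be non-negative")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if s == 0.0:
        # Limit s -> 0 of the power mean is the weighted geometric mean.
        return w ** (1.0 - lam)
    if w == 0.0 and s < 0.0:
        # w**s diverges for s < 0, so the power mean tends to 0
        # (or to 1 when all the mass is on the constant).
        return 1.0 if lam == 1.0 else 0.0
    return ((1.0 - lam) * w ** s + lam) ** (1.0 / s)


def transformed_is_estimate(weights, rewards, lam, s=-1.0):
    """Average of reward times transformed weight over the logged samples."""
    if len(weights) != len(rewards) or not weights:
        raise ValueError("weights and rewards must be non-empty and equal length")
    total = sum(power_mean_weight(w, lam, s) * r for w, r in zip(weights, rewards))
    return total / len(weights)
```

Setting lam = 0 leaves the weights unchanged, which recovers the plain IS estimator; increasing lam pulls large weights toward 1, trading some bias for lighter tails.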
