Off-policy estimation with adaptively collected data: the power of online learning

May-27-2025, 21:04:36 GMT–Neural Information Processing Systems

We consider estimation of a linear functional of the treatment effect from adaptively collected data. This problem has a variety of applications, including off-policy evaluation in contextual bandits and estimation of the average treatment effect in causal inference. While a certain class of augmented inverse propensity weighting (AIPW) estimators enjoys desirable asymptotic properties, including semi-parametric efficiency, much less is known about their non-asymptotic theory with adaptively collected data. To fill this gap, we first present generic upper bounds on the mean-squared error of the class of AIPW estimators; these bounds depend crucially on a sequentially weighted error between the treatment effect and its estimates. Motivated by this, we propose a general reduction scheme that produces a sequence of estimates for the treatment effect via online learning, so as to minimize the sequentially weighted estimation error.
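
To make the setting concrete, here is a minimal sketch of an AIPW-style off-policy value estimate on adaptively collected data. It is an illustration under stated assumptions, not the estimator or reduction analyzed in the paper: the two-armed Bernoulli environment, the epsilon-greedy logging policy, the names `true_means` and `target_policy`, and the running-mean update (a simple online learner for squared loss that ignores the sequential weighting) are all hypothetical choices made for the example. The one structural point it keeps is that the reward estimate used at round t is built only from data observed before round t.

```python
import random


def aipw_off_policy_value(n_rounds=5000, seed=0):
    """AIPW-style estimate of a target policy's value from adaptive logs.

    Everything here is a toy assumption: two arms, Bernoulli rewards,
    an epsilon-greedy logging policy, and running-mean reward estimates.
    """
    rng = random.Random(seed)
    true_means = [0.3, 0.7]      # hypothetical reward means, unknown to the estimator
    target_policy = [0.2, 0.8]   # hypothetical policy whose value we want
    epsilon = 0.2                # exploration rate of the logging policy

    mu_hat = [0.0, 0.0]          # online reward estimates, one per arm
    counts = [0, 0]              # number of pulls per arm
    total = 0.0

    for _ in range(n_rounds):
        # Logging policy depends on past data through mu_hat (adaptive collection).
        greedy = 0 if mu_hat[0] >= mu_hat[1] else 1
        behavior = [epsilon / 2, epsilon / 2]
        behavior[greedy] += 1 - epsilon

        action = 0 if rng.random() < behavior[0] else 1
        reward = 1.0 if rng.random() < true_means[action] else 0.0

        # AIPW term: model-based value plus an importance-weighted residual.
        # mu_hat here uses only data from earlier rounds.
        direct = sum(p * m for p, m in zip(target_policy, mu_hat))
        weight = target_policy[action] / behavior[action]
        total += direct + weight * (reward - mu_hat[action])

        # Online update (running mean) performed after the term is computed.
        counts[action] += 1
        mu_hat[action] += (reward - mu_hat[action]) / counts[action]

    return total / n_rounds


if __name__ == "__main__":
    print(aipw_off_policy_value())
```

In this toy setup the true value of the target policy is 0.2 × 0.3 + 0.8 × 0.7 = 0.62, so the returned estimate can be compared against that number. Because the logging probabilities are known and the reward estimate is fixed before each round's action is drawn, each summand is conditionally unbiased for the target value; the quality of `mu_hat` affects the variance, which is the role the sequentially weighted estimation error plays in the abstract.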

adaptively, estimation, treatment effect



Industry:
- Education > Educational Setting > Online (0.69)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (0.61)
  - Enterprise Applications > Human Resources
    - Learning Management (0.69)