Entropy Regularization for Population Estimation

Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho

arXiv.org Artificial Intelligence 

While most frameworks for online sequential decision-making focus on the objective of maximizing reward, in practice this is rarely the sole objective. Other considerations may involve budget constraints, ensuring fair treatment, or estimating various population characteristics. There has been growing recognition that these other objectives must be formally integrated into sequential decision-making frameworks, especially if such algorithms are to be used in sensitive application areas [21]. In this work, we focus on the problem of maximizing reward while simultaneously estimating the population total (equivalently, mean) in a structured bandit setting. The most natural approach to this problem from a machine learning perspective is to use a model to predict the mean. However, adaptively collected data are subject to bias, which in turn biases the model's estimates [29].
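The bias concern above can be illustrated with a toy simulation. This is only a sketch under assumed conditions, not the method of the paper: the population, the Gaussian noise model, and the names `simulate`, `n_units`, and `budget` are all hypothetical. It compares the naive sample mean from reward-maximizing (greedy) selection with the sample mean from uniform random selection.

```python
import random


def simulate(n_units=1000, budget=100, seed=0):
    """Return (true_mean, greedy_mean, uniform_mean) for one toy population."""
    rng = random.Random(seed)

    # Hypothetical population: each unit has a true reward and a noisy prediction.
    rewards = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    preds = [r + rng.gauss(0.0, 1.0) for r in rewards]
    true_mean = sum(rewards) / n_units

    # Greedy selection: observe only the units with the highest predictions.
    order = sorted(range(n_units), key=lambda i: preds[i], reverse=True)
    greedy = order[:budget]
    greedy_mean = sum(rewards[i] for i in greedy) / budget

    # Uniform selection: observe a simple random sample of the same size.
    uniform = rng.sample(range(n_units), budget)
    uniform_mean = sum(rewards[i] for i in uniform) / budget

    return true_mean, greedy_mean, uniform_mean


if __name__ == "__main__":
    true_mean, greedy_mean, uniform_mean = simulate()
    print(f"true mean:    {true_mean:.3f}")
    print(f"greedy mean:  {greedy_mean:.3f}")
    print(f"uniform mean: {uniform_mean:.3f}")
```

In this toy setup the greedy sample concentrates on high-reward units, so its naive mean tends to overestimate the population mean, while the uniform sample does not share that systematic error but gives up reward. That tension between reward and estimation is the trade-off the abstract describes.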
