Entropy Regularization for Population Estimation
Ben Chugg, Peter Henderson, Jacob Goldin, Daniel E. Ho
arXiv.org Artificial Intelligence
While most frameworks for online sequential decision-making focus on maximizing reward, in practice this is rarely the sole objective. Other considerations may include budget constraints, fair treatment, or the estimation of population characteristics. There has been growing recognition that these other objectives must be formally integrated into sequential decision-making frameworks, especially if such algorithms are to be used in sensitive application areas [21]. In this work, we focus on the problem of maximizing reward while simultaneously estimating the population total (equivalently, the mean) in a structured bandit setting. The most natural approach to this problem from a machine learning perspective is to use a model to predict the mean. However, adaptively collected data are biased, because observations are chosen for their expected reward rather than at random, and this bias carries over to the model's estimates [29].
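The bias from adaptive collection can be illustrated with a small simulation. The sketch below is not the authors' method: it assumes a hypothetical population of Gaussian rewards and a reward-seeking selection rule whose inclusion probabilities are known, and it compares the naive mean of the selected units with an inverse-probability-weighted (Horvitz-Thompson) estimate, a standard survey-sampling correction used here only for illustration. For simplicity the selection probability depends on the true reward; in practice it would depend on a model's prediction. All names and parameters are assumptions.

```python
import math
import random


def simulate(seed=0, n_units=2000, n_trials=200):
    """Compare a naive and a weighted mean estimate under reward-seeking selection.

    Returns (true_mean, average naive estimate, average weighted estimate).
    """
    rng = random.Random(seed)

    # Hypothetical population: one reward value per unit.
    rewards = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    true_mean = sum(rewards) / n_units

    # Assumed selection rule: units with higher reward are more likely to be
    # selected. Probabilities lie in (0.05, 0.45), so every unit has some
    # chance of being observed.
    probs = [0.05 + 0.4 / (1.0 + math.exp(-2.0 * y)) for y in rewards]

    naive_estimates = []
    weighted_estimates = []
    for _ in range(n_trials):
        # Independent (Poisson) sampling: each unit is included with its own probability.
        sampled = [(y, p) for y, p in zip(rewards, probs) if rng.random() < p]

        # Naive estimate: plain average of the observed rewards.
        naive_estimates.append(sum(y for y, _ in sampled) / len(sampled))

        # Horvitz-Thompson estimate: weight each observation by 1 / inclusion probability.
        weighted_estimates.append(sum(y / p for y, p in sampled) / n_units)

    naive_avg = sum(naive_estimates) / n_trials
    weighted_avg = sum(weighted_estimates) / n_trials
    return true_mean, naive_avg, weighted_avg


if __name__ == "__main__":
    true_mean, naive_avg, weighted_avg = simulate()
    print(f"true mean:         {true_mean:.3f}")
    print(f"naive estimate:    {naive_avg:.3f}")
    print(f"weighted estimate: {weighted_avg:.3f}")
```

Under these assumptions the naive average overstates the population mean because high-reward units are over-represented among the observations, while the weighted estimate stays close to the true value at the cost of higher variance.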
Aug-24-2022
- Country:
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Government (0.68)
- Technology: