Online allocation and homogeneous partitioning for constant mean approximation
Neural Information Processing Systems
In the setting of active learning for the multi-armed bandit, where the goal of a learner is to estimate with equal precision the mean of a finite number of arms, recent results show that it is possible to derive strategies based on finite-time confidence bounds that are competitive with the best possible strategy.
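As a rough illustration of the problem setting, the sketch below shows one generic confidence-bound allocation rule in Python. It relies on the standard fact that the squared error of an empirical mean built from n samples of an arm with variance σ² is σ²/n, so equal precision calls for pulling each arm in proportion to its variance. The function name `allocate`, the parameters `delta` and `value_range`, and the Hoeffding-style bonus on the standard deviation are assumptions made for this example; it is not the strategy analysed in the paper.

```python
import math
import random


def allocate(arms, budget, delta=0.05, value_range=1.0, seed=0):
    """Sequentially split `budget` pulls among `arms`.

    `arms` is a list of callables that take a random.Random instance and
    return one sample in [0, value_range]. All names and the exact bonus
    term are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    k = len(arms)
    counts = [0] * k
    sums = [0.0] * k
    sq_sums = [0.0] * k

    def pull(i):
        x = arms[i](rng)
        counts[i] += 1
        sums[i] += x
        sq_sums[i] += x * x

    # Initialisation: two pulls per arm so that an empirical variance exists.
    for i in range(k):
        pull(i)
        pull(i)

    for _ in range(budget - 2 * k):
        scores = []
        for i in range(k):
            n = counts[i]
            mean = sums[i] / n
            var = max(sq_sums[i] / n - mean * mean, 0.0)
            # Assumed Hoeffding-style bonus on the standard deviation.
            bonus = value_range * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
            upper_std = math.sqrt(var) + bonus
            # Optimistic estimate of the arm's current error, sigma^2 / n.
            scores.append(upper_std ** 2 / n)
        # Pull the arm whose estimate is (optimistically) the least precise.
        pull(max(range(k), key=scores.__getitem__))

    means = [sums[i] / counts[i] for i in range(k)]
    return counts, means


# Usage: a constant arm versus a fair coin; the noisier arm should get more pulls.
constant_arm = lambda rng: 0.5
coin_arm = lambda rng: 1.0 if rng.random() < 0.5 else 0.0
counts, means = allocate([constant_arm, coin_arm], budget=200)
print(counts, means)
```

The rule is optimistic: an arm that has been pulled only a few times keeps a large bonus and therefore continues to be sampled until its variance estimate can be trusted.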
Mar-14-2024, 20:34:31 GMT