Reviews: The Multi-fidelity Multi-armed Bandit
–Neural Information Processing Systems
In my opinion, the paper studies an interesting and relevant problem in online learning, specifically stochastic bandits: modelling the tradeoff between information, cost, and reward (whether to choose cheap, low-information observations or expensive, high-information ones). In this sense it may be useful as a benchmark to improve upon. Though the paper seems technically solid, a key shortcoming is the lack of adequate explanation of its results and assumptions. The regret definition adopted seems unnatural from at least one angle: why not penalize resource consumption (or 'cost') additively instead of multiplicatively, as done here? The authors' ad-display example motivates their definition, but that example may not be the most general case.
Jan-20-2025, 08:35:21 GMT