Balancing Performance and Costs in Best Arm Identification
Neural Information Processing Systems
We consider the problem of identifying the best arm in a multi-armed bandit model. Despite a wealth of literature on the traditional fixed-budget and fixed-confidence regimes of best arm identification, it remains unclear to most practitioners how to choose an approach and the corresponding budget or confidence parameter. We propose a new formalism that avoids this dilemma altogether by minimizing a risk functional which explicitly balances the performance of the recommended arm against the cost incurred in learning it. In this framework, a cost is incurred for each observation during the sampling phase, and upon recommending an arm, a performance penalty is incurred if the arm is suboptimal. The learner's goal is to minimize the sum of the penalty and the cost. This new regime mirrors the priorities of many practitioners, e.g.
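To make the cost-plus-penalty objective concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It estimates the risk of a naive learner that pulls every arm the same number of times and then recommends the arm with the highest empirical mean. Several choices here are assumptions made only for illustration: the function name `risk_of_uniform_sampling`, unit-variance Gaussian rewards, and the penalty taken as the gap between the best mean and the recommended arm's mean. The abstract only says that a performance penalty is incurred for a suboptimal recommendation and does not fix its form.

```python
import random


def risk_of_uniform_sampling(means, pulls_per_arm, cost_per_sample,
                             n_trials=2000, seed=0):
    """Monte Carlo estimate of (penalty + cost) for a uniform-sampling learner.

    Hypothetical helper for illustration only; every modelling choice below
    is an assumption, not something taken from the paper.
    """
    rng = random.Random(seed)
    best_mean = max(means)
    # The sampling cost is deterministic: one charge per observation.
    cost = cost_per_sample * pulls_per_arm * len(means)
    total = 0.0
    for _ in range(n_trials):
        # Sampling phase: pull each arm equally often (assumed Gaussian noise).
        estimates = [
            sum(rng.gauss(mu, 1.0) for _ in range(pulls_per_arm)) / pulls_per_arm
            for mu in means
        ]
        # Recommendation: the arm with the highest empirical mean.
        recommended = max(range(len(means)), key=lambda i: estimates[i])
        # Assumed penalty: how far the recommended arm falls short of the best.
        penalty = best_mean - means[recommended]
        total += penalty + cost
    return total / n_trials


if __name__ == "__main__":
    # Sweep the per-arm sample size to see the trade-off between the two terms.
    for n in (1, 5, 20, 80):
        risk = risk_of_uniform_sampling([0.0, 0.5], n, cost_per_sample=0.002)
        print(n, round(risk, 4))
```

Under these assumptions, sampling more lowers the expected penalty but raises the cost linearly, so the estimated risk is typically smallest at an intermediate sample size. The framework described above asks the learner to strike that balance itself instead of relying on a budget or confidence parameter chosen in advance.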
Jun-21-2026, 22:44:32 GMT
- Country:
- North America > United States (0.28)
- Genre:
- Research Report > Experimental Study (1.00)
- Technology: