Multi-Armed Bandits with Metric Movement Costs
Tomer Koren, Roi Livni, Yishay Mansour
–Neural Information Processing Systems
We consider the non-stochastic Multi-Armed Bandit problem in a setting where there is a fixed and known metric on the action space that determines a cost for switching between any pair of actions. The loss of the online learner has two components: the first is the usual loss of the selected actions, and the second is an additional loss due to switching between actions. Our main contribution gives a tight characterization of the expected minimax regret in this setting, in terms of a complexity measure C of the underlying metric which depends on its covering numbers.
Neural Information Processing Systems
Oct-4-2024, 04:11:11 GMT
- Country:
- Asia > Middle East
- Israel > Tel Aviv District > Tel Aviv (0.04)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- California > Los Angeles County
- Long Beach (0.04)
- Vermont > Chittenden County
- Burlington (0.04)
- California > Los Angeles County
- Asia > Middle East
- Technology: