The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information
–Neural Information Processing Systems
Epoch-Greedy has the following properties: No knowledge of a time horizon T is necessary. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. Here S is the complexity term in a sample complexity bound for standard supervised learning.
Neural Information Processing Systems
Feb-16-2024, 13:37:47 GMT
- Technology: