Closing the Computational-Statistical Gap in Best Arm Identification for Combinatorial Semi-bandits
Neural Information Processing Systems
An efficient method to design statistically optimal algorithms solving active learning tasks (e.g., regret minimization or pure exploration in bandits and reinforcement learning) consists in the following two-step procedure.
Mar-21-2025, 08:48:37 GMT