PRODuctive bandits: Importance Weighting No More
Neural Information Processing Systems
Prod is a seminal algorithm in full-information online learning, which has been conjectured to be fundamentally sub-optimal for multi-armed bandits. By leveraging the interpretation of Prod as a first-order OMD approximation, we present the following surprising results:
1. Variants of Prod can obtain optimal regret for adversarial multi-armed bandits.
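For readers unfamiliar with the update, the following is a minimal sketch of the classical full-information Prod step next to the exponential-weights (Hedge) step, to illustrate the "first-order approximation" reading: Prod multiplies each weight by 1 - eta * loss, which is the first-order expansion of exp(-eta * loss). The function names, the assumption that losses lie in [0, 1], and the restriction on eta are illustrative choices made here; this is not the bandit variant proposed in the paper.

```python
import math


def prod_update(weights, losses, eta):
    """One classical Prod step: w_i <- w_i * (1 - eta * loss_i).

    Assumes losses in [0, 1] and 0 < eta <= 0.5 so every factor stays positive.
    """
    if not 0.0 < eta <= 0.5:
        raise ValueError("eta is assumed to lie in (0, 0.5] for this sketch")
    return [w * (1.0 - eta * loss) for w, loss in zip(weights, losses)]


def hedge_update(weights, losses, eta):
    """One exponential-weights step: w_i <- w_i * exp(-eta * loss_i)."""
    return [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]


def normalize(weights):
    """Turn non-negative weights into a probability distribution."""
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    w0 = [1.0, 1.0, 1.0]
    losses = [1.0, 0.5, 0.0]  # hypothetical loss vector for one round
    eta = 0.1
    print(normalize(prod_update(w0, losses, eta)))
    print(normalize(hedge_update(w0, losses, eta)))
```

For small eta the two updates differ only at second order in eta * loss, which is why they produce nearly identical distributions in the example run.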
Dec-26-2025, 17:20:14 GMT