Beyond Primal-Dual Methods in Bandits with Stochastic and Adversarial Constraints Andrea Celli Federico Fusco
–Neural Information Processing Systems
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorithms that perform optimally under both stochastic and adversarial constraints. Previous works address this problem via primal-dual methods, and require some stringent assumptions, namely the Slater's condition, and in adversarial settings, they either assume knowledge of a lower bound on the Slater's parameter, or impose strong requirements on the primal and dual regret minimizers such as requiring weak adaptivity. We propose an alternative and more natural approach based on optimistic estimations of the constraints. Surprisingly, we show that estimating the constraints with an UCBlike approach guarantees optimal performances.
Neural Information Processing Systems
Mar-18-2025, 09:38:43 GMT
- Technology: