An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints

Jan-19-2025, 04:29:37 GMT–Neural Information Processing Systems

This paper considers stochastic linear bandits with general nonlinear constraints. The objective is to maximize the expected cumulative reward over horizon T subject to a set of constraints in each round \tau\leq T . We propose a pessimistic-optimistic algorithm for this problem, which is efficient in two aspects. First, the algorithm yields \tilde{\cal O}\left(\left(\frac{K {0.75}}{\delta} d\right)\sqrt{\tau}\right) (pseudo) regret in round \tau\leq T, where K is the number of constraints, d is the dimension of the reward feature space, and \delta is a Slater's constant; and {\em zero} constraint violation in any round \tau \tau', where \tau' is {\em independent} of horizon T. Second, the algorithm is computationally efficient. Our algorithm is based on the primal-dual approach in optimization and includes two components.

artificial intelligence, constraint-based reasoning, efficient pessimistic-optimistic algorithm, (7 more...)

Neural Information Processing Systems

Jan-19-2025, 04:29:37 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.62)