Goto

Collaborating Authors

 approximation oracle




The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle

Neural Information Processing Systems

Thompson sampling (TS) has attracted a lot of interest in the bandit area. It was introduced in the 1930s but has not been theoretically proven until recent years. All of its analysis in the combinatorial multi-armed bandit (CMAB) setting requires an exact oracle to provide optimal solutions with any input. However, such an oracle is usually not feasible since many combinatorial optimization problems are NP-hard and only approximation oracles are available. An example \cite{WangC18} has shown the failure of TS to learn with an approximation oracle. However, this oracle is uncommon and is designed only for a specific problem instance.




Online Improper Learning with an Approximation Oracle

Neural Information Processing Systems

We study the following question: given an efficient approximation algorithm for an optimization problem, can we learn efficiently in the same setting? We give a formal affirmative answer to this question in the form of a reduction from online learning to offline approximate optimization using an efficient algorithm that guarantees near optimal regret. The algorithm is efficient in terms of the number of oracle calls to a given approximation oracle - it makes only logarithmically many such calls per iteration.



Blackwell's Approachability with Approximation Algorithms

Garber, Dan, Massalha, Mhna

arXiv.org Artificial Intelligence

We revisit Blackwell's celebrated approachability problem which considers a repeated vector-valued game between a player and an adversary. Motivated by settings in which the action set of the player or adversary (or both) is difficult to optimize over, for instance when it corresponds to the set of all possible solutions to some NP-Hard optimization problem, we ask what can the player guarantee \textit{efficiently}, when only having access to these sets via approximation algorithms with ratios $\alpha_{\mX} \geq 1$ and $ 1 \geq \alpha_{\mY} > 0$, respectively. Assuming the player has monotone preferences, in the sense that he does not prefer a vector-valued loss $\ell_1$ over $\ell_2$ if $\ell_2 \leq \ell_1$, we establish that given a Blackwell instance with an approachable target set $S$, the downward closure of the appropriately-scaled set $\alpha_{\mX}\alpha_{\mY}^{-1}S$ is \textit{efficiently} approachable with optimal rate. In case only the player's or adversary's set is equipped with an approximation algorithm, we give simpler and more efficient algorithms.


The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle

Neural Information Processing Systems

Thompson sampling (TS) has attracted a lot of interest in the bandit area. It was introduced in the 1930s but has not been theoretically proven until recent years. All of its analysis in the combinatorial multi-armed bandit (CMAB) setting requires an exact oracle to provide optimal solutions with any input. However, such an oracle is usually not feasible since many combinatorial optimization problems are NP-hard and only approximation oracles are available. An example \cite{WangC18} has shown the failure of TS to learn with an approximation oracle. However, this oracle is uncommon and is designed only for a specific problem instance.


Efficient Online Linear Optimization with Approximation Algorithms

Dan Garber

Neural Information Processing Systems

We revisit the problem of online linear optimization in case the set of feasible actions is accessible through an approximated linear optimization oracle with a factor multiplicative approximation guarantee. This setting is in particular interesting since it captures natural online extensions of well-studied offline linear optimization problems which are NP-hard, yet admit efficient approximation algorithms. The goal here is to minimize the -regret which is the natural extension of the standard regret in online learning to this setting. We present new algorithms with significantly improved oracle complexity for both the full information and bandit variants of the problem.