The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle
–Neural Information Processing Systems
Thompson sampling (TS) has attracted a lot of interest in the bandit area. It was introduced in the 1930s but has not been theoretically proven until recent years. All of its analysis in the combinatorial multi-armed bandit (CMAB) setting requires an exact oracle to provide optimal solutions with any input. However, such an oracle is usually not feasible since many combinatorial optimization problems are NP-hard and only approximation oracles are available. An example \cite{WangC18} has shown the failure of TS to learn with an approximation oracle. However, this oracle is uncommon and is designed only for a specific problem instance.
Neural Information Processing Systems
Dec-25-2025, 02:00:28 GMT
- Technology: