From Contextual Combinatorial Semi-Bandits to Bandit List Classification: Improved Sample Complexity with Sparse Rewards

Jun-14-2026, 06:02:37 GMT–Neural Information Processing Systems

We study the problem of contextual combinatorial semi-bandits, where input contexts are mapped into subsets of size $m$ of a collection of $K$ possible actions. In each round of the interaction, the learner observes feedback consisting of the realized reward of the predicted actions. Motivated by prototypical applications of contextual bandits, we focus on the $s$-sparse regime where we assume that the sum of rewards is bounded by some value $s \ll K$. For example, in recommendation systems the number of products purchased by any customer is significantly smaller than the total number of available products. Our main result is for the $(\varepsilon,\delta)$-PAC variant of the problem for which we design an algorithm that returns an $\varepsilon$-optimal policy with high probability using a sample complexity of $\widetilde{O}\big( (\mathrm{poly}(K/m) + sm / \varepsilon^2) \log (|\Pi|/\delta) \big)$ where $\Pi$ is the underlying (finite) class and $s$ is the sparsity parameter.

artificial intelligence, machine learning, proceedings, (10 more...)

Neural Information Processing Systems

Jun-14-2026, 06:02:37 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)