True Impact of Cascade Length in Contextual Cascading Bandits
–Neural Information Processing Systems
We revisit the contextual cascading bandit, where a learning agent recommends an ordered list (\emph{cascade}) of items, and a user scans the list sequentially, stopping at the first attractive item. Although cascading bandits underpin various applications including recommender systems and search engines, the role of the cascade length $K$ in shaping regret has remained unclear. Contrary to prior results that regret grows with $K$, we prove that regret actually \emph{decreases} once $K$ is large enough. Leveraging this insight, we design a new upper-confidence-bound algorithm built on online mirror descent that attains the sharpest known regret upper bound, $\tilde{\mathcal{O}}\bigl(\min \lbrace K\bar{p}^{K-1}, 1 \rbrace d \sqrt{T}\bigr)$ for contextual cascading bandits.
Neural Information Processing Systems
Jun-12-2026, 22:48:07 GMT
- Technology: