Copeland Dueling Bandits

Masrour Zoghi, Zohar S. Karnin, Shimon Whiteson, Maarten de Rijke

Neural Information Processing Systems 

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found