general tournament solution
Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions
Recent work on deriving $O(\log T)$ anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show $O(\log T)$ anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest.
Reviews: Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions
This paper deals with the dueling bandit problem in situations where the goal is to find other notions of winners than the Condorcet one (which is not guaranteed to exist) or the Copeland one (which does always exist, but has certain flaws). Unlike the MAB problem, the question of what constitutes the best arm(s) is not completely straightforward to answer in the dueling setting, in particular when there is no single arm that is preferred to all other arms (aka the Condorcet winner). Indeed, the field of social choice theory has devised a near infinite supply of such notions of winners: the reason for this overabundance of definitions is that in the absence of a Condorcet winner, it is difficult to find a definition that satisfies all reasonable properties that one might hope for, so various definitions have been proposed that satisfy various sets of desirable properties. Given that, it is natural to seek algorithms that can find these other winners, since depending on the application domain, it might very well be that the Condorcet winner does not exist and that the Copeland winner is undesirable. With this in mind, the authors propose a flexible algorithm that is capable of accommodating such needs, given certain rather weak assumptions on the sought after notion of winner.
- Information Technology > Artificial Intelligence > Cognitive Science (0.58)
- Information Technology > Data Science > Data Mining > Big Data (0.38)
Dueling Bandits: Beyond Condorcet Winners to General Tournament Solutions
Ramamohan, Siddartha Y., Rajkumar, Arun, Agarwal, Shivani, Agarwal, Shivani
Recent work on deriving $O(\log T)$ anytime regret bounds for stochastic dueling bandit problems has considered mostly Condorcet winners, which do not always exist, and more recently, winners defined by the Copeland set, which do always exist. In this work, we consider a broad notion of winners defined by tournament solutions in social choice theory, which include the Copeland set as a special case but also include several other notions of winners such as the top cycle, uncovered set, and Banks set, and which, like the Copeland set, always exist. We develop a family of UCB-style dueling bandit algorithms for such general tournament solutions, and show $O(\log T)$ anytime regret bounds for them. Experiments confirm the ability of our algorithms to achieve low regret relative to the target winning set of interest. Papers published at the Neural Information Processing Systems Conference.