Double Thompson Sampling for Dueling Bandits Huasen Wu
Neural Information Processing Systems
In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit problems. As its name suggests, D-TS selects both the first and the second candidates according to Thompson Sampling. Specifically, D-TS maintains a posterior distribution for the preference matrix, and chooses the pair of arms for comparison according to two sets of samples independently drawn from the posterior distribution. This simple algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as a special case.
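The selection step described in the abstract can be illustrated with a minimal sketch. It assumes a Beta posterior (uniform prior) on each pairwise preference probability and keeps only the two independent sampling rounds. Any further refinements in the full D-TS algorithm, such as confidence-bound-based pruning of candidates, are omitted. The function name `dts_select_pair`, the `wins` count matrix, and the tie-breaking rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def dts_select_pair(wins, rng):
    """Pick a pair of arms from two independent posterior sample sets.

    wins[i, j] is the number of times arm i has beaten arm j so far.
    Simplified sketch: Beta(wins + 1, losses + 1) posteriors, no pruning.
    """
    k = wins.shape[0]
    rows, cols = np.triu_indices(k, 1)

    # First sample set: one draw per unordered pair, mirrored so that
    # theta1[j, i] = 1 - theta1[i, j]; the diagonal stays at 0.5.
    theta1 = np.full((k, k), 0.5)
    draws = rng.beta(wins[rows, cols] + 1, wins[cols, rows] + 1)
    theta1[rows, cols] = draws
    theta1[cols, rows] = 1.0 - draws

    # Sampled Copeland score: number of opponents each arm beats
    # (sampled preference probability above 1/2).
    scores = (theta1 > 0.5).sum(axis=1)
    best = np.flatnonzero(scores == scores.max())
    first = int(rng.choice(best))  # random tie-breaking (assumption)

    # Second sample set: fresh, independent draws of each arm's
    # probability of beating the first candidate.
    theta2 = rng.beta(wins[:, first] + 1, wins[first, :] + 1)
    theta2[first] = 0.5  # an arm ties with itself
    second = int(np.argmax(theta2))

    return first, second
```

After the chosen pair is compared, the caller would increment `wins[winner, loser]` and repeat. In this sketch the second candidate may coincide with the first, which happens once the posterior is confident that the first candidate beats every other arm.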
Mar-12-2024, 15:14:28 GMT