Double Thompson Sampling for Dueling Bandits
Huasen Wu

Mar-12-2024, 15:14:28 GMT–Neural Information Processing Systems

In this paper, we propose a Double Thompson Sampling (D-TS) algorithm for dueling bandit problems. As its name suggests, D-TS selects both the first and the second candidates according to Thompson Sampling. Specifically, D-TS maintains a posterior distribution for the preference matrix, and chooses the pair of arms for comparison according to two sets of samples independently drawn from the posterior distribution. This simple algorithm applies to general Copeland dueling bandits, including Condorcet dueling bandits as a special case.
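The selection rule described above can be illustrated with a short Python sketch. It is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation. It assumes independent Beta(1, 1) priors on each pairwise preference probability. The first candidate is taken to be a Copeland winner of one posterior sample, and the second candidate is the arm whose independently sampled probability of beating the first candidate is highest. The helper names (`sample_preferences`, `copeland_scores`, `choose_pair`, `run`) are made up for this sketch. Any additional refinements in the full algorithm, which this abstract does not describe, are omitted.

```python
import random


def sample_preferences(wins, rng):
    """Draw one preference matrix from independent Beta posteriors.

    wins[i][j] counts how often arm i beat arm j. Assuming a Beta(1, 1) prior
    (an assumption of this sketch), the posterior of P(i beats j) is
    Beta(wins[i][j] + 1, wins[j][i] + 1).
    """
    k = len(wins)
    theta = [[0.5] * k for _ in range(k)]  # an arm ties with itself
    for i in range(k):
        for j in range(i + 1, k):
            p = rng.betavariate(wins[i][j] + 1, wins[j][i] + 1)
            theta[i][j] = p
            theta[j][i] = 1.0 - p  # keep theta[i][j] + theta[j][i] == 1
    return theta


def copeland_scores(theta):
    """Count the opponents each arm beats (theta > 1/2).

    Dividing by K - 1 would give the normalized Copeland score; the argmax
    is unchanged, so the raw count is enough here.
    """
    k = len(theta)
    return [sum(1 for j in range(k) if j != i and theta[i][j] > 0.5)
            for i in range(k)]


def choose_pair(wins, rng):
    """Pick (first, second) candidates from two independent posterior samples."""
    k = len(wins)
    # First sample: choose a Copeland winner, breaking ties uniformly at random.
    scores = copeland_scores(sample_preferences(wins, rng))
    best = max(scores)
    first = rng.choice([i for i in range(k) if scores[i] == best])
    # Second, independent sample: only the column P(i beats first) is needed.
    # `first` itself gets 1/2, so it duels itself if no challenger looks better.
    challenge = [
        0.5 if i == first
        else rng.betavariate(wins[i][first] + 1, wins[first][i] + 1)
        for i in range(k)
    ]
    second = max(range(k), key=lambda i: challenge[i])
    return first, second


def run(pref, horizon, seed=0):
    """Simulate duels against a known (hypothetical) preference matrix `pref`."""
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]
    for _ in range(horizon):
        a, b = choose_pair(wins, rng)
        if a == b:
            continue  # a self-duel carries no information, so nothing is recorded
        if rng.random() < pref[a][b]:
            wins[a][b] += 1
        else:
            wins[b][a] += 1
    return wins
```

For example, `run([[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]], horizon=2000)` simulates a hypothetical three-arm instance in which arm 0 is the Condorcet winner. Calling `choose_pair` on the returned counts should then select arm 0 as the first candidate in most rounds. Because the pairwise posteriors are independent, drawing only the column needed for the second candidate has the same distribution as drawing a full second matrix and reading that column.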

Keywords: algorithm, bandit, dueling bandit
