Sequential Test for the Lowest Mean: From Thompson to Murphy Sampling
Emilie Kaufmann, Wouter Koolen, Aurelien Garivier
Learning the minimum/maximum mean among a finite set of distributions is a fundamental sub-task in planning, game tree search and reinforcement learning. We formalize this learning task as the problem of sequentially testing how the minimum mean among a finite set of distributions compares to a given threshold. We develop refined non-asymptotic lower bounds, which show that optimality mandates very different sampling behavior for a low vs. high true minimum. We show that Thompson Sampling and the intuitive Lower Confidence Bounds policy each nail only one of these cases. We develop a novel approach that we call Murphy Sampling (MS). Even though it entertains exclusively low true minima, we prove that MS is optimal for both possibilities. We then design advanced self-normalized deviation inequalities, fueling more aggressive stopping rules. We complement our theoretical guarantees by experiments showing that MS works best in practice.
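The abstract does not spell out the sampling rule, so the following is only a minimal sketch of one reading of "entertains exclusively low true minima": draw a posterior sample of the means, keep it only if its minimum falls below the threshold, and pull the arm with the lowest sampled mean. Everything concrete here is an assumption made for illustration and is not taken from the source: unit-variance Gaussian rewards, a flat prior, rejection sampling with a capped number of tries, and the hypothetical names `murphy_sample_arm`, `run` and `gamma`. No stopping rule is included.

```python
import math
import random


def murphy_sample_arm(means, counts, gamma, rng, max_tries=1000):
    """Pick an arm from a posterior draw conditioned on min mean < gamma.

    Assumptions (not from the source): unit-variance Gaussian rewards and a
    flat prior, so arm a has posterior N(means[a], 1 / counts[a]); the
    conditioning is done by rejection sampling with a capped number of tries.
    """
    for _ in range(max_tries):
        theta = [rng.gauss(m, 1.0 / math.sqrt(n)) for m, n in zip(means, counts)]
        if min(theta) < gamma:  # accept only draws with a "low" minimum
            break
    # If the cap is reached, the last unconditioned draw is used as a fallback.
    return min(range(len(theta)), key=theta.__getitem__)


def run(true_means, gamma, horizon, seed=0):
    """Simulate `horizon` pulls in total and return the pull count per arm."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [1] * k
    sums = [rng.gauss(mu, 1.0) for mu in true_means]  # one initial pull per arm
    for _ in range(horizon - k):
        means = [s / n for s, n in zip(sums, counts)]
        arm = murphy_sample_arm(means, counts, gamma, rng)
        sums[arm] += rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
    return counts
```

Dropping the `min(theta) < gamma` check turns the same loop into plain Thompson Sampling on the lowest mean, which makes the two rules easy to compare under identical assumptions, for example with `run([0.0, 1.0, 2.0], gamma=0.5, horizon=200)`.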
Jun-4-2018
- Country:
  - Asia > Japan
    - Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
  - Europe
    - France
      - Hauts-de-France > Nord
        - Lille (0.04)
      - Occitanie > Haute-Garonne
        - Toulouse (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
  - North America > United States
    - California > Los Angeles County
      - Long Beach (0.04)
    - New York
      - Bronx County > New York City (0.04)
      - Kings County > New York City (0.04)
      - New York County > New York City (0.04)
      - Queens County > New York City (0.04)
      - Richmond County > New York City (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Leisure & Entertainment > Games (0.68)
- Technology: