The Batch Complexity of Bandit Pure Exploration
Tuynman, Adrienne, Degenne, Rémy
A Multi-Armed Bandit (MAB) is a model of sequential interaction introduced in (Thompson, 1933) to design better medical trials. The framework has since been extended to various fields and has seen applications to online advertising and recommendation systems. In a MAB, at each time step an algorithm chooses (pulls) an arm among a finite set and then observes a sample from a probability distribution associated with that arm. The goal of the interaction is to quickly identify the arm whose distribution has the highest mean. By using past observed rewards to continuously update the way they sample, MAB algorithms reach this objective faster than traditional fixed randomized trials. For applications like online advertising, obtaining feedback can be fast: for example, the feedback may be a click on an advertisement.
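To illustrate the interaction loop described in the abstract, here is a minimal sketch of best-arm identification with Gaussian arms. This is not the paper's algorithm: it uses plain round-robin sampling followed by an empirical-mean decision, and all names and parameters (`pull`, `identify_best_arm`, the budget) are illustrative.

```python
import random

def pull(arm_means, arm, rng):
    """Observe a sample from the chosen arm's distribution (here Gaussian, unit variance)."""
    return rng.gauss(arm_means[arm], 1.0)

def identify_best_arm(arm_means, budget, seed=0):
    """Pull arms round-robin for `budget` steps, then return the empirically best arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    sums = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        arm = t % k  # fixed uniform sampling; adaptive rules allocate pulls more efficiently
        sums[arm] += pull(arm_means, arm, rng)
        counts[arm] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return max(range(k), key=lambda i: means[i])

best = identify_best_arm([0.1, 0.5, 0.9], budget=3000)
print(best)
```

An adaptive algorithm would instead use the running empirical means to decide which arm to pull next, which is what lets bandit methods outperform fixed randomized trials.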
Feb-3-2025