AITopics | track-and-stop

In pure exploration problems, a statistician sequentially collects information to answer a question about some stochastic and unknown environme nt. The probability of returning a wrong answer should not exceed a maximum risk p arameter δ and good algorithms make as few queries to the environment as pos sible. The Track-and-Stop algorithm is a pioneering method to solve these pro blems. Specifically, it is well-known that it enjoys asymptotic optimality sampl e complexity guarantees for δ 0 whenever the map from the environment to its correct answers is single-valued ( e.g., best-arm identification with a unique optimal arm). The Sticky Track-and-Stop algorithm extends these results to s ettings where, for each environment, there might exist multiple correct answers ( e.g., ǫ -optimal arm identification). Although both methods are optimal in the asympt otic regime, their non-asymptotic guarantees remain unknown. In this work, we fill this gap and provide non-asymptotic guarantees for both algorithms.

algorithm, artificial intelligence, track-and-stop, (14 more...)

arXiv.org Artificial Intelligence

2505.22475

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence (0.93)

Add feedback

Pure Exploration with Infinite Answers

Poiani, Riccardo, Bernasconi, Martino, Celli, Andrea

arXiv.org Artificial IntelligenceMay-29-2025

We study pure exploration problems where the set of correct answers is possibly infinite, e.g., the regression of any continuous function of the means of the bandit. We derive an instance-dependent lower bound for these problems. By analyzing it, we discuss why existing methods (i.e., Sticky Track-and-Stop) for finite answer problems fail at being asymptotically optimal in this more general setting. Finally, we present a framework, Sticky-Sequence Track-and-Stop, which generalizes both Track-and-Stop and Sticky Track-and-Stop, and that enjoys asymptotic optimality. Due to its generality, our analysis also highlights special cases where existing methods enjoy optimality.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.22473

Genre:

Research Report (0.49)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits

Barrier, Antoine, Garivier, Aurélien, Kocák, Tomáš

arXiv.org Machine LearningMay-27-2021

We propose a new strategy for best-arm identification with fixed confidence of Gaussian variables with bounded means and unit variance. This strategy called Exploration-Biased Sampling is not only asymptotically optimal: we also prove non-asymptotic bounds occurring with high probability. To the best of our knowledge, this is the first strategy with such guarantees. But the main advantage over other algorithms like Track-and-Stop is an improved behavior regarding exploration: Exploration-Biased Sampling is slightly biased in favor of exploration in a subtle but natural way that makes it more stable and interpretable. These improvements are allowed by a new analysis of the sample complexity optimization problem, which yields a faster numerical resolution scheme and several quantitative regularity results that we believe of high independent interest.

best-arm identification, exploration-biased sampling, non-asymptotic approach, (13 more...)

arXiv.org Machine Learning

2105.12978

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Non-Asymptotic Pure Exploration by Solving Games

Degenne, Rémy, Koolen, Wouter M., Ménard, Pierre

arXiv.org Machine LearningJun-25-2019

Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. The existing state of the art algorithms achieve asymptotic optimality by solving a plug-in estimate of that optimisation problem at each step. We interpret the optimisation problem as an unknown game, and propose sampling rules based on iterative strategies to estimate and converge to its saddle point. We apply no-regret learners to obtain the first finite confidence guarantees that are adapted to the exponential family and which apply to any pure exploration query and bandit structure. Moreover, our algorithms only use a best response oracle instead of fully solving the optimisation problem.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1906.10431

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.48)

Add feedback

Pure Exploration with Multiple Correct Answers

Degenne, Rémy, Koolen, Wouter M.

arXiv.org Machine LearningFeb-9-2019

We determine the sample complexity of pure exploration bandit problems with multiple good answers. We derive a lower bound using a new game equilibrium argument. We show how continuity and convexity properties of single-answer problems ensures that the Track-and-Stop algorithm has asymptotically optimal sample complexity. However, that convexity is lost when going to the multiple-answer setting. We present a new algorithm which extends Track-and-Stop to the multiple-answer case and has asymptotic sample complexity matching the lower bound.

algorithm, pure exploration, track-and-stop, (13 more...)

arXiv.org Machine Learning

1902.03475

Country:

North America > United States > New York > New York County > New York City (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
(8 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

Learning the distribution with largest mean: two bandit frameworks

Kaufmann, Emilie, Garivier, Aurélien

arXiv.org Machine LearningNov-7-2017

Over the past few years, the multi-armed bandit model has become increasingly popular in the machine learning community, partly because of applications including online content optimization. This paper reviews two different sequential learning tasks that have been considered in the bandit literature ; they can be formulated as (sequentially) learning which distribution has the highest mean among a set of distributions, with some constraints on the learning process. For both of them (regret minimization and best arm identification) we present recent, asymptotically optimal algorithms. We compare the behaviors of the sampling rule of each algorithm as well as the complexity terms associated to each problem.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1702.00001

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology: