The Batch Complexity of Bandit Pure Exploration
Tuynman, Adrienne, Degenne, Rémy
A Multi-Armed Bandit (MAB) is a model of sequential interaction introduced in (Thompson, 1933) to design better medical trials. The framework has since been extended to various fields and has found applications in online advertising and recommendation systems. In a MAB, an algorithm chooses (pulls) an arm among a finite set at each time step, then observes a sample from a probability distribution associated with that arm. The goal of the interaction is to identify as quickly as possible the arm whose distribution has the highest mean. By using past observations to continuously update the way they sample, MAB algorithms reach this objective faster than traditional fixed randomized trials. For applications like online advertising, feedback can be obtained quickly, for example when the feedback is a click on an advertisement.
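To make the interaction concrete, here is a minimal sketch of the MAB protocol in Python, with an epsilon-greedy sampler standing in for a generic adaptive algorithm. The arm means, Gaussian rewards and the sampler itself are illustrative placeholders, not the setting or algorithm of the paper:

```python
import random

def run_bandit(means, horizon, eps=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(means)   # number of pulls per arm
    sums = [0.0] * len(means)   # sum of observed rewards per arm
    for t in range(horizon):
        if t < len(means):
            arm = t                                  # pull each arm once to initialize
        elif rng.random() < eps:
            arm = rng.randrange(len(means))          # explore a random arm
        else:
            arm = max(range(len(means)),
                      key=lambda a: sums[a] / counts[a])  # exploit the empirical best
        reward = rng.gauss(means[arm], 1.0)          # observe a sample from the arm
        counts[arm] += 1
        sums[arm] += reward
    return counts

print(run_bandit([0.2, 0.5, 0.4], horizon=1000))  # most pulls go to the best arm
```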
Best-Arm Identification in Unimodal Bandits
Poiani, Riccardo, Jourdan, Marc, Kaufmann, Emilie, Degenne, Rémy
We study the fixed-confidence best-arm identification problem in unimodal bandits, in which the means of the arms increase with the index of the arm up to their maximum, then decrease. We derive two lower bounds on the stopping time of any algorithm. The instance-dependent lower bound suggests that, due to the unimodal structure, only three arms contribute to the leading confidence-dependent cost. However, a worst-case lower bound shows that a linear dependence on the number of arms is unavoidable in the confidence-independent cost. We propose modifications of Track-and-Stop and a Top Two algorithm that leverage the unimodal structure. Both versions of Track-and-Stop are asymptotically optimal for one-parameter exponential families. The Top Two algorithm is asymptotically near-optimal for Gaussian distributions, and we prove a non-asymptotic guarantee matching the worst-case lower bound. The algorithms can be implemented efficiently and we demonstrate their competitive empirical performance.
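As a rough illustration of the structural insight behind the instance-dependent lower bound, the following hypothetical helper picks the empirical best arm and its immediate neighbours, the at most three arms that drive the confidence-dependent cost. This is a sketch of the structure only, not the paper's modified Track-and-Stop or Top Two algorithm:

```python
def unimodal_candidates(empirical_means):
    # Under unimodality, only the leader and its immediate neighbours
    # can matter for the leading term of the sample complexity.
    leader = max(range(len(empirical_means)), key=empirical_means.__getitem__)
    neighbours = [a for a in (leader - 1, leader, leader + 1)
                  if 0 <= a < len(empirical_means)]
    return leader, neighbours

leader, cands = unimodal_candidates([0.1, 0.4, 0.9, 0.7, 0.2])
print(leader, cands)  # 2 [1, 2, 3]
```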
Optimal Multi-Fidelity Best-Arm Identification
Poiani, Riccardo, Degenne, Rémy, Kaufmann, Emilie, Metelli, Alberto Maria, Restelli, Marcello
In bandit best-arm identification, an algorithm is tasked with finding the arm with the highest mean reward, with a specified accuracy, as fast as possible. We study multi-fidelity best-arm identification, in which the algorithm can choose to sample an arm at a lower fidelity (less accurate mean estimate) for a lower cost. Several methods have been proposed for tackling this problem, but their optimality remains elusive, notably due to loose lower bounds on the total cost needed to identify the best arm. Our first contribution is a tight, instance-dependent lower bound on the cost complexity. The study of the optimization problem featured in the lower bound provides new insights to devise computationally efficient algorithms, and leads us to propose a gradient-based approach with asymptotically optimal cost complexity. We demonstrate the benefits of the new algorithm compared to existing methods in experiments. Our theoretical and empirical findings also shed light on an intriguing concept of optimal fidelity for each arm.
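The idea of an optimal fidelity per arm can be illustrated with a hedged sketch: assume each fidelity comes with a sampling cost and a known bound on the bias of the resulting mean estimate, and pick the cheapest fidelity whose bias is small relative to the gap one needs to resolve. This heuristic and its parameters are illustrative assumptions, not the paper's gradient-based algorithm:

```python
def pick_fidelity(costs, biases, gap, slack=0.5):
    # Admissible fidelities: the bias must not eat more than `slack`
    # of the mean gap we are trying to resolve.
    admissible = [m for m in range(len(costs)) if biases[m] <= slack * gap]
    if not admissible:
        return len(costs) - 1          # fall back to the highest fidelity
    return min(admissible, key=lambda m: costs[m])

# Three fidelities with increasing cost and decreasing bias.
print(pick_fidelity(costs=[1.0, 3.0, 10.0], biases=[0.3, 0.1, 0.0], gap=0.4))
```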
Finding good policies in average-reward Markov Decision Processes without prior knowledge
Tuynman, Adrienne, Degenne, Rémy, Kaufmann, Emilie
We revisit the identification of an $\varepsilon$-optimal policy in average-reward Markov Decision Processes (MDP). In such MDPs, two measures of complexity have appeared in the literature: the diameter, $D$, and the optimal bias span, $H$, which satisfy $H\leq D$. Prior work has studied the complexity of $\varepsilon$-optimal policy identification only when a generative model is available. In this case, it is known that there exists an MDP with $D \simeq H$ for which the sample complexity to output an $\varepsilon$-optimal policy is $\Omega(SAD/\varepsilon^2)$, where $S$ and $A$ are the sizes of the state and action spaces. Recently, an algorithm with a sample complexity of order $SAH/\varepsilon^2$ has been proposed, but it requires knowledge of $H$. We first show that the sample complexity required to estimate $H$ is not bounded by any function of $S$, $A$ and $H$, ruling out the possibility of easily making the previous algorithm agnostic to $H$. By relying instead on a diameter estimation procedure, we propose the first algorithm for $(\varepsilon,\delta)$-PAC policy identification that does not need any form of prior knowledge on the MDP. Its sample complexity scales as $SAD/\varepsilon^2$ in the regime of small $\varepsilon$, which is near-optimal. In the online setting, our first contribution is a lower bound which implies that a sample complexity polynomial in $H$ cannot be achieved in this setting. Then, we propose an online algorithm with a sample complexity in $SAD^2/\varepsilon^2$, as well as a novel approach based on a data-dependent stopping rule that we believe is promising for further reducing this bound.
Non-Asymptotic Analysis of a UCB-based Top Two Algorithm
Jourdan, Marc, Degenne, Rémy
A Top Two sampling rule for bandit identification is a method which selects the next arm to sample from among two candidate arms, a leader and a challenger. Due to their simplicity and good empirical performance, Top Two methods have received increased attention in recent years. However, for fixed-confidence best-arm identification, theoretical guarantees for Top Two methods have only been obtained in the asymptotic regime, when the error level vanishes. In this paper, we derive the first non-asymptotic upper bound on the expected sample complexity of a Top Two algorithm, which holds for any error level. Our analysis highlights sufficient properties for a regret minimization algorithm to be used as leader. These properties are satisfied by the UCB algorithm, and our proposed UCB-based Top Two algorithm simultaneously enjoys non-asymptotic guarantees and competitive empirical performance.
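A schematic sketch of one step of a Top Two rule with a UCB leader, assuming unit-variance Gaussian arms and all pull counts positive; the exact rule and its tuning are in the paper:

```python
import math, random

def top_two_ucb_step(means, counts, t, beta=0.5, rng=random):
    # means: empirical means; counts: pull counts (all assumed positive).
    K = len(means)
    ucb = [means[a] + math.sqrt(2.0 * math.log(t + 1) / counts[a]) for a in range(K)]
    leader = max(range(K), key=ucb.__getitem__)      # leader chosen by UCB
    def cost(a):
        # Gaussian transportation cost of swapping arm a with the leader.
        gap = max(means[leader] - means[a], 0.0)
        return gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[a]))
    challenger = min((a for a in range(K) if a != leader), key=cost)
    # Sample the leader with probability beta, the challenger otherwise.
    return leader if rng.random() < beta else challenger
```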
An $\varepsilon$-Best-Arm Identification Algorithm for Fixed-Confidence and Beyond
Jourdan, Marc, Degenne, Rémy, Kaufmann, Emilie
We propose EB-TC$\varepsilon$, a novel sampling rule for $\varepsilon$-best arm identification in stochastic bandits. It is the first instance of a Top Two algorithm analyzed for approximate best-arm identification. EB-TC$\varepsilon$ is an *anytime* sampling rule that can therefore be employed without modification for fixed-confidence or fixed-budget identification (without prior knowledge of the budget). We provide three types of theoretical guarantees for EB-TC$\varepsilon$. First, we prove bounds on its expected sample complexity in the fixed-confidence setting, notably showing its asymptotic optimality in combination with an adaptive tuning of its exploration parameter. We complement these findings with upper bounds on its probability of error at any time and for any error parameter, which further yield upper bounds on its simple regret at any time. Finally, we show through numerical simulations that EB-TC$\varepsilon$ performs favorably compared to existing algorithms in different settings.
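The leader/challenger pair behind an EB-TC$\varepsilon$-style rule can be sketched as follows, assuming unit-variance Gaussian arms: the leader is the Empirical Best arm, and the challenger minimizes a Transportation Cost relaxed by $\varepsilon$, so the leader only needs to win by $\varepsilon$. This is a hedged sketch of the structure, not the exact rule:

```python
def eb_tc_eps_pair(means, counts, eps):
    # means: empirical means; counts: pull counts (all assumed positive).
    K = len(means)
    leader = max(range(K), key=means.__getitem__)    # Empirical Best leader
    def eps_cost(a):
        # Transportation cost where the leader only needs an eps margin.
        gap = means[leader] - means[a] + eps
        return gap ** 2 / (2.0 * (1.0 / counts[leader] + 1.0 / counts[a]))
    challenger = min((a for a in range(K) if a != leader), key=eps_cost)
    return leader, challenger
```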
On the Existence of a Complexity in Fixed Budget Bandit Identification
Degenne, Rémy
In fixed budget bandit identification, an algorithm sequentially observes samples from several distributions up to a given final time. It then answers a query about the set of distributions. A good algorithm will have a small probability of error. While that probability decreases exponentially with the final time, the best attainable rate is not known precisely for most identification tasks. We show that if a fixed budget task admits a complexity, defined as a lower bound on the probability of error which is attained by the same algorithm on all bandit problems, then that complexity is determined by the best non-adaptive sampling procedure for that problem. We show that there is no such complexity for several fixed budget identification tasks including Bernoulli best arm identification with two arms: there is no single algorithm that attains everywhere the best possible rate.
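For concreteness, here is a minimal sketch of a non-adaptive procedure of the kind the result refers to: the sampling proportions are fixed before any observation (uniform here, as a placeholder) and the empirical best arm is recommended once the budget is spent:

```python
import random

def fixed_allocation_bai(sample, K, budget, weights=None, seed=0):
    # sample(a, rng) draws one observation from arm a; weights fix the
    # allocation in advance, independently of the observations.
    rng = random.Random(seed)
    weights = weights or [1.0 / K] * K
    pulls = [max(1, int(budget * w)) for w in weights]
    means = [sum(sample(a, rng) for _ in range(n)) / n for a, n in enumerate(pulls)]
    return max(range(K), key=means.__getitem__)

# Example: two Bernoulli arms with means 0.4 and 0.6.
best = fixed_allocation_bai(lambda a, rng: float(rng.random() < [0.4, 0.6][a]),
                            K=2, budget=200)
print(best)  # 1 with high probability
```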
Dealing with Unknown Variances in Best-Arm Identification
Jourdan, Marc, Degenne, Rémy, Kaufmann, Emilie
The problem of identifying the best arm among a collection of items with Gaussian reward distributions is well understood when the variances are known. Despite its practical relevance for many applications, few works have studied it for unknown variances. In this paper we introduce and analyze two approaches to deal with unknown variances, either by plugging in the empirical variance or by adapting the transportation costs. In order to calibrate our two stopping rules, we derive new time-uniform concentration inequalities, which are of independent interest. Then, we illustrate the theoretical and empirical performance of our two sampling rule wrappers on Track-and-Stop and on a Top Two algorithm. Moreover, by quantifying the impact on the sample complexity of not knowing the variances, we reveal that it is rather small.
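The plug-in approach can be sketched for Gaussian arms as the usual transportation cost with empirical variances substituted for the unknown true ones; the calibrated thresholds from the time-uniform concentration inequalities are what make the resulting stopping rule correct (schematic only):

```python
def plugin_transport_cost(mu_i, var_i, n_i, mu_j, var_j, n_j):
    # Empirical transportation cost between arms i and j for Gaussian arms,
    # with empirical variances plugged in for the unknown true variances.
    if mu_i <= mu_j:
        return 0.0  # arm i does not beat arm j, nothing to transport
    return (mu_i - mu_j) ** 2 / (2.0 * (var_i / n_i + var_j / n_j))

print(plugin_transport_cost(0.9, 1.2, 50, 0.5, 0.8, 40))
```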
Dealing With Misspecification In Fixed-Confidence Linear Top-m Identification
Réda, Clémence, Tirinzoni, Andrea, Degenne, Rémy
We study the problem of identifying the m arms with the largest means under a fixed error rate $\delta$ (fixed-confidence Top-m identification) in misspecified linear bandit models. This problem is motivated by practical applications, especially in medicine and recommendation systems, where linear models are popular due to their simplicity and the existence of efficient algorithms, but where data inevitably deviates from linearity. In this work, we first derive a tractable lower bound on the sample complexity of any $\delta$-correct algorithm for the general Top-m identification problem. We show that knowing the scale of the deviation from linearity is necessary to exploit the structure of the problem. We then describe the first algorithm for this setting, which is both practical and adapts to the amount of misspecification. We derive an upper bound on its sample complexity which confirms this adaptivity and matches the lower bound when $\delta \rightarrow 0$. Finally, we evaluate our algorithm on both synthetic and real-world data, showing competitive performance with respect to existing baselines.
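A hedged sketch of how a known misspecification scale can enter a linear-bandit confidence width: a regularized least-squares elliptical norm plus an extra term proportional to the misspecification level $\varepsilon$. The inflation shown is a common form from the misspecified linear bandit literature, not necessarily the paper's exact bonus:

```python
import numpy as np

def misspecified_ucb_width(X, x, lam, beta, eps):
    # X: (n, d) features of pulled arms; x: (d,) feature of a candidate arm.
    V = X.T @ X + lam * np.eye(X.shape[1])            # regularized design matrix
    norm = float(np.sqrt(x @ np.linalg.solve(V, x)))  # ||x||_{V^{-1}}
    # beta * norm is the usual linear width; the eps term inflates it to
    # absorb a deviation from linearity of magnitude at most eps per arm.
    return beta * norm + eps * np.sqrt(X.shape[0]) * norm

X = np.random.default_rng(0).normal(size=(100, 3))
print(misspecified_ucb_width(X, np.array([1.0, 0.0, 0.0]), lam=1.0, beta=2.0, eps=0.1))
```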
Gamification of Pure Exploration for Linear Bandits
Degenne, Rémy, Ménard, Pierre, Shang, Xuedong, Valko, Michal
We investigate an active pure-exploration setting, that includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison and new insight over different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design and asymptotic optimality.
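G-optimal design, one of the notions of optimality compared in the paper, can be computed with standard Frank-Wolfe iterations. The sketch below is the textbook procedure, not the paper's algorithm:

```python
import numpy as np

def g_optimal_design(arms, iters=200, lam=1e-6):
    # arms: (K, d) matrix of arm features; returns sampling weights on arms
    # that minimize the worst-case leverage max_x ||x||^2_{A(w)^{-1}}.
    K, d = arms.shape
    w = np.full(K, 1.0 / K)
    for t in range(iters):
        V = arms.T @ (w[:, None] * arms) + lam * np.eye(d)
        leverage = np.einsum('kd,dj,kj->k', arms, np.linalg.inv(V), arms)
        k = int(np.argmax(leverage))            # most uncertain arm
        gamma = 1.0 / (t + 2)                   # standard Frank-Wolfe step size
        w = (1 - gamma) * w
        w[k] += gamma
    return w

w = g_optimal_design(np.eye(3))
print(np.round(w, 3))  # roughly uniform for the standard basis
```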