
Collaborating Authors: Ménard, Pierre


Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

arXiv.org Machine Learning

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a novel lower bound of $\Omega((H^3SA/\varepsilon^2)\log(1/\delta))$ on the sample complexity of an $(\varepsilon,\delta)$-PAC algorithm for best policy identification in a non-stationary MDP. This lower bound relies on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
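
As a reading aid, the symbols in these bounds follow the standard episodic-MDP conventions (an assumption on our part, not something spelled out in the abstract):

$$\Omega\!\left(\frac{H^3 S A}{\varepsilon^2}\log\frac{1}{\delta}\right) \;\text{(PAC sample complexity)}, \qquad \Omega\!\left(\sqrt{H^3 S A T}\right) \;\text{(regret)},$$

where $S$ is the number of states, $A$ the number of actions, $H$ the horizon, and $T$ the total number of steps taken.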


Optimal Strategies for Graph-Structured Bandits

arXiv.org Machine Learning

We study a structured variant of the multi-armed bandit problem specified by a set of Bernoulli distributions $\nu = (\nu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}}$ with means $(\mu_{a,b})_{a \in \mathcal{A}, b \in \mathcal{B}} \in [0,1]^{\mathcal{A}\times\mathcal{B}}$ and by a given weight matrix $\omega = (\omega_{b,b'})_{b,b' \in \mathcal{B}}$, where $\mathcal{A}$ is a finite set of arms and $\mathcal{B}$ is a finite set of users. The weight matrix $\omega$ is such that for any two users $b,b' \in \mathcal{B}$, $\max_{a\in\mathcal{A}} |\mu_{a,b} - \mu_{a,b'}| \leq \omega_{b,b'}$. This formulation is flexible enough to capture various situations, from highly structured scenarios ($\omega \in \{0,1\}^{\mathcal{B}\times\mathcal{B}}$) to fully unstructured setups ($\omega \equiv 1$). We consider two scenarios depending on whether the learner chooses only the actions to sample rewards from, or both users and actions. We first derive problem-dependent lower bounds on the regret for this generic graph structure, which involve a structure-dependent linear programming problem. Second, we adapt the Indexed Minimum Empirical Divergence (IMED) algorithm introduced by Honda and Takemura (2015) to this setting, and introduce the IMED-GS$^\star$ algorithm. Interestingly, IMED-GS$^\star$ does not require computing the solution of the linear programming problem more than about $\log(T)$ times after $T$ steps, while being provably asymptotically optimal. Also, unlike existing bandit strategies designed for other popular structures, IMED-GS$^\star$ does not resort to an explicit forced-exploration scheme and only makes use of local counts of empirical events. We finally provide numerical illustrations of our results that confirm the performance of IMED-GS$^\star$.
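
To make the index concrete, here is a minimal Python sketch of the plain IMED rule of Honda and Takemura (2015) for Bernoulli arms, on which IMED-GS$^\star$ builds; this illustrates the index itself, not the graph-structured algorithm, and the function names are ours:

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def imed_arm(counts, means):
    """Play the arm minimizing the IMED index
    I_a = N_a * KL(mu_hat_a, mu_hat_star) + log(N_a).
    Assumes every arm has been pulled at least once."""
    mu_star = means.max()
    index = counts * bernoulli_kl(means, mu_star) + np.log(counts)
    return int(np.argmin(index))
```

Note that the index only involves empirical quantities and counts, which is the sense in which such strategies avoid explicit forced exploration.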


A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces

arXiv.org Machine Learning

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a metric. Using a non-parametric model of the MDP built with time-dependent kernels, we prove a regret bound that scales with the covering dimension of the state-action space and the total variation of the MDP with time, which quantifies its level of non-stationarity. Our method generalizes previous approaches based on sliding windows and exponential discounting used to handle changing environments. We further propose a practical implementation of KeRNS, analyze its regret, and validate it experimentally.
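
A minimal sketch of the kind of time- and space-dependent kernel smoothing the abstract describes, assuming a Gaussian spatial kernel and exponential temporal discounting (both illustrative choices, not necessarily the ones used in KeRNS):

```python
import numpy as np

def temporal_kernel(t, s, lam=0.99):
    """Exponential-discounting temporal kernel; replacing it with the
    indicator 1{t - s <= W} would give a sliding window instead."""
    return lam ** (t - s)

def spatial_kernel(x, y, bandwidth=0.5):
    """Gaussian kernel on a metric state-action space (illustrative)."""
    return np.exp(-np.linalg.norm(x - y) ** 2 / (2 * bandwidth ** 2))

def kernel_mean_reward(query, data, t):
    """Kernel-smoothed reward estimate at state-action `query`, from
    past transitions data = [(time s, state-action x_s, reward r_s)]."""
    weights = np.array([temporal_kernel(t, s) * spatial_kernel(query, x)
                        for s, x, _ in data])
    rewards = np.array([r for _, _, r in data])
    return (weights @ rewards) / max(weights.sum(), 1e-12)
```

Down-weighting old samples in time while smoothing in space is what lets a single estimator generalize both sliding windows and exponential discounting.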


Gamification of Pure Exploration for Linear Bandits

arXiv.org Machine Learning

We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison and new insight over different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality.

Since the early work of Robbins (1952), a great amount of literature explores MAB in their standard stochastic setting with its numerous extensions and variants. Even-Dar et al. (2002) and Bubeck et al. (2009) are among the first to study the pure exploration setting for stochastic bandits. A non-exhaustive list of pure exploration games includes best-arm identification (BAI), top-m identification (Kalyanakrishnan & Stone, 2010), threshold bandits (Locatelli et al., 2016), minimum threshold (Kaufmann et al., 2018), signed bandits (Ménard, 2019), and pure exploration combinatorial bandits.
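
Since G-optimality plays a central role in the linear case, here is a hedged Frank-Wolfe sketch for approximating a G-optimal design over a finite arm set, a textbook construction from optimal experimental design rather than the paper's algorithm:

```python
import numpy as np

def g_optimal_design(X, n_iters=500):
    """Frank-Wolfe approximation of the G-optimal design over arm
    vectors X (one arm per row); assumes the arms span R^d.  By the
    Kiefer-Wolfowitz theorem, the optimal design pi satisfies
    max_x x^T A(pi)^{-1} x = d, a convenient convergence check."""
    n, d = X.shape
    pi = np.ones(n) / n
    for k in range(n_iters):
        A = X.T @ (pi[:, None] * X)                # A(pi) = sum_i pi_i x_i x_i^T
        A_inv = np.linalg.inv(A)
        g = np.einsum('ij,jk,ik->i', X, A_inv, X)  # x_i^T A^{-1} x_i
        i_star = int(np.argmax(g))                 # most uncertain direction
        gamma = 1.0 / (k + 2)                      # standard FW step size
        pi = (1 - gamma) * pi
        pi[i_star] += gamma
    return pi
```
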


Forced-exploration free Strategies for Unimodal Bandits

arXiv.org Machine Learning

We consider a multi-armed bandit problem specified by a set of Gaussian or Bernoulli distributions endowed with a unimodal structure. Although this problem has been addressed in the literature (Combes and Proutiere, 2014), the state-of-the-art algorithms for this structure rely on a forced-exploration mechanism. We introduce IMED-UB, the first forced-exploration-free strategy that exploits the unimodal structure, by adapting the Indexed Minimum Empirical Divergence (IMED) strategy introduced by Honda and Takemura (2015) to this setting. This strategy is proven optimal. We then derive KLUCB-UB, a KLUCB version of IMED-UB, which is also proven optimal. Owing to our proof technique, we are further able to provide a concise finite-time analysis of both strategies in a unified way. Numerical experiments show that both IMED-UB and KLUCB-UB perform similarly in practice and outperform the state-of-the-art algorithms.
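
A sketch of the locality that the unimodal structure enables, assuming arms on a line graph: the IMED-style comparison is restricted to the empirical best arm and its neighbours. This illustrates the idea, not the paper's exact pseudo-code:

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def imed_ub_arm(counts, means):
    """IMED-UB-style choice on a line graph: minimize the IMED index
    over the empirical best arm and its unimodal neighbours only.
    Assumes every arm has been pulled at least once."""
    leader = int(np.argmax(means))
    neigh = [a for a in (leader - 1, leader, leader + 1)
             if 0 <= a < len(means)]
    idx = {a: counts[a] * bernoulli_kl(means[a], means[leader])
              + np.log(counts[a]) for a in neigh}
    return min(idx, key=idx.get)
```

Because unimodality guarantees the leader need only be compared with its neighbours, such a rule uses only local counts, with no forced-exploration schedule.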


Planning in Markov Decision Processes with Gap-Dependent Sample Complexity

arXiv.org Machine Learning

We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process in which transitions have a finite support. We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability. This problem-dependent sample complexity result is expressed in terms of the sub-optimality gaps of the state-action pairs that are visited during exploration. Our experiments reveal that MDP-GapE is also effective in practice, in contrast with other algorithms with sample complexity guarantees in the fixed-confidence setting, which are mostly theoretical.
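
The sub-optimality gaps that parameterize such bounds are the quantities $\Delta_h(s,a) = V_h^\star(s) - Q_h^\star(s,a)$. A small sketch computing them by backward induction on a known tabular MDP, for intuition only (MDP-GapE itself only accesses a generative model):

```python
import numpy as np

def suboptimality_gaps(P, R, H):
    """Gaps Delta_h(s,a) = V_h(s) - Q_h(s,a) via finite-horizon value
    iteration.  P has shape (S, A, S) with P[s, a, s'] the transition
    probability; R has shape (S, A)."""
    S, A, _ = P.shape
    V = np.zeros(S)
    gaps = []
    for _ in range(H):              # backward induction over stages
        Q = R + P @ V               # Q[s,a] = R[s,a] + sum_s' P[s,a,s'] V[s']
        V = Q.max(axis=1)
        gaps.append(V[:, None] - Q)
    return gaps[::-1]               # gaps[h][s, a] = Delta_h(s, a)
```

Actions with large gaps are easy to rule out with few samples, which is why a gap-dependent bound can be much smaller than a worst-case one.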


Fixed-Confidence Guarantees for Bayesian Best-Arm Identification

arXiv.org Machine Learning

We investigate and provide new insights on the sampling rule called Top-Two Thompson Sampling (TTTS). In particular, we justify its use for fixed-confidence best-arm identification. We further propose a variant of TTTS called Top-Two Transportation Cost (T3C), which disposes of the computational burden of TTTS. As our main contribution, we provide the first sample complexity analysis of TTTS and T3C when coupled with a very natural Bayesian stopping rule, for bandits with Gaussian rewards, solving one of the open questions raised by Russo (2016). We also provide new posterior convergence results for TTTS under two models that are commonly used in practice: bandits with Gaussian and Bernoulli rewards and conjugate priors.

In multi-armed bandits, a learner repeatedly chooses an arm to play, and receives a reward from the associated unknown probability distribution. When the task is best-arm identification (BAI), the learner is not only asked to sample an arm at each stage, but is also asked to output a recommendation (i.e., a guess for the arm with the largest mean reward) after a certain period.
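
A minimal sketch of the TTTS sampling rule for Gaussian rewards with a flat prior (parameter names and the resampling cap are our choices; T3C replaces the resampling loop with an argmin over a transportation cost):

```python
import numpy as np

rng = np.random.default_rng(0)

def ttts_arm(counts, means, sigma=1.0, beta=0.5, max_resample=100):
    """Top-Two Thompson Sampling: arm a's posterior for the mean is
    N(means[a], sigma^2 / counts[a]) under a flat prior.  Draw a
    leader from the posterior; with probability beta play it,
    otherwise resample until a different arm (the challenger) wins."""
    std = sigma / np.sqrt(counts)
    leader = int(np.argmax(rng.normal(means, std)))
    if rng.random() < beta:
        return leader
    for _ in range(max_resample):
        challenger = int(np.argmax(rng.normal(means, std)))
        if challenger != leader:
            return challenger
    return leader                    # fallback if resampling stalls
```

The resampling loop is exactly the computational burden the abstract mentions: when the posterior concentrates, drawing a challenger different from the leader can take many attempts, which T3C avoids.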


Non-Asymptotic Pure Exploration by Solving Games

arXiv.org Machine Learning

Pure exploration (aka active testing) is the fundamental task of sequentially gathering information to answer a query about a stochastic environment. Good algorithms make few mistakes and take few samples. Lower bounds (for multi-armed bandit models with arms in an exponential family) reveal that the sample complexity is determined by the solution to an optimisation problem. The existing state-of-the-art algorithms achieve asymptotic optimality by solving a plug-in estimate of that optimisation problem at each step. We interpret the optimisation problem as an unknown game, and propose sampling rules based on iterative strategies to estimate and converge to its saddle point. We apply no-regret learners to obtain the first finite-confidence guarantees that are adapted to the exponential family and which apply to any pure exploration query and bandit structure. Moreover, our algorithms only use a best-response oracle instead of fully solving the optimisation problem.
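
The estimate-and-converge idea can be illustrated on a plain zero-sum matrix game: a no-regret learner (here Hedge) for one player against a best-response oracle for the other, with the averaged iterates approaching the saddle point. This is a generic sketch of the mechanism, not the paper's algorithm:

```python
import numpy as np

def solve_game(G, n_iters=2000, lr=0.05):
    """Approximate the max-min strategy of the row player in the
    zero-sum game with payoff matrix G (rows maximize, columns
    minimize).  Hedge updates the row strategy; the column player
    best-responds; the averaged row strategy converges."""
    n_rows, _ = G.shape
    log_w = np.zeros(n_rows)
    w_avg = np.zeros(n_rows)
    for _ in range(n_iters):
        w = np.exp(log_w - log_w.max())
        w /= w.sum()                  # current row strategy (softmax)
        j = int(np.argmin(w @ G))     # best response of the column player
        log_w += lr * G[:, j]         # Hedge update on realized payoffs
        w_avg += w
    return w_avg / n_iters
```

Only a best response is ever computed, never the full solution of the game, mirroring the oracle access described in the abstract.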


Gradient Ascent for Active Exploration in Bandit Problems

arXiv.org Machine Learning

We present a new algorithm based on gradient ascent for a general Active Exploration bandit problem in the fixed-confidence setting. This problem encompasses several well-studied problems such as Best Arm Identification or Thresholding Bandits. The algorithm relies on a new sampling rule based on online lazy mirror ascent. We prove that this algorithm is asymptotically optimal and, most importantly, computationally efficient.
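
For intuition, a sketch of online lazy mirror ascent on the probability simplex with the entropic mirror map, i.e., dual averaging through a softmax ("lazy" because gradients accumulate in the dual space); `grad_fn` is a placeholder for the problem's (super)gradient, not anything from the paper:

```python
import numpy as np

def lazy_mirror_ascent(grad_fn, n_arms, n_iters=1000, lr=0.1):
    """Online lazy mirror ascent over the simplex: accumulate
    gradients in a dual variable z and map back with a softmax.
    grad_fn(w) should return a (super)gradient at the point w."""
    z = np.zeros(n_arms)              # accumulated gradients (dual space)
    w_avg = np.zeros(n_arms)
    for _ in range(n_iters):
        scaled = lr * z
        w = np.exp(scaled - scaled.max())
        w /= w.sum()                  # mirror step: softmax(lr * z)
        z += grad_fn(w)               # lazy update: never re-project z
        w_avg += w
    return w_avg / n_iters
```

The per-step cost is a single gradient evaluation and a softmax, which is the sense in which such sampling rules are computationally cheap.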


A minimax and asymptotically optimal algorithm for stochastic bandits

arXiv.org Machine Learning

We propose the kl-UCB++ algorithm for regret minimization in stochastic bandit models with exponential families of distributions. We prove that it is simultaneously asymptotically optimal (in the sense of Lai and Robbins' lower bound) and minimax optimal. This is the first algorithm proved to enjoy these two properties at the same time. This work thus merges two different lines of research with simple and clear proofs.
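
For reference, the kl-UCB family computes its index by inverting the KL divergence, typically by bisection. The sketch below shows the generic computation for Bernoulli arms, with the exploration budget left as a parameter, since the exact exploration function is precisely what distinguishes kl-UCB++ (see the paper for its form):

```python
import numpy as np

def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def kl_ucb_index(mu_hat, n_pulls, budget, tol=1e-6):
    """Upper confidence bound for one arm: the largest q in
    [mu_hat, 1] with n_pulls * kl(mu_hat, q) <= budget, by bisection.
    Plain kl-UCB uses budget ~ log(t); kl-UCB++ sharpens it to a
    function of T / (K * n_pulls)."""
    lo, hi = mu_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if n_pulls * bernoulli_kl(mu_hat, mid) <= budget:
            lo = mid          # mid is still statistically plausible
        else:
            hi = mid
    return lo
```

At each round the arm with the largest index is played; the choice of budget alone is what moves an algorithm between asymptotic and minimax optimality.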