Boone, Victor
Logarithmic Regret of Exploration in Average Reward Markov Decision Processes
Boone, Victor, Gaujal, Bruno
In average reward Markov decision processes, state-of-the-art algorithms for regret minimization follow a well-established framework: they are model-based, optimistic, and episodic. First, they maintain a confidence region from which optimistic policies are computed using a well-known subroutine called Extended Value Iteration (EVI). Second, these policies are used over time windows called episodes, each ended by the Doubling Trick (DT) rule or a variant thereof. In this work, without modifying EVI, we show that there is a significant advantage in replacing (DT) with another simple rule, which we call the Vanishing Multiplicative (VM) rule. When episodes are managed with (VM), the algorithm's regret is, both in theory and in practice, as good as, if not better than, with (DT), while the one-shot behavior is greatly improved. More specifically, bad episodes (during which sub-optimal policies are used) are handled much better under (VM) than under (DT): the regret of exploration becomes logarithmic rather than linear. These results are made possible by a new, in-depth understanding of the contrasting behaviors of confidence regions during good and bad episodes.
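To make the contrast between the two episode-management rules concrete, here is a minimal Python sketch of how such stopping tests could be wired into a UCRL-style learner. The counters `N` (visits before the current episode) and `nu` (visits within it), the function names, and especially the vanishing factor `f_k` are illustrative assumptions; this is not the paper's exact (VM) rule.

```python
# Hedged sketch: episode-stopping tests for a UCRL-style learner.
# N[s, a]  = number of visits to (s, a) before the current episode started
# nu[s, a] = number of visits to (s, a) during the current episode
# k        = index of the current episode

def doubling_trick_stop(N, nu, s, a):
    """(DT): end the episode once some pair (s, a) has been visited as
    often within the episode as in all previous episodes combined."""
    return nu[s, a] >= max(1, N[s, a])

def vanishing_multiplicative_stop(N, nu, s, a, k):
    """(VM), illustrative only: the same test with a multiplicative factor
    that vanishes with the episode index, so policies are recomputed more
    frequently as learning progresses."""
    f_k = 1.0 / (k + 1)  # placeholder vanishing sequence (assumption)
    return nu[s, a] >= max(1, f_k * N[s, a])
```

Intuitively, a vanishing factor shortens episodes relative to (DT), limiting how long a single sub-optimal optimistic policy can be played; the abstract attributes the improved regret of exploration to this better management of bad episodes.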
The regret lower bound for communicating Markov Decision Processes
Boone, Victor, Maillard, Odalric-Ambrym
This paper is devoted to the extension of the regret lower bound beyond ergodic Markov decision processes (MDPs) in the problem-dependent setting. While the regret lower bound for ergodic MDPs is well known and attained by tractable algorithms, we prove that the regret lower bound becomes significantly more complex in communicating MDPs. Our lower bound revisits the necessary explorative behavior of consistent learning agents and further shows that all optimal regions of the environment must be over-visited compared to sub-optimal ones, a phenomenon that we refer to as co-exploration. In tandem, we show that these explorative and co-explorative behaviors are intertwined with navigation constraints obtained by scrutinizing the navigation structure at logarithmic scale. The resulting lower bound is expressed as the solution of an optimization problem that, in many standard classes of MDPs, can be specialized to recover existing results. From a computational perspective, it is provably $\Sigma_2^\textrm{P}$-hard in general; in fact, even testing membership in the feasible region is coNP-hard. We further provide an algorithm to approximate the lower bound in a constructive way.
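As a point of reference for the shape such results take, the display below is a schematic, Graves-Lai-style formulation of a problem-dependent regret lower bound as an optimization problem; the notation ($\eta$ for the exploration measure, $\Delta$ for the gaps, $\mathrm{Cnf}(M)$ for the set of confusing models) is assumed for illustration, and the paper's actual feasible region further encodes the co-exploration and navigation constraints described above.

\[
\liminf_{T\to\infty} \frac{\mathrm{Reg}(T)}{\log T} \;\ge\; \mathcal{C}(M)
\;:=\; \inf_{\eta \ge 0} \Big\{ \sum_{(s,a)} \eta(s,a)\,\Delta(s,a) \;:\;
\sum_{(s,a)} \eta(s,a)\,\mathrm{KL}\big(M(s,a)\,\big\|\,M^\dagger(s,a)\big) \ge 1
\ \text{ for all } M^\dagger \in \mathrm{Cnf}(M) \Big\}.
\]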
The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies
Boone, Victor
This paper studies the one-shot behavior of no-regret algorithms for stochastic bandits. Although many algorithms are known to be asymptotically optimal with respect to the expected regret, over a single run their pseudo-regret seems to follow one of two tendencies: it is either smooth or bumpy. To measure this tendency, we introduce a new notion, the sliding regret, which measures the worst pseudo-regret over a time window of fixed length sliding to infinity. We show that randomized methods (e.g. Thompson Sampling and MED) have optimal sliding regret, while index policies, although possibly asymptotically optimal for the expected regret, have the worst possible sliding regret under regularity conditions on their index (e.g. UCB, UCB-V, KL-UCB, MOSS, IMED, etc.). We further analyze the average bumpiness of the pseudo-regret of index policies via the regret of exploration, which we show to be suboptimal as well.
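For concreteness, one plausible formalization of the sliding regret consistent with this description is the following, where $R_t$ denotes the pseudo-regret after $t$ rounds and $w$ is the fixed window length; the paper's precise definition (e.g., the treatment of the limit and of randomness) may differ in details.

\[
\mathrm{SlidingRegret}_w \;:=\; \limsup_{t \to \infty}\ \max_{t \le t' \le t + w} \big( R_{t'} - R_{t} \big).
\]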
The equivalence of dynamic and strategic stability under regularized learning in games
Boone, Victor, Mertikopoulos, Panayotis
In this paper, we examine the long-run behavior of regularized, no-regret learning in finite games. A well-known result in the field states that the empirical frequencies of no-regret play converge to the game's set of coarse correlated equilibria; however, our understanding of how the players' actual strategies evolve over time is much more limited, and in many cases non-existent. This issue is exacerbated further by a series of recent results showing that only strict Nash equilibria are stable and attracting under regularized learning, thus making the relation between learning and pointwise solution concepts particularly elusive. In light of this, we take a more general approach and instead seek to characterize the \emph{setwise} rationality properties of the players' day-to-day play. To that end, we focus on one of the most stringent criteria of setwise strategic stability, namely that any unilateral deviation from the set in question incurs a cost to the deviator, a property known as closedness under better replies (club). In so doing, we obtain a far-reaching equivalence between strategic and dynamic stability: a product of pure strategies is closed under better replies if and only if its span is stable and attracting under regularized learning. In addition, we estimate the rate of convergence to such sets, and we show that methods based on entropic regularization (like the exponential weights algorithm) converge at a geometric rate, while projection-based methods converge within a finite number of iterations, even with bandit, payoff-based feedback.
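To ground the dynamics discussed above, here is a minimal, self-contained Python sketch of the exponential weights algorithm (entropic regularization) run by both players of a two-player finite game under full-information feedback; the payoff matrices, step size `eta`, and horizon `T` are placeholders, and the bandit, payoff-based variant covered by the paper is not shown.

```python
import numpy as np

def exp_weights_play(A, B, T=10_000, eta=0.05):
    """Exponential weights for both players of a bimatrix game (A, B):
    A[i, j] is the row player's payoff and B[i, j] the column player's
    payoff when the row player plays i and the column player plays j."""
    n, m = A.shape
    x_scores = np.zeros(n)   # cumulative payoffs of the row player's actions
    y_scores = np.zeros(m)   # cumulative payoffs of the column player's actions
    trajectory = []
    for _ in range(T):
        # logit / softmax choice map induced by entropic regularization
        x = np.exp(eta * (x_scores - x_scores.max())); x /= x.sum()
        y = np.exp(eta * (y_scores - y_scores.max())); y /= y.sum()
        trajectory.append((x, y))
        # full-information feedback: each player observes its payoff vector
        x_scores += A @ y
        y_scores += B.T @ x
    return trajectory
```

A strict Nash equilibrium is a singleton club set, so trajectories of this type falling into its basin are covered by the geometric-rate result for entropic regularization stated in the abstract; replacing the softmax step with a Euclidean projection would give a projection-based method of the kind that converges in finitely many iterations.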