AITopics

2503.04518

Country: Europe > France (0.46)

Genre: Research Report (1.00)

Industry:

Food & Agriculture > Agriculture (0.68)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Data Science > Data Mining > Big Data (0.67)

arXiv.org Machine LearningJan-22-2025

The regret lower bound for communicating Markov Decision Processes

Boone, Victor, Maillard, Odalric-Ambrym

This paper is devoted to the extension of the regret lower bound beyond ergodic Markov decision processes (MDPs) in the problem dependent setting. While the regret lower bound for ergodic MDPs is well-known and reached by tractable algorithms, we prove that the regret lower bound becomes significatively more complex in communicating MDPs. Our lower bound revisits the necessary explorative behavior of consistent learning agents and further explains that all optimal regions of the environment must be overvisited compared to sub-optimal ones, a phenomenon that we refer to as co-exploration. In tandem, we show that these two explorative and co-explorative behaviors are intertwined with navigation constraints obtained by scrutinizing the navigation structure at logarithmic scale. The resulting lower bound is expressed as the solution of an optimization problem that, in many standard classes of MDPs, can be specialized to recover existing results. From a computational perspective, it is provably $\Sigma_2^\textrm{P}$-hard in general and as a matter of fact, even testing the membership to the feasible region is coNP-hard. We further provide an algorithm to approximate the lower bound in a constructive way.

artificial intelligence, machine learning, markov decision process, (17 more...)

2501.13013

Country: Europe > France (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.72)

arXiv.org Artificial IntelligenceJan-13-2025

Kriging and Gaussian Process Interpolation for Georeferenced Data Augmentation

Ferber, Frédérick Fabre, Gay, Dominique, Soulié, Jean-Christophe, Diatta, Jean, Maillard, Odalric-Ambrym

Data augmentation is a crucial step in the development of robust supervised learning models, especially when dealing with limited datasets. This study explores interpolation techniques for the augmentation of geo-referenced data, with the aim of predicting the presence of Commelina benghalensis L. in sugarcane plots in La R{\'e}union. Given the spatial nature of the data and the high cost of data collection, we evaluated two interpolation approaches: Gaussian processes (GPs) with different kernels and kriging with various variograms. The objectives of this work are threefold: (i) to identify which interpolation methods offer the best predictive performance for various regression algorithms, (ii) to analyze the evolution of performance as a function of the number of observations added, and (iii) to assess the spatial consistency of augmented datasets. The results show that GP-based methods, in particular with combined kernels (GP-COMB), significantly improve the performance of regression algorithms while requiring less additional data. Although kriging shows slightly lower performance, it is distinguished by a more homogeneous spatial coverage, a potential advantage in certain contexts.

artificial intelligence, interpolation method, machine learning, (14 more...)

2501.07183

Country: Africa > La Réunion (0.15)

Genre: Research Report > New Finding (0.35)

Industry: Energy > Oil & Gas > Upstream (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)

arXiv.org Machine LearningJan-10-2025

Interpolation pour l'augmentation de donnees : Application \`a la gestion des adventices de la canne a sucre a la Reunion

Ferber, Frederick Fabre, Gay, Dominique, Soulie, Jean-Christophe, Diatta, Jean, Maillard, Odalric-Ambrym

Data augmentation is a crucial step in the development of robust supervised learning models, especially when dealing with limited datasets. This study explores interpolation techniques for the augmentation of geo-referenced data, with the aim of predicting the presence of Commelina benghalensis L. in sugarcane plots in La R\'eunion. Given the spatial nature of the data and the high cost of data collection, we evaluated two interpolation approaches: Gaussian processes (GPs) with different kernels and kriging with various variograms. The objectives of this work are threefold: (i) to identify which interpolation methods offer the best predictive performance for various regression algorithms, (ii) to analyze the evolution of performance as a function of the number of observations added, and (iii) to assess the spatial consistency of augmented datasets. The results show that GP-based methods, in particular with combined kernels (GP-COMB), significantly improve the performance of regression algorithms while requiring less additional data. Although kriging shows slightly lower performance, it is distinguished by a more homogeneous spatial coverage, a potential advantage in certain contexts.

artificial intelligence, augmentation, machine learning, (15 more...)

2501.124

Country: Africa > La Réunion (0.51)

Genre: Research Report (0.70)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial IntelligenceDec-26-2024

Provably Efficient Exploration in Reward Machines with Low Regret

Bourel, Hippolyte, Jonsson, Anders, Maillard, Odalric-Ambrym, Ma, Chenxiao, Talebi, Mohammad Sadegh

We study reinforcement learning (RL) for decision processes with non-Markovian reward, in which high-level knowledge of the task in the form of reward machines is available to the learner. We consider probabilistic reward machines with initially unknown dynamics, and investigate RL under the average-reward criterion, where the learning performance is assessed through the notion of regret. Our main algorithmic contribution is a model-based RL algorithm for decision processes involving probabilistic reward machines that is capable of exploiting the structure induced by such machines. We further derive high-probability and non-asymptotic bounds on its regret and demonstrate the gain in terms of regret over existing algorithms that could be applied, but obliviously to the structure. We also present a regret lower bound for the studied setting. To the best of our knowledge, the proposed algorithm constitutes the first attempt to tailor and analyze regret specifically for RL with probabilistic reward machines.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2412.19194

Country: Europe > France (0.27)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

arXiv.org Artificial IntelligenceDec-19-2024

Hierarchical Subspaces of Policies for Continual Offline Reinforcement Learning

Kobanda, Anthony, Portelas, Rémy, Maillard, Odalric-Ambrym, Denoyer, Ludovic

In dynamic domains such as autonomous robotics and video game simulations, agents must continuously adapt to new tasks while retaining previously acquired skills. This ongoing process, known as Continual Reinforcement Learning, presents significant challenges, including the risk of forgetting past knowledge and the need for scalable solutions as the number of tasks increases. To address these issues, we introduce HIerarchical LOW-rank Subspaces of Policies (HILOW), a novel framework designed for continual learning in offline navigation settings. HILOW leverages hierarchical policy subspaces to enable flexible and efficient adaptation to new tasks while preserving existing knowledge. We demonstrate, through a careful experimental study, the effectiveness of our method in both classical MuJoCo maze environments and complex video game-like simulations, showcasing competitive performance and satisfying adaptability according to classical continual learning metrics, in particular regarding memory usage. Our work provides a promising framework for real-world applications where continuous learning from pre-collected data is essential.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2412.14865

Country: Europe > France (0.46)

Genre: Research Report > New Finding (0.66)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJun-4-2024

Power Mean Estimation in Stochastic Monte-Carlo Tree_Search

Dam, Tuan, Maillard, Odalric-Ambrym, Kaufmann, Emilie

Monte-Carlo Tree Search (MCTS) is a widely-used strategy for online planning that combines Monte-Carlo sampling with forward tree search. Its success relies on the Upper Confidence bound for Trees (UCT) algorithm, an extension of the UCB method for multi-arm bandits. However, the theoretical foundation of UCT is incomplete due to an error in the logarithmic bonus term for action selection, leading to the development of Fixed-Depth-MCTS with a polynomial exploration bonus to balance exploration and exploitation~\citep{shah2022journal}. Both UCT and Fixed-Depth-MCTS suffer from biased value estimation: the weighted sum underestimates the optimal value, while the maximum valuation overestimates it~\citep{coulom2006efficient}. The power mean estimator offers a balanced solution, lying between the average and maximum values. Power-UCT~\citep{dam2019generalized} incorporates this estimator for more accurate value estimates but its theoretical analysis remains incomplete. This paper introduces Stochastic-Power-UCT, an MCTS algorithm using the power mean estimator and tailored for stochastic MDPs. We analyze its polynomial convergence in estimating root node values and show that it shares the same convergence rate of $\mathcal{O}(n^{-1/2})$, with $n$ is the number of visited trajectories, as Fixed-Depth-MCTS, with the latter being a special case of the former. Our theoretical results are validated with empirical tests across various stochastic MDP environments.

artificial intelligence, planning & scheduling, stochastic-power-uct, (16 more...)

2406.02235

Country:

North America > United States (0.28)
Europe > France > Hauts-de-France (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation (0.46)
Energy > Oil & Gas > Upstream (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)

arXiv.org Machine LearningSep-28-2023

CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption

Agrawal, Shubhada, Mathieu, Timothée, Basu, Debabrota, Maillard, Odalric-Ambrym

We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in Gaussian model.

artificial intelligence, data mining, machine learning, (19 more...)

2309.16563

Country:

Europe > France (0.67)
Asia (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.92)
Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining > Big Data (0.67)

arXiv.org Artificial IntelligenceSep-19-2023

Monte-Carlo tree search with uncertainty propagation via optimal transport

Dam, Tuan, Stenger, Pascal, Schneider, Lukas, Pajarinen, Joni, D'Eramo, Carlo, Maillard, Odalric-Ambrym

This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS) designed for highly stochastic and partially observable Markov decision processes. We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions. We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value children nodes; thus, propagating the uncertainty of the estimate across the tree to the root node. We study our novel backup operator when using a novel combination of $L^1$-Wasserstein barycenter with $\alpha$-divergence, by drawing a notable connection to the generalized mean backup operator. We complement our probabilistic backup operator with two sampling strategies, based on optimistic selection and Thompson sampling, obtaining our Wasserstein MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to the optimal policy, and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines.

artificial intelligence, machine learning, monte-carlo tree search, (17 more...)

2309.10737

Country:

Europe > Germany (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceJul-13-2023

Bregman Deviations of Generic Exponential Families

Chowdhury, Sayak Ray, Saux, Patrick, Maillard, Odalric-Ambrym, Gopalan, Aditya

We revisit the method of mixture technique, also known as the Laplace method, to study the concentration phenomenon in generic exponential families. Combining the properties of Bregman divergence associated with log-partition function of the family with the method of mixtures for super-martingales, we establish a generic bound controlling the Bregman divergence between the parameter of the family and a finite sample estimate of the parameter. Our bound is time-uniform and makes appear a quantity extending the classical information gain to exponential families, which we call the Bregman information gain. For the practitioner, we instantiate this novel bound to several classical families, e.g., Gaussian, Bernoulli, Exponential, Weibull, Pareto, Poisson and Chi-square yielding explicit forms of the confidence sets and the Bregman information gain. We further numerically compare the resulting confidence bounds to state-of-the-art alternatives for time-uniform concentration and show that this novel method yields competitive results. Finally, we highlight the benefit of our concentration bounds on some illustrative applications.

data mining, exponential family, machine learning, (18 more...)

2201.07306

Country: Europe > France (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.45)