
Collaborating Authors

 Alquier, Pierre


Finite sample properties of parametric MMD estimation: robustness to misspecification and dependence

arXiv.org Artificial Intelligence

One of the main challenges in statistics is the design of a universal estimation procedure. Given data, a universal procedure is an algorithm that provides an estimator of the generating distribution which is simultaneously statistically consistent when the true distribution belongs to the model, and robust otherwise. Typically, a universal estimator is consistent for any model, with minimax-optimal or fast rates of convergence, and is robust to small departures from the model assumptions [Bickel, 1976] such as sparse instead of dense effects or non-Gaussian errors in high-dimensional linear regression. Unfortunately, most statistical procedures are based upon strong assumptions on the model or on the corresponding parameter set, and very famous estimation methods such as maximum likelihood estimation (MLE), the method of moments, or Bayesian posterior inference may fail even on simple problems when such assumptions do not hold. For instance, even though MLE is consistent and asymptotically normal with optimal rates of convergence in parametric estimation under suitable regularity assumptions [Le Cam, 1970, Van der Vaart, 1990] and in nonparametric estimation under entropy conditions, this method behaves poorly under misspecification, when the true generating distribution of the data does not belong to the chosen model. Let us investigate a simple example presented in [Birgé, 2006] that illustrates the non-universal character of MLE. We observe a collection of n independent and identically distributed (i.i.d.) random variables X
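As a concrete illustration of the minimum-MMD idea, here is a minimal sketch: the Gaussian kernel, bandwidth, grid search, and contaminated-Gaussian toy model below are illustrative choices of mine, not the paper's procedure. A location parameter is estimated by minimizing an empirical maximum mean discrepancy (MMD) between the data and samples drawn from the model, which is what makes the estimator robust to the outliers.

```python
# Minimal sketch of parametric minimum-MMD estimation (illustration only;
# the kernel, bandwidth, and grid search are illustrative choices,
# not the paper's exact procedure).
import numpy as np

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-|x - y|^2 / (2 * bandwidth^2))."""
    d = x[:, None] - y[None, :]
    return np.exp(-d**2 / (2 * bandwidth**2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between samples x and y."""
    kxx = gaussian_kernel(x, x, bandwidth).mean()
    kyy = gaussian_kernel(y, y, bandwidth).mean()
    kxy = gaussian_kernel(x, y, bandwidth).mean()
    return kxx + kyy - 2 * kxy

rng = np.random.default_rng(0)
# Contaminated sample: mostly N(2, 1), plus a few gross outliers.
data = np.concatenate([rng.normal(2.0, 1.0, 950), rng.normal(50.0, 1.0, 50)])

# Grid search over the location parameter of a Gaussian model N(theta, 1).
thetas = np.linspace(0.0, 5.0, 101)
scores = [mmd2(data, rng.normal(t, 1.0, 1000)) for t in thetas]
theta_hat = thetas[int(np.argmin(scores))]
print(f"minimum-MMD estimate: {theta_hat:.2f}")  # close to 2 despite outliers
```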


Convergence of Statistical Estimators via Mutual Information Bounds

arXiv.org Machine Learning

Recent advances in statistical learning theory have revealed profound connections between mutual information (MI) bounds, PAC-Bayesian theory, and Bayesian nonparametrics. This work introduces a novel mutual information bound for statistical models. The derived bound has wide-ranging applications in statistical inference. It yields improved contraction rates for fractional posteriors in Bayesian nonparametrics. It can also be used to study a wide range of estimation methods, such as variational inference or Maximum Likelihood Estimation (MLE). By bridging these diverse areas, this work advances our understanding of the fundamental limits of statistical inference and the role of information in learning from data. We hope that these results will not only clarify connections between statistical inference and information theory but also help to develop a new toolbox to study a wide range of estimators.
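The paper's new bound is not reproduced here, but for context, the classical mutual information bound of Xu and Raginsky (2017), which results of this type refine, reads:

$$\left|\mathbb{E}\left[R(W) - r_n(W)\right]\right| \le \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},$$

where $S = (Z_1,\dots,Z_n)$ is the sample, $W$ is the (random) output of the learning algorithm, $R$ and $r_n$ are the population and empirical risks, the loss is assumed $\sigma$-sub-Gaussian, and $I(W;S)$ is the mutual information between the output and the sample.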


Minimax optimality of deep neural networks on dependent data via PAC-Bayes bounds

arXiv.org Machine Learning

In a groundbreaking work, Schmidt-Hieber (2020) proved the minimax optimality of deep neural networks with ReLU activation for least-squares regression estimation over a large class of functions defined by composition. In this paper, we extend these results in many directions. First, we remove the i.i.d. assumption on the observations, to allow some time dependence. The observations are assumed to be a Markov chain with a non-null pseudo-spectral gap. Then, we study a more general class of machine learning problems, which includes least-squares and logistic regression as special cases. Leveraging PAC-Bayes oracle inequalities and a version of Bernstein's inequality due to Paulin (2015), we derive upper bounds on the estimation risk for a generalized Bayesian estimator. In the case of least-squares regression, this bound matches (up to a logarithmic factor) the lower bound of Schmidt-Hieber (2020). We establish a similar lower bound for classification with the logistic loss, and prove that the proposed DNN estimator is optimal in the minimax sense.
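For context, the generalized Bayesian estimator referred to here is typically a Gibbs posterior. In a standard formulation (conventions for scaling the temperature vary, and the notation below is mine, not a quotation from the paper):

$$\hat{\rho}_\lambda(d\theta) \propto \exp\left(-\lambda\, r_n(\theta)\right) \pi(d\theta), \qquad r_n(\theta) = \sum_{i=1}^{n} \ell(\theta; X_i),$$

where $\pi$ is a prior on the parameters (here, network weights), $\ell$ is the loss (least-squares or logistic), and $\lambda > 0$ is an inverse temperature. PAC-Bayes oracle inequalities then bound the risk of $\hat{\rho}_\lambda$ by the best trade-off between empirical risk and KL divergence to the prior.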


Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning

arXiv.org Machine Learning

This work investigates the offline formulation of the contextual bandit problem, where the goal is to leverage past interactions collected under a behavior policy to evaluate, select, and learn new, potentially better-performing policies. Motivated by critical applications, we move beyond point estimators. Instead, we adopt the principle of pessimism, where we construct upper bounds that assess a policy's worst-case performance, enabling us to confidently select and learn improved policies. Precisely, we introduce novel, fully empirical concentration bounds for a broad class of importance weighting risk estimators. These bounds are general enough to cover most existing estimators and pave the way for the development of new ones. In particular, our pursuit of the tightest bound within this class motivates a novel estimator (LS) that logarithmically smooths large importance weights. The bound for LS is provably tighter than all its competitors, and naturally results in improved policy selection and learning strategies. Extensive policy evaluation, selection, and learning experiments highlight the versatility and favorable performance of LS.
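As a rough illustration of the smoothing idea (hedged: the transform $w \mapsto \log(1+\lambda w)/\lambda$ below is one plausible logarithmic smoothing of importance weights, not necessarily the paper's exact LS estimator), compare vanilla inverse propensity scoring with a log-smoothed variant:

```python
# Hedged sketch of importance-weighted off-policy evaluation with a
# logarithmic smoothing of large weights. The transform w -> log(1 + lam*w)/lam
# is an illustrative choice; the paper defines its own LS estimator and bounds.
import numpy as np

def ips_estimate(rewards, pi_target, pi_behavior):
    """Vanilla inverse propensity scoring (IPS): unbiased but heavy-tailed."""
    w = pi_target / pi_behavior
    return np.mean(w * rewards)

def log_smoothed_estimate(rewards, pi_target, pi_behavior, lam=0.1):
    """IPS with importance weights passed through a concave log transform.
    Large weights are shrunk, trading a small bias for much lighter tails."""
    w = pi_target / pi_behavior
    w_smooth = np.log1p(lam * w) / lam  # ~ w for small lam*w, grows like log for large w
    return np.mean(w_smooth * rewards)

rng = np.random.default_rng(1)
n = 10_000
pi_b = rng.uniform(0.01, 1.0, n)           # behavior propensities (toy data)
pi_t = rng.uniform(0.01, 1.0, n)           # target propensities (toy data)
r = rng.binomial(1, 0.3, n).astype(float)  # logged rewards

print("IPS         :", ips_estimate(r, pi_t, pi_b))
print("log-smoothed:", log_smoothed_estimate(r, pi_t, pi_b))
```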


PAC-Bayesian Offline Contextual Bandits With Guarantees

arXiv.org Artificial Intelligence

This paper introduces a new principled approach for off-policy learning in contextual bandits. Unlike previous work, our approach does not derive learning principles from intractable or loose bounds. We analyse the problem through the PAC-Bayesian lens, interpreting policies as mixtures of decision rules. This allows us to propose novel generalization bounds and provide tractable algorithms to optimize them. We prove that the derived bounds are tighter than their competitors, and can be optimized directly to confidently improve upon the logging policy offline. Our approach learns policies with guarantees, uses all available data and does not require tuning additional hyperparameters on held-out sets. We demonstrate through extensive experiments the effectiveness of our approach in providing performance guarantees in practical scenarios.
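One natural way to read "policies as mixtures of decision rules" (notation mine; the paper's construction may differ in its details) is to let a distribution $\rho$ over deterministic decision rules $d$ induce the stochastic policy

$$\pi_\rho(a \mid x) = \mathbb{E}_{d\sim\rho}\left[\mathbf{1}\{d(x) = a\}\right],$$

so that the policy's risk satisfies $R(\pi_\rho) = \mathbb{E}_{d\sim\rho}[R(d)]$, i.e., it is linear in $\rho$, which is exactly the type of quantity PAC-Bayesian bounds control uniformly over all posteriors $\rho$.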


Bayes meets Bernstein at the Meta Level: an Analysis of Fast Rates in Meta-Learning with PAC-Bayes

arXiv.org Artificial Intelligence

Bernstein's condition is a key assumption that guarantees fast rates in machine learning. For example, the Gibbs algorithm with prior $\pi$ has an excess risk in $O(d_{\pi}/n)$, as opposed to the standard $O(\sqrt{d_{\pi}/n})$, where $n$ denotes the number of observations and $d_{\pi}$ is a complexity parameter which depends on the prior $\pi$. In this paper, we examine the Gibbs algorithm in the context of meta-learning, i.e., when learning the prior $\pi$ from $T$ tasks (with $n$ observations each) generated by a meta distribution. Our main result is that Bernstein's condition always holds at the meta level, regardless of its validity at the observation level. This implies that the additional cost to learn the Gibbs prior $\pi$, which will reduce the term $d_\pi$ across tasks, is in $O(1/T)$, instead of the expected $O(1/\sqrt{T})$. We further illustrate how this result improves on standard rates in three different settings: discrete priors, Gaussian priors, and mixtures of Gaussians priors.
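For reference, Bernstein's condition is usually stated as follows (standard form; notation mine): there is a constant $B > 0$ such that, for every $f$ in the class,

$$\mathbb{E}\left[\big(\ell_f(Z) - \ell_{f^\ast}(Z)\big)^2\right] \le B\, \mathbb{E}\left[\ell_f(Z) - \ell_{f^\ast}(Z)\right],$$

where $f^\ast$ is the risk minimizer. Controlling the variance of excess losses by their mean is what converts slow $O(\sqrt{\cdot})$ rates into fast $O(\cdot)$ rates.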


User-friendly introduction to PAC-Bayes bounds

arXiv.org Machine Learning

Aggregated predictors are obtained by making a set of basic predictors vote according to some weights, that is, according to some probability distribution. Randomized predictors are obtained by sampling from a set of basic predictors, according to some prescribed probability distribution. Thus, aggregated and randomized predictors have in common that they are not defined by a minimization problem, but by a probability distribution on the set of predictors. In statistical learning theory, there is a set of tools designed to understand the generalization ability of such procedures: PAC-Bayesian or PAC-Bayes bounds. Since the original PAC-Bayes bounds of D. McAllester, these tools have been considerably improved in many directions (we will, for example, describe a simplified version of the localization technique of O. Catoni that was missed by the community, and later rediscovered as "mutual information bounds"). Very recently, PAC-Bayes bounds have received considerable attention: for example, there was a workshop on PAC-Bayes at NIPS 2017, "(Almost) 50 Shades of Bayesian Learning: PAC-Bayesian trends and insights", organized by B. Guedj, F. Bach and P. Germain. One of the reasons for this recent success is the successful application of these bounds to neural networks by G. Dziugaite and D. Roy. Yet an elementary introduction to PAC-Bayes theory is still missing. This is an attempt to provide such an introduction.
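For readers who have not met these bounds, one common McAllester-type statement (a textbook form for a loss bounded in $[0,1]$, not a quotation from the paper) is: with probability at least $1-\delta$ over the sample, simultaneously for all posteriors $\rho$,

$$\mathbb{E}_{\theta\sim\rho}\left[R(\theta)\right] \le \mathbb{E}_{\theta\sim\rho}\left[r_n(\theta)\right] + \sqrt{\frac{\mathrm{KL}(\rho\|\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},$$

where $R$ is the population risk, $r_n$ the empirical risk on $n$ observations, and $\pi$ the prior; the bound holds uniformly in $\rho$, which is what allows it to be minimized over posteriors.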


Deviation inequalities for stochastic approximation by averaging

arXiv.org Machine Learning

A large number of probability inequalities under dependence may be found in the literature; see [13] and, more recently, [15], [17], as well as [24], [25], [6], [7], [11], or [12]. Many papers involve inequalities for Markov chains, and recent martingale-based techniques provide reasonable ones for contractive Markov chains, as in [8]; such contractive Markov chains are weakly dependent. The above references mainly correspond to the time-homogeneous contractive case, and we aim to prove results for time non-homogeneous Markov chains. This is the setting of the large class of models introduced in Section 1.1. Different situations of stochastic algorithms [19] and unit roots [20] correspond to such varying contraction coefficients tending either to 0 or to 1 as $n \to \infty$. Several relevant models fitting such conditions are considered in Section 1.2.
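As a generic illustration of stochastic approximation by averaging (a Polyak-Ruppert-style toy example with i.i.d. noise; the paper's setting of time non-homogeneous Markov chains is not reproduced here):

```python
# Minimal sketch of stochastic approximation with Polyak-Ruppert averaging:
# run SGD with slowly decaying steps and report the running average of the
# iterates, which typically has much smaller variance than the last iterate.
import numpy as np

rng = np.random.default_rng(2)
theta_star = 3.0   # minimizer of the toy objective 0.5 * (theta - theta_star)^2
theta = 0.0
avg = 0.0
n = 100_000

for t in range(1, n + 1):
    grad = (theta - theta_star) + rng.normal()  # noisy gradient
    theta -= t ** -0.6 * grad                   # slowly decaying step size
    avg += (theta - avg) / t                    # running average of iterates

print(f"last iterate: {theta:.3f}, averaged: {avg:.3f}")
```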


Meta-strategy for Learning Tuning Parameters with Guarantees

arXiv.org Machine Learning

In many applications of modern supervised learning, such as medical imaging or robotics, a large number of tasks are available, but many of them are associated with a small amount of data. With few datapoints per task, learning them in isolation would give poor results. In this paper, we consider the problem of learning from a (large) sequence of regression or classification tasks with small sample size. By exploiting their similarities, we seek to design algorithms that can utilize previous experience to rapidly learn new skills or adapt to new environments. Inspired by human ingenuity in solving new problems by leveraging prior experience, meta-learning is a subfield of machine learning whose goal is to automatically adapt a learning mechanism from past experiences to rapidly learn new tasks with little available data. Since it "learns the learning mechanism", it is also referred to as learning-to-learn [34].


A Theoretical Analysis of Catastrophic Forgetting through the NTK Overlap Matrix

arXiv.org Artificial Intelligence

Continual learning (CL) is a setting in which an agent has to learn from an incoming stream of data during its entire lifetime. Although major advances have been made in the field, one recurring problem which remains unsolved is that of Catastrophic Forgetting (CF). While the issue has been extensively studied empirically, little attention has been paid to it from a theoretical angle. In this paper, we show that the impact of CF increases as two tasks increasingly align. We introduce a measure of task similarity called the NTK overlap matrix, which is at the core of CF. We analyze common projected-gradient algorithms and demonstrate how they mitigate forgetting. Then, we propose a variant of Orthogonal Gradient Descent (OGD) which leverages the structure of the data through Principal Component Analysis (PCA). Experiments support our theoretical findings and show how our method reduces CF on classical CL datasets.
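As a sketch of the projection mechanism behind OGD-style methods (illustrative only: the basis construction below uses plain PCA on stored gradients, whereas the paper works with the NTK overlap matrix and its own variant):

```python
# Hedged sketch of orthogonal gradient projection for continual learning:
# gradients for a new task are projected onto the orthogonal complement of a
# compressed basis of directions retained from previous tasks.
import numpy as np

def project_orthogonal(grad, basis):
    """Remove the components of `grad` lying in span(basis).
    `basis` has orthonormal rows (e.g. top principal components of
    previous-task gradients)."""
    return grad - basis.T @ (basis @ grad)

rng = np.random.default_rng(3)
d = 50
# Directions from an old task, compressed to 5 principal components via SVD.
old_grads = rng.normal(size=(200, d))
_, _, vt = np.linalg.svd(old_grads, full_matrices=False)
basis = vt[:5]                      # orthonormal rows (top right-singular vectors)

g_new = rng.normal(size=d)          # gradient computed on the new task
g_proj = project_orthogonal(g_new, basis)
print("overlap before:", np.abs(basis @ g_new).max())
print("overlap after :", np.abs(basis @ g_proj).max())  # ~ 0: old directions preserved
```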