AITopics | statistical learning

This manuscript contains a preprint of a chapter under consideration for inclusion in the forthcoming third edition of _Cover and Thomas's Elements of Information Theory_, posted with permission from Wiley. The table of contents of the new edition (EIT-3 ToC) can be found at: https://docs.google.com/document/d/1L-m4oQEJw1PJhoxBeMwrrBD8S_HmvzMEkPbYvS24980/edit?usp=sharing . For feedback, please contact abbas@ee.stanford.edu. Learning and information theory intersect in both model training and the characterization of fundamental performance limits. This manuscript provides a concise and accessible treatment of the first intersection, requiring only basic background in information theory and statistics at the senior undergraduate or first-year graduate level. End-of-chapter exercises make the material well suited for classroom use as well as self-study. The chapter focuses on the role of divergence measures in model training, with examples ranging from linear and logistic regression to autoregressive models, variational autoencoders, diffusion models, generative adversarial networks, and score-based models. It introduces the evidence lower bound (ELBO), $f$-divergences, and the Fisher divergence. In particular, the treatment of the generative diffusion model provides a more systematic and explicit derivation than is typical in the literature.
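
As a rough illustration of the $f$-divergences mentioned in the abstract above, the sketch below evaluates the standard definition $D_f(P\|Q) = \sum_x q(x)\, f(p(x)/q(x))$ for two discrete distributions. It is not taken from the chapter: the function names (`f_divergence`, `kl_generator`, `tv_generator`) and the toy distributions are assumptions made for illustration, and the code assumes $q(x) > 0$ everywhere.

```python
import math


def f_divergence(p, q, f):
    """Return sum_x q(x) * f(p(x) / q(x)) for discrete distributions p and q.

    Assumes p and q are same-length lists of probabilities and q(x) > 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    if any(qi <= 0 for qi in q):
        raise ValueError("this sketch assumes q(x) > 0 for every x")
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def kl_generator(t):
    # f(t) = t log t gives the Kullback-Leibler divergence (with 0 log 0 = 0).
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t):
    # f(t) = |t - 1| / 2 gives the total variation distance.
    return 0.5 * abs(t - 1.0)


# Hypothetical toy distributions on a two-point alphabet.
p = [0.5, 0.5]
q = [0.25, 0.75]

kl = f_divergence(p, q, kl_generator)
tv = f_divergence(p, q, tv_generator)
print("KL(P || Q) =", kl)
print("TV(P, Q)   =", tv)
```

Swapping the generator function is the only change needed to move between divergences, which is the sense in which a single family covers several training objectives.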

artificial intelligence, fdata, machine learning, (17 more...)

arXiv.org Machine Learning

2605.02989

Country:

Europe (0.46)
North America > United States > California > Santa Clara County > Palo Alto (0.24)

Genre:

Research Report > New Finding (0.34)
Research Report > Experimental Study (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

ed7b8e1312f6ba8af6e4316dcd28bb3d-Paper-Conference.pdf

Neural Information Processing Systems Apr-28-2026, 07:41:26 GMT

artificial intelligence, machine learning, regularization, (17 more...)

Neural Information Processing Systems

Country: Europe > Germany (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.71)

From Stochastic Mixability to Fast Rates

Nishant A. Mehta, Robert C. Williamson

Neural Information Processing Systems Feb-18-2026, 20:37:12 GMT

We also show that when stochastic mixability does not hold in a certain sense (described in Section 5), then the risk minimizer is not unique in a bad way.

artificial intelligence, machine learning, stochastic mixability, (16 more...)

Neural Information Processing Systems

Country: Oceania > Australia > Australian Capital Territory > Canberra (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Minimax statistical learning with Wasserstein distances

Neural Information Processing Systems Feb-15-2026, 03:24:57 GMT

Recently, however, an alternative viewpoint has emerged, inspired by ideas from robust statistics and robust stochastic optimization.

artificial intelligence, hypothesis, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Champaign County > Urbana (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

ed7b8e1312f6ba8af6e4316dcd28bb3d-Paper-Conference.pdf

Neural Information Processing Systems Feb-12-2026, 17:22:34 GMT

We find a large range of behavior that can be precisely characterized by a new measure of confounding strength.

artificial intelligence, machine learning, regularization, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

PAC-Bayes Un-Expected Bernstein Inequality

Zakaria Mhammedi, Peter Grünwald, Benjamin Guedj

Neural Information Processing Systems Feb-11-2026, 23:38:33 GMT

Our new bound consistently beats state-of-the-art bounds both on a toy example and on UCI datasets (with large enough $n$).

artificial intelligence, machine learning, pac-bayes un-expected bernstein inequality, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

d02e9bdc27a894e882fa0c9055c99722-Paper.pdf

Neural Information Processing Systems Feb-10-2026, 11:33:42 GMT

concentration inequality, random variable, risk measure, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Nevada (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Neural Information Processing Systems Dec-26-2025, 13:46:19 GMT

Conventional statistical wisdom established a well-understood relationship between model complexity and prediction error, typically presented as a _U-shaped curve_ reflecting a transition between under- and overfitting regimes. However, motivated by the success of overparametrized neural networks, recent influential work has suggested this theory to be generally incomplete, introducing an additional regime that exhibits a second descent in test error as the parameter count $p$ grows past sample size $n$ -- a phenomenon dubbed _double descent_. While most attention has naturally been given to the deep-learning setting, double descent was shown to emerge more generally across non-neural models: known cases include _linear regression, trees, and boosting_. In this work, we take a closer look at the evidence surrounding these more classical statistical machine learning methods and challenge the claim that observed cases of double descent truly extend the limits of a traditional U-shaped complexity-generalization curve therein. We show that once careful consideration is given to _what is being plotted_ on the x-axes of their double descent plots, it becomes apparent that there are implicitly multiple, distinct complexity axes along which the parameter count grows. We demonstrate that the second descent appears exactly (and _only_) when and where the transition between these underlying axes occurs, and that its location is thus _not_ inherently tied to the interpolation threshold $p=n$. We then gain further insight by adopting a classical nonparametric statistics perspective. We interpret the investigated methods as _smoothers_ and propose a generalized measure for the _effective_ number of parameters they use _on unseen examples_, using which we find that their apparent double descent curves do indeed fold back into more traditional convex shapes -- providing a resolution to the ostensible tension between double descent and traditional statistical intuition.
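
To make the "smoother" viewpoint in the abstract above concrete, the sketch below computes the classical effective number of parameters of a linear smoother, the trace of the matrix $S$ in $\hat{y} = S y$, for a $k$-nearest-neighbour average. This is the textbook training-point measure, not the generalized measure for unseen examples that the paper proposes; the function names and the toy inputs are assumptions made for illustration.

```python
def knn_smoother_matrix(xs, k):
    """Build the smoother matrix of a k-nearest-neighbour average.

    Row i holds the weights the smoother puts on each training target
    when predicting at the training input xs[i].
    """
    n = len(xs)
    rows = []
    for xi in xs:
        # Indices of the k training inputs closest to xi (ties broken by index).
        nearest = sorted(range(n), key=lambda j: (abs(xs[j] - xi), j))[:k]
        row = [0.0] * n
        for j in nearest:
            row[j] = 1.0 / k
        rows.append(row)
    return rows


def effective_parameters(smoother):
    """Classical effective number of parameters: the trace of the smoother matrix."""
    return sum(smoother[i][i] for i in range(len(smoother)))


# Hypothetical toy inputs: 12 distinct, evenly spaced points.
xs = [float(i) for i in range(12)]
for k in (1, 3, 12):
    print("k =", k, "effective parameters =",
          effective_parameters(knn_smoother_matrix(xs, k)))
```

With distinct inputs each point is its own nearest neighbour, so the trace is approximately $n/k$: the effective complexity falls as $k$ grows even though no explicit parameter count changes.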

double descent, name change, rethinking parameter, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.58)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.38)

Interpolation and Regularization for Causal Learning

Neural Information Processing Systems Dec-25-2025, 16:01:00 GMT

Recent work shows that in complex model classes, interpolators can achieve statistical generalization and even be optimal for statistical learning. However, despite increasing interest in learning models with good causal properties, there is no understanding of whether such interpolators can also achieve causal generalization. To address this gap, we study causal learning from observational data through the lens of interpolation and its counterpart, regularization. Under a simple linear causal model, we derive precise asymptotics for the causal risk of the min-norm interpolator and ridge regressors in the high-dimensional regime. We find a large range of behavior that can be precisely characterized by a new measure of confounding strength. When confounding strength is positive, which holds under independent causal mechanisms (a standard assumption in causal learning), we find that interpolators cannot be optimal. Indeed, causal learning requires stronger regularization than statistical learning. Beyond this assumption, when confounding is negative, we observe a phenomenon of self-induced regularization due to positive alignment between statistical and causal signals. Here, causal learning requires weaker regularization than statistical learning, interpolators can be optimal, and optimal regularization can even be negative.

interpolation and regularization, name change, statistical learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.87)

Risk Monotonicity in Statistical Learning

Neural Information Processing Systems Dec-24-2025, 04:01:43 GMT

Acquisition of data is a difficult task in many applications of machine learning, and it is only natural that one hopes and expects the population risk to decrease (better performance) monotonically with increasing data points. It turns out, somewhat surprisingly, that this is not the case even for the most standard algorithms that minimize the empirical risk. Non-monotonic behavior of the risk and instability in training have appeared in the popular deep learning paradigm under the description of double descent.

name change, risk monotonicity, statistical learning, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Information Theory and Statistical Learning

ed7b8e1312f6ba8af6e4316dcd28bb3d-Paper-Conference.pdf

From Stochastic Mixability to Fast Rates

Minimax statistical learning with Wasserstein distances

ed7b8e1312f6ba8af6e4316dcd28bb3d-Paper-Conference.pdf

PAC-Bayes Un-Expected Bernstein Inequality

d02e9bdc27a894e882fa0c9055c99722-Paper.pdf

A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning

Interpolation and Regularization for Causal Learning

Risk Monotonicity in Statistical Learning