AITopics | statistical theory

A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics

Lin, Licong, Mei, Song

arXiv.org Machine Learning, Mar-21-2025

Contrastive learning -- a modern approach to extract useful representations from unlabeled data by training models to distinguish similar samples from dissimilar ones -- has driven significant progress in foundation models. In this work, we develop a new theoretical framework for analyzing data augmentation-based contrastive learning, with a focus on SimCLR as a representative example. Our approach is based on the concept of approximate sufficient statistics, which we extend beyond its original definition in \cite{oko2025statistical} for contrastive language-image pretraining (CLIP) using KL-divergence. We generalize it to equivalent forms and general f-divergences, and show that minimizing SimCLR and other contrastive losses yields encoders that are approximately sufficient. Furthermore, we demonstrate that these near-sufficient encoders can be effectively adapted to downstream regression and classification tasks, with performance depending on their sufficiency and the error induced by data augmentation in contrastive learning. Concrete examples in linear regression and topic classification are provided to illustrate the broad applicability of our results.
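For readers unfamiliar with the SimCLR objective mentioned in this abstract, the following is a minimal sketch of the standard NT-Xent (normalized temperature-scaled cross-entropy) loss on a small batch. It is an illustration only, not the paper's analysis: the function names `cosine` and `nt_xent`, the default temperature, and the toy embeddings are assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nt_xent(z1, z2, temperature=0.5):
    """Average NT-Xent loss for paired embeddings.

    z1[i] and z2[i] are embeddings of two augmented views of sample i.
    Each embedding must pick out its partner among the other 2N - 1.
    """
    z = z1 + z2                       # all 2N embeddings
    n = len(z1)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)       # index of the positive partner
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # -log( exp(pos) / sum_k exp(logit_k) )
        total += math.log(sum(math.exp(l) for l in logits)) - pos_logit
    return total / (2 * n)

# Toy batch: views of the same sample point in similar directions.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]
aligned_loss = nt_xent(views_a, views_b)
# Pairing each sample with the other sample's view should cost more.
mismatched_loss = nt_xent(views_a, list(reversed(views_b)))
```

The loss is lower when the two views of each sample are close in cosine similarity, which is the behavior the contrastive objective rewards.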

approximate sufficient statistics, artificial intelligence, machine learning, (2 more...)

arXiv.org Machine Learning

2503.17538

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Statistical Theory of Regularization-Based Continual Learning

Zhao, Xuyang, Wang, Huiyuan, Huang, Weiran, Lin, Wei

arXiv.org Machine Learning, Jun-10-2024

We provide a statistical analysis of regularization-based continual learning on a sequence of linear regression tasks, with emphasis on how different regularization terms affect the model performance. We first derive the convergence rate for the oracle estimator obtained as if all data were available simultaneously. Next, we consider a family of generalized $\ell_2$-regularization algorithms indexed by matrix-valued hyperparameters, which includes the minimum norm estimator and continual ridge regression as special cases. As more tasks are introduced, we derive an iterative update formula for the estimation error of generalized $\ell_2$-regularized estimators, from which we determine the hyperparameters resulting in the optimal algorithm. Interestingly, the choice of hyperparameters can effectively balance the trade-off between forward and backward knowledge transfer and adjust for data heterogeneity. Moreover, the estimation error of the optimal algorithm is derived explicitly, which is of the same order as that of the oracle estimator. In contrast, our lower bounds for the minimum norm estimator and continual ridge regression show their suboptimality. A byproduct of our theoretical analysis is the equivalence between early stopping and generalized $\ell_2$-regularization in continual learning, which may be of independent interest. Finally, we conduct experiments to complement our theory.
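As a rough illustration of the continual ridge regression special case named in this abstract, the sketch below processes tasks one at a time and penalizes the distance to the previous estimate. It is a simplification made for this example: the parameter is a scalar and the regularization weight `lam` is a single number, whereas the paper's family is indexed by matrix-valued hyperparameters. The function name `continual_ridge` and the toy data are hypothetical.

```python
def continual_ridge(tasks, lam=1.0, theta0=0.0):
    """Scalar continual ridge regression (illustrative simplification).

    Each task is a list of (x, y) pairs. After task t the estimate is
        theta_t = argmin_theta  sum_i (y_i - x_i * theta)^2
                                + lam * (theta - theta_{t-1})^2,
    which has the closed form used in the loop below.
    """
    theta = theta0
    history = []
    for data in tasks:
        sxy = sum(x * y for x, y in data)
        sxx = sum(x * x for x, _ in data)
        # Setting the derivative to zero gives
        # theta * (sxx + lam) = sxy + lam * theta_prev.
        theta = (sxy + lam * theta) / (sxx + lam)
        history.append(theta)
    return history

# Two toy tasks generated from y = 2x; earlier data is never revisited.
toy_tasks = [[(1.0, 2.0), (2.0, 4.0)], [(1.0, 2.0)]]
estimates = continual_ridge(toy_tasks, lam=1.0)
```

With `lam=0` each task is fit by ordinary least squares independently; larger values keep the estimate closer to what earlier tasks suggested.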

continual learning, estimation error, estimator, (15 more...)

arXiv.org Machine Learning

2406.06213

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

Neural Information Processing Systems, Apr-6-2023, 18:21:07 GMT

A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping we answer the question: In what ratio the examples should be divided into training and testing sets in order to obtain the optimum performance. In the non-asymptotic region cross-validated early stopping always decreases the generalization error.

overtraining, statistical theory

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.67)

Living in the wilderness: hypothesis testing in a world that disagrees with statistical theory

#artificialintelligence, May-23-2021, 17:10:35 GMT

Sometimes it seems paradoxical to call the famous bell curve "normal". Among all the assumptions made by traditional statistical theory, the normality assumption is notorious for how often it fails to hold. My aim in this article is to show a way to test hypotheses when the normality assumption of traditional hypothesis tests is violated. In this scenario, we can't rely on theoretical results, so we need to depart from theory's ivory tower and double the bet on our data. To get there, I first briefly review what hypothesis testing is, focusing on an intuitive grasp of the reasoning behind it (no equations allowed!). Then I proceed to a case study motivated by a business problem where the normality assumption doesn't hold. This makes matters concrete and will direct our discussion. After the problem is explained, I will show that bootstrapping is a good way to fill the gaps left by theory without changing anything in the reasoning at the heart of hypothesis testing. In particular, I will show that bootstrapping leads to the right conclusion about the test. I conclude this article with a critical evaluation of bootstrapping and similar methods, pointing out their pros and cons. Many data scientists have trouble understanding hypothesis testing.
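The bootstrap idea described in this excerpt can be sketched in a few lines. The example below is a generic one-sample bootstrap test of a mean and is not the article's case study: the function name `bootstrap_mean_test`, the resample count, the seed, and the skewed toy sample are all assumptions made for illustration.

```python
import random
import statistics

def bootstrap_mean_test(sample, mu0, n_resamples=2000, seed=0):
    """Two-sided bootstrap p-value for H0: population mean == mu0.

    The sample is shifted so that its mean equals mu0 (the null holds
    by construction), then resampled with replacement. The p-value is
    the share of resampled means at least as far from mu0 as the
    observed mean. No normality assumption is used.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(sample)
    shifted = [x - observed + mu0 for x in sample]
    observed_gap = abs(observed - mu0)
    extreme = 0
    for _ in range(n_resamples):
        resample = rng.choices(shifted, k=len(sample))
        if abs(statistics.fmean(resample) - mu0) >= observed_gap:
            extreme += 1
    return extreme / n_resamples

# A small, right-skewed toy sample (clearly not bell-shaped).
skewed = [1, 1, 2, 2, 3, 3, 4, 20]
p_value = bootstrap_mean_test(skewed, mu0=3.0)
```

A small p-value says the observed mean would be unusual if the null were true; the reasoning is the same as in a classical test, only the reference distribution comes from resampling instead of a formula.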

credit score, hypothesis, test statistic, (17 more...)

#artificialintelligence

Genre: Research Report > Experimental Study (0.47)

Industry: Banking & Finance (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.75)

A statistical theory of cold posteriors in deep neural networks

Aitchison, Laurence

arXiv.org Machine Learning, Aug-13-2020

To get Bayesian neural networks to perform comparably to standard neural networks it is usually necessary to artificially reduce uncertainty using a "tempered" or "cold" posterior. This is extremely concerning: if the prior is accurate, Bayes inference/decision theory is optimal, and any artificial changes to the posterior should harm performance. While this suggests that the prior may be at fault, here we argue that in fact, BNNs for image classification use the wrong likelihood. In particular, standard image benchmark datasets such as CIFAR-10 are carefully curated. We develop a generative model describing curation which gives a principled Bayesian account of cold posteriors, because the likelihood under this new generative model closely matches the tempered likelihoods used in past work.

cold posterior, neural network, posterior, (15 more...)

arXiv.org Machine Learning

2008.05912

Country: Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

A statistical theory of semi-supervised learning

Aitchison, Laurence

arXiv.org Machine Learning, Aug-13-2020

We currently lack a solid statistical understanding of semi-supervised learning methods, instead treating them as a collection of highly effective tricks. This precludes the principled combination e.g. of Bayesian methods and semi-supervised learning, as semi-supervised learning objectives are not currently formulated as likelihoods for an underlying generative model of the data. Here, we note that standard image benchmark datasets such as CIFAR-10 are carefully curated, and we provide a generative model describing the curation process. Under this generative model, several state-of-the-art semi-supervised learning techniques, including entropy minimization, pseudo-labelling and the FixMatch family emerge naturally as variational lower-bounds on the log-likelihood.

artificial intelligence, machine learning, semi-supervised learning, (19 more...)

arXiv.org Machine Learning

2008.05913

Country: Europe > United Kingdom > England > Bristol (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Challenge of the week - Continued fractions for predictive modeling

@machinelearnbot, Apr-3-2016, 05:50:53 GMT

Continued fractions are a fascinating subject. They are extremely stable from a numerical point of view, have tons of useful properties, and are thus more robust against over-fitting, compared with standard linear regression. They have been extensively studied in the context of approximation and numerical analysis. Why is there no statistical theory of continued fractions? Why aren't these beautiful and powerful mathematical objects used in data science?
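For readers who have not worked with continued fractions, the sketch below evaluates the successive convergents of a simple continued fraction [a0; a1, a2, ...] with the standard recurrence. It only shows how the objects are computed and does not implement any predictive model from the post; the function name `convergents` is an assumption made for this example.

```python
from fractions import Fraction

def convergents(coeffs):
    """Return the convergents of the continued fraction [a0; a1, a2, ...].

    Uses the standard recurrence
        h_n = a_n * h_{n-1} + h_{n-2},   k_n = a_n * k_{n-1} + k_{n-2},
    so each new term costs only two multiplications and two additions.
    """
    h_prev, h = 1, coeffs[0]
    k_prev, k = 0, 1
    result = [Fraction(h, k)]
    for a in coeffs[1:]:
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
        result.append(Fraction(h, k))
    return result

# The first terms of the expansion of pi are [3; 7, 15, 1].
pi_convergents = convergents([3, 7, 15, 1])
# pi_convergents == [Fraction(3, 1), Fraction(22, 7),
#                    Fraction(333, 106), Fraction(355, 113)]
```

Each convergent is a rational approximation that improves on the previous one, which is the property studied in approximation and numerical analysis.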

artificial intelligence, fraction, machine learning, (3 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Foundations of Statistical Theory Being Questioned

@machinelearnbot, Mar-28-2016, 18:05:49 GMT

Over the same period, but especially since the 1990s, there has been an increasing disconnect between the traditional Fisher-Neyman-Pearson (FNP) math statistics course and the demands for complex analysis in many application areas. The failure of classical maximum likelihood methods to deal effectively with complex models and the success of MCMC-based methods have led to a similar situation: The undergraduate FNP course does not prepare students for these models, and Bayesian MCMC retraining courses are needed to prepare graduates for these applications.

artificial intelligence, machine learning, statistical theory, (1 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

Amari, Shun-ichi, Murata, Noboru, Müller, Klaus-Robert, Finke, Michael, Yang, Howard Hua

Neural Information Processing Systems, Dec-31-1996

A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping we answer the question: In what ratio the examples should be divided into training and testing sets in order to obtain the optimum performance. In the non-asymptotic region cross-validated early stopping always decreases the generalization error. Our large-scale simulations done on a CM5 are in nice agreement with our analytical findings.

early stopping, generalization error, stopping, (17 more...)

Neural Information Processing Systems

Country: