AITopics | Statistical Learning

We consider the problem of embedding a dynamic network, to obtain time-evolving vector representations of each node, which can then be used to describe changes in behaviour of individual nodes, communities, or the entire graph. Given this open-ended remit, we argue that two types of stability in the spatio-temporal positioning of nodes are desirable: to assign the same position, up to noise, to nodes behaving similarly at a given time (cross-sectional stability) and a constant position, up to noise, to a single node behaving similarly across different times (longitudinal stability). Similarity in behaviour is defined formally using notions of exchangeability under a dynamic latent position network model. By showing how this model can be recast as a multilayer random dot product graph, we demonstrate that unfolded adjacency spectral embedding satisfies both stability conditions. We also show how two alternative methods, omnibus and independent spectral embedding, alternately lack one or the other form of stability.
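The unfolded construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes each snapshot is an n x n adjacency matrix on a common node set, and the function name and the square-root scaling of the singular values are choices made here for illustration.

```python
import numpy as np

def unfolded_adjacency_spectral_embedding(adjacencies, d):
    """Embed T snapshots (each n x n) into d dimensions with one SVD.

    Assumption: all snapshots share the same node set and ordering.
    Returns X of shape (n, d) and Y of shape (T, n, d).
    """
    n = adjacencies[0].shape[0]
    T = len(adjacencies)
    # Unfold: place the T adjacency matrices side by side, giving n x (n*T).
    A = np.hstack(adjacencies)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U, s, Vt = U[:, :d], s[:d], Vt[:d, :]
    # Left embedding: one time-invariant position per node.
    X = U * np.sqrt(s)
    # Right embedding: one position per node per snapshot.
    # Rows of Vt.T are ordered snapshot by snapshot, so reshape to (T, n, d).
    Y = (Vt.T * np.sqrt(s)).reshape(T, n, d)
    return X, Y
```

Because every snapshot is embedded through the same decomposition, the rows of `Y[t]` live in a common coordinate system across `t`, so no alignment step between time points is needed in this sketch.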

artificial intelligence, data mining, machine learning

Neural Information Processing Systems

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Supplementary information for Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

Neural Information Processing Systems, Apr-25-2026, 22:47:25 GMT

This appendix presents the proof of the main technical result, Theorem 1. Throughout the proof, we assume that the set of conditions from Sec. 2 is satisfied.

A.1 Required background

In this section, we give an overview of the main concepts and tools on approximate message passing algorithms which will be required for the proof. We start with some definitions that commonly appear in the approximate message passing literature. The main regularity class of functions we will use is that of pseudo-Lipschitz functions, which roughly amounts to functions with polynomially bounded first derivatives. We include the required scaling w.r.t. the dimensions in the definition for convenience. Since $K$ will be kept finite, it can be absorbed in any of the constants. For example, the function $f: \mathbb{R}^n \to \mathbb{R},\ x \mapsto \frac{1}{n}\|x\|_2^2$ is pseudo-Lipschitz of order 2.

Moreau envelopes and Bregman proximal operators. In our proof, we will also frequently use the notions of Moreau envelopes and proximal operators.
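As a concrete illustration of the last two notions, the sketch below computes the standard (Euclidean) proximal operator and Moreau envelope of the absolute value. It assumes the usual definitions, prox(x) = argmin_z { |z| + (x - z)^2 / (2 tau) } and the envelope as the corresponding minimum value; the Bregman generalisation used in the proof is not covered, and the function names are chosen here for illustration.

```python
import numpy as np

def prox_l1(x, tau):
    """Proximal operator of |.| with parameter tau (soft-thresholding), elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def moreau_envelope_l1(x, tau):
    """Moreau envelope of |.| with parameter tau, evaluated at the minimiser.

    The result is the Huber function: x^2 / (2 tau) for |x| <= tau,
    and |x| - tau / 2 otherwise.
    """
    z = prox_l1(x, tau)
    return np.abs(z) + (x - z) ** 2 / (2.0 * tau)

def moreau_envelope_numeric(f, x, tau, grid):
    """Brute-force envelope of a scalar function f at a scalar point x.

    Minimises f(z) + (x - z)^2 / (2 tau) over the supplied grid of z values;
    useful only as a sanity check of a closed-form expression.
    """
    return np.min(f(grid) + (x - grid) ** 2 / (2.0 * tau))
```

The closed form and the brute-force minimisation should agree up to the grid resolution, which gives a quick way to check a hand-derived proximal operator for other convex losses.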

artificial intelligence, equation, machine learning

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions

Neural Information Processing Systems, Apr-25-2026, 22:47:21 GMT

Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of $K$ Gaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of the $\ell_1$ penalty with respect to $\ell_2$; b) max-margin multiclass classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for $K > 2$. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets.
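The setting above can be simulated directly. The following is a minimal sketch, assuming isotropic covariances, a softmax cross-entropy loss, and an $\ell_2$ penalty minimised by plain gradient descent; these are one convex instance chosen here for illustration, and the function names and hyperparameters are hypothetical rather than taken from the paper.

```python
import numpy as np

def sample_mixture(n, means, rng, noise=1.0):
    """Draw n points from a uniform mixture of K isotropic Gaussians."""
    K, d = means.shape
    labels = rng.integers(K, size=n)
    X = means[labels] + noise * rng.standard_normal((n, d))
    return X, labels

def softmax(Z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_erm(X, labels, K, lam=0.1, lr=0.1, steps=1000):
    """Minimise mean cross-entropy + (lam / 2) * ||W||^2 by gradient descent."""
    n, d = X.shape
    Y = np.eye(K)[labels]          # one-hot labels, shape (n, K)
    W = np.zeros((d, K))
    for _ in range(steps):
        P = softmax(X @ W)         # predicted class probabilities
        grad = X.T @ (P - Y) / n + lam * W
        W -= lr * grad
    return W

def predict(W, X):
    """Assign each row of X to the class with the largest linear score."""
    return np.argmax(X @ W, axis=1)
```

Sweeping the sample size `n` at fixed dimension and plotting the test error of `predict` gives an empirical learning curve of the kind the asymptotic theory is meant to describe.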

artificial intelligence, classification, machine learning

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

What Knowledge Gets Distilled in Knowledge Distillation?

Utkarsh Ojha, Yuheng Li, Anirudh Sundara Rajan, Yingyu Liang, Yong Jae Lee (University of Wisconsin-Madison)

Neural Information Processing Systems, Apr-25-2026, 22:47:11 GMT

Knowledge distillation aims to transfer useful information from a teacher network to a student network, with the primary goal of improving the student's performance for the task at hand. Over the years, there has been a deluge of novel techniques and use cases of knowledge distillation. Yet, despite the various improvements, there seems to be a glaring gap in the community's fundamental understanding of the process. Specifically, what is the knowledge that gets distilled in knowledge distillation? In other words, in what ways does the student become similar to the teacher?
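For readers unfamiliar with the mechanics, one widely used form of the distillation objective matches temperature-softened teacher and student output distributions. The sketch below assumes that formulation (a KL divergence scaled by the squared temperature); the abstract does not say which distillation objectives the paper studies, and the function names here are illustrative.

```python
import numpy as np

def log_softmax(z, T=1.0):
    """Row-wise log-softmax of logits z at temperature T, computed stably."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL(teacher || student) between temperature-softened distributions.

    The T**2 factor is the customary rescaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, T)   # teacher log-probabilities
    log_q = log_softmax(student_logits, T)   # student log-probabilities
    kl = np.sum(np.exp(log_p) * (log_p - log_q), axis=-1)
    return (T ** 2) * kl.mean()
```

The loss is zero exactly when the student reproduces the teacher's softened distribution on every input, which is one narrow sense in which a student can "become similar" to its teacher.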

artificial intelligence, machine learning, student

Country: North America > United States > Wisconsin > Dane County > Madison (0.40)

Genre: