generalisation error
Different Statistical Perspectives for Understanding Generalisation in Graph Neural Networks
Ayday, Nil, Sabanayagam, Mahalakshmi, Ghoshdastidar, Debarghya
Graph Neural Networks (GNN) are currently the most popular approach for learning and prediction on graph-structured data and are deployed in various fields, from social network analysis to drug discovery. However, there is limited mathematical understanding of the performance of GNNs. We discuss the various perspectives used to study statistical generalisation in GNNs. We identify three broad frameworks. The first approach, rooted in learning theory, relies on uniform convergence bounds and the complexity of the hypothesis class of specific GNN architectures. This approach also builds on the expressivity of GNNs, typically studied through the lens of graph isomorphism tests. The second principle is to simplify the neural architecture by analysing GNNs under the asymptotics of infinitely many parameters or infinite graph size. This approach approximates GNNs using Gaussian processes, neural tangent kernels or graphon neural network operators, which allow studying the generalisation or stability of trained GNNs. The third framework studies GNNs under random graph models, often the contextual stochastic block model, and derives non-asymptotic error rates using tools from high-dimensional statistics. We highlight some key theoretical results and discuss a few limitations and open research questions for each perspective.
Learning Gaussian Mixtures with Generalised Linear Models: Precise Asymptotics in High-dimensions
Generalised linear models for multi-class classification problems are one of the fundamental building blocks of modern machine learning tasks. In this manuscript, we characterise the learning of a mixture of KGaussians with generic means and covariances via empirical risk minimisation (ERM) with any convex loss and regularisation. In particular, we prove exact asymptotics characterising the ERM estimator in high-dimensions, extending several previous results about Gaussian mixture classification in the literature. We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, where we study the efficiency of `1 penalty with respect to `2; b) max-margin multiclass classification, where we characterise the phase transition on the existence of the multi-class logistic maximum likelihood estimator for K >2. Finally, we discuss how our theory can be applied beyond the scope of synthetic data, showing that in different cases Gaussian mixtures capture closely the learning curve of classification tasks in real data sets.
Locality defeats the curse of dimensionality in convolutional teacher-student scenarios
Convolutional neural networks perform a local and translationally-invariant treatment of the data: quantifying which of these two aspects is central to their success remains a challenge. We study this problem within a teacher-student framework for kernel regression, using'convolutional' kernels inspired by the neural tangent kernel of simple convolutional architectures of given filter size. Using heuristic methods from physics, we find in the ridgeless case that locality is key in determining the learning curve exponent β (that relates the test error t P β to the size of the training set P), whereas translational invariance is not. In particular, if the filter size of the teacher tis smaller than that of the student s, β is a function of s only and does not depend on the input dimension. We confirm our predictions on β empirically. We conclude by proving, under a natural universality assumption, that performing kernel regression with a ridge that decreases with the size of the training set leads to similar learning curve exponents to those we obtain in the ridgeless case.
LearningGaussianMixtureswithGeneralisedLinear Models: PreciseAsymptoticsinHigh-dimensions
We exemplify our result in two tasks of interest in statistical learning: a) classification for a mixture with sparse means, wherewestudytheefficiencyof `1penaltywithrespectto `2;b)max-marginmulticlass classification, where we characterise the phase transition on the existence ofthemulti-class logistic maximum likelihood estimator forK >2.