AITopics | Mathematical & Statistical Methods

Mathematical & Statistical Methods

What functions can Graph Neural Networks compute on random graphs? The role of Positional Encoding

Neural Information Processing Systems | Oct-10-2024, 14:10:44 GMT

We aim to deepen the theoretical understanding of Graph Neural Networks (GNNs) on large graphs, with a focus on their expressive power. Existing analyses relate this notion to the graph isomorphism problem, which is mostly relevant for graphs of small sizes, or study graph classification or regression tasks, while prediction tasks on nodes are far more relevant on large graphs. Recently, several works showed that, on very general random graph models, GNNs converge to certain functions as the number of nodes grows. In this paper, we provide a more complete and intuitive description of the function space generated by equivariant GNNs for node-tasks, through general notions of convergence that encompass several previous examples. We emphasize the role of input node features, and study the impact of node Positional Encodings (PEs), a recent line of work that has been shown to yield state-of-the-art results in practice. Through the study of several examples of PEs on large random graphs, we extend previously known universality results to significantly more general models. Our theoretical results hint at some normalization tricks, which are shown numerically to have a positive impact on GNN generalization on synthetic and real data.

graph neural network compute, positional encoding, random graph, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.64)

Bayesian Spline Learning for Equation Discovery of Nonlinear Dynamics with Quantified Uncertainty

Neural Information Processing Systems | Oct-10-2024, 12:11:21 GMT

Nonlinear dynamics are ubiquitous in science and engineering applications, but the physics of most complex systems is far from being fully understood. Discovering interpretable governing equations from measurement data can help us understand and predict the behavior of complex dynamic systems. Although extensive work has recently been done in this field, robustly distilling explicit model forms from very sparse data with considerable noise remains intractable. Moreover, quantifying and propagating the uncertainty of the identified system from noisy data is challenging, and relevant literature is still limited. To bridge this gap, we develop a novel Bayesian spline learning framework to identify parsimonious governing equations of nonlinear (spatio)temporal dynamics from sparse, noisy data with quantified uncertainty.

bayesian spline learning, equation discovery, quantified uncertainty, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.64)

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

Neural Information Processing Systems | Oct-10-2024, 11:22:21 GMT

We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for a single layer neural network classifying a high-dimensional Gaussian mixture where each cluster is assigned one of two labels. This problem provides a prototype of a non-convex loss landscape with interpolating regimes and a large generalization gap. We define a particular stochastic process for which SGD can be extended to a continuous-time limit that we call stochastic gradient flow. In the full-batch limit we recover the standard gradient flow. We apply dynamical mean-field theory from statistical physics to track the dynamics of the algorithm in the high-dimensional limit via a self-consistent stochastic process.

dynamical mean-field theory, gaussian mixture classification, stochastic gradient descent, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Impossibility Results for Grammar-Compressed Linear Algebra

Neural Information Processing Systems | Oct-10-2024, 09:15:05 GMT

To handle vast amounts of data, it is natural and popular to compress vectors and matrices. When we compress a vector from size N down to size n ≪ N, it certainly makes it easier to store and transmit efficiently, but does it also make it easier to process? In this paper we consider lossless compression schemes, and ask if we can run our computations on the compressed data as efficiently as if the original data was that small. That is, if an operation has time complexity T(input-size), can we perform it on the compressed representation in time T(n) rather than T(N)? We consider the most basic linear algebra operations: inner product, matrix-vector multiplication, and matrix multiplication.
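To make the question concrete, here is a minimal Python sketch, not taken from the paper, of one easy case: summing the entries of a vector stored as a small grammar, in one pass over the rules and without decompressing. The grammar format, the symbol names, and the helpers `expand` and `length_and_sum` are assumptions made for illustration. The sketch does not address the inner product, matrix-vector, or matrix multiplication operations that the abstract studies.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical grammar format: a rule is either a terminal number
# or the concatenation of two symbols defined earlier.
Rule = Union[float, Tuple[str, str]]


def expand(grammar: Dict[str, Rule], symbol: str) -> List[float]:
    """Decompress a symbol into the full vector (cost grows with N)."""
    rule = grammar[symbol]
    if isinstance(rule, tuple):
        left, right = rule
        return expand(grammar, left) + expand(grammar, right)
    return [rule]


def length_and_sum(grammar: Dict[str, Rule]) -> Dict[str, Tuple[int, float]]:
    """Compute (length, sum of entries) per symbol in one pass over the rules.

    Assumes rules are listed so that every symbol appears after its children.
    """
    stats: Dict[str, Tuple[int, float]] = {}
    for symbol, rule in grammar.items():
        if isinstance(rule, tuple):
            (len_l, sum_l), (len_r, sum_r) = stats[rule[0]], stats[rule[1]]
            stats[symbol] = (len_l + len_r, sum_l + sum_r)
        else:
            stats[symbol] = (1, rule)
    return stats


# Six rules describe a vector with sixteen entries: [3, -1] repeated 8 times.
grammar: Dict[str, Rule] = {
    "a": 3.0,
    "b": -1.0,
    "X1": ("a", "b"),
    "X2": ("X1", "X1"),
    "X3": ("X2", "X2"),
    "X4": ("X3", "X3"),
}

stats = length_and_sum(grammar)       # work proportional to the number of rules
full_vector = expand(grammar, "X4")   # work proportional to the expanded size
```

Each doubling rule adds one entry to the grammar while doubling the expanded length, which is the gap between n and N that the abstract refers to.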

artificial intelligence, grammar-compressed linear algebra, impossibility result, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.97)

Faster Randomized Infeasible Interior Point Methods for Tall/Wide Linear Programs

Neural Information Processing Systems | Oct-10-2024, 09:12:57 GMT

Linear programming (LP) is used in many machine learning applications, such as ℓ1-regularized SVMs, basis pursuit, nonnegative matrix factorization, etc. Interior Point Methods (IPMs) are one of the most popular methods to solve LPs both in theory and in practice. Their underlying complexity is dominated by the cost of solving a system of linear equations at each iteration. In this paper, we consider infeasible IPMs for the special case where the number of variables is much larger than the number of constraints (i.e., wide), or vice-versa (i.e., tall) by taking the dual. Using tools from Randomized Linear Algebra, we present a preconditioning technique that, when combined with the Conjugate Gradient iterative solver, provably guarantees that infeasible IPM algorithms (suitably modified to account for the error incurred by the approximate solver) converge to a feasible, approximately optimal solution, without increasing their iteration complexity. Our empirical evaluations verify our theoretical results on both real and synthetic data.

complexity, randomized infeasible interior point method, tall wide linear program, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics

Neural Information Processing Systems | Oct-10-2024, 06:45:01 GMT

A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information. This is naturally formulated as a reinforcement learning problem under partial observations, where an agent must estimate relevant latent variables in the world from its evidence, anticipate possible future states, and choose actions that optimize total expected reward. This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function. However, animals often appear to behave suboptimally. We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest expected subjective reward according to that flawed model.

agent, inverse rational control, observable continuous nonlinear dynamic, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.37)

Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows

Neural Information Processing Systems | Oct-10-2024, 06:43:07 GMT

Normalizing flows transform a simple base distribution into a complex target distribution and have proved to be powerful models for data generation and density estimation. In this work, we propose a novel type of normalizing flow driven by a differential deformation of the continuous-time Wiener process. As a result, we obtain a rich time series model whose observable process inherits many of the appealing properties of its base process, such as efficient computation of likelihoods and marginals. Furthermore, our continuous treatment provides a natural framework for irregular time series with an independent arrival process, including straightforward interpolation. We illustrate the desirable properties of the proposed model on popular stochastic processes and demonstrate its superior flexibility to variational RNN and latent ODE baselines in a series of experiments on synthetic and real-world data.
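As a rough illustration of why a flow on top of the Wiener process keeps likelihoods cheap, the Python sketch below applies a fixed, hand-picked invertible map F(w, t) = exp(w - t/2) to a Wiener path observed at irregular times and evaluates the exact log-likelihood by change of variables. The map and the function names are assumptions chosen for this example; the model in the paper uses a learned deformation, which is not reproduced here.

```python
import numpy as np


def wiener_log_likelihood(w: np.ndarray, t: np.ndarray) -> float:
    """Log-density of a Wiener path started at W_0 = 0, observed at times t.

    Times must be strictly increasing and positive; increments are
    independent Gaussians with variance equal to the time gap.
    """
    dw = np.diff(np.concatenate(([0.0], w)))
    dt = np.diff(np.concatenate(([0.0], t)))
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * dt) - dw**2 / (2.0 * dt)))


def flow_log_likelihood(x: np.ndarray, t: np.ndarray) -> float:
    """Log-density of x_i = F(w_i, t_i) with the assumed map F(w, t) = exp(w - t/2)."""
    w = np.log(x) + t / 2.0            # invert the map pointwise
    log_det = float(np.sum(np.log(x)))  # dF/dw = exp(w - t/2) = x
    return wiener_log_likelihood(w, t) - log_det


# Irregularly spaced observation times and positive observations.
times = np.array([0.3, 1.0, 2.5])
observations = np.array([0.9, 1.4, 0.7])
log_lik = flow_log_likelihood(observations, times)
```

Because each observation depends only on the base process at the same time, the Jacobian is diagonal and the correction term is a plain sum, whatever the spacing of the observation times.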

dynamic normalizing flow, modeling continuous stochastic process

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)

Minimal Variance Sampling in Stochastic Gradient Boosting

Neural Information Processing Systems | Oct-10-2024, 02:51:08 GMT

Stochastic Gradient Boosting (SGB) is a widely used approach to regularization of boosting models based on decision trees. It was shown that, in many cases, random sampling at each iteration can lead to better generalization performance of the model and can also decrease the learning time. Different sampling approaches were proposed, where probabilities are not uniform, and it is not currently clear which approach is the most effective. In this paper, we formulate the problem of randomization in SGB in terms of optimization of sampling probabilities to maximize the estimation accuracy of split scoring used to train decision trees. This optimization problem has a closed-form nearly optimal solution, and it leads to a new sampling technique, which we call Minimal Variance Sampling (MVS). The method both decreases the number of examples needed for each iteration of boosting and increases the quality of the model significantly as compared to the state-of-the-art sampling methods. The superiority of the algorithm was confirmed by introducing MVS as a new default option for subsampling in CatBoost, a gradient boosting library achieving state-of-the-art quality on various machine learning tasks.

decision tree, minimal variance sampling, probability, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.63)

On the Universality of Graph Neural Networks on Large Random Graphs

Neural Information Processing Systems | Oct-10-2024, 01:34:43 GMT

We study the approximation power of Graph Neural Networks (GNNs) on latent position random graphs. In the large graph limit, GNNs are known to converge to certain "continuous" models known as c-GNNs, which directly enables a study of their approximation power on random graph models. In the absence of input node features however, just as GNNs are limited by the Weisfeiler-Lehman isomorphism test, c-GNNs will be severely limited on simple random graph models. For instance, they will fail to distinguish the communities of a well-separated Stochastic Block Model (SBM) with constant degree function. Thus, we consider recently proposed architectures that augment GNNs with unique node identifiers, referred to as Structural GNNs here (SGNNs).
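The SBM failure case can be seen in a small NumPy sketch. It is a toy stand-in rather than the paper's construction: it uses the expected adjacency of a two-community SBM with equal community sizes (so every node has the same expected degree), a constant input feature, and a few mean-aggregation layers with a `tanh` nonlinearity. The function names, the layer form, and the probabilities 0.8 and 0.1 are assumptions made for illustration.

```python
import numpy as np


def sbm_expected_adjacency(n_per_block: int, p_in: float, p_out: float):
    """Expected adjacency of a two-community SBM with equal-size communities.

    Diagonal entries are kept at p_in for simplicity (kernel-style matrix).
    """
    labels = np.repeat([0, 1], n_per_block)
    same_block = labels[:, None] == labels[None, :]
    return np.where(same_block, p_in, p_out), labels


def mean_aggregation_layers(W: np.ndarray, x: np.ndarray, num_layers: int = 3) -> np.ndarray:
    """Toy message passing: degree-normalized aggregation followed by tanh."""
    degree = W.sum(axis=1, keepdims=True)
    for _ in range(num_layers):
        x = np.tanh((W @ x) / degree)
    return x


# Well-separated communities, yet identical expected degree for every node.
W, labels = sbm_expected_adjacency(n_per_block=50, p_in=0.8, p_out=0.1)
constant_input = np.ones((W.shape[0], 1))   # no informative node features
output = mean_aggregation_layers(W, constant_input)

# The largest gap between any two node outputs is zero up to rounding.
max_gap = float(np.max(output) - np.min(output))
```

Every node aggregates the same value at every layer, so the output carries no information about `labels`; distinguishing the communities requires some extra node-level input, such as the identifiers the abstract mentions.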

graph neural network, random graph model, universality, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.79)

Quasi-Newton Methods for Saddle Point Problems

Neural Information Processing Systems | Oct-10-2024, 01:33:48 GMT

We propose random Broyden family updates, which have an explicit local superlinear convergence rate of $\mathcal{O}\big(\big(1 - 1/(d\varkappa^2)\big)^{k(k-1)/2}\big)$, where $d$ is the dimension of the problem, $\varkappa$ is the condition number and $k$ is the number of iterations. The design and analysis of the proposed algorithm are based on estimating the square of the indefinite Hessian matrix, which is different from classical quasi-Newton methods in convex optimization. We also present two specific Broyden family algorithms with BFGS-type and SR1-type updates, which enjoy the faster local convergence rate of $\mathcal{O}\big(\big(1 - 1/d\big)^{k(k-1)/2}\big)$. Our numerical experiments show that the proposed algorithms outperform classical first-order methods.
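To get a feel for the two rates quoted above, the short Python sketch below evaluates the bracketed factors for given d, condition number, and k. The function names are illustrative, and the constants hidden by the O-notation are ignored, so the numbers only indicate how fast each bound shrinks, not actual error values.

```python
def superlinear_factor(base: float, k: int) -> float:
    """Return base ** (k * (k - 1) / 2), the shape shared by both rates."""
    return base ** (k * (k - 1) / 2)


def general_rate(d: int, kappa: float, k: int) -> float:
    """Factor (1 - 1/(d * kappa^2))^(k(k-1)/2) for the random Broyden family."""
    return superlinear_factor(1.0 - 1.0 / (d * kappa**2), k)


def bfgs_sr1_rate(d: int, k: int) -> float:
    """Factor (1 - 1/d)^(k(k-1)/2) for the BFGS-type and SR1-type updates."""
    return superlinear_factor(1.0 - 1.0 / d, k)


# Compare the two factors for an example dimension and condition number.
d, kappa = 10, 5.0
for k in (5, 10, 20):
    print(k, general_rate(d, kappa, k), bfgs_sr1_rate(d, k))
```

The exponent k(k-1)/2 grows quadratically in k, which is what makes both rates superlinear; the second factor does not depend on the condition number, so it decays much faster when the problem is ill-conditioned.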

convergence rate, quasi-newton method, saddle point problem, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.73)
