AITopics | Mathematical & Statistical Methods

Collaborating Authors

Mathematical & Statistical Methods

News Overviews Instructional Materials AI-Alerts Classics

A note on the unique properties of the Kullback--Leibler divergence for sampling via gradient flows

arXiv.org Artificial IntelligenceJul-8-2025

Sampling from a target probability distribution whose density is known up to a normalisation constant is a fundamental task in computational statistics and machine learning. A natural way to formulate this task is optimisation of a functional measuring the dissimilarity to the target probability distribution. Following this point of view, one can derive many popular sampling frameworks including variational inference [Blei et al., 2017], algorithms based on diffusions [Roberts and Tweedie, 1996, Durmus et al., 2019] and deterministic flows [Liu, 2017], and algorithms based on importance sampling [Chopin et al., 2024, Crucinio and Pathiraja, 2025]. The connection between minimisation of a divergence and Monte Carlo algorithms is established through gradient flows over the space of probability measures (see, e.g., Chewi et al. [2025], Carrillo et al. [2024] for a recent review); with different metrics over this space leading to different differential equations whose discretisations correspond to many popular Monte Carlo algorithms. The most widely used divergence is the reverse Kullback-Leibler (KL) divergence whose gradient flow w.r.t. the Wasserstein-2 metric can be implemented by a Langevin diffusion [Jordan et al., 1998] and easily discretised in time, resulting in the Unadjusted Langevin algorithm [Roberts and Tweedie, 1996].

artificial intelligence, divergence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2507.0433

Country:

Asia > Middle East > Jordan (0.25)
Europe > Italy (0.15)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Add feedback

Duality and Policy Evaluation in Distributionally Robust Bayesian Diffusion Control

Blanchet, Jose, Cheng, Jiayi, Liu, Hao, Liu, Yang

arXiv.org Machine LearningJul-2-2025

We consider a Bayesian diffusion control problem of expected terminal utility maximization. The controller imposes a prior distribution on the unknown drift of an underlying diffusion. The Bayesian optimal control, tracking the posterior distribution of the unknown drift, can be characterized explicitly. However, in practice, the prior will generally be incorrectly specified, and the degree of model misspecification can have a significant impact on policy performance. To mitigate this and reduce overpessimism, we introduce a distributionally robust Bayesian control (DRBC) formulation in which the controller plays a game against an adversary who selects a prior in divergence neighborhood of a baseline prior. The adversarial approach has been studied in economics and efficient algorithms have been proposed in static optimization settings. We develop a strong duality result for our DRBC formulation. Combining these results together with tools from stochastic analysis, we are able to derive a loss that can be efficiently trained (as we demonstrate in our numerical experiments) using a suitable neural network architecture. As a result, we obtain an effective algorithm for computing the DRBC optimal strategy. The methodology for computing the DRBC optimal strategy is greatly simplified, as we show, in the important case in which the adversary chooses a prior from a Kullback-Leibler distributional uncertainty set.

artificial intelligence, machine learning, nullnull, (19 more...)

arXiv.org Machine Learning

2506.19294

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
(8 more...)

Genre:

Research Report (1.00)
Overview (0.67)

Industry: Banking & Finance > Trading (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Add feedback

On Universality of Non-Separable Approximate Message Passing Algorithms

Lovig, Max, Wang, Tianhao, Fan, Zhou

arXiv.org Artificial IntelligenceJul-1-2025

Mean-field characterizations of first-order iterative algorithms -- including Approximate Message Passing (AMP), stochastic and proximal gradient descent, and Langevin diffusions -- have enabled a precise understanding of learning dynamics in many statistical applications. For algorithms whose non-linearities have a coordinate-separable form, it is known that such characterizations enjoy a degree of universality with respect to the underlying data distribution. However, mean-field characterizations of non-separable algorithm dynamics have largely remained restricted to i.i.d. Gaussian or rotationally-invariant data. In this work, we initiate a study of universality for non-separable AMP algorithms. We identify a general condition for AMP with polynomial non-linearities, in terms of a Bounded Composition Property (BCP) for their representing tensors, to admit a state evolution that holds universally for matrices with non-Gaussian entries. We then formalize a condition of BCP-approximability for Lipschitz AMP algorithms to enjoy a similar universal guarantee. We demonstrate that many common classes of non-separable non-linearities are BCP-approximable, including local denoisers, spectral denoisers for generic signals, and compositions of separable functions with generic linear maps, implying the universality of state evolution for AMP algorithms employing these non-linearities.

artificial intelligence, machine learning, universality, (15 more...)

arXiv.org Artificial Intelligence

2506.2301

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra

Amin, Alan N., Potapczynski, Andres, Wilson, Andrew Gordon

arXiv.org Artificial IntelligenceJul-1-2025

To understand how genetic variants in human genomes manifest in phenotypes -- traits like height or diseases like asthma -- geneticists have sequenced and measured hundreds of thousands of individuals. Geneticists use this data to build models that predict how a genetic variant impacts phenotype given genomic features of the variant, like DNA accessibility or the presence of nearby DNA-bound proteins. As more data and features become available, one might expect predictive models to improve. Unfortunately, training these models is bottlenecked by the need to solve expensive linear algebra problems because variants in the genome are correlated with nearby variants, requiring inversion of large matrices. Previous methods have therefore been restricted to fitting small models, and fitting simplified summary statistics, rather than the full likelihood of the statistical model. In this paper, we leverage modern fast linear algebra techniques to develop DeepWAS (Deep genome Wide Association Studies), a method to train large and flexible neural network predictive models to optimize likelihood. Notably, we find that larger models only improve performance when using our full likelihood approach; when trained by fitting traditional summary statistics, larger models perform no better than small ones. We find larger models trained on more features make better predictions, potentially improving disease predictions and therapeutic target identification.

artificial intelligence, chip-seq, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.19598

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

Tyurin, Alexander

arXiv.org Artificial IntelligenceJul-1-2025

We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $σ^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $τ_{s}$ and $τ_{w}$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h σ^2 L Δ}{n \varepsilon^2},$ which improves with the number of workers $n,$ where $Δ= f(x^0) - f^*,$ and $x^0 \in R^d$ is the starting point. Similarly, using unbiased sparsification compressors, it is possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once we account for the communication from the server to the workers $τ_{s}$, we prove that it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $τ_{s} d \frac{L Δ}{\varepsilon}$ and the variance-dependent runtime term $\frac{h σ^2 L Δ}{\varepsilon^2},$ better than poly-logarithmically in $n$, even in the homogeneous (i.i.d.) case, where all workers access the same distribution. To establish this result, we construct a new "worst-case" function and develop a new lower bound framework that reduces the analysis to the concentration of a random sum, for which we prove a concentration bound. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.

artificial intelligence, machine learning, server, (15 more...)

arXiv.org Artificial Intelligence

2506.23836

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.55)

Add feedback

Design of A* based heuristic algorithm for efficient interdiction in multi-Layer networks

Samanta, Sukanya

arXiv.org Artificial IntelligenceJun-30-2025

Intercepting a criminal using limited police resources presents a significant challenge in dynamic crime environments, where the criminal's location continuously changes over time. The complexity is further heightened by the vastness of the transportation network. To tackle this problem, we propose a layered graph representation, in which each time step is associated with a duplicate of the transportation network. For any given set of attacker strategies, a near-optimal defender strategy is computed using the A-Star heuristic algorithm applied to the layered graph. The defender's goal is to maximize the probability of successful interdiction. We evaluate the performance of the proposed method by comparing it with a Mixed-Integer Linear Programming (MILP) approach used for the defender. The comparison considers both computational efficiency and solution quality. The results demonstrate that our approach effectively addresses the complexity of the problem and delivers high-quality solutions within a short computation time.

artificial intelligence, defender, node, (16 more...)

arXiv.org Artificial Intelligence

2506.10017

Country: Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)

Genre: Research Report (0.70)

Industry:

Information Technology (0.68)
Transportation > Infrastructure & Services (0.57)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.47)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)

Add feedback

Scalable Machine Learning Algorithms using Path Signatures

Tóth, Csaba

arXiv.org Machine LearningJun-26-2025

The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures - iterated integrals that provide faithful, hierarchical representations of paths - offering a principled and universal feature map for sequential and structured data. Rooted in rough path theory, path signatures are invariant to reparameterization and well-suited for modelling evolving dynamics, long-range dependencies, and irregular sampling - common challenges in real-world time series and graph data. This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines. It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods. Key contributions include: Gaussian processes with signature kernel-based covariance functions for uncertainty-aware time series modelling; the Seq2Tens framework, which employs low-rank tensor structure in the weight space for scalable deep modelling of long-range dependencies; and graph-based models where expected signatures over graphs induce hypo-elliptic diffusion processes, offering expressive yet tractable alternatives to standard graph neural networks. Further developments include Random Fourier Signature Features, a scalable kernel approximation with theoretical guarantees, and Recurrent Sparse Spectrum Signature Gaussian Processes, which combine Gaussian processes, signature kernels, and random features with a principled forgetting mechanism for multi-horizon time series forecasting with adaptive context length. We hope this thesis serves as both a methodological toolkit and a conceptual bridge, and provides a useful reference for the current state of the art in scalable, signature-based learning for sequential and structured data.

artificial intelligence, bayesian inference, deep learning, (13 more...)

arXiv.org Machine Learning

2506.17634

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.13)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)
Research Report > New Finding (0.45)

Industry:

Information Technology (1.00)
Energy (1.00)
Education (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Add feedback

Convergent Methods for Koopman Operators on Reproducing Kernel Hilbert Spaces

Boullé, Nicolas, Colbrook, Matthew J., Conradie, Gustav

arXiv.org Machine LearningJun-23-2025

Data-driven spectral analysis of Koopman operators is a powerful tool for understanding numerous real-world dynamical systems, from neuronal activity to variations in sea surface temperature. The Koopman operator acts on a function space and is most commonly studied on the space of square-integrable functions. However, defining it on a suitable reproducing kernel Hilbert space (RKHS) offers numerous practical advantages, including pointwise predictions with error bounds, improved spectral properties that facilitate computations, and more efficient algorithms, particularly in high dimensions. We introduce the first general, provably convergent, data-driven algorithms for computing spectral properties of Koopman and Perron--Frobenius operators on RKHSs. These methods efficiently compute spectra and pseudospectra with error control and spectral measures while exploiting the RKHS structure to avoid the large-data limits required in the $L^2$ settings. The function space is determined by a user-specified kernel, eliminating the need for quadrature-based sampling as in $L^2$ and enabling greater flexibility with finite, externally provided datasets. Using the Solvability Complexity Index hierarchy, we construct adversarial dynamical systems for these problems to show that no algorithm can succeed in fewer limits, thereby proving the optimality of our algorithms. Notably, this impossibility extends to randomized algorithms and datasets. We demonstrate the effectiveness of our algorithms on challenging, high-dimensional datasets arising from real-world measurements and high-fidelity numerical simulations, including turbulent channel flow, molecular dynamics of a binding protein, Antarctic sea ice concentration, and Northern Hemisphere sea surface height. The algorithms are publicly available in the software package $\texttt{SpecRKHS}$.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

2506.15782

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Malaysia (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Verifiable Safety Q-Filters via Hamilton-Jacobi Reachability and Multiplicative Q-Networks

Li, Jiaxing, Hu, Hanjiang, Yang, Yujie, Liu, Changliu

arXiv.org Artificial IntelligenceJun-23-2025

-- Recent learning-based safety filters have outperformed conventional methods, such as hand-crafted Control Barrier Functions (CBFs), by effectively adapting to complex constraints. However, these learning-based approaches lack formal safety guarantees. In this work, we introduce a verifiable model-free safety filter based on Hamilton-Jacobi reachability analysis. Our primary contributions include: 1) extending verifiable self-consistency properties for Q value functions, 2) proposing a multiplicative Q-network structure to mitigate zero-sublevel-set shrinkage issues, and 3) developing a verification pipeline capable of soundly verifying these self-consistency properties. Our proposed approach successfully synthesizes formally verified, model-free safety certificates across four standard safe-control benchmarks.

artificial intelligence, machine learning, model-free safety filter, (15 more...)

arXiv.org Artificial Intelligence

2506.15693

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

Add feedback

Identifiability Challenges in Sparse Linear Ordinary Differential Equations

Casolo, Cecilia, Becker, Sören, Kilbertus, Niki

arXiv.org Artificial IntelligenceJun-13-2025

Dynamical systems modeling is a core pillar of scientific inquiry across natural and life sciences. Increasingly, dynamical system models are learned from data, rendering identifiability a paramount concept. For systems that are not identifiable from data, no guarantees can be given about their behavior under new conditions and inputs, or about possible control mechanisms to steer the system. It is known in the community that "linear ordinary differential equations (ODE) are almost surely identifiable from a single trajectory." However, this only holds for dense matrices. The sparse regime remains underexplored, despite its practical relevance with sparsity arising naturally in many biological, social, and physical systems. In this work, we address this gap by characterizing the identifiability of sparse linear ODEs. Contrary to the dense case, we show that sparse systems are unidentifiable with a positive probability in practically relevant sparsity regimes and provide lower bounds for this probability. We further study empirically how this theoretical unidentifiability manifests in state-of-the-art methods to estimate linear ODEs from data. Our results corroborate that sparse systems are also practically unidentifiable. Theoretical limitations are not resolved through inductive biases or optimization dynamics. Our findings call for rethinking what can be expected from data-driven dynamical system modeling and allows for quantitative assessments of how much to trust a learned linear ODE.

artificial intelligence, machine learning, trajectory, (19 more...)

arXiv.org Artificial Intelligence

2506.09816

Country:

Europe (0.46)
North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.94)
Health & Medicine > Epidemiology (0.93)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)
(2 more...)

Add feedback