Goto

Collaborating Authors

 Mathematical & Statistical Methods


Stochastic Alignments: Matching an Observed Trace to Stochastic Process Models

arXiv.org Artificial Intelligence

Process mining leverages event data extracted from IT systems to generate insights into the business processes of organizations. Such insights benefit from explicitly considering the frequency of behavior in business processes, which is captured by stochastic process models. Given an observed trace and a stochastic process model, conventional alignment-based conformance checking techniques face a fundamental limitation: They prioritize matching the trace to a model path with minimal deviations, which may, however, lead to selecting an unlikely path. In this paper, we study the problem of matching an observed trace to a stochastic process model by identifying a likely model path with a low edit distance to the trace. We phrase this as an optimization problem and develop a heuristic-guided path-finding algorithm to solve it. Our open-source implementation demonstrates the feasibility of the approach and shows that it can provide new, useful diagnostic insights for analysts.


Combining Graph Neural Networks and Mixed Integer Linear Programming for Molecular Inference under the Two-Layered Model

arXiv.org Artificial Intelligence

Recently, a novel two-phase framework named mol-infer for inference of chemical compounds with prescribed abstract structures and desired property values has been proposed. The framework mol-infer is primarily based on using mixed integer linear programming (MILP) to simulate the computational process of machine learning methods and describe the necessary and sufficient conditions to ensure such a chemical graph exists. The existing approaches usually first convert the chemical compounds into handcrafted feature vectors to construct prediction functions, but because of the limit on the kinds of descriptors originated from the need for tractability in the MILP formulation, the learning performances on datasets of some properties are not good enough. A lack of good learning performance can greatly lower the quality of the inferred chemical graphs, and thus improving learning performance is of great importance. On the other hand, graph neural networks (GNN) offer a promising machine learning method to directly utilize the chemical graphs as the input, and many existing GNN-based approaches to the molecular property prediction problem have shown that they can enjoy better learning performances compared to the traditional approaches that are based on feature vectors. In this study, we develop a molecular inference framework based on mol-infer, namely mol-infer-GNN, that utilizes GNN as the learning method while keeping the great flexibility originated from the two-layered model on the abstract structure of the chemical graph to be inferred. We conducted computational experiments on the QM9 dataset to show that our proposed GNN model can obtain satisfying learning performances for some properties despite its simple structure, and can infer small chemical graphs comprising up to 20 non-hydrogen atoms within reasonable computational time.


A note on the unique properties of the Kullback--Leibler divergence for sampling via gradient flows

arXiv.org Artificial Intelligence

Sampling from a target probability distribution whose density is known up to a normalisation constant is a fundamental task in computational statistics and machine learning. A natural way to formulate this task is optimisation of a functional measuring the dissimilarity to the target probability distribution. Following this point of view, one can derive many popular sampling frameworks including variational inference [Blei et al., 2017], algorithms based on diffusions [Roberts and Tweedie, 1996, Durmus et al., 2019] and deterministic flows [Liu, 2017], and algorithms based on importance sampling [Chopin et al., 2024, Crucinio and Pathiraja, 2025]. The connection between minimisation of a divergence and Monte Carlo algorithms is established through gradient flows over the space of probability measures (see, e.g., Chewi et al. [2025], Carrillo et al. [2024] for a recent review); with different metrics over this space leading to different differential equations whose discretisations correspond to many popular Monte Carlo algorithms. The most widely used divergence is the reverse Kullback-Leibler (KL) divergence whose gradient flow w.r.t. the Wasserstein-2 metric can be implemented by a Langevin diffusion [Jordan et al., 1998] and easily discretised in time, resulting in the Unadjusted Langevin algorithm [Roberts and Tweedie, 1996].


Duality and Policy Evaluation in Distributionally Robust Bayesian Diffusion Control

arXiv.org Machine Learning

We consider a Bayesian diffusion control problem of expected terminal utility maximization. The controller imposes a prior distribution on the unknown drift of an underlying diffusion. The Bayesian optimal control, tracking the posterior distribution of the unknown drift, can be characterized explicitly. However, in practice, the prior will generally be incorrectly specified, and the degree of model misspecification can have a significant impact on policy performance. To mitigate this and reduce overpessimism, we introduce a distributionally robust Bayesian control (DRBC) formulation in which the controller plays a game against an adversary who selects a prior in divergence neighborhood of a baseline prior. The adversarial approach has been studied in economics and efficient algorithms have been proposed in static optimization settings. We develop a strong duality result for our DRBC formulation. Combining these results together with tools from stochastic analysis, we are able to derive a loss that can be efficiently trained (as we demonstrate in our numerical experiments) using a suitable neural network architecture. As a result, we obtain an effective algorithm for computing the DRBC optimal strategy. The methodology for computing the DRBC optimal strategy is greatly simplified, as we show, in the important case in which the adversary chooses a prior from a Kullback-Leibler distributional uncertainty set.


On Universality of Non-Separable Approximate Message Passing Algorithms

arXiv.org Artificial Intelligence

Mean-field characterizations of first-order iterative algorithms -- including Approximate Message Passing (AMP), stochastic and proximal gradient descent, and Langevin diffusions -- have enabled a precise understanding of learning dynamics in many statistical applications. For algorithms whose non-linearities have a coordinate-separable form, it is known that such characterizations enjoy a degree of universality with respect to the underlying data distribution. However, mean-field characterizations of non-separable algorithm dynamics have largely remained restricted to i.i.d. Gaussian or rotationally-invariant data. In this work, we initiate a study of universality for non-separable AMP algorithms. We identify a general condition for AMP with polynomial non-linearities, in terms of a Bounded Composition Property (BCP) for their representing tensors, to admit a state evolution that holds universally for matrices with non-Gaussian entries. We then formalize a condition of BCP-approximability for Lipschitz AMP algorithms to enjoy a similar universal guarantee. We demonstrate that many common classes of non-separable non-linearities are BCP-approximable, including local denoisers, spectral denoisers for generic signals, and compositions of separable functions with generic linear maps, implying the universality of state evolution for AMP algorithms employing these non-linearities.


Training Flexible Models of Genetic Variant Effects from Functional Annotations using Accelerated Linear Algebra

arXiv.org Artificial Intelligence

To understand how genetic variants in human genomes manifest in phenotypes -- traits like height or diseases like asthma -- geneticists have sequenced and measured hundreds of thousands of individuals. Geneticists use this data to build models that predict how a genetic variant impacts phenotype given genomic features of the variant, like DNA accessibility or the presence of nearby DNA-bound proteins. As more data and features become available, one might expect predictive models to improve. Unfortunately, training these models is bottlenecked by the need to solve expensive linear algebra problems because variants in the genome are correlated with nearby variants, requiring inversion of large matrices. Previous methods have therefore been restricted to fitting small models, and fitting simplified summary statistics, rather than the full likelihood of the statistical model. In this paper, we leverage modern fast linear algebra techniques to develop DeepWAS (Deep genome Wide Association Studies), a method to train large and flexible neural network predictive models to optimize likelihood. Notably, we find that larger models only improve performance when using our full likelihood approach; when trained by fitting traditional summary statistics, larger models perform no better than small ones. We find larger models trained on more features make better predictions, potentially improving disease predictions and therapeutic target identification.


Proving the Limited Scalability of Centralized Distributed Optimization via a New Lower Bound Construction

arXiv.org Artificial Intelligence

We consider centralized distributed optimization in the classical federated learning setup, where $n$ workers jointly find an $\varepsilon$-stationary point of an $L$-smooth, $d$-dimensional nonconvex function $f$, having access only to unbiased stochastic gradients with variance $σ^2$. Each worker requires at most $h$ seconds to compute a stochastic gradient, and the communication times from the server to the workers and from the workers to the server are $τ_{s}$ and $τ_{w}$ seconds per coordinate, respectively. One of the main motivations for distributed optimization is to achieve scalability with respect to $n$. For instance, it is well known that the distributed version of SGD has a variance-dependent runtime term $\frac{h σ^2 L Δ}{n \varepsilon^2},$ which improves with the number of workers $n,$ where $Δ= f(x^0) - f^*,$ and $x^0 \in R^d$ is the starting point. Similarly, using unbiased sparsification compressors, it is possible to reduce both the variance-dependent runtime term and the communication runtime term. However, once we account for the communication from the server to the workers $τ_{s}$, we prove that it becomes infeasible to design a method using unbiased random sparsification compressors that scales both the server-side communication runtime term $τ_{s} d \frac{L Δ}{\varepsilon}$ and the variance-dependent runtime term $\frac{h σ^2 L Δ}{\varepsilon^2},$ better than poly-logarithmically in $n$, even in the homogeneous (i.i.d.) case, where all workers access the same distribution. To establish this result, we construct a new "worst-case" function and develop a new lower bound framework that reduces the analysis to the concentration of a random sum, for which we prove a concentration bound. These results reveal fundamental limitations in scaling distributed optimization, even under the homogeneous assumption.


Design of A* based heuristic algorithm for efficient interdiction in multi-Layer networks

arXiv.org Artificial Intelligence

Intercepting a criminal using limited police resources presents a significant challenge in dynamic crime environments, where the criminal's location continuously changes over time. The complexity is further heightened by the vastness of the transportation network. To tackle this problem, we propose a layered graph representation, in which each time step is associated with a duplicate of the transportation network. For any given set of attacker strategies, a near-optimal defender strategy is computed using the A-Star heuristic algorithm applied to the layered graph. The defender's goal is to maximize the probability of successful interdiction. We evaluate the performance of the proposed method by comparing it with a Mixed-Integer Linear Programming (MILP) approach used for the defender. The comparison considers both computational efficiency and solution quality. The results demonstrate that our approach effectively addresses the complexity of the problem and delivers high-quality solutions within a short computation time.


Scalable Machine Learning Algorithms using Path Signatures

arXiv.org Machine Learning

The interface between stochastic analysis and machine learning is a rapidly evolving field, with path signatures - iterated integrals that provide faithful, hierarchical representations of paths - offering a principled and universal feature map for sequential and structured data. Rooted in rough path theory, path signatures are invariant to reparameterization and well-suited for modelling evolving dynamics, long-range dependencies, and irregular sampling - common challenges in real-world time series and graph data. This thesis investigates how to harness the expressive power of path signatures within scalable machine learning pipelines. It introduces a suite of models that combine theoretical robustness with computational efficiency, bridging rough path theory with probabilistic modelling, deep learning, and kernel methods. Key contributions include: Gaussian processes with signature kernel-based covariance functions for uncertainty-aware time series modelling; the Seq2Tens framework, which employs low-rank tensor structure in the weight space for scalable deep modelling of long-range dependencies; and graph-based models where expected signatures over graphs induce hypo-elliptic diffusion processes, offering expressive yet tractable alternatives to standard graph neural networks. Further developments include Random Fourier Signature Features, a scalable kernel approximation with theoretical guarantees, and Recurrent Sparse Spectrum Signature Gaussian Processes, which combine Gaussian processes, signature kernels, and random features with a principled forgetting mechanism for multi-horizon time series forecasting with adaptive context length. We hope this thesis serves as both a methodological toolkit and a conceptual bridge, and provides a useful reference for the current state of the art in scalable, signature-based learning for sequential and structured data.


Convergent Methods for Koopman Operators on Reproducing Kernel Hilbert Spaces

arXiv.org Machine Learning

Data-driven spectral analysis of Koopman operators is a powerful tool for understanding numerous real-world dynamical systems, from neuronal activity to variations in sea surface temperature. The Koopman operator acts on a function space and is most commonly studied on the space of square-integrable functions. However, defining it on a suitable reproducing kernel Hilbert space (RKHS) offers numerous practical advantages, including pointwise predictions with error bounds, improved spectral properties that facilitate computations, and more efficient algorithms, particularly in high dimensions. We introduce the first general, provably convergent, data-driven algorithms for computing spectral properties of Koopman and Perron--Frobenius operators on RKHSs. These methods efficiently compute spectra and pseudospectra with error control and spectral measures while exploiting the RKHS structure to avoid the large-data limits required in the $L^2$ settings. The function space is determined by a user-specified kernel, eliminating the need for quadrature-based sampling as in $L^2$ and enabling greater flexibility with finite, externally provided datasets. Using the Solvability Complexity Index hierarchy, we construct adversarial dynamical systems for these problems to show that no algorithm can succeed in fewer limits, thereby proving the optimality of our algorithms. Notably, this impossibility extends to randomized algorithms and datasets. We demonstrate the effectiveness of our algorithms on challenging, high-dimensional datasets arising from real-world measurements and high-fidelity numerical simulations, including turbulent channel flow, molecular dynamics of a binding protein, Antarctic sea ice concentration, and Northern Hemisphere sea surface height. The algorithms are publicly available in the software package $\texttt{SpecRKHS}$.