Mathematical & Statistical Methods
Stochastic Processes with Modified Lognormal Distribution Featuring Flexible Upper Tail
Hristopulos, Dionissios T., Baxevani, Anastassia, Kaniadakis, Giorgio
Asymmetric, non-Gaussian probability distributions are often observed in the analysis of natural and engineering datasets. The lognormal distribution is a standard model for data with skewed frequency histograms and fat tails. However, the lognormal law severely restricts the asymptotic dependence of the probability density and the hazard function for high values. Herein we present a family of three-parameter non-Gaussian probability density functions that are based on generalized kappa-exponential and kappa-logarithm functions and investigate its mathematical properties. These kappa-lognormal densities represent continuous deformations of the lognormal with lighter right tails, controlled by the parameter kappa. In addition, bimodal distributions are obtained for certain parameter combinations. We derive closed-form analytic expressions for the main statistical functions of the kappa-lognormal distribution. For the moments, we derive bounds that are based on hypergeometric functions as well as series expansions. Explicit expressions for the gradient and Hessian of the negative log-likelihood are obtained to facilitate numerical maximum-likelihood estimates of the kappa-lognormal parameters from data. We also formulate a joint probability density function for kappa-lognormal stochastic processes by applying Jacobi's multivariate theorem to a latent Gaussian process. Estimation of the kappa-lognormal distribution based on synthetic and real data is explored. Furthermore, we investigate applications of kappa-lognormal processes with different covariance kernels in time series forecasting and spatial interpolation using warped Gaussian process regression. Our results are of practical interest for modeling skewed distributions in various scientific and engineering fields.
Panda: A pretrained forecast model for universal representation of chaotic dynamics
Lai, Jeffrey, Bao, Anthony, Gilpin, William
Chaotic systems are intrinsically sensitive to small errors, challenging efforts to construct predictive data-driven models of real-world dynamical systems such as fluid flows or neuronal activity. Prior efforts comprise either specialized models trained separately on individual time series, or foundation models trained on vast time series databases with little underlying dynamical structure. Motivated by dynamical systems theory, we present Panda, Patched Attention for Nonlinear DynAmics. We train Panda on a novel synthetic, extensible dataset of $2 \times 10^4$ chaotic dynamical systems that we discover using an evolutionary algorithm. Trained purely on simulated data, Panda exhibits emergent properties: zero-shot forecasting of unseen real world chaotic systems, and nonlinear resonance patterns in cross-channel attention heads. Despite having been trained only on low-dimensional ordinary differential equations, Panda spontaneously develops the ability to predict partial differential equations without retraining. We demonstrate a neural scaling law for differential equations, underscoring the potential of pretrained models for probing abstract mathematical domains like nonlinear dynamics.
Efficient Optimization with Orthogonality Constraint: a Randomized Riemannian Submanifold Method
Han, Andi, Poirion, Pierre-Louis, Takeda, Akiko
Optimization with orthogonality constraints frequently arises in various fields such as machine learning. Riemannian optimization offers a powerful framework for solving these problems by equipping the constraint set with a Riemannian manifold structure and performing optimization intrinsically on the manifold. This approach typically involves computing a search direction in the tangent space and updating variables via a retraction operation. However, as the size of the variables increases, the computational cost of the retraction can become prohibitively high, limiting the applicability of Riemannian optimization to large-scale problems. To address this challenge and enhance scalability, we propose a novel approach that restricts each update on a random submanifold, thereby significantly reducing the per-iteration complexity. We introduce two sampling strategies for selecting the random submanifolds and theoretically analyze the convergence of the proposed methods. We provide convergence results for general nonconvex functions and functions that satisfy Riemannian Polyak-Lojasiewicz condition as well as for stochastic optimization settings. Additionally, we demonstrate how our approach can be generalized to quotient manifolds derived from the orthogonal manifold. Extensive experiments verify the benefits of the proposed method, across a wide variety of problems.
The Stochastic Occupation Kernel (SOCK) Method for Learning Stochastic Differential Equations
Wells, Michael L., Lahouel, Kamel, Jedynak, Bruno
We present a novel kernel-based method for learning multivariate stochastic differential equations (SDEs). The method follows a two-step procedure: we first estimate the drift term function, then the (matrix-valued) diffusion function given the drift. Occupation kernels are integral functionals on a reproducing kernel Hilbert space (RKHS) that aggregate information over a trajectory. Our approach leverages vector-valued occupation kernels for estimating the drift component of the stochastic process. For diffusion estimation, we extend this framework by introducing operator-valued occupation kernels, enabling the estimation of an auxiliary matrix-valued function as a positive semi-definite operator, from which we readily derive the diffusion estimate. This enables us to avoid common challenges in SDE learning, such as intractable likelihoods, by optimizing a reconstruction-error-based objective. We propose a simple learning procedure that retains strong predictive accuracy while using Fenchel duality to promote efficiency. We validate the method on simulated benchmarks and a real-world dataset of Amyloid imaging in healthy and Alzheimer's disease (AD) subjects.
RGNMR: A Gauss-Newton method for robust matrix completion with theoretical guarantees
Laufer, Eilon Vaknin, Nadler, Boaz
Recovering a low rank matrix from a subset of its entries, some of which may be corrupted, is known as the robust matrix completion (RMC) problem. Existing RMC methods have several limitations: they require a relatively large number of observed entries; they may fail under overparametrization, when their assumed rank is higher than the correct one; and many of them fail to recover even mildly ill-conditioned matrices. In this paper we propose a novel RMC method, denoted $\texttt{RGNMR}$, which overcomes these limitations. $\texttt{RGNMR}$ is a simple factorization-based iterative algorithm, which combines a Gauss-Newton linearization with removal of entries suspected to be outliers. On the theoretical front, we prove that under suitable assumptions, $\texttt{RGNMR}$ is guaranteed exact recovery of the underlying low rank matrix. Our theoretical results improve upon the best currently known for factorization-based methods. On the empirical front, we show via several simulations the advantages of $\texttt{RGNMR}$ over existing RMC methods, and in particular its ability to handle a small number of observed entries, overparameterization of the rank and ill-conditioned matrices.
Fast and Simple Densest Subgraph with Predictions
We study the densest subgraph problem and its variants through the lens of learning-augmented algorithms. For this problem, the greedy algorithm by Charikar (APPROX 2000) provides a linear-time $ 1/2 $-approximation, while computing the exact solution typically requires solving a linear program or performing maximum flow computations.We show that given a partial solution, i.e., one produced by a machine learning classifier that captures at least a $ (1 - ฮต) $-fraction of nodes in the optimal subgraph, it is possible to design an extremely simple linear-time algorithm that achieves a provable $ (1 - ฮต) $-approximation. Our approach also naturally extends to the directed densest subgraph problem and several NP-hard variants.An experiment on the Twitch Ego Nets dataset shows that our learning-augmented algorithm outperforms Charikar's greedy algorithm and a baseline that directly returns the predicted densest subgraph without additional algorithmic processing.
Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations
Zheleznov, Victor, Bilbao, Stefan, Wright, Alec, King, Simon
Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are possible in order to handle geometric nonlinearities. One such case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important effects including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning approaches (in particular neural ordinary differential equations) has been used to model lumped dynamic systems such as electronic circuits automatically from data. In this work, we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution for linear vibration of system's modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the network architecture. As an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
Scalable Computations for Generalized Mixed Effects Models with Crossed Random Effects Using Krylov Subspace Methods
Kรผndig, Pascal, Sigrist, Fabio
Mixed effects models are widely used for modeling data with hierarchically grouped structures and high-cardinality categorical predictor variables. However, for high-dimensional crossed random effects, current standard computations relying on Cholesky decompositions can become prohibitively slow. In this work, we present novel Krylov subspace-based methods that address several existing computational bottlenecks. Among other things, we theoretically analyze and empirically evaluate various preconditioners for the conjugate gradient and stochastic Lanczos quadrature methods, derive new convergence results, and develop computationally efficient methods for calculating predictive variances. Extensive experiments using simulated and real-world data sets show that our proposed methods scale much better than Cholesky-based computations, for instance, achieving a runtime reduction of approximately two orders of magnitudes for both estimation and prediction. Moreover, our software implementation is up to 10'000 times faster and more stable than state-of-the-art implementations such as lme4 and glmmTMB when using default settings. Our methods are implemented in the free C++ software library GPBoost with high-level Python and R packages.
A stochastic gradient method for trilevel optimization
Giovannelli, Tommaso, Kent, Griffin Dean, Vicente, Luis Nunes
With the success that the field of bilevel optimization has seen in recent years, similar methodologies have started being applied to solving more difficult applications that arise in trilevel optimization. At the helm of these applications are new machine learning formulations that have been proposed in the trilevel context and, as a result, efficient and theoretically sound stochastic methods are required. In this work, we propose the first-ever stochastic gradient descent method for solving unconstrained trilevel optimization problems and provide a convergence theory that covers all forms of inexactness of the trilevel adjoint gradient, such as the inexact solutions of the middle-level and lower-level problems, inexact computation of the trilevel adjoint formula, and noisy estimates of the gradients, Hessians, Jacobians, and tensors of third-order derivatives involved. We also demonstrate the promise of our approach by providing numerical results on both synthetic trilevel problems and trilevel formulations for hyperparameter adversarial tuning.
BURNS: Backward Underapproximate Reachability for Neural-Feedback-Loop Systems
Sidrane, Chelsea, Tumova, Jana
Learning-enabled planning and control algorithms are increasingly popular, but they often lack rigorous guarantees of performance or safety. We introduce an algorithm for computing underapproximate backward reachable sets of nonlinear discrete time neural feedback loops. We then use the backward reachable sets to check goal-reaching properties. Our algorithm is based on overapproximating the system dynamics function to enable computation of underapproximate backward reachable sets through solutions of mixed-integer linear programs. We rigorously analyze the soundness of our algorithm and demonstrate it on a numerical example. Our work expands the class of properties that can be verified for learning-enabled systems.