Goto

Collaborating Authors

 Mathematical & Statistical Methods


Parallelized Midpoint Randomization for Langevin Monte Carlo

arXiv.org Artificial Intelligence

We explore the sampling problem within the framework where parallel evaluations of the gradient of the log-density are feasible. Our investigation focuses on target distributions characterized by smooth and strongly log-concave densities. We revisit the parallelized randomized midpoint method and employ proof techniques recently developed for analyzing its purely sequential version. Leveraging these techniques, we derive upper bounds on the Wasserstein distance between the sampling and target densities. These bounds quantify the runtime improvement achieved by utilizing parallel processing units, which can be considerable.


Rapid Bayesian identification of sparse nonlinear dynamics from scarce and noisy data

arXiv.org Machine Learning

The pursuit of direct model equation discovery has been an ongoing and significant area of interest in scientific machine learning. The popular sparse identification of nonlinear dynamics (SINDy) framework [1] offers a promising approach to extract parsimonious equations directly from data. SINDy's promotion of parsimony by sparse regression allows for the identification of an interpretable model that balances accuracy with generalizability, while its simplicity leads to a relatively efficient and fast learning process compared to other machine learning techniques. The framework has been successfully applied in a variety of applications, such as model idenficiation in plasma physics [2], control engineering [3, 4], biological transport problems [5], socio-cognitive systems [6], epidemiology [7, 8] and turbulence modelling [9]. Furthermore, its remarkable extendibility has attracted a range of modifications, including the adaptation to discover partial differential equations [10], the extension to libraries of rational functions [11], the integration of ensembling techniques to improve data efficiency [12] and the use of weak formulations [13, 14] to avoid noise amplification when computing derivatives from discrete data. One major difficulty in using scientific machine learning methods in fields such as biophysics, ecology, and microbiology, is that measured data from these fields is often noisy and scarce.


Generative Modelling with Tensor Train approximations of Hamilton--Jacobi--Bellman equations

arXiv.org Machine Learning

Sampling from probability densities is a common challenge in fields such as Uncertainty Quantification (UQ) and Generative Modelling (GM). In GM in particular, the use of reverse-time diffusion processes depending on the log-densities of Ornstein-Uhlenbeck forward processes are a popular sampling tool. In Berner et al. [2022] the authors point out that these log-densities can be obtained by solution of a \textit{Hamilton-Jacobi-Bellman} (HJB) equation known from stochastic optimal control. While this HJB equation is usually treated with indirect methods such as policy iteration and unsupervised training of black-box architectures like Neural Networks, we propose instead to solve the HJB equation by direct time integration, using compressed polynomials represented in the Tensor Train (TT) format for spatial discretization. Crucially, this method is sample-free, agnostic to normalization constants and can avoid the curse of dimensionality due to the TT compression. We provide a complete derivation of the HJB equation's action on Tensor Train polynomials and demonstrate the performance of the proposed time-step-, rank- and degree-adaptive integration method on a nonlinear sampling task in 20 dimensions.


Robust Learning of Noisy Time Series Collections Using Stochastic Process Models with Motion Codes

arXiv.org Machine Learning

While time series classification and forecasting problems have been extensively studied, the cases of noisy time series data with arbitrary time sequence lengths have remained challenging. Each time series instance can be thought of as a sample realization of a noisy dynamical model, which is characterized by a continuous stochastic process. For many applications, the data are mixed and consist of several types of noisy time series sequences modeled by multiple stochastic processes, making the forecasting and classification tasks even more challenging. Instead of regressing data naively and individually to each time series type, we take a latent variable model approach using a mixtured Gaussian processes with learned spectral kernels. More specifically, we auto-assign each type of noisy time series data a signature vector called its motion code. Then, conditioned on each assigned motion code, we infer a sparse approximation of the corresponding time series using the concept of the most informative timestamps. Our unmixing classification approach involves maximizing the likelihood across all the mixed noisy time series sequences of varying lengths. This stochastic approach allows us to learn not only within a single type of noisy time series data but also across many underlying stochastic processes, giving us a way to learn multiple dynamical models in an integrated and robust manner. The different learned latent stochastic models allow us to generate specific sub-type forecasting. We provide several quantitative comparisons demonstrating the performance of our approach.


ChatGPT in Linear Algebra: Strides Forward, Steps to Go

arXiv.org Artificial Intelligence

As soon as a new technology emerges, the education community explores its affordances and the possibilities to apply it in education. In this paper, we analyze sessions with ChatGPT around topics in basic Linear Algebra. We reflect the process undertaken by the ChatGPT along the recent year in our area of interest, emphasising the vast improvement that has been done in grappling with Linear Algebra problems. In particular, the question whether this software can be a teaching assistant or even somehow replace the human teacher, is addressed. As of the time this paper is written, the answer is generally negative. For the small part where the answer can be positive, some reflections about an original instrumental genesis are given. Communication with the software gives the impression to talk to a human, and sometimes the question is whether the software understands the question or not. Therefore, the reader's attention is drawn to the fact that ChatGPT works on a statistical basis and not according to reflection and understanding.


An Accelerated Distributed Stochastic Gradient Method with Momentum

arXiv.org Artificial Intelligence

In this paper, we introduce an accelerated distributed stochastic gradient method with momentum for solving the distributed optimization problem, where a group of $n$ agents collaboratively minimize the average of the local objective functions over a connected network. The method, termed ``Distributed Stochastic Momentum Tracking (DSMT)'', is a single-loop algorithm that utilizes the momentum tracking technique as well as the Loopless Chebyshev Acceleration (LCA) method. We show that DSMT can asymptotically achieve comparable convergence rates as centralized stochastic gradient descent (SGD) method under a general variance condition regarding the stochastic gradients. Moreover, the number of iterations (transient times) required for DSMT to achieve such rates behaves as $\mathcal{O}(n^{5/3}/(1-\lambda))$ for minimizing general smooth objective functions, and $\mathcal{O}(\sqrt{n/(1-\lambda)})$ under the Polyak-{\L}ojasiewicz (PL) condition. Here, the term $1-\lambda$ denotes the spectral gap of the mixing matrix related to the underlying network topology. Notably, the obtained results do not rely on multiple inter-node communications or stochastic gradient accumulation per iteration, and the transient times are the shortest under the setting to the best of our knowledge.


Kernels and learning curves for Gaussian process regression on random graphs

Neural Information Processing Systems

We investigate how well Gaussian process regression can learn functions defined on graphs, using large regular random graphs as a paradigmatic example. Random-walk based kernels are shown to have some surprising properties: within the standard approximation of a locally tree-like graph structure, the kernel does not become constant, i.e.neighbouring function values do not become fully correlated, when the lengthscale \sigma of the kernel is made large. Instead the kernel attains a non-trivial limiting form, which we calculate. The fully correlated limit is reached only once loops become relevant, and we estimate where the crossover to this regime occurs. Our main subject are learning curves of Bayes error versus training set size.


k-NN Regression Adapts to Local Intrinsic Dimension

Neural Information Processing Systems

Many nonparametric regressors were recently shown to converge at rates that depend only on the intrinsic dimension of data. These regressors thus escape the curse of dimension when high-dimensional data has low intrinsic dimension (e.g. a manifold). We show that k -NN regression is also adaptive to intrinsic dimension. In particular our rates are local to a query x and depend only on the way masses of balls centered at x vary with radius. Furthermore, we show a simple way to choose k k(x) locally at any x so as to nearly achieve the minimax rate at x in terms of the unknown intrinsic dimension in the vicinity of x .


A Computationally Efficient Learning-Based Model Predictive Control for Multirotors under Aerodynamic Disturbances

arXiv.org Artificial Intelligence

Neglecting complex aerodynamic effects hinders high-speed yet high-precision multirotor autonomy. In this paper, we present a computationally efficient learning-based model predictive controller that simultaneously optimizes a trajectory that can be tracked within the physical limits (on thrust and orientation) of the multirotor system despite unknown aerodynamic forces and adapts the control input. To do this, we leverage the well-known differential flatness property of multirotors, which allows us to transform their nonlinear dynamics into a linear model. The main limitation of current flatness-based planning and control approaches is that they often neglect dynamic feasibility. This is because these constraints are nonlinear as a result of the mapping between the input, i.e., multirotor thrust, and the flat state. In our approach, we learn a novel representation of the drag forces by learning the mapping from the flat state to the multirotor thrust vector (in a world frame) as a Gaussian Process (GP). Our proposed approach leverages the properties of GPs to develop a convex optimal controller that can be iteratively solved as a second-order cone program (SOCP). In simulation experiments, our proposed approach outperforms related model predictive controllers that do not account for aerodynamic effects on trajectory feasibility, leading to a reduction of up to 55% in absolute tracking error.


Top-$K$ ranking with a monotone adversary

arXiv.org Artificial Intelligence

In this paper, we address the top-$K$ ranking problem with a monotone adversary. We consider the scenario where a comparison graph is randomly generated and the adversary is allowed to add arbitrary edges. The statistician's goal is then to accurately identify the top-$K$ preferred items based on pairwise comparisons derived from this semi-random comparison graph. The main contribution of this paper is to develop a weighted maximum likelihood estimator (MLE) that achieves near-optimal sample complexity, up to a $\log^2(n)$ factor, where n denotes the number of items under comparison. This is made possible through a combination of analytical and algorithmic innovations. On the analytical front, we provide a refined $\ell_\infty$ error analysis of the weighted MLE that is more explicit and tighter than existing analyses. It relates the $\ell_\infty$ error with the spectral properties of the weighted comparison graph. Motivated by this, our algorithmic innovation involves the development of an SDP-based approach to reweight the semi-random graph and meet specified spectral properties. Additionally, we propose a first-order method based on the Matrix Multiplicative Weight Update (MMWU) framework. This method efficiently solves the resulting SDP in nearly-linear time relative to the size of the semi-random comparison graph.