Long-time accuracy of ensemble Kalman filters for chaotic and machine-learned dynamical systems
Sanz-Alonso, Daniel, Waniorek, Nathan
Filtering is concerned with online estimation of the state of a dynamical system from partial and noisy observations. In applications where the state is high dimensional, ensemble Kalman filters are often the method of choice. This paper establishes long-time accuracy of ensemble Kalman filters. We introduce conditions on the dynamics and the observations under which the estimation error remains small in the long-time horizon. Our theory covers a wide class of partially-observed chaotic dynamical systems, which includes the Navier-Stokes equations and Lorenz models. In addition, we prove long-time accuracy of ensemble Kalman filters with surrogate dynamics, thus validating the use of machine-learned forecast models in ensemble data assimilation.
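As a point of reference for the update analyzed in this setting, here is a minimal sketch of one forecast-analysis cycle of a perturbed-observation ensemble Kalman filter; the forecast map f, observation matrix H, and noise covariance R are generic placeholders rather than the paper's specific models.

    import numpy as np

    def enkf_step(ensemble, y, f, H, R, rng):
        """One forecast-analysis cycle of a perturbed-observation EnKF.

        ensemble: (N, d) array of particles; y: (k,) observation;
        f: one-step forecast map; H: (k, d) observation matrix;
        R: (k, k) observation noise covariance."""
        # Forecast: propagate each particle through the dynamics.
        forecast = np.array([f(x) for x in ensemble])
        # Sample covariance of the forecast ensemble.
        A = forecast - forecast.mean(axis=0)
        C = A.T @ A / (len(forecast) - 1)
        # Kalman gain built from the ensemble covariance.
        K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)
        # Analysis: nudge each particle toward a perturbed observation.
        perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=len(forecast))
        return forecast + (perturbed - forecast @ H.T) @ K.T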
Inverse Problems and Data Assimilation: A Machine Learning Approach
Bach, Eviatar, Baptista, Ricardo, Sanz-Alonso, Daniel, Stuart, Andrew
The aim of these notes is to demonstrate the potential for ideas in machine learning to impact the fields of inverse problems and data assimilation. The perspective is primarily that of researchers from inverse problems and/or data assimilation who wish to see a mathematical presentation of machine learning as it pertains to their fields. As a by-product, the notes give a succinct mathematical treatment of various topics in machine learning. The material on machine learning, along with some other related topics, is summarized in Part III, the Appendix. Part I of the notes is concerned with inverse problems, employing material from Part III; Part II is concerned with data assimilation, employing material from Parts I and III.
Data Assimilation with Machine Learning Surrogate Models: A Case Study with FourCastNet
Adrian, Melissa, Sanz-Alonso, Daniel, Willett, Rebecca
Modern data-driven surrogate models for weather forecasting provide accurate short-term predictions but inaccurate and nonphysical long-term forecasts. This paper investigates online weather prediction using machine learning surrogates supplemented with partial and noisy observations. We empirically demonstrate and theoretically justify that, despite the long-time instability of the surrogates and the sparsity of the observations, filtering estimates can remain accurate in the long-time horizon. As a case study, we integrate FourCastNet, a state-of-the-art weather surrogate model, within a variational data assimilation framework using partial, noisy ERA5 data. Our results show that filtering estimates remain accurate over a year-long assimilation window and provide effective initial conditions for forecasting tasks, including extreme event prediction.
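A minimal sketch of the cycling studied here, with a generic one-step surrogate standing in for FourCastNet and a linear 3D-Var analysis in place of the paper's full variational framework; H, B, and R denote an observation matrix, background covariance, and observation noise covariance, all illustrative.

    import numpy as np

    def assimilation_cycle(x0, observations, surrogate, H, B, R):
        """Cycle a machine-learned surrogate with 3D-Var analyses.

        Each analysis is the closed-form minimizer of the quadratic objective
        J(x) = (x - xb)' B^{-1} (x - xb) + (y - Hx)' R^{-1} (y - Hx),
        valid for a linear observation operator H."""
        K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # static gain for fixed B
        x, states = x0, []
        for y in observations:
            xb = surrogate(x)              # background forecast from the surrogate
            x = xb + K @ (y - H @ xb)      # 3D-Var analysis corrects the forecast
            states.append(x)
        return np.array(states)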
Bayesian Optimization with Noise-Free Observations: Improved Regret Bounds via Random Exploration
Kim, Hwanwoo, Sanz-Alonso, Daniel
We introduce new algorithms rooted in scattered data approximation that rely on a random exploration step to ensure that the fill-distance of query points decays at a near-optimal rate. Our algorithms retain the ease of implementation of the classical GP-UCB algorithm and satisfy cumulative regret bounds that nearly match those conjectured in [Vak22], hence solving a COLT open problem. Furthermore, the new algorithms outperform GP-UCB and other popular Bayesian optimization strategies in several examples.
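The core idea, interleaving uniform random exploration with standard UCB queries, can be sketched as follows; the RBF kernel, finite candidate set, and exploration probability are illustrative assumptions, not the paper's exact algorithm or constants.

    import numpy as np

    def rbf(X, Z, ell=0.2):
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ell**2)

    def gp_ucb_random(f, candidates, T, beta=2.0, p_explore=0.5, seed=0):
        """GP-UCB interleaved with uniform random exploration.

        Random steps keep the fill-distance of the query points decaying,
        which is the mechanism behind the improved regret bounds."""
        rng = np.random.default_rng(seed)
        X = candidates[[rng.integers(len(candidates))]]
        y = np.array([f(X[0])])
        for _ in range(T - 1):
            K = rbf(X, X) + 1e-10 * np.eye(len(X))   # jitter only: observations are noise-free
            Ks = rbf(candidates, X)
            mu = Ks @ np.linalg.solve(K, y)
            var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
            if rng.random() < p_explore:
                idx = rng.integers(len(candidates))   # random exploration step
            else:
                idx = np.argmax(mu + beta * np.sqrt(np.clip(var, 0.0, None)))
            X = np.vstack([X, candidates[idx]])
            y = np.append(y, f(candidates[idx]))
        return X[np.argmax(y)], y.max()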
Gaussian Process Regression under Computational and Epistemic Misspecification
Sanz-Alonso, Daniel, Yang, Ruiyi
Gaussian process regression is a classical kernel method for function estimation and data interpolation. In large data applications, computational costs can be reduced using low-rank or sparse approximations of the kernel. This paper investigates the effect of such kernel approximations on the interpolation error. We introduce a unified framework to analyze Gaussian process regression under important classes of computational misspecification: Karhunen-Loève expansions that result in low-rank kernel approximations, multiscale wavelet expansions that induce sparsity in the covariance matrix, and finite element representations that induce sparsity in the precision matrix. Our theory also accounts for epistemic misspecification in the choice of kernel parameters.
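A minimal sketch of one of the computational misspecifications considered: GP interpolation in which the kernel matrix is replaced by a rank-r truncation of its eigendecomposition, a discrete analogue of a truncated Karhunen-Loève expansion. The RBF kernel and lengthscale are illustrative choices.

    import numpy as np

    def rbf(X, Z, ell=0.1):
        d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / ell**2)

    def lowrank_interpolant(X, y, rank):
        """GP interpolant with a rank-r approximation of the kernel matrix."""
        vals, vecs = np.linalg.eigh(rbf(X, X))
        vals, vecs = vals[::-1], vecs[:, ::-1]   # eigenpairs in decreasing order
        Kr = (vecs[:, :rank] * vals[:rank]) @ vecs[:, :rank].T
        # Kr is singular for rank < n, so take the minimum-norm solution.
        alpha = np.linalg.lstsq(Kr, y, rcond=None)[0]
        return lambda Z: rbf(Z, X) @ alpha       # approximate posterior mean at Z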
Optimization on Manifolds via Graph Gaussian Processes
Kim, Hwanwoo, Sanz-Alonso, Daniel, Yang, Ruiyi
Optimization problems on manifolds are ubiquitous in science and engineering. For instance, low-rank matrix completion and rotational alignment of 3D bodies can be formulated as optimization problems over spaces of matrices that are naturally endowed with manifold structures. These matrix manifolds belong to agreeable families [56] for which Riemannian gradients, geodesics, and other geometric quantities have closed-form expressions that facilitate the use of Riemannian optimization algorithms [19, 1, 9]. In contrast, this paper is motivated by optimization problems where the search space is a manifold that the practitioner can only access through a discrete point cloud representation, preventing direct use of Riemannian optimization algorithms. Moreover, the hidden manifold may not belong to an agreeable family, further hindering the use of classical methods. Illustrative examples where manifolds are represented by point cloud data include computer vision, robotics, and shape analysis of geometric morphometrics [33, 23, 25]. Additionally, across many applications in data science, high-dimensional point cloud data contains low-dimensional structure that can be modeled as a manifold for algorithmic design and theoretical analysis [14, 3, 27]. Motivated by these problems, this paper introduces a Bayesian optimization method with convergence guarantees to optimize an expensive-to-evaluate function on a point cloud of manifold samples.
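A rough sketch of the main ingredient, a Gaussian process prior defined directly on the point cloud through a k-nearest-neighbor graph Laplacian; the Matérn-type spectral weights and all parameter values below are illustrative assumptions rather than the paper's tuned construction.

    import numpy as np

    def graph_matern_covariance(points, k=10, nu=2.0, kappa=1.0, n_eigs=50):
        """Matern-type covariance on a point cloud built from a k-NN graph."""
        n = len(points)
        d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        # Symmetrized k-nearest-neighbor adjacency with Gaussian weights.
        nbrs = np.argsort(d2, axis=1)[:, 1:k + 1]
        rows = np.repeat(np.arange(n), k)
        W = np.zeros((n, n))
        W[rows, nbrs.ravel()] = np.exp(-d2[rows, nbrs.ravel()] / d2.mean())
        W = np.maximum(W, W.T)
        L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
        vals, vecs = np.linalg.eigh(L)
        # Keep the smoothest modes, weighted with Matern-type spectral decay.
        spec = (kappa**2 + vals[:n_eigs]) ** (-nu)
        return (vecs[:, :n_eigs] * spec) @ vecs[:, :n_eigs].T

This covariance can then drive a standard Bayesian optimization loop restricted to the cloud, with acquisition maximization reduced to an argmax over the sample points.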
Non-Asymptotic Analysis of Ensemble Kalman Updates: Effective Dimension and Localization
Ghattas, Omar Al, Sanz-Alonso, Daniel
The main motivation behind ensemble Kalman methods is that they often perform well with a small ensemble size N, which is essential in applications where generating each particle is costly. However, theoretical studies have primarily focused on large ensemble asymptotics, that is, on the limit N → ∞. While these mean-field results are mathematically interesting and have led to significant practical improvements, they fail to explain the empirical success of ensemble Kalman methods when deployed with a small ensemble size. The aim of this paper is to develop a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why, and under what circumstances, a small ensemble size may suffice. To that end, we establish non-asymptotic error bounds in terms of suitable notions of effective dimension of the prior covariance model that account for spectrum decay (which may represent smoothness of a prior random field) and approximate sparsity (which may represent spatial decay of correlations).
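For concreteness, a minimal sketch of covariance localization, the device relevant in the approximate-sparsity regime; the triangular taper below is a crude stand-in for smooth tapers such as Gaspari-Cohn, and the one-dimensional index distance is an illustrative assumption.

    import numpy as np

    def localized_covariance(ensemble, radius):
        """Sample covariance tapered by a distance-based localization mask.

        Localization exploits spatial decay of correlations so that a small
        ensemble can estimate the entries that matter."""
        A = ensemble - ensemble.mean(axis=0)
        C = A.T @ A / (len(ensemble) - 1)
        d = np.abs(np.subtract.outer(np.arange(C.shape[0]), np.arange(C.shape[0])))
        taper = np.clip(1.0 - d / radius, 0.0, None)   # triangular taper
        return C * taper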
Reduced-Order Autodifferentiable Ensemble Kalman Filters
Chen, Yuming, Sanz-Alonso, Daniel, Willett, Rebecca
This paper introduces a computational framework to reconstruct and forecast a partially observed state that evolves according to an unknown or expensive-to-simulate dynamical system. Our reduced-order autodifferentiable ensemble Kalman filters (ROAD-EnKFs) learn a latent low-dimensional surrogate model for the dynamics and a decoder that maps from the latent space to the state space. The learned dynamics and decoder are then used within an ensemble Kalman filter to reconstruct and forecast the state. Numerical experiments show that if the state dynamics exhibit a hidden low-dimensional structure, ROAD-EnKFs achieve higher accuracy at lower computational cost compared to existing methods. If such structure is not expressed in the latent state dynamics, ROAD-EnKFs achieve similar accuracy at lower cost, making them a promising approach for surrogate state reconstruction and forecasting.
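A schematic of one ROAD-EnKF cycle under simplifying assumptions: latent_dynamics and decoder are placeholders for the learned networks, and the decoder is assumed to output the observed quantities directly.

    import numpy as np

    def road_enkf_step(latent_ens, y, latent_dynamics, decoder, R, rng):
        """One cycle of a reduced-order EnKF in a learned latent space."""
        Z = np.array([latent_dynamics(z) for z in latent_ens])   # latent forecast
        Hx = np.array([decoder(z) for z in Z])                   # decoded observables
        Zc, Hc = Z - Z.mean(axis=0), Hx - Hx.mean(axis=0)
        Czy = Zc.T @ Hc / (len(Z) - 1)       # latent-observation cross-covariance
        Cyy = Hc.T @ Hc / (len(Z) - 1) + R
        K = Czy @ np.linalg.inv(Cyy)         # ensemble Kalman gain
        perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=len(Z))
        return Z + (perturbed - Hx) @ K.T    # analysis ensemble in latent space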
Mathematical Foundations of Graph-Based Bayesian Semi-Supervised Learning
Trillos, Nicolas García, Sanz-Alonso, Daniel, Yang, Ruiyi
In recent decades, science and engineering have been revolutionized by a momentous growth in the amount of available data. However, despite the unprecedented ease with which data are now collected and stored, labeling data by supplementing each feature with an informative tag remains challenging. Illustrative tasks where the labeling process requires expert knowledge or is tedious and time-consuming include labeling X-rays with a diagnosis, protein sequences with a protein type, texts by their topic, tweets by their sentiment, or videos by their genre. In these and numerous other examples, only a few features may be manually labeled due to cost and time constraints. How can we best propagate label information from a small number of expensive labeled features to a vast number of unlabeled ones? This is the question addressed by semi-supervised learning (SSL). This article overviews recent foundational developments on graph-based Bayesian SSL, a probabilistic framework for label propagation using similarities between features. SSL is an active research area and a thorough review of the extant literature is beyond the scope of this article. Our focus will be on topics drawn from our own research that illustrate the wide range of mathematical tools and ideas that underlie the rigorous study of the statistical accuracy and computational efficiency of graph-based Bayesian SSL.
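A minimal sketch of the label-propagation computation underlying this framework: the posterior mean of a Gaussian prior built from the graph Laplacian, conditioned on noisy labels at a few nodes. The prior precision L + tau^2 I and the noise level gamma are illustrative modeling choices.

    import numpy as np

    def graph_ssl_posterior_mean(W, labeled_idx, labels, tau=1.0, gamma=0.1):
        """Posterior mean for graph-based Bayesian SSL with a Gaussian prior
        whose precision is L + tau^2 I and Gaussian label noise of size gamma."""
        n = W.shape[0]
        L = np.diag(W.sum(axis=1)) - W        # Laplacian of the similarity graph
        H = np.zeros((len(labeled_idx), n))   # selects the labeled nodes
        H[np.arange(len(labeled_idx)), labeled_idx] = 1.0
        A = L + tau**2 * np.eye(n) + H.T @ H / gamma**2   # posterior precision
        return np.linalg.solve(A, H.T @ labels / gamma**2)

Thresholding the posterior mean propagates the labels to the unlabeled nodes; the posterior covariance quantifies the remaining uncertainty.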
A Variational Inference Approach to Inverse Problems with Gamma Hyperpriors
Agrawal, Shiv, Kim, Hwanwoo, Sanz-Alonso, Daniel, Strang, Alexander
Hierarchical models with gamma hyperpriors provide a flexible, sparsity-promoting framework to bridge L1 and L2 regularizations in Bayesian formulations of inverse problems. Despite the Bayesian motivation for these models, existing methodologies are limited to maximum a posteriori estimation, and the potential to perform uncertainty quantification has not yet been realized. This paper introduces a variational iterative alternating scheme for hierarchical inverse problems with gamma hyperpriors. The proposed variational inference approach yields accurate reconstruction, provides meaningful uncertainty quantification, and is easy to implement. In addition, it lends itself naturally to model selection for the choice of hyperparameters. We illustrate the performance of our methodology in several computed examples, including a deconvolution problem and sparse identification of dynamical systems from time series data.
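For orientation, a sketch of the maximum a posteriori version of the iterative alternating scheme for a linear inverse problem y = Ax + noise; the theta-update follows the standard gamma-hyperprior IAS formula, all parameter values are illustrative, and the paper's variational scheme replaces these point updates with updates of variational densities.

    import numpy as np

    def ias_gamma(A, y, sigma=0.1, beta=1.5, vartheta=1e-2, iters=50):
        """Alternate between the signal x and coefficient variances theta."""
        n = A.shape[1]
        theta = np.full(n, vartheta)
        for _ in range(iters):
            # x-update: Tikhonov solve with coefficient-wise weights 1/theta.
            M = A.T @ A / sigma**2 + np.diag(1.0 / theta)
            x = np.linalg.solve(M, A.T @ y / sigma**2)
            # theta-update: closed form induced by the gamma hyperprior.
            eta = beta - 1.5
            theta = vartheta * (eta / 2 + np.sqrt(eta**2 / 4 + x**2 / (2 * vartheta)))
        return x, theta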