cholesky
Kernel Quadrature with Randomly Pivoted Cholesky
This paper presents new quadrature rules for functions in a reproducing kernel Hilbert space using nodes drawn by a sampling algorithm known as randomly pivoted Cholesky. The resulting computational procedure compares favorably to previous kernel quadrature methods, which either achieve low accuracy or require solving a computationally challenging sampling problem. Theoretical and numerical results show that randomly pivoted Cholesky is fast and achieves comparable quadrature error rates to more computationally expensive quadrature schemes based on continuous volume sampling, thinning, and recombination. Randomly pivoted Cholesky is easily adapted to complicated geometries with arbitrary kernels, unlocking new potential for kernel quadrature.
the NLLs in the final version of the paper in addition to reporting averages and standard deviations in all of our other 3 tables by running more trials
We agree with all three reviewers that evaluating the predictive variances is important. Thank you for your comments and suggestions. Finally, we will clarify that SGPR is by (Titsias, 2009) and SVGP is by (Hensman et al., 2013). This has important ramifications, e.g., We were unaware of Nguyen's paper at submission and we will add this discussion to the paper. We note that the precomputation, like CG, can be run to a specified desired tolerance. Hensman et al. (2013) used 1000 inducing points on the massive Airline dataset.
Kernel Quadrature with Randomly Pivoted Cholesky
This paper presents new quadrature rules for functions in a reproducing kernel Hilbert space using nodes drawn by a sampling algorithm known as randomly pivoted Cholesky. The resulting computational procedure compares favorably to previous kernel quadrature methods, which either achieve low accuracy or require solving a computationally challenging sampling problem. Theoretical and numerical results show that randomly pivoted Cholesky is fast and achieves comparable quadrature error rates to more computationally expensive quadrature schemes based on continuous volume sampling, thinning, and recombination. Randomly pivoted Cholesky is easily adapted to complicated geometries with arbitrary kernels, unlocking new potential for kernel quadrature.
Randomly Pivoted Partial Cholesky: Random How?
We consider the problem of finding good low rank approximations of symmetric, positive-definite $A \in \mathbb{R}^{n \times n}$. Chen-Epperly-Tropp-Webber showed, among many other things, that the randomly pivoted partial Cholesky algorithm that chooses the $i-$th row with probability proportional to the diagonal entry $A_{ii}$ leads to a universal contraction of the trace norm (the Schatten 1-norm) in expectation for each step. We show that if one chooses the $i-$th row with likelihood proportional to $A_{ii}^2$ one obtains the same result in the Frobenius norm (the Schatten 2-norm). Implications for the greedy pivoting rule and pivot selection strategies are discussed.
Practical and Matching Gradient Variance Bounds for Black-Box Variational Bayesian Inference
Kim, Kyurae, Wu, Kaiwen, Oh, Jisu, Gardner, Jacob R.
Understanding the gradient variance of blackbox Despite the advances of BBVI, little is known about its theoretical variational inference (BBVI) is a crucial step properties. Even when restricted to the locationscale for establishing its convergence and developing family (Definition 2), it is unknown whether BBVI algorithmic improvements. However, existing is guaranteed to converge without having to modify the studies have yet to show that the gradient variance algorithms used in practice, for example, by enforcing of BBVI satisfies the conditions used to bounded domains, bounded support, bounded gradients, study the convergence of stochastic gradient descent and such. This theoretical insight is necessary since BBVI (SGD), the workhorse of BBVI. In this methods are known to be less robust (Yao et al., 2018; work, we show that BBVI satisfies a matching Dhaka et al., 2020; Welandawe et al., 2022; Dhaka et al., bound corresponding to the condition used 2021; Domke, 2020) compared to other inference methods in the SGD literature when applied to smooth and such as Markov chain Monte Carlo. Although progress has quadratically-growing log-likelihoods. Our results been made to formalize the theory of BBVI with some generality, generalize to nonlinear covariance parameterizations the gap between our understanding of BBVI and the widely used in the practice of BBVI.
Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization
Pleiss, Geoff, Jankowiak, Martin, Eriksson, David, Damle, Anil, Gardner, Jacob R.
Matrix square roots and their inverses arise frequently in machine learning, e.g., when sampling from high-dimensional Gaussians $\mathcal{N}(\mathbf 0, \mathbf K)$ or whitening a vector $\mathbf b$ against covariance matrix $\mathbf K$. While existing methods typically require $O(N^3)$ computation, we introduce a highly-efficient quadratic-time algorithm for computing $\mathbf K^{1/2} \mathbf b$, $\mathbf K^{-1/2} \mathbf b$, and their derivatives through matrix-vector multiplication (MVMs). Our method combines Krylov subspace methods with a rational approximation and typically achieves $4$ decimal places of accuracy with fewer than $100$ MVMs. Moreover, the backward pass requires little additional computation. We demonstrate our method's applicability on matrices as large as $50,\!000 \times 50,\!000$ - well beyond traditional methods - with little approximation error. Applying this increased scalability to variational Gaussian processes, Bayesian optimization, and Gibbs sampling results in more powerful models with higher accuracy.
Optimal whitening and decorrelation
Kessy, Agnan, Lewin, Alex, Strimmer, Korbinian
Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example based on principal component analysis (PCA), Cholesky matrix decomposition and zero-phase component analysis (ZCA), among others. Here we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.