posterior mean
A Flexible Empirical Bayes Approach to Generalized Linear Models, with Applications to Sparse Logistic Regression
Dongyue Xie, Wanrong Zhu, Matthew Stephens
We introduce a flexible empirical Bayes approach for fitting Bayesian generalized linear models. Specifically, we adopt a novel mean-field variational inference (VI) method in which the prior is estimated within the VI algorithm, making the method tuning-free. Unlike traditional VI methods that optimize the posterior density function, our approach directly optimizes the posterior mean and prior parameters. This formulation reduces the number of parameters to optimize and enables the use of scalable algorithms such as L-BFGS and stochastic gradient descent. Furthermore, our method automatically determines the optimal posterior based on the prior and likelihood, distinguishing it from existing VI methods that often assume a Gaussian variational distribution. Our approach represents a unified framework applicable to a wide range of exponential family distributions, removing the need to develop unique VI methods for each combination of likelihood and prior distributions. We apply the framework to sparse logistic regression and demonstrate the superior predictive performance of our method in extensive numerical studies, comparing it to prevalent sparse logistic regression approaches.
- North America > United States > California > Orange County > Irvine (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators
Applying Physics-Informed Gaussian Process Regression to the eigenvalue problem $(\mathcal{L}-\lambda)u = 0$ poses a fundamental challenge, where the null source term results in a trivial predictive mean and a degenerate marginal likelihood. Drawing inspiration from system identification, we construct a transfer function-type indicator for the unknown eigenvalue/eigenfunction using the physics-informed Gaussian Process posterior. We demonstrate that the posterior covariance is only non-trivial when $\lambda$ corresponds to an eigenvalue of the partial differential operator $\mathcal{L}$, reflecting the existence of a non-trivial eigenspace, and any sample from the posterior lies in the eigenspace of the linear operator. We demonstrate the effectiveness of the proposed approach through several numerical examples with both linear and non-linear eigenvalue problems.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
Posterior and Computational Uncertainty in Gaussian Processes
Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data observed and the finite amount of computation expended. The most common GP approximations map to an instance in this class, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
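For orientation, the exact Gaussian process posterior that these approximations target can be sketched in a few lines. The following is a minimal, dependency-free illustration of standard GP regression via a Cholesky factorization, not the method proposed in the abstract above; the function names (`rbf`, `gp_posterior`), the squared-exponential kernel, and the default hyperparameters are assumptions made purely for the example.

```python
import math

def rbf(x, y, lengthscale=1.0):
    # Squared-exponential kernel on scalars (an illustrative choice).
    return math.exp(-0.5 * ((x - y) / lengthscale) ** 2)

def cholesky(A):
    # Lower-triangular L with A = L L^T, for symmetric positive definite A.
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    # Forward substitution: solve L z = b.
    z = [0.0] * len(b)
    for i in range(len(b)):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    return z

def solve_lower_transposed(L, z):
    # Back substitution: solve L^T x = z.
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, X_test, noise=1e-4, lengthscale=1.0):
    # Posterior mean and marginal variance of the latent function at X_test,
    # given noisy observations y at inputs X (zero prior mean assumed).
    n = len(X)
    K = [[rbf(X[i], X[j], lengthscale) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y))  # (K + noise I)^{-1} y
    means, variances = [], []
    for xs in X_test:
        ks = [rbf(xs, xi, lengthscale) for xi in X]
        means.append(sum(k * a for k, a in zip(ks, alpha)))
        v = solve_lower(L, ks)
        variances.append(rbf(xs, xs, lengthscale) - sum(vi * vi for vi in v))
    return means, variances

X = [0.0, 1.0, 2.0]
y = [math.sin(x) for x in X]
means, variances = gp_posterior(X, y, [1.0, 10.0])
```

The Cholesky solve costs cubic time in the number of observations, which is the scaling the abstract refers to. The variance returned here reflects only the finite data; it contains no term for approximation error, since this sketch performs the exact computation.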
- Oceania > Australia (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- North America > United States > Ohio > Franklin County > Columbus (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.82)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada (0.04)
- Europe > Netherlands > South Holland > Leiden (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
A Proofs
This appendix contains the proofs of the results found in Section 4. We start by introducing a useful …
The claim then follows directly from (4) and the definition of mutual information.
Lemma 2. We can then compute the derivative and ask under which conditions it is nonnegative.
The function b defined in (18) is monotonically increasing for positive arguments.
Finally, let us fix ε > 0. Combining Lemmas 7 and 8, we obtain: b(σ …
The following result makes this statement precise.
The following lemma makes this statement precise.
In this appendix, we collect details about the experiment presented in Section 6. Code for the acquisition functions used can be found at …
ISE selects the next parameter to evaluate according to (6), which is a non-convex optimization problem constrained in one of the variables.