
 posterior mean



A Flexible Empirical Bayes Approach to Generalized Linear Models, with Applications to Sparse Logistic Regression

Xie, Dongyue, Zhu, Wanrong, Stephens, Matthew

arXiv.org Machine Learning

We introduce a flexible empirical Bayes approach for fitting Bayesian generalized linear models. Specifically, we adopt a novel mean-field variational inference (VI) method in which the prior is estimated within the VI algorithm, making the method tuning-free. Unlike traditional VI methods that optimize the posterior density function, our approach directly optimizes the posterior mean and the prior parameters. This formulation reduces the number of parameters to optimize and enables the use of scalable algorithms such as L-BFGS and stochastic gradient descent. Furthermore, our method automatically determines the optimal posterior from the prior and the likelihood, unlike existing VI methods that often assume a Gaussian variational family. Our approach provides a unified framework applicable to a wide range of exponential-family distributions, removing the need to develop a separate VI method for each combination of likelihood and prior. We apply the framework to sparse logistic regression and, in extensive numerical studies, demonstrate that our method achieves better predictive performance than prevalent sparse logistic regression approaches.
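The abstract contrasts its method with VI schemes that fix a Gaussian variational family. A textbook normal-means calculation shows why that restriction matters under a sparse prior: with a Gaussian likelihood and a point-normal (spike-and-slab) prior, the exact posterior is a point mass at zero mixed with a Gaussian, so it is not Gaussian. The sketch below is only that closed-form calculation under assumed settings, not the paper's GLM algorithm; the prior family and the parameter names `s2`, `pi0`, and `sigma2` are our assumptions.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def point_normal_posterior(x, s2, pi0, sigma2):
    """Exact posterior for theta given x ~ N(theta, s2) under the (assumed) prior
    theta ~ pi0 * delta_0 + (1 - pi0) * N(0, sigma2).

    Returns (w, mean, var_slab):
      w        posterior probability that theta != 0,
      mean     posterior mean E[theta | x],
      var_slab variance of the Gaussian component of the posterior.
    The posterior is a point mass at 0 mixed with that Gaussian, i.e. not Gaussian.
    """
    # Marginal likelihood of x under each prior component, weighted by the prior
    m_spike = pi0 * normal_pdf(x, s2)
    m_slab = (1.0 - pi0) * normal_pdf(x, s2 + sigma2)
    w = m_slab / (m_spike + m_slab)
    # Conjugate normal update for the slab component
    shrink = sigma2 / (sigma2 + s2)
    mean_slab = shrink * x
    var_slab = shrink * s2
    return w, w * mean_slab, var_slab


for x in (0.0, 1.0, 3.0, 10.0):
    w, mean, _ = point_normal_posterior(x, s2=1.0, pi0=0.9, sigma2=4.0)
    print(f"x = {x:5.1f}  P(theta != 0 | x) = {w:.3f}  E[theta | x] = {mean:.3f}")
```

With these settings, `x = 1.0` gives a posterior mean of about 0.055, while `x = 10.0` gives about 8.0 (the slab shrinkage factor 0.8 times `x`): small observations are shrunk almost to zero and large ones only mildly, a shape no single Gaussian can represent.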


Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators

Bai, Tianming, Yang, Jiannan

arXiv.org Machine Learning

Applying Physics-Informed Gaussian Process Regression to the eigenvalue problem $(\mathcal{L}-\lambda)u = 0$ poses a fundamental challenge: the null source term yields a trivial predictive mean and a degenerate marginal likelihood. Drawing inspiration from system identification, we construct a transfer-function-type indicator for the unknown eigenvalue/eigenfunction using the physics-informed Gaussian Process posterior. We show that the posterior covariance is non-trivial only when $\lambda$ is an eigenvalue of the partial differential operator $\mathcal{L}$, reflecting the existence of a non-trivial eigenspace, and that any sample from the posterior lies in the eigenspace of the linear operator. We demonstrate the effectiveness of the proposed approach through several numerical examples with both linear and non-linear eigenvalue problems.
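To make the covariance claim concrete, here is a finite-dimensional analog, not the paper's construction. A discretized operator `A` (the 1-D Dirichlet Laplacian for −u″, our choice) stands in for the differential operator, a Gaussian prior u ~ N(0, K) with a hypothetical exponential kernel stands in for the GP, and we condition exactly on (A − λI)u = 0. The trace of the conditioned covariance serves as a simple, hypothetical indicator: it is numerically zero unless λ is an eigenvalue of `A`, in which case the covariance collapses onto the corresponding eigenvector.

```python
import numpy as np


def laplacian_1d(n):
    """Matrix of -u'' on n interior points of [0, 1] with Dirichlet boundaries."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2


def exp_kernel(x, length=0.5):
    """Hypothetical prior covariance: exponential kernel on the grid points x."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / length)


def conditioned_covariance(A, K, lam, rcond=1e-10):
    """Covariance of u ~ N(0, K) after conditioning exactly on (A - lam*I) u = 0.

    Standard Gaussian conditioning, K - K B^T (B K B^T)^+ B K, with a
    pseudo-inverse so that a singular B (lam an eigenvalue of A) is handled.
    """
    B = A - lam * np.eye(A.shape[0])
    KBt = K @ B.T
    S = B @ KBt  # covariance of the "observation" B u
    return K - KBt @ np.linalg.pinv(S, rcond=rcond) @ KBt.T


n = 6
x = np.linspace(0.0, 1.0, n + 2)[1:-1]
A = laplacian_1d(n)
K = exp_kernel(x)
eigvals, eigvecs = np.linalg.eigh(A)

# Indicator: total posterior variance, scanned over candidate values of lam
for lam in (eigvals[0], 0.5 * (eigvals[0] + eigvals[1]), eigvals[1]):
    cov = conditioned_covariance(A, K, lam)
    print(f"lam = {lam:8.3f}   trace of posterior covariance = {np.trace(cov):.3e}")
```

Away from an eigenvalue, `A - lam*I` is invertible and the exact constraint pins u to zero; at an eigenvalue, the leftover covariance is proportional to v vᵀ for the eigenvector v, so every posterior sample is a multiple of v.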


Posterior and Computational Uncertainty in Gaussian Processes

Neural Information Processing Systems

Gaussian processes scale prohibitively with the size of the dataset. In response, many approximation methods have been developed, which inevitably introduce approximation error. This additional source of uncertainty, due to limited computation, is entirely ignored when using the approximate posterior. Therefore, in practice, GP models are often as much about the approximation method as they are about the data. Here, we develop a new class of methods that provides consistent estimation of the combined uncertainty arising from both the finite number of data points observed and the finite amount of computation expended. The most common GP approximations, such as methods based on the Cholesky factorization, conjugate gradients, and inducing points, map to instances of this class. For any method in this class, we prove (i) convergence of its posterior mean in the associated RKHS, (ii) decomposability of its combined posterior covariance into mathematical and computational covariances, and (iii) that the combined variance is a tight worst-case bound for the squared error between the method's posterior mean and the latent function. Finally, we empirically demonstrate the consequences of ignoring computational uncertainty and show how implicitly modeling it improves generalization performance on benchmark datasets.
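The abstract does not spell out the construction, so the following is a minimal NumPy sketch under stated assumptions. It approximates the inverse of K̂ = K + σ²I by the projection C = S(SᵀK̂S)⁻¹Sᵀ, using as "actions" S the first m unit vectors (our choice, similar in spirit to a partial Cholesky factorization). It then shows the split the abstract describes: combined variance = mathematical variance (the exact GP posterior variance) + computational variance, where the computational part vanishes once all n actions are used. The kernel, noise level, and function names are hypothetical.

```python
import numpy as np


def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def quad_forms(A, M):
    """Row-wise quadratic forms a_i^T M a_i for each row a_i of A."""
    return np.einsum("ij,jk,ik->i", A, M, A)


def combined_posterior(X, y, Xs, num_actions, noise=0.1):
    """Approximate GP posterior at Xs using only num_actions unit-vector actions."""
    n = len(X)
    Khat = rbf(X, X) + noise * np.eye(n)
    Khat_inv = np.linalg.inv(Khat)                  # exact inverse, for reference only
    S = np.eye(n)[:, :num_actions]                  # actions: first num_actions unit vectors
    C = S @ np.linalg.solve(S.T @ Khat @ S, S.T)    # low-rank approximation of Khat^{-1}
    kxX = rbf(Xs, X)
    kxx = np.ones(len(Xs))                          # k(x, x) = 1 for this kernel
    mean = kxX @ (C @ y)
    math_var = kxx - quad_forms(kxX, Khat_inv)      # exact GP posterior variance
    comp_var = quad_forms(kxX, Khat_inv - C)        # extra variance from limited computation
    combined_var = kxx - quad_forms(kxX, C)         # variance the approximate method reports
    return mean, math_var, comp_var, combined_var


rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 20)
y = np.sin(2.0 * np.pi * X) + 0.1 * rng.standard_normal(X.size)
Xs = np.linspace(0.0, 1.0, 5)
for m in (5, 20):
    _, math_var, comp_var, combined_var = combined_posterior(X, y, Xs, m)
    print(m, np.round(comp_var, 3))
```

Because K̂⁻¹ − C is positive semidefinite, the computational variance is never negative; with only the five leftmost points "processed", it is largest at test inputs far from them.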


Active Set Ordering

Neural Information Processing Systems

In this paper, we formalize the active set ordering problem, which involves actively discovering a set of inputs based on their orderings determined by expensive evaluations of a black-box function.






A Proofs

Neural Information Processing Systems

This appendix contains the proofs of the results found in Section 4. We start by introducing a useful … The claim then follows directly from (4) and the definition of mutual information. Lemma 2. … We can then compute the derivative and ask under which conditions it is non-negative. The function b defined in (18) is monotonically increasing for positive arguments. Finally, let us fix ε > 0. Combining Lemmas 7 and 8, we obtain: b(σ … The following result makes this statement precise. The following lemma makes this statement precise. In this appendix, we collect details about the experiment presented in Section 6. Code for the acquisition functions used can be found at … ISE selects the next parameter to evaluate according to (6), which is a non-convex optimization problem with a constraint on one of the variables.