Uncertainty
during learning, numerical precision reduction and for finding the Pareto optimal set of configurations apply directly
We would like to thank the reviewers for their thoughtful comments and valuable suggestions. We will clarify this point in the paper. Our algorithms are agnostic to the leaf distributions used. Thank you for this valuable feedback; we will improve the pseudocode as you suggest. As such, there is memory overhead but no computational overhead.
Copula-like Variational Inference
Marcel Hirt, Petros Dellaportas, Alain Durmus
This paper considers a new family of variational distributions motivated by Sklar's theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension d of the state space. The proposed variational densities can be seen as arising from these copula-like densities used as base distributions on the hypercube, with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity O(d log d). We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.
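The pipeline described in the abstract (a base sample on the hypercube, Gaussian quantile functions, then a sparse rotation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The base sample here is an independent uniform placeholder rather than the copula-like density, which the abstract does not specify. The rotation is assumed to be a butterfly-style product of Givens rotations, one common way to reach O(d log d) cost. All function and parameter names (`gaussian_quantile`, `butterfly_rotate`, `sample`, `angles`) are hypothetical.

```python
import math
import random
from statistics import NormalDist


def gaussian_quantile(u, mu, sigma):
    """Map each hypercube coordinate u_i in (0, 1) to the real line
    through the quantile function of N(mu_i, sigma_i^2)."""
    return [NormalDist(m, s).inv_cdf(ui) for ui, m, s in zip(u, mu, sigma)]


def butterfly_rotate(x, angles):
    """Apply log2(d) layers of d/2 disjoint 2x2 Givens rotations.

    ASSUMPTION: a butterfly layout is used as one example of a sparse
    rotation; d must be a power of two in this sketch.
    angles[k][j] is the angle of the j-th rotation in layer k.
    Each layer costs O(d), so the full product costs O(d log d).
    """
    d = len(x)
    assert d > 0 and d & (d - 1) == 0, "this sketch assumes d is a power of two"
    y = list(x)
    stride = 1
    for layer in angles:
        j = 0
        for block in range(0, d, 2 * stride):
            for offset in range(stride):
                a, b = block + offset, block + offset + stride
                c, s = math.cos(layer[j]), math.sin(layer[j])
                y[a], y[b] = c * y[a] - s * y[b], s * y[a] + c * y[b]
                j += 1
        stride *= 2
    return y


def sample(mu, sigma, angles, rng):
    """Draw one sample: hypercube base -> Gaussian quantiles -> rotation."""
    d = len(mu)
    # PLACEHOLDER base distribution: independent uniforms, NOT the
    # copula-like density from the paper.
    u = [rng.random() for _ in range(d)]
    # Keep u strictly inside (0, 1) so the quantile function is defined.
    eps = 1e-12
    u = [min(max(ui, eps), 1.0 - eps) for ui in u]
    z = gaussian_quantile(u, mu, sigma)
    return butterfly_rotate(z, angles)
```

For d = 8, `angles` would hold three layers of four angles each. Because every layer is a product of disjoint plane rotations, the map preserves Euclidean norm, which makes it convenient as a volume-preserving flow step.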