

A Proof of Theorems

Neural Information Processing Systems

We still need to show that the properties required by the PAC-Bayes analysis hold for both the margin operator and the robust margin operator; this completes the proof of Lemma 6.1. The proofs of Lemmas 7.1 and 7.2 are similar, so we give only the proof of Lemma 7.2; Lemma 7.1 follows by replacing the robust margin operator with the margin operator. Since the above bound holds for every x in the domain X, it holds almost surely, and the second inequality is the tail bound above.
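For intuition, a margin operator and a crudely approximated robust margin operator might look like the following sketch. The function names, the l_inf perturbation model, and the random-search approximation are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def margin(logits: np.ndarray, y: int) -> float:
    """Margin operator: score of the true class minus the best other class."""
    others = np.delete(logits, y)
    return logits[y] - others.max()

def robust_margin(f, x: np.ndarray, y: int, eps: float,
                  n_samples: int = 256, rng=None) -> float:
    """Approximate robust margin: the worst-case margin over an l_inf ball of
    radius eps around x. Random search only upper-bounds the true worst case."""
    rng = np.random.default_rng(0) if rng is None else rng
    worst = margin(f(x), y)
    for _ in range(n_samples):
        delta = rng.uniform(-eps, eps, size=x.shape)
        worst = min(worst, margin(f(x + delta), y))
    return worst
```

By construction the robust margin is never larger than the plain margin, which is the monotonicity property such analyses typically exploit.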



Sampling from multimodal distributions with warm starts: Non-asymptotic bounds for the Reweighted Annealed Leap-Point Sampler

Lee, Holden, Santana-Gijzen, Matheau

arXiv.org Machine Learning

Sampling from multimodal distributions is a central challenge in Bayesian inference and machine learning. In light of hardness results for sampling -- classical MCMC methods, even with tempering, can suffer from exponential mixing times -- a natural question is how to leverage additional information, such as a warm start point for each mode, to enable faster mixing across modes. To address this, we introduce Reweighted ALPS (Re-ALPS), a modified version of the Annealed Leap-Point Sampler (ALPS) that dispenses with the Gaussian approximation assumption. We prove the first polynomial-time bound that holds in a general setting, under the natural assumption that each component contains significant mass relative to the others when tilted towards the corresponding warm start point. Similarly to ALPS, we define distributions tilted towards a mixture centered at the warm start points and, at the coldest level, use teleportation between warm start points to enable efficient mixing across modes. In contrast to ALPS, our method does not require Hessian information at the modes, but instead estimates component partition functions via Monte Carlo. This additional estimation step is crucial in allowing the algorithm to handle target distributions with geometries beyond the approximately Gaussian case. For the proof, we establish convergence results for Markov processes when only part of the stationary distribution is well-mixing, and we analyze the estimation of partition functions for individual components of a mixture. We numerically evaluate our algorithm's mixing performance against ALPS on a mixture of heavy-tailed distributions.
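The abstract mentions estimating component partition functions via Monte Carlo. As a rough illustration only (not the paper's estimator), the sketch below uses importance sampling with a Gaussian proposal centered at a warm start point; the `neg_log_density` interface, the proposal family, and the parameter names are all assumptions for this toy.

```python
import numpy as np

def estimate_partition_function(neg_log_density, warm_start, proposal_std=1.0,
                                n_samples=50_000, rng=None):
    """Monte Carlo estimate of Z = integral of exp(-U(x)) dx for one component,
    via importance sampling with a Gaussian proposal at the warm start.
    neg_log_density maps a (n, d) batch to a length-n array of U values."""
    rng = np.random.default_rng(0) if rng is None else rng
    d = len(warm_start)
    x = rng.normal(warm_start, proposal_std, size=(n_samples, d))
    # log density of the Gaussian proposal at each sample
    log_q = (-0.5 * np.sum(((x - warm_start) / proposal_std) ** 2, axis=1)
             - 0.5 * d * np.log(2 * np.pi * proposal_std ** 2))
    log_w = -neg_log_density(x) - log_q
    m = log_w.max()                       # log-sum-exp trick for stability
    return np.exp(m) * np.mean(np.exp(log_w - m))
```

For a standard Gaussian potential U(x) = x^2/2 in one dimension, the estimate should approach sqrt(2*pi) ~ 2.5066.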


A Bayesian approach to learning mixtures of nonparametric components

Zhang, Yilei, Wei, Yun, Guha, Aritra, Nguyen, XuanLong

arXiv.org Machine Learning

Mixture models are widely used in modeling heterogeneous data populations. A standard approach to mixture modeling is to assume that the mixture component takes a parametric kernel form, while the flexibility of the model can be obtained by using a large or possibly unbounded number of such parametric kernels. In many applications, making parametric assumptions on the latent subpopulation distributions may be unrealistic, which motivates the need for nonparametric modeling of the mixture components themselves. In this paper we study finite mixtures with nonparametric mixture components, using a Bayesian nonparametric modeling approach. In particular, it is assumed that the data population is generated according to a finite mixture of latent component distributions, where each component is endowed with a Bayesian nonparametric prior such as the Dirichlet process mixture. We present conditions under which the individual mixture components' distributions can be identified, and establish posterior contraction behavior for the data population's density, as well as densities of the latent mixture components. We develop an efficient MCMC algorithm for posterior inference and demonstrate via simulation studies and real-world data illustrations that it is possible to efficiently learn complex distributions for the latent subpopulations. In theory, the posterior contraction rate of the component densities is nearly polynomial, which is a significant improvement over the logarithmic convergence rate of estimating mixing measures via deconvolution.
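To make the MCMC-for-mixtures idea concrete, here is a toy Gibbs sampler for a two-component Gaussian mixture with unit component variances and a symmetric Dirichlet prior on the weights. This is a minimal parametric stand-in only; the paper's sampler targets components that are themselves nonparametric (e.g. Dirichlet process mixtures), and all priors and names below are assumptions of this sketch.

```python
import numpy as np

def gibbs_mixture(x, n_iter=200, rng=None):
    """Toy Gibbs sampler for a 2-component 1D Gaussian mixture (unit variances,
    N(0, 10^2) prior on means, Dirichlet(1,1) prior on weights)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = len(x)
    z = rng.integers(0, 2, size=n)            # component assignments
    mu = np.array([x.min(), x.max()])         # initialize means at the extremes
    for _ in range(n_iter):
        # sample weights, then assignments, given the current means
        counts = np.bincount(z, minlength=2)
        w = rng.dirichlet(1.0 + counts)
        logp = np.log(w)[None, :] - 0.5 * (x[:, None] - mu[None, :]) ** 2
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = (rng.random(n) < p[:, 1]).astype(int)
        # sample each mean from its conjugate Gaussian posterior
        for k in range(2):
            xk = x[z == k]
            prec = 1.0 / 100.0 + len(xk)
            mu[k] = rng.normal(xk.sum() / prec, 1.0 / np.sqrt(prec))
    return mu, z
```

On well-separated data the sampled means concentrate near the true subpopulation means after a modest number of sweeps.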


Dimension-free error estimate for diffusion model and optimal scheduling

de Bortoli, Valentin, Elie, Romuald, Kazeykina, Anna, Ren, Zhenjie, Zhang, Jiacheng

arXiv.org Machine Learning

Diffusion generative models have emerged as powerful tools for producing synthetic data from an empirically observed distribution. A common approach involves simulating the time-reversal of an Ornstein-Uhlenbeck (OU) process initialized at the true data distribution. Since the score function associated with the OU process is typically unknown, it is approximated using a trained neural network. This approximation, along with finite-time simulation, time discretization, and statistical approximation, introduces several sources of error whose impact on the generated samples must be carefully understood. Previous analyses have quantified the error between the generated and the true data distributions in terms of Wasserstein distance or Kullback-Leibler (KL) divergence. However, both metrics present limitations: KL divergence requires absolute continuity between distributions, while Wasserstein distance, though more general, leads to error bounds that scale poorly with dimension, rendering them impractical in high-dimensional settings. In this work, we derive an explicit, dimension-free bound on the discrepancy between the generated and the true data distributions. The bound is expressed in terms of a smooth test functional with bounded first and second derivatives. The key novelty lies in the use of this weaker, functional metric to obtain dimension-independent guarantees, at the cost of higher regularity on the test functions. As an application, we formulate and solve a variational problem to minimize the time-discretization error, leading to the derivation of an optimal time-scheduling strategy for the reverse-time diffusion. Interestingly, this scheduler has appeared previously in the literature in a different context; our analysis provides a new justification for its optimality, now grounded in minimizing the discretization bias in generative sampling.
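To illustrate the reverse-time OU simulation with a nonuniform schedule, the following sketch samples Gaussian data, for which the score is known in closed form. The quadratic schedule (denser steps near t = 0) is an illustrative choice and not the paper's derived optimal scheduler; all names and defaults are assumptions.

```python
import numpy as np

def reverse_ou_sampler(n_samples, dim, sigma0=0.5, T=5.0, n_steps=200, rng=None):
    """Euler-Maruyama simulation of the time-reversed OU process
    dY = (Y + 2 * score(Y, t)) dt + sqrt(2) dB, integrated from t = T to t ~ 0,
    for data ~ N(0, sigma0^2 I), where score(y, t) = -y / var_t in closed form."""
    rng = np.random.default_rng(0) if rng is None else rng
    # nonuniform forward-time grid, quadratically denser near t = 0
    ts = T * np.linspace(0.0, 1.0, n_steps + 1) ** 2
    y = rng.normal(0.0, 1.0, size=(n_samples, dim))   # start from N(0, I) ~ p_T
    for i in range(n_steps, 0, -1):
        t, dt = ts[i], ts[i] - ts[i - 1]
        # variance of the forward OU marginal at time t
        var_t = sigma0 ** 2 * np.exp(-2 * t) + 1.0 - np.exp(-2 * t)
        score = -y / var_t
        y = y + (y + 2.0 * score) * dt + np.sqrt(2.0 * dt) * rng.normal(size=y.shape)
    return y
```

Run to convergence, the output should approximately recover the data law N(0, sigma0^2 I), with residual bias governed by the schedule, as the abstract's variational analysis formalizes.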






Supplementary Material Contextual Games: Multi-Agent Learning with Side Information Pier Giuseppe Sessa, Ilija Bogunovic, Andreas Krause, Maryam Kamgarpour (NeurIPS 2020)

Neural Information Processing Systems

The theoretical guarantees obtained in Section 3 rely on the following two main lemmas. Let the p_t's be the distributions computed using the MW (multiplicative weights) rule. Similarly to Appendix A.1, and as in the proof of Theorem 1, we now also explicitly account for the adaptiveness of the adversary; the second equality follows by the law of total expectation. Consider a contextual game and assume contexts are sampled i.i.d. Hoeffding's inequality [21] then shows that the corresponding tail probability is exponentially small for any tolerance greater than 0. In the final section we describe the experimental setup of the contextual traffic routing game of Section 5.
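The MW rule referenced above can be sketched as a generic Hedge update; the exact variant used in the paper may differ, and the function names, loss-matrix interface, and learning rate are assumptions of this sketch.

```python
import numpy as np

def hedge_update(weights, losses, eta):
    """One multiplicative-weights (Hedge) step: exponentially down-weight each
    action by its observed loss, then renormalize to a distribution."""
    w = weights * np.exp(-eta * losses)
    return w / w.sum()

def hedge_play(loss_matrix, eta):
    """Run Hedge over a T x K loss matrix (rows = rounds, columns = actions);
    returns the sequence of played distributions, starting from uniform."""
    T, K = loss_matrix.shape
    p = np.full(K, 1.0 / K)
    dists = []
    for t in range(T):
        dists.append(p.copy())
        p = hedge_update(p, loss_matrix[t], eta)
    return np.array(dists)
```

With losses bounded in [0, 1], Hedge concentrates on the action with the smallest cumulative loss, which is the property the regret analysis builds on.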