realnvp
Cooperative Distribution Alignment via JSD Upper Bound

Neural Information Processing Systems

Unsupervised distribution alignment estimates a transformation that maps two or more source distributions to a shared aligned distribution, given only samples from each distribution. This task has many applications, including generative modeling, unsupervised domain adaptation, and socially aware learning.
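The title refers to an upper bound on the Jensen-Shannon divergence (JSD). As a point of reference only (this is the plain sample-free JSD between two discrete distributions, not the paper's bound), a minimal NumPy sketch:

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions (nats)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)  # mixture midpoint
    kl = lambda a, b: np.sum(a * np.log((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

JSD is symmetric and bounded above by log 2 in nats, which is what makes upper-bounding it attractive for alignment objectives.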



Adaptive Heterogeneous Mixtures of Normalising Flows for Robust Variational Inference

Wiriyapong, Benjamin, Karakuş, Oktay, Sidorov, Kirill

arXiv.org Machine Learning

Normalising-flow variational inference (VI) can approximate complex posteriors, yet single-flow models often behave inconsistently across qualitatively different distributions. We propose Adaptive Mixture Flow Variational Inference (AMF-VI), a heterogeneous mixture of complementary flows (MAF, RealNVP, RBIG) trained in two stages: (i) sequential expert training of individual flows, and (ii) adaptive global weight estimation via likelihood-driven updates, without per-sample gating or architectural changes. Evaluated on six canonical posterior families (banana, X-shape, two-moons, rings, a bimodal mixture, and a five-mode mixture), AMF-VI achieves consistently lower negative log-likelihood than each single-flow baseline and delivers stable gains in transport metrics (Wasserstein-2) and maximum mean discrepancy (MMD), indicating improved robustness across shapes and modalities. The procedure is efficient and architecture-agnostic, incurring minimal overhead relative to standard flow training, and demonstrates that adaptive mixtures of diverse flows provide a reliable route to robust VI across diverse posterior families whilst preserving each expert's inductive bias.
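The abstract describes "adaptive global weight estimation via likelihood-driven updates" over the trained experts. A hypothetical sketch of what such an update could look like — the function name, step size, and multiplicative exponential-tilting form are assumptions for illustration, not the paper's actual rule:

```python
import numpy as np

def update_mixture_weights(log_liks, weights, lr=0.5):
    """Likelihood-driven update of global mixture weights (hypothetical sketch).

    log_liks: (K, N) array of per-expert log-likelihoods on N samples.
    weights:  (K,) current mixture weights on the simplex.
    lr:       step size controlling how aggressively weights shift.
    """
    avg_ll = log_liks.mean(axis=1)  # mean log-likelihood per expert
    # Multiplicatively favour experts with higher average likelihood;
    # subtracting the max keeps the exponent numerically stable.
    scores = weights * np.exp(lr * (avg_ll - avg_ll.max()))
    return scores / scores.sum()  # renormalise back onto the simplex
```

Because there is no per-sample gating, a single global weight vector like this is cheap to maintain alongside standard flow training.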





Review for NeurIPS paper: Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization

Neural Information Processing Systems

Weaknesses: Any empirical comparison is going to have the flaw of being insufficiently exhaustive, and this one is no exception. For example:

- ADVI as implemented in Stan encompasses both full-covariance and diagonal Gaussian surrogates, but this paper evaluates only one of those, and it wasn't even clear which one until quite far in (line 297). This should be clarified earlier. Ideally it would be nice to see the relative performance of both Gaussian baselines (and perhaps other commonly suggested schemes, like a diagonal plus low-rank covariance).
- Was RealNVP chosen because it supports sticking-the-landing? It would be useful to see a side-by-side comparison against a similar-size IAF without sticking-the-landing.
- A simple method not included (maybe because it's so simple that no one has published on it for VI recently) is Polyak-Ruppert averaging, i.e., averaging the variational parameters over the final steps of stochastic optimization.
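The Polyak-Ruppert averaging the reviewer suggests is indeed simple to implement; a minimal sketch, where the choice of tail fraction is an arbitrary illustration rather than a recommended value:

```python
import numpy as np

def polyak_ruppert(param_trace, tail_frac=0.25):
    """Average parameter iterates over the final steps of optimization.

    param_trace: (T, D) array of variational-parameter iterates.
    tail_frac:   fraction of final iterates to average (assumed value).
    """
    param_trace = np.asarray(param_trace, dtype=float)
    T = param_trace.shape[0]
    start = max(0, int(T * (1.0 - tail_frac)))
    return param_trace[start:].mean(axis=0)
```

The averaged iterate is then used as the final variational parameter vector in place of the last raw iterate, smoothing out stochastic-gradient noise at no extra cost during training.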