Review for NeurIPS paper: Advances in Black-Box VI: Normalizing Flows, Importance Weighting, and Optimization
Neural Information Processing Systems
Weaknesses: Any empirical comparison will inevitably be less than exhaustive, and this one is no exception. For example:
- ADVI as implemented in Stan supports both full-covariance and diagonal (mean-field) Gaussian surrogates, but this paper evaluates only one of them, and it wasn't clear which one until quite far in (line 297). This should be stated earlier. Ideally, the paper would report the relative performance of both Gaussian baselines (and perhaps other commonly suggested schemes, such as a diagonal-plus-low-rank covariance).
- Was RealNVP chosen because it supports sticking-the-landing? It would be useful to see a side-by-side comparison against a similarly sized IAF trained without sticking-the-landing.
- A simple method that is not included (perhaps because it is so simple that no one has recently published on it for VI) is Polyak-Ruppert averaging, i.e., averaging the variational parameters over the final steps of stochastic optimization.
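To make the suggested baseline concrete, here is a minimal sketch of tail (Polyak-Ruppert) averaging wrapped around plain SGD. It assumes the variational parameters are a flat list of floats and that `grad_fn` returns a stochastic gradient of the objective being minimized (in VI, the negative ELBO). The function names `polyak_ruppert_sgd` and `noisy_quadratic_grad` are hypothetical, and the noisy quadratic is only a toy stand-in for a Monte Carlo ELBO gradient. None of this is taken from the paper under review.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def polyak_ruppert_sgd(
    grad_fn: Callable[[Vector, random.Random], Vector],
    theta0: Sequence[float],
    lr: float,
    num_steps: int,
    avg_last: int,
    seed: int = 0,
) -> Tuple[Vector, Vector]:
    """Run SGD and return (last iterate, mean of the last `avg_last` iterates)."""
    if not 0 < avg_last <= num_steps:
        raise ValueError("avg_last must be in [1, num_steps]")
    rng = random.Random(seed)
    theta = list(theta0)
    avg = [0.0] * len(theta)
    n_avg = 0
    for step in range(num_steps):
        g = grad_fn(theta, rng)
        # Ordinary SGD update on the (variational) parameters.
        theta = [t - lr * gi for t, gi in zip(theta, g)]
        # Only iterates from the final `avg_last` steps enter the average.
        if step >= num_steps - avg_last:
            n_avg += 1
            # Incremental mean: avg_k = avg_{k-1} + (theta_k - avg_{k-1}) / k
            avg = [a + (t - a) / n_avg for a, t in zip(avg, theta)]
    return theta, avg


def noisy_quadratic_grad(
    target: Sequence[float], noise_std: float
) -> Callable[[Vector, random.Random], Vector]:
    """Toy stand-in for a noisy gradient: grad of 0.5*||theta - target||^2 plus Gaussian noise."""
    def grad(theta: Vector, rng: random.Random) -> Vector:
        return [t - m + rng.gauss(0.0, noise_std) for t, m in zip(theta, target)]
    return grad


if __name__ == "__main__":
    target = [1.0, -2.0]
    last, avg = polyak_ruppert_sgd(
        noisy_quadratic_grad(target, noise_std=1.0),
        theta0=[0.0, 0.0], lr=0.1, num_steps=2000, avg_last=1000,
    )
    print("last iterate:", last)
    print("tail average:", avg)
```

With a constant step size, the last iterate keeps fluctuating around the optimum at a scale set by the learning rate and gradient noise. The tail average typically damps that fluctuation at essentially no extra cost, which is why it seems a natural cheap baseline to report alongside the other optimization tricks.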
Feb-6-2025, 01:24:51 GMT