Gradient Boosted Normalizing Flows
By chaining a sequence of differentiable invertible transformations, normalizing flows (NF) provide an expressive method of posterior approximation, exact density evaluation, and sampling. The trend in normalizing flow literature has been to devise deeper, more complex transformations to achieve greater flexibility. We propose an alternative: Gradient Boosted Normalizing Flows (GBNF) model a density by successively adding new NF components with gradient boosting. Under the boosting framework, each new NF component optimizes a weighted likelihood objective, resulting in new components that are fit to the suitable residuals of the previously trained components. The GBNF formulation results in a mixture model structure, whose flexibility increases as more components are added. Moreover, GBNFs offer a wider, as opposed to strictly deeper, approach that improves existing NFs at the cost of additional training---not more complex transformations. We demonstrate the effectiveness of this technique for density estimation and, by coupling GBNF with a variational autoencoder, generative modeling of images. Our results show that GBNFs outperform their non-boosted analog, and, in some cases, produce better results with smaller, simpler flows.
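The boosting recipe described in the abstract can be sketched in miniature. The following NumPy example (an illustrative sketch, not the paper's implementation) uses one-layer affine flows as components, weights each data point inversely to the current model's density as a stand-in for the paper's weighted likelihood objective, and picks the mixture weight by a simple line search; all function names and the specific weighting scheme are assumptions for illustration.

```python
import numpy as np

def affine_flow_logpdf(x, mu, log_sigma):
    # Log-density under a one-layer affine flow: z = (x - mu) * exp(-log_sigma),
    # with a standard-normal base distribution (change-of-variables formula).
    z = (x - mu) * np.exp(-log_sigma)
    return -0.5 * z**2 - 0.5 * np.log(2.0 * np.pi) - log_sigma

def fit_weighted(x, w):
    # Closed-form weighted maximum-likelihood fit of the affine flow's parameters.
    mu = np.average(x, weights=w)
    var = np.average((x - mu) ** 2, weights=w)
    return mu, 0.5 * np.log(var)

rng = np.random.default_rng(0)
# Asymmetric bimodal target: a single affine flow cannot capture both modes.
x = np.concatenate([rng.normal(-3.0, 0.5, 800), rng.normal(3.0, 0.5, 200)])

# Component 1: ordinary (unweighted) maximum likelihood.
mu1, ls1 = fit_weighted(x, np.ones_like(x))
lp1 = affine_flow_logpdf(x, mu1, ls1)

# Component 2: weight each point inversely to the current model's density,
# so the new flow concentrates on the "residual" mass the model misses.
w = np.exp(-lp1)
mu2, ls2 = fit_weighted(x, w)
lp2 = affine_flow_logpdf(x, mu2, ls2)

# Mixture weight beta chosen by a simple line search over the data likelihood.
best_beta, best_ll = 0.0, lp1.mean()
for beta in np.linspace(0.05, 0.95, 19):
    ll = np.logaddexp(np.log(1 - beta) + lp1, np.log(beta) + lp2).mean()
    if ll > best_ll:
        best_beta, best_ll = beta, ll

print(f"single-flow ll: {lp1.mean():.3f}, boosted mixture ll: {best_ll:.3f}")
```

Because the line search includes the option of ignoring the new component (beta = 0), the boosted mixture's likelihood can only match or exceed the single flow's, mirroring the "wider, not deeper" claim at toy scale.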
Review for NeurIPS paper: Gradient Boosted Normalizing Flows
Weaknesses: No increase in theoretical flow expressivity: Unlike traditional boosting, in which an ensemble of weak learners is provably more expressive, the paper doesn't provide such a proof for the proposed NF boosting procedure. Moreover, I conjecture that this methodology (in the general case, under NN / polynomial universal approximation assumptions) *cannot* build an ensemble that is more expressive than a single constituent component. There are two bottlenecks in NF expressivity---the base distribution and the class of transformation functions [Papamakarios et al., 2019]---and the proposed method does not fundamentally change either of these. For example, the base distribution is simple and shared across all components (line 99). Recent work that does improve flow expressivity relies on mixture formulations [Papamakarios et al., 2019] with discrete [Dinh et al., 2019] or continuous [Cornish et al., 2020] indices, whose base distribution (or support) and transformation change according to the index.
Review for NeurIPS paper: Gradient Boosted Normalizing Flows
The paper describes a way to create mixtures of normalizing-flow models using gradient boosting. Combining several simple flow models is an alternative to increasing the capacity of a single model, and is worth exploring. One of the main concerns the reviewers expressed is that of limited novelty, in that the proposed method is largely an application and continuation of existing techniques. However, the reviewers agree that the paper is well written and well executed, that although the idea is incremental there are still things to be said about applying gradient boosting to flows, and that the experiments are well done. For these reasons, I'm happy to recommend acceptance of the paper.