Keller, T. Anderson, Peters, Jorn W. T., Jaini, Priyank, Hoogeboom, Emiel, Forré, Patrick, Welling, Max

Efficient gradient computation of the Jacobian determinant term is a core problem of the normalizing flow framework. Thus, most proposed flow models either restrict to a function class with easy evaluation of the Jacobian determinant, or an efficient estimator thereof. However, these restrictions limit the performance of such density models, frequently requiring significant depth to reach desired performance levels. In this work, we propose Self Normalizing Flows, a flexible framework for training normalizing flows by replacing expensive terms in the gradient by learned approximate inverses at each layer. This reduces the computational complexity of each layer's exact update from $\mathcal{O}(D^3)$ to $\mathcal{O}(D^2)$, allowing for the training of flow architectures which were otherwise computationally infeasible, while also providing efficient sampling. We show experimentally that such models are remarkably stable and optimize to similar data likelihood values as their exact gradient counterparts, while surpassing the performance of their functionally constrained counterparts.

Kobyzev, Ivan, Prince, Simon, Brubaker, Marcus A.

Normalizing Flows are generative models which produce tractable distributions where both sampling and density evaluation can be efficient and exact. The goal of this survey article is to give a coherent and comprehensive review of the literature around the construction and use of Normalizing Flows for distribution learning. We aim to provide context and explanation of the models, review current state-of-the-art literature, and identify open questions and promising future directions.

Bose, Avishek Joey, Smofsky, Ariella, Liao, Renjie, Panangaden, Prakash, Hamilton, William L.

The choice of approximate posterior distributions plays a central role in stochastic variational inference (SVI). One effective solution is the use of normalizing flows \cut{defined on Euclidean spaces} to construct flexible posterior distributions. However, one key limitation of existing normalizing flows is that they are restricted to the Euclidean space and are ill-equipped to model data with an underlying hierarchical structure. To address this fundamental limitation, we present the first extension of normalizing flows to hyperbolic spaces. We first elevate normalizing flows to hyperbolic spaces using coupling transforms defined on the tangent bundle, termed Tangent Coupling ($\mathcal{TC}$). We further introduce Wrapped Hyperboloid Coupling ($\mathcal{W}\mathbb{H}C$), a fully invertible and learnable transformation that explicitly utilizes the geometric structure of hyperbolic spaces, allowing for expressive posteriors while being efficient to sample from. We demonstrate the efficacy of our novel normalizing flow over hyperbolic VAEs and Euclidean normalizing flows. Our approach achieves improved performance on density estimation, as well as reconstruction of real-world graph data, which exhibit a hierarchical structure. Finally, we show that our approach can be used to power a generative model over hierarchical data using hyperbolic latent variables.

Normalizing flows learn a diffeomorphic mapping between the target and base distribution, while the Jacobian determinant of that mapping forms another real-valued function. In this paper, we show that the Jacobian determinant mapping is unique for the given distributions, hence the likelihood objective of flows has a unique global optimum. In particular, the likelihood for a class of flows is explicitly expressed by the eigenvalues of the auto-correlation matrix of individual data point, and independent of the parameterization of neural network, which provides a theoretical optimal value of likelihood objective and relates to probabilistic PCA. Additionally, Jacobian determinant is a measure of local volume change and is maximized when MLE is used for optimization. To stabilize normalizing flows training, it is required to maintain a balance between the expansiveness and contraction of volume, meaning Lipschitz constraint on the diffeomorphic mapping and its inverse. With these theoretical results, several principles of designing normalizing flow were proposed. And numerical experiments on highdimensional datasets (such as CelebA-HQ 1024x1024) were conducted to show the improved stability of training.