Quantitative Understanding of VAE by Interpreting ELBO as Rate Distortion Cost of Transform Coding
Variational autoencoder (VAE) estimates the posterior parameters (mean and variance) of the latent variables corresponding to each data input. Although it is used for many tasks, the transparency of the model remains an open issue. This paper provides a quantitative understanding of VAE properties by interpreting VAE as a non-linearly scaled isometric embedding. According to rate-distortion theory, optimal transform coding is achieved by a PCA-like orthonormal transform whose transform space is isometric to the input. From this analogy, we show theoretically and experimentally that VAE can be mapped to an implicit isometric embedding with a scale factor derived from the posterior parameters. As a result, the data probabilities in the input space can be estimated from the prior, the loss metrics, and the corresponding posterior parameters. In addition, the quantitative importance of each latent variable can be evaluated in the same way as the eigenvalues of PCA.

Variational autoencoder (VAE) (Kingma & Welling, 2014) is one of the most successful generative models; it estimates the posterior parameters of latent variables for each data input. In VAE, the latent representation is obtained by maximizing an evidence lower bound (ELBO). A number of studies (Higgins et al., 2017; Kim & Mnih, 2018; Lopez et al., 2018; Chen et al., 2018; Locatello et al., 2019; Rolínek et al., 2019) have tried to reveal the properties of the latent variables. To maximize ELBO, Alemi et al. (2018) analysed the rate-distortion (RD) tradeoff. However, the quantitative behavior of the latent space at the optimal RD tradeoff has still not been well clarified. RD theory (Berger, 1971), which has been successfully applied to image compression, shows that a PCA-like orthonormal transform with uniform coding noise optimizes the RD tradeoff.
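For reference, the rate-distortion reading of ELBO underlying this analysis can be written in the standard form below; the encoder $q_\phi(z \mid x)$, decoder $p_\theta(x \mid z)$, and prior $p(z)$ follow the usual VAE notation, which is assumed here rather than defined in this excerpt:

\begin{align}
\mathrm{ELBO}(x) &= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{-D(x)\ \text{(distortion term)}} \;-\; \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{R(x)\ \text{(rate term)}}, \qquad -\mathrm{ELBO}(x) = D(x) + R(x).
\end{align}

Maximizing ELBO is thus equivalent to minimizing the sum of a distortion-like reconstruction cost $D(x)$ and a rate-like KL cost $R(x)$, which is the tradeoff analysed by Alemi et al. (2018).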
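As a rough illustration of how a per-dimension importance, analogous to a PCA eigenvalue, might be read off the posterior parameters, the sketch below averages the per-dimension KL divergence to a standard-normal prior over a dataset and ranks the latent dimensions by it. This is only a minimal proxy under the usual Gaussian-posterior assumption, not the exact estimator derived in the paper, and the `mu`/`logvar` encoder outputs are a hypothetical interface.

```python
import numpy as np

def latent_dimension_importance(mu, logvar):
    """Rank latent dimensions by their average KL divergence to N(0, 1).

    mu, logvar: arrays of shape (num_samples, latent_dim) holding the
    posterior means and log-variances produced by a Gaussian-posterior
    VAE encoder (hypothetical interface, not defined in the excerpt).
    """
    var = np.exp(logvar)
    # Per-sample, per-dimension KL( N(mu, var) || N(0, 1) ).
    kl = 0.5 * (mu ** 2 + var - logvar - 1.0)
    # Average over the dataset: one importance score per latent dimension,
    # loosely analogous to a PCA eigenvalue.
    importance = kl.mean(axis=0)
    order = np.argsort(importance)[::-1]
    return importance, order

# Example with random stand-in posterior parameters; real values would
# come from running a trained encoder over a dataset.
rng = np.random.default_rng(0)
mu = rng.normal(size=(1000, 16))
logvar = rng.normal(scale=0.1, size=(1000, 16)) - 0.5
scores, ranking = latent_dimension_importance(mu, logvar)
print("most informative dimensions:", ranking[:5])
```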