
Collaborating Authors

 Nakagawa, Akira


Toward Unlimited Self-Learning MCMC with Parallel Adaptive Annealing

arXiv.org Machine Learning

Self-learning Monte Carlo (SLMC) methods have recently been proposed to accelerate Markov chain Monte Carlo (MCMC) methods using a machine learning model. With latent generative models, SLMC methods realize efficient Monte Carlo updates with less autocorrelation. However, SLMC methods are difficult to apply directly to multimodal distributions, for which training data are difficult to obtain. To overcome this limitation, we propose "parallel adaptive annealing," which makes SLMC methods directly applicable to multimodal distributions with a proposal that is trained gradually while annealing the target distribution. Parallel adaptive annealing is based on (i) sequential learning with annealing to inherit and update the model parameters, (ii) adaptive annealing to automatically detect under-learning, and (iii) parallel annealing to …

The efficiency of MCMC strongly depends on the choice of the proposal. Among recent advances in machine learning, a general method called the self-learning Monte Carlo (SLMC) method (Liu et al., 2017) was introduced to accelerate MCMC simulations by automating the proposal with a machine learning model, and it has been applied to various problems (Xu et al., 2017; Shen et al., 2018). In particular, a latent generative model realizes efficient global updates through the obtained information-rich latent representation (Huang & Wang, 2017; Albergo et al., 2019; Monroe & Shen, 2022; Tanaka & Tomiya, 2017). Although powerful, the performance of an SLMC simulation strongly depends on the automated proposal with machine learning models and on the quality of the training data used to train the proposal. For example, it is challenging to directly use SLMC for multimodal distributions because obtaining accurate training data covering all modes is difficult.
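To make the SLMC flavour concrete, the following is a minimal, hypothetical Python sketch: an independence Metropolis-Hastings sampler whose Gaussian proposal is refit to previously collected samples after each annealing stage of the target p(x)^beta. The target, the Gaussian proposal family, and the schedule `betas` are illustrative assumptions; this is not the paper's parallel adaptive annealing algorithm (there is no adaptive detection of under-learning or parallel-chain component here).

```python
# Toy sketch: annealed independence Metropolis-Hastings with a proposal
# that is periodically refit to past samples ("self-learning" flavour).
import numpy as np

def target_logp(x):
    # Bimodal 1-D example: unnormalized mixture of two Gaussians.
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

def norm_logpdf(x, mu, sigma):
    # Log-density of the Gaussian proposal, up to an additive constant.
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)

def fit_proposal(samples):
    # "Learning" step: refit a simple Gaussian proposal to past samples.
    return float(np.mean(samples)), float(np.std(samples) + 1e-3)

def slmc_annealed(n_steps=2000, betas=(0.1, 0.3, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    x, mu, sigma = 0.0, 0.0, 5.0            # initial state and proposal
    chain = []
    for beta in betas:                       # anneal toward the true target
        for _ in range(n_steps):
            x_new = rng.normal(mu, sigma)    # independence proposal
            log_accept = (beta * (target_logp(x_new) - target_logp(x))
                          + norm_logpdf(x, mu, sigma)
                          - norm_logpdf(x_new, mu, sigma))
            if np.log(rng.uniform()) < log_accept:
                x = x_new
            chain.append(x)
        mu, sigma = fit_proposal(np.array(chain))  # update the proposal
    return np.array(chain)

print(slmc_annealed()[-5:])
```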


Quantitative Understanding of VAE by Interpreting ELBO as Rate Distortion Cost of Transform Coding

arXiv.org Machine Learning

The variational autoencoder (VAE) estimates the posterior parameters (mean and variance) of the latent variables corresponding to each input data point. Although it is used for many tasks, the transparency of the model is still an underlying issue. This paper provides a quantitative understanding of VAE properties by interpreting the VAE as a non-linearly scaled isometric embedding. According to rate-distortion theory, optimal transform coding is achieved with a PCA-like orthonormal transform in which the transform space is isometric to the input. From this analogy, we show theoretically and experimentally that a VAE can be mapped to an implicit isometric embedding with a scale factor derived from the posterior parameters. As a result, we can estimate the data probabilities in the input space from the prior, the loss metrics, and the corresponding posterior parameters. In addition, the quantitative importance of each latent variable can be evaluated like the eigenvalues of PCA.

The variational autoencoder (VAE) (Kingma & Welling, 2014) is one of the most successful generative models, estimating the posterior parameters of latent variables for each input data point. In a VAE, the latent representation is obtained by maximizing an evidence lower bound (ELBO). A number of studies (Higgins et al., 2017; Kim & Mnih, 2018; Lopez et al., 2018; Chen et al., 2018; Locatello et al., 2019; Rolínek et al., 2019) have tried to reveal the properties of the latent variables. For ELBO maximization, Alemi et al. (2018) analyzed the rate-distortion (RD) tradeoff. However, the quantitative behavior of the latent space at the optimal RD tradeoff has still not been clarified well. RD theory (Berger, 1971), which has been applied successfully to image compression, shows that a PCA-like orthonormal transform with uniform coding noise optimizes the RD tradeoff.
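For reference, the rate-distortion reading of the ELBO discussed above can be written in standard VAE notation; this decomposition follows Alemi et al. (2018) rather than any equation specific to this paper:

```latex
% ELBO as a (negative) rate-distortion cost in standard VAE notation.
\mathrm{ELBO}(x)
  = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]}_{-\,D \ \text{(distortion)}}
  \;-\;
  \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right)}_{R \ \text{(rate)}},
\qquad
\max_{\theta,\phi}\ \mathrm{ELBO} \;\Longleftrightarrow\; \min_{\theta,\phi}\ (D + R).
```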


Rate-Distortion Optimization Guided Autoencoder for Generative Approach with quantitatively measurable latent space

arXiv.org Machine Learning

In the generative model approach to machine learning, it is essential to acquire an accurate probabilistic model and to compress the dimension of the data for easy treatment. However, in conventional deep-autoencoder-based generative models such as the VAE, the probability in the real space cannot be obtained correctly from that in the latent space, because the scaling between the two spaces is not controlled. This has also been an obstacle to quantifying the impact of variations of the latent variables on the data. In this paper, we propose a Rate-Distortion Optimization guided autoencoder, in which the Jacobian matrix from the real space to the latent space is orthonormal. It is shown theoretically and experimentally that (i) the probability distribution of the latent space obtained by this model is proportional to the probability distribution of the real space, because the Jacobian between the two spaces is constant; and (ii) our model behaves as a nonlinear PCA, in which the energy of the acquired latent space is concentrated on several principal components and the influence of each component can be evaluated quantitatively. Furthermore, to verify its usefulness in practical applications, we evaluate its performance in unsupervised anomaly detection, where it outperforms current state-of-the-art methods.

Capturing the inherent features of a dataset from high-dimensional and complex data is an essential issue in machine learning. The generative model approach learns the probability distribution of the data, aiming at data generation by probabilistic sampling, unsupervised/weakly supervised learning, and the acquisition of meta-priors (general assumptions about how data can be summarized naturally, such as disentanglement, clustering, and hierarchical structure (Bengio et al., 2013; Tschannen et al., 2019)). It is generally difficult to directly estimate the probability density function (PDF) P_x(x) of real data x. Accordingly, one promising approach is to map the data to a latent space z of reduced dimension and capture the PDF P_z(z) there. In recent years, deep autoencoder-based methods have made it possible to compress dimensions and derive latent variables. While there has been remarkable progress in these areas (van den Oord et al., 2017; Kingma et al., 2014; Jiang et al., 2016), the relation between x and z in current deep generative models is still not clear. The VAE (Kingma & Welling, 2014) is one of the most successful generative models for capturing latent representations. In the VAE, a lower bound on the log-likelihood of P_x(x) is introduced as the ELBO, and the latent variables are then obtained by maximizing the ELBO.
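As a rough illustration of a rate-distortion-guided autoencoder objective, the following PyTorch sketch trades off a distortion term in the input space against a rate term given by the negative log of a standard normal prior on the latent code. The architecture, the prior, and the weight `lambda_rd` are assumptions made for illustration; this is not the authors' exact objective, nor the mechanism by which they obtain Jacobian orthonormality.

```python
# Minimal sketch of a rate-distortion style autoencoder loss: D + lambda * R.
import torch
import torch.nn as nn

class RDAutoencoder(nn.Module):
    def __init__(self, dim_x=784, dim_z=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim_x, 256), nn.ReLU(), nn.Linear(256, dim_z))
        self.dec = nn.Sequential(nn.Linear(dim_z, 256), nn.ReLU(), nn.Linear(256, dim_x))

    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def rd_loss(model, x, lambda_rd=0.1):
    z, x_hat = model(x)
    distortion = ((x - x_hat) ** 2).sum(dim=1).mean()  # D: squared error in input space
    rate = 0.5 * (z ** 2).sum(dim=1).mean()            # R: -log N(z; 0, I) up to a constant
    return distortion + lambda_rd * rate

# Usage sketch: one optimization step on a random batch.
model = RDAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
opt.zero_grad()
loss = rd_loss(model, torch.rand(32, 784))
loss.backward()
opt.step()
```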