Supplementary Material for Deep Automodulators

Neural Information Processing Systems

We now proceed to modulate the mean and variance of each of those 512 feature maps separately, so that each map is controlled by two scalar values, one for each statistic.
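The per-map modulation described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: each feature map is normalized to zero mean and unit variance, then shifted and rescaled by its own two scalars (the function name `adain_modulate` and the 4x4 map size are illustrative assumptions).

```python
import numpy as np

def adain_modulate(feature_maps, scales, biases, eps=1e-5):
    """AdaIN-style modulation of a (C, H, W) tensor: normalize each of
    the C maps to zero mean / unit variance, then apply that map's own
    scale and bias scalar."""
    out = np.empty_like(feature_maps, dtype=np.float64)
    for i in range(feature_maps.shape[0]):
        m = feature_maps[i]
        normed = (m - m.mean()) / np.sqrt(m.var() + eps)
        out[i] = scales[i] * normed + biases[i]
    return out

# 512 maps, each paired with exactly two scalars (scale and bias)
maps = np.random.randn(512, 4, 4)
scales = np.random.randn(512)
biases = np.random.randn(512)
modulated = adain_modulate(maps, scales, biases)
```

After modulation, the mean of map `i` equals `biases[i]` and its standard deviation equals `|scales[i]|` (up to the small `eps`), which is what lets two scalars per map fully control that map's statistics.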



Deep Automodulators

Heljakka, Ari, Hou, Yuxin, Kannala, Juho, Solin, Arno

arXiv.org Machine Learning

We introduce a novel autoencoder model that deviates from traditional autoencoders by using the full latent vector to independently modulate each layer in the decoder. We demonstrate how such an 'automodulator' allows for a principled approach to enforcing latent space disentanglement, mixing of latent codes, and a straightforward way to utilize prior information that can be construed as a scale-specific invariance. Unlike GAN models without encoders, autoencoder models can directly operate on new real input samples, which makes our model directly suitable for applications involving real-world inputs. As the architectural backbone, we extend recent generative autoencoder models that retain input identity and image sharpness at high resolutions better than VAEs. We show that our model achieves state-of-the-art latent space disentanglement, high quality and diversity of output samples, and faithful reconstructions.

This paper introduces a new generative autoencoder for learning representations of image data sets in a way that allows arbitrary combinations of latent codes to generate images (see Figure 1). We achieve this with an architecture that uses adaptive instance normalization (AdaIN; Dumoulin et al., 2017b; Huang & Belongie, 2017), and training methods that let the model learn a highly disentangled latent space by utilizing progressively growing autoencoders (Heljakka et al., 2019). In a typical autoencoder, input images are encoded into latent space, and the information in the latent variables is then passed through successive layers of decoding until a reconstruction of the input image has been formed. In our model, the latent vector independently modulates the statistics of each layer of the decoder: the output of layer n is no longer solely determined by the input from layer n-1. In image generation, the probability mass representing sensible images (such as human faces) lies concentrated on a low-dimensional manifold.
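The key architectural idea, each decoder layer being driven by the full latent vector rather than only by the previous layer's output, can be sketched as a toy forward pass. This is a hedged illustration under assumed dimensions, not the paper's network: the class name `ModulatedDecoder`, the per-layer projection matrices, and the layer sizes are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def instance_norm(x, eps=1e-5):
    """Normalize each channel of a (C, H, W) tensor to zero mean, unit variance."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

class ModulatedDecoder:
    """Toy decoder in which every layer receives the *full* latent vector z
    through its own affine projection to per-channel (scale, bias) pairs,
    so layer n's output is not determined by layer n-1 alone."""
    def __init__(self, latent_dim=16, channels=(8, 8, 8)):
        self.channels = channels
        # one independent style projection per decoder layer
        self.style_w = [rng.standard_normal((2 * c, latent_dim)) * 0.1
                        for c in channels]

    def forward(self, z, x):
        for c, w in zip(self.channels, self.style_w):
            style = w @ z                      # full latent -> 2*c scalars
            scale, bias = style[:c], style[c:]
            x = instance_norm(x)               # discard incoming statistics
            x = scale[:, None, None] * x + bias[:, None, None]
            x = np.maximum(x, 0.0)             # ReLU nonlinearity
        return x

dec = ModulatedDecoder()
z = rng.standard_normal(16)
x0 = rng.standard_normal((8, 4, 4))
out = dec.forward(z, x0)
```

Because each layer first instance-normalizes its input and then reimposes statistics computed from z, mixing latent codes amounts to feeding different z vectors to different layers, which is what enables the scale-specific code mixing described above.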