Elastic-InfoGAN: Unsupervised Disentangled Representation Learning in Imbalanced Data
Ojha, Utkarsh, Singh, Krishna Kumar, Hsieh, Cho-Jui, Lee, Yong Jae
Utkarsh Ojha 1, Krishna Kumar Singh 1, Cho-Jui Hsieh 2, and Yong Jae Lee 1
1 University of California, Davis
2 University of California, Los Angeles

Abstract

We propose a novel unsupervised generative model, Elastic-InfoGAN, that learns to disentangle object identity from other low-level aspects in class-imbalanced datasets. We first investigate the issues surrounding the uniformity assumptions made by InfoGAN (Chen et al., 2016), and demonstrate that it fails to properly disentangle object identity in imbalanced data. Our key idea is to make the discovery of the discrete latent factor of variation invariant to identity-preserving transformations in real images, and to use that as the signal to learn the latent distribution's parameters. Experiments on both artificial (MNIST) and real-world (YouTube-Faces) datasets demonstrate the effectiveness of our approach in imbalanced data by: (i) better disentanglement of object identity as a latent factor of variation; and (ii) better approximation of class imbalance in the data, as reflected in the learned parameters of the latent distribution.

Recent deep neural network based models such as Generative Adversarial Networks (Goodfellow et al., 2014; Salimans et al., 2016; Radford et al., 2016) and Variational Autoencoders (Kingma & Welling, 2014; Higgins et al., 2017) have led to promising results in generating realistic samples for high-dimensional and complex data such as images. More advanced models show how to discover disentangled representations (Yan et al., 2016; Chen et al., 2016; Tran et al., 2017; Hu et al., 2018; Singh et al., 2019), in which different latent dimensions can be made to represent independent factors of variation (e.g., pose, identity) in the data (e.g., human faces).

InfoGAN (Chen et al., 2016), in particular, tries to learn an unsupervised disentangled representation by maximizing the mutual information between the discrete or continuous latent variables and the corresponding generated samples. For discrete latent factors (e.g., digit identities), it assumes that they are uniformly distributed in the data, and approximates them accordingly using a fixed uniform categorical distribution. Although this assumption holds true for many existing benchmark datasets (e.g., MNIST; LeCun, 1998), real-world data often follows a long-tailed distribution and rarely exhibits perfect balance between the categories.
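
The gap between a fixed uniform categorical prior and a long-tailed class distribution can be made concrete with a small sketch. This is only an illustration, not the authors' method: the 10-class long-tailed frequencies, the helper names `sample_codes` and `kl_divergence`, and the sample size are all assumptions chosen for the example.

```python
import math
import random
from collections import Counter


def sample_codes(probs, n, seed=0):
    """Draw n discrete latent codes from a categorical distribution."""
    rng = random.Random(seed)
    return rng.choices(range(len(probs)), weights=probs, k=n)


def kl_divergence(p, q):
    """KL(p || q) in nats for two categorical distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical long-tailed class frequencies for a 10-class dataset
# (illustrative values, not taken from the paper).
raw = [1.0 / (k + 1) for k in range(10)]
total = sum(raw)
imbalanced = [r / total for r in raw]

# Fixed uniform categorical prior over the discrete latent code.
uniform = [1.0 / 10] * 10

# Codes sampled from the uniform prior occur at roughly equal rates,
# regardless of how the classes are actually distributed in the data.
codes = sample_codes(uniform, 10_000)
counts = Counter(codes)
empirical = [counts[k] / len(codes) for k in range(10)]

# A positive KL divergence quantifies the mismatch between the assumed
# uniform prior and the (hypothetical) true class distribution.
mismatch = kl_divergence(imbalanced, uniform)
print("empirical code frequencies:", [round(e, 3) for e in empirical])
print("KL(imbalanced || uniform) in nats:", round(mismatch, 4))
```

When the prior is fixed to `uniform`, the sampled codes cannot reflect the long tail in `imbalanced`; a prior with learnable parameters is one way to let the two distributions agree.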
Oct-1-2019