Reviews: Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
–Neural Information Processing Systems
This paper investigates image-conditioned caption generation using deep generative models. Compared with existing methods that rely on a pure LSTM pipeline, the proposed approach augments the representation with an additional data-dependent latent variable. The paper formulates the problem under the variational auto-encoder (VAE) framework, maximizing the variational lower bound as the training objective. A data-dependent additive Gaussian prior is introduced to address the limited representation power of VAEs when applied to caption generation. Empirical results demonstrate that the proposed method generates more diverse and accurate sentences than a pure LSTM baseline.
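To make the summary concrete, here is a minimal sketch of how a data-dependent additive Gaussian prior could enter the variational lower bound. It is an illustration under stated assumptions, not the authors' implementation: the review does not say how the prior is built, so the sketch assumes the prior mean is a weighted sum of per-component means with a shared isotropic standard deviation. All function names, the weights, and `sigma` are hypothetical.

```python
import math


def additive_gaussian_prior(weights, component_means, sigma):
    """Build a diagonal Gaussian prior (hypothetical construction).

    Assumption: the prior mean is the weighted sum of the component means,
    and every dimension shares the same standard deviation `sigma`.
    """
    dim = len(component_means[0])
    mean = [
        sum(w * m[d] for w, m in zip(weights, component_means))
        for d in range(dim)
    ]
    std = [sigma] * dim
    return mean, std


def kl_diag_gaussians(mu_q, std_q, mu_p, std_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, std_q, mu_p, std_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def negative_lower_bound(recon_log_likelihood, kl):
    """Training loss: negative variational lower bound (reconstruction minus KL)."""
    return -(recon_log_likelihood - kl)


# Example: two hypothetical components with equal weight.
prior_mean, prior_std = additive_gaussian_prior(
    weights=[0.5, 0.5],
    component_means=[[0.0, 2.0], [2.0, 0.0]],
    sigma=1.0,
)
kl = kl_diag_gaussians([0.0, 0.0], [1.0, 1.0], prior_mean, prior_std)
loss = negative_lower_bound(recon_log_likelihood=-3.0, kl=kl)
```

With a fixed standard-normal prior the KL term pulls every caption's latent code toward the same point; making the prior mean depend on the image is one way to give different images different regions of the latent space, which is the motivation the review attributes to the paper.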
May-28-2025, 01:05:04 GMT