Reviews: Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space

May-28-2025, 01:05:04 GMT–Neural Information Processing Systems

This paper investigates the task of image-conditioned caption generation using deep generative models. Whereas existing methods use a pure LSTM pipeline, the proposed approach augments the representation with an additional data-dependent latent variable. The paper formulates the problem in the variational auto-encoder (VAE) framework and maximizes the variational lower bound as the training objective. A data-dependent additive Gaussian prior is introduced to address the limited representation power of VAEs when they are applied to caption generation. Empirical results demonstrate that the proposed method generates more diverse and accurate sentences than a pure LSTM baseline.
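To make the objective concrete, here is a minimal sketch of a variational lower bound with a data-dependent additive Gaussian prior. The summary does not give the prior's exact form. This sketch assumes that the prior mean is a weighted sum of per-category mean vectors, with weights `c` derived from the image content, and that the prior has a fixed isotropic standard deviation. All function and parameter names (`additive_prior_mean`, `cluster_means`, `sigma_prior`, and so on) are hypothetical. The caption log-likelihood term, which an LSTM decoder would supply, is passed in as a number rather than computed.

```python
import numpy as np

def additive_prior_mean(c, cluster_means):
    """Assumed prior mean: sum_k c_k * mu_k with c normalized to sum to 1.

    c: (K,) nonnegative weights over K hypothetical image categories.
    cluster_means: (K, D) one mean vector per category.
    """
    c = np.asarray(c, dtype=float)
    c = c / c.sum()
    return c @ cluster_means  # shape (D,)

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """Closed-form KL(N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2)))."""
    return np.sum(
        np.log(sigma_p / sigma_q)
        + (sigma_q**2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p**2)
        - 0.5
    )

def elbo(log_likelihood, mu_q, sigma_q, c, cluster_means, sigma_prior=1.0):
    """Lower bound: E_q[log p(x | z, c)] - KL(q(z | x, c) || p(z | c)).

    log_likelihood: a Monte Carlo estimate of E_q[log p(x | z, c)], e.g. the
    decoder's log-probability of the caption given a sampled z (not computed here).
    """
    mu_p = additive_prior_mean(c, cluster_means)
    sigma_p = np.full_like(mu_q, sigma_prior, dtype=float)  # assumed isotropic prior
    return log_likelihood - kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p)

def sample_z(mu_q, sigma_q, rng):
    """Reparameterized sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu_q + sigma_q * rng.standard_normal(mu_q.shape)
```

For example, with two categories whose means are `[1, 0, 0]` and `[0, 1, 0]` and equal weights `c = [1, 1]`, the assumed prior mean is `[0.5, 0.5, 0]`. A posterior with that same mean and unit standard deviation has zero KL, so the bound reduces to the log-likelihood term. The prior's dependence on `c` lets different images pull the latent code toward different regions of the space, and this is the sense in which such a prior is "data-dependent".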

additive gaussian encoding space, diverse and accurate image description, variational auto-encoder
