Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space

Liwei Wang, Alexander Schwing, Svetlana Lazebnik

Neural Information Processing Systems 

This paper explores image caption generation using conditional variational auto-encoders (CVAEs).