Variational Autoencoder for Deep Learning of Images, Labels and Captions