Multimodal Generative Models for Scalable Weakly-Supervised Learning