Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations