From Association to Generation: Text-only Captioning by Unsupervised Cross-modal Mapping

Open in new window