Exploiting Cross-Modal Prediction and Relation Consistency for Semi-Supervised Image Captioning
