Reviews: Contrastive Learning for Image Captioning
–Neural Information Processing Systems
The paper proposed a contrastive learning approach for image captioning models. Typical image captioning models utilize log-likelihood criteria for learning, which tends to result in preferring a safer generation that lacks specific and distinct concept in an image. The paper proposes to introduce contrastive learning objective, where the objective function is based on density ratio to the reference, without altering the captioning models. The paper evaluates multiple models in MSCOCO and InstaPIC datasets, and demonstrates the effectiveness as well as conducts ablation studies. The paper is well-written and has strength in the following points.
Neural Information Processing Systems
Oct-7-2024, 20:28:14 GMT