Improving Image Captioning by Mimicking Human Reformulation Feedback at Inference-time
