Improving Image Captioning by Mimicking Human Reformulation Feedback at Inference-time