Reviews: Teaching Machines to Describe Images with Natural Language Feedback
–Neural Information Processing Systems
The paper presents an approach to automatic image captioning in which the model incorporates natural language feedback from humans, in addition to ground-truth captions, during training. The approach trains a phrase-based captioning model in two stages: the model is first trained with maximum likelihood (supervised learning) and then fine-tuned with reinforcement learning, where the reward is a weighted sum of BLEU scores computed with respect to the ground-truth captions and the feedback sentences provided by humans. The reward also includes phrase-level rewards derived from the human feedback. The model is trained and evaluated on the MSCOCO image captioning data. It is compared with a purely supervised learning (SL) model and with a model trained using reinforcement learning (RL) without any feedback.
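To make the sentence-level reward described above concrete, here is a minimal sketch of a weighted sum of BLEU scores against a ground-truth caption and a feedback sentence. It is an illustration under stated assumptions, not the paper's implementation: the function names, the weights `w_gt` and `w_fb`, the bigram cutoff, and the add-one smoothing are all hypothetical choices, and the phrase-level rewards are not modeled.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU (an assumption, not the paper's exact metric).

    Geometric mean of clipped n-gram precisions for n = 1..max_n, multiplied
    by a brevity penalty. Add-one smoothing keeps the score nonzero when a
    higher-order n-gram has no match.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    # Penalize candidates shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity * math.exp(log_precision)


def sentence_reward(sampled, ground_truth, feedback, w_gt=0.5, w_fb=0.5):
    """Weighted sum of BLEU against the ground truth and the feedback sentence.

    The weights w_gt and w_fb are hypothetical placeholders.
    """
    return (w_gt * simple_bleu(sampled, ground_truth)
            + w_fb * simple_bleu(sampled, feedback))
```

In a fine-tuning loop of this kind, a scalar such as `sentence_reward(sampled_caption, gt_caption, feedback_sentence)` would typically scale the log-likelihood gradient of the sampled caption; how the paper combines it with the phrase-level rewards is not specified in this review.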
Oct-8-2024, 03:48:46 GMT