Compositional Generalization in Image Captioning

Nikolaus, Mitja, Abdou, Mostafa, Lamm, Matthew, Aralikatte, Rahul, Elliott, Desmond

Sep-16-2019–arXiv.org Machine Learning

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts. We study the problem of compositional generalization, which measures how well a model composes unseen combinations of concepts when describing images. State-of-the-art image captioning models show poor generalization performance on this task. We propose a multi-task model to address the poor performance, that combines caption generation and image--sentence ranking, and uses a decoding mechanism that re-ranks the captions according their similarity to the image. This model is substantially better at generalizing to unseen combinations of concepts compared to state-of-the-art captioning models.

caption, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

Sep-16-2019

arXiv.org PDF

Add feedback

Country:
- Europe (1.00)
- North America > United States (0.46)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.70)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found