Reviews: Image Captioning: Transforming Objects into Words

Jan-24-2025, 09:25:41 GMT–Neural Information Processing Systems

Summary - The proposed approach to image captioning extends two prior works, object-based Up-Down method of [2] and Transformer of [22] (already used for image captioning in [21]). Specifically, the authors integrate spatial relations between objects in the captioning Transformer model, proposing the Object Relation Transformer. The modification amounts to introducing an object relation module [9] into the encoding layer of the Transformer model. Tests of statistical significance show that the proposed model outperforms the standard Transformer in terms of CIDEr-D, BLEU-1 and ROUGE-L, while SPICE-attribute breakdown shows improvement for Relation and Count categories. Qualitative results include examples where Object Relation Transformer leads to more correct spatial Relation and Count predictions.

human evaluation, transformer, transforming object, (11 more...)

Neural Information Processing Systems

Jan-24-2025, 09:25:41 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Vision (1.00)