Image Caption with Global-Local Attention
Li, Linghui (Key Lab of Intelligent Information Processing of Chinese Academy of Sciences) | Tang, Sheng (Key Lab of Intelligent Information Processing of Chinese Academy of Sciences) | Deng, Lixi (Key Lab of Intelligent Information Processing of Chinese Academy of Sciences) | Zhang, Yongdong (Key Lab of Intelligent Information Processing of Chinese Academy of Sciences) | Tian, Qi (University of Texas at San Antonio)
Image captioning is becoming increasingly important in the field of artificial intelligence. Most existing methods based on the CNN-RNN framework suffer from missing or mispredicted objects because they rely solely on a global, image-level representation. To address these problems, in this paper we propose a global-local attention (GLA) method that integrates local, object-level representations with the global, image-level representation through an attention mechanism. The proposed method can thus focus on predicting salient objects more precisely, with high recall, while concurrently retaining image-level context information. As a result, our GLA method generates more relevant sentences and achieves state-of-the-art performance on the well-known Microsoft COCO caption dataset across several popular metrics.
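The abstract does not spell out the exact fusion the authors use, so the following is only a rough PyTorch sketch of the general idea: object-level local features and an image-level global feature are scored against the decoder state and combined into a single attended context vector. All shapes, module names (`GlobalLocalAttention`, `feat_proj`, `hidden_proj`, `score`), and the specific additive scoring form are assumptions for illustration, not the paper's definitive architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalAttention(nn.Module):
    """Sketch: attend jointly over local object-level features and a global
    image-level feature, conditioned on the caption decoder's hidden state."""

    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)      # project image features
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # project decoder state
        self.score = nn.Linear(attn_dim, 1)                 # scalar attention score

    def forward(self, global_feat, local_feats, hidden):
        # global_feat: (B, feat_dim)     image-level CNN feature
        # local_feats: (B, K, feat_dim)  object-level features (e.g. from detections)
        # hidden:      (B, hidden_dim)   current decoder (RNN) hidden state
        feats = torch.cat([global_feat.unsqueeze(1), local_feats], dim=1)  # (B, K+1, D)
        scores = self.score(torch.tanh(
            self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))  # (B, K+1, 1)
        alpha = F.softmax(scores, dim=1)            # attention weights over global+local
        context = (alpha * feats).sum(dim=1)        # (B, feat_dim) fused context vector
        return context, alpha.squeeze(-1)
```

In a CNN-RNN captioner of this kind, the fused context would typically be fed (together with the previous word embedding) into the RNN decoder at each time step to predict the next word.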
Feb-14-2017