Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning
Guo, Dandan, Lu, Ruiying, Chen, Bo, Zeng, Zequn, Zhou, Mingyuan
Describing visual content in a natural-language utterance is an emerging interdisciplinary problem at the intersection of computer vision (CV) and natural language processing (NLP) ((1)). Because a sentence-level short image caption ((2, 3, 4)) has limited descriptive capacity, (5) introduce a paragraph-level captioning method that generates a detailed and coherent paragraph to describe an image in a finer manner. Recent advances in image paragraph generation focus on building different types of hierarchical recurrent neural networks (HRNNs), e.g., based on LSTMs ((6)), to generate the visual paragraphs. In an HRNN, the high-level RNN recursively produces a sequence of sentence-level topic vectors given the image features as input, while the low-level RNN subsequently decodes each topic vector into an output sentence. By modeling each sentence and coupling the sentences into one paragraph, these hierarchical architectures often outperform flat models ((5)). To improve performance and generate more diverse paragraphs, (9) and (10) propose advanced methods that extend the HRNN with a generative adversarial network (GAN) ((7)) or variational auto-encoders (VAEs) ((8)).
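The two-level decoding scheme described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' model: all layer sizes, weight initializations, and the greedy word decoder are assumptions chosen for brevity, and plain tanh RNN cells stand in for the LSTMs used in the literature.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(shape):
    # Small random weights; purely illustrative, no training is performed.
    return rng.standard_normal(shape) * 0.1

class HierarchicalCaptioner:
    """Sketch of an HRNN paragraph captioner: a high-level (sentence) RNN
    maps an image feature to a sequence of topic vectors, and a low-level
    (word) RNN decodes each topic vector into a token sequence.
    Dimensions and cell types are hypothetical stand-ins."""

    def __init__(self, d_img=16, d_topic=8, d_word=8, vocab=20):
        self.Wt = init((d_topic, d_img + d_topic))  # sentence-level recurrence
        self.Ww = init((d_word, d_topic + d_word))  # word-level recurrence
        self.Wo = init((vocab, d_word))             # word-logit projection
        self.d_topic, self.d_word = d_topic, d_word

    def generate(self, img_feat, n_sents=3, sent_len=5):
        paragraph = []
        h = np.zeros(self.d_topic)
        for _ in range(n_sents):
            # High-level RNN: recursively produce one topic vector per
            # sentence, conditioned on the image features at every step.
            h = np.tanh(self.Wt @ np.concatenate([img_feat, h]))
            sent, g = [], np.zeros(self.d_word)
            for _ in range(sent_len):
                # Low-level RNN: decode the topic vector into words
                # (greedy argmax decoding for simplicity).
                g = np.tanh(self.Ww @ np.concatenate([h, g]))
                sent.append(int(np.argmax(self.Wo @ g)))
            paragraph.append(sent)
        return paragraph

model = HierarchicalCaptioner()
paragraph = model.generate(rng.standard_normal(16))
```

With untrained weights the output tokens are arbitrary; the point is the control flow, i.e., one topic vector per sentence and one word sequence per topic, which is the structure shared by the HRNN variants cited above.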
May-10-2021