DialogCC: Large-Scale Multi-Modal Dialogue Dataset

Lee, Young-Jun, Ko, Byungsoo, Kim, Han-Gyu, Choi, Ho-Jin

arXiv.org Artificial Intelligence 

As sharing images in instant messages is a crucial part of communication, there has been active research on learning image-text multi-modal dialogue models. However, training a well-generalized multi-modal dialogue model remains challenging because existing multi-modal dialogue datasets contain few dialogues, cover limited topics, and offer a restricted variety of images per dialogue. In this paper, we present a multi-modal dialogue dataset creation pipeline that matches large-scale image collections to dialogues based on CLIP similarity. Using this automatic pipeline, we propose a large-scale multi-modal dialogue dataset, DialogCC, which covers diverse real-world topics and provides various images per dialogue. With extensive experiments, we demonstrate that training a multi-modal dialogue model on our dataset improves generalization performance. Additionally, existing models trained on our dataset achieve state-of-the-art performance on image and text retrieval tasks. The source code and the dataset will be released after publication.
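To make the CLIP-similarity matching step concrete, the following is a minimal sketch of how an utterance could be matched against candidate images using an open-source CLIP checkpoint. This is illustrative only: the model name, similarity threshold, and function signature are assumptions, not the authors' exact pipeline.

```python
# Illustrative sketch of CLIP-based image-dialogue matching.
# The checkpoint, threshold, and helper names are assumptions,
# not the DialogCC authors' actual implementation.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def match_images_to_utterance(utterance, image_paths, threshold=0.25):
    """Return (path, score) pairs for images whose CLIP similarity
    to the utterance exceeds the threshold."""
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[utterance], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        text_emb = model.get_text_features(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"])
        image_emb = model.get_image_features(
            pixel_values=inputs["pixel_values"])
    # Cosine similarity between the utterance and each candidate image.
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
    image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
    sims = (image_emb @ text_emb.T).squeeze(-1)
    return [(p, s.item()) for p, s in zip(image_paths, sims)
            if s.item() >= threshold]
```

Running this per utterance over a large image pool would yield multiple candidate images per dialogue turn, which is consistent with the paper's claim of providing various images per dialogue.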
