Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation

Park, Se Jin; Kim, Chae Won; Rha, Hyeongseop; Kim, Minsu; Hong, Joanna; Yeo, Jeong Hun; Ro, Yong Man

Jun-12-2024–arXiv.org Artificial Intelligence

In this paper, we introduce a novel Face-to-Face spoken dialogue model. It processes audio-visual speech from user input and generates audio-visual speech as the response, marking an initial step towards an avatar chatbot system that does not rely on intermediate text. To this end, we introduce MultiDialog, the first large-scale multimodal (i.e., audio and visual) spoken dialogue corpus, containing 340 hours of recordings across approximately 9,000 dialogues, recorded based on the open-domain dialogue dataset TopicalChat. MultiDialog contains parallel audio-visual recordings of conversation partners acting according to the given script, with emotion annotations, which we expect to open up research opportunities in multimodal synthesis. Our Face-to-Face spoken dialogue model incorporates a textually pretrained large language model and adapts it to the audio-visual spoken dialogue domain through speech-text joint pretraining. Through extensive experiments, we validate the effectiveness of our model in facilitating face-to-face conversation. Demo and data are available at https://multidialog.github.io and https://huggingface.co/datasets/IVLLab/MultiDialog, respectively.

Country:
- North America > United States
  - California > San Diego County > San Diego (0.04)
- Asia
  - Pakistan (0.04)
  - Kazakhstan (0.04)
  - Indonesia (0.04)
  - India (0.04)
  - Bangladesh (0.04)

Genre:
- Research Report (0.64)

Industry:
- Media > Television (0.67)
- Leisure & Entertainment > Sports
  - Football (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Discourse & Dialogue (1.00)
  - Vision > Face Recognition (0.93)
