Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
Arora, Siddhant, Tian, Jinchuan, Futami, Hayato, Shi, Jiatong, Kashiwagi, Yosuke, Tsunoo, Emiru, Watanabe, Shinji
arXiv.org Artificial Intelligence
Most end-to-end (E2E) spoken dialogue systems (SDS) rely on voice activity detection (VAD) for turn-taking, but VAD fails to distinguish between pauses and turn completions. Duplex SDS models address this by predicting output continuously, including silence tokens, thus removing the need for explicit VAD. However, they often have a complex dual-channel architecture and lag behind cascaded models in semantic reasoning. To overcome these challenges, we propose SCoT: a Streaming Chain-of-Thought (CoT) framework for Duplex SDS that alternates between processing fixed-duration user input and generating responses in a blockwise manner. Using frame-level alignments, we create intermediate targets (aligned user transcripts and system responses) for each block. Experiments show that our approach produces more coherent and interpretable responses than existing duplex methods while supporting lower-latency and overlapping interactions compared to turn-by-turn systems.
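The abstract does not specify how the per-block intermediate targets are built, so the following is only a minimal sketch of one way frame-level (here, word-level) alignments could be bucketed into fixed-duration blocks. The names `AlignedWord`, `build_block_targets`, and the `<sil>` placeholder are hypothetical, as is the choice to assign a word to the block in which it ends; none of these come from the paper.

```python
import math
from dataclasses import dataclass

SILENCE = "<sil>"  # hypothetical placeholder for a block with no words


@dataclass
class AlignedWord:
    text: str
    start: float  # seconds from the start of the dialogue
    end: float    # seconds from the start of the dialogue


def build_block_targets(words, block_sec, total_sec):
    """Group aligned words into fixed-duration blocks.

    Assumption: a word belongs to the block in which it ends, since it is
    only fully heard (or spoken) at that point. Empty blocks get SILENCE.
    """
    n_blocks = max(1, math.ceil(total_sec / block_sec))
    blocks = [[] for _ in range(n_blocks)]
    for w in words:
        idx = min(int(w.end // block_sec), n_blocks - 1)
        blocks[idx].append(w.text)
    return [" ".join(b) if b else SILENCE for b in blocks]


# Toy example: 3 seconds of dialogue split into 1-second blocks.
user = [
    AlignedWord("how", 0.1, 0.4),
    AlignedWord("are", 0.5, 0.9),
    AlignedWord("you", 1.0, 1.4),
]
system = [
    AlignedWord("fine", 1.6, 2.2),
    AlignedWord("thanks", 2.3, 2.8),
]

user_targets = build_block_targets(user, block_sec=1.0, total_sec=3.0)
system_targets = build_block_targets(system, block_sec=1.0, total_sec=3.0)

# Interleave per block: (aligned user transcript, system response).
block_targets = list(zip(user_targets, system_targets))
```

In this toy setup each block pairs what the user said during that block with what the system should say, which mirrors the alternating listen-then-respond pattern described above. The 1-second block length is chosen only for illustration.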
Oct-3-2025
- Country:
  - Asia > Japan
    - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
  - Europe > Spain
    - Basque Country > Biscay Province > Bilbao (0.04)
  - North America > United States
    - New Mexico > Bernalillo County
      - Albuquerque (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
- Genre:
  - Research Report (1.00)
- Industry:
  - Information Technology (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
    - Natural Language
      - Chatbot (1.00)
      - Discourse & Dialogue (1.00)
      - Large Language Model (1.00)
    - Speech > Speech Recognition (0.93)