Building a Taiwanese Mandarin Spoken Language Model: A First Attempt

Yang, Chih-Kai, Fu, Yu-Kuan, Li, Chen-An, Lin, Yi-Cheng, Lin, Yu-Xiang, Chen, Wei-Chih, Chung, Ho Lam, Kuan, Chun-Yi, Huang, Wei-Ping, Lu, Ke-Han, Lin, Tzu-Quan, Wang, Hsiu-Hsuan, Hu, En-Pei, Hsu, Chan-Jan, Tseng, Liang-Hsuan, Chiu, I-Hsiang, Sanga, Ulin, Chen, Xuanjun, Hsu, Po-chun, Yang, Shu-wen, Lee, Hung-yi

arXiv.org Artificial Intelligence 

This technical report presents our initial attempt to build a spoken large language model (LLM) for Taiwanese Mandarin, specifically tailored to enable real-time, speech-to-speech interaction in multi-turn conversations. Our end-to-end model incorporates a decoder-only transformer architecture and aims to achieve seamless interaction while preserving the conversational flow, including full-duplex capabilities allowing simultaneous speaking and listening. The paper also details the training process, including data preparation with synthesized dialogues and adjustments for real-time interaction. We also developed a platform to evaluate conversational fluency and response coherence in multi-turn dialogues. We hope the release of the report can contribute to the future development of spoken LLMs in Taiwanese Mandarin.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found