Large Speech Model Enabled Semantic Communication
Yun Tian, Zhijin Qin, Guocheng Lv, Ye Jin, Kaibin Huang, Zhu Han
–arXiv.org Artificial Intelligence
Abstract--Existing speech semantic communication systems, mainly built on Joint Source-Channel Coding (JSCC) architectures, have demonstrated impressive performance, but their effectiveness remains limited by model structures designed for particular tasks and datasets. Recent advances indicate that generative large models pre-trained on massive datasets exhibit exceptional performance across diverse downstream tasks with minimal fine-tuning. To exploit the rich semantic knowledge embedded in large models and enable adaptive transmission over lossy channels, we propose a Large Speech Model enabled Semantic Communication (LargeSC) system. Simultaneously achieving adaptive compression and robust transmission over lossy channels remains challenging, requiring trade-offs among compression efficiency, speech quality, and latency. In this work, we employ Mimi as the speech codec, converting speech into discrete tokens compatible with existing network architectures. We propose an adaptive controller module that enables adaptive transmission and in-band Unequal Error Protection (UEP), dynamically adjusting to both the speech content and the packet loss probability under bandwidth constraints. Additionally, we employ Low-Rank Adaptation (LoRA) to fine-tune the Moshi foundation model for generative recovery of lost speech tokens. Simulation results show that the proposed system supports bandwidths ranging from 550 bps to 2.06 kbps, outperforms conventional baselines in speech quality under high packet loss rates, and achieves an end-to-end latency of approximately 460 ms, demonstrating its potential for real-time deployment.

Driven by recent advances in Artificial Intelligence (AI) and the increasing demand for intelligent next-generation communication systems, semantic communication has attracted significant attention.

This work is supported by the National Key Research and Development Program of China under Grant No. 2023YFB2904300, the National Natural Science Foundation of China under Grant No. 62293484, and the Beijing Natural Science Foundation (F251001). Zhijin Qin is with the Department of Electronic Engineering, Tsinghua University, Beijing 100084, China, and with the State Key Laboratory of Space Network and Communications, Beijing 100084, China. Kaibin Huang is with the Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong SAR, China (email: huangkb@hku.hk). Z. Han is with the Department of Electrical and Computer Engineering, University of Houston, Houston, TX 77004 USA, and also with the Department of Computer Science and Engineering, Kyung Hee University, Seoul, South Korea 446-701 (email: hanzhu22@gmail.com).
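The adaptive transmission idea in the abstract can be sketched in a few lines: a residual codec such as Mimi emits a stack of tokens per frame, with coarser codebook levels carrying the most semantic content, so under a bit budget the sender keeps a coarse-to-fine prefix of each stack. The sketch below is a hypothetical illustration only; the frame rate, codebook count, and bits-per-token values are assumptions for the example, not the paper's exact configuration or bit allocation.

```python
# Hypothetical coarse-to-fine token selection under a bit budget, in the
# spirit of the adaptive transmission described in the abstract.
# Assumed (not from the paper): 12.5 frames/s, 8 codebook levels,
# 2048-entry codebooks (11 bits per token).

FRAME_RATE_HZ = 12.5   # assumed Mimi-like frame rate
BITS_PER_TOKEN = 11    # assumed: 2048-entry codebooks -> 11 bits/token

def levels_for_budget(budget_bps: float, max_levels: int = 8) -> int:
    """Number of residual codebook levels that fit within the bit budget.

    Coarser levels are kept first, since they carry the most semantic
    content (an in-band, UEP-style prioritization).
    """
    bits_per_level = FRAME_RATE_HZ * BITS_PER_TOKEN  # bps per codebook level
    return max(1, min(max_levels, int(budget_bps // bits_per_level)))

def select_tokens(frame_tokens: list[int], budget_bps: float) -> list[int]:
    """Keep only the coarse-to-fine prefix of one frame's token stack."""
    keep = levels_for_budget(budget_bps, max_levels=len(frame_tokens))
    return frame_tokens[:keep]
```

Under these assumed numbers, each codebook level costs 137.5 bps, so a 550 bps budget keeps four levels per frame while a 2.06 kbps budget keeps all eight; the receiver's generative model would then be responsible for reconstructing the finer levels that were never sent or were lost in transit.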
Dec-5-2025