Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission

Oh, Seungeun, Kim, Jinhyuk, Park, Jihong, Ko, Seung-Woo, Choi, Jinho, Quek, Tony Q. S., Kim, Seong-Lyun

May-20-2025–arXiv.org Artificial Intelligence

To support emerging language-based applications using dispersed and heterogeneous computing resources, the hybrid language model (HLM) offers a promising architecture, where an on-device small language model (SLM) generates draft tokens that are validated and corrected by a remote large language model (LLM). However, the original HLM suffers from substantial communication overhead, as the LLM requires the SLM to upload the full vocabulary distribution for each token. Moreover, both communication and computation resources are wasted when the LLM validates tokens that are highly likely to be accepted. To overcome these limitations, we propose communication-efficient and uncertainty-aware HLM (CU-HLM). In CU-HLM, the SLM transmits truncated vocabulary distributions only when its output uncertainty is high. Furthermore, we theoretically derive optimal uncertainty thresholds and optimal vocabulary truncation strategies. Simulation results show that, compared to standard HLM, CU-HLM achieves up to 206× higher token throughput by skipping 74.8% of transmissions with 97.4% vocabulary compression, while maintaining 97.4% accuracy.

Large language models (LLMs), with their massive parameter counts and rich training data, have demonstrated remarkable emergent capabilities [1]. These capabilities span a wide range of applications, including open-domain question answering, code generation, commonsense reasoning, and even robotic control [2]-[6]. To seamlessly adopt LLMs into a wireless edge environment, the hybrid language model (HLM) framework [7] has emerged, which physically splits the inference task between an on-device small language model (SLM) and a remote LLM.
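The device-side rule described in the abstract (upload a truncated vocabulary distribution only when the SLM's output uncertainty is high) can be illustrated with a minimal sketch. Several choices here are assumptions made for illustration and are not taken from the paper: uncertainty is measured as Shannon entropy, truncation keeps the top-k tokens and renormalizes, and the names `entropy`, `truncate_top_k`, `slm_step`, `threshold`, and `k` are hypothetical. The paper derives optimal thresholds and truncation strategies theoretically; the values used below are placeholders, not those results.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution.

    Assumption: entropy stands in for the SLM's uncertainty measure.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def truncate_top_k(probs, k):
    """Keep the k most probable token ids and renormalize them to sum to 1."""
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}


def slm_step(probs, threshold, k):
    """Decide what the device uploads for one draft token.

    Returns None when uncertainty is at or below the threshold (upload skipped),
    otherwise the truncated distribution as {token_id: probability}.
    """
    if entropy(probs) <= threshold:
        return None
    return truncate_top_k(probs, k)


# Toy 4-token vocabulary; threshold and k are placeholder values.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.4, 0.3, 0.2, 0.1]

skipped = slm_step(confident, threshold=0.5, k=2)   # low entropy: nothing is sent
payload = slm_step(uncertain, threshold=0.5, k=2)   # high entropy: top-2 tokens are sent
```

In this toy setup the confident draft is kept locally without contacting the LLM, while the uncertain draft is uploaded with only two of the four vocabulary entries, which is the kind of skipping and compression the abstract's throughput figures refer to.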
