LSZone: A Lightweight Spatial Information Modeling Architecture for Real-time In-car Multi-zone Speech Separation

Chen, Jun, Hu, Shichao, Lin, Jiuxin, Li, Wenjie, Zhang, Zihan, Li, Xingchen, Liu, JinJiang, Xiao, Longshuai, Weng, Chao, Xie, Lei, Wu, Zhiyong

Oct-14-2025–arXiv.org Artificial Intelligence

In-car multi-zone speech separation, which captures the voice from each speech zone in the cabin, plays a crucial role in human-vehicle interaction. Although the earlier SpatialNet model achieves notable results, its high computational cost still hinders real-time deployment in vehicles. To address this, this paper proposes LSZone, a lightweight spatial information modeling architecture for real-time in-car multi-zone speech separation. We design a spatial information extraction-compression (SpaIEC) module that combines the Mel spectrogram with the interaural phase difference (IPD) to reduce the computational burden while maintaining performance. Additionally, to model spatial information efficiently, we introduce an extremely lightweight Conv-GRU crossband-narrowband processing (CNP) module. Experimental results demonstrate that LSZone, with a complexity of 0.56G MACs and a real-time factor (RTF) of 0.37, delivers strong performance in complex noise and multi-speaker scenarios.
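As a rough illustration of two quantities the abstract names, the sketch below computes an IPD between the same STFT bin of two microphone channels and a real-time factor. It is not the paper's SpaIEC implementation: the function names, the (cos, sin) encoding of the IPD, and the toy inputs are all assumptions made for illustration. The RTF definition used here (processing time divided by audio duration) is the usual one, under which a value below 1 means the system runs faster than real time.

```python
import cmath
import math


def ipd(ref_bin: complex, other_bin: complex) -> float:
    """Phase difference in radians between two channels at one STFT bin.

    Multiplying by the conjugate subtracts the phases, so the result is
    phase(ref_bin) - phase(other_bin), wrapped to [-pi, pi].
    """
    return cmath.phase(ref_bin * other_bin.conjugate())


def ipd_features(ref_frame, other_frame):
    """Return one (cos, sin) pair per frequency bin for a single frame.

    Encoding the angle as cos/sin avoids the discontinuity at +/-pi.
    This encoding is an assumption, not taken from the paper.
    """
    feats = []
    for a, b in zip(ref_frame, other_frame):
        d = ipd(a, b)
        feats.append((math.cos(d), math.sin(d)))
    return feats


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by the duration of the audio processed."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Toy example: two bins whose phases differ by 0.3 rad (hypothetical values).
mic0 = [cmath.rect(1.0, 0.5)]
mic1 = [cmath.rect(2.0, 0.2)]
features = ipd_features(mic0, mic1)

# An RTF of 0.37 corresponds to 0.37 s of compute per 1 s of audio.
rtf = real_time_factor(0.37, 1.0)
```

Note that the IPD depends only on the phases of the two bins, not on their magnitudes, which is why it is typically paired with a magnitude-based feature such as the Mel spectrogram.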

machine learning, natural language, real time system

Country:
- Asia > China
  - Guangdong Province > Shenzhen (0.04)
  - Shanghai > Shanghai (0.04)
  - Shaanxi Province > Xi'an (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology
  - Architecture > Real Time Systems (1.00)
  - Artificial Intelligence
    - Speech (1.00)
    - Representation & Reasoning > Spatial Reasoning (1.00)
    - Natural Language (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.47)
