QUADS: QUAntized Distillation Framework for Efficient Speech Language Understanding

Biswas, Subrata, Khan, Mohammad Nur Hossain, Islam, Bashima

Aug-18-2025–arXiv.org Artificial Intelligence

Spoken Language Understanding (SLU) systems must balance performance and efficiency, particularly in resource-constrained environments. Existing methods apply distillation and quantization separately, leading to suboptimal compression as distillation ignores quantization constraints. We propose QUADS, a unified framework that optimizes both through multi-stage training with a pre-tuned model, enhancing adaptability to low-bit regimes while maintaining accuracy. QUADS achieves 71.13% accuracy on SLURP and 99.20% on FSC, with only minor degradations of up to 5.56% compared to state-of-the-art models. Additionally, it reduces computational complexity by 60–73× (GMACs) and model size by 83–700×, demonstrating strong robustness under extreme quantization. These results establish QUADS as a highly efficient solution for real-world, resource-constrained SLU applications.
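
The abstract's central point, that distillation should account for quantization rather than ignore it, can be illustrated with a toy sketch. This is not the QUADS training procedure, whose details are not given here. It only shows, under assumed choices (uniform symmetric weight quantization, a temperature-softened KL distillation loss, a tiny linear classifier), how a distillation loss can be evaluated on a student whose weights have already been quantized. All function names, weights, and inputs below are hypothetical.

```python
import math

def fake_quantize(weights, bits):
    """Uniform symmetric quantization (assumed scheme, for illustration only):
    snap each weight to a signed integer grid, then map back to floats."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits")
    levels = 2 ** (bits - 1) - 1            # e.g. 7 positive levels for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def linear(weights, features, num_classes):
    """Tiny linear classifier; weights is a flat row-major list."""
    d = len(features)
    return [sum(weights[c * d + i] * features[i] for i in range(d))
            for c in range(num_classes)]

# Hypothetical toy data: one feature vector, two intent classes.
features = [0.5, -1.0, 2.0]
teacher_w = [0.9, -0.4, 0.3, -0.2, 0.8, -0.5]
student_w = [0.85, -0.35, 0.25, -0.15, 0.7, -0.45]

teacher_logits = linear(teacher_w, features, 2)

# Distillation loss of the full-precision student (quantization ignored).
loss_fp = distillation_loss(teacher_logits, linear(student_w, features, 2))

# Distillation loss of the same student after 4-bit weight quantization,
# i.e. the quantity a quantization-aware objective would actually see.
loss_q4 = distillation_loss(
    teacher_logits, linear(fake_quantize(student_w, 4), features, 2)
)

print(f"full-precision student loss: {loss_fp:.6f}")
print(f"4-bit student loss:          {loss_q4:.6f}")
```

Comparing the two printed losses shows the gap that opens when a student distilled at full precision is quantized afterwards. A joint scheme would minimize the second quantity directly instead of the first.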

artificial intelligence, natural language, quantization

Genre:
- Research Report (1.00)

Industry:
- Information Technology (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language (1.00)
