BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing

Kawamura, Masaya, Hasumi, Takuya, Shirahata, Yuma, Yamamoto, Ryuichi

Jun-5-2025–arXiv.org Artificial Intelligence

This paper proposes a highly compact, lightweight text-to-speech (TTS) model for on-device applications. To reduce the model size, the proposed model introduces two techniques. First, we introduce quantization-aware training (QA T), which quantizes model parameters during training to as low as 1.58-bit. In this case, most of 32-bit model parameters are quantized to ternary values {-1, 0, 1 } . Second, we propose a method named weight indexing. In this method, we save a group of 1.58-bit weights as a single int8 index. This allows for efficient storage of model parameters, even on hardware that treats values in units of 8-bit. Experimental results demonstrate that the proposed method achieved 83 % reduction in model size, while outperforming the baseline of similar model size without quantization in synthesis quality.

machine learning, natural language, quantization, (17 more...)

arXiv.org Artificial Intelligence

Jun-5-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Speech > Speech Synthesis (0.73)
  - Machine Learning > Neural Networks
    - Deep Learning (0.94)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found