Latent-Domain Predictive Neural Speech Coding
Xue Jiang, Xiulian Peng, Huaying Xue, Yuan Zhang, Yan Lu
This article has been accepted for publication in IEEE/ACM Transactions on Audio, Speech and Language Processing. This is the author's version, which has not been fully edited; content may change prior to final publication.

Abstract: Neural audio/speech coding has recently demonstrated its capability to deliver high quality at much lower bitrates than traditional methods. However, existing neural audio/speech codecs employ either acoustic features or learned blind features with a convolutional neural network for encoding, which leaves temporal redundancies within the encoded features. This paper introduces latent-domain predictive coding for low-latency neural speech coding, yielding the proposed TF-Codec. Specifically, the extracted features are encoded conditioned on a prediction from past quantized latent frames, so that temporal correlations are further removed. Moreover, we introduce a learnable compression on the time-frequency input to adaptively adjust the attention paid to main frequencies and details at different bitrates. A differentiable vector quantization scheme based on distance-to-soft mapping and Gumbel-Softmax is proposed to better model the latent distributions under a rate constraint. Subjective results on multilingual speech datasets show that, with low latency, the proposed TF-Codec at 1 kbps achieves significantly better quality than Opus at 9 kbps, and TF-Codec at 3 kbps outperforms EVS at 9.6 kbps. Numerous studies demonstrate the effectiveness of these techniques.
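To make the quantization idea concrete, here is a minimal sketch of a distance-to-soft mapping with Gumbel-Softmax over a small codebook, in plain Python. This is an illustrative assumption of how such a scheme can work, not the paper's actual implementation: the function name `gumbel_softmax_vq`, the temperature value, and the toy codebook are all hypothetical, and a real codec would run this over learned latent frames inside a training loop.

```python
import math
import random

def gumbel_softmax_vq(z, codebook, tau=1.0, noise=True, rng=random):
    """Distance-to-soft mapping with Gumbel-Softmax over a codebook (sketch).

    z: latent frame as a list of floats; codebook: list of codevectors.
    Returns (probs, soft): soft assignment probabilities over the codebook
    and the probability-weighted mixture of codevectors.
    """
    # Negative squared Euclidean distance to each codevector, scaled by the
    # temperature tau, serves as the logit: closer codes get larger logits.
    logits = [-sum((zi - ci) ** 2 for zi, ci in zip(z, c)) / tau
              for c in codebook]
    if noise:
        # Add Gumbel(0, 1) noise so sampling a code stays differentiable
        # through the softmax (the reparameterization trick).
        logits = [l - math.log(-math.log(rng.random() + 1e-20) + 1e-20)
                  for l in logits]
    # Numerically stable softmax turns logits into soft assignments.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Soft code: probability-weighted sum of codevectors. As tau -> 0 this
    # collapses onto the nearest codevector, recovering hard VQ.
    dim = len(z)
    soft = [sum(p * c[d] for p, c in zip(probs, codebook)) for d in range(dim)]
    return probs, soft
```

With a small temperature and noise disabled, the soft assignment concentrates on the nearest codevector, which is what makes the scheme a differentiable relaxation of ordinary vector quantization.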
arXiv.org Artificial Intelligence
May-25-2023
- Country:
- Asia (0.28)
- North America (0.28)
- Genre:
- Research Report (0.64)
- Industry:
- Telecommunications (0.46)