End-to-End Binaural Speech Synthesis

Huang, Wen Chin; Markovic, Dejan; Richard, Alexander; Gebru, Israel Dejene; Menon, Anjali

Jul-8-2022–arXiv.org Artificial Intelligence

In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a binaural decoder capable of accurate speech binauralization while faithfully reconstructing environmental factors such as ambient noise and reverberation. The network is a modified vector-quantized variational autoencoder trained with several carefully designed objectives, including an adversarial loss. We evaluate the proposed system on an internal binaural dataset using objective metrics and a perceptual study. Results show that the proposed approach matches the ground-truth data more closely than previous methods. In particular, we demonstrate that the adversarial loss can capture the environmental effects needed to create an authentic auditory scene.

Country:
- North America > United States (0.04)
- Asia
  - Middle East > Israel (0.04)
  - Japan (0.04)

Genre:
- Research Report (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.88)
  - Speech > Speech Synthesis (0.62)
