Learning Latent Representations for Speech Generation and Transformation

Sep-22-2017–arXiv.org Machine Learning

An ability to model a generative process and learn a latent representation for speech in an unsupervised fashion will be crucial to process vast quantities of unlabelled speech data. Recently, deep probabilistic generative models such as Variational Autoencoders (VAEs) have achieved tremendous success in modeling natural images. In this paper, we apply a convolutional VAE to model the generative process of natural speech. We derive latent space arithmetic operations to disentangle learned latent representations. We demonstrate the capability of our model to modify the phonetic content or the speaker identity for speech segments using the derived operations, without the need for parallel supervisory data.

artificial intelligence, machine learning, representation, (15 more...)

arXiv.org Machine Learning

Sep-22-2017

arXiv.org PDF

Add feedback

Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found