Voice command generation using Progressive WaveGANs
Thomas Wiest, Nicholas Cummins, Alice Baird, Simone Hantke, Judith Dineley, Björn Schuller
Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions - preprocessing, Audio-to-Audio generation, skip connections and progressive structures - is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human likeness using the proposed extensions compared to the original WaveGAN approach.
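To make the reported effect size concrete, the following is a minimal sketch of how a Cohen's d value can be computed from two groups of listener ratings. It assumes the common pooled-standard-deviation form of Cohen's d; the abstract does not state which variant was used. The function name `cohens_d` and the example ratings are hypothetical and are not data from the listening tests.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d with a pooled standard deviation (assumed variant).

    group_a, group_b: sequences of numeric scores, each with at least
    two values so that the sample variance is defined.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    # Effect size: difference of means in units of the pooled standard deviation.
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical ratings, for illustration only.
extended_scores = [2, 4, 6]
baseline_scores = [1, 3, 5]
print(cohens_d(extended_scores, baseline_scores))  # 0.5
```

Under this definition, a value of 0.65 would mean that the mean human-likeness score with the proposed extensions lies about 0.65 pooled standard deviations above the mean score for the original WaveGAN.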
Mar-13-2019