Towards Controllable Audio Texture Morphing

Chitralekha Gupta, Purnima Kamath, Yize Wei, Zhuoyao Li, Suranga Nanayakkara, Lonce Wyse

arXiv.org Artificial Intelligence 

Moreover, linear interpolation between parameters may not result in perceptually linear interpolation between the sounds [1]. In this paper, we propose a data-driven approach to train a Generative Adversarial Network (GAN) conditioned on "soft-labels" distilled from the penultimate layer of an audio classifier trained on a target set of audio texture classes. We demonstrate that interpolation between such conditions or control vectors provides smooth morphing between the generated audio textures, and show similar or better audio texture morphing capability compared to state-of-the-art methods. The proposed approach results in a well-organized latent space that generates novel audio outputs while remaining consistent with the semantics of the conditioning parameters. This is a step towards a general data-driven approach to designing generative audio.

The goal of parametric audio texture synthesis is to generate novel sounds with descriptive parameters that match those of a target texture. McDermott et al. [4] developed a set of statistics based on a cochlear model to describe the perceptually relevant aspects of a given audio texture. Recent works [10, 5, 11] have adapted the seminal work on image style transfer [12] for audio texture synthesis, where hand-crafted statistics are replaced with Gram matrix statistics, computed as correlations between feature activations, to represent style. Though this method of audio style transfer produces interesting combinations of sounds, there is no control over semantic style or content features other than through the data examples provided.
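The interpolation between condition ("soft-label") vectors described above can be sketched as simple linear blending in the conditioning space. This is a minimal illustration, not the paper's implementation: the vector names and the three-class soft-label values below are hypothetical placeholders for classifier-distilled conditions.

```python
import numpy as np

def morph_conditions(c_a, c_b, num_steps):
    """Linearly interpolate between two condition vectors.

    Returns an array of shape (num_steps, dim) whose rows move
    from c_a (alpha = 0) to c_b (alpha = 1); each row would be fed
    to the conditional generator to produce one morph step.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * c_a + a * c_b for a in alphas])

# Hypothetical soft-labels distilled from a classifier's penultimate layer
c_water = np.array([0.9, 0.1, 0.0])
c_wind = np.array([0.1, 0.1, 0.8])

path = morph_conditions(c_water, c_wind, num_steps=5)
```

Here `path[0]` equals the "water" condition and `path[-1]` the "wind" condition, with intermediate rows providing the morph trajectory.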
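The Gram matrix statistic mentioned above, used in style-transfer-based texture synthesis to summarize correlations between feature activations, can be sketched as follows. This is a generic NumPy illustration with random placeholder features, not code from any of the cited works.

```python
import numpy as np

def gram_matrix(features):
    """Compute the Gram matrix of a feature map.

    features: array of shape (channels, time), e.g. activations of one
    network layer over an audio signal. The (i, j) entry is the inner
    product of channels i and j, normalized by the number of frames,
    capturing which feature pairs co-activate ("style" statistics).
    """
    channels, frames = features.shape
    return features @ features.T / frames

# Placeholder activations: 4 channels over 128 time frames
feats = np.random.default_rng(0).standard_normal((4, 128))
G = gram_matrix(feats)
```

Matching `G` between a synthesized signal and a target texture (typically via a squared-error loss summed over layers) is what drives style in these approaches; the spatial/temporal ordering of activations is discarded, which is why only texture-like statistics are controlled.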