A General Framework for Learning Procedural Audio Models of Environmental Sounds

Danzel Serrano, Mark Cartwright

arXiv.org Artificial Intelligence 

This paper introduces the Procedural (audio) Variational autoEncoder (ProVE) framework as a general approach to learning Procedural Audio (PA) models of environmental sounds, improving the realism of the synthesis while maintaining control over the generated sound through adjustable parameters. The framework comprises two stages: (i) Audio Class Representation, in which a latent representation space is defined by training an audio autoencoder, and (ii) Control Mapping, in which a joint function of static/temporal control variables derived from the audio and a random sample of uniform noise is learned to replace the audio encoder. We demonstrate the use of ProVE through the example of footstep sound effects on various surfaces. Our results show that ProVE models outperform both classical PA models and an adversarial-based approach in terms of sound fidelity, as measured by Fréchet Audio Distance (FAD), Maximum Mean Discrepancy (MMD), and subjective evaluations, making them feasible tools for sound design.

This results in models which reduce storage limitations, have greater control and expressiveness, and have the capacity to create unique auditory experiences. PA models differ from physical modeling synthesis [3], which renders sounds through simulations in computational physics. These physics-based models can produce realistic and dynamic sounds, but are computationally expensive and require significant domain knowledge to develop. PA models, by contrast, are simpler and more computationally efficient, using algorithms to generate sound based on static and temporal control variables, usually accompanied by random noise to span variations in synthesis. Despite their advantage in computational efficiency, current classical PA models still synthesize sounds of lower quality than real samples or physical modeling synthesis -- a primary reason why they are not yet in standard use in sound design [2, 4]. The state-of-the-art for enhancing sound synthesis quality involves data-driven neural audio synthesis, the subset of deep learning techniques for generative audio.
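The two-stage structure described in the abstract can be sketched at a shape level. This is a minimal illustration, not the paper's implementation: the dimensions, the linear maps standing in for the trained networks, and the function names are all hypothetical, chosen only to show how the Control Mapping replaces the audio encoder at synthesis time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): an audio feature frame,
# a latent code, static/temporal control variables, and uniform noise.
AUDIO_DIM, LATENT_DIM, CTRL_DIM, NOISE_DIM = 256, 16, 8, 4

# Stage (i): Audio Class Representation -- an autoencoder defines the
# latent space. Random linear maps stand in for the trained networks.
W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, AUDIO_DIM))
W_dec = rng.normal(scale=0.1, size=(AUDIO_DIM, LATENT_DIM))

def encode(audio):            # audio -> latent (used only in training)
    return W_enc @ audio

def decode(latent):           # latent -> audio
    return W_dec @ latent

# Stage (ii): Control Mapping -- a joint function of control variables
# and uniform noise, learned to replace the audio encoder.
W_ctrl = rng.normal(scale=0.1, size=(LATENT_DIM, CTRL_DIM + NOISE_DIM))

def control_map(controls, noise):
    return W_ctrl @ np.concatenate([controls, noise])

# Synthesis needs no input audio: controls + noise -> latent -> audio.
controls = rng.normal(size=CTRL_DIM)            # e.g. surface type, gait
noise = rng.uniform(-1.0, 1.0, size=NOISE_DIM)  # spans variations
audio_out = decode(control_map(controls, noise))
print(audio_out.shape)  # (256,)
```

The key design point the sketch mirrors is that after training, the encoder is discarded: the decoder is driven entirely by controls and noise, which is what makes the model procedural.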
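One of the reported fidelity metrics, Maximum Mean Discrepancy (MMD), compares two sets of embeddings without assuming a parametric distribution. The following is a standalone sketch of the standard unbiased squared-MMD estimate with a Gaussian kernel; the embedding dimension, bandwidth, and data here are illustrative, and the paper's exact kernel and features may differ.

```python
import numpy as np

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of squared MMD with a Gaussian (RBF) kernel.

    X: (n, d) array, Y: (m, d) array -- e.g. per-clip embeddings of
    real vs. synthesized sounds. Lower values mean closer distributions.
    """
    def k(A, B):
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq_dists / (2.0 * sigma ** 2))

    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    # Exclude the diagonal so the within-set terms are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

rng = np.random.default_rng(0)
matched = mmd2_unbiased(rng.normal(size=(200, 8)),
                        rng.normal(size=(200, 8)))
shifted = mmd2_unbiased(rng.normal(size=(200, 8)),
                        rng.normal(loc=1.0, size=(200, 8)))
print(matched < shifted)  # matched distributions score lower
```

In an evaluation like the paper's, a synthesis model whose outputs yield a lower MMD against held-out real recordings is considered closer in distribution to the real sound class.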
