A General Framework for Learning Procedural Audio Models of Environmental Sounds
Danzel Serrano, Mark Cartwright
–arXiv.org Artificial Intelligence
This paper introduces the Procedural (audio) Variational autoEncoder (ProVE) framework as a general approach to learning Procedural Audio (PA) models of environmental sounds, improving the realism of the synthesis while maintaining control over the generated sound through adjustable parameters. The framework comprises two stages: (i) Audio Class Representation, in which a latent representation space is defined by training an audio autoencoder, and (ii) Control Mapping, in which a joint function of static/temporal control variables derived from the audio and a random sample of uniform noise is learned to replace the audio encoder. We demonstrate the use of ProVE through the example of footstep sound effects on various surfaces. Our results show that ProVE models outperform both classical PA models and an adversarial-based approach in terms of sound fidelity, as measured by Fréchet Audio Distance (FAD), Maximum Mean Discrepancy (MMD), and subjective evaluations, making them feasible tools for sound design.

This results in models which reduce storage limitations, have greater control and expressiveness, and have the capacity to create unique auditory experiences. PA models differ from physical modeling synthesis [3], which renders sounds through simulations in computational physics. These physics-based models can produce realistic and dynamic sounds, but are computationally expensive and require a significant amount of domain knowledge to develop. On the other hand, PA models are simpler and more computationally efficient, using algorithms to generate sound based on static and temporal control variables, usually accompanied by random noise to span variations in synthesis. Despite this advantage in computational efficiency, current classical PA models still synthesize sounds of lower quality compared to using real samples or physical modeling synthesis -- a primary reason why they are not yet in standard use in sound design [2, 4].

The state-of-the-art for enhancing sound synthesis quality involves data-driven neural audio synthesis, the subset of deep learning techniques for generative audio.
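The two-stage dataflow described above can be sketched as follows. This is a minimal illustration of the structure, not the paper's implementation: the dimensions are hypothetical, and plain linear maps stand in for the real neural encoder, decoder, and control-mapping networks. The key point it shows is the substitution at synthesis time, where the learned control mapping replaces the audio encoder so sound is generated from adjustable parameters and noise alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): audio frame length, latent size,
# number of control variables, and noise dimension.
AUDIO_DIM, LATENT_DIM, CTRL_DIM, NOISE_DIM = 1024, 64, 8, 16

# Stage (i): Audio Class Representation -- an autoencoder defines the latent
# space. Random linear maps stand in for the trained encoder/decoder.
W_enc = rng.standard_normal((AUDIO_DIM, LATENT_DIM)) / np.sqrt(AUDIO_DIM)
W_dec = rng.standard_normal((LATENT_DIM, AUDIO_DIM)) / np.sqrt(LATENT_DIM)

def encode(audio):
    """Audio -> latent representation z."""
    return audio @ W_enc

def decode(z):
    """Latent representation z -> synthesized audio."""
    return z @ W_dec

# Stage (ii): Control Mapping -- a joint function of static/temporal control
# variables and uniform noise, learned to replace the audio encoder.
W_map = rng.standard_normal((CTRL_DIM + NOISE_DIM, LATENT_DIM)) \
        / np.sqrt(CTRL_DIM + NOISE_DIM)

def control_map(controls, noise):
    """(control variables, noise sample) -> latent z, bypassing the encoder."""
    return np.concatenate([controls, noise]) @ W_map

# Training-time path: real audio is encoded and reconstructed.
audio = rng.standard_normal(AUDIO_DIM)
recon = decode(encode(audio))

# Synthesis-time path: adjustable parameters (e.g. surface type, step energy)
# plus uniform noise, which spans variation between otherwise identical steps.
controls = np.full(CTRL_DIM, 0.7)
noise = rng.uniform(-1.0, 1.0, NOISE_DIM)
synth = decode(control_map(controls, noise))
```

Both paths end in the same decoder, which is what lets the control mapping stand in for the encoder once training is complete.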
Mar-4-2023