AITopics | digital audio effect

Collaborating Authors

digital audio effect

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Empirical Results for Adjusting Truncated Backpropagation Through Time while Training Neural Audio Effects

Bourdin, Yann, Legrand, Pierrick, Roche, Fanny

arXiv.org Artificial IntelligenceDec-9-2025

This paper investigates the optimization of Truncated Backpropagation Through Time (TBPTT) for training neural networks in digital audio effect modeling, with a focus on dynamic range compression. The study evaluates key TBPTT hyperparameters -- sequence number, batch size, and sequence length -- and their influence on model performance. Using a convolutional-recurrent architecture, we conduct extensive experiments across datasets with and without conditionning by user controls. Results demonstrate that carefully tuning these parameters enhances model accuracy and training stability, while also reducing computational demands. Objective evaluations confirm improved performance with optimized settings, while subjective listening tests indicate that the revised TBPTT configuration maintains high perceptual quality.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2512.07393

Country:

Europe > Italy (0.16)
Europe > France (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Media (0.68)
Leisure & Entertainment (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Differentiable Attenuation Filters for Feedback Delay Networks

Ibnyahya, Ilias, Reiss, Joshua D.

arXiv.org Artificial IntelligenceNov-26-2025

We introduce a novel method for designing attenuation filters in digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS) of Infinite Impulse Response (IIR) filters arranged as parametric equalizers (PEQ), enabling fine control over frequency-dependent reverberation decay. Unlike traditional graphic equalizer designs, which require numerous filters per delay line, we propose a scalable solution where the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared parameters across delay lines and only the gain is adjusted based on delay length. This design not only reduces the number of optimization parameters, but also remains fully differentiable and compatible with gradient-based learning frameworks. Leveraging principles of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.

artificial intelligence, international conference, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.2038

Country:

Europe > Denmark (0.28)
Europe > United Kingdom > England (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.54)

Add feedback

Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space

Limberg, Christian, Schulz, Fares, Zhang, Zhe, Weinzierl, Stefan

arXiv.org Artificial IntelligenceOct-7-2025

This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing approaches that achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and provide unintuitive user experiences. We address this limitation through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a Variational Autoencoder; second, we use this representation as conditioning input for a Transformer-based generative model. The learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the proposed method effectively learns a disentangled timbre space, enabling expressive and controllable audio generation with reliable pitch conditioning. Experimental results show the model's ability to capture subtle variations in timbre while maintaining a high degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential as a step towards future music production environments that are both intuitive and creatively empowering: https://pgesam.faresschulz.com

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.04339

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Singapore (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Media > Music (0.68)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial approaches

Moliner, Eloi, Švento, Michal, Wright, Alec, Juvela, Lauri, Rajmic, Pavel, Välimäki, Vesa

arXiv.org Artificial IntelligenceSep-25-2025

Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show that the diffusion-based approach provides more stable results and is less sensitive to data availability, while the adversarial approach is superior at estimating more pronounced distortion effects. Our findings contribute to the robust unsupervised blind estimation of audio effects, demonstrating the potential of diffusion models for system identification in music technology.

artificial intelligence, machine learning, proc, (17 more...)

arXiv.org Artificial Intelligence

2504.04751

Country:

Europe > Italy (0.16)
Europe > Czechia (0.14)

Genre: Research Report > New Finding (0.88)

Industry:

Media > Music (0.34)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

A Hierarchical Deep Learning Approach for Minority Instrument Detection

Sechet, Dylan, Bugiotti, Francesca, Kowalski, Matthieu, d'Hérouville, Edouard, Langiewicz, Filip

arXiv.org Artificial IntelligenceJun-27-2025

Identifying instrument activities within audio excerpts is vital in music information retrieval, with significant implications for music cataloging and discovery. Prior deep learning endeavors in musical instrument recognition have predominantly emphasized instrument classes with ample data availability. Recent studies have demonstrated the applicability of hierarchical classification in detecting instrument activities in orchestral music, even with limited fine-grained annotations at the instrument level. Based on the Hornbostel-Sachs classification, such a hierarchical classification system is evaluated using the MedleyDB dataset, renowned for its diversity and richness concerning various instruments and music genres. This work presents various strategies to integrate hierarchical structures into models and tests a new class of models for hierarchical music prediction. This study showcases more reliable coarse-level instrument detection by bridging the gap between detailed instrument identification and group-level recognition, paving the way for further advancements in this domain.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.21167

Country:

Europe > United Kingdom > England > Surrey > Guildford (0.05)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
North America > Canada > Ontario > Toronto (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates

Diaz, Rodrigo, Sandler, Mark

arXiv.org Artificial IntelligenceMay-27-2025

Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically informed audio synthesis. However, traditional implementations, particularly for non-linear models like the von Kármán plate, are computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast, differentiable, GPU-accelerated modal framework built with the JAX library, providing efficient simulations and enabling gradient-based inverse modelling. Benchmarks show that our approach significantly outperforms CPU and GPU-based implementations, particularly for simulations with many modes. Inverse modelling experiments demonstrate that our approach can recover physical parameters, including tension, stiffness, and geometry, from both synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to other methods, it provides greater interpretability and more compact parameterisation. The code is released as open source to support future research and applications in differentiable physical modelling and sound synthesis.

artificial intelligence, implementation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2505.0594

Country:

North America (0.46)
Europe > Italy (0.16)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback

Anti-aliasing of neural distortion effects via model fine tuning

Carson, Alistair, Wright, Alec, Bilbao, Stefan

arXiv.org Artificial IntelligenceMay-19-2025

Neural networks have become ubiquitous with guitar distortion effects modelling in recent years. Despite their ability to yield perceptually convincing models, they are susceptible to frequency aliasing when driven by high frequency and high gain inputs. Nonlinear activation functions create both the desired harmonic distortion and unwanted aliasing distortion as the bandwidth of the signal is expanded beyond the Nyquist frequency. Here, we present a method for reducing aliasing in neural models via a teacher-student fine tuning approach, where the teacher is a pre-trained model with its weights frozen, and the student is a copy of this with learnable parameters. The student is fine-tuned against an aliasing-free dataset generated by passing sinusoids through the original model and removing non-harmonic components from the output spectra. Our results show that this method significantly suppresses aliasing for both long-short-term-memory networks (LSTM) and temporal convolutional networks (TCN). In the majority of our case studies, the reduction in aliasing was greater than that achieved by two times oversampling. One side-effect of the proposed method is that harmonic distortion components are also affected. This adverse effect was found to be model-dependent, with the LSTM models giving the best balance between anti-aliasing and preserving the perceived similarity to an analog reference device.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.11375

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.05)
Europe > United Kingdom > England > Surrey > Guildford (0.04)
Europe > Slovenia > Coastal-Karst > Municipality of Koper > Koper (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.73)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations

Zheleznov, Victor, Bilbao, Stefan, Wright, Alec, King, Simon

arXiv.org Artificial IntelligenceMay-16-2025

Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are possible in order to handle geometric nonlinearities. One such case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important effects including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning approaches (in particular neural ordinary differential equations) has been used to model lumped dynamic systems such as electronic circuits automatically from data. In this work, we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution for linear vibration of system's modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the network architecture. As an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2505.10511

Country:

Europe > Italy > Marche > Ancona Province > Ancona (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.81)

Add feedback

Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models

Wright, Alec, Carson, Alistair, Juvela, Lauri

arXiv.org Artificial IntelligenceNov-22-2024

This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data. Audio effects are relevant to many musical audio processing and Music Information Retrieval (MIR) tasks, such as modelling of analog audio effects, automatic mixing, tone matching and transcription. Existing audio effects datasets are limited in scope, usually including relatively few audio effects processors and a limited amount of input audio signals. Our proposed framework overcomes these issues, by crowdsourcing neural network emulations of guitar amplifiers and effects, created by users of open-source audio effects emulation software. This allows users of Open-Amp complete control over the input signals to be processed by the effects models, as well as providing high-quality emulations of hundreds of devices. Open-Amp can render audio online during training, allowing great flexibility in data augmentation. Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks. Furthermore, we train a one-to-many guitar effects model using Open-Amp, and use it to emulate unseen analog effects via manipulation of its learned latent space, indicating transferability to analog guitar effects data.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2411.14972

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Surrey > Guildford (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.05)
(13 more...)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Evaluating Neural Networks Architectures for Spring Reverb Modelling

Papaleo, Francesco, Lizarraga-Seijas, Xavier, Font, Frederic

arXiv.org Artificial IntelligenceSep-7-2024

Reverberation is a key element in spatial audio perception, historically achieved with the use of analogue devices, such as plate and spring reverb, and in the last decades with digital signal processing techniques that have allowed different approaches for Virtual Analogue Modelling (VAM). The electromechanical functioning of the spring reverb makes it a nonlinear system that is difficult to fully emulate in the digital domain with white-box modelling techniques. In this study, we compare five different neural network architectures, including convolutional and recurrent models, to assess their effectiveness in replicating the characteristics of this audio effect. The evaluation is conducted on two datasets at sampling rates of 16 kHz and 48 kHz. This paper specifically focuses on neural audio architectures that offer parametric control, aiming to advance the boundaries of current black-box modelling techniques in the domain of spring reverberation.

digital audio effect, international conference, proceedings, (12 more...)

arXiv.org Artificial Intelligence

2409.04953

Country:

Europe > United Kingdom > England > Surrey > Guildford (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback