audio effect
Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
Yu, Chin-Yun, Martínez-Ramírez, Marco A., Koo, Junghyun, Liao, Wei-Hsiang, Mitsufuji, Yuki, Fazekas, György
Style Transfer with Inference-Time Optimisation (ST-ITO) is a recent approach for transferring the applied effects of a reference audio to an audio track. It optimises the effect parameters to minimise the distance between the style embeddings of the processed audio and the reference. However, this method treats all possible configurations equally and relies solely on the embedding space, which can result in unrealistic configurations or biased outcomes. We address this pitfall by introducing a Gaussian prior derived from the DiffVox vocal preset dataset over the parameter space. The resulting optimisation is equivalent to maximum-a-posteriori estimation. Evaluations on vocal effects transfer on the MedleyDB dataset show significant improvements across metrics compared to baselines, including a blind audio effects estimator, nearest-neighbour approaches, and uncalibrated ST-ITO. The proposed calibration reduces the parameter mean squared error by up to 33% and more closely matches the reference style. Subjective evaluations with 16 participants confirm the superiority of our method in limited data regimes. This work demonstrates how incorporating prior knowledge at inference time enhances audio effects transfer, paving the way for more effective and realistic audio processing systems.
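A minimal sketch of the MAP idea under stated assumptions: `render`, `style_embed`, and the prior statistics below are toy stand-ins, not the paper's components; in the real system the prior would be a Gaussian fit to DiffVox vocal presets and the data term would use the ST-ITO style encoder.

```python
# Hedged sketch: calibrating inference-time optimisation with a Gaussian
# prior, i.e. minimising a MAP objective. All components here are toys.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
D = 4                                  # toy number of effect parameters
mu = np.zeros(D)                       # prior mean (fit to presets, e.g. DiffVox)
Sigma_inv = np.eye(D)                  # inverse prior covariance

P = rng.standard_normal((8, 256)) / 16.0

def render(audio, theta):
    # Placeholder "effects chain": only the parameter sum is audible, so the
    # embedding term alone cannot identify theta -- the prior breaks the tie.
    return audio * (1.0 + 0.1 * theta.sum())

def style_embed(audio):
    # Placeholder "style encoder": a fixed random projection.
    return P @ audio

audio = rng.standard_normal(256)
ref_emb = style_embed(render(audio, np.array([0.5, -0.2, 0.1, 0.0])))

def neg_log_posterior(theta, lam=0.1):
    emb = style_embed(render(audio, theta))
    data_term = np.sum((emb - ref_emb) ** 2)               # embedding distance
    prior_term = (theta - mu) @ Sigma_inv @ (theta - mu)   # Gaussian prior
    return data_term + lam * prior_term                    # MAP objective

theta_hat = minimize(neg_log_posterior, x0=mu, method="Nelder-Mead").x
print("MAP estimate:", np.round(theta_hat, 3))
```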
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Media > Music (0.46)
- Leisure & Entertainment (0.46)
Do Joint Language-Audio Embeddings Encode Perceptual Timbre Semantics?
Deng, Qixin, Pardo, Bryan, Pappas, Thrasyvoulos N.
Understanding and modeling the relationship between language and sound is critical for applications such as music information retrieval, text-guided music generation, and audio captioning. Central to these tasks is the use of joint language-audio embedding spaces, which map textual descriptions and auditory content into a shared embedding space. While multimodal embedding models such as MS-CLAP, LAION-CLAP, and MuQ-MuLan have shown strong performance in aligning language and audio, their correspondence to human perception of timbre, a multifaceted attribute encompassing qualities such as brightness, roughness, and warmth, remains underexplored. In this paper, we evaluate the above three joint language-audio embedding models on their ability to capture perceptual dimensions of timbre. Our findings show that LAION-CLAP consistently provides the most reliable alignment with human-perceived timbre semantics across both instrumental sounds and audio effects.
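To make the evaluation recipe concrete, here is a hedged sketch of one way such alignment can be probed: `embed_text` and `embed_audio` are random placeholders where a real model (e.g. LAION-CLAP) would go, and the listener ratings are simulated.

```python
# Hedged sketch: probing a joint language-audio space for timbre semantics.
# `embed_text`/`embed_audio` stand in for a real model such as LAION-CLAP.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
dim = 512

def embed_text(prompt):                 # placeholder text encoder
    return rng.standard_normal(dim)

def embed_audio(clip):                  # placeholder audio encoder
    return rng.standard_normal(dim)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

clips = [rng.standard_normal(2048) for _ in range(20)]
human_brightness = rng.uniform(0, 1, size=20)   # would be listener ratings

t = embed_text("a bright sound")
model_scores = [cosine(embed_audio(c), t) for c in clips]

# Alignment metric: rank correlation between model similarity and ratings.
rho, p = spearmanr(model_scores, human_brightness)
print(f"Spearman rho={rho:.2f} (p={p:.3f})")
```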
- Media > Music (0.70)
- Leisure & Entertainment (0.70)
Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches
Moliner, Eloi, Švento, Michal, Wright, Alec, Juvela, Lauri, Rajmic, Pavel, Välimäki, Vesa
Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show that the diffusion-based approach provides more stable results and is less sensitive to data availability, while the adversarial approach is superior at estimating more pronounced distortion effects. Our findings contribute to the robust unsupervised blind estimation of audio effects, demonstrating the potential of diffusion models for system identification in music technology.
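As a rough illustration of what a parameterized effect operator can look like, here is a toy gray-box model in the filter-nonlinearity-filter mould; it is a generic sketch of the kind of operator a blind estimator would fit, not the paper's model.

```python
# Hedged sketch: a gray-box nonlinear effect operator
# (FIR filter -> static waveshaper -> FIR filter). Not the paper's model.
import torch
import torch.nn as nn

class GrayBoxDistortion(nn.Module):
    def __init__(self, taps=16):
        super().__init__()
        self.pre = nn.Conv1d(1, 1, taps, padding=taps - 1)   # pre-emphasis filter
        self.gain = nn.Parameter(torch.tensor(2.0))          # drive into nonlinearity
        self.post = nn.Conv1d(1, 1, taps, padding=taps - 1)  # tone-shaping filter

    def forward(self, x):                 # x: (batch, 1, time)
        n = x.shape[-1]
        y = self.pre(x)[..., :n]          # crop back to a causal length
        y = torch.tanh(self.gain * y)     # static nonlinearity
        return self.post(y)[..., :n]

x = torch.randn(1, 1, 4096)
print(GrayBoxDistortion()(x).shape)       # torch.Size([1, 1, 4096])
```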
- Europe > Italy > Marche > Ancona Province > Ancona (0.05)
- Europe > Czechia > South Moravian Region > Brno (0.05)
- South America > Suriname > North Atlantic Ocean (0.04)
- Media > Music (0.34)
- Leisure & Entertainment (0.34)
A Multi-Agent AI Framework for Immersive Audiobook Production through Spatial Audio and Neural Narration
Selvamani, Shaja Arul, Ganapathy, Nia D'Souza
This research introduces an innovative AI-driven multi-agent framework specifically designed for creating immersive audiobooks. Leveraging neural text-to-speech synthesis with FastSpeech 2 and VALL-E for expressive narration and character-specific voices, the framework employs advanced language models to automatically interpret textual narratives and generate realistic spatial audio effects. These sound effects are dynamically synchronized with the storyline through sophisticated temporal integration methods, including Dynamic Time Warping (DTW) and recurrent neural networks (RNNs). Diffusion-based generative models combined with higher-order ambisonics (HOA) and scattering delay networks (SDN) enable highly realistic 3D soundscapes, substantially enhancing listener immersion and narrative realism. This technology significantly advances audiobook applications, providing richer experiences for educational content, storytelling platforms, and accessibility solutions for visually impaired audiences. Future work will address personalization, ethical management of synthesized voices, and integration with multi-sensory platforms.
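The temporal-integration step can be illustrated with a plain DTW implementation; the 1-D feature sequences below are toy stand-ins for whatever narration and effect features the framework actually aligns.

```python
# Hedged sketch: classic DTW, the alignment step used to sync generated
# sound effects to narration. Toy 1-D features, not the paper's.
import numpy as np

def dtw(a, b):
    """Return the DTW cost matrix for two 1-D feature sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[1:, 1:]

narration = np.sin(np.linspace(0, 4, 50))     # e.g. an energy envelope
effect = np.sin(np.linspace(0.5, 4.5, 40))    # effect cue, offset in time
print("alignment cost:", dtw(narration, effect)[-1, -1].round(3))
```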
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > India > Maharashtra > Mumbai (0.04)
- Health & Medicine (1.00)
- Media > Publishing (0.65)
- Information Technology > Security & Privacy (0.49)
- Information Technology > Artificial Intelligence > Speech (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects
Deng, Victor, Wang, Changhong, Richard, Gael, McFee, Brian
In recent years, foundation models have significantly advanced data-driven systems across various domains. Yet, their underlying properties, especially when functioning as feature extractors, remain under-explored. In this paper, we investigate the sensitivity of audio embeddings extracted from widely-used foundation models, including OpenL3, PANNs, and CLAP, to audio effects. We focus on audio effects as the source of sensitivity due to their prevalent presence in large audio datasets. By applying parameterized audio effects (gain, low-pass filtering, reverberation, and bitcrushing), we analyze the correlation between the deformation trajectories and the effect strength in the embedding space. We propose to quantify the dimensionality and linearizability of the deformation trajectories induced by audio effects using canonical correlation analysis. We find that there exists a direction along which the embeddings move monotonically as the audio effect strength increases, but that the subspace containing the displacements is generally high-dimensional. This shows that pre-trained audio embeddings do not globally linearize the effects. Our empirical results on downstream instrument classification tasks confirm that projecting out the estimated deformation directions cannot generally improve the robustness of pre-trained embeddings to audio effects.
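A hedged sketch of the trajectory analysis: a random nonlinear map stands in for a pre-trained embedder, gain plays the role of the parameterized effect, and scikit-learn's CCA measures how well a single canonical direction tracks effect strength.

```python
# Hedged sketch: does a single direction in embedding space track effect
# strength? Toy embedder (random tanh map) and toy effect (gain).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(2)
W = rng.standard_normal((16, 1024)) / 32.0   # placeholder "embedding" map

def embed(audio):
    return np.tanh(W @ audio)

clean = rng.standard_normal(1024)
strengths = np.linspace(0.0, 1.0, 40)
embs = np.stack([embed(clean * (1.0 + 3.0 * s)) for s in strengths])
disp = embs - embs[0]                        # deformation trajectory

# One canonical pair between displacements and effect strength:
cca = CCA(n_components=1).fit(disp, strengths.reshape(-1, 1))
u, v = cca.transform(disp, strengths.reshape(-1, 1))
r = np.corrcoef(u[:, 0], v[:, 0])[0, 1]
print(f"canonical correlation with strength: {r:.3f}")
```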
- Europe > France > Île-de-France > Paris > Paris (0.04)
- North America > United States > New York (0.04)
Open-Amp: Synthetic Data Framework for Audio Effect Foundation Models
Wright, Alec, Carson, Alistair, Juvela, Lauri
This paper introduces Open-Amp, a synthetic data framework for generating large-scale and diverse audio effects data. Audio effects are relevant to many musical audio processing and Music Information Retrieval (MIR) tasks, such as modelling of analog audio effects, automatic mixing, tone matching and transcription. Existing audio effects datasets are limited in scope, usually including relatively few audio effects processors and a limited amount of input audio signals. Our proposed framework overcomes these issues by crowdsourcing neural network emulations of guitar amplifiers and effects created by users of open-source audio effects emulation software. This gives users of Open-Amp complete control over the input signals to be processed by the effects models, as well as high-quality emulations of hundreds of devices. Open-Amp can render audio online during training, allowing great flexibility in data augmentation. Our experiments show that using Open-Amp to train a guitar effects encoder achieves new state-of-the-art results on multiple guitar effects classification tasks. Furthermore, we train a one-to-many guitar effects model using Open-Amp, and use it to emulate unseen analog effects via manipulation of its learned latent space, indicating transferability to analog guitar effects data.
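The online-rendering idea can be sketched as a dataset that picks a random effect and renders the wet signal on the fly; the toy waveshapers below are placeholders for Open-Amp's crowdsourced neural emulations.

```python
# Hedged sketch: an "apply a random effect on the fly" dataset, in the
# spirit of Open-Amp's online rendering. The effects are toy waveshapers.
import torch
from torch.utils.data import Dataset, DataLoader

TOY_EFFECTS = [
    lambda x, g: torch.tanh(g * x),              # soft clip
    lambda x, g: torch.clamp(g * x, -0.5, 0.5),  # hard clip
]

class OnlineFxDataset(Dataset):
    def __init__(self, clips):
        self.clips = clips                      # dry input signals

    def __len__(self):
        return len(self.clips)

    def __getitem__(self, idx):
        dry = self.clips[idx]
        fx_id = torch.randint(len(TOY_EFFECTS), ()).item()
        gain = 1.0 + 4.0 * torch.rand(()).item()
        wet = TOY_EFFECTS[fx_id](dry, gain)     # rendered during training
        return dry, wet, fx_id                  # fx_id = classification label

loader = DataLoader(OnlineFxDataset([torch.randn(4096) for _ in range(8)]),
                    batch_size=4)
dry, wet, labels = next(iter(loader))
print(dry.shape, wet.shape, labels)
```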
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Surrey > Guildford (0.05)
- Europe > Denmark > Capital Region > Copenhagen (0.05)
- Research Report (0.50)
- Instructional Material (0.34)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Modeling Analog Dynamic Range Compressors using Deep Learning and State-space Models
Yin, Hanzhi, Cheng, Gang, Steinmetz, Christian J., Yuan, Ruibin, Stern, Richard M., Dannenberg, Roger B.
We describe a novel approach for developing realistic digital models of dynamic range compressors for audio production by analyzing their analog prototypes. While realistic digital dynamic range compressors are potentially useful for many applications, the design process is challenging because the compressors operate nonlinearly over long time scales. Our approach is based on the structured state space sequence model (S4), as the state-space model (SSM) has proven efficient at learning long-range dependencies, making it promising for modeling dynamic range compressors. In this paper, we present a deep learning model with S4 layers to model the Teletronix LA-2A analog dynamic range compressor. The model is causal, executes efficiently in real time, and achieves roughly the same quality as previous deep-learning models but with fewer parameters.
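For intuition, here is a naive diagonal state-space recurrence; S4 layers are a fast, carefully parameterized version of this building block, so the sketch below illustrates the recurrence only, not S4 itself.

```python
# Hedged sketch: a toy diagonal state-space recurrence. This naive scan
# shows the dynamics S4-style layers compute efficiently; it is not S4.
import torch
import torch.nn as nn

class DiagonalSSM(nn.Module):
    def __init__(self, state_dim=16):
        super().__init__()
        self.log_a = nn.Parameter(torch.full((state_dim,), -0.1))  # decay rates
        self.b = nn.Parameter(torch.randn(state_dim) * 0.1)
        self.c = nn.Parameter(torch.randn(state_dim) * 0.1)
        self.d = nn.Parameter(torch.zeros(()))

    def forward(self, u):                     # u: (batch, time)
        a = torch.exp(self.log_a)             # |a| < 1 keeps the system stable
        x = torch.zeros(u.shape[0], self.b.shape[0], device=u.device)
        ys = []
        for t in range(u.shape[1]):           # x[t+1] = a * x[t] + b * u[t]
            x = a * x + self.b * u[:, t:t + 1]
            ys.append(x @ self.c + self.d * u[:, t])
        return torch.stack(ys, dim=1)         # y[t] = c^T x[t] + d * u[t]

y = DiagonalSSM()(torch.randn(2, 256))
print(y.shape)                                # torch.Size([2, 256])
```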
- Europe > Austria > Vienna (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Spain > Andalusia > Málaga Province > Málaga (0.04)
- Europe > Denmark > North Jutland > Aalborg (0.04)
- Media (0.68)
- Leisure & Entertainment (0.46)
Style Transfer for Non-differentiable Audio Effects
Digital audio effects are widely used by audio engineers to alter the acoustic and temporal qualities of audio data. However, these effects can have a large number of parameters, which can make them difficult for beginners to learn and can hamper creativity for professionals. Recently, there have been a number of efforts to employ progress in deep learning to acquire the low-level parameter configurations of audio effects by minimising an objective function between an input and reference track, commonly referred to as style transfer. However, current approaches use inflexible black-box techniques or require that the effects under consideration be implemented in an auto-differentiation framework. In this work, we propose a deep learning approach to audio production style matching which can be used with effects implemented in some of the most widely used frameworks, requiring only that the parameters under consideration have a continuous domain. Further, our method includes style matching for various classes of effects, many of which are difficult or impossible to approximate closely using differentiable functions. We show that our audio embedding approach creates logical encodings of timbral information, which can be used for a number of downstream tasks. Further, we perform a listening test which demonstrates that our approach is able to convincingly style match a multi-band compressor effect.
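A hedged sketch of the setting: the "plugin" below is non-differentiable (it hard-quantizes), its parameters are continuous, and a gradient-free optimizer matches a reference style in a placeholder embedding space. The processor, embedder, and optimizer choice are all illustrative assumptions, not the paper's method.

```python
# Hedged sketch: gradient-free style matching for a non-differentiable
# effect with continuous parameters. All components here are toys.
import numpy as np
from scipy.optimize import differential_evolution

rng = np.random.default_rng(3)
P = rng.standard_normal((32, 2048)) / 64.0      # placeholder audio embedder

def plugin(audio, drive, depth):
    """Non-differentiable effect: hard quantisation after drive."""
    step = 2.0 ** -np.clip(depth, 1, 12)
    return np.round(np.tanh(drive * audio) / step) * step

def embed(audio):
    return P @ audio

dry = rng.standard_normal(2048)
ref = embed(plugin(dry, drive=3.0, depth=4.0))  # target style

def loss(params):
    return np.sum((embed(plugin(dry, *params)) - ref) ** 2)

res = differential_evolution(loss, bounds=[(0.5, 8.0), (1.0, 12.0)],
                             seed=0, maxiter=30, tol=1e-6)
print("recovered (drive, depth):", np.round(res.x, 2))
```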
- Media > Music (0.46)
- Leisure & Entertainment (0.46)
Modulation Extraction for LFO-driven Audio Effects
Mitcheltree, Christopher, Steinmetz, Christian J., Comunità, Marco, Reiss, Joshua D.
Low frequency oscillator (LFO) driven audio effects such as phaser, flanger, and chorus, modify an input signal using time-varying filters and delays, resulting in characteristic sweeping or widening effects. It has been shown that these effects can be modeled using neural networks when conditioned with the ground truth LFO signal. However, in most cases, the LFO signal is not accessible and measurement from the audio signal is nontrivial, hindering the modeling process. To address this, we propose a framework capable of extracting arbitrary LFO signals from processed audio across multiple digital audio effects, parameter settings, and instrument configurations. Since our system imposes no restrictions on the LFO signal shape, we demonstrate its ability to extract quasiperiodic, combined, and distorted modulation signals that are relevant to effect modeling. Furthermore, we show how coupling the extraction model with a simple processing network enables training of end-to-end black-box models of unseen analog or digital LFO-driven audio effects using only dry and wet audio pairs, overcoming the need to access the audio effect or internal LFO signal. We make our code available and provide the trained audio effect models in a real-time VST plugin.
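To make the modeling target concrete, this toy example synthesizes a flanger-like effect from a known LFO via a time-varying fractional delay; extraction would run in the opposite direction, recovering such an LFO from the wet audio alone.

```python
# Hedged sketch: a toy flanger driven by a known LFO, i.e. the kind of
# modulation signal an extraction model would recover from wet audio.
import numpy as np

sr = 16000
t = np.arange(sr) / sr
dry = np.sin(2 * np.pi * 220 * t)               # toy input signal

lfo = 0.5 * (1 + np.sin(2 * np.pi * 0.5 * t))   # 0..1, 0.5 Hz modulator
delay = (1 + 4 * lfo) * sr / 1000.0             # 1..5 ms time-varying delay

idx = np.arange(len(dry)) - delay               # fractional read positions
idx = np.clip(idx, 0, len(dry) - 1)
delayed = np.interp(idx, np.arange(len(dry)), dry)  # linear interpolation
wet = 0.5 * dry + 0.5 * delayed                 # classic flanger mix

print(wet.shape, f"delay range: {delay.min():.1f}..{delay.max():.1f} samples")
```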
- Europe > Denmark > Capital Region > Copenhagen (0.05)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Asia > Middle East > Iran (0.04)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Modelling black-box audio effects with time-varying feature modulation
Comunità, Marco, Steinmetz, Christian J., Phan, Huy, Reiss, Joshua D.
Deep learning approaches for black-box modelling of audio effects have shown promise; however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time-scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression. We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time and frequency domain metrics.
[Figure 1 caption: State-of-the-art black-box models like GCN-3 [19] (grey) fail to capture the behaviour of effects with large time constants such as fuzz (blue); the proposed approach GCNTF-3 (orange), which ...]
[From the introduction: Audio effects are tools employed by audio engineers and musicians, central to shaping the timbre, dynamics, and spatialisation of sound [1]. Distortion effects such as fuzz can also pose an additional challenge since they exhibit time-varying behaviour: fuzz is characterised not only by asymmetrical clipping, which for sinusoidal inputs results in a rectangular wave output, but also by its attack and release time constants, which modulate the behaviour ...]
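As a rough sketch of time-varying feature modulation (not the paper's exact architecture), the block below modulates convolutional activations with per-block scales and shifts predicted by a small recurrent network.

```python
# Hedged sketch: time-varying feature-wise modulation of conv activations,
# the general idea behind a temporal-FiLM-conditioned TCN. Sizes are toy.
import torch
import torch.nn as nn

class TemporalFiLMBlock(nn.Module):
    def __init__(self, channels=8, block=64):
        super().__init__()
        self.block = block
        self.conv = nn.Conv1d(channels, channels, 3, padding=4, dilation=2)
        self.rnn = nn.GRU(channels, 2 * channels, batch_first=True)

    def forward(self, x):                 # x: (batch, channels, time)
        n = x.shape[-1]                   # time must be divisible by `block`
        h = self.conv(x)[..., :n]         # crop back to a causal length
        # Pool each block of samples, then let an RNN predict a scale and
        # shift per block, so the modulation evolves over time.
        pooled = h.reshape(h.shape[0], h.shape[1],
                           n // self.block, self.block).mean(-1)
        gb, _ = self.rnn(pooled.transpose(1, 2))           # (batch, blocks, 2C)
        gamma, beta = gb.transpose(1, 2).chunk(2, dim=1)   # 2 x (batch, C, blocks)
        gamma = gamma.repeat_interleave(self.block, dim=-1)
        beta = beta.repeat_interleave(self.block, dim=-1)
        return torch.relu(gamma * h + beta)

y = TemporalFiLMBlock()(torch.randn(2, 8, 512))
print(y.shape)                            # torch.Size([2, 8, 512])
```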
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)