AITopics | dereverberation

Is Phase Really Needed for Weakly-Supervised Dereverberation ?

Rodrigues, Marius, Bahrman, Louis, Badeau, Roland, Richard, Gaël

arXiv.org Machine Learning, Nov-24-2025

In unsupervised or weakly-supervised approaches for speech dereverberation, the target clean (dry) signals are considered to be unknown during training. In that context, evaluating to what extent information can be retrieved from the sole knowledge of reverberant (wet) speech becomes critical. This work investigates the role of the reverberant (wet) phase in the time-frequency domain. Based on Statistical Wave Field Theory, we show that late reverberation perturbs phase components with white, uniformly distributed noise, except at low frequencies. Consequently, the wet phase carries limited useful information and is not essential for weakly supervised dereverberation. To validate this finding, we train dereverberation models under a recent weak supervision framework and demonstrate that performance can be significantly improved by excluding the reverberant phase from the loss function.
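The idea of dropping phase from a loss can be illustrated with a small sketch. It is not the paper's loss, which is not given in this listing. It is a generic comparison between a complex STFT loss and a magnitude-only STFT loss, using a naive NumPy STFT. The function names and the `n_fft` and `hop` values are assumptions made for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames, one-sided FFT, shape (frames, bins)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=-1)

def complex_loss(est, ref):
    """Mean squared error on complex STFT coefficients (phase-sensitive)."""
    return float(np.mean(np.abs(stft(est) - stft(ref)) ** 2))

def magnitude_loss(est, ref):
    """Mean squared error on STFT magnitudes only (phase is discarded)."""
    return float(np.mean((np.abs(stft(est)) - np.abs(stft(ref))) ** 2))

# Hypothetical usage: a polarity flip changes every phase by pi but
# leaves all magnitudes untouched.
rng = np.random.default_rng(0)
ref = rng.standard_normal(4000)
est = -ref
loss_c = complex_loss(est, ref)
loss_m = magnitude_loss(est, ref)
```

In this toy case the complex loss penalises the flipped signal heavily, while the magnitude loss does not, which is the kind of phase insensitivity the abstract argues is acceptable for the reverberant signal.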

artificial intelligence, machine learning, speech recognition, (16 more...)

arXiv.org Machine Learning

2511.17346

Country:

North America > United States > Maine (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement

Mullins, Sarabeth S., Götz, Georg, Bezzam, Eric, Zheng, Steven, Nielsen, Daniel Gert

arXiv.org Artificial Intelligence, Oct-28-2025

Accurate far-field speech datasets are critical for tasks such as automatic speech recognition (ASR), dereverberation, speech enhancement, and source separation. However, current datasets are limited by the trade-off between acoustic realism and scalability. Measured corpora provide faithful physics but are expensive, low-coverage, and rarely include paired clean and reverberant data. In contrast, most simulation-based datasets rely on simplified geometrical acoustics, thus failing to reproduce key physical phenomena like diffraction, scattering, and interference that govern sound propagation in complex environments. We introduce Treble10, a large-scale, physically accurate room-acoustic dataset. Treble10 contains over 3000 broadband room impulse responses (RIRs) simulated in 10 fully furnished real-world rooms, using a hybrid simulation paradigm implemented in the Treble SDK that combines a wave-based and geometrical acoustics solver. The dataset provides six complementary subsets, spanning mono, 8th-order Ambisonics, and 6-channel device RIRs, as well as pre-convolved reverberant speech scenes paired with LibriSpeech utterances. All signals are simulated at 32 kHz, accurately modelling low-frequency wave effects and high-frequency reflections. Treble10 bridges the realism gap between measurement and simulation, enabling reproducible, physically grounded evaluation and large-scale data augmentation for far-field speech tasks. The dataset is openly available via the Hugging Face Hub, and is intended as both a benchmark and a template for next-generation simulation-driven audio research.
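Pre-convolved reverberant speech is, in the usual linear model, a dry utterance convolved with a room impulse response. The sketch below shows that operation on toy arrays. It does not use the Treble10 files or the Treble SDK, and the `reverberate` helper is a hypothetical name.

```python
import numpy as np

def reverberate(dry, rir):
    """Convolve a dry signal with a room impulse response (full length)."""
    return np.convolve(dry, rir, mode="full")

# Toy example: a unit impulse through an RIR with a direct path and one
# echo two samples later at half amplitude.
dry = np.array([1.0, 0.0, 0.0])
rir = np.array([1.0, 0.0, 0.5])
wet = reverberate(dry, rir)
```

With real data, `dry` would be a LibriSpeech utterance and `rir` one of the dataset's impulse responses, both at the same sampling rate.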

artificial intelligence, machine learning, speech recognition, (16 more...)

arXiv.org Artificial Intelligence

2510.23141

Country:

North America > United States (0.14)
Oceania > New Zealand (0.14)
Oceania > Australia (0.14)
(6 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.92)

Déréverbération non-supervisée de la parole par modèle hybride (Unsupervised speech dereverberation with a hybrid model)

Bahrman, Louis, Fontaine, Mathieu, Richard, Gaël

arXiv.org Artificial Intelligence, Oct-13-2025

This paper introduces a new training strategy to improve speech dereverberation systems in an unsupervised manner using only reverberant speech. Most existing algorithms rely on paired dry/reverberant data, which is difficult to obtain. Our approach uses limited acoustic information, like the reverberation time (RT60), to train a dereverberation system. Experimental results demonstrate that our method achieves more consistent performance across various objective metrics than the state-of-the-art.

artificial intelligence, machine learning, ration, (19 more...)

arXiv.org Artificial Intelligence

2510.09025

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

U-DREAM: Unsupervised Dereverberation guided by a Reverberation Model

Bahrman, Louis, Fontaine, Mathieu, Richard, Gaël

arXiv.org Artificial Intelligence, Jul-22-2025

This paper explores the outcome of training state-of-the-art dereverberation models with supervision settings ranging from weakly-supervised to fully unsupervised, relying solely on reverberant signals and an acoustic model for training. Most existing deep learning approaches require paired dry and reverberant data, which are difficult to obtain in practice. We instead develop a sequential learning strategy motivated by a Bayesian formulation of the dereverberation problem, wherein acoustic parameters and dry signals are estimated from reverberant inputs using deep neural networks, guided by a reverberation matching loss.

Acoustic wave propagation in enclosed environments is significantly influenced by reflections and diffractions from surrounding surfaces and objects. These interactions alter the original waveform and result in reverberation, which can be modeled as a superposition of delayed and attenuated versions of the source signal. Reverberation has long been recognized as a critical factor affecting speech intelligibility [1], and its detrimental effects on audio clarity have motivated decades of research. The task of reverberation suppression, commonly referred to as dereverberation, has received renewed attention in recent years due to its relevance in a wide range of audio processing applications. Effective dereverberation is essential in enhancing the performance of hearing aids [2], improving communication quality in hands-free telephony [3], and enabling robust Automatic Speech Recognition (ASR) in human-machine interaction scenarios [4]. It also serves as a key preprocessing step in general-purpose speech enhancement frameworks [5]. Beyond suppression, reverberation itself plays a constructive role in audio production, particularly in simulating desired acoustic characteristics in post-processing. Reverberation conversion, or acoustic transfer, aims to transform a given recording, possibly containing unknown or undesired room effects, into a version consistent with a target acoustic environment.

This work was funded by the European Union (ERC, HI-Audio, 101052978). Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the European Research Council.

artificial intelligence, dereverberation, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2507.14237

Country: Europe (0.86)

Genre: Research Report > New Finding (0.67)

Industry: Government > Regional Government (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios

Venkatesh, Satvik, Coleman, Philip, Benilov, Arthur, Brown, Simon, Sheta, Selim, Roskam, Frederic

arXiv.org Artificial Intelligence, May-5-2025

Dereverberation is an important sub-task of Speech Enhancement (SE) to improve the signal's intelligibility and quality. However, it remains challenging because the reverberation is highly correlated with the signal. Furthermore, the single-channel SE literature has predominantly focused on rooms with short reverb times (typically under 1 second), smaller rooms (volumes under 1000 cubic meters) and relatively short distances (up to 2 meters). In this paper, we explore real-time low-latency single-channel SE under distant microphone scenarios, such as 5 to 10 meters, and focus on conference rooms and theatres, with larger room dimensions and reverberation times. Such a setup is useful for applications such as lecture demonstrations, drama, and stage acoustics enhancement. First, we show that single-channel SE in such challenging scenarios is feasible. Second, we investigate the relationship between room volume and reverberation time, and demonstrate its importance when randomly simulating room impulse responses. Lastly, we show that for dereverberation with short decay times, preserving early reflections before decaying the transfer function of the room improves overall signal quality.
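One classical way to see why room volume and reverberation time are linked is Sabine's equation, RT60 ≈ 0.161 · V / A in metric units, where V is the volume and A the total absorption area. The sketch below applies it to shoebox rooms. The abstract does not state which relationship the authors use, so this is an illustrative assumption, and the helper name and the absorption value are hypothetical.

```python
def sabine_rt60(length_m, width_m, height_m, mean_absorption):
    """Sabine estimate of RT60 (seconds) for a shoebox room.

    mean_absorption is the average absorption coefficient of all
    surfaces, a dimensionless value between 0 and 1.
    """
    volume = length_m * width_m * height_m
    surface = 2.0 * (
        length_m * width_m + length_m * height_m + width_m * height_m
    )
    return 0.161 * volume / (surface * mean_absorption)

# Same surface material, two room sizes: volume grows faster than
# surface area, so the larger room has the longer reverberation time.
rt60_small = sabine_rt60(10.0, 10.0, 10.0, mean_absorption=0.2)
rt60_large = sabine_rt60(20.0, 20.0, 20.0, mean_absorption=0.2)
```

Under this model, sampling a large volume together with a very short RT60 implies unusually absorbent surfaces, which is one reason to draw the two quantities jointly rather than independently when simulating impulse responses.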

artificial intelligence, distant microphone scenario, scenario, (11 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP49660.2025.10887894

2505.01338

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence (0.69)

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

Shao, Nian, Zhou, Rui, Wang, Pengyu, Li, Xian, Fang, Ying, Yang, Yujie, Li, Xiaofei

arXiv.org Artificial Intelligence, Feb-27-2025

In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. The proposed network takes as input the noisy and reverberant microphone recording and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can be either transformed to speech waveform with a neural vocoder or directly used for ASR. The proposed network is composed of interleaved cross-band and narrow-band processing in the Mel-frequency domain, for learning the full-band spectral pattern and the narrow-band properties of signals, respectively. Compared to linear-frequency domain or time-domain speech enhancement, the key advantage of Mel-spectrogram enhancement is that Mel-frequency presents speech in a more compact way and thus is easier to learn, which will benefit both speech quality and ASR. Experimental results on four English datasets and one Chinese dataset demonstrate a significant improvement in both speech quality and ASR performance achieved by the proposed model. Code and audio examples of our model are available online at https://audio.westlake.edu.cn/Research/CleanMel.html.
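The compactness of the Mel-frequency representation comes from its warped frequency axis. The sketch below uses the common HTK-style formula mel = 2595 · log10(1 + f / 700) to place band edges. Several Mel-scale variants exist and the abstract does not say which one CleanMel uses, so the formula choice, the helper names, and the 80-band, 8 kHz setting are all assumptions.

```python
import numpy as np

def hz_to_mel(f_hz):
    """HTK-style Hz to mel conversion."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(n_mels, f_min, f_max):
    """Edges (Hz) of n_mels bands spaced uniformly on the mel axis."""
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2)
    return mel_to_hz(mels)

# 80 bands covering 0 to 8 kHz: narrow bands at low frequencies,
# progressively wider bands at high frequencies.
edges = mel_band_edges(80, 0.0, 8000.0)
band_widths = np.diff(edges)
```

A few dozen such bands replace the several hundred linear-frequency bins of a typical STFT, which is the sense in which the representation is more compact.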

enhancement, speech, speech enhancement, (15 more...)

arXiv.org Artificial Intelligence

2502.2004

Country: Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification

Wang, Pengyu, Fang, Ying, Li, Xiaofei

arXiv.org Artificial Intelligence, Feb-10-2025

Reverberant speech, denoting the speech signal degraded by the process of reverberation, contains crucial knowledge of both anechoic source speech and room impulse response (RIR). This work proposes a variational Bayesian inference (VBI) framework with neural speech prior (VINP) for joint speech dereverberation and blind RIR identification. In VINP, a probabilistic signal model is constructed in the time-frequency (T-F) domain based on convolution transfer function (CTF) approximation. For the first time, we propose using an arbitrary discriminative dereverberation deep neural network (DNN) to predict the prior distribution of anechoic speech within a probabilistic model. By integrating both reverberant speech and the anechoic speech prior, VINP yields the maximum a posteriori (MAP) and maximum likelihood (ML) estimates of the anechoic speech spectrum and CTF filter, respectively. After simple transformations, the waveforms of anechoic speech and RIR are estimated. Moreover, VINP is effective for automatic speech recognition (ASR) systems, which sets it apart from most deep learning (DL)-based single-channel dereverberation approaches. Experiments on single-channel speech dereverberation demonstrate that VINP reaches an advanced level in most metrics related to human perception and displays unquestionable state-of-the-art (SOTA) performance in ASR-related metrics. For blind RIR identification, experiments indicate that VINP attains the SOTA level in blind estimation of reverberation time at 60 dB (RT60) and direct-to-reverberation ratio (DRR). Code and audio samples are available online.
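For readers unfamiliar with the CTF approximation, its usual textbook form is sketched below. The notation is assumed rather than taken from the paper: X is the reverberant STFT coefficient, S the anechoic one, H the CTF filter of length L, f the frequency index and t the frame index. The paper's own probabilistic model may include further terms.

```latex
X(f,t) \;\approx\; \sum_{l=0}^{L-1} H(f,l)\, S(f,t-l)
```

In words, each frequency band of the reverberant signal is modelled as a short convolution along the frame axis, instead of the single multiplication used when the impulse response is shorter than the analysis window.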

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.07205

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

A Hybrid Model for Weakly-Supervised Speech Dereverberation

Bahrman, Louis, Fontaine, Mathieu, Richard, Gael

arXiv.org Artificial Intelligence, Feb-6-2025

This paper introduces a new training strategy to improve speech dereverberation systems using minimal acoustic information and reverberant (wet) speech. Most existing algorithms rely on paired dry/wet data, which is difficult to obtain, or on target metrics that may not adequately capture reverberation characteristics and can lead to poor results on non-target metrics. Our approach uses limited acoustic information, like the reverberation time (RT60), to train a dereverberation system. The system's output is resynthesized using a generated room impulse response and compared with the original reverberant speech, providing a novel reverberation matching loss replacing the standard target metrics. During inference, only the trained dereverberation model is used. Experimental results demonstrate that our method achieves more consistent performance across various objective metrics used in speech dereverberation than the state-of-the-art.
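The resynthesize-and-compare step can be sketched in a few lines. The abstract does not specify the impulse response generator or the distance measure, so the version below makes two assumptions: the RIR is exponentially decaying white noise whose level drops by 60 dB after RT60 seconds, and the loss is a time-domain mean squared error. All function names are hypothetical.

```python
import numpy as np

def synth_rir(rt60, sr=16000, seed=0):
    """Toy RIR: unit direct path followed by decaying Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(rt60 * sr)) / sr
    envelope = 10.0 ** (-3.0 * t / rt60)  # amplitude at -60 dB when t = rt60
    rir = rng.standard_normal(t.size) * envelope
    rir[0] = 1.0
    return rir

def reverb_matching_loss(dry_estimate, wet, rir):
    """Re-reverberate the dry estimate and compare it with the wet input."""
    resynth = np.convolve(dry_estimate, rir)[: len(wet)]
    return float(np.mean((resynth - wet) ** 2))

# Hypothetical check on synthetic data: the true dry signal gives zero
# loss, while an estimate that still contains the reverberation does not.
rng = np.random.default_rng(1)
dry = rng.standard_normal(2000)
rir = synth_rir(rt60=0.05)
wet = np.convolve(dry, rir)[: len(dry)]
loss_good = reverb_matching_loss(dry, wet, rir)
loss_bad = reverb_matching_loss(wet, wet, rir)
```

Only the reverberant signal and the RT60 enter the loss, which is what lets training proceed without paired dry recordings.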

artificial intelligence, machine learning, supervision, (14 more...)

arXiv.org Artificial Intelligence

2502.06839

Country:

Asia (0.14)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech (0.71)

Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising

Fujita, Yoto, Nugraha, Aditya Arie, Di Carlo, Diego, Bando, Yoshiaki, Fontaine, Mathieu, Yoshii, Kazuyoshi

arXiv.org Artificial Intelligence, Oct-30-2024

This paper describes speech enhancement for real-time automatic speech recognition (ASR) in real environments. A standard approach to this task is to use neural beamforming that can work efficiently in an online manner. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes an enhancement filter used for beamforming. The performance of such a supervised approach, however, is drastically degraded under mismatched conditions. This calls for run-time adaptation of the DNN. Although the ground-truth speech spectrogram required for adaptation is not available at run time, blind dereverberation and separation methods such as weighted prediction error (WPE) and fast multichannel nonnegative matrix factorization (FastMNMF) can be used for generating pseudo ground-truth data from a mixture. Based on this idea, a prior work proposed a dual-process system based on a cascade of WPE and minimum variance distortionless response (MVDR) beamforming asynchronously fine-tuned by block-online FastMNMF. To integrate the dereverberation capability into neural beamforming and make it fine-tunable at run time, we propose to use weighted power minimization distortionless response (WPD) beamforming, a unified version of WPE and minimum power distortionless response (MPDR), whose joint dereverberation and denoising filter is estimated using a DNN. We evaluated the impact of run-time adaptation under various conditions with different numbers of speakers, reverberation times, and signal-to-noise ratios (SNRs).
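As background for the beamformers named above, the sketch below computes the textbook MVDR filter for a single frequency bin, w = R⁻¹d / (dᴴR⁻¹d), where R is the noise spatial covariance and d the steering vector. It illustrates the distortionless constraint shared by MVDR, MPDR and WPD. It is not the WPD filter proposed in the paper, and the function name and toy data are assumptions.

```python
import numpy as np

def mvdr_weights(noise_cov, steering):
    """MVDR filter for one frequency bin: R^-1 d / (d^H R^-1 d)."""
    r_inv_d = np.linalg.solve(noise_cov, steering)
    return r_inv_d / (steering.conj() @ r_inv_d)

# Toy 4-microphone example with a random Hermitian positive-definite
# noise covariance and a random complex steering vector.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
noise_cov = a @ a.conj().T + 4.0 * np.eye(4)
steering = rng.standard_normal(4) + 1j * rng.standard_normal(4)

w = mvdr_weights(noise_cov, steering)
target_gain = np.vdot(w, steering)  # w^H d, equal to 1 by construction
```

The filter passes the target direction with unit gain while minimising output noise power. In a neural beamformer, the covariance estimates typically come from DNN-predicted masks.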

dereverberation, fastmnmf, speech signal, (11 more...)

arXiv.org Artificial Intelligence

2410.22805

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

DM: Dual-path Magnitude Network for General Speech Restoration

Yang, Da-Hee, Kim, Dail, Chang, Joon-Hyuk, Choi, Jeonghwan, Moon, Han-gil

arXiv.org Artificial Intelligence, Sep-13-2024

In this paper, we introduce a novel general speech restoration model: the Dual-path Magnitude (DM) network, designed to address multiple distortions including noise, reverberation, and bandwidth degradation effectively. The DM network employs dual parallel magnitude decoders that share parameters: one uses a masking-based algorithm for distortion removal and the other employs a mapping-based approach for speech restoration. A novel aspect of the DM network is the integration of the magnitude spectrogram output from the masking decoder into the mapping decoder through a skip connection, enhancing the overall restoration capability. This integrated approach overcomes the inherent limitations observed in previous models, as detailed in a step-by-step analysis. The experimental results demonstrate that the DM network outperforms other baseline models in the comprehensive aspect of general speech restoration, achieving substantial restoration with fewer parameters.

decoder, restoration, speech restoration, (14 more...)

arXiv.org Artificial Intelligence

2409.08702

Country:

Asia > South Korea > Seoul > Seoul (0.05)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
