Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement
Mullins, Sarabeth S., Götz, Georg, Bezzam, Eric, Zheng, Steven, Nielsen, Daniel Gert
arXiv.org Artificial Intelligence
Accurate far-field speech datasets are critical for tasks such as automatic speech recognition (ASR), dereverberation, speech enhancement, and source separation. However, current datasets are limited by the trade-off between acoustic realism and scalability. Measured corpora provide faithful physics but are expensive, low-coverage, and rarely include paired clean and reverberant data. In contrast, most simulation-based datasets rely on simplified geometrical acoustics, thus failing to reproduce key physical phenomena like diffraction, scattering, and interference that govern sound propagation in complex environments. We introduce Treble10, a large-scale, physically accurate room-acoustic dataset. Treble10 contains over 3000 broadband room impulse responses (RIRs) simulated in 10 fully furnished real-world rooms, using a hybrid simulation paradigm implemented in the Treble SDK that combines a wave-based and geometrical acoustics solver. The dataset provides six complementary subsets, spanning mono, 8th-order Ambisonics, and 6-channel device RIRs, as well as pre-convolved reverberant speech scenes paired with LibriSpeech utterances. All signals are simulated at 32 kHz, accurately modelling low-frequency wave effects and high-frequency reflections. Treble10 bridges the realism gap between measurement and simulation, enabling reproducible, physically grounded evaluation and large-scale data augmentation for far-field speech tasks. The dataset is openly available via the Hugging Face Hub, and is intended as both a benchmark and a template for next-generation simulation-driven audio research.
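The pre-convolved reverberant speech mentioned in the abstract pairs a clean utterance with a room impulse response. As a minimal sketch of that idea, and not the authors' pipeline, the snippet below convolves a toy "dry" signal with a toy mono RIR. The function name `convolve` and the sample values are assumptions made up for illustration; for real 32 kHz audio one would normally use an FFT-based routine from a numerical library instead of this direct-form loop.

```python
def convolve(dry, rir):
    """Direct-form linear convolution: out[n] = sum_k dry[k] * rir[n - k]."""
    out = [0.0] * (len(dry) + len(rir) - 1)
    for i, x in enumerate(dry):
        for j, h in enumerate(rir):
            out[i + j] += x * h
    return out


# Toy signals (hypothetical values, not taken from Treble10).
dry = [1.0, 0.5, -0.25]        # stand-in for clean speech samples
rir = [1.0, 0.0, 0.5, 0.25]    # direct sound followed by two reflections

# Reverberant ("wet") signal; its length is len(dry) + len(rir) - 1.
wet = convolve(dry, rir)
print(len(wet), wet)
```

The output is longer than the dry input because the reverberant tail of the RIR continues after the last input sample.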
Oct-28-2025
- Country:
  - Asia
  - Europe
    - France > Île-de-France
    - Germany > Hamburg (0.04)
    - Iceland > Capital Region
      - Reykjavik (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Italy > Piedmont
      - Turin Province > Turin (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
  - North America > United States
    - Arizona > Maricopa County > Scottsdale (0.04)
  - Oceania
    - Australia > Queensland
      - Brisbane (0.04)
    - New Zealand > North Island
      - Auckland Region > Auckland (0.04)
- Genre:
  - Research Report (0.40)
- Technology: