NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Vinnikov, Alon; Ivry, Amir; Hurvitz, Aviv; Abramovski, Igor; Koubi, Sharon; Gurvich, Ilya; Pe'er, Shai; Xiao, Xiong; Elizalde, Benjamin Martinez; Kanda, Naoyuki; Wang, Xiaofei; Shaer, Shalev; Yagev, Stav; Asher, Yossi; Sivasankaran, Sunit; Gong, Yifan; Tang, Min; Wang, Huaming; Krupka, Eyal

Jan-16-2024–arXiv.org Artificial Intelligence

We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge, alongside new datasets and a baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with a single-channel track and a known-geometry multi-channel track, and serves as a launch platform for two new datasets. First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, that captures a broad spectrum of real-world acoustic conditions and conversational dynamics; it was recorded across 30 conference rooms, with 4 to 8 attendees per meeting and a total of 35 unique speakers. Second, a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization and incorporating 15,000 real acoustic transfer functions. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry. This setup aligns with common configurations in actual conference rooms, avoids the technical complexities associated with multi-device tasks, and allows for the development of geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research in distant conversational speech recognition by providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive, high-quality training and benchmarking datasets.
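The simulated training dataset is described as incorporating real acoustic transfer functions (ATFs). As a rough illustration of the general idea only, and not the authors' actual synthesis pipeline, a far-field multi-channel recording can be approximated by treating each microphone's ATF as a time-domain impulse response and convolving a dry (close-talk) signal with it, optionally adding noise. The helper names `convolve` and `simulate_far_field`, the toy two-microphone ATFs, and the signal values are all hypothetical.

```python
from typing import List, Optional, Sequence


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Full linear convolution: y[n] = sum_k h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def simulate_far_field(
    dry_speech: Sequence[float],
    atfs: Sequence[Sequence[float]],
    noise: Optional[Sequence[Sequence[float]]] = None,
) -> List[List[float]]:
    """Hypothetical helper: one reverberant channel per microphone ATF.

    Each channel is the dry signal convolved with that microphone's
    impulse response; if per-channel noise is given, it must have the
    same length as the reverberant signal and is added sample by sample.
    """
    channels = [convolve(dry_speech, h) for h in atfs]
    if noise is None:
        return channels
    if len(noise) != len(channels):
        raise ValueError("need one noise channel per microphone")
    mixed = []
    for ch, nz in zip(channels, noise):
        if len(nz) != len(ch):
            raise ValueError("noise length must match reverberant signal length")
        mixed.append([s + v for s, v in zip(ch, nz)])
    return mixed


if __name__ == "__main__":
    # Toy 2-mic ATFs (assumed values): mic 0 hears the direct path plus a
    # weak echo; mic 1 hears a one-sample-delayed, attenuated copy.
    dry = [1.0, 0.5]
    atfs = [[1.0, 0.0, 0.2], [0.0, 0.8, 0.1]]
    for i, ch in enumerate(simulate_far_field(dry, atfs)):
        print(f"mic {i}: {ch}")
```

Real ATFs span thousands of taps, so in practice an FFT-based routine such as `scipy.signal.fftconvolve` would typically replace the direct double loop used here for readability.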

algorithm, dataset, speech recognition

Country:
- Europe > United Kingdom > England (0.04)

Genre:
- Research Report (0.82)

Industry:
- Media (0.88)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Machine Learning (1.00)