Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas

arXiv.org Artificial Intelligence 

Voice activity detection and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown that VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking information about the generalization capacities of the systems. This paper proposes a complete and new benchmark of different VAD and OSD models, on multiple audio setups (single/multi-channel) and speech domains.

In this paper, we propose two 2-class VAD and OSD systems and a 3-class VAD+OSD system for mono- and multi-channel signals. We evaluate how beneficial the 3-class approach is in comparison to the use of two independent VAD and OSD models, in terms of F1-score and training resources. Each system is trained and evaluated on four different datasets covering various speech domains, including both single- and multiple-microphone scenarios. To the best of our knowledge, no benchmark has been conducted on these approaches across various speech domains and recording setups.
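The 3-class VAD+OSD formulation described above can be illustrated with a minimal sketch. Assuming per-frame speaker-activity annotations (the array below is illustrative data, not from the paper), each frame is mapped to one of three classes (non-speech, single-speaker speech, overlapped speech), and the two binary sub-task decisions are recovered from the joint labels:

```python
import numpy as np

# Hypothetical per-frame speaker-activity matrix: rows = frames, cols = speakers,
# 1 if that speaker is active in the frame (illustrative data).
activity = np.array([
    [0, 0],  # silence
    [1, 0],  # one active speaker  -> speech
    [1, 1],  # two active speakers -> overlap
    [0, 1],  # one active speaker  -> speech
])

n_active = activity.sum(axis=1)

# 3-class VAD+OSD labels: 0 = non-speech, 1 = single-speaker speech, 2 = overlap
labels_3c = np.minimum(n_active, 2)

# The two 2-class sub-task targets can be derived from the joint labels,
# which is why a single multi-class model can replace separate VAD and OSD models:
vad = labels_3c >= 1   # speech vs. non-speech
osd = labels_3c == 2   # overlap vs. non-overlap

print(labels_3c.tolist())  # [0, 1, 2, 1]
```

In practice the paper's systems predict these classes from audio features rather than from oracle speaker activity; the sketch only shows how the 3-class targets subsume the two binary tasks, so one model and one training run cover both.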
