Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

Martin Lebourdais, Théo Mariotte, Marie Tahon, Anthony Larcher, Antoine Laurent, Silvio Montresor, Sylvain Meignier, Jean-Hugh Thomas

arXiv.org Artificial Intelligence 

Voice activity detection and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown that VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking information about the generalization capacities of the systems. This paper proposes a complete and new benchmark of different VAD and OSD models, on multiple audio setups (single/multi-channel) and speech domains.

In this paper, we propose two 2-class VAD and OSD systems and a 3-class VAD+OSD system for mono- and multi-channel signals. We evaluate how beneficial the 3-class approach is in comparison to the use of two independent VAD and OSD models, in terms of F1-score and training resources. Each system is trained and evaluated on four different datasets covering various speech domains, including both single- and multiple-microphone scenarios. To the best of our knowledge, no benchmark has been conducted on these approaches across various speech domains and recording setups.
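The 3-class VAD+OSD formulation described above can be illustrated with a minimal sketch. Assuming per-frame speaker-activity annotations (the array below is illustrative data, not from the paper), each frame is mapped to one of three classes (non-speech, single-speaker speech, overlapped speech), and the two binary sub-task decisions are recovered from the joint labels:

```python
import numpy as np

# Hypothetical per-frame speaker-activity matrix: rows = frames, cols = speakers,
# 1 if that speaker is active in the frame (illustrative data).
activity = np.array([
    [0, 0],  # silence
    [1, 0],  # one active speaker  -> speech
    [1, 1],  # two active speakers -> overlap
    [0, 1],  # one active speaker  -> speech
])

n_active = activity.sum(axis=1)

# 3-class VAD+OSD labels: 0 = non-speech, 1 = single-speaker speech, 2 = overlap
labels_3c = np.minimum(n_active, 2)

# The two 2-class sub-task targets can be derived from the joint labels,
# which is why a single multi-class model can replace separate VAD and OSD models:
vad = labels_3c >= 1   # speech vs. non-speech
osd = labels_3c == 2   # overlap vs. non-overlap

print(labels_3c.tolist())  # [0, 1, 2, 1]
```

In practice the paper's systems predict these classes from audio features rather than from oracle speaker activity; the sketch only shows how the 3-class targets subsume the two binary tasks, so one model and one training run cover both.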
