A Reinforcement Learning Framework for Online Speaker Diarization

Feb-21-2023–arXiv.org Artificial Intelligence

Speaker diarization is a crucial task in many real-world applications, such as meeting transcription, call center monitoring, and broadcast news processing. Its goal is to partition an audio or video stream into homogeneous segments, each corresponding to a single speaker, without any prior knowledge of the speakers' identities [1, 2]. The task has traditionally been addressed with unsupervised clustering methods [3, 4, 5], but recent advances in deep learning have led to more powerful embedding-based approaches [6, 7, 5]. Despite this progress, speaker diarization remains challenging, particularly in real-time and online scenarios where speakers may enter or leave the conversation at any time. In such cases, pre-trained models may not be sufficient, and the system must adapt to new speakers on the fly [8, 9, 10]. Following its successful application to other speech and language tasks [11], reinforcement learning (RL) has emerged as a promising approach for building next-generation speaker diarization systems that learn online and adapt to changing circumstances. In this paper, we propose a novel RL framework for online speaker diarization that requires neither prior speaker registration nor pretraining. Our approach combines embedding extraction, clustering, and resegmentation into a single online decision-making problem, in which the agent receives feedback in the form of rewards or penalties for each segmentation decision. We demonstrate the effectiveness of our approach using a Q-learning-based diarization agent in a desktop app, and discuss practical considerations for implementing and deploying RL-based speaker diarization systems.
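To make the "reward or penalty per segmentation decision" idea concrete, the following is a minimal sketch of the standard tabular Q-learning update, not the paper's implementation. The state labels, the action set (assign a segment to one of a fixed number of speaker slots), the reward values, and the function names `make_q_table`, `choose_action`, and `q_update` are all assumptions made for illustration; the abstract does not specify how states, actions, or rewards are defined.

```python
import random
from collections import defaultdict


def make_q_table(n_actions):
    # Q-table mapping a hashable state to one value per action.
    # Unseen states start with all-zero action values.
    return defaultdict(lambda: [0.0] * n_actions)


def choose_action(q, state, epsilon, rng):
    # Epsilon-greedy policy: explore with probability epsilon,
    # otherwise pick the action with the highest current Q-value.
    if rng.random() < epsilon:
        return rng.randrange(len(q[state]))
    values = q[state]
    return max(range(len(values)), key=values.__getitem__)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Standard Q-learning rule:
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


# Hypothetical usage: three speaker slots as actions, and a made-up
# state label standing in for a discretised segment embedding.
q = make_q_table(n_actions=3)
rng = random.Random(0)
state = "segment_a"
action = choose_action(q, state, epsilon=0.1, rng=rng)
# Assume feedback of +1 for a correct speaker label, -1 otherwise.
q_update(q, state, action, reward=1.0, next_state="segment_b")
```

In an online setting, a loop of this shape would run once per incoming audio segment, so the agent's labelling policy is adjusted as feedback arrives rather than being fixed by pretraining.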



Country:
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Hawaii > Honolulu County
    - Honolulu (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Media (0.49)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (0.67)
  - Statistical Learning > Clustering (0.54)
