Bilingual Text-dependent Speaker Verification with Pre-trained Models for TdSV Challenge 2024

Nov-16-2024–arXiv.org Artificial Intelligence

This paper presents our submissions to the Iranian division of the Text-dependent Speaker Verification Challenge (TdSV) 2024. TdSV aims to determine whether a specific phrase was spoken by a target speaker. We developed two independent subsystems based on pre-trained models: for phrase verification, a phrase classifier rejected incorrect phrases, while for speaker verification, a pre-trained ResNet293 with domain adaptation extracted speaker embeddings for computing cosine similarity scores. In addition, we evaluated Whisper-PMFA, a pre-trained ASR model adapted for speaker verification, and found that, although it outperforms randomly initialized ResNets, it falls short of pre-trained ResNets, highlighting the importance of large-scale pre-training. The results also demonstrate that competitive performance on TdSV is achievable without jointly modeling speaker and text. Our best system achieved a MinDCF of 0.0358 on the evaluation subset and won the challenge.
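The two-subsystem scoring described in the abstract (a phrase check that rejects wrong phrases, then cosine similarity between speaker embeddings) and the MinDCF metric can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a phrase mismatch simply forces a hypothetical sentinel score (`REJECT_SCORE`), that several enrollment embeddings are averaged into one speaker model, and that MinDCF uses P_target = 0.01 with unit miss and false-alarm costs; none of these details are given in the abstract. The embedding vectors and phrase labels stand in for the outputs of the ResNet293 and the phrase classifier.

```python
import math
from typing import List, Sequence

# Hypothetical sentinel: any trial whose phrase is rejected gets this score,
# so it falls below every practical decision threshold.
REJECT_SCORE = -1.0e9


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_embedding(embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Average enrollment embeddings into one speaker model (an assumption)."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]


def trial_score(enroll_embs: Sequence[Sequence[float]],
                test_emb: Sequence[float],
                claimed_phrase: str,
                predicted_phrase: str) -> float:
    """Phrase gate first; if the phrase matches, score the speaker by cosine."""
    if predicted_phrase != claimed_phrase:
        return REJECT_SCORE
    return cosine_similarity(mean_embedding(enroll_embs), test_emb)


def min_dcf(scores: Sequence[float], labels: Sequence[int],
            p_target: float = 0.01, c_miss: float = 1.0,
            c_fa: float = 1.0) -> float:
    """Normalized minimum detection cost over all decision thresholds.

    labels: 1 for target trials, 0 for non-target trials.
    Assumes both target and non-target trials are present.
    """
    n_tar = sum(labels)
    n_non = len(labels) - n_tar
    pairs = sorted(zip(scores, labels))
    c_def = min(c_miss * p_target, c_fa * (1.0 - p_target))

    # Threshold below every score: accept all (P_miss = 0, P_fa = 1).
    best = c_fa * (1.0 - p_target)
    misses, false_alarms = 0, n_non
    for i, (score, label) in enumerate(pairs):
        # Raising the threshold past this trial rejects it.
        if label == 1:
            misses += 1
        else:
            false_alarms -= 1
        # Evaluate only between distinct scores so tied trials move together.
        if i + 1 < len(pairs) and pairs[i + 1][0] == score:
            continue
        p_miss = misses / n_tar
        p_fa = false_alarms / n_non
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
        best = min(best, dcf)
    return best / c_def
```

Because the phrase gate and the speaker scorer are independent, each can be swapped out without retraining the other, which matches the abstract's point that joint speaker-text modeling is not required. With perfectly separated toy scores such as `min_dcf([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])`, the cost is zero.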

speaker verification, verification, verification system

Country:
- Asia > India (0.04)
- North America > United States
  - New York > New York County > New York City (0.04)
- Europe > Austria
  - Styria > Graz (0.04)

Genre:
- Research Report (0.64)

Industry:
- Information Technology > Security & Privacy (0.70)

Technology:
- Information Technology > Artificial Intelligence > Speech
  - Speech Recognition (1.00)
  - Acoustic Processing (1.00)