Optimizing Multi-Taper Features for Deep Speaker Verification

Liu, Xuechen; Sahidullah, Md; Kinnunen, Tomi

Oct-21-2021–arXiv.org Artificial Intelligence

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Although past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model (GMM)-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. On the Speakers in the Wild (SITW) corpus, our method achieves a maximum improvement of 25.8% in equal error rate (EER) over the static-taper design. It helps preserve a balance between spectral leakage and variance, which makes the system more robust.
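As a rough illustration of the estimator the abstract refers to, the sketch below computes a multi-taper power spectrum for one frame as a weighted sum of K tapered periodograms, S(f) = sum_j lambda_j |DFT(w_j x)(f)|^2, where w_j are the tapers and lambda_j their weights. It assumes NumPy and uses sine tapers with uniform weights purely as an example of a static-taper design; the function names, frame length, FFT size and number of tapers are illustrative assumptions, not the paper's configuration. The paper's actual proposal, learning the estimator jointly with the ASV network, is not attempted here.

```python
import numpy as np


def sine_tapers(frame_len: int, num_tapers: int) -> np.ndarray:
    """Return a (num_tapers, frame_len) array of orthonormal sine tapers.

    w_j(n) = sqrt(2 / (N + 1)) * sin(pi * j * (n + 1) / (N + 1)), j = 1..K.
    """
    n = np.arange(frame_len)                      # sample index, shape (N,)
    j = np.arange(1, num_tapers + 1)[:, None]     # taper order, shape (K, 1)
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(
        np.pi * j * (n + 1) / (frame_len + 1)
    )


def multitaper_spectrum(frame: np.ndarray, tapers: np.ndarray,
                        weights: np.ndarray, n_fft: int) -> np.ndarray:
    """Weighted sum of the K tapered periodograms (one per row of `tapers`)."""
    # Taper the frame once per row, then take a zero-padded real FFT of each row.
    spectra = np.abs(np.fft.rfft(tapers * frame[None, :], n=n_fft, axis=-1)) ** 2
    # (K,) @ (K, n_fft // 2 + 1) -> (n_fft // 2 + 1,)
    return weights @ spectra


def coefficient_of_variation(spectrum: np.ndarray) -> float:
    """Spread of an estimate across frequency bins (white noise has a flat true spectrum)."""
    return float(spectrum.std() / spectrum.mean())


# Illustrative setup (assumed values): one 25 ms frame of white noise at 16 kHz.
rng = np.random.default_rng(0)
N, N_FFT, K = 400, 512, 6
frame = rng.standard_normal(N)

# Static-taper design: fixed sine tapers with uniform weights.
tapers = sine_tapers(N, K)
weights = np.full(K, 1.0 / K)
mt = multitaper_spectrum(frame, tapers, weights, N_FFT)

# The conventional windowed DFT is the K = 1 case: one Hamming taper, weight 1.
single = multitaper_spectrum(frame, np.hamming(N)[None, :], np.ones(1), N_FFT)

print(f"windowed DFT CV: {coefficient_of_variation(single):.2f}")
print(f"multi-taper CV:  {coefficient_of_variation(mt):.2f}")
```

Because the windowed DFT is just the single-taper special case, the rest of an MFCC pipeline (mel filterbank, logarithm, DCT) can consume either spectrum unchanged. On white noise the multi-taper estimate should show a visibly smaller spread across bins; how much variance is traded against leakage depends on how many tapers are used and how they are weighted, which is what a learned estimator would adjust.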

Country:
- North America > United States
  - New York (0.04)
  - Wisconsin > Dane County
    - Madison (0.04)
- Europe
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - France > Grand Est
    - Meurthe-et-Moselle > Nancy (0.14)
  - Finland > North Karelia
    - Joensuu (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Speech
    - Speech Recognition (0.73)
    - Acoustic Processing (0.73)
  - Machine Learning
    - Statistical Learning (0.68)
    - Neural Networks > Deep Learning (0.48)