Optimizing Multi-Taper Features for Deep Speaker Verification
Liu, Xuechen; Sahidullah, Md; Kinnunen, Tomi
arXiv.org Artificial Intelligence
Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Although past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model (GMM)-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. Our method achieves a maximum improvement of 25.8% in equal error rate (EER) over the static-taper design on the SITW corpus, and it helps maintain a balance between spectral leakage and variance, yielding more robust features.
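To illustrate the kind of estimator the abstract refers to, the sketch below computes a weighted multi-taper power spectrum of one speech frame, S(f) = Σ_j λ_j |DFT(w_j · x)(f)|², which would take the place of the single-window periodogram before the mel filterbank in MFCC extraction. It is a minimal NumPy sketch under stated assumptions: it uses Thomson's discrete prolate spheroidal (DPSS) tapers with uniform weights λ_j = 1/K, and the function names `dpss_tapers`, `multitaper_power_spectrum` and `hann_periodogram`, as well as the parameter values (frame length 400, NW = 4, K = 6, 512-point FFT), are hypothetical choices for illustration, not the paper's configuration. In the proposed method the estimator is optimized jointly with the ASV network instead of being fixed; this sketch does not model that training and simply exposes the weights as an input array.

```python
import numpy as np


def dpss_tapers(n: int, nw: float, k: int) -> np.ndarray:
    """Return the k leading DPSS (Slepian) tapers of length n, one per row.

    The DPSS are the eigenvectors of a symmetric tridiagonal matrix with
    half-bandwidth W = nw / n; the wanted tapers belong to its k largest
    eigenvalues. np.linalg.eigh returns unit-norm, mutually orthogonal vectors.
    """
    idx = np.arange(n)
    half_bw = nw / n
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * half_bw)
    off = idx[1:] * (n - idx[1:]) / 2.0
    mat = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    # eigh sorts eigenvalues in ascending order, so the wanted tapers are the
    # last k columns; reverse them so row 0 is the best-concentrated taper.
    _, vecs = np.linalg.eigh(mat)
    return vecs[:, ::-1][:, :k].T


def multitaper_power_spectrum(frame, tapers, weights=None, n_fft=512):
    """Weighted multi-taper estimate S(f) = sum_j lambda_j |DFT(w_j * x)(f)|^2."""
    k = tapers.shape[0]
    if weights is None:
        weights = np.full(k, 1.0 / k)  # uniform weights (an assumption)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to one
    # One eigenspectrum per taper, shape (k, n_fft // 2 + 1).
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, n=n_fft, axis=-1)) ** 2
    return weights @ eigenspectra


def hann_periodogram(frame, n_fft=512):
    """Single-window baseline: unit-energy Hann window, then |DFT|^2."""
    window = np.hanning(len(frame))
    window = window / np.linalg.norm(window)
    return np.abs(np.fft.rfft(window * frame, n=n_fft)) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # e.g. a 25 ms frame at 16 kHz
    tapers = dpss_tapers(400, nw=4.0, k=6)
    mt = multitaper_power_spectrum(frame, tapers)
    single = hann_periodogram(frame)
    # For white noise the true spectrum is flat, so the spread across bins
    # reflects the variance of each estimator.
    print("multi-taper variance:", mt.var())
    print("Hann periodogram variance:", single.var())
```

On white noise, averaging K nearly uncorrelated eigenspectra reduces the bin-to-bin variance relative to the single Hann-windowed periodogram. The choice of NW, K and the weights governs how that variance reduction is traded against spectral leakage and frequency resolution.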
Oct-21-2021
- Country:
  - Europe
    - Finland > North Karelia > Joensuu (0.04)
    - France > Grand Est > Meurthe-et-Moselle > Nancy (0.14)
    - United Kingdom > England > Oxfordshire > Oxford (0.04)
  - North America > United States
    - New York (0.04)
    - Wisconsin > Dane County > Madison (0.04)
- Genre:
  - Research Report (0.50)