ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

Yip, Jia Qi, Truong, Tuan, Ng, Dianwen, Zhang, Chong, Ma, Yukun, Nguyen, Trung Hieu, Ni, Chongjia, Zhao, Shengkui, Chng, Eng Siong, Ma, Bin

May-20-2023–arXiv.org Artificial Intelligence

In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric Cross Attention (ACA) to replace temporal pooling. ACA distills large, variable-length sequences into small, fixed-sized latents by attending a small query to large key and value matrices. In ACA-Net, we build a Multi-Layer Aggregation (MLA) block using ACA to generate fixed-sized identity vectors from variable-length inputs. Through global attention, ACA-Net acts as an efficient global feature extractor that adapts to temporal variability, unlike existing SV models, which apply a fixed pooling function over the temporal dimension and may therefore obscure information about the signal's nonstationary temporal variability. Our experiments on WSJ0-1talker show that ACA-Net outperforms a strong baseline by a 5% relative improvement in EER while using only 1/5 of the parameters.

Index Terms: Speaker Verification, Asymmetric Cross Attention,

Figure 1: Overall architecture of ACA-Net. The model consists of a single 1x1 TDNN block, followed by the Multi-Layer Aggregation (MLA) block.
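The abstract describes ACA as attending a small query to large key and value matrices, so that the output size depends on the query and not on the input length. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes a single attention head, standard scaled dot-product attention, and no learned projections, and the names `asymmetric_cross_attention`, `latents`, and `frames` are hypothetical.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def asymmetric_cross_attention(latents, frames):
    """Attend a small query (latents, L x d) to a long sequence (frames, T x d).

    Assumption: the frames serve directly as both keys and values; a real
    model would apply learned projections first. The output is always L x d.
    """
    d = len(latents[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q in latents:
        # One score per input frame: scaled dot product of query and key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in frames]
        weights = softmax(scores)
        # Weighted sum of the values over the whole temporal axis.
        out.append([sum(w * v[j] for w, v in zip(weights, frames)) for j in range(d)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
latents = random_matrix(4, 8, rng)      # small, fixed-size query
short_utt = random_matrix(50, 8, rng)   # 50 frames
long_utt = random_matrix(300, 8, rng)   # 300 frames

emb_short = asymmetric_cross_attention(latents, short_utt)
emb_long = asymmetric_cross_attention(latents, long_utt)
print(len(emb_short), len(emb_short[0]))  # 4 8
print(len(emb_long), len(emb_long[0]))    # 4 8
```

Both utterances yield a 4 x 8 result even though their lengths differ, which is the property that lets attention stand in for a fixed temporal pooling function. The attention weights are also computed from the input itself, unlike a mean or statistics pooling that treats every frame by the same rule.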

aca-net, artificial intelligence, speech recognition

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.69)
  - Speech
    - Acoustic Processing (0.93)
    - Speech Recognition (1.00)
