Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
