A Textless Metric for Speech-to-Speech Comparison
Besacier, Laurent, Ribeiro, Swen, Galibert, Olivier, Calapodescu, Ioan
–arXiv.org Artificial Intelligence
In this paper, we introduce a new and simple method for comparing speech utterances without relying on text transcripts. Our speech-to-speech comparison metric utilizes state-of-the-art speech2unit encoders like HuBERT to convert speech utterances into discrete acoustic units. We then propose a simple and easily replicable neural architecture that learns a speech-based metric that closely corresponds to its text-based counterpart. This textless metric has numerous potential applications, including evaluating speech-to-speech translation for oral languages, languages without dependable ASR systems, or to avoid the need for ASR transcription altogether. This paper also shows that for speech-to-speech translation evaluation, ASR-BLEU (which consists in automatically transcribing both speech hypothesis and reference and compute sentence-level BLEU between transcripts) is a poor proxy to real text-BLEU even when ASR system is strong.
arXiv.org Artificial Intelligence
Jul-20-2023
- Country:
- Africa > Ethiopia
- Addis Ababa > Addis Ababa (0.04)
- Asia > Middle East
- Europe
- North America > United States
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Minnesota > Hennepin County
- Minneapolis (0.14)
- Washington > King County
- Seattle (0.04)
- Massachusetts > Middlesex County
- Africa > Ethiopia
- Genre:
- Research Report (0.52)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Machine Translation (0.90)
- Speech > Speech Recognition (1.00)
- Information Technology > Artificial Intelligence