Investigating Transcription Normalization in the Faetar ASR Benchmark

Peckham, Leo, Ong, Michael, Nagy, Naomi, Dunbar, Ewan

Aug-21-2025, arXiv.org Artificial Intelligence

We provide a small but important update on the Faetar Speech Recognition Benchmark [1]. The benchmark, initially released as a challenge task (with test data embargoed), is intended to teach us more about the domain of "dirty" low-resource ASR. We identified two major hurdles. First, due to an unfortunate error, one of the baselines for the constrained ASR task, the task that interested most challenge participants, was reported with an incorrect phone error rate that was much lower than it should have been: the reported result in fact came from a different, unconstrained model. We felt the impact of this error, as potential participants hesitated to submit when they were unable to beat the incorrect number. The figure has since been corrected in the documentation.
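For readers unfamiliar with the metric behind the baseline error, the following is a minimal sketch of how a phone error rate (PER) is commonly computed: the total Levenshtein edit distance between reference and hypothesis phone sequences, divided by the total number of reference phones. It assumes phones are separated by whitespace and that errors are pooled over the whole corpus; the function names are hypothetical, and the benchmark's own scoring script may tokenize or normalize transcriptions differently.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(references, hypotheses):
    """Corpus-level PER: total edit errors / total reference phones.

    Assumption: each transcription is a string of space-separated phones.
    """
    total_errors = 0
    total_ref_phones = 0
    for ref, hyp in zip(references, hypotheses):
        ref_phones = ref.split()
        hyp_phones = hyp.split()
        total_errors += edit_distance(ref_phones, hyp_phones)
        total_ref_phones += len(ref_phones)
    if total_ref_phones == 0:
        raise ValueError("references contain no phones")
    return total_errors / total_ref_phones
```

Because the denominator is the reference length, any normalization applied to the reference transcriptions changes the score, so the same normalization has to be applied to every system being compared.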

